[ https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-951:
--------------------------------

    Description:
I've compared the changes from 2.0 with 1.3 and found the following differences (excluding anything specific to 2.0/GORA)

* NUTCH-564 External parser supports encoding attribute (Antony Bowesman, mattmann)
* NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
* NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)
* NUTCH-851 Port logging to slf4j (jnioche)
* NUTCH-861 Renamed HTMLParseFilter into ParseFilter
* NUTCH-872 Change the default fetcher.parse to FALSE (ab).
* NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
* NUTCH-880 REST API for Nutch (ab)
* NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
* NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
* NUTCH-886 A .gitignore file for Nutch (dogacan)
* NUTCH-894 Move statistical language identification from indexing to parsing step
* NUTCH-921 Reduce dependency of Nutch on config files (ab)
* NUTCH-930 Remove remaining dependencies on Lucene API (ab)
* NUTCH-931 Simple admin API to fetch status and stop the service (ab)
* NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)

Let's go through this and decide what to port to 1.3

  was:
I've compared the changes from 2.0 with 1.3 and found the following differences (excluding anything specific to 2.0/GORA)

* NUTCH-564 External parser supports encoding attribute (Antony Bowesman, mattmann)
* NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
* NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)
* NUTCH-851 Port logging to slf4j (jnioche)
* NUTCH-861 Renamed HTMLParseFilter into ParseFilter
* NUTCH-872 Change the default fetcher.parse to FALSE (ab).
* NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
* NUTCH-880 REST API for Nutch (ab)
* NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
* NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
* NUTCH-886 A .gitignore file for Nutch (dogacan)
* NUTCH-894 Move statistical language identification from indexing to parsing step
* NUTCH-921 Reduce dependency of Nutch on config files (ab)
* NUTCH-930 Remove remaining dependencies on Lucene API (ab)
* NUTCH-931 Simple admin API to fetch status and stop the service (ab)
* NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)
* NUTCH-936 LanguageIdentifier should not set empty lang field on NutchDocument (Markus Jelsma via jnioche)

Let's go through this and decide what to port to 1.3


> Backport changes from 2.0 into 1.3
> ----------------------------------
>
>                 Key: NUTCH-951
>                 URL: https://issues.apache.org/jira/browse/NUTCH-951
>             Project: Nutch
>          Issue Type: Task
>   Affects Versions: 1.3
>            Reporter: Julien Nioche
>            Priority: Blocker
>             Fix For: 1.3
>
>
> I've compared the changes from 2.0 with 1.3 and found the following
> differences (excluding anything specific to 2.0/GORA)
> * NUTCH-564 External parser supports encoding attribute (Antony
> Bowesman, mattmann)
> * NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
> * NUTCH-825 Publish nutch artifacts to central maven repository
> (mattmann)
> * NUTCH-851 Port logging to slf4j (jnioche)
> * NUTCH-861 Renamed HTMLParseFilter into ParseFilter
> * NUTCH-872 Change the default fetcher.parse to FALSE (ab).
> * NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
> * NUTCH-880 REST API for Nutch (ab)
> * NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
> * NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
> * NUTCH-886 A .gitignore file for Nutch (dogacan)
> * NUTCH-894 Move statistical language identification from indexing to
> parsing step
> * NUTCH-921 Reduce dependency of Nutch on config files (ab)
> * NUTCH-930 Remove remaining dependencies on Lucene API (ab)
> * NUTCH-931 Simple admin API to fetch status and stop the service (ab)
> * NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)
> Let's go through this and decide what to port to 1.3

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.