[ 
https://issues.apache.org/jira/browse/NUTCH-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-951:
--------------------------------

    Description: 
I've compared the changes in 2.0 against 1.3 and found the following differences 
(excluding anything specific to 2.0/GORA):

    *  NUTCH-564 External parser supports encoding attribute (Antony Bowesman, mattmann)
    *  NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
    *  NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)
    *  NUTCH-851 Port logging to slf4j (jnioche)
    *  NUTCH-861 Renamed HTMLParseFilter into ParseFilter
    *  NUTCH-872 Change the default fetcher.parse to FALSE (ab).
    *  NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
    *  NUTCH-880 REST API for Nutch (ab)
    *  NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
    *  NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
    *  NUTCH-886 A .gitignore file for Nutch (dogacan)
    *  NUTCH-894 Move statistical language identification from indexing to parsing step
    *  NUTCH-921 Reduce dependency of Nutch on config files (ab)
    *  NUTCH-930 Remove remaining dependencies on Lucene API (ab)
    *  NUTCH-931 Simple admin API to fetch status and stop the service (ab)
    *  NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)

Let's go through these and decide which to port to 1.3.

  was:
I've compared the changes from 2.0 with 1.3 and found the following differences 
(excluding anything specific to 2.0/GORA)

    *  NUTCH-564 External parser supports encoding attribute (Antony Bowesman, 
mattmann)
    *  NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
    *  NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)
    *  NUTCH-851 Port logging to slf4j (jnioche)
    *  NUTCH-861 Renamed HTMLParseFilter into ParseFilter
    *  NUTCH-872 Change the default fetcher.parse to FALSE (ab).
    *  NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
    *  NUTCH-880 REST API for Nutch (ab)
    *  NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
    *  NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
    *  NUTCH-886 A .gitignore file for Nutch (dogacan)
    *  NUTCH-894 Move statistical language identification from indexing to 
parsing step
    *  NUTCH-921 Reduce dependency of Nutch on config files (ab)
    *  NUTCH-930 Remove remaining dependencies on Lucene API (ab)
    *  NUTCH-931 Simple admin API to fetch status and stop the service (ab)
    *  NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)
    *  NUTCH-936 LanguageIdentifier should not set empty lang field on 
NutchDocument (Markus Jelsma via jnioche)

Let's go through this and decide what to port to 1.3


> Backport changes from 2.0 into 1.3
> ----------------------------------
>
>                 Key: NUTCH-951
>                 URL: https://issues.apache.org/jira/browse/NUTCH-951
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: 1.3
>            Reporter: Julien Nioche
>            Priority: Blocker
>             Fix For: 1.3
>
>
> I've compared the changes from 2.0 with 1.3 and found the following 
> differences (excluding anything specific to 2.0/GORA)
>     *  NUTCH-564 External parser supports encoding attribute (Antony 
> Bowesman, mattmann)
>     *  NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
>     *  NUTCH-825 Publish nutch artifacts to central maven repository 
> (mattmann)
>     *  NUTCH-851 Port logging to slf4j (jnioche)
>     *  NUTCH-861 Renamed HTMLParseFilter into ParseFilter
>     *  NUTCH-872 Change the default fetcher.parse to FALSE (ab).
>     *  NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)
>     *  NUTCH-880 REST API for Nutch (ab)
>     *  NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)
>     *  NUTCH-884 FetcherJob should run more reduce tasks than default (ab)
>     *  NUTCH-886 A .gitignore file for Nutch (dogacan)
>     *  NUTCH-894 Move statistical language identification from indexing to 
> parsing step
>     *  NUTCH-921 Reduce dependency of Nutch on config files (ab)
>     *  NUTCH-930 Remove remaining dependencies on Lucene API (ab)
>     *  NUTCH-931 Simple admin API to fetch status and stop the service (ab)
>     *  NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)
> Let's go through this and decide what to port to 1.3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
