[ 
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744364#comment-14744364
 ] 

Lewis John McGibbney edited comment on NUTCH-2097 at 9/15/15 6:51 AM:
----------------------------------------------------------------------

Hi Folks,
After being introduced by [~chrismattmann], I've just spoken with [~ndouba] on 
Skype. This is really exciting work, so I asked him to log a parent Jira issue 
(which he has done) so that we can begin thinking about a Nutch 
3.X branch.
The core work undertaken by Nadeem so far can be summarized as follows:
 * Complete overhaul of the Ant + Ivy build system, i.e. replaced with Apache Maven 
(not backwards compatible)
 * Upgrade of all mapred --> mapreduce API usage in Nutch (not backwards compatible)
 * Complete refactoring of all IO (custom NutchWritables) into a separate [IO 
package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/io]
 * Complete refactoring of all Mapper functions into a separate [mapper 
package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/mapper]
 * Complete refactoring of all Reducer functions into a separate [reducer 
package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/reducer]
 * Introduction of [lib 
package|https://github.com/allfro/nutch/tree/mr2-mvn/nutch-runtime/src/main/java/org/apache/nutch/lib]
 which contains all input and output formats.
 * Upgrade of Hadoop dependencies from 2.4.0 --> 2.7.1

The above package naming conventions are, of course, intended to align with 
those of Apache Hadoop.
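
For anyone less familiar with the mapred --> mapreduce change listed above, here is a minimal sketch of the difference in shape. Please note that it uses hypothetical stand-in types (Collector, Context, OldStyleMapper, NewStyleMapper) instead of the real Hadoop classes so that it compiles on its own, and it is not taken from Nadeem's branch. In Hadoop itself, the old org.apache.hadoop.mapred.Mapper is an interface whose map method receives an OutputCollector and a Reporter, while the new org.apache.hadoop.mapreduce.Mapper is a class whose map method receives a single Context.

```java
import java.util.ArrayList;
import java.util.List;

public class MapredToMapreduceSketch {

    // Hypothetical stand-in for the old-style output argument (NOT a Hadoop class).
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Old "mapred" shape: the mapper is an interface, and output and progress
    // reporting arrive as two separate arguments.
    interface OldStyleMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Collector<K2, V2> output, Runnable reporter);
    }

    // Hypothetical stand-in for the new-style context (NOT a Hadoop class).
    // A single object carries the output (and, in Hadoop, status and config).
    static class Context<K, V> {
        final List<String> written = new ArrayList<String>();

        void write(K key, V value) {
            written.add(key + "=" + value);
        }
    }

    // New "mapreduce" shape: the mapper is a class, and map receives one Context.
    abstract static class NewStyleMapper<K1, V1, K2, V2> {
        abstract void map(K1 key, V1 value, Context<K2, V2> context);
    }

    // Toy logic in the old shape: emit (token, 1) for each whitespace-separated token.
    public static List<String> runOldStyle(String line) {
        final List<String> out = new ArrayList<String>();
        OldStyleMapper<Long, String, String, Integer> mapper =
            (key, value, output, reporter) -> {
                for (String token : value.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        output.collect(token, 1);
                    }
                }
                reporter.run(); // progress is reported through a separate argument
            };
        mapper.map(0L, line, (k, v) -> out.add(k + "=" + v), () -> { });
        return out;
    }

    // The same toy logic in the new shape: everything goes through the context.
    public static List<String> runNewStyle(String line) {
        Context<String, Integer> context = new Context<String, Integer>();
        NewStyleMapper<Long, String, String, Integer> mapper =
            new NewStyleMapper<Long, String, String, Integer>() {
                @Override
                void map(Long key, String value, Context<String, Integer> ctx) {
                    for (String token : value.trim().split("\\s+")) {
                        if (!token.isEmpty()) {
                            ctx.write(token, 1);
                        }
                    }
                }
            };
        mapper.map(0L, line, context);
        return context.written;
    }

    public static void main(String[] args) {
        System.out.println(runOldStyle("nutch hadoop"));
        System.out.println(runNewStyle("nutch hadoop"));
    }
}
```

Both methods produce the same records; only the method signature and the way output is written differ. That is why the upgrade touches every tool that defines a Mapper or Reducer and cannot be made backwards compatible.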

My thoughts are as follows: the work which has gone on in Nadeem's mr2-mvn 
branch is too wide-ranging and covers too much of the Nutch 1.11-SNAPSHOT (as of commit 
r1697466 NUTCH-2049 Upgrade Trunk to Hadoop > 2.4 stable) code base for us to 
backport it into Nutch trunk (1.11-SNAPSHOT). Nadeem and I 
therefore discussed this and propose that we forward port all commits (post commit 
r1697466) to Nadeem's branch and put this codebase forward as Nutch 3.X, which will 
lessen the burden on everyone. The burden in question is creating a patch 
for each tool, each issue, and each change. That would be hellish. The 
forward-port approach described above is the better solution.

This issue should act as a parent for defining Nutch 3.X based on Nutch 
1.11-SNAPSHOT.


> Proposal for Nutch 3.x
> ----------------------
>
>                 Key: NUTCH-2097
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2097
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Nadeem Douba
>            Assignee: Lewis John McGibbney
>
> This is a parent issue which contains a proposal for Nutch 3.x. It's based on 
> my branch (mr2-mvn at https://github.com/allfro/nutch).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)