[jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable

Asitang Mishra (JIRA) Tue, 18 Aug 2015 09:09:37 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701495#comment-14701495 ]


Asitang Mishra commented on NUTCH-2049:
---------------------------------------

Hi Chris,

The Naive Bayes plugin has a Hadoop job of its own, so it only works in local 
mode and not in distributed mode. The Parse job, of which this plugin is a 
part, is also a Hadoop job, so the plugin's job becomes a nested Hadoop job. 

The training part of the plugin is the only part that is a Hadoop job (the 
classification part is not), so I can make a separate tool for training and 
keep only the classification part in the plugin. I have tested the 
classification part in distributed mode.
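
A minimal sketch of this split, under loud assumptions: the class and method 
names below are hypothetical and are not taken from the Nutch code base, and 
the toy word-count Naive Bayes with add-one smoothing only stands in for 
whatever model the plugin really uses. The point it illustrates is that 
train() is run once by a separate tool, while classify() is a plain in-memory 
call that launches no job and can therefore be called from inside the Parse 
job.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Nutch.
public class SplitTrainClassifySketch {

    /** Model produced offline by the separate training tool. */
    static final class Model {
        final Map<String, Integer> docCounts = new HashMap<>();
        final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
        final Set<String> vocabulary = new HashSet<>();
    }

    /**
     * Stand-in for the standalone training tool. In the real setup this is
     * the only part that would be a Hadoop job; here it is a plain loop.
     * Each input element is {label, text}.
     */
    static Model train(List<String[]> labelledDocs) {
        Model model = new Model();
        for (String[] doc : labelledDocs) {
            String label = doc[0];
            model.docCounts.merge(label, 1, Integer::sum);
            Map<String, Integer> counts =
                model.wordCounts.computeIfAbsent(label, k -> new HashMap<>());
            for (String word : doc[1].toLowerCase().split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
                model.vocabulary.add(word);
            }
        }
        return model;
    }

    /**
     * Stand-in for the plugin side: scores each label with log prior plus
     * add-one smoothed log likelihoods. No job is submitted here.
     */
    static String classify(Model model, String text) {
        int totalDocs = 0;
        for (int n : model.docCounts.values()) {
            totalDocs += n;
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : model.docCounts.entrySet()) {
            Map<String, Integer> counts = model.wordCounts.get(e.getKey());
            int totalWords = 0;
            for (int n : counts.values()) {
                totalWords += n;
            }
            double score = Math.log((double) e.getValue() / totalDocs);
            for (String word : text.toLowerCase().split("\\s+")) {
                int c = counts.getOrDefault(word, 0);
                score += Math.log((c + 1.0) / (totalWords + model.vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Step 1: run once, outside the crawl (the separate training tool).
        Model model = train(Arrays.asList(
            new String[] {"relevant", "hadoop crawl nutch"},
            new String[] {"irrelevant", "cats dogs pets"}));
        // Step 2: called per document from the parse step; no nested job.
        System.out.println(classify(model, "nutch crawl")); // prints "relevant"
    }
}
```

In a real deployment the trained model would presumably be written to a 
shared location by the training tool and loaded by the plugin at start-up; 
that persistence step is left out of the sketch.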

 

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>
>                 Key: NUTCH-2049
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2049
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>
>         Attachments: NUTCH-2049.patch, NUTCH-2049v2.patch
>
>
> Convo here - http://www.mail-archive.com/dev%40nutch.apache.org/msg18225.html
> I am +1 for taking trunk (or a branch of trunk) to an explicit dependency on 
> Hadoop > 2.6.
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my 
> own projects. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
