[ 
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986479#comment-13986479
 ] 

Alparslan Avcı commented on NUTCH-1714:
---------------------------------------

Hi [~jnioche],

Thanks for the reviews and tests. Regarding the issues you raised:
bq. There is no progression of the complete status of mappers : they go from 0% 
to 100% for the tasks taking the input from GORA i.e not the injection
Like [~lewismc], I do not know what causes this yet. I will look into it.
bq. The whole content of the webtable seems to be taken as input for mapreduce. 
I assumed it wouldn't be the case for GORA-119 and that the fetch step for 
instance would get only the entries marked by the Generator. There is 
NUTCH-1674 but this should only add the batchID to the filters according to its 
title.
This 
[patch|https://issues.apache.org/jira/secure/attachment/12642309/NUTCH-1714v4.patch]
 only contains the updates needed to use gora-0.4 in Nutch, and NUTCH-1674 only 
contains fixes for the batchId filters. As I said in the comment:
bq. In the patch I added, I applied the possible filters (which are only 
batchId filters for now) to the jobs. After the implementation of new Hbase 
filters and filterset on Gora, we can add new filters (eg.:Non-existance of 
Mark.FETCH_MARK filter for FetcherJob) and clean the map functions from some 
controls.
We can open another issue to implement other filters for Nutch.
bq. ./nutch readdb -crawlId MYCRAWLIDHERE -stats gets 0 docs but I can see the 
corresponding table in HBase.
I will try this command as well, look for the problem, and share the results 
with you.

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, 
> NUTCH-1714v2.patch, NUTCH-1714v4.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the 
> details in this issue.



