[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986479#comment-13986479 ]
Alparslan Avcı commented on NUTCH-1714:
---------------------------------------

Hi [~jnioche],

Thanks for the reviews and tests. Regarding the issues:

bq. There is no progression of the complete status of mappers : they go from 0% to 100% for the tasks taking the input from GORA i.e. not the injection

Like [~lewismc], I do not know the cause of this yet. I will also have a look at it.

bq. The whole content of the webtable seems to be taken as input for mapreduce. I assumed it wouldn't be the case for GORA-119 and that the fetch step for instance would get only the entries marked by the Generator. There is NUTCH-1674 but this should only add the batchID to the filters according to its title.

This [patch|https://issues.apache.org/jira/secure/attachment/12642309/NUTCH-1714v4.patch] only contains the updates needed to use gora-0.4 in Nutch, and NUTCH-1674 only contains fixes for the batchId filters. As I said in the comment:

bq. In the patch I added, I applied the possible filters (which are only batchId filters for now) to the jobs. After the implementation of new HBase filters and filterset on Gora, we can add new filters (e.g. a non-existence of Mark.FETCH_MARK filter for FetcherJob) and clean the map functions from some controls.

We can open another issue to implement the other filters for Nutch.

bq. ./nutch readdb -crawlId MYCRAWLIDHERE -stats gets 0 docs but I can see the corresponding table in HBase.

I will try this command as well. Let me try to find the problem, and I will share the results with you.

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
>                 Key: NUTCH-1714
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1714
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the
> details in this issue.

-- This message was sent by Atlassian JIRA (v6.2#6252)