[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989297#comment-13989297 ]
Navid Shekoufa commented on NUTCH-1714:
---------------------------------------

[~lewismc] Sorry about my late response!

bq. Can you elaborate? Do you mean it is taking ALL records? What are your settings like for generate.max.count? The default of -1 could have a significant impact... and may be the reason you are feeding all/many rows.

No, generate.max.count is not set to -1. And from my understanding, [NUTCH-1674] isn't intended to apply a filter in the Generate phase, so I think it is now clear to me why the GeneratorJob feeds all the records from the database to the Mapper.

Now there's another question that confuses me a little. After applying the [NUTCH-1714] and [NUTCH-1674] patches, from what they imply, each step (Fetch, Parse, UpdateDB and Index) should take an approximately fixed duration (correct me if I'm wrong!). Not precisely, of course, but roughly. Yet after one day of crawling with a TopN of 10,000, the duration of the reduce phase of my DbUpdaterJob has changed from around 6 minutes to 15+ minutes. If the mapper of DbUpdaterJob receives a fixed amount of input, i.e. 10,000 map input records, give or take, shouldn't the reduce phase always take about the same time? The mappers of all the other phases have also shown a noticeable increase in processing time. From what I see, the growth of the database still affects the filtered Fetcher, Parser, DbUpdater and Indexer altogether.

Am I going in the wrong direction, or is this a valid issue?

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Fix For: 2.3
>
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the details in this issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)