[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670376#comment-13670376 ]

lufeng commented on NUTCH-1545:
-------------------------------

Committed for Nutch 2.2 in revision 1487875 by Feng. Thanks, Tejas and Lewis.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.3
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
>
> The concept of segment is replaced by batchId in 2.x.
> I'm currently getting rid of segment references in 2.x.
> This issue was flagged up and is separate from NUTCH-1532, which I am working on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670399#comment-13670399 ]

Hudson commented on NUTCH-1545:
-------------------------------

Integrated in Nutch-nutchgora #625 (See [https://builds.apache.org/job/Nutch-nutchgora/625/])
NUTCH-1545 capture batchId and remove references to segments in 2.x crawl script. (Revision 1487875)

Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1487875
Files :
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/bin/crawl
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/GeneratorJob.java

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669392#comment-13669392 ]

Lewis John McGibbney commented on NUTCH-1545:
---------------------------------------------

Hi Feng, do you want to commit the fix for this? This is the issue that Chris is having on the mailing list, and we really should get the fix into 2.2.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.3
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662057#comment-13662057 ]

lufeng commented on NUTCH-1545:
-------------------------------

Hi Tejas,
yes, the patch just moves the random batchId generator from the code to the crawl script. Users don't have to bother with this.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662230#comment-13662230 ]

Tejas Patil commented on NUTCH-1545:
------------------------------------

+1 for commit.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655807#comment-13655807 ]

Tejas Patil commented on NUTCH-1545:
------------------------------------

I don't fully understand the significance of batchId, but from the latest patch it seems that the responsibility for generating a random batchId is pushed from the code to the end user. What if users pass the same batchId value again and again?
A better deal would be to make that param optional and let the code continue to generate a random batchId when users don't provide one. So the crawl script would generate a random id and pass it to the generate command. Users don't have to bother about the random id and would not end up re-using stale batchIds.
What say?

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Assignee: lufeng
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch
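
The fallback Tejas suggests (use a supplied batchId if there is one, otherwise generate a random one) might look roughly like the sketch below. It is written as shell for illustration only and is not taken from either attached patch; the function name `resolve_batch_id` and the sample id are hypothetical.

```shell
#!/bin/sh
# Sketch only, not the committed patch: return the caller's batchId when one
# is supplied, otherwise generate "<epoch seconds>-<random number>".
# resolve_batch_id is a hypothetical helper name.
resolve_batch_id() {
  if [ -n "${1:-}" ]; then
    # A batchId was supplied explicitly; use it unchanged.
    echo "$1"
  else
    # No batchId supplied: build one from the current time plus a random
    # 16-bit unsigned integer read from /dev/urandom.
    echo "$(date +%s)-$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')"
  fi
}

# The sample id below is made up for the demonstration.
explicit="$(resolve_batch_id "1369900000-42")"
generated="$(resolve_batch_id)"

echo "explicit:  ${explicit}"
echo "generated: ${generated}"
```

Generating the id only when none is given keeps ordinary runs from re-using a stale batchId while still allowing a specific batch to be named.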
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615422#comment-13615422 ]

lufeng commented on NUTCH-1545:
-------------------------------

Yes, the concept of crawldb is not used in 2.x, and grabbing the batchId returned by generate is also a TODO item in the bin/crawl script. I will fix these later. Thanks, Lewis.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch
[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614425#comment-13614425 ]

Lewis John McGibbney commented on NUTCH-1545:
---------------------------------------------

There are problems here. Firstly, we do not maintain the concept of crawldb locally. We also generate batchIds randomly within the GeneratorJob, as follows:
{code}
batchId = (curTime / 1000) + "-" + randomSeed;
{code}
We need to capture this value within the crawl script and utilise it in fetching, parsing, etc.

> capture batchId and remove references to segments in 2.x crawl script.
> ----------------------------------------------------------------------
>
> Key: NUTCH-1545
> URL: https://issues.apache.org/jira/browse/NUTCH-1545
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.1
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 2.2
> Attachments: NUTCH-1545.patch
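
To illustrate what Lewis describes, here is a minimal shell sketch of a script holding one batchId in the same "seconds-random" shape as the GeneratorJob expression and reusing it for the later steps. It is an assumption-laden illustration, not the attached patch: the helper name `make_batch_id` is hypothetical, and the nutch command lines are only printed (a dry run), with their arguments assumed rather than taken from the patch.

```shell
#!/bin/sh
# Sketch only: build a batchId of the form "<epoch seconds>-<random number>",
# mirroring the shape of the GeneratorJob expression quoted above.
# make_batch_id is a hypothetical helper name.
make_batch_id() {
  # date +%s prints seconds since the epoch; od reads two random bytes from
  # /dev/urandom and prints them as one unsigned integer.
  echo "$(date +%s)-$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')"
}

batch_id="$(make_batch_id)"
echo "batchId for this round: ${batch_id}"

# Dry run: print the later steps that would reuse the same batchId.
# The argument layout is an assumption for illustration only.
for step in fetch parse; do
  echo "bin/nutch ${step} ${batch_id}"
done
```

Keeping the value in a single variable means every step in a round refers to the same batch.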