[ https://issues.apache.org/jira/browse/NUTCH-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liuqibj updated NUTCH-2143:
---------------------------
    Attachment: patch

The fix patch attached:)

> GeneratorJob ignores batch id passed as argument
> ------------------------------------------------
>
>                 Key: NUTCH-2143
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2143
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 2.3.1
>            Reporter: Sebastian Nagel
>            Assignee: Lewis John McGibbney
>            Priority: Blocker
>             Fix For: 2.3.1
>
>         Attachments: patch
>
>
> The batch id passed to GeneratorJob by option/argument -batchId <id> is
> ignored and a generated batch id is used to mark the current batch. Log
> snippets from a run of bin/crawl:
> {noformat}
> bin/nutch generate ... -batchId 1444941073-14208
> ...
> GeneratorJob: generated batch id: 1444941074-858443668 containing 1 URLs
> Fetching :
> bin/nutch fetch ... 1444941073-14208 ...
> ...
> QueueFeeder finished: total 0 records. Hit by time limit :0
> {noformat}
> The generated URLs are marked with the wrong batch id:
> {noformat}
> hbase(main):010:0> scan 'test_webpage'
> ROW COLUMN+CELL
> org.apache.nutch:http/ column=f:bid, timestamp=1444941077080,
> value=1444941074-858443668
> ...
> org.apache.nutch:http/ column=mk:_gnmrk_, timestamp=1444941077080,
> value=1444941074-858443668
> {noformat}
> and fetcher will not fetch anything. This problem was reported by Sherban
> Drulea
> [[1|https://www.mail-archive.com/user@nutch.apache.org/msg13894.html]],
> [[2|https://www.mail-archive.com/user@nutch.apache.org/msg13912.html]].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
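
The behaviour the report expects (use the batch id given with -batchId, and generate one only when none is given) can be illustrated with a small standalone sketch. This is not the attached patch and not Nutch's actual GeneratorJob code: the class BatchIdSketch and the helpers parseBatchId, generateBatchId and resolveBatchId are hypothetical names made up for illustration, and the fallback id only imitates the "<seconds>-<number>" shape seen in the log above.

```java
import java.util.Random;

// Illustrative sketch only; all names here are hypothetical, not Nutch APIs.
public class BatchIdSketch {

    /** Returns the value following "-batchId", or null if the option is absent. */
    static String parseBatchId(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-batchId".equals(args[i])) {
                return args[i + 1];
            }
        }
        return null;
    }

    /** Assumed fallback format "<epoch seconds>-<non-negative int>", as in the log. */
    static String generateBatchId(long epochSeconds, Random rnd) {
        return epochSeconds + "-" + rnd.nextInt(Integer.MAX_VALUE);
    }

    /** Prefer the caller-supplied batch id; generate one only if none was given. */
    static String resolveBatchId(String[] args, long epochSeconds, Random rnd) {
        String supplied = parseBatchId(args);
        return supplied != null ? supplied : generateBatchId(epochSeconds, rnd);
    }

    public static void main(String[] argv) {
        String[] args = {"-topN", "50", "-batchId", "1444941073-14208"};
        String batchId = resolveBatchId(args, System.currentTimeMillis() / 1000, new Random());
        // The supplied id is kept, so a later fetch with the same id finds the marked rows.
        System.out.println("using batch id " + batchId);
    }
}
```

With the arguments in main, the sketch keeps 1444941073-14208, which is the id that the subsequent fetch step in the log is started with.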