[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-30 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670376#comment-13670376
 ] 

lufeng commented on NUTCH-1545:
---

Committed to Nutch 2.2 in revision 1487875 by Feng. Thanks, Tejas and Lewis.

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.3

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670399#comment-13670399
 ] 

Hudson commented on NUTCH-1545:
---

Integrated in Nutch-nutchgora #625 (See 
[https://builds.apache.org/job/Nutch-nutchgora/625/])
NUTCH-1545 capture batchId and remove references to segments in 2.x crawl 
script. (Revision 1487875)

 Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1487875
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/bin/crawl
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/GeneratorJob.java


 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-29 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669392#comment-13669392
 ] 

Lewis John McGibbney commented on NUTCH-1545:
-

Hi Feng, do you want to commit the fix for this? This is the issue that Chris 
is having on the mailing list, and we really should get the fix into 2.2.

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.3

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-20 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662057#comment-13662057
 ] 

lufeng commented on NUTCH-1545:
---

Hi Tejas,

Yes, the patch just moves the random batchId generator from the code to the 
crawl script. Users don't have to bother about it.

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-20 Thread Tejas Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662230#comment-13662230
 ] 

Tejas Patil commented on NUTCH-1545:


+1 for commit.

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-13 Thread Tejas Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655807#comment-13655807
 ] 

Tejas Patil commented on NUTCH-1545:


I don't fully understand the significance of batchId, but from the latest 
patch it seems that the responsibility for generating a random batchId is 
pushed from the code to the end user. What if users pass the same batchId 
value again and again?

A better approach would be to make that parameter optional and let the code 
continue to generate a random batchId when users don't provide one. The crawl 
script would then generate a random ID and pass it to the generate command. 
Users don't have to bother about the random ID and would not end up reusing 
stale batchIds.
What say?
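
The optional-parameter idea can be sketched in shell roughly as follows. This is only an illustration under stated assumptions: `resolve_batch_id` is a hypothetical helper, not part of Nutch or the attached patch, and the ID format (epoch seconds, a dash, a random number) is borrowed from the batchId shape discussed elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: use a caller-supplied batchId if one is given,
# otherwise generate a fresh one. resolve_batch_id is hypothetical.
resolve_batch_id() {
  if [ -n "${1:-}" ]; then
    # The caller passed an ID: use it unchanged.
    printf '%s\n' "$1"
  else
    # No ID given: build "<epoch seconds>-<random>".
    # $RANDOM is a bash feature; fall back to the PID in shells without it.
    printf '%s-%s\n' "$(date +%s)" "${RANDOM:-$$}"
  fi
}

explicit_id="$(resolve_batch_id "my-batch-1")"   # caller supplied an ID
generated_id="$(resolve_batch_id)"               # nothing supplied
echo "explicit:  $explicit_id"
echo "generated: $generated_id"
```

With a fallback like this, a user who never passes an ID still gets a new value on every run, which is the stale-batchId concern raised above.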

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Assignee: lufeng
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch, NUTCH-1545-v2.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-03-27 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615422#comment-13615422
 ] 

lufeng commented on NUTCH-1545:
---

Yes, the concept of crawldb is not used in 2.x, and grabbing the batchId 
returned by generate is also a TODO item in the bin/crawl script. I will fix 
these later. Thanks, Lewis.

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.



[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-03-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614425#comment-13614425
 ] 

Lewis John McGibbney commented on NUTCH-1545:
-

There are problems here.
First, we do not maintain the concept of a crawldb locally.
We also generate batchIds randomly within the GeneratorJob as follows:
{code}
batchId = (curTime / 1000) + "-" + randomSeed;
{code}
We need to capture this value within the crawl script and utilise it in 
fetching, parsing, etc.
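
A minimal sketch of what capturing and reusing the value could look like in a shell crawl script. It assumes the script builds the ID itself, in the same seconds-dash-random shape as the GeneratorJob line above, rather than parsing it from the generate output. `run_step` is a hypothetical stand-in that only prints what it was given; the real Nutch command lines are not shown in this thread and are not reproduced here.

```shell
#!/bin/sh
# Sketch only: one batchId per crawl round, shared by every step.
# Shape mirrors the Java snippet: "<epoch seconds>-<random number>".
# $RANDOM is a bash feature; fall back to the PID in shells without it.
batch_id="$(date +%s)-${RANDOM:-$$}"

# Hypothetical stand-in for invoking a crawl step with the shared batchId.
run_step() {
  echo "step=$1 batchId=$2"
}

# Every step in this round sees the same batch_id value.
for step in generate fetch parse; do
  run_step "$step" "$batch_id"
done
```

Keeping the ID in a single variable means the later steps work on exactly the batch that the generate step marked.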

 capture batchId and remove references to segments in 2.x crawl script.
 --

 Key: NUTCH-1545
 URL: https://issues.apache.org/jira/browse/NUTCH-1545
 Project: Nutch
  Issue Type: Task
Affects Versions: 2.1
Reporter: Lewis John McGibbney
Priority: Minor
 Fix For: 2.2

 Attachments: NUTCH-1545.patch


 The concept of segment is replaced by batchId in 2.x
 I'm currently getting rid of segments references in 2.x
 This issue was flagged up and separate from NUTCH-1532 which I am working on.
