[jira] [Comment Edited] (NUTCH-1556) enabling updatedb to accept batchId

2013-12-04 Otis Gospodnetic (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839242#comment-13839242 ]

Otis Gospodnetic edited comment on NUTCH-1556 at 12/4/13 7:23 PM:
--

Reopening because this issue has a new patch that should be committed.

Wait, the patch that was added here is the same as the patch in NUTCH-1667.


was (Author: otis):
Reopening because this issue has a new patch that should be committed.

> enabling updatedb to accept batchId 
> ------------------------------------
>
>                 Key: NUTCH-1556
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1556
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: kaveh minooie
>             Fix For: 2.3
>
>         Attachments: NUTCH-1556-batchId.patch, NUTCH-1556-v2.patch, 
> NUTCH-1556-v3.patch, NUTCH-1556.patch
>
>
> The idea here is to be able to run updatedb and fetch for different 
> batchIds simultaneously. I put together a patch. It seems to be working (it 
> does skip the rows that do not match the batchId), but I am worried about 
> whether and how it might affect the sorting in the reduce phase. Anyway, 
> check it out. It also changes the command-line usage to this:
> Usage: DbUpdaterJob ( | -all) [-crawlId ]
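The row skipping described in the issue can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the patch's Java code: `Row`, `fetch_batch_id`, `rows_for_update` and the `ALL_BATCH_ID` sentinel are hypothetical names chosen for illustration, and Nutch's real mapper works on its own page records rather than this simplified type. The idea it shows is that an updatedb run given a batchId only passes on rows belonging to that batch, while `-all` passes every row.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical sentinel standing in for the "-all" option: process every row.
ALL_BATCH_ID = "-all"


@dataclass
class Row:
    """Simplified web-table row (hypothetical; not Nutch's real page type)."""
    url: str
    fetch_batch_id: Optional[str]  # batch that fetched this row; None if unfetched


def rows_for_update(rows: Iterable[Row], batch_id: str) -> Iterator[Row]:
    """Yield only the rows an updatedb run for `batch_id` should process.

    With ALL_BATCH_ID every row passes. Otherwise, rows belonging to a
    different batch (or not fetched yet) are skipped, so a fetch running
    concurrently on another batch is left untouched.
    """
    for row in rows:
        if batch_id == ALL_BATCH_ID or row.fetch_batch_id == batch_id:
            yield row


if __name__ == "__main__":
    table = [
        Row("http://a.example/", "batch-1"),
        Row("http://b.example/", "batch-2"),
        Row("http://c.example/", None),
    ]
    print([r.url for r in rows_for_update(table, "batch-1")])
    print([r.url for r in rows_for_update(table, ALL_BATCH_ID)])
```

Running it prints only `http://a.example/` for `batch-1` and all three URLs for `-all`. Because the filter is applied before rows are emitted, anything downstream (such as the reduce phase the reporter mentions) sees only the selected batch.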



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (NUTCH-1556) enabling updatedb to accept batchId

2014-02-05 Koen Smets (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892075#comment-13892075 ]

Koen Smets edited comment on NUTCH-1556 at 2/5/14 1:08 PM:
---

Should be reconsidered, as this causes a lot of pages to be refetched, as 
indicated by NUTCH-1679.


was (Author: ksmets):
Should be reconsidered as this causes a lot of already fetched pages as 
indicated by NUTCH-1679.



