[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759123#comment-13759123 ]

lufeng commented on NUTCH-1556:
-------------------------------

Committed revision 1520332 in 2.x HEAD.
Thanks kaveh.

enabling updatedb to accept batchId
-----------------------------------

Key: NUTCH-1556
URL: https://issues.apache.org/jira/browse/NUTCH-1556
Project: Nutch
Issue Type: Improvement
Affects Versions: 2.2
Reporter: kaveh minooie
Fix For: 2.3
Attachments: NUTCH-1556.patch, NUTCH-1556-v2.patch, NUTCH-1556-v3.patch

So the idea here is to be able to run updatedb and fetch for different batchIds simultaneously. I put together a patch. It seems to be working (it does skip the rows that do not match the batchId), but I am worried about whether and how it might affect the sorting in the reduce part. Anyway, check it out. It also changes the command line usage to this:

Usage: DbUpdaterJob (batchId | -all) [-crawlId id]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
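The batch filtering described in this issue (skip the rows that do not match the batchId, or take everything with -all) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from NUTCH-1556.patch: the class BatchIdFilterSketch, its methods, and the sample row keys and batch ids are all hypothetical. Only the -all wildcard is taken from the usage string. The sketch also assumes that each row carries at most one batch mark and that -all selects every row regardless of its mark.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not the classes touched by the patch. */
final class BatchIdFilterSketch {

    /** Wildcard value, taken from the usage string in the issue description. */
    static final String ALL = "-all";

    /** Returns true when a row marked with rowBatchId belongs to the requested batch. */
    static boolean shouldProcess(String rowBatchId, String requestedBatchId) {
        if (ALL.equals(requestedBatchId)) {
            return true; // assumption: -all disables the filter entirely
        }
        // Rows without a mark (null) never match a specific batch id.
        return rowBatchId != null && rowBatchId.equals(requestedBatchId);
    }

    /** Keeps only the row keys whose batch mark matches the requested batch. */
    static List<String> selectRows(Map<String, String> batchIdByRow, String requestedBatchId) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : batchIdByRow.entrySet()) {
            if (shouldProcess(entry.getValue(), requestedBatchId)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical rows: row key -> batch mark.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("com.example:http/a", "batch-1");
        rows.put("com.example:http/b", "batch-2");

        System.out.println(selectRows(rows, "batch-1"));
        System.out.println(selectRows(rows, ALL));
    }
}
```

Filtering on the map side like this keeps rows from other batches out of the job entirely, which is what would let an update for one batch run while another batch is still being fetched. Whether it interacts with the reduce-side sorting, the concern raised in the description, is not something this sketch addresses.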
[jira] [Resolved] (NUTCH-1556) enabling updatedb to accept batchId
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lufeng resolved NUTCH-1556.
---------------------------

Resolution: Fixed
[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId
[ https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759168#comment-13759168 ]

Hudson commented on NUTCH-1556:
-------------------------------

SUCCESS: Integrated in Nutch-nutchgora #746 (See [https://builds.apache.org/job/Nutch-nutchgora/746/])
NUTCH-1556 enabling updatedb to accept batchId (fenglu: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1520332)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/bin/crawl
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateMapper.java
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdaterJob.java
[jira] [Commented] (NUTCH-1517) CloudSearch indexer
[ https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759439#comment-13759439 ]

Daniel Ciborowski commented on NUTCH-1517:
------------------------------------------

Does the above patch disable Solr indexing?

CloudSearch indexer
-------------------

Key: NUTCH-1517
URL: https://issues.apache.org/jira/browse/NUTCH-1517
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Julien Nioche
Fix For: 1.9
Attachments: 0023883254_1377197869_indexer-cloudsearch.patch

Once we have made the indexers pluggable, we should add a plugin for Amazon CloudSearch. See http://aws.amazon.com/cloudsearch/. Apparently it uses a JSON-based representation, Search Data Format (SDF), which we could reuse for a file-based indexer.
[jira] [Commented] (NUTCH-1517) CloudSearch indexer
[ https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759865#comment-13759865 ]

Tom Hill commented on NUTCH-1517:
---------------------------------

I believe you can configure either in nutch-default.xml, but not both.