[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-05 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759123#comment-13759123
 ] 

lufeng commented on NUTCH-1556:
---

Committed revision 1520332 in 2.x HEAD
Thanks kaveh. 

 enabling updatedb to accept batchId 
 

 Key: NUTCH-1556
 URL: https://issues.apache.org/jira/browse/NUTCH-1556
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 2.2
Reporter: kaveh minooie
 Fix For: 2.3

 Attachments: NUTCH-1556.patch, NUTCH-1556-v2.patch, 
 NUTCH-1556-v3.patch


 The idea here is to be able to run updatedb and fetch for different 
 batchIds simultaneously. I put together a patch. It seems to be working (it 
 does skip the rows that do not match the batchId), but I am worried about 
 whether and how it might affect the sorting in the reduce part. Anyway, check 
 it out. It also changes the command line usage to this:
 Usage: DbUpdaterJob (batchId | -all) [-crawlId id]
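
The row-skipping behaviour and the new usage line described above could be sketched roughly as follows. This is only an illustration and is not taken from the attached patches: the class name `BatchIdFilterSketch`, the methods `parseArgs` and `shouldProcess`, and the sample batch ID `batch-001` are all hypothetical, and the real DbUpdaterJob and DbUpdateMapper may handle the batchId quite differently.

```java
/** Hypothetical illustration only; not the code from the NUTCH-1556 patches. */
public class BatchIdFilterSketch {

    /** Command-line token meaning "process rows from every batch". */
    static final String ALL_BATCHES = "-all";

    /**
     * Parses arguments of the form "(batchId | -all) [-crawlId id]".
     * Returns a two-element array {batchId, crawlId}; crawlId may be null.
     */
    static String[] parseArgs(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException(
                "Usage: DbUpdaterJob (batchId | -all) [-crawlId id]");
        }
        String batchId = args[0];
        String crawlId = null;
        for (int i = 1; i < args.length; i++) {
            if ("-crawlId".equals(args[i]) && i + 1 < args.length) {
                crawlId = args[++i]; // consume the value after the flag
            } else {
                throw new IllegalArgumentException("Unrecognized argument: " + args[i]);
            }
        }
        return new String[] { batchId, crawlId };
    }

    /**
     * Decides whether a row should be processed: every row when "-all" was
     * requested, otherwise only rows whose batch ID matches the requested one.
     */
    static boolean shouldProcess(String requestedBatchId, String rowBatchId) {
        if (ALL_BATCHES.equals(requestedBatchId)) {
            return true;
        }
        return requestedBatchId != null && requestedBatchId.equals(rowBatchId);
    }

    public static void main(String[] args) {
        String[] parsed = parseArgs(new String[] { "batch-001", "-crawlId", "demo" });
        System.out.println(shouldProcess(parsed[0], "batch-001")); // true
        System.out.println(shouldProcess(parsed[0], "batch-002")); // false, row is skipped
        System.out.println(shouldProcess(ALL_BATCHES, "batch-002")); // true
    }
}
```

Skipping non-matching rows before they are emitted is what would let an update for one batch run while another batch is still being fetched; whether it affects the reduce-side sorting is the open question raised above and is not addressed by this sketch.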

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-05 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng resolved NUTCH-1556.
---

Resolution: Fixed


[jira] [Commented] (NUTCH-1556) enabling updatedb to accept batchId

2013-09-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759168#comment-13759168
 ] 

Hudson commented on NUTCH-1556:
---

SUCCESS: Integrated in Nutch-nutchgora #746 (See 
[https://builds.apache.org/job/Nutch-nutchgora/746/])
NUTCH-1556 enabling updatedb to accept batchId (fenglu: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1520332)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/bin/crawl
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateMapper.java
* /nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdaterJob.java


[jira] [Commented] (NUTCH-1517) CloudSearch indexer

2013-09-05 Thread Daniel Ciborowski (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759439#comment-13759439
 ] 

Daniel Ciborowski commented on NUTCH-1517:
--

Does the above patch disable Solr indexing?

 CloudSearch indexer
 ---

 Key: NUTCH-1517
 URL: https://issues.apache.org/jira/browse/NUTCH-1517
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Reporter: Julien Nioche
 Fix For: 1.9

 Attachments: 0023883254_1377197869_indexer-cloudsearch.patch


 Once we have made the indexers pluggable, we should add a plugin for Amazon 
 CloudSearch. See http://aws.amazon.com/cloudsearch/. Apparently it uses a 
 JSON-based representation, Search Data Format (SDF), which we could reuse for 
 a file-based indexer.


[jira] [Commented] (NUTCH-1517) CloudSearch indexer

2013-09-05 Thread Tom Hill (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759865#comment-13759865
 ] 

Tom Hill commented on NUTCH-1517:
-

I believe you can configure either in nutch-default.xml, but not both.


