[jira] Issue Comment Edited: (NUTCH-664) Possibility to update already stored documents.
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651458#action_12651458 ] skhil edited comment on NUTCH-664 at 12/2/08 1:29 AM: --- Good news! So, I'll wait until 1.0 and prepare project for hbase-solr! was (Author: skhil): Good news! So, I'll wait until 1.0 and prepare project for hbase-solr/katta/etc! Possibility to update already stored documents. --- Key: NUTCH-664 URL: https://issues.apache.org/jira/browse/NUTCH-664 Project: Nutch Issue Type: Wish Reporter: Sergey Khilkov Priority: Minor We have huge index of stored documents. It is high cost procedure to fetch page, merge indexes any time we update some information about page. The information can be changed 1-3 times per day. At this moment we have to store changed info in database, but in this case we have lots of problems with sorting, search restricions and so on. Lucene itself allows delete single document and add new one into existing index. But there is a problem with hadoop... As I understand hadoop filesystem has no possibility to write in random positions. But it will be great feature if nutch will be able to update created index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-664) Possibility to update already stored documents.
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12651458#action_12651458 ] Sergey Khilkov commented on NUTCH-664: -- Good news! So, I'll wait until 1.0 and prepare project for hbase-solr/katta/etc! Possibility to update already stored documents. --- Key: NUTCH-664 URL: https://issues.apache.org/jira/browse/NUTCH-664 Project: Nutch Issue Type: Wish Reporter: Sergey Khilkov Priority: Minor We have huge index of stored documents. It is high cost procedure to fetch page, merge indexes any time we update some information about page. The information can be changed 1-3 times per day. At this moment we have to store changed info in database, but in this case we have lots of problems with sorting, search restricions and so on. Lucene itself allows delete single document and add new one into existing index. But there is a problem with hadoop... As I understand hadoop filesystem has no possibility to write in random positions. But it will be great feature if nutch will be able to update created index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-664) Possibility to update already stored documents.
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650912#action_12650912 ] Sergey Khilkov commented on NUTCH-664: -- Yes, It will be great to have changeDocument() method of IndexWriter class. Hope it's possible ) Possibility to update already stored documents. --- Key: NUTCH-664 URL: https://issues.apache.org/jira/browse/NUTCH-664 Project: Nutch Issue Type: Wish Reporter: Sergey Khilkov Priority: Minor We have huge index of stored documents. It is high cost procedure to fetch page, merge indexes any time we update some information about page. The information can be changed 1-3 times per day. At this moment we have to store changed info in database, but in this case we have lots of problems with sorting, search restricions and so on. Lucene itself allows delete single document and add new one into existing index. But there is a problem with hadoop... As I understand hadoop filesystem has no possibility to write in random positions. But it will be great feature if nutch will be able to update created index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (NUTCH-664) Possibility to update already stored documents.
Possibility to update already stored documents. --- Key: NUTCH-664 URL: https://issues.apache.org/jira/browse/NUTCH-664 Project: Nutch Issue Type: New Feature Reporter: Sergey Khilkov We have huge index of stored documents. It is high cost procedure to fetch page, merge indexes any time we update some information about page. The information can be changed 1-3 times per day. At this moment we have to store changed info in database, but in this case we have lots of problems with sorting, search restricions and so on. Lucene itself allows delete single document and add new one into existing index. But there is a problem with hadoop... As I understand hadoop filesystem has no possibility to write in random positions. But it will be great feature if nutch will be able to update created index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.