Re: Incremental Field Updates
In case of sorting, updatable DocValues may be what you are looking for. But updatable fields for searching are a different beast. A sample approach is documented at http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

The general problems with updatable postings lists, AFAIK, are:

1. It is impossible to correctly score updated documents.
2. Segment merges could miss updates.
3. It might behave incorrectly with NRT.
4. Frequent updates could end up creating lots of files because of the append-only nature of Lucene.

If you are not too worried about scoring, correct NRT behavior, etc., you can attempt a solution like the RedisCodec approach. Segregating static and dynamic fields into two separate indexes may also be of some use to you; this is described at http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management

-- Ravi

On Wed, Jul 2, 2014 at 7:29 PM, Shai Erera ser...@gmail.com wrote:

Using BinaryDocValues is not recommended for all scenarios. It is a catch-all alternative to the other DocValues types. I would not use it unless it makes sense for your application, even if it means that you need to re-index a document in order to update a single field.

DocValues are not good for search. By search, I assume you mean taking a query such as "apache AND lucene" and finding all documents that contain both terms under the same field. They are good for sorting and faceting, though.

So I guess the answer to your question is "it depends" (it always is!): I would use DocValues for sorting and faceting, but not for regular search queries. And I would use BinaryDocValues only when the other DocValues types don't match.

Also, note that the current field-level update of DocValues is not always better than re-indexing the document. You can read more details here: http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai

- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Incremental Field Updates
Thanks Ravi, it is good to know the general problems with updatable fields.

In our use case, we have a few fields that are updated more frequently than the main index. We are using this SOLR join contrib patch with a DocTransformer for returning data from the join core: https://issues.apache.org/jira/browse/SOLR-4787

This approach has some performance impact, but if that performance hit is acceptable for your use case, you can give it a try if you are using SOLR.

On Thu, Jul 3, 2014 at 3:22 AM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote:

In case of sorting, updatable DocValues may be what you are looking for. But updatable fields for searching are a different beast. A sample approach is documented at http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

The general problems with updatable postings lists, AFAIK, are:

1. It is impossible to correctly score updated documents.
2. Segment merges could miss updates.
3. It might behave incorrectly with NRT.
4. Frequent updates could end up creating lots of files because of the append-only nature of Lucene.

If you are not too worried about scoring, correct NRT behavior, etc., you can attempt a solution like the RedisCodec approach. Segregating static and dynamic fields into two separate indexes may also be of some use to you; this is described at http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management

-- Ravi
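The join-core idea in this message (rarely changing fields in a main index, frequently changing fields in a second one, merged when results are returned) can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses plain Java maps, not Solr or the SOLR-4787 patch, and the class `JoinCoreSketch` and its methods are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: plain maps standing in for a main core and a join core. */
final class JoinCoreSketch {

    // Large, rarely changing documents, keyed by a shared id.
    private final Map<String, Map<String, String>> mainCore = new LinkedHashMap<>();
    // Small, frequently changing documents, keyed by the same id.
    private final Map<String, Map<String, String>> joinCore = new HashMap<>();

    /** Adds a static field to the main document. */
    void indexMain(String id, String field, String value) {
        mainCore.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(field, value);
    }

    /** Updates a volatile field; the main document is never touched or re-indexed. */
    void updateJoin(String id, String field, String value) {
        joinCore.computeIfAbsent(id, k -> new LinkedHashMap<>()).put(field, value);
    }

    /** Matches on a main-core field, then attaches the join-core fields to every hit. */
    List<Map<String, String>> search(String field, String value) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : mainCore.entrySet()) {
            if (value.equals(entry.getValue().get(field))) {
                Map<String, String> hit = new LinkedHashMap<>(entry.getValue());
                // One extra lookup per hit: this is where the join cost shows up.
                Map<String, String> volatileFields = joinCore.get(entry.getKey());
                if (volatileFields != null) {
                    hit.putAll(volatileFields);
                }
                results.add(hit);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        JoinCoreSketch cores = new JoinCoreSketch();
        cores.indexMain("doc-1", "title", "lucene");
        cores.updateJoin("doc-1", "stock", "5");
        System.out.println(cores.search("title", "lucene"));

        cores.updateJoin("doc-1", "stock", "7"); // cheap update, main core untouched
        System.out.println(cores.search("title", "lucene"));
    }
}
```

The extra lookup per returned document is a rough analogue of the performance hit mentioned above; how large it is in a real deployment depends on the join implementation and the result size.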
Re: Incremental Field Updates
Using BinaryDocValues is not recommended for all scenarios. It is a catch-all alternative to the other DocValues types. I would not use it unless it makes sense for your application, even if it means that you need to re-index a document in order to update a single field.

DocValues are not good for search. By search, I assume you mean taking a query such as "apache AND lucene" and finding all documents that contain both terms under the same field. They are good for sorting and faceting, though.

So I guess the answer to your question is "it depends" (it always is!): I would use DocValues for sorting and faceting, but not for regular search queries. And I would use BinaryDocValues only when the other DocValues types don't match.

Also, note that the current field-level update of DocValues is not always better than re-indexing the document. You can read more details here: http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai

On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode sandeep_khanz...@yahoo.com.invalid wrote:

Hi Shai,

So, one follow-up question. Assume that my use case is to have ~50M documents indexed, with each document having ~10-15 indexed but not stored fields. These fields will never change, but there are another ~5-6 fields that will change and will continue to change after the index is written. These ~5-6 fields may also be multivalued. The size of this index turns out to be ~120GB.

In this case, I would like to sort, facet, or search on these ~5-6 fields. Which approach do you suggest? Should I use BinaryDocValues and update using IW, or use either a ParallelReader or a Join query?

---
Thanks n Regards,
Sandeep Ramesh Khanzode
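To make the distinction in this message concrete, here is a minimal sketch of the two data layouts involved. It is a toy model in plain Java, not Lucene code: the class `DocValuesVsPostingsSketch`, its methods, and the `price` column are hypothetical. A per-document column (the DocValues-like structure) is addressed by document id, so sorting and single-value updates are direct, while a query such as "apache AND lucene" needs term-to-document postings lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: a per-document column next to an inverted index. */
final class DocValuesVsPostingsSketch {

    // Column-stride storage: one value per document, addressed by doc id.
    private final long[] priceByDoc;
    // Inverted index: term -> ids of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    DocValuesVsPostingsSketch(int maxDoc) {
        this.priceByDoc = new long[maxDoc];
    }

    void indexTerms(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Updating a per-document value touches exactly one slot of the column. */
    void setPrice(int docId, long price) {
        priceByDoc[docId] = price;
    }

    /** AND query: intersect the postings lists of all terms. */
    List<Integer> searchAll(String... terms) {
        TreeSet<Integer> hits = null;
        for (String term : terms) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (hits == null) {
                hits = new TreeSet<>(docs);
            } else {
                hits.retainAll(docs);
            }
        }
        List<Integer> result = new ArrayList<>();
        if (hits != null) {
            result.addAll(hits);
        }
        return result;
    }

    /** Sorting only needs the column: one array read per hit. */
    List<Integer> sortByPrice(List<Integer> hits) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Long.compare(priceByDoc[a], priceByDoc[b]));
        return sorted;
    }

    public static void main(String[] args) {
        DocValuesVsPostingsSketch index = new DocValuesVsPostingsSketch(3);
        index.indexTerms(0, "apache", "lucene");
        index.indexTerms(1, "apache");
        index.indexTerms(2, "apache", "lucene");
        index.setPrice(0, 30);
        index.setPrice(1, 20);
        index.setPrice(2, 10);

        List<Integer> hits = index.searchAll("apache", "lucene");
        System.out.println(index.sortByPrice(hits)); // prints [2, 0]

        index.setPrice(2, 50);                       // in-place update of one value
        System.out.println(index.sortByPrice(hits)); // prints [0, 2]
    }
}
```

In this model, answering the term query from the column alone would mean scanning every document, which is the intuition behind using the per-document values for sorting and faceting and the postings for search.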
Incremental Field Updates
Hi,

I wanted to know the best approach to follow if a few fields in my indexed documents change at run time (after indexing and before or during search), while the majority of them are created at index time.

I could see the JIRA given below, but it is scheduled for Lucene 4.9, I believe. There are a few other approaches, like maintaining a separate index for the changing fields and using either a ParallelReader or a Join.

Can everyone share their experience of how this scenario is handled in your systems?

Thanks,

[LUCENE-4258] Incremental Field Updates through Stacked Segments - ASF JIRA: "Shai and I would like to start working on the proposal to Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex)."

---
Thanks n Regards,
Sandeep Ramesh Khanzode
Re: Incremental Field Updates
This JIRA is complicated; I don't really expect it in 4.9, as it's been hanging around for quite a while. Everyone would like this, but it's not easy.

Atomic updates will work, but you have to set stored=true for all source fields. Under the covers, this actually reads the document out of the stored fields, deletes the old one, and adds it over again.

FWIW,
Erick

On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode sandeep_khanz...@yahoo.com.invalid wrote:

Hi,

I wanted to know the best approach to follow if a few fields in my indexed documents change at run time (after indexing and before or during search), while the majority of them are created at index time.

I could see the JIRA given below, but it is scheduled for Lucene 4.9, I believe. There are a few other approaches, like maintaining a separate index for the changing fields and using either a ParallelReader or a Join.

Can everyone share their experience of how this scenario is handled in your systems?

Thanks,

[LUCENE-4258] Incremental Field Updates through Stacked Segments - ASF JIRA: "Shai and I would like to start working on the proposal to Incremental Field Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex)."

---
Thanks n Regards,
Sandeep Ramesh Khanzode
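The read, delete, and re-add cycle described for atomic updates explains why every source field has to be stored. The following is a toy model of that cycle in plain Java, not Solr or Lucene code; the class `AtomicUpdateSketch`, its methods, and the field names are hypothetical and chosen only for illustration. A field that is indexed but not stored cannot be read back, so it is missing from the rebuilt document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only: an update implemented as read stored fields, delete, re-add. */
final class AtomicUpdateSketch {

    /** Per-document state: every field is searchable, only stored fields can be read back. */
    static final class IndexedDoc {
        final Map<String, String> searchable = new LinkedHashMap<>();
        final Map<String, String> stored = new LinkedHashMap<>();
    }

    // Hypothetical schema: the names of the fields that have stored=true.
    private final Set<String> storedFieldNames;
    private final Map<String, IndexedDoc> docsById = new HashMap<>();

    AtomicUpdateSketch(String... storedFieldNames) {
        this.storedFieldNames = new HashSet<>(Arrays.asList(storedFieldNames));
    }

    /** Adds a document, replacing any earlier version with the same id (delete + add). */
    void add(String id, Map<String, String> fields) {
        IndexedDoc doc = new IndexedDoc();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            doc.searchable.put(field.getKey(), field.getValue());
            if (storedFieldNames.contains(field.getKey())) {
                doc.stored.put(field.getKey(), field.getValue());
            }
        }
        docsById.put(id, doc);
    }

    /** Changes one field by rebuilding the whole document from its stored fields. */
    void atomicUpdate(String id, String fieldName, String newValue) {
        IndexedDoc old = docsById.get(id);
        if (old == null) {
            throw new IllegalArgumentException("no document with id " + id);
        }
        // Only stored values can be read back, so only they survive the rebuild.
        Map<String, String> rebuilt = new LinkedHashMap<>(old.stored);
        rebuilt.put(fieldName, newValue);
        add(id, rebuilt);
    }

    /** Returns the searchable value of a field, or null if the document does not have it. */
    String searchableValue(String id, String fieldName) {
        IndexedDoc doc = docsById.get(id);
        return doc == null ? null : doc.searchable.get(fieldName);
    }

    public static void main(String[] args) {
        // "body" is indexed but not stored in this hypothetical schema.
        AtomicUpdateSketch index = new AtomicUpdateSketch("title", "price");
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Lucene in Action");
        fields.put("body", "full text of the book");
        fields.put("price", "40");
        index.add("doc-1", fields);

        index.atomicUpdate("doc-1", "price", "35");

        System.out.println("price = " + index.searchableValue("doc-1", "price")); // price = 35
        System.out.println("title = " + index.searchableValue("doc-1", "title")); // title = Lucene in Action
        System.out.println("body  = " + index.searchableValue("doc-1", "body"));  // body  = null
    }
}
```

With ~10-15 indexed but not stored fields per document, as in the follow-up question in this thread, this model suggests that the rebuild would drop those fields unless they were made stored.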
Re: Incremental Field Updates
Except that Lucene now offers efficient numeric and binary DocValues updates. See IndexWriter.updateNumeric/Binary...

On Jul 1, 2014 5:51 PM, Erick Erickson erickerick...@gmail.com wrote:

This JIRA is complicated; I don't really expect it in 4.9, as it's been hanging around for quite a while. Everyone would like this, but it's not easy.

Atomic updates will work, but you have to set stored=true for all source fields. Under the covers, this actually reads the document out of the stored fields, deletes the old one, and adds it over again.

FWIW,
Erick
Re: Incremental Field Updates
Hi Shai,

So, one follow-up question. Assume that my use case is to have ~50M documents indexed, with each document having ~10-15 indexed but not stored fields. These fields will never change, but there are another ~5-6 fields that will change and will continue to change after the index is written. These ~5-6 fields may also be multivalued. The size of this index turns out to be ~120GB.

In this case, I would like to sort, facet, or search on these ~5-6 fields. Which approach do you suggest? Should I use BinaryDocValues and update using IW, or use either a ParallelReader or a Join query?

---
Thanks n Regards,
Sandeep Ramesh Khanzode

On Tuesday, July 1, 2014 9:53 PM, Shai Erera ser...@gmail.com wrote:

Except that Lucene now offers efficient numeric and binary DocValues updates. See IndexWriter.updateNumeric/Binary...