Re: Incremental Field Updates

2014-07-08 Thread Ravikumar Govindarajan


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org


  
 



Re: Incremental Field Updates

2014-07-03 Thread Ravikumar Govindarajan
In case of sorting, updatable DocValues may be what you are looking for.

But updatable fields for searching are a different beast.

A sample approach is documented at
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

The general problems with updatable postings lists, AFAIK, are:

1. Impossible to correctly score updated documents
2. Segment merges could miss updates
3. Might behave incorrectly with NRT
4. Frequent updates could end up creating lots of files because of the
append-only nature of Lucene

Maybe, if you are not too worried about scoring, correct NRT behavior, etc.,
you can attempt a solution like the RedisCodec approach.

Segregating static and dynamic fields into 2 separate indexes, as described
here
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management
may be of some use to you.

--
Ravi
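
To make the "two separate indexes" idea concrete, here is a minimal sketch in plain Java. It is only a toy model: the class SidecarSketch and its methods are hypothetical names invented for illustration, the "indexes" are in-memory maps, and nothing here is taken from Lucene, Solr, or the Sidecar Index components. It only shows the shape of the approach: static fields are written once, changing fields live in a second store, and the two are joined by document id at read time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model: static fields in a main store, changing fields in a sidecar, joined by id. */
class SidecarSketch {
    // Written once at index time; never touched by updates.
    private final Map<String, Map<String, String>> mainStore = new HashMap<>();
    // Holds only the frequently changing fields; rewritten as often as needed.
    private final Map<String, Map<String, String>> sidecar = new HashMap<>();

    void addStatic(String id, Map<String, String> fields) {
        mainStore.put(id, new HashMap<>(fields));
    }

    void setDynamic(String id, String field, String value) {
        sidecar.computeIfAbsent(id, k -> new HashMap<>()).put(field, value);
    }

    /** Joins both stores at read time; returns null for an unknown id. */
    Map<String, String> view(String id) {
        Map<String, String> fixed = mainStore.get(id);
        if (fixed == null) {
            return null;
        }
        Map<String, String> joined = new HashMap<>(fixed);
        joined.putAll(sidecar.getOrDefault(id, Collections.emptyMap()));
        return joined;
    }

    public static void main(String[] args) {
        SidecarSketch index = new SidecarSketch();
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "Incremental Field Updates");
        index.addStatic("doc-1", fields);
        index.setDynamic("doc-1", "views", "10");
        index.setDynamic("doc-1", "views", "11"); // only the sidecar entry changes
        System.out.println(index.view("doc-1"));
    }
}
```

The cost of the join moves to read time, which is the kind of trade-off to measure before adopting this layout.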



  



Re: Incremental Field Updates

2014-07-03 Thread Gopal Patwa
Thanks Ravi, it is good to know the general problems with updatable fields.
In our use case, we have a few fields that are updated more frequently than
the main index. We are using this Solr join contrib patch with a
DocTransformer to return data from the join core. This approach has some
performance impact, but if that hit is acceptable for your use case, you can
give it a try if you are using Solr.

https://issues.apache.org/jira/browse/SOLR-4787






Re: Incremental Field Updates

2014-07-02 Thread Shai Erera
Using BinaryDocValues is not recommended for all scenarios. It is a
catchall alternative to the other DocValues types. I would not use it
unless it makes sense for your application, even if it means that you need
to re-index a document in order to update a single field.

DocValues are not good for search. By search, I assume you mean taking a
query such as "apache AND lucene" and finding all documents which contain
both terms under the same field. They are good for sorting and faceting,
though.

So I guess the answer to your question is "it depends" (it always is!). I
would use DocValues for sorting and faceting, but not for regular search
queries. And I would use BinaryDocValues only when the other DocValues
types don't match.

Also, note that the current field-level update of DocValues is not always
better than re-indexing the document. You can read more details here:
http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai
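
The search versus sort distinction above can be illustrated with a small conceptual sketch in plain Java. Everything in it is an assumption made for illustration: the class ColumnVsPostings, its methods, and the sample data are hypothetical, and it does not reflect how Lucene stores or updates postings or DocValues on disk. It only contrasts the two access patterns: a term-to-documents map answers a query like "apache AND lucene", while a document-to-value column answers "what is this document's sort value?".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy contrast between an inverted index (term -> doc ids) and a per-document value column. */
class ColumnVsPostings {
    // Inverted index: answers "which documents contain this term?"
    static Map<String, SortedSet<Integer>> buildPostings(List<String> docs) {
        Map<String, SortedSet<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return postings;
    }

    // Boolean AND: intersect the posting lists of both terms.
    static SortedSet<Integer> and(Map<String, SortedSet<Integer>> postings, String a, String b) {
        SortedSet<Integer> result = new TreeSet<>(postings.getOrDefault(a, new TreeSet<>()));
        result.retainAll(postings.getOrDefault(b, new TreeSet<>()));
        return result;
    }

    // Column of per-document values: answers "what is the value for this document?"
    static List<Integer> sortByColumn(Collection<Integer> docIds, long[] column) {
        List<Integer> sorted = new ArrayList<>(docIds);
        sorted.sort(Comparator.comparingLong(id -> column[id]));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "apache lucene search", "apache solr", "lucene apache codec");
        SortedSet<Integer> hits = and(buildPostings(docs), "apache", "lucene");
        long[] price = {30L, 10L, 20L};
        System.out.println(sortByColumn(hits, price));
        price[0] = 5L; // in this toy model, changing one document's value touches a single slot
        System.out.println(sortByColumn(hits, price));
    }
}
```

In the toy model, changing a sort value never touches the term map, which is why a column-style structure lends itself to field-level updates while searchable postings do not.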




Incremental Field Updates

2014-07-01 Thread Sandeep Khanzode
Hi,

I wanted to know of the best approach to follow if a few fields in my indexed 
documents are changing at run time (after index and before or during search), 
but a majority of them are created at index time.

I could see the JIRA given below but it is scheduled for Lucene 4.9, I believe. 
 

There are a few other approaches, like maintaining a separate index for 
changing fields and using either a ParallelReader or a Join.

Can everyone share their experience for this scenario on how it is handled in 
your systems? Thanks,

[LUCENE-4258] Incremental Field Updates through Stacked Segments - ASF JIRA
Shai and I would like to start working on the proposal to Incremental Field 
Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

---
Thanks n Regards,
Sandeep Ramesh Khanzode

Re: Incremental Field Updates

2014-07-01 Thread Erick Erickson
This JIRA is complicated; don't really expect it in 4.9, as it's
been hanging around for quite a while. Everyone would like this,
but it's not easy.

Atomic updates will work, but you have to set stored=true for all
source fields. Under the covers, this actually reads the document
out of the stored fields, deletes the old one, and adds it
over again.

FWIW,
Erick
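
The read, delete, re-add cycle described above can be sketched with a toy model in plain Java. This is not Solr or Lucene code: the class AtomicUpdateSketch, its two in-memory maps, and its method names are all hypothetical and exist only for illustration. The sketch assumes that a document is rebuilt purely from its stored fields, which is why a field that was indexed but not stored is lost after the update.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Toy document store illustrating an "atomic update" as read, delete, re-add. */
class AtomicUpdateSketch {
    private final Map<String, Map<String, String>> indexed = new HashMap<>(); // searchable view
    private final Map<String, Map<String, String>> stored = new HashMap<>();  // retrievable view

    void add(String id, Map<String, String> fields, Set<String> storedFieldNames) {
        indexed.put(id, new HashMap<>(fields));
        Map<String, String> kept = new HashMap<>(fields);
        kept.keySet().retainAll(storedFieldNames); // only stored fields can be read back
        stored.put(id, kept);
    }

    /** Changes one field by rebuilding the whole document from its stored fields. */
    void atomicUpdate(String id, String field, String newValue) {
        Map<String, String> source = stored.get(id);          // 1. read the stored fields
        if (source == null) {
            throw new NoSuchElementException("no document with id " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(source);
        rebuilt.put(field, newValue);                         // 2. apply the change
        indexed.remove(id);                                   // 3. delete the old document
        stored.remove(id);
        add(id, rebuilt, rebuilt.keySet());                   // 4. add the rebuilt document
    }

    boolean matches(String id, String field, String value) {
        Map<String, String> doc = indexed.get(id);
        return doc != null && value.equals(doc.get(field));
    }

    public static void main(String[] args) {
        AtomicUpdateSketch store = new AtomicUpdateSketch();
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "lucene");
        fields.put("body", "stacked segments"); // indexed but not stored
        fields.put("price", "10");
        store.add("1", fields, new HashSet<>(Arrays.asList("title", "price")));
        store.atomicUpdate("1", "price", "12");
        System.out.println(store.matches("1", "price", "12"));
        System.out.println(store.matches("1", "body", "stacked segments"));
    }
}
```

In this model the "body" field no longer matches after the update, because it could not be recovered from the stored fields when the document was rebuilt.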




Re: Incremental Field Updates

2014-07-01 Thread Shai Erera
Except that Lucene now offers efficient numeric and binary DocValues
updates. See IndexWriter.updateNumeric/Binary...




Re: Incremental Field Updates

2014-07-01 Thread Sandeep Khanzode
Hi Shai,

So one follow-up question.

Assume that my use case is to have approximately 50M documents indexed, with 
each document having about 10-15 indexed but not stored fields. These fields 
will never change, but there are another 5-6 fields that will change and will 
continue to change after the index is written. These 5-6 fields may also be 
multivalued. The size of this index turns out to be ~120GB. 

In this case, I would like to sort, facet, or search on these 5-6 fields. 
Which approach do you suggest? Should I use BinaryDocValues and update using 
IW (IndexWriter), or use either a ParallelReader or a Join query?
 
---
Thanks n Regards,
Sandeep Ramesh Khanzode

