Re: Incremental Field Updates

2014-07-08 Thread Ravikumar Govindarajan

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Incremental Field Updates

2014-07-03 Thread Ravikumar Govindarajan

In case of sorting, updatable DocValues may be what you are looking for.

But updatable fields for searching are a different beast.

A sample approach is documented at
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

The general problems with updatable postings lists, AFAIK, are:

1. Impossible to correctly score updated documents
2. Segment merges could miss updates
3. Might behave incorrectly with NRT
4. Freq updates could end up creating lots of files because of the
append-only nature of Lucene...

Maybe, if you are not too worried about scoring, correct NRT behavior, etc.,
you can attempt a solution like the RedisCodec stuff...

Segregating static and dynamic fields into 2 separate indexes, as described
here
http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management
may also be of some use to you.
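
A minimal sketch of the two-index idea, assuming both sides share a primary
key and the volatile side is consulted once per hit. It uses plain Java maps
only; the class name and the `title` and `stock` fields are hypothetical, and
none of this is the Sidecar Index component or the Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model: a write-once "main" side plus a frequently rewritten "sidecar" side. */
final class SidecarJoinSketch {
    // Static side: written at index time, never touched again.
    private final Map<String, String> staticTitle = new HashMap<>();
    // Dynamic side: overwritten whenever the volatile field changes.
    private final Map<String, Integer> dynamicStock = new HashMap<>();

    void indexStatic(String id, String title) {
        staticTitle.put(id, title);
    }

    void updateDynamic(String id, int stock) {
        dynamicStock.put(id, stock); // no re-indexing of the static side
    }

    /** Matches on the static field, then joins to the sidecar by id to filter. */
    List<String> search(String titleWord, int minStock) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : staticTitle.entrySet()) {
            if (!e.getValue().contains(titleWord)) {
                continue;
            }
            Integer stock = dynamicStock.get(e.getKey()); // one lookup per candidate hit
            if (stock != null && stock >= minStock) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits); // deterministic order for the example
        return hits;
    }
}
```

The per-hit lookup is where the cost of this design shows up: updates are
cheap, but every query pays for the join.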

--
Ravi


Re: Incremental Field Updates

2014-07-03 Thread Gopal Patwa

Thanks Ravi, it is good to know the general problems with updatable fields.
In our use case, we have a few fields that are updated more frequently than
the main index. We are using this Solr join contrib patch with a
DocTransformer to return data from the join core. This approach has some
performance impact, but if that hit is acceptable for your use case and you
are using Solr, you can give it a try.

https://issues.apache.org/jira/browse/SOLR-4787


Re: Incremental Field Updates

2014-07-02 Thread Shai Erera

Using BinaryDocValues is not recommended for all scenarios. It is a
catchall alternative to the other DocValues types. I would not use it
unless it makes sense for your application, even if it means that you need
to re-index a document in order to update a single field.

DocValues are not good for search - by search I assume you mean take a
query such as apache AND lucene and find all documents which contain both
terms under the same field. They are good for sorting and faceting though.

So I guess the answer to your question is it depends (it always is!) - I
would use DocValues for sorting and faceting, but not for regular search
queries. And I would use BinaryDocValues only when the other DocValues
types don't match.
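
To make the distinction concrete, here is a minimal Java sketch, assuming a
term-to-documents map stands in for postings (what search uses) and an array
indexed by document number stands in for a DocValues-like column (what
sorting uses). The class and method names are hypothetical, and this is plain
Java rather than the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy model: postings answer "which docs contain these terms", a column answers "what is doc N's value". */
final class ColumnVersusPostingsSketch {
    // Inverted structure: term -> ids of documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Column structure: one value per document, addressed by document id.
    private final long[] popularity;

    ColumnVersusPostingsSketch(int maxDoc) {
        popularity = new long[maxDoc];
    }

    void indexText(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Updating a column value touches one slot and nothing else. */
    void setPopularity(int docId, long value) {
        popularity[docId] = value;
    }

    /** Conjunctive query, e.g. apache AND lucene: intersect the postings of each term. */
    List<Integer> searchAll(String... terms) {
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> docs =
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        List<Integer> hits = new ArrayList<>();
        if (result != null) {
            hits.addAll(result);
        }
        return hits;
    }

    /** Sorting reads the column per hit; it never consults the postings. */
    List<Integer> sortByPopularityDesc(List<Integer> hits) {
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Long.compare(popularity[b], popularity[a]));
        return sorted;
    }
}
```

The column gives a cheap answer per document but offers no way to find
documents by term, which is why it suits sorting and faceting rather than
search.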

Also, note that the current field-level update of DocValues is not always
better than re-indexing the document; you can read here for more details:
http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html

Shai



Incremental Field Updates

2014-07-01 Thread Sandeep Khanzode

Hi,

I wanted to know of the best approach to follow if a few fields in my indexed 
documents are changing at run time (after index and before or during search), 
but a majority of them are created at index time.

I could see the JIRA given below but it is scheduled for Lucene 4.9, I believe. 
 

There are a few other approaches, like maintaining a separate index for the 
changing fields and using either a ParallelReader or a Join.

Could everyone share how this scenario is handled in their systems? Thanks,

[LUCENE-4258] Incremental Field Updates through Stacked Segments - ASF JIRA
Shai and I would like to start working on the proposal to Incremental Field 
Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

---
Thanks n Regards,
Sandeep Ramesh Khanzode

Re: Incremental Field Updates

2014-07-01 Thread Erick Erickson

This JIRA is complicated; I don't really expect it in 4.9, as it's
been hanging around for quite a while. Everyone would like this,
but it's not easy.

Atomic updates will work, but you have to set stored=true for all
source fields. Under the covers this actually reads the document
out of the stored fields, deletes the old one and adds it
over again.
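
A minimal sketch of that read, delete, re-add cycle, assuming a toy index
that tracks which fields are searchable and which are stored. It is plain
Java with hypothetical names, not Solr or Lucene code; it only illustrates
why a field that is indexed but not stored cannot survive such an update.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of an "atomic update" implemented as read stored fields, delete, re-add. */
final class AtomicUpdateSketch {
    // Everything a query could match on, per document id.
    private final Map<String, Map<String, String>> searchable = new HashMap<>();
    // Only what can be read back later, per document id.
    private final Map<String, Map<String, String>> stored = new HashMap<>();

    void add(String id, Map<String, String> storedFields, Map<String, String> indexedOnlyFields) {
        Map<String, String> all = new HashMap<>(indexedOnlyFields);
        all.putAll(storedFields);
        searchable.put(id, all);
        stored.put(id, new HashMap<>(storedFields));
    }

    void atomicUpdate(String id, String field, String newValue) {
        Map<String, String> old = stored.get(id);
        if (old == null) {
            throw new IllegalArgumentException("unknown document: " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(old); // 1. read the stored fields
        rebuilt.put(field, newValue);                     // 2. apply the change
        searchable.remove(id);                            // 3. delete the old document
        stored.remove(id);
        // 4. add it again; indexed-only values were never readable, so they are gone
        add(id, rebuilt, Collections.<String, String>emptyMap());
    }

    boolean isSearchable(String id, String field) {
        Map<String, String> doc = searchable.get(id);
        return doc != null && doc.containsKey(field);
    }

    String storedValue(String id, String field) {
        Map<String, String> doc = stored.get(id);
        return doc == null ? null : doc.get(field);
    }
}
```

In this model, updating a stored `price` field on a document that also has an
indexed-only `body` field leaves `price` updated and `body` no longer
searchable, which is the reason for the stored=true requirement.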

FWIW,
Erick


Re: Incremental Field Updates

2014-07-01 Thread Shai Erera

Except that Lucene now offers efficient numeric and binary DocValues
updates. See IndexWriter.updateNumeric/Binary...
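
One way to picture field-level updates in an append-only index is as layers
stacked on top of the values written at index time. The sketch below is a toy
model under that assumption: each flush seals the buffered updates into a new
immutable layer, reads consult the newest layer first, and a merge folds the
layers back into the base. The names are hypothetical and this is not
Lucene's implementation or file format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of append-only update layers over a per-document numeric column. */
final class UpdateGenerationsSketch {
    private final long[] base;                                  // values written at index time
    private final List<Map<Integer, Long>> generations = new ArrayList<>(); // one layer per flush
    private Map<Integer, Long> pending = new HashMap<>();       // buffered, not yet visible

    UpdateGenerationsSketch(long[] initialValues) {
        base = initialValues.clone();
    }

    void update(int docId, long value) {
        pending.put(docId, value);
    }

    /** Seals buffered updates into a new layer; earlier layers are never rewritten. */
    void flush() {
        if (!pending.isEmpty()) {
            generations.add(pending);
            pending = new HashMap<>();
        }
    }

    /** Newest layer wins; falls back to the base value. Only flushed updates are visible. */
    long get(int docId) {
        for (int g = generations.size() - 1; g >= 0; g--) {
            Long value = generations.get(g).get(docId);
            if (value != null) {
                return value;
            }
        }
        return base[docId];
    }

    int layerCount() {
        return generations.size();
    }

    /** Folds all layers into the base, oldest first so that newer values overwrite older ones. */
    void merge() {
        for (Map<Integer, Long> layer : generations) {
            for (Map.Entry<Integer, Long> e : layer.entrySet()) {
                base[e.getKey()] = e.getValue();
            }
        }
        generations.clear();
    }
}
```

In this model an update is cheap to write, but the number of layers grows
with every flush until a merge runs, so a heavy update rate trades write cost
for read and housekeeping cost.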

Re: Incremental Field Updates

2014-07-01 Thread Sandeep Khanzode

Hi Shai,

So one follow-up question.

Assume that my use case is to have ~50M documents indexed, with each 
document having ~10-15 indexed but not stored fields. These fields will 
never change, but there are another ~5-6 fields that will change and will 
continue to change after the index is written. These ~5-6 fields may also be 
multivalued. The size of this index turns out to be ~120GB. 

In this case, I would like to sort, facet, or search on these ~5-6 fields. 
Which approach do you suggest? Should I use BinaryDocValues and update using 
the IndexWriter (IW), or use either a ParallelReader or a Join query?
 
---
Thanks n Regards,
Sandeep Ramesh Khanzode


