[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423159#comment-13423159 ]

Robert Muir commented on LUCENE-4258:
-------------------------------------

I don't think I'm sold on introducing the feature in steps.

I think it's critical for something of this magnitude that we figure out the
design fully, up front, so that it works for the major use cases. I think it's
fine to implement it in steps if we need to, though.

Honestly, I think we should throw it all out on the table and get to the real
problems that I think most people face today:
# For many document sizes and use cases (especially rapidly changing data), the
  real problem is not the speed of Lucene reindexing the document; it's that the
  user must rebuild the entire document. Solr solved this by providing an option
  where you just say "update field X", and internally it reindexes the document
  from its stored fields (for that feature to work, the whole document must be
  stored). We shouldn't discard the possibility of implementing cleaner support
  for a solution like this, which wouldn't complicate IndexWriter at all.
# A second problem (not solved by the above) is that many people use scoring
  factors built from a variety of signals, and these change often. Unfortunately,
  I think people often put these in a normal indexed field and uninvert them
  into the FieldCache, which requires the whole document to be reindexed just
  because of how they implemented the scoring factor. People could instead solve
  this by putting their app's primary key into a DocValues field, which lets them
  keep these scoring factors completely external to Lucene (e.g. in their own
  array or whatever), indexed by their own primary key. But the problem, I think,
  is that people want Lucene to manage this: they don't want to implement what's
  necessary to keep it consistent with commits etc. themselves.
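As a rough illustration of the first approach (rebuilding the document from its stored fields), here is a minimal Java sketch with no Lucene dependency. The class `StoredFieldsUpdate`, its in-memory `store` and the `updateField` method are hypothetical stand-ins, not Lucene or Solr APIs; the point is only that the caller names the one changed field, while the full document is reconstructed internally and replaced as a whole, just as a normal updateDocument would.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class StoredFieldsUpdate {
    // Hypothetical stand-in for an index whose documents are fully stored,
    // keyed by the application's primary key (not a Lucene class).
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Full-document replace, analogous in spirit to updateDocument.
    public void updateDocument(String id, Map<String, String> doc) {
        store.put(id, new LinkedHashMap<>(doc));
    }

    // "Update field X": rebuild the whole document from its stored fields,
    // overwrite the one field, then reindex the result as a full document.
    public void updateField(String id, String field, String value) {
        Map<String, String> stored = store.get(id);
        if (stored == null) {
            throw new IllegalArgumentException("no stored document for id " + id);
        }
        Map<String, String> rebuilt = new LinkedHashMap<>(stored);
        rebuilt.put(field, value);
        updateDocument(id, rebuilt);
    }

    public Map<String, String> get(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        StoredFieldsUpdate index = new StoredFieldsUpdate();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("price", "40");
        index.updateDocument("book-1", doc);

        index.updateField("book-1", "price", "35");
        System.out.println(index.get("book-1")); // {title=Lucene in Action, price=35}
    }
}
```

Because the rebuilt document goes through the ordinary full-replace path, scoring stays exactly as correct as updateDocument today; the cost is that every field must be stored.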

So we can look at several approaches to solving this stuff. For many use cases,
I feel both of these problems could be solved via a contrib module without
modifying IndexWriter at all (or, maybe better, with tighter integration). And
with the simple approaches I describe above, searching doesn't get any slower.
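A minimal sketch of the second idea, assuming scoring factors are kept outside the index and keyed by the application's primary key, with pending changes becoming visible only on commit so that a search sees one consistent snapshot. `ExternalScoreStore`, `set`, `commit`, `snapshot` and `boostedScore` are hypothetical names, not Lucene APIs; in a real setup the primary key would be read per document from a DocValues field, and tying this commit to the index commit is exactly the part people would want Lucene to manage.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical external store for scoring factors (not a Lucene class).
public class ExternalScoreStore {
    // Changes made since the last commit; not yet visible to searchers.
    private final Map<String, Float> pending = new HashMap<>();
    // Immutable view published at the last commit; searchers read only this.
    private volatile Map<String, Float> committed = Collections.emptyMap();

    public synchronized void set(String primaryKey, float factor) {
        pending.put(primaryKey, factor);
    }

    // Publish all pending changes at once, in the spirit of an index commit.
    public synchronized void commit() {
        Map<String, Float> next = new HashMap<>(committed);
        next.putAll(pending);
        pending.clear();
        committed = Collections.unmodifiableMap(next);
    }

    // A searcher grabs one snapshot and uses it for the whole query.
    public Map<String, Float> snapshot() {
        return committed;
    }

    // Combine a text-relevance score with the external factor (default 1.0).
    public static float boostedScore(float textScore, Map<String, Float> snap, String primaryKey) {
        return textScore * snap.getOrDefault(primaryKey, 1.0f);
    }

    public static void main(String[] args) {
        ExternalScoreStore store = new ExternalScoreStore();
        store.set("doc-42", 2.0f);
        System.out.println(boostedScore(1.5f, store.snapshot(), "doc-42")); // 1.5 (not committed yet)
        store.commit();
        System.out.println(boostedScore(1.5f, store.snapshot(), "doc-42")); // 3.0
    }
}
```

Updating a factor here touches no document at all, so nothing is reindexed and the per-hit search cost is one map lookup.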

But if we really feel like we need a "full incremental update API" (I know
there are a few use cases where it can help, and I'm not discarding that), then
there are a few things I want:
* I want scoring to be correct: this is a must. If we provide an incremental
  update API on IW and it doesn't achieve the same thing as updateDocument does
  today, then it's broken. But I think it's OK for things to be temporarily off
  (as long as this happens in a consistent way) until merging takes place, just
  like deletes today.
* For any incremental update API, I want to know the cost to search performance.
  I want to know at what document size an incremental update API is actually
  faster than us just reindexing the document internally, and how much faster
  it is. I also want us to weigh that against the slowdown in search
  performance. We should know what the tradeoffs are before committing such APIs.

I strongly feel that if we just add these incremental APIs to IndexWriter
without being careful about these things, the end result could be that people
use them without thinking and end up with slower search and worse relevance;
that's why I am asking so many questions.

                
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field 
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).
