[
https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423159#comment-13423159
]
Robert Muir commented on LUCENE-4258:
-------------------------------------
I don't think I'm sold on introducing the feature in steps. I think it's critical for something of this magnitude that we figure out the design totally, up front, so it will work for the major use cases. I think it's fine to implement in steps if we need to, though.
Honestly, I think we should throw it all out on the table and get to the real problems that I think most people face today:
# For many document sizes and use cases (especially rapidly changing data), the real problem is not the speed of Lucene reindexing the document; it's that the user must rebuild the entire document. Solr solved this by providing an option where you just say "update field X" and it internally reindexes the document from stored fields (for that feature to work, the whole document must be stored). We shouldn't discard the possibility of implementing cleaner support for a solution like this, which wouldn't complicate IndexWriter at all.
# A second problem (not solved by the above) is that many people use scoring factors built from a variety of signals, and these change often. Unfortunately, I think people often put these in a normal indexed field and uninvert them into the FieldCache, which requires the whole document to be reindexed just because of how they implemented the scoring factor. People could instead solve this by putting their app's primary key into a DocValues field, which lets them keep these scoring factors completely external to Lucene (e.g. in their own array or whatever), indexed by their own primary key. But the problem, I think, is that people want Lucene to manage this; they don't want to implement themselves what's necessary to make it consistent with commits, etc.
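To make the first point concrete, here is a minimal sketch of the "update field X from stored fields" idea. It is a toy, in-memory model, not Solr's or Lucene's actual code: `StoredFieldsIndex`, `update_document`, and `update_field` are hypothetical names, and "reindexing" is reduced to replacing the stored field map and bumping a counter.

```python
class StoredFieldsIndex:
    """Hypothetical toy index in which every field of every document is stored.

    Not a Lucene or Solr API; it only models the "update field X" idea.
    """

    def __init__(self):
        self.stored = {}        # primary key -> {field name: value}
        self.reindex_count = 0  # number of full (re)indexing passes

    def update_document(self, key, doc):
        # Stand-in for a full delete-then-add of the whole document.
        self.stored[key] = dict(doc)
        self.reindex_count += 1

    def update_field(self, key, name, value):
        # Rebuild the whole document from its stored fields, swap in the one
        # changed field, and reindex it. The caller never resends the full
        # document, but this only works because *every* field was stored.
        doc = dict(self.stored[key])
        doc[name] = value
        self.update_document(key, doc)


idx = StoredFieldsIndex()
idx.update_document("42", {"title": "red shoes", "price": 10})
idx.update_field("42", "price", 12)
print(idx.stored["42"])  # {'title': 'red shoes', 'price': 12}
```

Note that `update_field` still pays the full reindexing cost inside the index; what it removes is the client having to rebuild and resend the entire document, which is the real problem described above.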
So we can look at several approaches to solving this. I feel like, for many use cases, both of these problems could be solved via a contrib module without modifying IndexWriter at all; maybe it's better if we go for tighter integration. And with the simple approaches I describe above, searching doesn't get any slower.
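The second approach (primary key in a DocValues field, scoring factors kept outside Lucene) can be sketched roughly as follows. Everything here is hypothetical: `ExternalFactors` and `rescore` are illustrative names, a plain Python list stands in for a per-segment DocValues field mapping doc id to primary key, and the multiplicative boost `score * (1 + factor)` is an arbitrary choice for the example. The `commit()` snapshot hints at the consistency bookkeeping that, as noted above, users would rather Lucene managed for them.

```python
class ExternalFactors:
    """Hypothetical side store: app primary key -> frequently changing signal.

    Lives entirely outside the index; not a Lucene API.
    """

    def __init__(self):
        self._pending = {}    # updates not yet visible to searches
        self._committed = {}  # snapshot that searches read

    def set(self, pk, value):
        self._pending[pk] = value

    def commit(self):
        # Publish pending updates in step with an index commit, so a
        # searcher sees factors consistent with the index it is reading.
        self._committed = {**self._committed, **self._pending}
        self._pending = {}

    def get(self, pk, default=0.0):
        return self._committed.get(pk, default)


def rescore(hits, pk_by_doc, factors):
    """Boost each (doc_id, score) hit by the external factor for its key.

    pk_by_doc stands in for a DocValues field (doc id -> primary key),
    so changing a factor never requires reindexing a document.
    """
    boosted = [(doc, score * (1.0 + factors.get(pk_by_doc[doc])))
               for doc, score in hits]
    return sorted(boosted, key=lambda hit: -hit[1])


factors = ExternalFactors()
factors.set("c", 1.0)
factors.commit()
factors.set("a", 5.0)  # not committed yet, so searches ignore it
print(rescore([(0, 1.0), (1, 1.0), (2, 1.0)], ["a", "b", "c"], factors))
# [(2, 2.0), (0, 1.0), (1, 1.0)]
```

The per-hit cost at search time is one lookup of the primary key plus one lookup of the factor, which is consistent with the point that this approach does not make searching slower in any significant way.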
But if we really feel like we need a "full incremental update API" (I know there are a few use cases where it can help, and I'm not discarding that), then there are a few things I want:
* I want scoring to be correct: this is a must. If we provide an incremental update API on IW and it doesn't achieve the same thing as updateDocument does today, then it's broken. But I think it's OK for things to be temporarily off (as long as this is in a consistent way) until merging takes place, just like deletes today.
* For any incremental update API, I want to know the cost to search performance. I want to know at what document size such an API is actually faster than us just reindexing the document internally, and how much faster it is. I also want us to weigh that against the slowdown in search performance. We should know what the tradeoffs are before committing such APIs.
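As an illustration of what "temporarily off, in a consistent way, just like deletes" means, here is a small worked example. It assumes the classic TF-IDF idf formula as we understand it from Lucene's DefaultSimilarity, idf = 1 + ln(maxDoc / (docFreq + 1)), and assumes that deleted documents still count toward both docFreq and maxDoc until a merge reclaims them. `term_stats` and the document list are hypothetical.

```python
import math


def classic_idf(doc_freq, max_doc):
    # Classic TF-IDF idf (assumed DefaultSimilarity-style formula):
    # 1 + ln(maxDoc / (docFreq + 1)).
    return 1.0 + math.log(max_doc / (doc_freq + 1))


def term_stats(docs, term, merged):
    """Return (docFreq, maxDoc) for term over (terms, deleted) documents.

    Before a merge, deleted documents still count; after a merge they are gone.
    """
    visible = [terms for terms, deleted in docs if not (merged and deleted)]
    return sum(term in terms for terms in visible), len(visible)


# 10 documents; 4 contain "x", and 2 of those 4 have been deleted.
docs = [({"x"}, True)] * 2 + [({"x"}, False)] * 2 + [({"y"}, False)] * 6

before = term_stats(docs, "x", merged=False)  # (4, 10)
after = term_stats(docs, "x", merged=True)    # (2, 8)
print(classic_idf(*before), classic_idf(*after))
```

Before the merge every document is scored with the same stale statistics, so rankings are shifted uniformly rather than arbitrarily; after the merge the idf for "x" rises from 1 + ln 2 to 1 + ln(8/3). An incremental update API would need to drift, if at all, in this same consistent, self-correcting way.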
I strongly feel that if we just add these incremental APIs to IndexWriter without being careful about these things, the end result could be that people use them without thinking and end up with slower search and worse relevance; that's why I am asking so many questions.
> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
> Key: LUCENE-4258
> URL: https://issues.apache.org/jira/browse/LUCENE-4258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Sivan Yogev
> Original Estimate: 2,520h
> Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).