[ https://issues.apache.org/jira/browse/LUCENE-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423159#comment-13423159 ]
Robert Muir commented on LUCENE-4258:
-------------------------------------

I don't think I'm sold on introducing the feature in steps. I think it's critical for something of this magnitude that we figure out the design totally, up-front, so it will work for the major use-cases. I think it's fine to implement in steps if we need to, though.

Honestly I think we should throw it all out on the table and get to the real problems that I think most people face today:

# For many document sizes and use-cases (especially rapidly changing stuff), the real problem is not the speed of Lucene reindexing the document; it's that the user must rebuild the entire document. Solr solved this by providing an option where you just say "update field X" and internally it reindexes the document from stored fields (for that feature to work, the whole thing must be stored). We shouldn't discard the possibility of implementing cleaner support for a solution like this, which wouldn't complicate IndexWriter at all.
# A second problem (not solved by the above) is that many people are using scoring factors with a variety of signals, and these are changing often. Unfortunately, I think people are often putting these in a normal indexed field and uninverting them on the FieldCache, requiring the whole document to be reindexed just because of how they implemented the scoring factor. People could instead solve this by putting their app's primary key into a DocValues field, allowing them to keep these scoring factors completely external to Lucene (e.g. their own array or whatever), indexed by their own primary key. But the problem is, I think, that people want Lucene to manage this; they don't want to implement for themselves what's necessary to make it consistent with commits etc.

So we can look at several approaches to solving this stuff. I feel like both these problems could be solved via a contrib module without modifying IndexWriter at all for many use cases; maybe better if we go for tighter integration.
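To make the two simple approaches concrete, here is a minimal sketch in plain Java. It deliberately does not use any Lucene or Solr API: StoredFieldIndex and ExternalScoringFactors are hypothetical names invented for illustration, and the in-memory maps only stand in for stored fields and for an application-owned array of scoring factors. The first class shows "update field X" as a full reindex rebuilt from stored fields; the second shows scoring factors kept outside the index, keyed by the application's primary key, with changes becoming visible only on commit.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an index in which every field is stored. */
final class StoredFieldIndex {
    // primary key -> stored fields of the current version of the document
    private final Map<String, Map<String, String>> stored = new HashMap<>();
    // counts whole-document reindex operations, to show they still happen internally
    private int reindexCount = 0;

    /** Replaces the whole document, analogous to a delete followed by an add. */
    void updateDocument(String id, Map<String, String> fields) {
        stored.put(id, new HashMap<>(fields));
        reindexCount++;
    }

    /** "Update field X": rebuild the document from its stored fields, then reindex it whole. */
    void updateField(String id, String field, String value) {
        Map<String, String> current = stored.get(id);
        if (current == null) {
            throw new IllegalArgumentException("unknown document: " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(current);
        rebuilt.put(field, value);
        updateDocument(id, rebuilt);
    }

    /** Returns a read-only view of the stored fields, or null if the document is unknown. */
    Map<String, String> get(String id) {
        Map<String, String> doc = stored.get(id);
        return doc == null ? null : Collections.unmodifiableMap(doc);
    }

    int reindexCount() {
        return reindexCount;
    }
}

/** Hypothetical scoring factors kept outside the index, keyed by the app's primary key. */
final class ExternalScoringFactors {
    // changes made since the last commit; assumes a single writer thread
    private final Map<String, Float> pending = new HashMap<>();
    // snapshot that searches read; swapped atomically on commit
    private volatile Map<String, Float> committed = Collections.emptyMap();

    /** Buffers a new factor; searches do not see it until commit(). */
    void set(String primaryKey, float factor) {
        pending.put(primaryKey, factor);
    }

    /** Publishes all buffered changes as one new snapshot. */
    void commit() {
        Map<String, Float> next = new HashMap<>(committed);
        next.putAll(pending);
        pending.clear();
        committed = Collections.unmodifiableMap(next);
    }

    /** Reads the committed factor, falling back to defaultValue if none is set. */
    float get(String primaryKey, float defaultValue) {
        return committed.getOrDefault(primaryKey, defaultValue);
    }
}
```

In this sketch the caller only names the changed field, but updateField still ends in a whole-document replace, so nothing on the search side changes. The snapshot swap in commit() is the bookkeeping that an application would otherwise have to write itself to stay consistent with index commits.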
And with those simple approaches I describe above, searching doesn't get any slower.

But if we really feel like we need a "full incremental update API" (I know there are a few use cases where it can help, I'm not discarding that), then there are a few things I want:

* I want scoring to be correct: this is a must. If we provide an incremental update API on IW and it doesn't achieve the same thing as updateDocument today, then it's broken. But I think it's OK for things to be temporarily off (as long as this is in a consistent way) until merging takes place, just like deletes today.
* I want to know, for any incremental update API, the cost to search performance. I want to know at what document size any incremental update API is actually faster than us just reindexing the document internally, and how much faster it is. I also want us to weigh that against the slowdown in search performance. We should know what the tradeoffs are before committing such APIs.

I strongly feel that if we just add these incremental APIs to IndexWriter without being careful about these things, the end result could be that people use them without thinking and end up with slower search and worse relevance. That's why I am asking so many questions.

> Incremental Field Updates through Stacked Segments
> --------------------------------------------------
>
>                 Key: LUCENE-4258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Sivan Yogev
>   Original Estimate: 2,520h
>  Remaining Estimate: 2,520h
>
> Shai and I would like to start working on the proposal to Incremental Field
> Updates outlined here (http://markmail.org/message/zhrdxxpfk6qvdaex).

--
This message is automatically generated by JIRA.