[ https://issues.apache.org/jira/browse/LUCENE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425182#comment-13425182 ]
Robert Muir commented on LUCENE-4272:
-------------------------------------

Well, I think there are a few other advantages. Complexity: not having to stack segments keeps the number of "dimensions" the same, and the general structure of the index would be unchanged as well. To IndexSearcher/Similarity/etc., everything would appear just as if someone had deleted and re-added the document completely, like today. This means we don't have to change our search APIs to add maxDoc(field) or anything else: scoring works just fine.

It also seems possible we could support tryXXX incremental updates by docid, just like LUCENE-4203, though that's just an optimization.

As far as tiny fields on otherwise massive docs, I think we can break the cost down into 3 layers (see the sketch after the quoted description below):
# document 'build' <-- retrieving from your SQL database / sending over the wire / etc.
# field 'analyze' <-- actually doing the text analysis etc. on the doc
# field 'indexing' <-- consuming the already-analyzed pieces through the indexer chain / codec flush / etc.

Today people 'pay' for 1, 2, and 3. If they use the Solr/ES approach they only pay 2 and 3, I think; with this approach it's just 3. I think for the vast majority of apps that will be fast enough, as I am totally convinced 1 and 2 are the biggest burden on people, and those seem totally possible to fix without hurting search performance. I can't imagine many real-world apps where 3, not 1 and 2, is their bottleneck AND they are willing to trade off significant search performance for it.

> another idea for updatable fields
> ---------------------------------
>
>                 Key: LUCENE-4272
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4272
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>
> I've been reviewing the ideas for updatable fields and have an alternative
> proposal that I think would address my biggest concern:
> * not slowing down searching
> When I look at what Solr and Elasticsearch do here, basically reindexing
> from stored fields, I think they solve a lot of the problem: users don't
> have to "rebuild" their document from scratch just to update one tiny piece.
> But I think we can do this more efficiently: by avoiding reindexing of the
> unaffected fields.
> The basic idea is that we would require term vectors for this approach (as
> they already store a serialized indexed version of the doc), so we could
> just take the other pieces from the existing vectors for the doc.
> I think we would have to extend vectors to also store the norm (so we
> don't recompute it) and payloads, but it seems feasible at a glance.
> I don't think we should discard the idea because vectors are slow/big
> today; this seems like something we could fix.
> Personally, I like the idea of not slowing down search performance to
> solve the problem. I think we should really start from that angle and work
> towards making the indexing side more efficient, not vice versa.
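To make the layer breakdown concrete, here is a minimal sketch of the two update flows, with field names ("id", "price", "body") invented for illustration. The first method uses real Lucene 4.x APIs; updateField() in the second is purely hypothetical, standing in for whatever entry point this proposal would eventually add.

{code:java}
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.Terms;

class UpdateSketch {

  // The Solr/ES flow: rebuild the Document from its stored fields and
  // re-add it. Layer 1 (document build) is skipped, but every field is
  // re-analyzed (layer 2) and re-indexed (layer 3). This requires all
  // fields to be stored, and the stored-only view loses each field's
  // original FieldType (Solr reconstructs that from its schema).
  void updateViaStoredFields(DirectoryReader reader, IndexWriter writer,
                             int docID, String id, long newPrice) throws IOException {
    Document doc = reader.document(docID);
    doc.removeFields("price");
    doc.add(new LongField("price", newPrice, Field.Store.YES));
    writer.updateDocument(new Term("id", id), doc); // delete + re-add under the hood
  }

  // The flow proposed here: term vectors already hold the serialized,
  // analyzed form of each field, so unaffected fields could be copied
  // straight into the new segment, paying only layer 3 for them.
  // getTermVector() is a real API; updateField() is hypothetical.
  void updateViaVectors(DirectoryReader reader, IndexWriter writer,
                        int docID, String id, long newPrice) throws IOException {
    Terms bodyVector = reader.getTermVector(docID, "body"); // already-analyzed tokens
    // hypothetical API: analyze and index only "price"; every other field
    // (like "body" above) would be taken from its existing vector:
    // writer.updateField(new Term("id", id),
    //     new LongField("price", newPrice, Field.Store.YES));
  }
}
{code}

The contrast carries the argument: in the first flow the analysis cost scales with the whole document, while in the second it scales only with the changed field.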