[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664965#action_12664965 ]
Mark Miller commented on LUCENE-1483:
-------------------------------------

I think it's pretty costly even for non-id type fields. In your enum case, there are what, 50 unique values? Even so, you are seeing something like a 40% difference, but with times small enough not to matter. My test example has 20,000 unique terms for 600,000 documents (lots of overlap, 2-8 char strings, 1-9, I think), so quite a bit short of a primary key, but it was still WAY faster with the new method.

Old method, non-optimized, 79 segments - 1.5 million seeks, WAY slow.
Old method, optimized, 1 segment - 20,000 seeks, pretty darn fast.
New method, non-optimized, 79 segments - 40,000 seeks, pretty darn fast.

bq. While there is a big difference between searching a single segment vs multisegments for these things, we already knew about that - that's why you optimize.

{quote}Right, but for realtime search you don't have the luxury of optimizing. This patch makes warming time after reopen much faster for a many-segment index for apps that use FieldCache with mostly unique String fields.{quote}

Right, I got you - I know we can't optimize. I was just realizing that explaining why 100 segments was so slow does not explain why the new method on 100 segments is so fast. I still don't think I fully understand why that is. I don't think getting to use only the unique terms at each segment saves enough seeks to account for what I am seeing. Especially in this test case, the terms should be pretty evenly distributed across segments...
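As a rough sanity check on the seek counts quoted above, here is a toy cost model in Java. It is not Lucene code and was not part of the patch. The class and method names are hypothetical, and the model rests on two assumptions: the old path pays about one seek per unique term in every segment, and the new path pays about one seek per unique term that actually occurs in each segment.

```java
import java.util.Arrays;

/** Toy seek-count model (hypothetical, not Lucene code). */
class SeekEstimate {

    /**
     * Assumed old path: the cache is filled from a merged view of the index,
     * so every unique term is looked up once in every segment.
     */
    static long mergedViewSeeks(long uniqueTerms, int segments) {
        return uniqueTerms * segments;
    }

    /**
     * Assumed new path: each segment fills its own cache and only looks up
     * the terms that occur in that segment.
     */
    static long perSegmentSeeks(long[] uniqueTermsPerSegment) {
        return Arrays.stream(uniqueTermsPerSegment).sum();
    }

    public static void main(String[] args) {
        // 20,000 unique terms, 79 segments: close to the ~1.5 million reported.
        System.out.println(mergedViewSeeks(20_000, 79)); // 1580000

        // Optimized index, one segment.
        System.out.println(mergedViewSeeks(20_000, 1));  // 20000

        // Average unique terms per segment implied by 40,000 seeks over 79 segments.
        System.out.println(40_000 / 79);                 // 506
    }
}
```

Under these assumptions the two old-method figures line up with the reported numbers. The new-method figure would imply only about 500 distinct terms per segment on average, which the model does not explain for evenly distributed terms, so the open question in the comment stands.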
> Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
> --------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch (attached repeatedly under the same name), sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing for individual segment reloading on reopen.