[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1201#action_1201 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/23/09 9:13 AM:

bq. I was looking after the initial warmup, but noticed no difference. Maybe the string field I used was not distinct enough. What is a good number for a noticeable speed improve (50% distinct terms?).

He's not saying after the warm up, but that the warm up itself should be faster because of that.

It's because of this: if you had 5 segments with unique term counts of 50,000, 6000, 6000, 5, 5, then the old way would try to load all 62,010 terms for every segment - 5 x 62,010 = 310,050. With the new way, we load 50,000 terms for the first, 6000 for the next, then 6000, then 5 and 5: a total of 62,010.

Even though most of the 50,000 won't be found in the 5 term segment, it still takes a long time to check them all. So the more unique terms and the more segments, the worse the problem got.

*edit* little fix on those numbers

was (Author: markrmil...@gmail.com):

bq. I was looking after the initial warmup, but noticed no difference. Maybe the string field I used was not distinct enough. What is a good number for a noticeable speed improve (50% distinct terms?).

He's not saying after the warm up, but that the warm up should be faster based on that.

It's because of this: The old way, if you had 5 segments with unique terms distributions of 50,000, 6000, 6000, 5, 5, then for the old way, we would try to load all 50,000 terms for every segment - 5 x 50,000 = 250,000. With the new way, we load 50,000 terms for the first, 6000 for the next, then 6000, then 5 and 5: total of 62,000.

Even though most of the 50,000 won't be found in the 5 term segment, it still takes a long time to check them all. So the more unique terms and the more segments, the worse the problem got.
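The arithmetic in the comment above can be checked with a few lines of Java. This is only a sketch of the counting argument, not Lucene code: the class and method names are made up for illustration, and it assumes (as the comment does) that the per-segment term sets do not overlap, so the index-wide term count is simply the sum.

```java
import java.util.stream.LongStream;

// Hypothetical helper, not part of Lucene: counts term lookups for the two loading styles.
final class TermLoadCount {

    // Old approach: the combined term list of the whole index is checked
    // against every segment.
    static long multiReaderLookups(long[] uniqueTermsPerSegment) {
        long allTerms = LongStream.of(uniqueTermsPerSegment).sum();
        return allTerms * uniqueTermsPerSegment.length;
    }

    // New approach: each segment only walks its own terms.
    static long perSegmentLookups(long[] uniqueTermsPerSegment) {
        return LongStream.of(uniqueTermsPerSegment).sum();
    }

    public static void main(String[] args) {
        long[] segments = {50_000, 6_000, 6_000, 5, 5};
        System.out.println("one multireader: " + multiReaderLookups(segments));
        System.out.println("per segment:     " + perSegmentLookups(segments));
    }
}
```

With the five segment sizes from the comment, the first method gives 310,050 lookups and the second 62,010, which is where the factor of five (one per segment) comes from.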
Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector Key: LUCENE-1483 URL: https://issues.apache.org/jira/browse/LUCENE-1483 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Mark Miller Priority: Minor Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py This issue changes how an IndexSearcher searches over multiple segments. The current method of searching multiple segments is to use a MultiSegmentReader and treat all of the segments as one. This causes filters and FieldCaches to be keyed to the MultiReader and makes reopen expensive. If only a few segments change, the FieldCache is still loaded for all of them. This patch changes things by searching each individual segment one at a time, but sharing the HitCollector used across each segment. This allows FieldCaches and Filters to be keyed on individual SegmentReaders, making reopen much cheaper. FieldCache loading over multiple segments can be much faster as well - with the old method, all unique terms for every segment is enumerated against each segment - because of the likely logarithmic change in terms per segment, this can be very wasteful. Searching individual segments avoids this cost. The term/document statistics from the multireader are used to score results for each segment. 
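To make the "search each segment, share one collector" idea concrete, here is a minimal self-contained sketch. It does not use Lucene classes: `Segment`, `GlobalIdCollector`, `setNextSegment` and the `docBase` bookkeeping are hypothetical names chosen for illustration, on the assumption that the shared collector has to be told where each segment starts in the index-wide document numbering.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all types here are stand-ins, not Lucene APIs.
final class SharedCollectorSketch {

    // One segment: its document count and the segment-local ids that match the query.
    static final class Segment {
        final int maxDoc;
        final int[] matchingDocs;

        Segment(int maxDoc, int... matchingDocs) {
            this.maxDoc = maxDoc;
            this.matchingDocs = matchingDocs;
        }
    }

    // A single collector reused for every segment; it rebases local ids to global ids.
    static final class GlobalIdCollector {
        private int docBase;
        final List<Integer> hits = new ArrayList<>();

        void setNextSegment(int docBase) {
            this.docBase = docBase;
        }

        void collect(int segmentLocalDoc) {
            hits.add(docBase + segmentLocalDoc);
        }
    }

    // Search the segments one at a time, handing every hit to the same collector.
    static List<Integer> search(List<Segment> segments) {
        GlobalIdCollector collector = new GlobalIdCollector();
        int docBase = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(docBase);
            for (int doc : segment.matchingDocs) {
                collector.collect(doc);
            }
            docBase += segment.maxDoc; // next segment starts after this one
        }
        return collector.hits;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(10, 1, 7));
        segments.add(new Segment(5, 0, 4));
        segments.add(new Segment(3, 2));
        System.out.println(search(segments));
    }
}
```

Because anything cached per segment (filters, field caches) would be keyed on the segment rather than on the combined reader, a reopen that changes one segment would only need to rebuild that segment's entry; the sketch leaves caching out and shows just the traversal.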
When sorting, it's more difficult to use a single HitCollector for each sub searcher. Ordinals are not comparable across segments. To account for this, a new field sort enabled HitCollector is introduced that is able to collect and sort across segments (because of its ability to compare ordinals across segments). This TopFieldCollector class will collect the values/ordinals for a given segment, and upon moving to the next segment, translate any ordinals/values so that they can be compared against the values for the new segment. This is done lazily.

All in all, the switch seems to provide numerous performance benefits, in both sorted and non sorted search. We were seeing a good loss on indices with lots of segments (1000?) and certain
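The ordinal problem described in the sorting paragraph can be sketched without any Lucene classes. The example below tracks only the single smallest value across segments, and it translates the current best value into each new segment's ordinal space eagerly with a binary search (the description says the patch does its translation lazily). `Segment`, `sortedValues`, `ordByDoc` and `smallestValue` are hypothetical names; this illustrates the idea and is not the TopFieldCollector implementation.

```java
import java.util.Arrays;

// Illustrative only: a stand-in for per-segment string ordinals, not Lucene code.
final class OrdinalTranslationSketch {

    static final class Segment {
        final String[] sortedValues; // unique field values, sorted; array index = ordinal
        final int[] ordByDoc;        // ordinal of each document's value in this segment

        Segment(String[] sortedValues, int[] ordByDoc) {
            this.sortedValues = sortedValues;
            this.ordByDoc = ordByDoc;
        }
    }

    // Returns the smallest field value over all segments. Within a segment only
    // ints are compared; the value itself is touched only when a doc wins.
    static String smallestValue(Segment[] segments) {
        String best = null;
        for (Segment seg : segments) {
            // Translate the carried-over best value into this segment's ordinals.
            // A doc beats the current best exactly when its ord is below the threshold.
            int threshold;
            if (best == null) {
                threshold = seg.sortedValues.length; // nothing collected yet
            } else {
                int pos = Arrays.binarySearch(seg.sortedValues, best);
                threshold = pos >= 0 ? pos : -(pos + 1); // exact match or insertion point
            }
            for (int ord : seg.ordByDoc) {
                if (ord < threshold) {
                    threshold = ord;
                    best = seg.sortedValues[ord];
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Segment a = new Segment(new String[] {"cherry", "pear"}, new int[] {1, 0});
        Segment b = new Segment(new String[] {"apple", "banana", "zebra"}, new int[] {2, 1});
        System.out.println(smallestValue(new Segment[] {a, b}));
    }
}
```

Ordinal 0 means "cherry" in the first segment and "apple" in the second, which is why the raw ordinals cannot be compared directly and the carried-over value has to be re-expressed for each segment.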
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666801#action_12666801 ]

thetaphi edited comment on LUCENE-1483 at 1/23/09 4:07 PM:

bq. So null -- I cannot be split into sub-readers; empty array -- I am a null reader; array.length > 0 -- I do have sequential sub-readers?

This is a good optimization. If a MultiReader returned null instead of an empty array, it wouldn't be a problem (the empty reader would be searched with no results). But returning an empty array is better in this case. So gatherSubReaders() should only check for null and then add the parent reader itself to the List, and in all other cases do the recursion.

was (Author: thetaphi):

bq. So null -- I cannot be split into sub-readers; empty array -- I am a null reader; array.length > 0 -- I do have sequential sub-readers?

This is a good optimization. If a MultiReader returned null instead of an empty array, it wouldn't be a problem (the empty reader would be searched with no results). But returning an empty array is better in this case. So gatherSubReaders() should only check for null and then add the parent reader itself to the List, and in all other cases add the array contents, maybe using List.addAll(Arrays.asList(array)) instead of the loop.
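The contract discussed above (null means "cannot be split", an empty array means "nothing to search", anything else means "recurse") can be sketched as follows. Only the name gatherSubReaders() comes from the comment; the `Reader` interface, its `subReaders()` accessor, and the `Leaf`/`Composite` classes are hypothetical stand-ins, so this should be read as an illustration of the recursion rather than the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: Reader, Leaf and Composite are stand-ins, not Lucene classes.
final class GatherSubReadersSketch {

    interface Reader {
        // null: cannot be split; empty array: nothing to search;
        // otherwise: the sequential sub-readers.
        Reader[] subReaders();
    }

    static final class Leaf implements Reader {
        final String name;

        Leaf(String name) {
            this.name = name;
        }

        public Reader[] subReaders() {
            return null;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    static final class Composite implements Reader {
        private final Reader[] subs;

        Composite(Reader... subs) {
            this.subs = subs;
        }

        public Reader[] subReaders() {
            return subs;
        }
    }

    // Only the null case adds the reader itself; every other case recurses,
    // so an empty composite contributes nothing to the list.
    static void gatherSubReaders(List<Reader> out, Reader reader) {
        Reader[] subs = reader.subReaders();
        if (subs == null) {
            out.add(reader);
            return;
        }
        for (Reader sub : subs) {
            gatherSubReaders(out, sub);
        }
    }

    public static void main(String[] args) {
        Reader top = new Composite(
            new Leaf("a"),
            new Composite(new Leaf("b"), new Leaf("c")),
            new Composite()); // empty: skipped entirely
        List<Reader> leaves = new ArrayList<>();
        gatherSubReaders(leaves, top);
        System.out.println(leaves);
    }
}
```

Recursing instead of calling List.addAll(Arrays.asList(array)) matters when a sub-reader is itself a composite: addAll would add the nested composite as a single entry, while the recursion flattens it down to its leaves.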
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666163#action_12666163 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/22/09 9:58 AM:

Nice work Mike - pretty polished. I've spent a little time looking it over, but I'm going to look more tonight. Everything is looking pretty good to me.

Not sure what to name that new class, but here are some ideas:

TopScoreDocCollector
TopHitCollector
TopResultCollector
TopMatchCollector
TopCollector
TopScoreCollector

Could be a low score, so that last one is odd, but I guess the low would kind of be the top...

*edit* Never mind... I was thinking the lowest score could be considered the top match, but that wouldn't be the case with this HitCollector implementation, so I guess it makes as much sense as any of the others.

was (Author: markrmil...@gmail.com):

Nice work Mike - pretty polished. I've spent a little time looking it over, but I'm going to look more tonight. Everything is looking pretty good to me.

Not sure what to name that new class, but here are some ideas:

TopScoreDocCollector
TopHitCollector
TopResultCollector
TopMatchCollector
TopCollector
TopScoreCollector

Could be a low score, so that last one is odd, but I guess the low would kind of be the top...
Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector Key: LUCENE-1483 URL: https://issues.apache.org/jira/browse/LUCENE-1483 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Mark Miller Priority: Minor Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py This issue changes how an IndexSearcher searches over multiple segments. The current method of searching multiple segments is to use a MultiSegmentReader and treat all of the segments as one. This causes filters and FieldCaches to be keyed to the MultiReader and makes reopen expensive. If only a few segments change, the FieldCache is still loaded for all of them. This patch changes things by searching each individual segment one at a time, but sharing the HitCollector used across each segment. This allows FieldCaches and Filters to be keyed on individual SegmentReaders, making reopen much cheaper. FieldCache loading over multiple segments can be much faster as well - with the old method, all unique terms for every segment is enumerated against each segment - because of the likely logarithmic change in terms per segment, this can be very wasteful. Searching individual segments avoids this cost. The term/document statistics from the multireader are used to score results for each segment. 
When sorting, its more difficult to use a single HitCollector for each sub searcher. Ordinals are not comparable across segments. To account for this, a new field sort enabled HitCollector is introduced that is able to collect and sort across segments (because of its ability to compare ordinals across segments). This TopFieldCollector class will collect the values/ordinals for a given segment, and upon moving to the next segment, translate any ordinals/values so that they can be compared against the values for the new segment. This is done lazily. All and all, the switch seems to provide numerous performance benefits, in both sorted and non sorted search. We were seeing a good loss on indices with lots of segments (1000?) and certain queue sizes / queries, but the latest results seem to show thats been mostly taken care of (you shouldnt be using such a large queue on such a segmented index anyway). * Introduces ** MultiReaderHitCollector - a HitCollector that can collect across multiple IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders. ** TopFieldCollector - a HitCollector that can compare values/ordinals across IndexReaders and sort on fields. ** FieldValueHitQueue - a Priority queue that is part of the TopFieldCollector implementation. ** FieldComparator - a new Comparator class that works across IndexReaders. Part of the
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664984#action_12664984 ] markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 2:19 PM: -- My previous results had a few oddities going with them (I was loosely playing around). Being a little more careful, here is an example of the difference, and the hotspots. Timings are probably not completely comparable as my comp couldnt keep up profiling the second version very well - its much slower without profiling as well though: Index is 60 docs, 46 segments, 63849 unique terms. Load the fieldcache on one multireader ||method||time||invocations|| |FieldCacheImpl.createValue|156536(98%)|1| |MultiTermDocs.next()|148499(93.5%)|621803| |MutliTermDocs(int)|140397(88.4%)|1002938| |SegmentTermDocs.seek(Term)|138332(87.1%)|1002938| load the fieldcache on each sub reader of the multireader, one at a time ||method||time||invocations|| |FieldCacheImpl.createValue|7815(80.4%)|46| |SegmentTermDocs.next()|3315(34.1%)|642046| |SegmentTermEnum.next()|1936(19.9%)|42046| |SegmentTermDocs.seek(TermEnum)|874(9%)|42046| *edit* wrong values was (Author: markrmil...@gmail.com): My previous results had a few oddities going with them (I was loosely playing around). Being a little more careful, here is an example of the difference, and the hotspots. Timings are probably not completely comparable as my comp couldnt keep up profiling the second version very well - its much slower without profiling as well though: Index is 60 docs, 46 segments, 63849 unique terms. 
Load the fieldcache on one multireader ||method||time||invocations|| |FieldCacheImpl.createValue|156536(98%)|1| |MultiTermDocs.next()|148499(93.5%)|621803| |MutliTermDocs(int)|140397(88.4%)|1002938| |SegmentTermDocs.seek(Term)|138332(87.1%)|1002938| load the fieldcache on each sub reader of the multireader, one at a time ||method||time||invocations|| |FieldCacheImpl.createValue|7815(80.4%)|46| |SegmentTermDocs.next()|3315(34.1%)|642046| |SegmentTermEnum.next()|1936(19.9%)|42046| |SegmentTermDocs.seek(TermEnum)|874(9%)|42046| Unique terms per segment: 21312,41837,41843,41849,41854,41860,41865,41870,41878,41883,41888,41894,41902,41906,41910,41912,41916,41921,41924 41930,41932,41936,41943,41947,41951,41956,41960,41964,41970,41974,41979,41982,41989,41994,41999,42002,42005 42007,42011,42016,42020,42026,42033,42039,42044,42046 Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector Key: LUCENE-1483 URL: https://issues.apache.org/jira/browse/LUCENE-1483 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Mark Miller Priority: Minor Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py FieldCache and Filters are forced down to a single segment reader, allowing for individual segment reloading on reopen. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664984#action_12664984 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 8:20 PM:

My previous results had a few oddities going with them (I was loosely playing around). Being a little more careful, here is an example of the difference, and the hotspots. Timings are probably not completely comparable, as my computer couldn't keep up profiling the second version very well - it's much slower without profiling as well, though:

Index is 60 docs, 46 segments

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|

*edit* wrong values

was (Author: markrmil...@gmail.com):

My previous results had a few oddities going with them (I was loosely playing around). Being a little more careful, here is an example of the difference, and the hotspots. Timings are probably not completely comparable, as my computer couldn't keep up profiling the second version very well - it's much slower without profiling as well, though:

Index is 60 docs, 46 segments, 63849 unique terms.

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|

*edit* wrong values
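As a rough reading of the two profile tables above, the ratios can be worked out in a few lines. The numbers are taken straight from the tables; the class and method names are made up, the time unit is whatever the profiler reported, and the comment itself warns that the timings are not completely comparable, so treat the result as an order-of-magnitude indication only.

```java
// Illustrative arithmetic on the profiler figures quoted above; not Lucene code.
final class FieldCacheProfileSketch {

    // Ratio of the multireader figure to the per-segment figure.
    static double ratio(long multiReader, long perSegment) {
        return (double) multiReader / perSegment;
    }

    public static void main(String[] args) {
        // FieldCacheImpl.createValue total time: one multireader vs. 46 sub readers.
        System.out.printf("createValue time ratio: %.2f%n", ratio(156_536, 7_815));
        // Seek invocations: SegmentTermDocs.seek(Term) vs. SegmentTermDocs.seek(TermEnum).
        System.out.printf("seek invocation ratio:  %.2f%n", ratio(1_002_938, 42_046));
    }
}
```

Both ratios come out a little above 20, which is in line with the claim that loading the FieldCache per segment avoids most of the seek work done through the multireader.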
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663607#action_12663607 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/13/09 7:03 PM:

Disregarding any missing gains with those simple policies, the rest of those numbers actually look pretty good! Still some problems here and there (large queue size still sticky), but overall some solid gains as well. orddem seems to be best in most cases currently - maybe we can tweak that a little more somehow. Where it's not better, or not much worse, is with a single segment. That result is interesting, because both policies beat it nicely, and it's because they simply use straight ord on the first segment.

But ordsubord seems to outperform the policies. That doesn't make sense. It's largely the same, but should be a tad slower if anything. Other results match up so nicely, it seems like it might not be noise, in which case, weird.

was (Author: markrmil...@gmail.com):

Disregarding any any with those simple policies, the rest of those numbers actually look pretty good! Still some problems here and there (large queue size still sticky), but overall some solid gains as well. orddem seems to be best in most cases currently - maybe we can tweak that a little more somehow. Where it's not better, or not much worse, is with a single segment. That result is interesting, because both policies beat it nicely, and it's because they simple use straight ord on the first segment.

But ordsubord seems to outperform the policies. That doesn't make sense. It's largely the same, but should be a tad slower if anything. Other results match up so nicely, it seems like it might not be noise, in which case, weird.
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662038#action_12662038 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/8/09 9:15 AM:

It's the ORDSUBORD again (which I don't think we will use) and the two Policies. Odd because it's the last hit of 10 that fails for all 3. I'll ferret it out tonight.

- Mark

*EDIT* yup... it's always the last entry that's wrong no matter the queue size - for all 3, which is odd because ORD_SUBORD doesn't have too much of a relationship to the two policies. Will be a fun one.

was (Author: markrmil...@gmail.com):

It's the ORDSUBORD again (which I don't think we will use) and the two Policies. Odd because it's the last hit of 10 that fails for all 3. I'll ferret it out tonight.

- Mark
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12661160#action_12661160 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/6/09 6:57 AM:

bq. Mark, I see 3 testcase failures in TestSort if I pretend that SortField.STRING means STRING_ORD - do you see that?

Yeah, sorry. That STRING_ORD custom comparator policy is just a joke really, so I only really tested it on the StringSort test. It's just not initing the ords along with the values on switching. Making ords package private so that it can be changed (and changing it) fixes things. Not sure about new constructors or package private for that part of the switch...

bq. I think we should fix TestSort so that it runs N times, each time using a different STRING sort method, to make sure we are covering all these methods?

Yeah, this makes sense in any case. I just keep switching them by hand as I work on them.

was (Author: markrmil...@gmail.com):

bq. Mark, I see 3 testcase failures in TestSort if I pretend that SortField.STRING means STRING_ORD - do you see that?

Yeah, sorry. That STRING_ORD custom comparator is just a joke really, so I only really tested it on the StringSort test. It's just not initing the ords along with the values on switching. Making ords package private so that it can be changed (and changing it) fixes things. Not sure about new constructors or package private for that part of the switch...

bq. I think we should fix TestSort so that it runs N times, each time using a different STRING sort method, to make sure we are covering all these methods?

Yeah, this makes sense in any case. I just keep switching them by hand as I work on them.
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12660322#action_12660322 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/2/09 6:24 AM:

So what looks like a promising strategy? Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its place, but I don't know where yet.

*edit* Oh, yeah, queue size should also play a role in the switching

was (Author: markrmil...@gmail.com):

So what looks like a promising strategy? Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its place, but I don't know where yet.
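The three-step strategy floated in this comment could be expressed roughly as below. Only the strategy names ORD, ORD_VAL and ORD_DEM come from the comment; the class, the `choose` method, the assumption that segments are visited largest first, and above all the 50% cut-off for "fairly large" are invented for illustration. Queue size, which the edit says should also play a role, is deliberately left out.

```java
// Illustrative only: a guess at how the proposed switching rule might look, not the patch.
final class ComparatorPolicySketch {

    enum Strategy { ORD, ORD_VAL, ORD_DEM }

    // Hypothetical cut-off: a segment counts as "fairly large" if it holds at
    // least half as many documents as the largest segment.
    static final double LARGE_FRACTION = 0.5;

    // segmentIndex 0 is assumed to be the largest segment.
    static Strategy choose(int segmentIndex, int segmentSize, int largestSegmentSize) {
        if (segmentIndex == 0) {
            return Strategy.ORD; // plain ordinals, no fallback needed yet
        }
        if (segmentSize >= LARGE_FRACTION * largestSegmentSize) {
            return Strategy.ORD_VAL;
        }
        return Strategy.ORD_DEM;
    }

    public static void main(String[] args) {
        int[] sizes = {1000, 600, 10};
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(sizes[i] + " docs -> " + choose(i, sizes[i], sizes[0]));
        }
    }
}
```

Any real cut-off would have to come from benchmarks such as the attached sortBench.py runs; the constant here is a placeholder.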