[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1201#action_1201
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/23/09 9:13 AM:
--

bq. I was looking after the initial warmup, but noticed no difference. Maybe 
the string field I used was not distinct enough. What is a good number for a 
noticeable speed improvement (50% distinct terms?).

He's not saying after the warm up, but that the warm up should be faster based 
on that.

It's because of this:

The old way, if you had 5 segments with unique term distributions of 50,000, 
6000, 6000, 5, 5, we would try to load all 62,010 terms for every segment: 
5 x 62,010 = 310,050.

With the new way, we load 50,000 terms for the first, 6000 for the next, then 
6000, then 5 and 5: total of 62,010.

Even though most of the 50,000 won't be found in the 5 term segment, it still 
takes a long time to check them all. So the more unique terms and the more 
segments, the worse the problem got.
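
The arithmetic above can be checked with a small sketch. This is plain Java, not Lucene code; the class and method names are made up for illustration, and it assumes (as the comment does) that no terms are shared between segments, so the merged term list is simply the sum of the per-segment counts.

```java
import java.util.Arrays;

public class TermLoadCost {
    /** Old way: the full merged term list is checked against every segment. */
    public static long oldWay(int[] uniqueTermsPerSegment) {
        long allTerms = Arrays.stream(uniqueTermsPerSegment).asLongStream().sum();
        return allTerms * uniqueTermsPerSegment.length;
    }

    /** New way: each segment only enumerates its own terms. */
    public static long newWay(int[] uniqueTermsPerSegment) {
        return Arrays.stream(uniqueTermsPerSegment).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] segments = {50_000, 6_000, 6_000, 5, 5};
        System.out.println("old way term checks: " + oldWay(segments)); // 310050
        System.out.println("new way term checks: " + newWay(segments)); // 62010
    }
}
```

The old cost grows with (total terms) x (number of segments), which is why many small segments next to one large segment hurt so much.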

*edit*
little fix on those numbers

  was (Author: markrmil...@gmail.com):
bq. I was looking after the initial warmup, but noticed no difference. 
Maybe the string field I used was not distinct enough. What is a good number 
for a noticeable speed improve (50% distinct terms?).

He's not saying after the warm up, but that the warm up should be faster based 
on that.

It's because of this:

The old way, if you had 5 segments with unique term distributions of 50,000, 
6000, 6000, 5, 5, we would try to load all 50,000 terms for every segment: 
5 x 50,000 = 250,000.

With the new way, we load 50,000 terms for the first, 6000 for the next, then 
6000, then 5 and 5: total of 62,000.

Even though most of the 50,000 won't be found in the 5 term segment, it still 
takes a long time to check them all. So the more unique terms and the more 
segments, the worse the problem got.
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py


 This issue changes how an IndexSearcher searches over multiple segments. The 
 current method of searching multiple segments is to use a MultiSegmentReader 
 and treat all of the segments as one. This causes filters and FieldCaches to 
 be keyed to the MultiReader and makes reopen expensive. If only a few 
 segments change, the FieldCache is still loaded for all of them.
 This patch changes things by searching each individual segment one at a time, 
 but sharing the HitCollector used across each segment. This allows 
 FieldCaches and Filters to be keyed on individual SegmentReaders, making 
 reopen much cheaper. FieldCache loading over multiple segments can be much 
 faster as well - with the old method, all unique terms for every segment are 
 enumerated against each segment - because of the likely logarithmic change in 
 terms per segment, this can be very wasteful. Searching individual segments 
 avoids this cost. The term/document statistics from the multireader are used 
 to score results for each segment.
 When sorting, it's more difficult to use a single HitCollector for each sub 
 searcher. Ordinals are not comparable across segments. To account for this, a 
 new field sort enabled HitCollector is introduced that is able to collect and 
 sort across segments (because of its ability to compare ordinals across 
 segments). This TopFieldCollector class will collect the values/ordinals for 
 a given segment, and upon moving to the next segment, translate any 
 ordinals/values so that they can be compared against the values for the new 
 segment. This is done lazily.
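
 A minimal sketch of the ordinal translation idea, under heavy simplification. It is not the TopFieldCollector from the patch: the `Segment` class and method names are hypothetical, it only tracks the single smallest value rather than a full queue, and it translates eagerly at each segment switch rather than lazily. It shows how a value carried over from an earlier segment can be mapped to an ordinal threshold in the new segment with a binary search, so that documents in that segment are compared by int ordinal only.

```java
import java.util.Arrays;

public class OrdTranslationSketch {
    /** Hypothetical per-segment view: sorted unique terms, plus each document's ordinal into them. */
    static final class Segment {
        final String[] sortedTerms;
        final int[] docOrds;
        Segment(String[] sortedTerms, int[] docOrds) {
            this.sortedTerms = sortedTerms;
            this.docOrds = docOrds;
        }
    }

    /**
     * Maps a value carried over from an earlier segment to a threshold ordinal in the new
     * segment: a document beats the carried value exactly when its ordinal is below the threshold.
     */
    static int translate(String[] sortedTerms, String value) {
        int idx = Arrays.binarySearch(sortedTerms, value);
        return idx >= 0 ? idx : -idx - 1; // not found: fall back to the insertion point
    }

    /** Finds the smallest field value over all segments, comparing ordinals inside each segment. */
    static String smallestValue(Segment[] segments) {
        String best = null;
        for (Segment seg : segments) {
            // On moving to a new segment, re-express the current best as an ordinal there.
            int threshold = (best == null) ? Integer.MAX_VALUE : translate(seg.sortedTerms, best);
            for (int ord : seg.docOrds) {
                if (ord < threshold) { // int comparison only, no string compare per document
                    threshold = ord;
                    best = seg.sortedTerms[ord];
                }
            }
        }
        return best;
    }

    static String demo() {
        Segment[] segments = {
            new Segment(new String[] {"banana", "cherry"}, new int[] {1, 0}),
            new Segment(new String[] {"apple", "banana", "date"}, new int[] {2, 1, 0}),
            new Segment(new String[] {"avocado", "zebra"}, new int[] {0, 1}),
        };
        return smallestValue(segments);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // apple
    }
}
```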
 All in all, the switch seems to provide numerous performance benefits, in 
 both sorted and non-sorted search. We were seeing a good loss on indices with 
 lots of segments (1000?) and certain 

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666801#action_12666801
 ] 

thetaphi edited comment on LUCENE-1483 at 1/23/09 4:07 PM:


bq. So null -- I cannot be split into sub-readers; empty array -- I am a null 
reader; array.length > 0 -- I do have sequential sub-readers?

This is a good optimization. If a MultiReader returned null instead of an 
empty array, it wouldn't be a problem (the empty reader would be searched with 
no results). But returning an empty array is better in this case. So 
gatherSubReaders() should only check for (null) and then add the parent reader 
itself to the List, and in all other cases do the recursion.
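
A minimal sketch of that contract, assuming a stand-in `Reader` interface rather than Lucene's own IndexReader (the `Leaf` and `Composite` classes are hypothetical and exist only for the example). A null return means the reader is searched as-is, an empty array contributes nothing, and any other array is recursed into.

```java
import java.util.ArrayList;
import java.util.List;

public class GatherSubReaders {
    /** Hypothetical stand-in for a reader; not Lucene's IndexReader. */
    interface Reader {
        /** null: cannot be split; empty array: empty reader; otherwise: sequential sub-readers. */
        Reader[] getSequentialSubReaders();
    }

    static final class Leaf implements Reader {
        final String name;
        Leaf(String name) { this.name = name; }
        public Reader[] getSequentialSubReaders() { return null; }
        @Override public String toString() { return name; }
    }

    static final class Composite implements Reader {
        final Reader[] subs;
        Composite(Reader... subs) { this.subs = subs; }
        public Reader[] getSequentialSubReaders() { return subs; }
    }

    /** Only null is special-cased; every non-null array (including an empty one) is recursed into. */
    static void gatherSubReaders(List<Reader> out, Reader reader) {
        Reader[] subs = reader.getSequentialSubReaders();
        if (subs == null) {
            out.add(reader); // atomic reader: search it directly
        } else {
            for (Reader sub : subs) { // empty array: the loop body never runs
                gatherSubReaders(out, sub);
            }
        }
    }

    static List<String> demo() {
        Reader root = new Composite(
            new Leaf("seg0"),
            new Composite(), // empty reader contributes nothing
            new Composite(new Leaf("seg1"), new Leaf("seg2")));
        List<Reader> leaves = new ArrayList<>();
        gatherSubReaders(leaves, root);
        List<String> names = new ArrayList<>();
        for (Reader r : leaves) {
            names.add(r.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [seg0, seg1, seg2]
    }
}
```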

  was (Author: thetaphi):
bq. So null -- I cannot be split into sub-readers; empty array -- I am a 
null reader; array.length > 0 -- I do have sequential sub-readers?

This is a good optimization. If a MultiReader returned null instead of an 
empty array, it wouldn't be a problem (the empty reader would be searched with 
no results). But returning an empty array is better in this case. So 
gatherSubReaders() should only check for (null) and then add the parent reader 
itself to the List, and in all other cases add the array contents, maybe using 
List.addAll(Arrays.asList(array)) instead of the loop.
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, 
 sortCollate.py


 This issue changes how an IndexSearcher searches over multiple segments. The 
 current method of searching multiple segments is to use a MultiSegmentReader 
 and treat all of the segments as one. This causes filters and FieldCaches to 
 be keyed to the MultiReader and makes reopen expensive. If only a few 
 segments change, the FieldCache is still loaded for all of them.
 This patch changes things by searching each individual segment one at a time, 
 but sharing the HitCollector used across each segment. This allows 
 FieldCaches and Filters to be keyed on individual SegmentReaders, making 
 reopen much cheaper. FieldCache loading over multiple segments can be much 
 faster as well - with the old method, all unique terms for every segment are 
 enumerated against each segment - because of the likely logarithmic change in 
 terms per segment, this can be very wasteful. Searching individual segments 
 avoids this cost. The term/document statistics from the multireader are used 
 to score results for each segment.
 When sorting, it's more difficult to use a single HitCollector for each sub 
 searcher. Ordinals are not comparable across segments. To account for this, a 
 new field sort enabled HitCollector is introduced that is able to collect and 
 sort across segments (because of its ability to compare ordinals across 
 segments). This TopFieldCollector class will collect the values/ordinals for 
 a given segment, and upon moving to the next segment, translate any 
 ordinals/values so that they can be compared against the values for the new 
 segment. This is done lazily.
 All in all, the switch seems to provide numerous performance benefits, in 
 both sorted and non-sorted search. We were seeing a good loss on indices with 
 lots of segments (1000?) and certain queue sizes / queries, but the latest 
 results seem to show that's been mostly taken care of (you shouldn't be using 
 such a large queue on such a segmented index anyway).
 * Introduces
 ** MultiReaderHitCollector - a HitCollector that can collect across multiple 
 IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders.
 ** TopFieldCollector - a HitCollector that can compare values/ordinals across 
 IndexReaders and sort on fields.
 ** FieldValueHitQueue - a Priority queue that is part of the 
 TopFieldCollector implementation.
 ** FieldComparator - a new 

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-22 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666163#action_12666163
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/22/09 9:58 AM:
--

Nice work Mike - pretty polished. I've spent a little time looking it over, but 
I'm going to look more tonight. Everything is looking pretty good to me.

Not sure what to name that new class, but here are some ideas:

TopScoreDocCollector
TopHitCollector
TopResultCollector
TopMatchCollector
TopCollector
TopScoreCollector

Could be a low score, so that last one is odd, but I guess the low would kind 
of be the top...
*edit*
Never mind... I was thinking the lowest score could be considered the top match, 
but that wouldn't be the case with this HitCollector implementation, so I guess 
it makes as much sense as any of the others.



  was (Author: markrmil...@gmail.com):
Nice work Mike - pretty polished. I've spent a little time looking it over, 
but I'm going to look more tonight. Everything is looking pretty good to me.

Not sure what to name that new class, but here are some ideas:

TopScoreDocCollector
TopHitCollector
TopResultCollector
TopMatchCollector
TopCollector
TopScoreCollector

Could be a low score, so that last one is odd, but I guess the low would kind 
of be the top...


  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, sortBench.py, sortCollate.py


 This issue changes how an IndexSearcher searches over multiple segments. The 
 current method of searching multiple segments is to use a MultiSegmentReader 
 and treat all of the segments as one. This causes filters and FieldCaches to 
 be keyed to the MultiReader and makes reopen expensive. If only a few 
 segments change, the FieldCache is still loaded for all of them.
 This patch changes things by searching each individual segment one at a time, 
 but sharing the HitCollector used across each segment. This allows 
 FieldCaches and Filters to be keyed on individual SegmentReaders, making 
 reopen much cheaper. FieldCache loading over multiple segments can be much 
 faster as well - with the old method, all unique terms for every segment are 
 enumerated against each segment - because of the likely logarithmic change in 
 terms per segment, this can be very wasteful. Searching individual segments 
 avoids this cost. The term/document statistics from the multireader are used 
 to score results for each segment.
 When sorting, it's more difficult to use a single HitCollector for each sub 
 searcher. Ordinals are not comparable across segments. To account for this, a 
 new field sort enabled HitCollector is introduced that is able to collect and 
 sort across segments (because of its ability to compare ordinals across 
 segments). This TopFieldCollector class will collect the values/ordinals for 
 a given segment, and upon moving to the next segment, translate any 
 ordinals/values so that they can be compared against the values for the new 
 segment. This is done lazily.
 All in all, the switch seems to provide numerous performance benefits, in 
 both sorted and non-sorted search. We were seeing a good loss on indices with 
 lots of segments (1000?) and certain queue sizes / queries, but the latest 
 results seem to show that's been mostly taken care of (you shouldn't be using 
 such a large queue on such a segmented index anyway).
 * Introduces
 ** MultiReaderHitCollector - a HitCollector that can collect across multiple 
 IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders.
 ** TopFieldCollector - a HitCollector that can compare values/ordinals across 
 IndexReaders and sort on fields.
 ** FieldValueHitQueue - a Priority queue that is part of the 
 TopFieldCollector implementation.
 ** FieldComparator - a new Comparator class that works across IndexReaders. 
 Part of the 

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664984#action_12664984
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 2:19 PM:
--

My previous results had a few oddities going with them (I was loosely playing 
around). Being a little more careful, here is an example of the difference, and 
the hotspots. Timings are probably not completely comparable, as my computer 
couldn't keep up profiling the second version very well - it's much slower 
without profiling as well, though:

Index is 60 docs, 46 segments, 63849 unique terms.

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|


*edit*
wrong values





  was (Author: markrmil...@gmail.com):
My previous results had a few oddities going with them (I was loosely 
playing around). Being a little more careful, here is an example of the 
difference, and the hotspots. Timings are probably not completely comparable, as 
my computer couldn't keep up profiling the second version very well - it's much 
slower without profiling as well, though:

Index is 60 docs, 46 segments, 63849 unique terms.

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|


Unique terms per segment:
21312,41837,41843,41849,41854,41860,41865,41870,41878,41883,41888,41894,41902,41906,41910,41912,41916,41921,41924
41930,41932,41936,41943,41947,41951,41956,41960,41964,41970,41974,41979,41982,41989,41994,41999,42002,42005
42007,42011,42016,42020,42026,42033,42039,42044,42046




  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12664984#action_12664984
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 8:20 PM:
--

My previous results had a few oddities going with them (I was loosely playing 
around). Being a little more careful, here is an example of the difference, and 
the hotspots. Timings are probably not completely comparable, as my computer 
couldn't keep up profiling the second version very well - it's much slower 
without profiling as well, though:

Index is 60 docs, 46 segments

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|


*edit*
wrong values





  was (Author: markrmil...@gmail.com):
My previous results had a few oddities going with them (I was loosely 
playing around). Being a little more careful, here is an example of the 
difference, and the hotspots. Timings are probably not completely comparable, as 
my computer couldn't keep up profiling the second version very well - it's much 
slower without profiling as well, though:

Index is 60 docs, 46 segments, 63849 unique terms.

Load the fieldcache on one multireader

||method||time||invocations||
|FieldCacheImpl.createValue|156536(98%)|1|
|MultiTermDocs.next()|148499(93.5%)|621803|
|MultiTermDocs(int)|140397(88.4%)|1002938|
|SegmentTermDocs.seek(Term)|138332(87.1%)|1002938|

Load the fieldcache on each sub reader of the multireader, one at a time

||method||time||invocations||
|FieldCacheImpl.createValue|7815(80.4%)|46|
|SegmentTermDocs.next()|3315(34.1%)|642046|
|SegmentTermEnum.next()|1936(19.9%)|42046|
|SegmentTermDocs.seek(TermEnum)|874(9%)|42046|


*edit*
wrong values




  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663607#action_12663607
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/13/09 7:03 PM:
--

Disregarding any missing gains with those simple policies, the rest of those 
numbers actually look pretty good! Still some problems here and there (large 
queue size still sticky), but overall some solid gains as well.

orddem seems to be best in most cases currently - maybe we can tweak that a 
little more somehow. Where it's not better, or not much worse, is with a single 
segment. That result is interesting, because both policies beat it nicely, and 
it's because they simply use straight ord on the first segment. But ordsubord 
seems to outperform the policies. That doesn't make sense. It's largely the 
same, but should be a tad slower if anything. Other results match up so nicely, 
it seems like it might not be noise, in which case, weird.

  was (Author: markrmil...@gmail.com):
Disregarding any any with those simple policies, the rest of those numbers 
actually look pretty good! Still some problems here and there (large queue size 
still sticky), but overall some solid gains as well.

orddem seems to be best in most cases currently - maybe we can tweak that a 
little more somehow. Where it's not better, or not much worse, is with a single 
segment. That result is interesting, because both policies beat it nicely, and 
it's because they simply use straight ord on the first segment. But ordsubord 
seems to outperform the policies. That doesn't make sense. It's largely the 
same, but should be a tad slower if anything. Other results match up so nicely, 
it seems like it might not be noise, in which case, weird.
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662038#action_12662038
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/8/09 9:15 AM:
-

It's the ORDSUBORD again (which I don't think we will use) and the two Policies. 
Odd because it's the last hit of 10 that fails for all 3. I'll ferret it out 
tonight.

- Mark

*EDIT*

Yup... always the last entry that's wrong no matter the queue size - for all 3, 
which is odd because ORD_SUBORD doesn't have too much of a relationship to the 
two policies. Will be a fun one.

  was (Author: markrmil...@gmail.com):
It's the ORDSUBORD again (which I don't think we will use) and the two 
Policies. Odd because it's the last hit of 10 that fails for all 3. I'll ferret 
it out tonight.

- Mark
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483-partial.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12661160#action_12661160
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/6/09 6:57 AM:
-

bq. Mark, I see 3 testcase failures in TestSort if I pretend that 
SortField.STRING means STRING_ORD - do you see that?

Yeah, sorry. That STRING_ORD custom comparator policy is just a joke really, so 
I only really tested it on the StringSort test. It's just not initing the ords 
along with the values on switching. Making ords package private so that it can 
be changed (and changing it) fixes things. Not sure about new constructors or 
package private for that part of the switch...

bq. I think we should fix TestSort so that it runs N times, each time using a 
different STRING sort method, to make sure we are covering all these methods?

Yeah, this makes sense in any case. I just keep switching them by hand as I 
work on them.

  was (Author: markrmil...@gmail.com):
bq. Mark, I see 3 testcase failures in TestSort if I pretend that 
SortField.STRING means STRING_ORD - do you see that?

Yeah, sorry. That STRING_ORD custom comparator is just a joke really, so I only 
really tested it on the StringSort test. It's just not initing the ords along 
with the values on switching. Making ords package private so that it can be 
changed (and changing it) fixes things. Not sure about new constructors or 
package private for that part of the switch...

bq. I think we should fix TestSort so that it runs N times, each time using a 
different STRING sort method, to make sure we are covering all these methods?

Yeah, this makes sense in any case. I just keep switching them by hand as I 
work on them.
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-02 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12660322#action_12660322
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 1/2/09 6:24 AM:
-

So what looks like a promising strategy?

Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its 
place, but I don't know where yet.

*edit*

Oh, yeah, queue size should also play a role in the switching 
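
A rough sketch of what such a switching policy could look like. Everything here is hypothetical: the class, the `Strategy` enum and especially the 25% cut-off for "fairly large" are made up for illustration and are not taken from any patch. It assumes segments are visited largest first. Queue size is left out because it is not yet clear how it should influence the choice.

```java
public class ComparatorPolicySketch {
    /** The three strategies named in the comment above. */
    enum Strategy { ORD, ORD_VAL, ORD_DEM }

    /** Hypothetical cut-off: a segment at least this fraction of the largest counts as "fairly large". */
    static final double FAIRLY_LARGE_FRACTION = 0.25;

    /**
     * Picks a strategy for one segment.
     * segmentIndex is the position in visiting order (0 = largest segment, visited first).
     */
    static Strategy choose(int segmentIndex, int segmentDocs, int largestSegmentDocs) {
        if (segmentIndex == 0) {
            return Strategy.ORD; // plain ordinals, no fallback, on the largest segment
        }
        if (segmentDocs >= FAIRLY_LARGE_FRACTION * largestSegmentDocs) {
            return Strategy.ORD_VAL; // next segments are still fairly large
        }
        return Strategy.ORD_DEM; // segments have become somewhat smaller
    }

    public static void main(String[] args) {
        int[] segmentDocs = {100_000, 40_000, 5_000, 200}; // assumed sorted largest first
        for (int i = 0; i < segmentDocs.length; i++) {
            System.out.println(segmentDocs[i] + " docs: " + choose(i, segmentDocs[i], segmentDocs[0]));
        }
    }
}
```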

  was (Author: markrmil...@gmail.com):
So what looks like a promising strategy?

Off the top I am thinking something as simple as:

start with ORD with no fallback on the largest.
if the next segments are fairly large, use ORD_VAL
if the segments get somewhat smaller, move to ORD_DEM

Oddly, I've seen VAL perform well in certain situations, so maybe it has its 
place, but I don't know where yet.
  
 Change IndexSearcher multisegment searches to search each individual segment 
 using a single HitCollector
 

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.
