[jira] [Commented] (LUCENE-5488) FilteredQuery.explain does not honor FilterStrategy

2014-03-07 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924432#comment-13924432
 ] 

Lei Wang commented on LUCENE-5488:
--

{noformat}
+if (result == null) {
+  result = new Explanation
+  (0.0f, "failure to match filter: " + f.toString());
{noformat}

If inner == null, result will be null here. We should not rewrite that to 
"failure to ..."; we should continue to return null.
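
To illustrate the point about null, here is a minimal Java sketch. It is not the patch itself: SketchExplanation and FilteredExplainSketch are hypothetical stand-ins for Lucene's Explanation and the explain logic. It only shows that a null inner result stays null, and that the "failure to match filter" text is produced only when the filter really rejects the document.

```java
// Hypothetical stand-in for Lucene's Explanation; only what the sketch needs.
final class SketchExplanation {
    final float value;
    final String description;

    SketchExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }
}

// Hypothetical helper, not part of Lucene or of the attached patch.
final class FilteredExplainSketch {
    static SketchExplanation explain(SketchExplanation inner,
                                     boolean filterMatches,
                                     String filterName) {
        if (inner == null) {
            // Nothing to explain: keep returning null instead of
            // inventing a "failure to match filter" explanation.
            return null;
        }
        if (filterMatches) {
            return inner;
        }
        // Only a real filter rejection gets the failure explanation.
        return new SketchExplanation(0.0f, "failure to match filter: " + filterName);
    }
}
```

With this shape, callers can still tell "no explanation available" (null) apart from "the filter rejected the document" (value 0.0).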

> FilteredQuery.explain does not honor FilterStrategy
> ---
>
> Key: LUCENE-5488
> URL: https://issues.apache.org/jira/browse/LUCENE-5488
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.6.1
>Reporter: John Wang
>Assignee: Michael Busch
> Attachments: LUCENE-5488.patch, LUCENE-5488.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator() 
> implementation, such as o.a.l.facet.range.Range.getFilter(). This is done with 
> the intention that they be used in conjunction with FilteredQuery with 
> FilterStrategy set to QUERY_FIRST_FILTER_STRATEGY for performance reasons.
> However, this behavior is not honored by FilteredQuery.explain, where 
> docidset.iterator is called regardless, causing such valid usages of the above 
> filter types to fail.
> The fix is to check bits() first and fall back to iterator() if bits is 
> null; if the iterator is missing as well, the input Filter is indeed bad.
> See attached unit test, which fails without this patch.
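
The "check bits() first, then fall back to the iterator" rule described above can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: SketchBits, SketchDocIdSet and BitsFirstSketch are hypothetical, simplified stand-ins for Lucene's Bits, DocIdSet and the explain-time matching check, and the iterator is modelled as a sorted stream of doc ids.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Hypothetical, simplified stand-ins for Lucene's Bits and DocIdSet.
interface SketchBits {
    boolean get(int doc);
}

interface SketchDocIdSet {
    SketchBits bits();                   // may return null
    PrimitiveIterator.OfInt iterator();  // may be unsupported
}

final class BitsFirstSketch {
    // Random access first; the iterator is consulted only when bits() is null.
    static boolean matches(SketchDocIdSet set, int doc) {
        SketchBits bits = set.bits();
        if (bits != null) {
            return bits.get(doc);
        }
        PrimitiveIterator.OfInt it = set.iterator();
        if (it == null) {
            return false;
        }
        while (it.hasNext()) {
            int current = it.nextInt();
            if (current == doc) {
                return true;
            }
            if (current > doc) {
                return false; // ids are assumed to arrive in increasing order
            }
        }
        return false;
    }

    // A set that only supports random access, like the filters in the report.
    static SketchDocIdSet bitsOnly(BitSet backing) {
        return new SketchDocIdSet() {
            public SketchBits bits() {
                return backing::get;
            }
            public PrimitiveIterator.OfInt iterator() {
                throw new UnsupportedOperationException("no iterator");
            }
        };
    }

    // A set that only supports iteration over sorted doc ids.
    static SketchDocIdSet iteratorOnly(int... sortedDocs) {
        return new SketchDocIdSet() {
            public SketchBits bits() {
                return null;
            }
            public PrimitiveIterator.OfInt iterator() {
                return Arrays.stream(sortedDocs).iterator();
            }
        };
    }
}
```

A bits-only set never has its iterator touched, which is the case that currently fails in explain.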



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented

2014-03-07 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924224#comment-13924224
 ] 

Lei Wang commented on LUCENE-5495:
--

Please drop my FixedBitSet or OpenBitSet suggestion. For FixedBitSet or 
OpenBitSet, which way is faster depends on the use case, I think. Leaving 
them unmerged should be OK.

> Boolean Filter does not handle FilterClauses with only bits() implemented
> -
>
> Key: LUCENE-5495
> URL: https://issues.apache.org/jira/browse/LUCENE-5495
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.6.1
>Reporter: John Wang
> Attachments: LUCENE-5495.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator() 
> implementation, such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because 
> BooleanFilter expects the Filters in all FilterClauses to have iterator() 
> implemented.
> This patch improves the behavior by taking Filters with bits() implemented 
> and treating them separately.
> This would be faster for Filters with a forward index as 
> the underlying data structure, where there would be no need to scan the index 
> to build an iterator.
> See attached unit test, which fails without this patch.






[jira] [Comment Edited] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented

2014-03-07 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940
 ] 

Lei Wang edited comment on LUCENE-5495 at 3/7/14 7:12 PM:
--

 {noformat}
+  public BitsDocIdSet(Bits bits, int length) {
+this.bits = bits;
+this.length = length;
+  }
 {noformat}

We can assert that bits is not a DocIdSet here. If it is, wrapping it only adds overhead.

 {noformat}
+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();
 {noformat}

May need a SHOULD list also?

 {noformat}
+if (bits != null) {
+  mustNotBitsList.add(bits);
+}
 {noformat}

If bits is already a FixedBitSet or OpenBitSet, merging them into res might be 
faster? Same for the other lists. (Not necessary, please drop this one.)
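
On the question of a SHOULD list: a rough sketch of how three lists of random-access clauses could be combined per document. Everything here is an assumption for illustration. BitsClauseSketch is a hypothetical name, IntPredicate stands in for Lucene's Bits, and the rule that at least one SHOULD clause must match whenever any SHOULD clause is present is my reading of the usual boolean semantics, not something taken from the patch.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper; IntPredicate stands in for Lucene's Bits.
final class BitsClauseSketch {
    static boolean accept(int doc,
                          List<IntPredicate> must,
                          List<IntPredicate> mustNot,
                          List<IntPredicate> should) {
        // Any MUST_NOT hit rejects the document.
        for (IntPredicate clause : mustNot) {
            if (clause.test(doc)) {
                return false;
            }
        }
        // Every MUST clause has to accept the document.
        for (IntPredicate clause : must) {
            if (!clause.test(doc)) {
                return false;
            }
        }
        // Assumption: with SHOULD clauses present, at least one must match.
        if (should.isEmpty()) {
            return true;
        }
        for (IntPredicate clause : should) {
            if (clause.test(doc)) {
                return true;
            }
        }
        return false;
    }
}
```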




was (Author: wonlay):
 {noformat}
+  public BitsDocIdSet(Bits bits, int length) {
+this.bits = bits;
+this.length = length;
+  }
 {noformat}

We can assert bits is not a DocIdSet here. if it is, this adds overhead only.

 {noformat}
+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();
 {noformat}

May need a SHOULD list also?

 {noformat}
+if (bits != null) {
+  mustNotBitsList.add(bits);
+}
 {noformat}

if bits is already a FixedBitSet or OpenBitSet, merge them into res might be 
faster? same for other lists






[jira] [Comment Edited] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented

2014-03-06 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940
 ] 

Lei Wang edited comment on LUCENE-5495 at 3/6/14 7:35 PM:
--

 {noformat}
+  public BitsDocIdSet(Bits bits, int length) {
+this.bits = bits;
+this.length = length;
+  }
 {noformat}

We can assert bits is not a DocIdSet here. if it is, this adds overhead only.

 {noformat}
+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();
 {noformat}

May need a SHOULD list also?

 {noformat}
+if (bits != null) {
+  mustNotBitsList.add(bits);
+}
 {noformat}

If bits is already a FixedBitSet or OpenBitSet, merging them into res might be 
faster? Same for the other lists.




was (Author: wonlay):
+  public BitsDocIdSet(Bits bits, int length) {
+this.bits = bits;
+this.length = length;
+  }

We can assert bits is not a DocIdSet here. if it is, this adds overhead only.


+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();

May need a SHOULD list also?


+if (bits != null) {
+  mustNotBitsList.add(bits);
+}

if bits is already a FixedBitSet or OpenBitSet, merge them into res might be 
faster? same for other lists






[jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented

2014-03-06 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940
 ] 

Lei Wang commented on LUCENE-5495:
--

+  public BitsDocIdSet(Bits bits, int length) {
+this.bits = bits;
+this.length = length;
+  }

We can assert bits is not a DocIdSet here. if it is, this adds overhead only.


+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();

May need a SHOULD list also?


+if (bits != null) {
+  mustNotBitsList.add(bits);
+}

If bits is already a FixedBitSet or OpenBitSet, merging them into res might be 
faster? Same for the other lists.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-09 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896217#comment-13896217
 ] 

Lei Wang commented on LUCENE-5425:
--

oh cool! that patch looks better!

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
>Assignee: Shai Erera
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5425.patch, facetscollector.patch, 
> facetscollector.patch, fixbitset.patch, openbitsetiter.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-09 Thread Lei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Wang updated LUCENE-5425:
-

Attachment: openbitsetiter.patch

Faster OpenBitSetIterator




[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-09 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896149#comment-13896149
 ] 

Lei Wang commented on LUCENE-5425:
--

Looks like we should be able to replace the impl of OpenBitSetIterator with the 
one Shai added in FixedBitSet. The only missing data is numBits, but 
it's optional. At the same time, any code that uses OpenBitSetIterator will 
benefit from this.

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
>Assignee: Shai Erera
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5425.patch, facetscollector.patch, 
> facetscollector.patch, fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-05 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892435#comment-13892435
 ] 

Lei Wang commented on LUCENE-5425:
--

Thanks!

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
> Attachments: LUCENE-5425.patch, facetscollector.patch, 
> facetscollector.patch, fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-05 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892338#comment-13892338
 ] 

Lei Wang commented on LUCENE-5425:
--

Yes! With this patch, we can write our own FacetCollector and customize it.

Only one small suggestion: createHitsSet is marked protected, but the 
class itself is final, so no subclass can override it short of creating a new 
FacetCollector. Can we remove the final modifier from the class and add final 
to the methods we don't want users to override?
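
To make the suggestion concrete, here is a small Java sketch of the pattern: a non-final class whose fixed behaviour is locked down with final methods, leaving only the allocation hook open. CollectorSketch and ReusingCollectorSketch are hypothetical names, java.util.BitSet stands in for FixedBitSet, and only createHitsSet comes from the patch under discussion.

```java
import java.util.BitSet;

// Hypothetical, simplified collector: the class is open for extension,
// but everything except the allocation hook is final.
class CollectorSketch {
    private BitSet hits;

    // The one extension point: how the per-reader hit set is created.
    protected BitSet createHitsSet(int maxDoc) {
        return new BitSet(maxDoc);
    }

    public final void setNextReader(int maxDoc) {
        hits = createHitsSet(maxDoc);
    }

    public final void collect(int doc) {
        hits.set(doc);
    }

    public final BitSet hits() {
        return hits;
    }
}

// A subclass that reuses one bit set instead of allocating per reader.
class ReusingCollectorSketch extends CollectorSketch {
    private final BitSet reusable = new BitSet();

    @Override
    protected BitSet createHitsSet(int maxDoc) {
        reusable.clear(); // reuse the memory, but start from an empty set
        return reusable;
    }
}
```

The subclass changes only the allocation policy; collect and setNextReader cannot be overridden by accident.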




[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-04 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891044#comment-13891044
 ] 

Lei Wang commented on LUCENE-5425:
--

The overhead may not come from the additional method call; it might be the 
OpenBitSet. The impl in FixedBitSet is friendlier to branch prediction and 
instruction prefetching.

But I think we can simply apply the reuse part first. I will run some 
more tests later when I get time. I now have an idea for applying the 
optimization in general, which I previously thought might only apply in our 
case. I will test that out as well, to see if we can improve this overall.

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
> Attachments: facetscollector.patch, facetscollector.patch, 
> fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-02-03 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890040#comment-13890040
 ] 

Lei Wang commented on LUCENE-5425:
--

I tried luceneutil but ran into a problem: I cannot get the same numbers for 
two identical checkouts of trunk. Even when both sides are trunk, I get 
different numbers:
Report after iter 19:
            Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
       OrHighMed         74.15   (7.1%)                    71.24   (8.3%)   -3.9% ( -18% -  12%)
         LowTerm        515.68  (15.1%)                   496.20  (12.3%)   -3.8% ( -27% -  27%)
    OrNotHighLow         72.22   (8.2%)                    70.36   (7.6%)   -2.6% ( -17% -  14%)
    OrNotHighMed         79.01   (7.3%)                    77.43   (8.4%)   -2.0% ( -16% -  14%)
   OrHighNotHigh         38.66   (4.5%)                    37.90   (6.4%)   -2.0% ( -12% -   9%)
         Respell         51.21   (7.1%)                    50.23   (6.5%)   -1.9% ( -14% -  12%)
       MedPhrase         69.67   (7.5%)                    68.35   (7.4%)   -1.9% ( -15% -  14%)
       OrHighLow         67.24   (7.8%)                    66.00   (9.0%)   -1.8% ( -17% -  16%)
          Fuzzy1         27.37   (5.7%)                    26.96   (5.5%)   -1.5% ( -11% -  10%)
          Fuzzy2         37.21   (3.8%)                    36.71   (5.6%)   -1.3% ( -10% -   8%)
 MedSloppyPhrase          9.94   (5.4%)                     9.83   (3.9%)   -1.1% (  -9% -   8%)
     LowSpanNear          8.60   (3.9%)                     8.54   (3.8%)   -0.7% (  -8% -   7%)
     AndHighHigh         40.23   (3.1%)                    40.03   (2.5%)   -0.5% (  -5% -   5%)
        HighTerm         76.07   (9.0%)                    75.96   (9.1%)   -0.2% ( -16% -  19%)
      OrHighHigh         11.62   (3.0%)                    11.62   (4.8%)   -0.1% (  -7% -   7%)
          IntNRQ          9.51   (3.9%)                     9.51   (8.3%)    0.0% ( -11% -  12%)
      HighPhrase         25.61   (7.0%)                    25.63   (7.7%)    0.1% ( -13% -  15%)
 LowSloppyPhrase         30.21   (5.2%)                    30.24   (4.3%)    0.1% (  -8% -  10%)
        PKLookup        212.03   (9.0%)                   212.25  (11.5%)    0.1% ( -18% -  22%)
   OrNotHighHigh         27.75   (3.5%)                    27.80   (6.5%)    0.2% (  -9% -  10%)
    OrHighNotMed         58.14   (5.9%)                    58.27   (8.3%)    0.2% ( -13% -  15%)
     MedSpanNear         22.73   (3.7%)                    22.80   (5.1%)    0.3% (  -8% -   9%)
        Wildcard         42.84   (5.0%)                    42.97   (5.4%)    0.3% (  -9% -  11%)
HighSloppyPhrase         23.99   (7.4%)                    24.08   (6.3%)    0.4% ( -12% -  15%)
      AndHighLow        625.62   (6.6%)                   629.52  (10.5%)    0.6% ( -15% -  18%)
         Prefix3         77.68   (7.2%)                    78.17   (6.2%)    0.6% ( -11% -  15%)
       LowPhrase         14.58   (4.7%)                    14.77   (5.0%)    1.3% (  -8% -  11%)
    HighSpanNear         11.84   (4.3%)                    11.99   (5.2%)    1.3% (  -7% -  11%)
    OrHighNotLow         66.04   (8.4%)                    67.28   (9.2%)    1.9% ( -14% -  21%)
      AndHighMed         66.55   (4.3%)                    67.91   (6.2%)    2.1% (  -8% -  13%)
         MedTerm        139.78   (9.5%)                   145.63  (10.3%)    4.2% ( -14% -  26%)

With the patch, the numbers also differ, but by no more than in the 
trunk-vs-trunk run:
Report after iter 19:
            Task  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff
      AndHighLow        730.30  (11.5%)                   700.95  (10.6%)   -4.0% ( -23% -  20%)
         LowTerm        520.94  (10.6%)                   504.25  (11.4%)   -3.2% ( -22% -  21%)
          Fuzzy1         57.55   (5.1%)                    56.26   (4.8%)   -2.2% ( -11% -   8%)
         Respell         35.85   (4.7%)                    35.18   (4.1%)   -1.9% ( -10% -   7%)
   OrHighNotHigh         37.77   (7.3%)                    37.19   (5.9%)   -1.5% ( -13% -  12%)
HighSloppyPhrase         12.30   (7.5%)                    12.17   (7.7%)   -1.1% ( -15% -  15%)
      HighPhrase         29.38   (5.2%)                    29.06   (4.3%)   -1.1% ( -10% -   8%)
    OrNotHighMed         25.93   (6.2%)                    25.68   (5.5%)   -1.0% ( -11% -  11%)
   OrNotHighHigh         19.72   (5.9%)                    19.53   (4.9%)   -0.9% ( -11% -  10%)
          Fuzzy2         11.30   (3.6%)                    11.24   (5.1%)   -0.6% (  -8% -   8%)
        PKLookup        218.16   (8.6%)                   217.53   (9.3%)   -0.3% ( -16% -  19%)
 LowSloppyPhrase         43.09   (5.6%)                    43.00   (3.5%)   -0.2% (  -8% -   9%)

[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-01-31 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888464#comment-13888464
 ] 

Lei Wang commented on LUCENE-5425:
--

Also, the idea of keeping the DocIdSet a raw DocIdSet is quite good. For the 
collecting part, we can have a wrapper that returns a DocIdSet after 
collecting is done.
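
A minimal sketch of what such a collecting wrapper could look like, purely as an illustration. DocIdCollectingSketch is a hypothetical name, and a plain int[] snapshot stands in for the DocIdSet that would be handed back; the real interface would be whatever the patch settles on.

```java
import java.util.Arrays;

// Hypothetical wrapper: mutable while collecting, then hands back a
// read-only snapshot (an int[] here, standing in for a DocIdSet).
final class DocIdCollectingSketch {
    private int[] docs;
    private int size;

    DocIdCollectingSketch(int initialCapacity) {
        docs = new int[Math.max(1, initialCapacity)];
    }

    // Called once per matching document during collection.
    void collect(int doc) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2); // grow on demand
        }
        docs[size++] = doc;
    }

    // Called after collection is done; later collect() calls do not
    // affect a snapshot that was already returned.
    int[] build() {
        return Arrays.copyOf(docs, size);
    }
}
```

Consumers only ever see the result of build(), so the collecting side stays free to use whatever structure is cheapest to fill.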

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
> Attachments: facetscollector.patch, facetscollector.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-01-31 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888463#comment-13888463
 ] 

Lei Wang commented on LUCENE-5425:
--

I'm not exactly sure how to run the facets benchmarks. I ran ant 
run-task -Dtask.alg=conf/facets.alg and changed "SearchSameRdr" Search > : 40 
to 4 to get stable results.

I'm also not sure how to read the results, but the numbers look quite 
similar between trunk and the DocIdSet version on my box.

trunk:
 [java] > Report sum by Prefix (Search) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] SearchSameRdr_4       0    true       1           4   7,155.64        5.59  31,610,096   51,212,288
 [java] SearchSameRdr_4       1   false       1           4   8,814.46        4.54  33,534,008   49,209,344
 [java] SearchSameRdr_4       2    true       1           4   9,088.84        4.40  35,673,136   48,373,760
 [java] SearchSameRdr_4       3   false       1           4   9,045.68        4.42  35,279,544   47,661,056

 [java] > Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] Populate              0    true       1       21578   2,489.96        8.67  31,369,696   51,212,288
 [java] Populate              1   false       1       21578   3,973.85        5.43  33,272,104   49,209,344
 [java] Populate              2    true       1       21578   4,216.92        5.12  32,701,392   48,373,760
 [java] Populate              3   false       1       21578   4,366.25        4.94  35,064,408   47,661,056

 [java] > Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] MAddDocs_Exhaust      0    true       1       21578   3,469.13        6.22  24,536,720   51,212,288
 [java] MAddDocs_Exhaust      1   false       1       21578   4,845.72        4.45  34,857,920   49,209,344
 [java] MAddDocs_Exhaust      2    true       1       21578   5,129.07        4.21  29,209,256   48,373,760
 [java] MAddDocs_Exhaust      3   false       1       21578   5,259.08        4.10  25,845,424   47,661,056


With the patch, except that I changed the OpenBitSet to a FixedBitSet and used 
bits.iterator() to return the iterator (it is still an OpenBitSetIterator), the 
results are:
 [java] > Report sum by Prefix (Search) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] SearchSameRdr_4       0    true       1           4   7,280.67        5.49  25,424,104   51,113,984
 [java] SearchSameRdr_4       1   false       1           4   8,689.98        4.60  31,356,960   49,053,696
 [java] SearchSameRdr_4       2    true       1           4   9,157.51        4.37  38,849,248   47,632,384
 [java] SearchSameRdr_4       3   false       1           4   9,097.11        4.40  39,840,912   46,465,024

 [java] > Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] Populate              0    true       1       21578   2,465.21        8.75  25,187,152   51,113,984
 [java] Populate              1   false       1       21578   2,651.19        8.14  30,985,904   49,053,696
 [java] Populate              2    true       1       21578   4,247.64        5.08  38,656,320   47,632,384
 [java] Populate              3   false       1       21578   4,298.41        5.02  39,355,912   46,465,024

 [java] > Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
 [java] Operation         round  facets  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
 [java] MAddDocs_Exhaust      0    true       1       21578   3,404.01        6.34  34,015,968   51,113,984
 [java] MAddDocs_Exhaust      1   false       1       21578   3,062.88        7.05  30,420,848   49,053,696
 [java] MAddDocs_Exhaust      2    true       1       21578   5,147.42        4.19  28,833,976   47,632,384
 [java] MAddDocs_Exhaust      3   false       1       21578   5,129.07        4.21  37,117,288   46,465,024


> Make creation of FixedBitSet in FacetsCollector overridable
> 

[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-01-30 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887506#comment-13887506
 ] 

Lei Wang commented on LUCENE-5425:
--

Agreed, it can be done in a separate issue. I didn't know a wrapper would affect 
performance that much; it's just an additional method call to me. The default 
OpenBitSetIterator impl is different from nextSetBit; maybe that's the reason? 
Anyway, starting with this change won't hurt anything, and caching the bitset 
should cut about 20% of the overhead of allocating a new bitset each time (the 
other 80% is from the memset). After this gets in, I can run a separate test 
on the DocIdSet to see if we can get acceptable performance for the default 
behavior without reusing the memory.
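
For reference, "caching the bitset" could look roughly like the sketch below. It is an assumption-laden illustration, not the patch: ReusableBitsSketch is a hypothetical name and a raw long[] stands in for the bitset's backing words. It also shows why caching only removes the allocation part of the cost: the reused words still have to be cleared, which is the memset part mentioned above.

```java
import java.util.Arrays;

// Hypothetical cache for a bitset's backing words.
final class ReusableBitsSketch {
    private long[] words = new long[0];

    // Returns zeroed words large enough for maxDoc bits.
    long[] acquire(int maxDoc) {
        int needed = (maxDoc + 63) >>> 6;
        if (words.length < needed) {
            words = new long[needed];          // fresh arrays are already zeroed
        } else {
            Arrays.fill(words, 0, needed, 0L); // the remaining "memset" cost
        }
        return words;
    }

    static void set(long[] words, int doc) {
        words[doc >>> 6] |= 1L << (doc & 63);
    }

    static boolean get(long[] words, int doc) {
        return (words[doc >>> 6] & (1L << (doc & 63))) != 0;
    }
}
```

The same array comes back on every acquire of the same size, so no garbage is produced per query, but each call still pays for the clear.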

> Make creation of FixedBitSet in FacetsCollector overridable
> ---
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
> Attachments: facetscollector.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> current behavior.






[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable

2014-01-30 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887340#comment-13887340
 ] 

Lei Wang commented on LUCENE-5425:
--

It is better not to depend on the bitset but on a more general interface. 
If a user's application can get rid of the memset part of that data 
structure, as we did at Twitter, it can get a 4X+ performance improvement 
over simply caching the bitset.




[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable

2014-01-30 Thread Lei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887337#comment-13887337
 ] 

Lei Wang commented on LUCENE-5426:
--

Looks like DefaultSortedSetDocsValuesReaderState.java is missing from the 
patch. Did you forget to attach it?

> Make SortedSetDocValuesReaderState customizable
> ---
>
> Key: LUCENE-5426
> URL: https://issues.apache.org/jira/browse/LUCENE-5426
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 4.6
>Reporter: John Wang
> Attachments: sortedsetreaderstate.patch
>
>
> We have a reader that has a different (in-memory) data structure, where the 
> cost of computing ordinals per reader open is too expensive in the realtime 
> setting.
> We maintain an in-memory data structure that supports all functionality 
> and would like to leverage SortedSetDocValuesAccumulator.


