[jira] [Commented] (LUCENE-5488) FilteredQuery.explain does not honor FilterStrategy
[ https://issues.apache.org/jira/browse/LUCENE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924432#comment-13924432 ]

Lei Wang commented on LUCENE-5488:
----------------------------------

{noformat}
+if (result == null) {
+  result = new Explanation
+    (0.0f, "failure to match filter: " + f.toString());
{noformat}
If inner == null, result will be null here. We should not rewrite it to "failure to ..."; we should continue to return null.

> FilteredQuery.explain does not honor FilterStrategy
> ---------------------------------------------------
>
> Key: LUCENE-5488
> URL: https://issues.apache.org/jira/browse/LUCENE-5488
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 4.6.1
> Reporter: John Wang
> Assignee: Michael Busch
> Attachments: LUCENE-5488.patch, LUCENE-5488.patch
>
> Some Filter implementations produce DocIdSets without the iterator()
> implementation, such as o.a.l.facet.range.Range.getFilter(). This is done with
> the intention that they be used in conjunction with FilteredQuery with
> FilterStrategy set to QUERY_FIRST_FILTER_STRATEGY for performance reasons.
> However, this behavior is not honored by FilteredQuery.explain, where
> docidset.iterator is called regardless, causing such valid usages of the above
> filter types to fail.
> The fix is to check bits() first and fall back to iterator if bits is
> null, in which case the input Filter is indeed bad.
> See attached unit test, which fails without this patch.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
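The check order described in the issue, together with the review comment about keeping a null result null, can be sketched roughly as follows. This is only an illustration: `BitsView`, `IteratorView`, `DocSet`, `filterMatches`, and the string-based `explain` are hypothetical stand-ins invented for this sketch, not the Lucene 4.x classes or the code in the attached patch.

```java
public class BitsFirstCheck {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical random-access view, standing in for what bits() returns. */
    public interface BitsView {
        boolean get(int doc);
    }

    /** Hypothetical forward-only view, standing in for what iterator() returns. */
    public interface IteratorView {
        /** Returns the first matching doc >= target, or NO_MORE_DOCS. */
        int advance(int target);
    }

    /** Hypothetical doc id set; either view may be null. */
    public static final class DocSet {
        final BitsView bits;
        final IteratorView iterator;

        public DocSet(BitsView bits, IteratorView iterator) {
            this.bits = bits;
            this.iterator = iterator;
        }
    }

    /** Checks the random-access view first and only falls back to the iterator. */
    public static boolean filterMatches(DocSet set, int doc) {
        if (set == null) {
            return false;
        }
        if (set.bits != null) {
            return set.bits.get(doc);
        }
        if (set.iterator != null) {
            return set.iterator.advance(doc) == doc;
        }
        return false; // neither view is available, so nothing can match
    }

    /** A null inner explanation stays null; only a filter miss is rewritten. */
    public static String explain(String inner, DocSet set, int doc) {
        if (inner == null) {
            return null;
        }
        return filterMatches(set, doc) ? inner : "failure to match filter";
    }

    public static void main(String[] args) {
        DocSet bitsOnly = new DocSet(doc -> doc % 2 == 0, null);
        System.out.println(explain("inner explanation", bitsOnly, 4));
        System.out.println(explain("inner explanation", bitsOnly, 5));
        System.out.println(explain(null, bitsOnly, 4));
    }
}
```

The point of the ordering is that a set exposing only random access never has its iterator touched, which is the usage the issue says explain() broke.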
[jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924224#comment-13924224 ]

Lei Wang commented on LUCENE-5495:
----------------------------------

Please drop my FixedBitSet or OpenBitSet suggestion. For a FixedBitSet or OpenBitSet, which way is faster depends on the use case, I think. Not merging them should be OK.

> Boolean Filter does not handle FilterClauses with only bits() implemented
> -------------------------------------------------------------------------
>
> Key: LUCENE-5495
> URL: https://issues.apache.org/jira/browse/LUCENE-5495
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 4.6.1
> Reporter: John Wang
> Attachments: LUCENE-5495.patch
>
> Some Filter implementations produce DocIdSets without the iterator()
> implementation, such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because
> BooleanFilter expects all FilterClauses to hold Filters that have iterator()
> implemented.
> This patch improves the behavior by taking Filters with bits() implemented
> and treating them separately.
> This behavior would be faster for Filters with a forward index as
> the underlying data structure, where there would be no need to scan the index
> to build an iterator.
> See attached unit test, which fails without this patch.
[jira] [Comment Edited] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940 ]

Lei Wang edited comment on LUCENE-5495 at 3/7/14 7:12 PM:
----------------------------------------------------------

{noformat}
+  public BitsDocIdSet(Bits bits, int length) {
+    this.bits = bits;
+    this.length = length;
+  }
{noformat}
We can assert that bits is not a DocIdSet here. If it is, wrapping it only adds overhead.

{noformat}
+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();
{noformat}
May need a SHOULD list as well?

{noformat}
+if (bits != null) {
+  mustNotBitsList.add(bits);
+}
{noformat}
If bits is already a FixedBitSet or OpenBitSet, merging it into res might be faster? The same goes for the other lists. (Not necessary, please drop this one.)
[jira] [Comment Edited] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940 ]

Lei Wang edited comment on LUCENE-5495 at 3/6/14 7:35 PM:
----------------------------------------------------------

{noformat}
+  public BitsDocIdSet(Bits bits, int length) {
+    this.bits = bits;
+    this.length = length;
+  }
{noformat}
We can assert that bits is not a DocIdSet here. If it is, wrapping it only adds overhead.

{noformat}
+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();
{noformat}
May need a SHOULD list as well?

{noformat}
+if (bits != null) {
+  mustNotBitsList.add(bits);
+}
{noformat}
If bits is already a FixedBitSet or OpenBitSet, merging it into res might be faster? The same goes for the other lists.
[jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922940#comment-13922940 ]

Lei Wang commented on LUCENE-5495:
----------------------------------

+  public BitsDocIdSet(Bits bits, int length) {
+    this.bits = bits;
+    this.length = length;
+  }

We can assert that bits is not a DocIdSet here. If it is, wrapping it only adds overhead.

+final List mustBitsList = new ArrayList();
+final List mustNotBitsList = new ArrayList();

May need a SHOULD list as well?

+if (bits != null) {
+  mustNotBitsList.add(bits);
+}

If bits is already a FixedBitSet or OpenBitSet, merging it into res might be faster? The same goes for the other lists.
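To make the question about a SHOULD list concrete, here is a rough sketch of how three lists of random-access views could be combined per document. It is not the code from LUCENE-5495.patch: `BitsClauseSketch` and `accept` are hypothetical names, `IntPredicate` stands in for Lucene's Bits, and the clause semantics (every MUST view, no MUST_NOT view, and at least one SHOULD view when any are present) are a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class BitsClauseSketch {

    /**
     * Accepts a doc when no MUST_NOT view contains it, every MUST view
     * contains it, and (if any SHOULD views exist) at least one of them does.
     * An entirely empty clause set accepts everything here, which is a
     * simplification made for this sketch.
     */
    public static boolean accept(int doc,
                                 List<IntPredicate> must,
                                 List<IntPredicate> mustNot,
                                 List<IntPredicate> should) {
        for (IntPredicate view : mustNot) {
            if (view.test(doc)) {
                return false;
            }
        }
        for (IntPredicate view : must) {
            if (!view.test(doc)) {
                return false;
            }
        }
        if (should.isEmpty()) {
            return true;
        }
        for (IntPredicate view : should) {
            if (view.test(doc)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<IntPredicate> must = new ArrayList<>();
        List<IntPredicate> mustNot = new ArrayList<>();
        List<IntPredicate> should = new ArrayList<>();
        must.add(doc -> doc < 100);       // e.g. a range-style view
        mustNot.add(doc -> doc % 10 == 0); // e.g. an exclusion view
        for (int doc = 8; doc <= 12; doc++) {
            System.out.println(doc + " -> " + accept(doc, must, mustNot, should));
        }
    }
}
```

Because each view is only probed for the candidate document, none of them ever has to build an iterator, which is the benefit the issue description attributes to forward-index filters.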
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896217#comment-13896217 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Oh cool! That patch looks better!

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Assignee: Shai Erera
> Fix For: 5.0, 4.7
> Attachments: LUCENE-5425.patch, facetscollector.patch, facetscollector.patch, fixbitset.patch, openbitsetiter.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
[jira] [Updated] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Wang updated LUCENE-5425:
-----------------------------
    Attachment: openbitsetiter.patch

Faster OpenBitSetIterator
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896149#comment-13896149 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

It looks like we should be able to replace the implementation of OpenBitSetIterator with the one Shai added in FixedBitSet. The only missing data is numBits, but it's optional. At the same time, any code that is using it will benefit from this.

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Assignee: Shai Erera
> Fix For: 5.0, 4.7
> Attachments: LUCENE-5425.patch, facetscollector.patch, facetscollector.patch, fixbitset.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
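For readers unfamiliar with how a bitset iterator can advance without testing bits one at a time, the following is a generic word-scan sketch over a long[] backing array. It is a common technique and is not claimed to match the iterator in FixedBitSet or in openbitsetiter.patch; `WordScanSketch` and its `nextSetBit` are hypothetical names, and `from` is assumed to be non-negative.

```java
public class WordScanSketch {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the index of the first set bit at or after from, or NO_MORE_DOCS. */
    public static int nextSetBit(long[] words, int from) {
        int wordIndex = from >>> 6;
        if (wordIndex >= words.length) {
            return NO_MORE_DOCS;
        }
        // Java shifts a long by the low 6 bits of the count, so this drops
        // exactly the bits below 'from' inside the current word.
        long word = words[wordIndex] >>> from;
        if (word != 0) {
            return from + Long.numberOfTrailingZeros(word);
        }
        // Skip whole empty words, 64 candidate docs at a time.
        while (++wordIndex < words.length) {
            long next = words[wordIndex];
            if (next != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(next);
            }
        }
        return NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        long[] words = {0b1010L, 0L, 1L}; // bits 1, 3 and 128 are set
        for (int doc = nextSetBit(words, 0); doc != NO_MORE_DOCS; doc = nextSetBit(words, doc + 1)) {
            System.out.println(doc);
        }
    }
}
```

Note that the scan needs only the array length, not a separate bit count, which is consistent with the remark that numBits is optional.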
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892435#comment-13892435 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Thanks!

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: LUCENE-5425.patch, facetscollector.patch, facetscollector.patch, fixbitset.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892338#comment-13892338 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Yes! With this patch, we can write our own FacetCollector and customize it. Only one small suggestion: createHitsSet is marked as protected, but the class itself is final, so no subclass can override it; the only option is to create a new FacetCollector. Can we remove the final modifier from the class and add final to the methods we don't want the user to override?
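The suggestion above (a non-final class whose protected factory method is the only overridable piece) could look roughly like this. The class names, the long[] return type, and the `collect` method are hypothetical and chosen only to keep the sketch self-contained; the real FacetsCollector and its createHitsSet signature are not reproduced here.

```java
public class OverridableFactorySketch {

    /** Non-final class: subclasses may customize allocation, nothing else. */
    public static class Collector {

        /** Extension point: how the per-query hit set is allocated. */
        protected long[] createHitsSet(int maxDoc) {
            return new long[(maxDoc + 63) >>> 6];
        }

        /** Marked final so subclasses cannot change the collection logic. */
        public final long[] collect(int maxDoc, int... docs) {
            long[] bits = createHitsSet(maxDoc);
            for (int doc : docs) {
                bits[doc >>> 6] |= 1L << doc; // long shifts use the low 6 bits of doc
            }
            return bits;
        }
    }

    /** A subclass that only swaps the allocation strategy. */
    public static class CountingCollector extends Collector {
        int allocations = 0;

        @Override
        protected long[] createHitsSet(int maxDoc) {
            allocations++;
            return super.createHitsSet(maxDoc);
        }
    }

    public static void main(String[] args) {
        CountingCollector collector = new CountingCollector();
        long[] bits = collector.collect(130, 1, 64, 129);
        System.out.println(bits.length + " words, " + collector.allocations + " allocation(s)");
    }
}
```

With the class declared final instead, `CountingCollector` would not compile, which is the limitation the comment points out.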
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891044#comment-13891044 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

The overhead may not come from the additional method call; it might be the OpenBitSet. The implementation in FixedBitSet is friendlier to branch prediction and instruction prefetching. But I think we can simply apply the reuse part first. I will run some more tests later when I get time. I now have an idea for applying, in general, the optimization that I previously thought could only be applied in our case. I will test that out as well and see if we can improve this overall.

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: facetscollector.patch, facetscollector.patch, fixbitset.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890040#comment-13890040 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Tried with luceneutil, but ran into a problem: I cannot get the same numbers for two identical copies of trunk. Even when both sides are trunk, I get different numbers:

Report after iter 19:
Task                QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff
OrHighMed                  74.15   (7.1%)                     71.24   (8.3%)   -3.9% ( -18% -  12%)
LowTerm                   515.68  (15.1%)                    496.20  (12.3%)   -3.8% ( -27% -  27%)
OrNotHighLow               72.22   (8.2%)                     70.36   (7.6%)   -2.6% ( -17% -  14%)
OrNotHighMed               79.01   (7.3%)                     77.43   (8.4%)   -2.0% ( -16% -  14%)
OrHighNotHigh              38.66   (4.5%)                     37.90   (6.4%)   -2.0% ( -12% -   9%)
Respell                    51.21   (7.1%)                     50.23   (6.5%)   -1.9% ( -14% -  12%)
MedPhrase                  69.67   (7.5%)                     68.35   (7.4%)   -1.9% ( -15% -  14%)
OrHighLow                  67.24   (7.8%)                     66.00   (9.0%)   -1.8% ( -17% -  16%)
Fuzzy1                     27.37   (5.7%)                     26.96   (5.5%)   -1.5% ( -11% -  10%)
Fuzzy2                     37.21   (3.8%)                     36.71   (5.6%)   -1.3% ( -10% -   8%)
MedSloppyPhrase             9.94   (5.4%)                      9.83   (3.9%)   -1.1% (  -9% -   8%)
LowSpanNear                 8.60   (3.9%)                      8.54   (3.8%)   -0.7% (  -8% -   7%)
AndHighHigh                40.23   (3.1%)                     40.03   (2.5%)   -0.5% (  -5% -   5%)
HighTerm                   76.07   (9.0%)                     75.96   (9.1%)   -0.2% ( -16% -  19%)
OrHighHigh                 11.62   (3.0%)                     11.62   (4.8%)   -0.1% (  -7% -   7%)
IntNRQ                      9.51   (3.9%)                      9.51   (8.3%)    0.0% ( -11% -  12%)
HighPhrase                 25.61   (7.0%)                     25.63   (7.7%)    0.1% ( -13% -  15%)
LowSloppyPhrase            30.21   (5.2%)                     30.24   (4.3%)    0.1% (  -8% -  10%)
PKLookup                  212.03   (9.0%)                    212.25  (11.5%)    0.1% ( -18% -  22%)
OrNotHighHigh              27.75   (3.5%)                     27.80   (6.5%)    0.2% (  -9% -  10%)
OrHighNotMed               58.14   (5.9%)                     58.27   (8.3%)    0.2% ( -13% -  15%)
MedSpanNear                22.73   (3.7%)                     22.80   (5.1%)    0.3% (  -8% -   9%)
Wildcard                   42.84   (5.0%)                     42.97   (5.4%)    0.3% (  -9% -  11%)
HighSloppyPhrase           23.99   (7.4%)                     24.08   (6.3%)    0.4% ( -12% -  15%)
AndHighLow                625.62   (6.6%)                    629.52  (10.5%)    0.6% ( -15% -  18%)
Prefix3                    77.68   (7.2%)                     78.17   (6.2%)    0.6% ( -11% -  15%)
LowPhrase                  14.58   (4.7%)                     14.77   (5.0%)    1.3% (  -8% -  11%)
HighSpanNear               11.84   (4.3%)                     11.99   (5.2%)    1.3% (  -7% -  11%)
OrHighNotLow               66.04   (8.4%)                     67.28   (9.2%)    1.9% ( -14% -  21%)
AndHighMed                 66.55   (4.3%)                     67.91   (6.2%)    2.1% (  -8% -  13%)
MedTerm                   139.78   (9.5%)                    145.63  (10.3%)    4.2% ( -14% -  26%)

With the patch, the numbers are also different, but the difference is no bigger than in the trunk-vs-trunk numbers:

Report after iter 19:
Task                QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff
AndHighLow                730.30  (11.5%)                    700.95  (10.6%)   -4.0% ( -23% -  20%)
LowTerm                   520.94  (10.6%)                    504.25  (11.4%)   -3.2% ( -22% -  21%)
Fuzzy1                     57.55   (5.1%)                     56.26   (4.8%)   -2.2% ( -11% -   8%)
Respell                    35.85   (4.7%)                     35.18   (4.1%)   -1.9% ( -10% -   7%)
OrHighNotHigh              37.77   (7.3%)                     37.19   (5.9%)   -1.5% ( -13% -  12%)
HighSloppyPhrase           12.30   (7.5%)                     12.17   (7.7%)   -1.1% ( -15% -  15%)
HighPhrase                 29.38   (5.2%)                     29.06   (4.3%)   -1.1% ( -10% -   8%)
OrNotHighMed               25.93   (6.2%)                     25.68   (5.5%)   -1.0% ( -11% -  11%)
OrNotHighHigh              19.72   (5.9%)                     19.53   (4.9%)   -0.9% ( -11% -  10%)
Fuzzy2                     11.30   (3.6%)                     11.24   (5.1%)   -0.6% (  -8% -   8%)
PKLookup                  218.16   (8.6%)                    217.53   (9.3%)   -0.3% ( -16% -  19%)
LowSloppyPhrase            43.09   (5.6%)                     43.00   (3.5%)   -0.2% (  -8% -   9%)
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888464#comment-13888464 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Also, the idea of keeping the DocIdSet a raw DocIdSet is quite good. For the collecting part, we can have a wrapper that returns a DocIdSet after collecting is done.

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: facetscollector.patch, facetscollector.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888463#comment-13888463 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

I'm not exactly sure how to run the facets benchmarks. I did a run with ant run-task -Dtask.alg=conf/facets.alg, and changed "SearchSameRdr" Search > : 40 to 4, to get stable results. I'm also not sure how to read the results, but the numbers look quite similar between trunk and the docidset on my box.

trunk:

[java] > Report sum by Prefix (Search) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] SearchSameRdr_4       0    true       1           4  7,155.64        5.59  31,610,096   51,212,288
[java] SearchSameRdr_4       1   false       1           4  8,814.46        4.54  33,534,008   49,209,344
[java] SearchSameRdr_4       2    true       1           4  9,088.84        4.40  35,673,136   48,373,760
[java] SearchSameRdr_4       3   false       1           4  9,045.68        4.42  35,279,544   47,661,056
[java]
[java] > Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] Populate              0    true       1       21578  2,489.96        8.67  31,369,696   51,212,288
[java] Populate              1   false       1       21578  3,973.85        5.43  33,272,104   49,209,344
[java] Populate              2    true       1       21578  4,216.92        5.12  32,701,392   48,373,760
[java] Populate              3   false       1       21578  4,366.25        4.94  35,064,408   47,661,056
[java]
[java] > Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] MAddDocs_Exhaust      0    true       1       21578  3,469.13        6.22  24,536,720   51,212,288
[java] MAddDocs_Exhaust      1   false       1       21578  4,845.72        4.45  34,857,920   49,209,344
[java] MAddDocs_Exhaust      2    true       1       21578  5,129.07        4.21  29,209,256   48,373,760
[java] MAddDocs_Exhaust      3   false       1       21578  5,259.08        4.10  25,845,424   47,661,056

With the patch, but with OpenBitSet changed to FixedBitSet and bits.iterator() used to return the iterator (it's still an OpenBitSetIterator), the result is:

[java] > Report sum by Prefix (Search) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] SearchSameRdr_4       0    true       1           4  7,280.67        5.49  25,424,104   51,113,984
[java] SearchSameRdr_4       1   false       1           4  8,689.98        4.60  31,356,960   49,053,696
[java] SearchSameRdr_4       2    true       1           4  9,157.51        4.37  38,849,248   47,632,384
[java] SearchSameRdr_4       3   false       1           4  9,097.11        4.40  39,840,912   46,465,024
[java]
[java] > Report sum by Prefix (Populate) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] Populate              0    true       1       21578  2,465.21        8.75  25,187,152   51,113,984
[java] Populate              1   false       1       21578  2,651.19        8.14  30,985,904   49,053,696
[java] Populate              2    true       1       21578  4,247.64        5.08  38,656,320   47,632,384
[java] Populate              3   false       1       21578  4,298.41        5.02  39,355,912   46,465,024
[java]
[java] > Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 42)
[java] Operation         round  facets  runCnt  recsPerRun     rec/s  elapsedSec  avgUsedMem  avgTotalMem
[java] MAddDocs_Exhaust      0    true       1       21578  3,404.01        6.34  34,015,968   51,113,984
[java] MAddDocs_Exhaust      1   false       1       21578  3,062.88        7.05  30,420,848   49,053,696
[java] MAddDocs_Exhaust      2    true       1       21578  5,147.42        4.19  28,833,976   47,632,384
[java] MAddDocs_Exhaust      3   false       1       21578  5,129.07        4.21  37,117,288   46,465,024

> Make creation of FixedBitSet in FacetsCollector overridable
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887506#comment-13887506 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

Agreed, it can be done in a separate issue. I didn't know a wrapper would affect performance that much; it's just an additional method call to me. The default OpenBitSetIterator implementation is different from nextSetBit, so maybe that's the reason? Anyway, starting with this change won't hurt anything, and caching the bitset should take about 20% off the overhead of creating a new bitset each time (the other 80% is from the memset). After getting this in, I can do a separate test on the DocIdSet to see if we can get acceptable performance for the default behavior without reusing the memory.

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: facetscollector.patch
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.
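The distinction drawn above between allocating and clearing can be illustrated with a small reuse helper. This is a sketch under stated assumptions, not the approach taken in any attached patch: `ReusableBitsSketch` and `acquire` are hypothetical names, a plain long[] stands in for the bitset, and the helper is not thread-safe.

```java
import java.util.Arrays;

public class ReusableBitsSketch {
    private long[] words = new long[0];

    /**
     * Returns a buffer whose first (maxDoc + 63) / 64 words are zero,
     * reusing the previous array when it is large enough. Reuse avoids the
     * allocation and the garbage, but the clearing pass (the "memset"
     * part) still has to run on every call.
     */
    public long[] acquire(int maxDoc) {
        int needed = (maxDoc + 63) >>> 6;
        if (words.length < needed) {
            words = new long[needed]; // a fresh array is already zeroed by the JVM
        } else {
            Arrays.fill(words, 0, needed, 0L); // clear only the words this query uses
        }
        return words;
    }

    public static void main(String[] args) {
        ReusableBitsSketch cache = new ReusableBitsSketch();
        long[] first = cache.acquire(1000);
        first[0] = 0xFFL;
        long[] second = cache.acquire(1000);
        System.out.println("same array: " + (first == second) + ", cleared: " + (second[0] == 0L));
    }
}
```

Since the returned array may be longer than requested and only the requested prefix is cleared, callers have to stay within maxDoc.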
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887340#comment-13887340 ]

Lei Wang commented on LUCENE-5425:
----------------------------------

It's better not to depend on the bitset; it's better to depend on a more general interface. In the user's application, if they can get rid of the memset part of that data structure, like us at Twitter, they can get a 4X+ performance improvement over simple caching of the bitset.
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887337#comment-13887337 ]

Lei Wang commented on LUCENE-5426:
----------------------------------

It looks like DefaultSortedSetDocsValuesReaderState.java is missing from the patch. Forgot to attach it?

> Make SortedSetDocValuesReaderState customizable
> -----------------------------------------------
>
> Key: LUCENE-5426
> URL: https://issues.apache.org/jira/browse/LUCENE-5426
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: sortedsetreaderstate.patch
>
> We have a reader that has a different (in-memory) data structure, where the
> cost of computing ordinals per reader open is too expensive in the realtime
> setting.
> We maintain an in-memory data structure that supports all functionality
> and would like to leverage SortedSetDocValuesAccumulator.