[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267972#comment-15267972
 ] 

ASF subversion and git services commented on LUCENE-7262:
-

Commit 5b51479b69ec3c52e42c9b95418ee285080311f7 in lucene-solr's branch 
refs/heads/branch_6x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5b51479 ]

LUCENE-7262: Fix NPE, this should lazy-init in start()
(cherry picked from commit 91153b9)


> Add back the "estimate match count" optimization
> 
>
> Key: LUCENE-7262
> URL: https://issues.apache.org/jira/browse/LUCENE-7262
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: master, 6.1
>
> Attachments: LUCENE-7262.patch, LUCENE-7262.patch, LUCENE-7262.patch
>
>
> Follow-up to my last message on LUCENE-7051: I removed this optimization a 
> while ago because it made things a bit more complicated and did not seem to 
> help with point queries. However, it only appeared not to help because the 
> benchmark only runs queries that match 25% of the dataset. This makes the 
> run time completely dominated by calls to FixedBitSet.set, so the call to 
> FixedBitSet.cardinality() looks free. With slightly sparser queries like 
> those the geo benchmark generates (dense enough to trigger the creation of a 
> FixedBitSet, but sparse enough that FixedBitSet.set does not dominate the 
> run time), one can notice speed-ups when this call is skipped.
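
The trade-off described above can be illustrated with a small sketch. It is not code from the patch: java.util.BitSet stands in for Lucene's FixedBitSet, and the names CardinalityCostSketch, CountingBuilder, exactCost and estimatedCost are hypothetical. The exact count requires a scan over every word of the bit set, while the estimate is a counter maintained as documents are added and may over-count duplicates.

```java
import java.util.BitSet;

// Illustrative only: java.util.BitSet is a stand-in for Lucene's FixedBitSet.
final class CardinalityCostSketch {

    /** Exact match count: scans every word of the bit set. */
    static long exactCost(BitSet bits) {
        return bits.cardinality();
    }

    /** Hypothetical builder that keeps a running estimate instead of scanning. */
    static final class CountingBuilder {
        private final BitSet bits;
        private long added; // counts every add() call, duplicates included

        CountingBuilder(int maxDoc) {
            this.bits = new BitSet(maxDoc);
        }

        void add(int doc) {
            bits.set(doc);
            added++;
        }

        /** Upper bound on the number of matching docs, available without a scan. */
        long estimatedCost() {
            return added;
        }

        BitSet bits() {
            return bits;
        }
    }

    public static void main(String[] args) {
        CountingBuilder builder = new CountingBuilder(1_000_000);
        for (int doc = 0; doc < 1_000_000; doc += 100) {
            builder.add(doc);
        }
        builder.add(0); // a duplicate inflates the estimate but not the exact count
        System.out.println("exact=" + exactCost(builder.bits())
            + " estimated=" + builder.estimatedCost());
    }
}
```

The estimate is only an upper bound, which is acceptable when it is used as a cost hint rather than as an exact hit count.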



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267967#comment-15267967
 ] 

ASF subversion and git services commented on LUCENE-7262:
-

Commit 91153b9627d7bd9e17dcb4762ebbaf26bc3410f4 in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=91153b9 ]

LUCENE-7262: Fix NPE, this should lazy-init in start()





[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-05-02 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267727#comment-15267727
 ] 

Steve Rowe commented on LUCENE-7262:


TestPointQueries failures reported on LUCENE-7269 appear to be related to this 
issue.




[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1522#comment-1522
 ] 

ASF subversion and git services commented on LUCENE-7262:
-

Commit e9f2ac0021e004593599706f4e2db1bd1f724248 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e9f2ac0 ]

LUCENE-7262: Leverage index statistics to make DocIdSetBuilder more efficient.





[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266616#comment-15266616
 ] 

ASF subversion and git services commented on LUCENE-7262:
-

Commit 4fa2b29b200b2a92157396af3f485d38a4954e7a in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4fa2b29 ]

LUCENE-7262: Leverage index statistics to make DocIdSetBuilder more efficient.





[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-04-29 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264162#comment-15264162
 ] 

David Smiley commented on LUCENE-7262:
--

+1 and nice testing.  I think you can use the new constructor accepting Terms 
for stats in more places (judging from a find-usages on DocIdSetBuilder).




[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264150#comment-15264150
 ] 

Robert Muir commented on LUCENE-7262:
-

Nice to be able to skip that dedup a lot of the time, too.




[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-04-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263029#comment-15263029
 ] 

Robert Muir commented on LUCENE-7262:
-

I think the problem at the time of LUCENE-7051 was that points didn't have any 
statistics. Now they do, so I think our job is easier.




[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-04-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262848#comment-15262848
 ] 

Adrien Grand commented on LUCENE-7262:
--

Good ideas. I added it back as close as possible to how it was before 
LUCENE-7051, but I will give these ideas a try.




[jira] [Commented] (LUCENE-7262) Add back the "estimate match count" optimization

2016-04-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262817#comment-15262817
 ] 

Robert Muir commented on LUCENE-7262:
-

So this means postings still calls cardinality()? Why wouldn't it do the same? 
I'm a bit concerned about each query tracking its own estimate (and having the 
formula, stats pulling, etc. duplicated everywhere). 

This is why, if you look at MatchingPoints, it pulls the stats it needs. 
Alternatively, DocIdSetBuilder could take sumDocFreq, maxDoc and docCount as 
parameters and do this itself. Points would pass size() for sumDocFreq, since 
it's the equivalent there.

In other words, I see providing a good cost() as the responsibility of 
DocIdSetBuilder. The only impl-specific part is how to get sumDocFreq and 
docCount (e.g. Terms.sumDocFreq/docCount vs. PointValues.size/docCount).
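
A minimal sketch of this suggestion, assuming the builder derives an average number of values per document from the supplied statistics and divides its running value counter by that average. The class CostEstimateSketch, its constructor and its methods are hypothetical and are not the API that was committed. The fallback of one value per document when statistics are missing is also an assumption of this sketch.

```java
// Illustrative only: a hypothetical estimator, not Lucene's DocIdSetBuilder.
final class CostEstimateSketch {
    private final int maxDoc;
    private final double valuesPerDoc;
    private long counter; // number of values added so far, duplicates included

    /**
     * sumDocFreq: total number of (value, doc) pairs for the field; points
     *             would pass size() here.
     * docCount:   number of documents that have at least one value.
     * maxDoc:     number of documents in the segment, used as an upper bound.
     */
    CostEstimateSketch(long sumDocFreq, int docCount, int maxDoc) {
        this.maxDoc = maxDoc;
        if (docCount <= 0 || sumDocFreq < docCount) {
            // Statistics missing or inconsistent: assume one value per doc,
            // which over-estimates the cost for multi-valued fields.
            this.valuesPerDoc = 1.0;
        } else {
            this.valuesPerDoc = (double) sumDocFreq / docCount;
        }
    }

    /** Records that numValues values were added to the set being built. */
    void add(int numValues) {
        counter += numValues;
    }

    /** Estimated number of matching documents, never more than maxDoc. */
    long cost() {
        return Math.min(maxDoc, Math.round(counter / valuesPerDoc));
    }

    public static void main(String[] args) {
        // 300 values spread over 100 docs: 3 values per doc on average.
        CostEstimateSketch estimate = new CostEstimateSketch(300, 100, 1000);
        estimate.add(30);
        System.out.println(estimate.cost()); // prints 10
    }
}
```

With this shape, the only caller-specific step is choosing which statistics to pass in, which matches the division of responsibility proposed above.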
