[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments

2016-07-01 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7365:
--
Attachment: LUCENE-7365.patch

Here's a patch with Adrien's idea, actually including the 
LinearScoringIndexSearcher class this time.

> Don't use BooleanScorer for small segments
> --
>
> Key: LUCENE-7365
> URL: https://issues.apache.org/jira/browse/LUCENE-7365
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Attachments: LUCENE-7365-query.patch, LUCENE-7365.patch, 
> LUCENE-7365.patch, LUCENE-7365.patch
>
>
> If a BooleanQuery meets certain criteria (only contains disjunctions, is 
> likely to match large numbers of docs) then we use a BooleanScorer to score 
> groups of 1024 docs at a time.  This allocates arrays of 1024 Bucket objects 
> up-front.  On very small segments (for example, a MemoryIndex) this is very 
> wasteful of memory, particularly if the query is large or deeply-nested.  We 
> should avoid using a bulk scorer on these segments.
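
To put a rough number on the description above, here is a minimal sketch that counts up-front Bucket allocations for a nested disjunction. It is a toy model, not Lucene code: the Node type and bucketsAllocated are hypothetical, and it assumes every nested disjunction gets its own 1024-slot window, whatever the size of the segment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model only; none of these types are Lucene classes. */
final class BucketCostSketch {
    /** Window size quoted in the issue description. */
    static final int WINDOW_SIZE = 1024;

    /** A toy query tree node: a term (no children) or a disjunction of children. */
    static final class Node {
        final List<Node> children;

        private Node(List<Node> children) {
            this.children = children;
        }

        static Node term() {
            return new Node(Collections.<Node>emptyList());
        }

        static Node or(Node... clauses) {
            return new Node(Arrays.asList(clauses));
        }

        boolean isDisjunction() {
            return !children.isEmpty();
        }
    }

    /** Buckets allocated up-front, ASSUMING one window per disjunction node. */
    static long bucketsAllocated(Node query) {
        long total = query.isDisjunction() ? WINDOW_SIZE : 0;
        for (Node child : query.children) {
            total += bucketsAllocated(child);
        }
        return total;
    }

    public static void main(String[] args) {
        // Four disjunction nodes in total, counting the outermost one.
        Node nested = Node.or(
            Node.or(Node.term(), Node.term()),
            Node.or(Node.term(), Node.or(Node.term(), Node.term())));
        System.out.println(bucketsAllocated(nested) + " buckets allocated up-front");
    }
}
```

Under that assumption the cost depends only on the shape of the query, so a one-document segment pays the same allocation as a large one.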



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments

2016-06-30 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7365:
--
Attachment: LUCENE-7365-query.patch

This is an alternative idea: a ForceNoBulkScoringQuery implementation that 
wraps an existing query and ensures use of the DefaultBulkScorer.
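
The wrapping idea can be sketched roughly as below. This is not the attached patch and does not use Lucene's Query/Weight API: ToyWeight, ToyWindowedDisjunction and ToyForceNoBulkScoring are hypothetical stand-ins, and scoreAll returns the number of buckets it allocated purely so the difference is observable.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for per-segment scoring; NOT Lucene's Weight API. */
interface ToyWeight {
    /** Matching doc IDs, non-negative and in increasing order. */
    int[] matches();

    /** Default bulk scoring: visit matches one at a time. Returns buckets allocated. */
    default int scoreAll(List<Integer> collected) {
        for (int doc : matches()) {
            collected.add(doc);
        }
        return 0;
    }
}

/** Toy disjunction that scores in windows of 1024 docs, allocating buckets up-front. */
final class ToyWindowedDisjunction implements ToyWeight {
    static final int WINDOW_SIZE = 1024;
    private final int[] docs;

    ToyWindowedDisjunction(int... docs) {
        this.docs = docs;
    }

    @Override
    public int[] matches() {
        return docs;
    }

    @Override
    public int scoreAll(List<Integer> collected) {
        boolean[] buckets = new boolean[WINDOW_SIZE]; // allocated even for tiny segments
        int i = 0;
        while (i < docs.length) {
            int base = docs[i] - docs[i] % WINDOW_SIZE; // start of the current window
            // Mark every match that falls inside this window.
            while (i < docs.length && docs[i] < base + WINDOW_SIZE) {
                buckets[docs[i] - base] = true;
                i++;
            }
            // Replay the window in doc order and reset it for reuse.
            for (int slot = 0; slot < WINDOW_SIZE; slot++) {
                if (buckets[slot]) {
                    collected.add(base + slot);
                    buckets[slot] = false;
                }
            }
        }
        return WINDOW_SIZE;
    }
}

/** Wrapper that delegates matching but never the specialised bulk scoring. */
final class ToyForceNoBulkScoring implements ToyWeight {
    private final ToyWeight inner;

    ToyForceNoBulkScoring(ToyWeight inner) {
        this.inner = inner;
    }

    @Override
    public int[] matches() {
        return inner.matches();
    }
    // scoreAll is deliberately not overridden, so the doc-at-a-time default applies.

    public static void main(String[] args) {
        ToyWeight wrapped = new ToyForceNoBulkScoring(new ToyWindowedDisjunction(0, 3, 1500));
        List<Integer> hits = new ArrayList<>();
        System.out.println("buckets=" + wrapped.scoreAll(hits) + " hits=" + hits);
    }
}
```

The wrapper returns the same hits as the wrapped query while skipping the window allocation. The trade-off in this toy is that callers have to remember to wrap each query.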







[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments

2016-06-30 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7365:
--
Attachment: LUCENE-7365.patch

I like the idea of a specialised IndexSearcher; that's a lot less invasive.  
Here's a patch.

LinearScoringIndexSearcher is a separate, public class, because I can see 
situations other than MemoryIndex where you might want to disable bulk scoring 
(for example, luwak also lets you match against small batches of 
documents, and the same caveats apply there as to MemoryIndex).  In this patch 
it lives in the memory/ module, but that forces DefaultBulkScorer to become 
public, so maybe it would be better in core?







[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments

2016-06-29 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7365:
--
Attachment: LUCENE-7365.patch

Patch.  This prevents use of BooleanScorer if the segment is smaller than 1024 
docs.  I'm not sure if that's the best cutoff though, and I'd like to do some 
benchmarking to check performance.
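
For illustration, the cutoff amounts to something like the sketch below. It is not the patch itself: BulkScorerCutoffSketch, Strategy and choose are hypothetical names, and the model leaves out the "likely to match large numbers of docs" estimate mentioned in the issue description.

```java
/** Hypothetical helper, not a Lucene class: models the cutoff described above. */
final class BulkScorerCutoffSketch {
    /** Window size of the bulk scorer, and the cutoff proposed in this patch. */
    static final int WINDOW_SIZE = 1024;

    enum Strategy { WINDOWED_BULK, DOC_AT_A_TIME }

    /**
     * Picks a scoring strategy for one segment.
     *
     * @param segmentMaxDoc   number of docs in the segment
     * @param pureDisjunction whether the query contains only disjunctions
     */
    static Strategy choose(int segmentMaxDoc, boolean pureDisjunction) {
        if (pureDisjunction && segmentMaxDoc >= WINDOW_SIZE) {
            return Strategy.WINDOWED_BULK;
        }
        // Segments smaller than one window never use the windowed scorer.
        return Strategy.DOC_AT_A_TIME;
    }

    public static void main(String[] args) {
        System.out.println("1 doc:     " + choose(1, true));
        System.out.println("1024 docs: " + choose(1024, true));
    }
}
```

Keeping the threshold in a single constant makes it easy to vary while benchmarking different cutoffs.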



