[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments
[ https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-7365:
----------------------------------
    Attachment: LUCENE-7365.patch

Here's a patch with Adrien's idea, actually including the LinearScoringIndexSearcher class this time.

> Don't use BooleanScorer for small segments
> ------------------------------------------
>
>          Key: LUCENE-7365
>          URL: https://issues.apache.org/jira/browse/LUCENE-7365
>      Project: Lucene - Core
>   Issue Type: Improvement
>     Reporter: Alan Woodward
>     Assignee: Alan Woodward
>  Attachments: LUCENE-7365-query.patch, LUCENE-7365.patch,
>               LUCENE-7365.patch, LUCENE-7365.patch
>
> If a BooleanQuery meets certain criteria (only contains disjunctions, is
> likely to match large numbers of docs) then we use a BooleanScorer to score
> groups of 1024 docs at a time. This allocates arrays of 1024 Bucket objects
> up-front. On very small segments (for example, a MemoryIndex) this is very
> wasteful of memory, particularly if the query is large or deeply-nested. We
> should avoid using a bulk scorer on these segments.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments
[ https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-7365:
----------------------------------
    Attachment: LUCENE-7365-query.patch

This is an alternative idea, a ForceNoBulkScoringQuery implementation that wraps an existing query and ensures use of the DefaultBulkScorer.
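The wrapping approach described in this message can be pictured with a small decorator. The following is only a sketch under stated assumptions and is not taken from the attached patch: `SegmentPlan`, `WindowedPlan` and `ForceDocAtATime` are hypothetical stand-ins rather than Lucene types, and the 1024 window size is the figure quoted in the issue description. The wrapper delegates matching to the wrapped plan but does not override the bulk path, so the generic doc-at-a-time loop is used and no window is allocated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

class NoBulkWrapperSketch {
    /** Hypothetical per-segment plan; not a Lucene type. */
    interface SegmentPlan {
        /** Smallest matching doc >= target, or -1 when there is none. */
        int nextMatch(int target);

        /** Generic bulk path: walks the matches one document at a time. */
        default void scoreAll(IntConsumer collector) {
            for (int doc = nextMatch(0); doc != -1; doc = nextMatch(doc + 1)) {
                collector.accept(doc);
            }
        }
    }

    /** A plan with a specialised bulk path that allocates a full window up-front. */
    static final class WindowedPlan implements SegmentPlan {
        static final int WINDOW = 1024; // figure quoted in the issue description
        private final int[] docs;       // matching doc IDs, sorted ascending
        int windowAllocations = 0;      // counts how often the window array was allocated

        WindowedPlan(int... docs) { this.docs = docs; }

        @Override public int nextMatch(int target) {
            for (int d : docs) if (d >= target) return d;
            return -1;
        }

        @Override public void scoreAll(IntConsumer collector) {
            boolean[] window = new boolean[WINDOW]; // allocated even for tiny segments
            windowAllocations++;
            int i = 0;
            while (i < docs.length) {
                int base = (docs[i] / WINDOW) * WINDOW;
                for (; i < docs.length && docs[i] < base + WINDOW; i++) {
                    window[docs[i] - base] = true;
                }
                for (int j = 0; j < WINDOW; j++) {
                    if (window[j]) { collector.accept(base + j); window[j] = false; }
                }
            }
        }
    }

    /** Wrapper: delegates matching, deliberately leaves scoreAll at the generic default. */
    static final class ForceDocAtATime implements SegmentPlan {
        private final SegmentPlan in;
        ForceDocAtATime(SegmentPlan in) { this.in = in; }
        @Override public int nextMatch(int target) { return in.nextMatch(target); }
    }

    public static void main(String[] args) {
        WindowedPlan plan = new WindowedPlan(3, 7);
        List<Integer> hits = new ArrayList<>();
        new ForceDocAtATime(plan).scoreAll(hits::add);
        System.out.println(hits + " window allocations=" + plan.windowAllocations);
    }
}
```

The trade-off in this model is that callers must remember to wrap each query, which is presumably part of why a searcher-level switch was also discussed on the issue.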
[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments
[ https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-7365:
----------------------------------
    Attachment: LUCENE-7365.patch

I like the idea of a specialised IndexSearcher, that's a lot less invasive. Here's a patch.

LinearScoringIndexSearcher is a separate, public class, because I can see situations other than MemoryIndex where you might want to disable bulk scoring (for example, luwak also allows you to match against small batches of documents, and the same caveats apply to these as to MI). In this patch it's in the memory/ module, but that does force DefaultBulkScorer to become public, so maybe it would be better in core?
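To make the difference between the two scoring styles concrete, here is a Lucene-free sketch; it is an illustration under assumptions, not the code from the patch. Postings are modelled as sorted `int[]` arrays of doc IDs below `maxDoc`, the "score" is simply the number of matching clauses, and the names `windowed` and `docAtATime` are hypothetical. The windowed variant allocates a 1024-slot array (the figure from the issue description) whatever the segment size; the doc-at-a-time variant only needs a heap with one entry per clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class DisjunctionScoringSketch {
    static final int WINDOW = 1024; // window size quoted in the issue description

    /** Windowed scoring: allocates WINDOW counters up-front, regardless of segment size. */
    static List<int[]> windowed(int[][] postings, int maxDoc) {
        List<int[]> hits = new ArrayList<>();
        int[] buckets = new int[WINDOW];       // the up-front allocation
        int[] pos = new int[postings.length];  // read position per clause
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            for (int c = 0; c < postings.length; c++) {
                while (pos[c] < postings[c].length && postings[c][pos[c]] < end) {
                    buckets[postings[c][pos[c]] - base]++;
                    pos[c]++;
                }
            }
            for (int i = 0; i < end - base; i++) {
                if (buckets[i] > 0) {
                    hits.add(new int[] {base + i, buckets[i]});
                    buckets[i] = 0; // reset the slot for the next window
                }
            }
        }
        return hits;
    }

    /** Doc-at-a-time scoring: merges the clauses with a heap; memory grows with clause count only. */
    static List<int[]> docAtATime(int[][] postings) {
        // Heap entries are {docId, clauseIndex, positionInClause}, ordered by docId.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int c = 0; c < postings.length; c++) {
            if (postings[c].length > 0) heap.add(new int[] {postings[c][0], c, 0});
        }
        List<int[]> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek()[0];
            int matches = 0;
            while (!heap.isEmpty() && heap.peek()[0] == doc) {
                int[] top = heap.poll();
                matches++;
                int next = top[2] + 1;
                if (next < postings[top[1]].length) {
                    heap.add(new int[] {postings[top[1]][next], top[1], next});
                }
            }
            hits.add(new int[] {doc, matches});
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {0, 2}, {2, 3}, {1} };
        for (int[] h : docAtATime(postings)) {
            System.out.println("doc=" + h[0] + " matching clauses=" + h[1]);
        }
    }
}
```

Both methods return the same (doc, count) pairs; on a one-document segment the windowed variant still pays for the full array, which is the waste the issue describes.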
[jira] [Updated] (LUCENE-7365) Don't use BooleanScorer for small segments
[ https://issues.apache.org/jira/browse/LUCENE-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-7365:
----------------------------------
    Attachment: LUCENE-7365.patch

Patch. This prevents use of BooleanScorer if the segment is smaller than 1024 docs. I'm not sure if that's the best cutoff though, and I'd like to do some benchmarking to check performance.
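The cutoff described in this first patch can be sketched as a simple predicate. This is a hedged illustration, not the patch itself: the class and method names are hypothetical, the 1024 figures come from the message and the issue description, the "likely to match large numbers of docs" heuristic is left out, and the allocation estimate assumes one bucket array per disjunction in the query tree.

```java
class SmallSegmentCutoffSketch {
    static final int WINDOW_SIZE = 1024;                    // bucket array size per the issue description
    static final int MIN_DOCS_FOR_WINDOWED = WINDOW_SIZE;   // cutoff proposed in the message; not benchmarked

    /** Decide whether the windowed scorer is worth using for one segment. */
    static boolean useWindowedScorer(boolean onlyDisjunctions, int segmentMaxDoc) {
        return onlyDisjunctions && segmentMaxDoc >= MIN_DOCS_FOR_WINDOWED;
    }

    /** Rough count of buckets allocated up-front, assuming one array per disjunction. */
    static long bucketsAllocatedUpFront(int disjunctionCount, int segmentMaxDoc) {
        return useWindowedScorer(true, segmentMaxDoc)
                ? (long) disjunctionCount * WINDOW_SIZE
                : 0L;
    }

    public static void main(String[] args) {
        // A single-document segment (the MemoryIndex case) against a query with 50 disjunctions.
        System.out.println("tiny segment buckets: " + bucketsAllocatedUpFront(50, 1));
        System.out.println("large segment buckets: " + bucketsAllocatedUpFront(50, 100_000));
    }
}
```

Whether 1024 is the right threshold is left open in the message, so the constant here should be read as a placeholder to benchmark rather than a recommendation.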