[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-05 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988801#comment-16988801
 ] 

Jason Gerlowski edited comment on SOLR-13890 at 12/5/19 1:21 PM:
-

bq. I haven't investigated how feasible it is but I wonder if Solr even needs 
PostFilter given TwoPhaseIterator exists. For another day.

It's a good question.  I'd imagine that two things are necessary if we wanted 
to replace Solr's postfilter:
# We'd need to make sure that TwoPhaseIterator (TPI) implementations provide 
the same performance gains as postfilter ones.  I wouldn't have doubted this 
previously.  But knowing that DocValuesTermsQuery (DVTQ) already has TPI, and 
recalling the gains we saw with our postfilter in (unpublished) perf tests, is 
enough to plant a seed of doubt for me.
# We'd need to see whether users are fine ceding control of when this special 
execution mode is triggered.  TPI being heuristic-triggered seems double-edged 
to me if a user finds themselves fighting those heuristics on a query they know 
would benefit from TPI.  Though maybe there's an override flag I'm just not 
aware of that forces TPI to be used/ignored.

That said, "for another day" for sure.


> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at its 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.
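
As a rough illustration of the description above, here is a minimal plain-Java sketch of the "resolve the terms to a bitset once, then check each document's ordinals" idea. It does not use any Lucene or Solr classes and is not the attached patch: the class name, the sorted String[] standing in for the term dictionary, and the int[][] standing in for per-document ordinals are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Plain-Java stand-in for the doc-values bitset check; no Lucene classes are used. */
final class TermsPostFilterSketch {
    private final int[][] docOrds;              // docOrds[doc] = ordinals of that doc's values
    private final BitSet wanted = new BitSet(); // one bit per ordinal being filtered for

    /** sortedTerms is a hypothetical term dictionary: index == ordinal, sorted ascending. */
    TermsPostFilterSketch(String[] sortedTerms, int[][] docOrds, List<String> queryTerms) {
        this.docOrds = docOrds;
        // Done once at filter creation: resolve each query term to its ordinal.
        for (String term : queryTerms) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) {                     // terms missing from the field are skipped
                wanted.set(ord);
            }
        }
    }

    /** Called per candidate document: true if any of its ordinals is in the bitset. */
    boolean matches(int doc) {
        for (int ord : docOrds[doc]) {
            if (wanted.get(ord)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] dict = {"apple", "banana", "cherry", "date"};
        int[][] docs = {{0, 2}, {1}, {3}, {}};
        TermsPostFilterSketch filter =
            new TermsPostFilterSketch(dict, docs, List.of("cherry", "date", "zebra"));
        for (int doc = 0; doc < docs.length; doc++) {
            System.out.println("doc " + doc + " matches: " + filter.matches(doc));
        }
    }
}
```

The per-document work is only a few bit tests, which is why the cost of resolving hundreds or thousands of terms is paid once up front rather than per document.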



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:09 PM:
-

I dug into this pretty deeply and I believe there is a large advantage to the 
top level doc values approach when there is a large number of terms. The reason 
is that *MultiSortedSetDocValues.lookupOrd* (in MultiDocValues) is really 
clever, so the overhead of doing the top level term lookup is much less than 
doing the segment by segment term lookups. Using the top level ordinals inside 
of the scorer would be possible also but seemed kind of awkward. But, in theory 
using top level ordinals in the scorer would get similar performance to this 
patch.





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:15 PM:
-

I dug into this pretty deeply and I believe there is a large advantage to the 
top level doc values approach when there is a large number of terms. The reason 
is that *MultiSortedSetDocValues.lookupOrd*  (in MultiDocValues) is really 
clever, so the overhead of doing the top level term lookup is much less than 
doing the segment by segment term lookups. Using the top level ordinals inside 
of the scorer would be possible also but seemed kind of awkward. But, in theory 
using top level ordinals in the scorer would get similar performance to 
this patch.





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:18 PM:
-

I dug into this pretty deeply and I believe there is a large advantage to the 
top level doc values approach when there is a large number of terms. The reason 
is that *MultiSortedSetDocValues.lookupOrd*  (in MultiDocValues) is really 
clever, so the overhead of doing the top level term lookup is much less than 
doing the segment by segment term lookups. Using the top level ordinals inside 
of the scorer would be possible also but seemed kind of awkward. But, in theory 
using top level ordinals in the scorer would get similar performance to this 
patch.
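
One way to picture the difference in lookup overhead is to count term-dictionary searches under each strategy. The plain-Java sketch below is only a toy model under stated assumptions: it does not reproduce how MultiSortedSetDocValues or MultiDocValues actually resolve terms, the class and method names are made up for this example, and it treats the merged, index-wide dictionary as already available.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Counts term-dictionary searches for two lookup strategies (illustrative only). */
final class LookupCountSketch {

    /** Per-segment: every query term is searched in every segment's dictionary. */
    static int perSegmentSearches(List<String[]> segmentDicts, List<String> queryTerms) {
        int searches = 0;
        for (String[] dict : segmentDicts) {
            for (String term : queryTerms) {
                Arrays.binarySearch(dict, term);
                searches++;
            }
        }
        return searches;
    }

    /** Top-level: every query term is searched once in a merged, index-wide dictionary. */
    static int topLevelSearches(List<String[]> segmentDicts, List<String> queryTerms) {
        // Assumption: the merged view is built elsewhere and reused; it is rebuilt
        // here only to keep the example self-contained.
        TreeSet<String> merged = new TreeSet<>();
        for (String[] dict : segmentDicts) {
            merged.addAll(Arrays.asList(dict));
        }
        String[] topLevel = merged.toArray(new String[0]);
        int searches = 0;
        for (String term : queryTerms) {
            Arrays.binarySearch(topLevel, term);
            searches++;
        }
        return searches;
    }

    public static void main(String[] args) {
        List<String[]> segments = List.of(
            new String[] {"a", "c"}, new String[] {"b", "c"}, new String[] {"d"});
        List<String> terms = List.of("a", "b", "c", "d");
        System.out.println("per-segment searches: " + perSegmentSearches(segments, terms));
        System.out.println("top-level searches:   " + topLevelSearches(segments, terms));
    }
}
```

In this model the per-segment count grows with terms times segments while the top-level count grows only with the number of terms, which is the shape of the advantage described above for large term lists.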





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:27 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache, which will apply the filter against the entire 
index and create a DocSet to cache.

Our filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior provides the best solution for certain situations 
where the filter cache is problematic.

 

 





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:28 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache, which will apply the filter against the entire 
index and create a DocSet to cache.

Our filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior provides the best solution for certain situations 
where the filter cache is problematic.

 

 





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:30 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache, which will apply the filter against the entire 
index and create a DocSet to cache.

Our filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior (not cached in filter cache) provides the best 
solution for certain situations where the filter cache is problematic.

 

 





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:52 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache. Solr will apply the filter against the entire 
index and create a DocSet to cache. 

Our filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior (not cached in filter cache) provides the best 
solution for certain situations where the filter cache is problematic.

 

 





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 8:23 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache. Solr will apply the filter against the *entire 
index* and create a DocSet to cache. This will be slow compared to the 
postfilter if the number of search results is small relative to the size of the 
index. That might be acceptable if the filter cache provided a big advantage 
on subsequent requests. But ...

Solr's filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior (not cached in filter cache) provides the best 
solution for certain situations where the filter cache is problematic.

 

 





[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2019-12-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424
 ] 

Joel Bernstein edited comment on SOLR-13890 at 12/23/19 8:24 PM:
-

The other really big aspect of this is caching.

Even though the scorer based filter can be fast if it's applied with the main 
query, in Solr that's not going to happen.

The reason is the filter cache. Solr will apply the filter against the *entire 
index* and create a DocSet to cache. This will be slow compared to the 
postfilter if the number of search results is small relative to the size of the 
index. That might be acceptable if the filter cache provided a big advantage 
on subsequent requests. But ...

Solr's filter cache is top level so it gets dumped after a single document is 
loaded. So in scenarios where there is lots of indexing going on, the filter 
cache becomes problematic.

There are ways around this issue, like turning off caching using local params, 
or not using filter queries. But these approaches are not what users typically 
do with a filter.

So, the postfilter's behavior (not cached in filter cache) provides the best 
solution for certain situations where the filter cache is problematic.
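
For readers unfamiliar with the "turning off caching using local params" workaround mentioned above, the following small Java sketch builds such a filter query string. The cache=false local param and the {!terms f=...} syntax are standard Solr; the class name, the helper methods and the "acl" field are hypothetical and exist only for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Builds an uncached {!terms} filter query string (illustrative helper, not a Solr API). */
final class UncachedTermsFilterSketch {

    /** Returns an fq value asking Solr not to store this filter in the filter cache. */
    static String uncachedTermsFq(String field, List<String> terms) {
        return "{!terms f=" + field + " cache=false}" + String.join(",", terms);
    }

    /** URL-encodes the fq value so it can be appended to a /select request. */
    static String asQueryParam(String fq) {
        return "fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "acl" is a made-up field name for an access-control style filter.
        String fq = uncachedTermsFq("acl", List.of("group1", "group2", "group3"));
        System.out.println(fq);
        System.out.println(asQueryParam(fq));
    }
}
```

As the comment notes, this works but relies on users remembering to add the param to every large filter, which is not what they typically do.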

 

 


> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can have its 
> doc-values for the field in question checked efficiently against the 
> constructed bitset.
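
The check described above can be sketched in plain Java. This is only a toy 
model under stated assumptions, not the code from the attached patches: the 
class name OrdinalBitsetFilterSketch is hypothetical, a sorted String array 
stands in for the field's term dictionary (array index = ordinal), and 
Arrays.binarySearch plays the role that SortedSetDocValues.lookupTerm plays in 
Lucene.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy model of the bitset-of-ordinals check; hypothetical, not Solr's implementation. */
final class OrdinalBitsetFilterSketch {
    // Stand-in for the field's sorted term dictionary: array index == ordinal.
    private final String[] sortedTerms;
    // One bit per ordinal; a set bit means "this term is one of the query terms".
    private final BitSet matchingOrds;

    OrdinalBitsetFilterSketch(String[] sortedTerms, String[] queryTerms) {
        this.sortedTerms = sortedTerms.clone();
        this.matchingOrds = new BitSet(sortedTerms.length);
        for (String term : queryTerms) {
            // Look each query term up once, at filter-creation time.
            int ord = Arrays.binarySearch(this.sortedTerms, term);
            if (ord >= 0) {
                matchingOrds.set(ord); // terms missing from the index are skipped
            }
        }
    }

    /** docOrds holds the ordinals of the values the document has for the field. */
    boolean accepts(int[] docOrds) {
        for (int ord : docOrds) {
            if (matchingOrds.get(ord)) {
                return true; // one matching value is enough
            }
        }
        return false;
    }
}
```

The per-document cost is one bit lookup per value, independent of how many 
terms were in the query, which is why the approach is attractive for hundreds 
or thousands of terms.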



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007636#comment-17007636
 ] 

Joel Bernstein edited comment on SOLR-13890 at 1/3/20 5:28 PM:
---

The problem that I have with the scorer-based filter is only about the use of 
the filter cache. Traditionally, the filter cache is used with filters and is 
not used with postfilters. This is actually a nice split: it's clean, easy to 
implement, and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But it's there and it traps people; I've seen it over and over 
again.

The issue is that as indexes get larger, the top-level filter cache becomes 
untenable because the filter is built against the entire index. Then, when one 
document is indexed, the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control, where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and CPU and 
causes GC issues and performance problems.

So, I have traditionally turned to high-performance postfilters that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion, overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add a segment-level filter cache that is actually fit 
for purpose and then standardize on the filter-based approach. Until we do 
this, though, postfilters provide a simple approach for getting the behavior 
that is needed for these types of large filters.

 


was (Author: joel.bernstein):
The problem that I have with the scorer based filter is only about the use of 
the filter cache. Traditionally with filters the filter cache is used and with 
postfilters the filter cache is not used. This actually a nice split, it's 
clean, easy to implement and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But its there and it traps people, I've seen it over and over 
again.

The issue is that as the indexes get larger the top level filter cache becomes 
untenable because the filter is built against the entire index. Then when one 
document is indexed the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and cpu and 
cause GC issues and performance problems.

So, I have traditionally turned to high performant postfilters, that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is add the segment level filter cache that is actually fit for 
purpose and then standardize on the filter based approach. Until we do this, 
though the postfilters provide a simple approach for getting the behavior that 
is needed for these types of large filters.

 

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can have its 
> doc-values for the field in question checked efficiently against the 
> constructed bitset.




[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007636#comment-17007636
 ] 

Joel Bernstein edited comment on SOLR-13890 at 1/3/20 5:29 PM:
---

The problem that I have with the scorer-based filter is only about the use of 
the filter cache. Traditionally, the filter cache is used with filters and is 
not used with postfilters. This is actually a nice split: it's clean, easy to 
implement, and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But it's there and it traps people; I've seen it over and over 
again.

The issue is that as indexes get larger, the top-level filter cache becomes 
untenable because the filter is built against the entire index. Then, when one 
document is indexed, the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control, where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and CPU and 
causes GC issues and performance problems.

So, I have traditionally turned to high-performance postfilters that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion, overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add a segment-level filter cache that is actually fit 
for purpose and then standardize on the filter-based approach. Until we do 
this, though, postfilters provide a simple approach for getting the behavior 
that is needed for these types of large filters.

 


was (Author: joel.bernstein):
The problem that I have with the scorer based filter is only about the use of 
the filter cache. Traditionally with filters the filter cache is used and with 
postfilters the filter cache is not used. This actually a nice split, it's 
clean, easy to implement and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But its there and it traps people, I've seen it over and over 
again.

The issue is that as the indexes get larger the top level filter cache becomes 
untenable because the filter is built against the entire index. Then when one 
document is indexed the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and cpu and 
cause GC issues and performance problems.

So, I have traditionally turned to high performant postfilters, that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add the segment level filter cache that is actually fit 
for purpose and then standardize on the filter based approach. Until we do 
this, though the postfilters provide a simple approach for getting the behavior 
that is needed for these types of large filters.

 


[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007636#comment-17007636
 ] 

Joel Bernstein edited comment on SOLR-13890 at 1/3/20 5:30 PM:
---

The problem that I have with the scorer-based filter is only about the use of 
the filter cache. Traditionally, the filter cache is used with filters and is 
not used with postfilters. This is actually a nice split: it's clean, easy to 
implement, and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But it's there and it traps people; I've seen it over and over 
again.

The issue is that as indexes get larger, the top-level filter cache becomes 
untenable because the filter is built against the entire index. Then, when one 
document is indexed, the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control, where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and CPU and 
causes GC issues and performance problems.

So, I have traditionally turned to high-performance postfilters that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion, overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add a segment-level filter cache that is actually fit 
for purpose and then standardize on the filter-based approach. Until we do 
this, though, postfilters provide a simple approach for getting the behavior 
that is needed for these types of large filters.
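
To make the invalidation argument concrete, here is a small Java toy model. It 
is a sketch under stated assumptions rather than Solr code: the class 
FilterCacheInvalidationSketch and its fields are hypothetical, segments are 
represented by plain names, and "cost" is counted as the number of segments a 
filter has to be evaluated against. The segment-level cache modeled here is the 
proposed one from the comment above, not something this thread says Solr 
already has.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of top-level vs. segment-level filter caching; hypothetical. */
final class FilterCacheInvalidationSketch {
    int topLevelSegmentScans = 0; // work done when caching one entry per index view
    int perSegmentScans = 0;      // work done when caching one entry per segment

    private List<String> cachedView = null;
    private final Map<String, Boolean> cachedSegments = new HashMap<>();

    /** Runs the same filter against the index view made of the given segments. */
    void runFilter(List<String> segments) {
        // Top-level cache: the entry is only valid for exactly the same view,
        // so any change means evaluating the filter against every segment again.
        if (!segments.equals(cachedView)) {
            topLevelSegmentScans += segments.size();
            cachedView = new ArrayList<>(segments);
        }
        // Segment-level cache: drop entries for segments that went away and
        // evaluate the filter only against segments not seen before.
        cachedSegments.keySet().retainAll(segments);
        for (String segment : segments) {
            if (cachedSegments.putIfAbsent(segment, Boolean.TRUE) == null) {
                perSegmentScans++;
            }
        }
    }
}
```

With three existing segments and one newly flushed segment, the top-level model 
re-evaluates the filter against all four segments, while the segment-level 
model only touches the new one.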

 


was (Author: joel.bernstein):
The problem that I have with the scorer based filter is only about the use of 
the filter cache. Traditionally with filters the filter cache is used and with 
postfilters the filter cache is not used. This actually a nice split, it's 
clean, easy to implement and easy to document. 

For these types of large filters, most of the time, the filter cache is not fit 
for purpose. But its there and it traps people, I've seen it over and over 
again.

The issue is that as the indexes get larger the top level filter cache becomes 
untenable because the filter is built against the entire index. Then when one 
document is indexed the entire cache is dropped and needs to be warmed. This is 
a huge problem for use cases like access control where there are large filters 
that need to get rebuilt for thousands of users. This drains memory and cpu and 
causes GC issues and performance problems.

So, I have traditionally turned to high performant postfilters, that are fast 
enough that caching is not needed for these types of problems. 

Eliminating the postfilter means, in my opinion overloading the behavior of 
traditional filters to have to deal intelligently with caches, or asking the 
user to dive into the details of caching.

The other option is to add the segment level filter cache that is actually fit 
for purpose and then standardize on the filter based approach. Until we do 
this, though the postfilters provide a simple approach for getting the behavior 
that is needed for these types of large filters.

 


[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007737#comment-17007737
 ] 

Joel Bernstein edited comment on SOLR-13890 at 1/3/20 8:11 PM:
---

If we auto-default cache=false then I'm fine with moving forward with the 
top-level docvalues / TPI approach. Let's just give users a terms query that's 
fast and works well with a *large number of terms*, *larger indexes*, and a 
*large number of filters* in a *frequently indexing environment*. These goals 
cannot be achieved with the current filter cache, so let's turn it off by 
default for this implementation.

Then let's get a segment-level filter cache in place.
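
For reference, this is roughly what opting out of the filter cache looks like 
today when a user does it by hand with local params. The Java helper below is 
only an illustrative sketch: the class TermsFilterParams, the method termsFq, 
and the field name acl_group are made up for the example; only the {!terms} 
parser, its f parameter, and the cache=false local param come from Solr itself.

```java
/** Hypothetical helper that builds an fq value for the {!terms} query parser. */
final class TermsFilterParams {
    private TermsFilterParams() {}

    /**
     * Returns a filter-query string such as
     * {!terms f=acl_group cache=false}g1,g2
     * cache=false asks Solr not to store the resulting DocSet in the filter cache.
     */
    static String termsFq(String field, boolean cache, String... terms) {
        StringBuilder sb = new StringBuilder("{!terms f=").append(field);
        if (!cache) {
            sb.append(" cache=false");
        }
        // The terms parser takes a comma-separated list of values by default.
        sb.append('}').append(String.join(",", terms));
        return sb.toString();
    }
}
```

Auto-defaulting cache=false for this implementation would mean users get the 
behavior of the first form without having to know about the local param.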


was (Author: joel.bernstein):
If we auto-default cache=false then I'm fine with moving the top level 
docvalues / TPI approach. Let's just give the users a terms query thats fast 
and works well with a *large number of terms*, *larger indexes, large number of 
filters* in a *frequently indexing environment.* These goals cannot be achieved 
with the current filter cache. So let's turn it off by default for this 
implementation.

Then let's get a segment level filter cache in place.




[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-03 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973952#comment-16973952
 ] 

Mikhail Khludnev edited comment on SOLR-13890 at 1/3/20 8:57 PM:
-

FYI, the automaton method uses the inverted index and works per segment. If the 
number of terms is small, it builds a per-segment disjunction, so it will even 
be lazy if it bypasses the filter cache; otherwise, it eagerly builds a 
per-segment docset.  


was (Author: mkhludnev):
FYI, Automaton uses inverted index, it works per segment. If number of terms is 
small it builds per segment conjunction, thus if it bypasses filtercache it 
will be even lazy, otherwise it eagerly builds per-segment docset.  




[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009170#comment-17009170
 ] 

Jason Gerlowski edited comment on SOLR-13890 at 1/6/20 9:04 PM:


Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.  Things will be even better with SOLR-14166, 
but that doesn't need to block this effort.

Pending more feedback I'll aim to merge this on Wednesday.


was (Author: gerlowskija):
Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 at 2.25.12 PM.png, 
> post_optimize_performance.png, toplevel-tpi-perf-comparison.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can have its 
> doc-values for the field in question checked efficiently against the 
> constructed bitset.






[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009170#comment-17009170
 ] 

Jason Gerlowski edited comment on SOLR-13890 at 1/6/20 9:04 PM:


Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the improved performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.


was (Author: gerlowskija):
Latest patch ties up some of the loose ends I mentioned in my last comment.  
Pending review from you guys, I'm pretty happy pulling the trigger on what 
we've got right now.  We get the good performance I was after without 
introducing another postfilter.

Pending more feedback I'll aim to merge this on Wednesday.




[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009194#comment-17009194
 ] 

Mikhail Khludnev edited comment on SOLR-13890 at 1/6/20 10:08 PM:
--

Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr docsets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible and 
might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via MultiSortedDocValues.mapping.getGlobalOrds().

Also, this query relies on SolrIndexSearcher, but IIRC even in Solr, queries 
are sometimes invoked with Lucene's Searcher. There are some issues with such a 
cast.  


was (Author: mkhludnev):
regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment, using it for top-level iteration 
is something that never happen. fwiw, usually toplevel Solr docsets converted 
to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

adding argument to method {{QueryMethod.makeFilter(String fname, BytesRef[] 
bytesRefs, SolrParams localParams)}} is not something which is backward 
compatible, and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate persegment, and then 
access global ordinals via MultiSortedDocValues.mapping.getGlobalOrds()

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query parsers
> Affects Versions: master (9.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, 
> SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch, Screen Shot 2020-01-02 
> at 2.25.12 PM.png, post_optimize_performance.png, 
> toplevel-tpi-perf-comparison.png
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can have its 
> doc-values for the field in question checked efficiently against the 
> constructed bitset.






[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries

2020-01-06 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009194#comment-17009194
 ] 

Mikhail Khludnev edited comment on SOLR-13890 at 1/6/20 10:17 PM:
--

Regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment; using it for top-level iteration 
is something that never happens. FWIW, top-level Solr docsets are usually 
converted to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

Adding an argument to the method {{QueryMethod.makeFilter(String fname, 
BytesRef[] bytesRefs, SolrParams localParams)}} is not backward compatible and 
might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate per segment, and then 
access global ordinals via MultiSortedDocValues.mapping.getGlobalOrds().

Also, this query relies on SolrIndexSearcher, but IIRC even in Solr, queries 
are sometimes invoked with Lucene's Searcher. There are some issues with such a 
cast; see SOLR-6357.  
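
The "iterate per segment, then map to global ordinals" idea can be illustrated 
with a toy Java model. This is a sketch only and does not use Lucene: the class 
GlobalOrdsSketch is hypothetical, sorted String arrays stand in for each 
segment's term dictionary, and the segToGlobal table plays the role that 
Lucene's OrdinalMap (via its getGlobalOrds(segmentIndex) lookup) plays in the 
real code.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy segment-ordinal to global-ordinal mapping; hypothetical, not Lucene's OrdinalMap. */
final class GlobalOrdsSketch {
    // Merged, sorted dictionary across all segments: array index == global ordinal.
    final String[] globalTerms;
    // segToGlobal[segment][segmentOrd] == global ordinal of that segment-local term.
    private final int[][] segToGlobal;

    /** Each inner array must be sorted and duplicate-free, like a segment's term dictionary. */
    GlobalOrdsSketch(String[][] segmentTerms) {
        TreeSet<String> merged = new TreeSet<>();
        for (String[] terms : segmentTerms) {
            merged.addAll(Arrays.asList(terms));
        }
        globalTerms = merged.toArray(new String[0]);

        segToGlobal = new int[segmentTerms.length][];
        for (int seg = 0; seg < segmentTerms.length; seg++) {
            segToGlobal[seg] = new int[segmentTerms[seg].length];
            for (int ord = 0; ord < segmentTerms[seg].length; ord++) {
                // Every segment term is present in the merged dictionary by construction.
                segToGlobal[seg][ord] = Arrays.binarySearch(globalTerms, segmentTerms[seg][ord]);
            }
        }
    }

    /** Translates a segment-local ordinal into the index-wide ordinal. */
    int globalOrd(int segment, int segmentOrd) {
        return segToGlobal[segment][segmentOrd];
    }
}
```

A per-segment scorer could read each document's segment-local ordinals, 
translate them with a table like this, and test the result against a bitset 
built once over global ordinals.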


was (Author: mkhludnev):
regarding {{PerSegmentViewDocIdSetIterator}}: I don't follow. Lucene's 
{{DocIdSetIterator}} is strictly per-segment, using it for top-level iteration 
is something that never happen. fwiw, usually toplevel Solr docsets converted 
to Lucene's DocIdSets via {{DocSet.getTopFilter()}}. 

adding argument to method {{QueryMethod.makeFilter(String fname, BytesRef[] 
bytesRefs, SolrParams localParams)}} is not something which is backward 
compatible, and might frustrate other devs.  

Note: {{TopLevelDocValuesTermsQuery}} uses {{OrdinalMap}} via 
{{getSlowAtomicReader()}}. It might be clearer to iterate persegment, and then 
access global ordinals via MultiSortedDocValues.mapping.getGlobalOrds()

Also, this query relies on SolrIndexSearcher, but iirc even in Solr queries 
sometimes invoked with Lucene's Searcher. There's some issues with such cast.  
