RE: Post-sort filtering

Steve Molloy Mon, 04 Feb 2013 06:36:03 -0800

BTW, I've logged SOLR-4397 for this and submitted a first patch (based on the 
4.1 tag, which is what we use). It still needs logic to respect timeAllowed, 
and I'd like a better way of handling missing results than going back and 
restarting with a larger request, but it works for now, so I guess it's a start.

Steve Molloy                              [email protected]
Software Architect  |  Information Discovery & Analytics R&D               
OpenText                      

-----Original Message-----
From: Steve Molloy [mailto:[email protected]] 
Sent: January-24-13 1:16 PM
To: [email protected]
Subject: RE: Post-sort filtering

I was actually looking for an extension point to plug into, which I wasn't able 
to find in the code. And yes, I'm willing to have the counts be off; the 
important thing is that results never contain a document the requester should 
not see. I'd like to avoid oversampling and re-requesting because of the 
bandwidth and overall resource usage this implies. I'm currently trying out a 
"PostSortFilter" approach that I'll share if it seems interesting enough.
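
To make the idea concrete, here is a rough sketch of filtering after sorting. 
It is not the SOLR-4397 patch and uses no Solr API; post_sort_filter, 
sorted_ids and is_allowed are hypothetical names, and is_allowed stands in for 
the expensive external security check. It walks the already-sorted IDs, checks 
each one only until the page is full, then stops checking.

```python
from typing import Callable, Iterable, List, Tuple


def post_sort_filter(
    sorted_ids: Iterable[str],
    is_allowed: Callable[[str], bool],
    rows: int,
) -> Tuple[List[str], int]:
    """Return (page, checks): the first `rows` allowed IDs in sort order,
    and how many expensive checks were needed to find them.

    Assumption: `sorted_ids` is already in final sort order and
    `is_allowed` is the costly external ACL lookup (hypothetical).
    """
    page: List[str] = []
    checks = 0
    for doc_id in sorted_ids:
        if len(page) >= rows:
            break  # page is full: stop calling the expensive check
        checks += 1
        if is_allowed(doc_id):
            page.append(doc_id)
    return page, checks


if __name__ == "__main__":
    allowed = {"b", "d", "e", "f"}
    ranked = ["a", "b", "c", "d", "e", "f"]
    print(post_sort_filter(ranked, allowed.__contains__, rows=2))
```

The number of checks depends on how many hidden documents sit near the top of 
the sort order rather than on the total number of matches, which is the point 
of filtering after the sort. Total counts are left untouched and so may 
include documents the requester cannot see.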

Steve Molloy
Software Architect  |  Information Discovery & Analytics R&D
Website:
www.opentext.com

-----Original Message-----
From: Erick Erickson [mailto:[email protected]]
Sent: January-24-13 1:11 PM
To: [email protected]
Subject: Re: Post-sort filtering

This has some problems. First, your facet, group, num hits, etc.
counts will be off for that user. Second, you can't sort without having all of 
the documents, so unless you're willing to have your counts be off, you really 
have to pay the price of post-filtering everything.

If you can live with the counts being off, consider just having the application 
do a couple of round-trips. Get the docs (oversample, say just get the IDs for 
the top 100 docs) _without_ any kind of ACL filtering. Then send those docs 
back to the server with the ACL filtering. If you don't get enough to fill up a 
response, get the next page of 100, and so on.
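
The round-trip loop described above could be sketched on the application side 
roughly as follows. This is only an illustration under stated assumptions: 
fetch_ids and filter_visible are hypothetical callables standing in for the 
unfiltered ID query and the ACL-filtered query, and no real Solr client API is 
implied.

```python
from typing import Callable, List, Sequence


def fetch_visible_page(
    fetch_ids: Callable[[int, int], Sequence[str]],
    filter_visible: Callable[[Sequence[str]], Sequence[str]],
    rows: int,
    batch_size: int = 100,
    max_batches: int = 10,
) -> List[str]:
    """Collect up to `rows` visible IDs, preserving the original sort order.

    fetch_ids(start, count): hypothetical call returning sorted IDs with
        no ACL filtering applied.
    filter_visible(ids): hypothetical call returning the subset of `ids`
        the user may see (the ACL-filtered second round-trip).
    max_batches bounds the number of round-trips (an added safeguard).
    """
    visible: List[str] = []
    start = 0
    for _ in range(max_batches):
        batch = fetch_ids(start, batch_size)
        if not batch:
            break  # no more matching documents
        allowed = set(filter_visible(batch))
        # Keep the order of the unfiltered, sorted batch.
        visible.extend(doc_id for doc_id in batch if doc_id in allowed)
        if len(visible) >= rows:
            break  # enough to fill the response
        start += len(batch)
    return visible[:rows]
```

Each pass costs two requests, so the batch size trades bandwidth against the 
number of round-trips; counts reported by the unfiltered query will still be 
off for that user.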

Finally, the users' list is a better place for this kind of question; this list 
is for discussing development of the code.

Best
Erick

On Wed, Jan 23, 2013 at 9:05 AM, Steve Molloy <[email protected]> wrote:
> Hi,
>
>     I'm looking for a way to apply filtering that unfortunately 
> carries a high cost because it needs to access external resources (for 
> security). I looked at (and tried) the PostFilter technique, which 
> offers some advantages, but still implies a lot of matches in a lot of 
> cases. What I'd like to be able to do is to filter out ids until I 
> have enough to fill the response, then stop filtering (and accept 
> everything). The idea is that the total count is not as important; 
> the main thing is that results should not contain documents the 
> requester should not see. So, the post filter almost does the trick, 
> except that it runs before sorting, so the first X documents in the 
> sorted results are not the same as the first X the post filter sees.
>
> Is there a way to filter out documents after they have been scored and 
> sorted?
>
> Thanks,
> Steve
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected] For additional 
commands, e-mail: [email protected]
