Re: on-the-fly "filters" from docID lists

Mark Harwood Thu, 22 Jul 2010 23:56:33 -0700

Re scalability of filter construction: the database is likely to hold stable 
primary keys, not Lucene doc IDs, which are unstable in the face of updates. You 
therefore need a quick way of converting stable database keys read from the db 
into current Lucene doc IDs to create the filter. That could involve a lot of 
disk seeks unless you cache a pk->docID lookup in RAM. You should use 
CachingWrapperFilter too, to cache the computed user permissions from one 
search to the next. 
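
As a rough illustration of the pk->docID lookup held in RAM, here is a minimal 
sketch in plain Java. It makes no Lucene calls: the class PkDocIdCache, its 
method names, and the use of java.util.BitSet as a stand-in for the filter's 
bit set are all assumptions made for this example. It also assumes the primary 
key of every live doc has already been read out of the index once, in doc ID 
order.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-RAM lookup from stable database keys to current doc IDs. */
class PkDocIdCache {
    private final Map<String, Integer> pkToDocId = new HashMap<>();
    private final int maxDoc;

    /** pksByDocId[i] is the primary key of doc ID i; null marks a deleted doc. */
    PkDocIdCache(String[] pksByDocId) {
        this.maxDoc = pksByDocId.length;
        for (int docId = 0; docId < pksByDocId.length; docId++) {
            if (pksByDocId[docId] != null) {
                pkToDocId.put(pksByDocId[docId], docId);
            }
        }
    }

    /** Turns the keys a user may see into a bit set with one bit per allowed doc ID. */
    BitSet toFilterBits(Iterable<String> allowedPks) {
        BitSet bits = new BitSet(maxDoc);
        for (String pk : allowedPks) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {   // keys that are not in the index are skipped
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

Because doc IDs change when the index is updated, a cache like this would have 
to be rebuilt whenever the index is reopened.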
This can get messy. If the access permissions are centred around roles/groups, 
it is normally faster to tag docs with these group names and query them with 
the list of roles the user holds. 
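
One way to picture the role-tagging approach is as an extra required clause 
added to the user's query. The sketch below only builds a query string. The 
field name "group", the class RoleQuery, and the method restrictToRoles are 
hypothetical, and it assumes role names are plain alphanumeric tokens that need 
no escaping in the query syntax.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper that restricts a query to docs tagged with the user's roles. */
class RoleQuery {
    /** Requires the user's query AND at least one of the user's roles in the "group" field. */
    static String restrictToRoles(String userQuery, List<String> roles) {
        if (roles.isEmpty()) {
            throw new IllegalArgumentException("user holds no roles, nothing can match");
        }
        StringJoiner anyRole = new StringJoiner(" OR ", "(", ")");
        for (String role : roles) {
            anyRole.add("group:" + role);   // assumes role needs no escaping
        }
        return "+(" + userQuery + ") +" + anyRole;
    }
}
```

For a user holding the roles staff and editors, 
restrictToRoles("content:\"cars\"", List.of("staff", "editors")) would produce 
+(content:"cars") +(group:staff OR group:editors).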
If individual user-doc-level perms are required, you could also consider 
dynamically looking up perms for just the top n results being shown, at the risk 
of needing to repeat the query with a larger n if insufficient matches pass the 
lookup.
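
The top-n idea could look roughly like the loop below. This is a sketch under 
assumptions, not a Lucene API: the search function (returns the first n hits in 
rank order), the isPermitted check (the per-doc permission lookup), and the 
choice to double n on each retry are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Predicate;

/** Hypothetical post-filtering of ranked hits with a retry at a larger n. */
class TopNPermissionCheck {
    /**
     * Returns up to 'wanted' permitted hits in rank order.
     * search.apply(n) is assumed to return the top n hits, or fewer if the
     * query has fewer matches in total.
     */
    static <T> List<T> firstPermitted(IntFunction<List<T>> search,
                                      Predicate<T> isPermitted,
                                      int wanted) {
        if (wanted <= 0) {
            return new ArrayList<>();
        }
        int n = wanted;
        while (true) {
            List<T> hits = search.apply(n);
            List<T> permitted = new ArrayList<>();
            for (T hit : hits) {
                if (isPermitted.test(hit)) {
                    permitted.add(hit);
                    if (permitted.size() == wanted) {
                        return permitted;       // enough matches passed the lookup
                    }
                }
            }
            if (hits.size() < n) {
                return permitted;               // no more matches to fetch
            }
            n *= 2;                             // repeat the query with a larger n
        }
    }
}
```

Each retry re-runs the whole query and re-checks hits that were already looked 
up, so this only pays off when most of the top hits are visible to the user.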


Cheers 
Mark
----------------------------------------


On 23 Jul 2010, at 01:55, Michael McCandless <luc...@mikemccandless.com> wrote:

> Well, Lucene can apply such a filter rather quickly; but, your custom
> code first has to build it... so it's really a question of whether
> your custom code can build up / iterate the filter scalably.
> 
> Mike
> 
> On Thu, Jul 22, 2010 at 4:37 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
>> Hi Mike and Martin,
>> 
>> We have a similar use-case.   Is there a scalability/performance issue with 
>> the getDocIdSet having to iterate through hundreds of thousands of docIDs?
>> 
>> Tom Burton-West
>> http://www.hathitrust.org/blogs/large-scale-search
>> 
>> -----Original Message-----
>> From: Michael McCandless [mailto:luc...@mikemccandless.com]
>> Sent: Thursday, July 22, 2010 5:20 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: on-the-fly "filters" from docID lists
>> 
>> It sounds like you should implement a custom Filter?
>> 
>> Its getDocIdSet would consult your foreign key-value store and iterate
>> through the allowed docIDs, per segment.
>> 
>> Mike
>> 
>> On Wed, Jul 21, 2010 at 8:37 AM, Martin J <martinj.eng...@gmail.com> wrote:
>>> Hello, we are trying to implement a query type for Lucene (with eventual
>>> target being Solr) where the query string passed in needs to be "filtered"
>>> through a large list of document IDs per user. We can't store the user ID
>>> information in the lucene index per document so we were planning to pull the
>>> list of documents owned by user X from a key-value store at query time and
>>> then build some sort of filter in memory before doing the Lucene/Solr query.
>>> For example:
>>> 
>>> content:"cars" user_id:X567
>>> 
>>> would first pull the list of docIDs that user_id:X567 has "access" to from a
>>> keyvalue store and then we'd query the main index with content:"cars" but
>>> only allow the docIDs that came back to be part of the response. The list of
>>> docIDs can near the hundreds of thousands.
>>> 
>>> What should I be looking at to implement such a feature?
>>> 
>>> Thank you
>>> Martin
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
>> 
> 
> 

