Re: Solr's Filtering approaches

2013-10-12 Thread Roman Chyla
David,
We have a similar query in astrophysics: a user can select an area of the
sky, and there are many stars out there.

I am long overdue in creating a Jira issue, but here is another
efficient mechanism for searching a large number of ids:

https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java

Roman



Re: Solr's Filtering approaches

2013-10-11 Thread David Philip
Groups are pharmaceutical research experiments. The user is presented with a
graph view; he can select a region, and all the groups in that region get
included. The user can also modify the groups there, so we did not keep the
group information in the same Solr index; we externalized it.
I looked at the post filter article. My understanding is that I simply have
to extend the collector as you did and include an implementation of
isAllowed(acls[doc], groups). This will filter the documents in the
collector, and finally this collector will be returned. Am I right?

  @Override
  public void collect(int doc) throws IOException {
    if (isAllowed(acls[doc], user, groups)) super.collect(doc);
  }
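The delegation idea in the snippet above can be sketched in isolation. This is a minimal illustration, not Solr code: `DocCollector`, `GroupFilteringCollector` and the `groupOfDoc` lookup are hypothetical stand-ins, and a real post filter would extend Solr's `DelegatingCollector` instead of the plain interface used here.

```java
import java.util.Set;

/** Stand-in for a collector; NOT Solr's API, only the same delegation shape. */
interface DocCollector {
    void collect(int doc);
}

/** Forwards a document to the wrapped collector only if its group was selected. */
final class GroupFilteringCollector implements DocCollector {
    private final DocCollector delegate;
    // Hypothetical per-document lookup, indexed by doc id (plays the role of acls[doc]).
    private final String[] groupOfDoc;
    private final Set<String> selectedGroups;

    GroupFilteringCollector(DocCollector delegate, String[] groupOfDoc, Set<String> selectedGroups) {
        this.delegate = delegate;
        this.groupOfDoc = groupOfDoc;
        this.selectedGroups = selectedGroups;
    }

    @Override
    public void collect(int doc) {
        // Documents whose group was not selected are dropped here and never reach the delegate.
        if (selectedGroups.contains(groupOfDoc[doc])) {
            delegate.collect(doc);
        }
    }
}
```

Because the check runs only for documents that already matched the main query, the cost depends on the number of hits rather than on the number of selected groups, provided the group lookup is a hash set.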


Erick, I would like to know whether I can extend any class that returns
only the bitset of the documents that match the search query. I could then
compute bitset1.and(bitset2OfGroups) and finally collect only those
documents to return to the user. How do I try this approach? Any pointers
on bitsets?
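As a rough illustration of the intersection asked about above, here is a sketch using the JDK's `java.util.BitSet`, where each set bit stands for a matching document id. It is only an assumption-laden sketch: Lucene and Solr have their own bitset and DocSet types, which it does not use, and the names `BitSetIntersection`, `intersect` and `docIds` are made up for the example.

```java
import java.util.BitSet;

final class BitSetIntersection {
    /** Returns the doc ids set in both inputs; neither input is modified. */
    static BitSet intersect(BitSet queryMatches, BitSet groupMatches) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(groupMatches); // in-place AND on the copy
        return result;
    }

    /** Lists the set bits in ascending order, i.e. the surviving doc ids. */
    static int[] docIds(BitSet bits) {
        int[] ids = new int[bits.cardinality()];
        int n = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            ids[n++] = doc;
        }
        return ids;
    }
}
```

The AND itself is cheap; the harder part in practice is producing the second bitset, because the externally stored group selection has to be mapped to index-internal document ids first.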

Thanks - David







Re: Solr's Filtering approaches

2013-10-10 Thread Erick Erickson
Well, my first question is why 50K groups is necessary, and
whether you can simplify that. How a user can manually
choose from among that many groups is interesting. But
assuming they're all necessary, I can think of two things.

If the user can only select ranges, just put in filter queries
using ranges. Or possibly both ranges and individual entries,
as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc.
You need to be a little careful how you index these so
range queries work properly. In the above you'd miss
2A because it's sorting lexicographically, so you'd need to
store in some form that sorts like 001A 01A
and so on. You wouldn't need to show that form to the
user, just form your fq's in the app to work with
that form.
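
One way to read the padding advice above, as a sketch only: the application left-pads the numeric part of each group id to a fixed width before indexing and before building the fq, so that string order matches numeric order. The id shape (digits followed by letters), the width of 3, and the names `GroupKeys`, `pad` and `filterQuery` are assumptions for illustration, not part of Solr.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pads ids such as "2A" so they sort correctly as strings. */
final class GroupKeys {
    // Assumed id shape: one or more digits followed by optional letters.
    private static final Pattern ID = Pattern.compile("(\\d+)([A-Za-z]*)");

    /** With width 3, "2A" becomes "002A" and "10A" becomes "010A". */
    static String pad(String groupId, int width) {
        Matcher m = ID.matcher(groupId);
        if (!m.matches() || m.group(1).length() > width) {
            throw new IllegalArgumentException("unexpected group id: " + groupId);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = m.group(1).length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(m.group(1)).append(m.group(2)).toString();
    }

    /** Builds an fq value combining one range and some individual ids, all padded. */
    static String filterQuery(String field, String from, String to, List<String> singles, int width) {
        StringBuilder fq = new StringBuilder();
        fq.append(field).append(":[")
          .append(pad(from, width)).append(" TO ").append(pad(to, width)).append("]");
        if (!singles.isEmpty()) {
            fq.append(" OR ").append(field).append(":(");
            for (int i = 0; i < singles.size(); i++) {
                if (i > 0) fq.append(' ');
                fq.append(pad(singles.get(i), width));
            }
            fq.append(')');
        }
        return fq.toString();
    }
}
```

The user-facing ids stay unpadded; only the indexed field and the generated filter query use the padded form.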

If that won't work (you wouldn't want this to get huge), think
about a post filter that would only operate on documents that
had made it through the select, although how to convey which
groups the user selected to the post filter is an open
question.

Best,
Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip
davidphilipshe...@gmail.com wrote:
 Hi All,

 I have an issue handling filters for one of our requirements and
 would like suggestions for the best approaches.


 *Use Case:*

 1.  We have a list of groups, and the number of groups can increase up to 1
 million. Currently we have almost 90 thousand groups in the Solr search
 system.

 2.  Just before the user runs a search, he has the option to select the
 number of groups he wants to retrieve. [The distinct list of these group
 names for display is retrieved from another Solr index that has more
 information about groups.]

 *3. User Operation:*
 Say the user selected group 1A  - group 1A and searches for key:cancer.


 The current approach I am considering is: get the search results and apply a
 filter query with the list of group ids selected by the user. My concern is
 that when this list grows to 50k unique ids, it can cause a lot of delay
 in getting search results. So I wanted to know whether there are other
 filtering approaches that I can try.

 I was also thinking of one more approach, suggested by my colleague: do an
 intersection.
 - Get the group ids selected by the user.
 - Get the list of group ids from the search results.
 - Perform an intersection of both and then get the entire result set for only
   the group ids that intersected.
 Is this a better way? Can I use any caching technique in this case?


 - David.