SolrDocumentList - bitwise operation

2013-10-13 Thread Michael Tyler
Hello,

I have two different Solr indexes, each returning its own
SolrDocumentList. The document ID is the foreign key relating them.

After obtaining them, I want to perform an "AND" operation between them and
then return the results to the user. Can you tell me how to do this? I am
using Solr 4.3.

 SolrDocumentList results1 = responseA.getResults();
 SolrDocumentList results2 = responseB.getResults();

results1  : d1, d2, d3
results2  :  d1,d2, d4

Return : d1, d2

Regards,
Michael


Re: SolrDocumentList - bitwise operation

2013-10-14 Thread Michael Tyler
Hi Shawn,

  This is a time-consuming operation. I already have this in my application.
I was wondering whether I can get a bitset from each of the Solr indexes,
call bitset.and, and then retrieve only the documents that matched. I don't
know how to retrieve the bitset. I wanted to try this and test the performance.


Regards
Michael


On Sun, Oct 13, 2013 at 8:54 PM, Shawn Heisey wrote:

> On 10/13/2013 8:34 AM, Michael Tyler wrote:
> > Hello,
> >
> > I have 2 different solr indexes returning 2 different sets of
> > SolrDocumentList. Doc Id is the foreign key relation.
> >
> > After obtaining them, I want to perform "AND" operation between them and
> > then return results to user. Can you tell me how do I get this? I am
> using
> > solr 4.3
> >
> >  SolrDocumentList results1 = responseA.getResults();
> >  SolrDocumentList results2 = responseB.getResults();
> >
> > results1  : d1, d2, d3
> > results2  :  d1,d2, d4
>
> The SolrDocumentList class extends ArrayList<SolrDocument>, which means
> that it inherits all ArrayList functionality.  Unfortunately, there's no
> built-in way of eliminating duplicates with a java List.  It's very easy
> to combine the two results into another object, but that object will
> contain both of the d1 and both of the d2 SolrDocument objects.
>
> The following code is a reasonably fast way to handle this.  It assumes
> that results1 is the list that should win when there are duplicates, so
> it gets added first.  It assumes that the uniqueKey field is named "id"
> and that it contains a String value.  If these are incorrect
> assumptions, you can adjust the code accordingly.
>
> SolrDocumentList results1 = responseA.getResults();
> SolrDocumentList results2 = responseB.getResults();
> List<SolrDocumentList> tmpList = new ArrayList<SolrDocumentList>();
> tmpList.add(results1);
> tmpList.add(results2);
>
> // ids already seen; the first list added wins on duplicates
> Set<String> tmpSet = new HashSet<String>();
> SolrDocumentList newList = new SolrDocumentList();
> for (SolrDocumentList l : tmpList)
> {
>     for (SolrDocument d : l)
>     {
>         String id = (String) d.get("id");
>         if (tmpSet.contains(id)) {
>             continue;
>         }
>         tmpSet.add(id);
>         newList.add(d);
>     }
> }
>
> Thanks,
> Shawn
>
>

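Note that the snippet quoted above merges the two lists and drops duplicates
(d1, d2, d3, d4), while the "AND" in the original question is an intersection
(d1, d2). A minimal sketch of the intersection follows, under the same
assumption that the key field is named "id". The class and method names are
made up for illustration. The documents are typed as Map<String, Object>, which
SolrDocument implements, so a SolrDocumentList should be accepted as either
argument.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of SolrJ.
final class IntersectById {
    /** Returns the documents of first whose "id" also occurs in second, in first's order. */
    static <D extends Map<String, Object>> List<D> intersect(
            List<? extends D> first, List<? extends Map<String, Object>> second) {
        // Collect the ids of the second list once, so each lookup is a hash probe.
        Set<Object> idsInSecond = new HashSet<Object>();
        for (Map<String, Object> doc : second) {
            Object id = doc.get("id");
            if (id != null) {
                idsInSecond.add(id);
            }
        }
        // Keep only the documents of the first list whose id was seen above.
        List<D> both = new ArrayList<D>();
        for (D doc : first) {
            Object id = doc.get("id");
            if (id != null && idsInSecond.contains(id)) {
                both.add(doc);
            }
        }
        return both;
    }
}
```

With the lists from the question this would be called as
IntersectById.intersect(results1, results2). The work is linear in the combined
size of the two lists, so if it is still slow, the time is probably going into
fetching the results rather than into the comparison.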

Solr DocValues - String

2013-10-14 Thread Michael Tyler
Hi All,

  I wanted to learn more about docValues. I did a fair amount of Google
searching, but I still don't understand how to use docValues as column
fields.

How can we use them as column-stride fields?

Right now we have a small amount of data in HBase, and we are thinking of
moving it to Solr itself if we can use the docValues feature (adding columns
dynamically).


I read a DataStax article about Solr docValues.
In it, a field is created that copies the values of all the other
fields into a common "all" field, and searches then run only on that field.
I understand this part, and we have the same feature in our system. But I did
not understand why docValues="true" is added to it. What is the advantage?


The schema for that is:

[schema XML not preserved in the archive; only the values "name" and "id" survive]


Link : 
http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_load_data


Thanks

Michael.


Re: SolrDocumentList - bitwise operation

2013-10-17 Thread Michael Tyler
Hi,

   Apologies, I was confused about bitsets. I already have Shawn's suggested
approach in my system. I want to try other ways and test their performance.

How can I use a join? I have 2 different Solr indexes:
localhost:8080/solr_1/select?q=content:test&fl=id,name,type
localhost:8081/solr_1_1/select?q=text:test&fl=id

After getting the results, I join them by id.

How do I do this? Please suggest other ways to do it; the current
method takes a lot of time.

Thanks
Michael.

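For reference, Solr 4.x has a join query parser that can be used as a filter,
which would let Solr do the "join by id" itself. A rough sketch of the filter
value follows. It assumes the second index is a core named "solr_1_1" (a guess
from the URL above) and that the key field is "id" in both. As far as I know,
fromIndex only works when both cores run inside the same Solr instance, so with
the two ports shown above the indexes would first have to be hosted together.
The class and method names are made up for illustration.

```java
// Hypothetical helper that only builds the fq parameter value; it does not talk to Solr.
final class JoinFilterSketch {
    /**
     * Builds a join filter: keep documents of the queried core whose "id"
     * equals the "id" of some document in fromCore that matches fromQuery.
     */
    static String joinFilter(String fromCore, String fromQuery) {
        return "{!join from=id to=id fromIndex=" + fromCore + "}" + fromQuery;
    }
}
```

The request against the first core would then be q=content:test plus
fq=<the value built above>, with the value URL-encoded when sent over plain HTTP.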
On Tue, Oct 15, 2013 at 11:41 PM, Erick Erickson wrote:

> Why do you think a bitset would help? Bitsets have
> a bit set on for every document that matches
> based on the _internal_ Lucene document ID, it
> has nothing to do with the <uniqueKey> you have
> defined. Nor does it have anything to do with the
> foreign key relationship.
>
> So either I don't understand the problem at all or
> pursuing bitsets is a red herring.
>
> You might be substantially faster by sorting the
> results and then doing a skip-list sort of thing.
>
> FWIW,
> Erick
>
>
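
Erick's suggestion above of sorting the results and then stepping through them
together can be sketched roughly as follows. This is only one reading of
"a skip-list sort of thing": both id lists are sorted and then walked with two
cursors, advancing whichever side is behind. It assumes the ids are Strings and
unique within each list. The class and method names are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of SolrJ.
final class SortedIdIntersection {
    /** Returns the ids present in both lists, in ascending order. */
    static List<String> intersectSorted(List<String> idsA, List<String> idsB) {
        // Sort copies so the caller's lists are left untouched.
        List<String> a = new ArrayList<String>(idsA);
        List<String> b = new ArrayList<String>(idsB);
        Collections.sort(a);
        Collections.sort(b);

        List<String> common = new ArrayList<String>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                // Same id on both sides: keep it and advance both cursors.
                common.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // a is behind, skip ahead in a
            } else {
                j++; // b is behind, skip ahead in b
            }
        }
        return common;
    }
}
```

If both result sets already arrive sorted by id in the same order, the two
Collections.sort calls can be dropped and only the merge loop remains.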