Re: SolrDocumentList - bitwise operation

2013-10-17 Thread Michael Tyler
Hi,

   Apologies, I was confused about bitsets. I already have Shawn's suggested
approach in my system. I want to try other approaches and compare performance.

How can I use a join? I have 2 different Solr indexes:
localhost:8080/solr_1/select?q=content:test&fl=id,name,type
localhost:8081/solr_1_1/select?q=text:test&fl=id

After getting the results, I want to join them by id.

How do I do this? Please suggest other ways to do it. The current
method takes a lot of time.

Thanks
Michael.


Re: SolrDocumentList - bitwise operation

2013-10-15 Thread Erick Erickson
Why do you think a bitset would help? A bitset has
a bit set for every document that matches, indexed
by the _internal_ Lucene document ID. It
has nothing to do with the uniqueKey you have
defined, nor does it have anything to do with the
foreign key relationship.

So either I don't understand the problem at all or
pursuing bitsets is a red herring.

You might be substantially faster by sorting the
results and then doing a skip-list sort of thing.
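
One way to read the suggestion above is a plain merge over two id lists that
are already sorted. The sketch below is only an illustration under that
assumption: it is a simple two-pointer merge rather than a true skip list, the
class and method names are made up for the example, and it works on bare id
strings instead of SolrDocument objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortedIdIntersection {

    /** Returns the ids present in both lists. Both inputs must be sorted ascending. */
    public static List<String> intersectSorted(List<String> a, List<String> b) {
        List<String> common = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                // Same id on both sides: keep it and advance both cursors.
                common.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // a is behind, move it forward
            } else {
                j++; // b is behind, move it forward
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Ids taken from the example in this thread.
        List<String> ids1 = new ArrayList<>(Arrays.asList("d3", "d1", "d2"));
        List<String> ids2 = new ArrayList<>(Arrays.asList("d4", "d2", "d1"));
        Collections.sort(ids1);
        Collections.sort(ids2);
        System.out.println(intersectSorted(ids1, ids2)); // prints [d1, d2]
    }
}
```

If both requests can be sorted by id on the Solr side (for example with the
sort parameter, assuming the id field is sortable), the client-side sort
can be skipped and each list is walked only once.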

FWIW,
Erick


Re: SolrDocumentList - bitwise operation

2013-10-14 Thread Michael Tyler
Hi Shawn,

  This is a time-consuming operation, and I already have it in my application.
I was wondering whether I can get a bitset from each of the Solr indexes,
AND them together (bitset.and), and then retrieve only the matching documents.
I don't know how to retrieve the bitset; I wanted to try this and test the performance.


Regards
Michael


Re: SolrDocumentList - bitwise operation

2013-10-13 Thread Liu Bo
A join query might be helpful: http://wiki.apache.org/solr/Join

A join can work across indexes but probably won't work in SolrCloud.

Be aware that only the "to" documents are retrievable. If you want content from
both documents, a join query won't work. Also, in Lucene the join query doesn't
quite work with multiple join conditions; I haven't tested that in Solr yet.
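
As a rough illustration of what such a request could look like, the sketch
below builds a select URL that uses the join query parser as a filter. It
assumes both indexes are cores inside the same Solr instance, which is what
the fromIndex parameter expects; the two indexes in this thread run on
different ports, so this would not apply to that setup as-is. The core name
"otherCore" and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class JoinQueryUrl {

    /** URL-encodes a parameter value as UTF-8. */
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a select URL whose main query runs on the local core and whose
     * filter keeps only documents with an id that also matches text:test in
     * the other core (hypothetical name passed in as fromCore).
     */
    public static String buildJoinUrl(String baseUrl, String fromCore) {
        String q = "content:test";
        String fq = "{!join from=id to=id fromIndex=" + fromCore + "}text:test";
        return baseUrl + "/select?q=" + enc(q)
                + "&fq=" + enc(fq)
                + "&fl=" + enc("id,name,type");
    }

    public static void main(String[] args) {
        System.out.println(buildJoinUrl("http://localhost:8080/solr_1", "otherCore"));
    }
}
```

Only fields from the core that receives the request come back, which matches
the caveat above about the "to" documents.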

I had a join case similar to yours; in the end I chose to denormalize our
data into one set of documents.


On 13 October 2013 22:34, Michael Tyler michaeltyler1...@gmail.com wrote:

 Hello,

 I have two different Solr indexes returning two different
 SolrDocumentList result sets. The document id is the foreign key relation.

 After obtaining them, I want to perform an AND operation between them and
 then return the results to the user. Can you tell me how to do this? I am using
 Solr 4.3.

  SolrDocumentList results1 = responseA.getResults();
  SolrDocumentList results2 = responseB.getResults();

 results1  : d1, d2, d3
 results2  :  d1,d2, d4

 Return : d1, d2

 Regards,
 Michael




-- 
All the best

Liu Bo


Re: SolrDocumentList - bitwise operation

2013-10-13 Thread Shawn Heisey
On 10/13/2013 8:34 AM, Michael Tyler wrote:
 Hello,
 
 I have two different Solr indexes returning two different
 SolrDocumentList result sets. The document id is the foreign key relation.
 
 After obtaining them, I want to perform an AND operation between them and
 then return the results to the user. Can you tell me how to do this? I am using
 Solr 4.3.
 
  SolrDocumentList results1 = responseA.getResults();
  SolrDocumentList results2 = responseB.getResults();
 
 results1  : d1, d2, d3
 results2  :  d1,d2, d4

The SolrDocumentList class extends ArrayList<SolrDocument>, which means
that it inherits all ArrayList functionality.  Unfortunately, there's no
built-in way of eliminating duplicates with a Java List.  It's very easy
to combine the two results into another object, but that object will
contain both of the d1 and both of the d2 SolrDocument objects.

The following code is a reasonably fast way to handle this.  It assumes
that results1 is the list that should win when there are duplicates, so
it gets added first.  It assumes that the uniqueKey field is named id
and that it contains a String value.  If these are incorrect
assumptions, you can adjust the code accordingly.

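import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

SolrDocumentList results1 = responseA.getResults();
SolrDocumentList results2 = responseB.getResults();
// results1 is added first so that its documents win on duplicate ids.
List<SolrDocumentList> tmpList = new ArrayList<SolrDocumentList>();
tmpList.add(results1);
tmpList.add(results2);

// Ids seen so far; used to skip documents that were already added.
Set<String> tmpSet = new HashSet<String>();
SolrDocumentList newList = new SolrDocumentList();
for (SolrDocumentList l : tmpList)
{
    for (SolrDocument d : l)
    {
        String id = (String) d.get("id");
        if (tmpSet.contains(id)) {
            continue;
        }
        tmpSet.add(id);
        newList.add(d);
    }
}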
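
The code above combines the two lists and drops duplicates. If the goal is
the AND from the original question (d1, d2 only), a hash-based intersection
on the id field is one option. This is a minimal sketch, not part of the
original reply: it uses plain Map<String, Object> objects as stand-ins for
SolrDocument so that it runs without SolrJ, and it assumes the key field is
named "id". The class and helper names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdIntersection {

    /** Keeps the documents of first whose "id" also occurs in second, in first's order. */
    public static List<Map<String, Object>> intersectById(List<Map<String, Object>> first,
                                                          List<Map<String, Object>> second) {
        // Collect the ids of the second list for constant-time lookups.
        Set<Object> idsInSecond = new HashSet<>();
        for (Map<String, Object> d : second) {
            idsInSecond.add(d.get("id"));
        }
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> d : first) {
            if (idsInSecond.contains(d.get("id"))) {
                result.add(d);
            }
        }
        return result;
    }

    /** Hypothetical helper: builds a stand-in document holding only an id. */
    public static Map<String, Object> doc(String id) {
        Map<String, Object> d = new HashMap<>();
        d.put("id", id);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> results1 = new ArrayList<>();
        results1.add(doc("d1"));
        results1.add(doc("d2"));
        results1.add(doc("d3"));
        List<Map<String, Object>> results2 = new ArrayList<>();
        results2.add(doc("d1"));
        results2.add(doc("d2"));
        results2.add(doc("d4"));
        System.out.println(intersectById(results1, results2)); // prints [{id=d1}, {id=d2}]
    }
}
```

Each list is walked once, so the cost grows with the combined size of the two
result sets rather than with their product.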

Thanks,
Shawn