Search multiple cores, one result

2014-09-22 Thread Clemens Wyss DEV
As mentioned in another post, we (already) have a (Lucene-based) generic 
indexing framework which allows any source/entity to provide 
indexable/searchable data.
Sources may be:
pages
events
products
customers
...
As their names imply, they have nothing in common ;) Nevertheless, we'd like to 
search across them, getting one result set with the top hits
(searching across sources is also required for (auto)suggesting search terms).

In our current Lucene approach we create a Lucene index per source (and 
language) and then search across the indexes with a MultiIndexReader.
Switching to Solr, we'd like to rethink the design decision whether to 
a) put all data into one core (Lucene index) 
or to
b) split them into separate cores

If b), how can I search across the cores (in SolrJ)?

Thx
Clemens


Re: Search multiple cores, one result

2014-09-22 Thread Erick Erickson
Depending on the size, I'd go for (a). IOW, I wouldn't change the
sharding to use (a), but if you have the same shard setup in that
case, it's easier.

You'd index a type field with each doc indicating the source of your
document. Then use the grouping feature to return the top N from each
of the groups you care about.

Then a single request will return some docs from each of your doc
types, and it's again up to the application layer to combine them
intelligently. I'm sure you're aware that the scores aren't comparable
in this scenario, so

Of course you can use filter query (fq) clauses to restrict results to a
single type of doc as appropriate.
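
A minimal sketch of the request described above, assuming a single core with a
"type" field on every document. The core name "content", the base URL, and the
helper name grouped_search_url are hypothetical; group, group.field,
group.limit and fq are standard Solr request parameters. The code only builds
the URL and does not contact a server.

```python
from urllib.parse import urlencode


def grouped_search_url(base_url, core, user_query,
                       type_field="type", per_group=3, only_type=None):
    """Build a /select URL that asks Solr for the top hits per source type.

    type_field is the (assumed) field that records where a document came
    from, e.g. pages, events, products, customers.
    """
    params = [
        ("q", user_query),
        ("group", "true"),               # turn on result grouping
        ("group.field", type_field),     # one group per source type
        ("group.limit", str(per_group)), # top N docs inside each group
        ("wt", "json"),
    ]
    if only_type is not None:
        # Optional filter query restricting the search to one source type.
        params.append(("fq", f"{type_field}:{only_type}"))
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
print(grouped_search_url("http://localhost:8983/solr", "content", "summer sale"))
print(grouped_search_url("http://localhost:8983/solr", "content", "summer sale",
                         only_type="products"))
```

Because scores are not comparable across groups, the application would still
have to decide how to interleave the per-type results.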

Best,
Erick



Re: search multiple cores

2014-05-31 Thread sunayansaikia
Hi Alvaro Cabrerizo,

Regarding the following --
- B is a constraint over the documents in the coreB

I tried this, and it seems that if I use fields that exist only in coreB but
not in coreA, Solr throws an error saying "undefined field 'the_field'". The
field 'the_field' is indexed in coreB.

Any inputs will be of great help.

Thanks






Re: search multiple cores

2014-05-16 Thread Erick Erickson
Really, really, _really_ consider denormalizing the data. You're trying to
use Solr as an RDBMS. Solr is a _great_ search engine, but it's not a DB, and
trying to make it behave as one is almost always a mistake.

Using joins should really be something you try _last_.
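
As a rough illustration of what denormalizing could look like for the setup in
the original question (parents in one core, children in another, per-user
permissions), the sketch below flattens everything into one document per child
before indexing. All field names (parent_id, parent_title, allowed_users) and
the function name denormalize are assumptions made for this example, not
anything taken from the poster's schema.

```python
def denormalize(parents, children, permissions):
    """Flatten parent, child, and permission records into one doc per child.

    parents:     list of dicts with "id" and "title"
    children:    list of dicts with "id", "parent_id", and "text"
    permissions: dict mapping child id -> iterable of user ids
    """
    parents_by_id = {p["id"]: p for p in parents}
    docs = []
    for child in children:
        parent = parents_by_id.get(child["parent_id"], {})
        docs.append({
            "id": child["id"],
            "text": child["text"],
            "parent_id": child["parent_id"],
            # Copy the parent fields needed at search time onto the child.
            "parent_title": parent.get("title"),
            # Multi-valued field: who may see this document.
            "allowed_users": sorted(permissions.get(child["id"], [])),
        })
    return docs


docs = denormalize(
    parents=[{"id": "p1", "title": "Contract 2014"}],
    children=[{"id": "c1", "parent_id": "p1", "text": "appendix"}],
    permissions={"c1": ["bob", "alice"]},
)
print(docs)
```

With documents shaped like this, a per-user search would need only a filter
such as fq=allowed_users:alice on a single core, with no join at query time.
The trade-off is that permission changes require reindexing the affected
documents.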

Best,
Erick



Re: search multiple cores

2014-05-15 Thread Alvaro Cabrerizo
As far as I know (and how I have been using it), the join can't do what you
want. The structure of the query you could try (among others) is:

1. http://SOLR_ADDRESS/coreA/select?q=A&fq={!join ... fromIndex=coreB}B
2. http://SOLR_ADDRESS/coreA/select?q=A AND
_query_:{!join ... fromIndex=coreB}B

Where:

   - A is a constraint over the documents of coreA (the documents returned
   by the query belong to this core).
   - B is a constraint over the documents in coreB.
   - fq is a constraint that documents in coreA must satisfy and that
   depends on documents in coreB (query 1).
   - The nested query in 2. is similar to the fq in query 1.
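
A small sketch of how the first form (q on coreA plus a join filter against
coreB) could be assembled. The field names a_id and id, the queries, and the
base URL are placeholders invented for this example; the local-params syntax
{!join from=... to=... fromIndex=...} is the one quoted in the original
question. Nothing is sent to a server.

```python
from urllib.parse import urlencode


def join_filter(from_field, to_field, from_index, from_query):
    """Return a join query string in Solr local-params syntax.

    from_query is evaluated against from_index; matching values of
    from_field are then looked up in to_field of the core being searched.
    """
    return (f"{{!join from={from_field} to={to_field} "
            f"fromIndex={from_index}}}{from_query}")


def select_url(base_url, core, q, filters):
    """Build a /select URL with one fq parameter per filter."""
    params = [("q", q)] + [("fq", f) for f in filters]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# A = "title:solr" on coreA, B = "status:active" on coreB (both hypothetical).
fq = join_filter("a_id", "id", "coreB", "status:active")
print(fq)
print(select_url("http://localhost:8983/solr", "coreA", "title:solr", [fq]))
```

URL-encoding the whole fq value, as urlencode does here, avoids the braces and
spaces in the local params being mangled in transit.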

If I've understood your requirement, you would like to get documents from
coreA that satisfy a condition depending on documents of coreB, and those
documents of coreB should also satisfy a condition from documents of coreC.
This kind of transitivity (A-B-C) is the one I think can't be addressed
by the join parser. In the query structures shown above, I can't see how
to include the constraint between coreB and coreC.

In case you have three cores in action, the query you could execute (not
tested but I can't see any issue) would look like this:

3. http://SOLR_ADDRESS/coreA/select?q=A&fq={!join ... fromIndex=coreB}B
&fq={!join ... fromIndex=coreC}C
4. http://SOLR_ADDRESS/coreA/select?q=A AND
_query_:{!join ... fromIndex=coreB}B AND
_query_:{!join ... fromIndex=coreC}C

But in this case there is no transitive restriction, only independent
conditions between coreA - coreB and coreA - coreC.
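
To make the difference concrete, here is a toy in-memory model of the two
semantics. It is not Solr code: the join function below only mimics "keep the
target documents whose key matches some source document", and the three tiny
"cores" and their field names are made up for the illustration.

```python
def join(from_docs, from_field, to_docs, to_field):
    """Keep the to_docs whose to_field equals from_field of any from_doc."""
    keys = {d[from_field] for d in from_docs}
    return [d for d in to_docs if d[to_field] in keys]


core_a = [{"id": "a1"}, {"id": "a2"}]
core_b = [{"id": "b1", "a_id": "a1"}, {"id": "b2", "a_id": "a2"}]
# The single C document points at a1 directly, but at b2 through coreB.
core_c = [{"id": "c1", "a_id": "a1", "b_id": "b2"}]

# Independent conditions (queries 3 and 4): A must match via B AND via C.
via_b = join(core_b, "a_id", core_a, "id")
via_c = join(core_c, "a_id", core_a, "id")
independent = [d for d in via_b if d in via_c]

# Transitive chain (A-B-C): C restricts B, and the surviving B restricts A.
b_matching_c = join(core_c, "b_id", core_b, "id")
transitive = join(b_matching_c, "a_id", core_a, "id")

print([d["id"] for d in independent])
print([d["id"] for d in transitive])
```

On this data the independent version keeps a1 while the transitive chain keeps
a2, which is why two separate fq joins cannot stand in for an A-B-C
restriction.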


Regards.





search multiple cores

2014-05-13 Thread Jay Potharaju
Hi,
I am trying to join across multiple cores using a query-time join. My setup
is as follows:
3 cores - Solr 4.7
core1:  0.5 million documents
core2: 4 million documents and growing. This contains the child documents
for documents in core1.
core3: 2 million documents and growing. Contains records from all users.

core2 contains documents that are accessible to each user based on their
permissions. The number of documents accessible to a user ranges from a
couple of thousand to 100,000.

I would like to get results by combining all three cores. For each search I
get documents from core3, then query core1 to get the parent documents, and
then core2 to get the appropriate child documents depending on user
permissions.

I'm referring to this link to join across cores:
http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core

{!join from=fromField to=toField fromIndex=fromCoreName}fromQuery

This is not working for me. Can anyone suggest why it is not working? Any
pointers on how to search across multiple cores would be appreciated.

thanks



J