Search multiple cores, one result
As mentioned in another post, we (already) have a (Lucene-based) generic indexing framework which allows any source/entity to provide indexable/searchable data. Sources may be: pages, events, products, customers, ... As their names imply, they have nothing in common ;) Nevertheless we'd like to search across them, getting one result set with the top hits (searching across sources is also required for (auto)suggesting search terms).

In our current Lucene approach we create a Lucene index per source (and language) and then search across the indexes with a MultiIndexReader. Switching to Solr, we'd like to rethink the design decision whether to
a) put all data into one core (Lucene index), or
b) split them into separate cores.
If b), how can I search across the cores (in SolrJ)?

Thx
Clemens
Re: Search multiple cores, one result
Depending on the size, I'd go for (a). In other words, I wouldn't change your sharding just to use (a), but if you'd have the same shard setup either way, (a) is easier.

You'd index a type field with each doc indicating the source of your document. Then use the grouping feature to return the top N from each of the groups you care about. A single request will then return some docs from each of your doc types, and it's again up to the application layer to combine them intelligently. I'm sure you're aware that the scores aren't comparable in this scenario. Of course, you can use filter query (fq) clauses to restrict to a single type of doc as appropriate.

Best,
Erick

On Mon, Sep 22, 2014 at 4:54 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
> Switching to Solr we'd like to rethink the design decision whether to a) put all data into one core (Lucene index) or to b) split them into separate cores. If b), how can I search across the cores (in SolrJ)?
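As a rough illustration of the single-core approach Erick describes, the Java sketch below builds the request parameters for a grouped query (group, group.field, group.limit) and for a query restricted to one source with fq, then merges the per-type groups on the application side. The field name type, the round-robin merge policy (one possible way to combine groups whose scores aren't comparable) and all class and method names are assumptions for this sketch, not part of Solr or SolrJ. It uses only the JDK (Java 10+ for URLEncoder.encode with a Charset); the resulting string would still have to be sent to the core's /select handler.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class GroupedSearch {

    // One request against a single core holding all sources, grouped by the
    // (hypothetical) "type" field, returning the top perGroup docs of each group.
    static String groupedQuery(String userQuery, int perGroup) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("group", "true");
        params.put("group.field", "type");
        params.put("group.limit", String.valueOf(perGroup));
        return encode(params);
    }

    // The same query restricted to a single source via a filter query.
    static String filteredQuery(String userQuery, String type) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("fq", "type:" + type);
        return encode(params);
    }

    // URL-encodes each key and value and joins the pairs with '&'.
    static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Scores aren't comparable across types, so instead of sorting by score this
    // takes the 1st hit of every group, then the 2nd of every group, and so on.
    static List<String> roundRobin(List<List<String>> groups) {
        int longest = 0;
        for (List<String> g : groups) {
            longest = Math.max(longest, g.size());
        }
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < longest; i++) {
            for (List<String> g : groups) {
                if (i < g.size()) {
                    merged.add(g.get(i));
                }
            }
        }
        return merged;
    }
}
```

For example, groupedQuery("solr rocks", 3) yields q=solr+rocks&group=true&group.field=type&group.limit=3. Round-robin keeps every source visible in the combined list; a weighted or per-type quota policy would fit the same slot.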
Re: search multiple cores
Hi Alvaro Cabrerizo,

Regarding the following:
- B is a constraint over the documents in the coreB

I tried it, and it seems that if I use fields that are available only in coreB but not in coreA, it throws an error saying "undefined field 'the_field'". The field 'the_field' in coreB has indexed enabled. Any input would be a great help.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/search-multiple-cores-tp4136059p4139063.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: search multiple cores
Really, really, _really_ consider denormalizing the data. You're trying to use Solr as an RDBMS. Solr is a _great_ search engine, but it's not a DB, and trying to make it behave as one is almost always a mistake. Using joins should really be something you try _last_.

Best,
Erick

On Tue, May 13, 2014 at 8:27 PM, Jay Potharaju jspothar...@gmail.com wrote:
> Hi, I am trying to join across multiple cores using query time join.
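A minimal sketch of what denormalizing could look like for Jay's setup (parents in core1, children with per-user permissions in core2), assuming each child row knows its parent id and the list of users allowed to see it. Every child becomes one flat document with the parent's fields copied in and the permissions stored in a multivalued field, so the permission check becomes a filter query instead of a join. The record types, the field names (parent_id, parent_title, allowed_users) and the decision to skip orphan children are all hypothetical; the code needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Denormalizer {

    // Hypothetical shapes of the rows that currently live in separate cores.
    record Parent(String id, String title) {}
    record Child(String id, String parentId, String body, List<String> allowedUsers) {}

    // One flat document per child: parent fields are copied in and the permission
    // list becomes a multivalued field, so no join is needed at query time.
    static List<Map<String, Object>> flatten(List<Parent> parents, List<Child> children) {
        Map<String, Parent> byId = new LinkedHashMap<>();
        for (Parent p : parents) {
            byId.put(p.id(), p);
        }
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Child c : children) {
            Parent p = byId.get(c.parentId());
            if (p == null) {
                continue; // orphan child: skipped in this sketch
            }
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", c.id());
            doc.put("parent_id", p.id());
            doc.put("parent_title", p.title());
            doc.put("body", c.body());
            doc.put("allowed_users", List.copyOf(c.allowedUsers()));
            docs.add(doc);
        }
        return docs;
    }

    // The permission check is then a plain filter query on the flat documents.
    static String permissionFilter(String userId) {
        String escaped = userId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "allowed_users:\"" + escaped + "\"";
    }
}
```

The trade-off is the usual one for denormalization: parent fields are duplicated into every child document, and a change to a parent or to a permission list means reindexing the affected children.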
Re: search multiple cores
As far as I know (and how I have been using it), the join can't do what you want. The structure of the query you could try (among others) is:

1. http://SOLR_ADDRESS/coreA/select?q=A&fq={!join ... fromIndex=coreB}B
2. http://SOLR_ADDRESS/coreA/select?q=A AND _query_:"{!join ... fromIndex=coreB}B"

Where:
- A is a constraint over the documents of coreA (the documents returned by the query belong to this core).
- B is a constraint over the documents in coreB.
- fq is a constraint that documents in coreA have to satisfy, and it depends on documents of coreB (query 1).
- The nested query in 2 is similar to the fq in query 1.

If I've understood your requirement, you would like to get documents from coreA that satisfy a condition depending on documents of coreB, and those documents of coreB should also satisfy a condition depending on documents of coreC. This kind of transitivity (A-B-C) is the one I think can't be addressed by the join parser. In the structure of the queries presented above, I can't see how to include the constraint between coreB and coreC.

If you have three cores in action, the query you could execute (not tested, but I can't see any issue) would look like this:

3. http://SOLR_ADDRESS/coreA/select?q=A&fq={!join ... fromIndex=coreB}B&fq={!join ... fromIndex=coreC}C
4. http://SOLR_ADDRESS/coreA/select?q=A AND _query_:"{!join ... fromIndex=coreB}B" AND _query_:"{!join ... fromIndex=coreC}C"

But in this case there is no transitive restriction, only independent conditions between coreA-coreB and coreA-coreC.

Regards.

On Wed, May 14, 2014 at 5:27 AM, Jay Potharaju jspothar...@gmail.com wrote:
> I would like to get results by combining all three cores.
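To show what query structures 1 and 3 look like once they are actually sent over HTTP, here is a small JDK-only Java sketch that URL-encodes q and one fq per {!join} clause, so the '&' separators and the braces survive. It uses fromIndex, the join parser's local parameter for the other core (as in the syntax quoted in Jay's question). The join fields are elided as "..." in the message, so the from/to field names in the usage check (b_id, id), the base URL, and the class and method names are all hypothetical; Java 16+ is needed for the record.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

final class JoinQueries {

    // One {!join} filter: keeps docs of the queried core whose `to` field matches the
    // `from` field of docs in `fromIndex` that satisfy `fromQuery`.
    record Join(String from, String to, String fromIndex, String fromQuery) {
        String localParams() {
            return "{!join from=" + from + " to=" + to + " fromIndex=" + fromIndex + "}" + fromQuery;
        }
    }

    // Builds <baseUrl>/<core>/select?q=...&fq=...&fq=... with every value URL-encoded.
    static String selectUrl(String baseUrl, String core, String q, List<Join> joins) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + enc(q));
        for (Join j : joins) {
            params.add("fq=" + enc(j.localParams()));
        }
        return baseUrl + "/" + core + "/select?" + params;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Passing two Join values produces query 3: each fq is evaluated independently against coreA, which is exactly why this form gives coreA-coreB and coreA-coreC conditions rather than the transitive A-B-C restriction.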
search multiple cores
Hi,

I am trying to join across multiple cores using query-time join. Following is my setup (3 cores, Solr 4.7):
- core1: 0.5 million documents.
- core2: 4 million documents and growing. This contains the child documents for documents in core1.
- core3: 2 million documents and growing. Contains records from all users.

core2 contains documents that are accessible to each user based on their permissions. The number of documents accessible to a user ranges from a couple of thousand to 100,000.

I would like to get results by combining all three cores. For each search I get documents from core3, then query core1 to get the parent documents, then core2 to get the appropriate child documents depending on user permissions.

I'm referring to this link to join across cores: http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core

{!join from=fromField to=toField fromIndex=fromCoreName}fromQuery

This is not working for me. Can anyone suggest why it is not working? Any pointers on how to search across multiple cores?

thanks
J
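The step-by-step flow described above (core3, then core1, then core2) is effectively a join done in the application. A hedged Java sketch of the glue between two steps, assuming the ids returned by one step are collected client-side: it turns them into a quoted OR filter query for the next core and splits long lists, because Solr's maxBooleanClauses setting (1024 by default) caps the number of clauses in a single query. The field name parent_id and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

final class AppSideJoin {

    // Turns ids collected from one step (e.g. parent ids found in core1) into a filter
    // query for the next step (e.g. against core2), such as parent_id:("a" OR "b").
    static String idFilter(String field, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("no ids: skip the next request instead");
        }
        StringJoiner terms = new StringJoiner(" OR ", field + ":(", ")");
        for (String id : ids) {
            // Quote each id, escaping backslashes and double quotes for the query syntax.
            terms.add("\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
        }
        return terms.toString();
    }

    // Splits a long id list into filters of at most chunkSize ids each, so no single
    // query exceeds maxBooleanClauses; the caller runs one request per filter and
    // unions the hits.
    static List<String> chunkedFilters(String field, List<String> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> filters = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            filters.add(idFilter(field, ids.subList(start, end)));
        }
        return filters;
    }
}
```

With up to 100,000 accessible documents per user this means many round trips per search, which is one concrete reason the denormalized single-core layout suggested in the thread tends to be preferable.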