Re: Query to multiple collections
Hi,

This is one of the problems I was facing recently. In my use case I need to show collated spellcheck suggestions from two different collections. Both collections use the same schema, but they need to be kept separate because of the business functions they serve.

I considered the aliasing approach too, but was a little unsure whether it would work for me. Oddly, even the standard select URL is giving me trouble, and I run into the following exception in my browser:

http://:8983/solr/products.1,products.3/select?q=*:*

{
  "responseHeader": {
    "zkConnected": true,
    "status": 500,
    "QTime": 24,
    "params": {
      "q": "*:*"
    }
  },
  "error": {
    "trace": "java.lang.NullPointerException\n\tat org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)\n\tat org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)\n\tat org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)\n\tat org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:423)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat java.lang.Thread.run(Thread.java:748)\n",
    "code": 500
  }
}

I would really appreciate it if someone could tell me what could be happening.

Thanks,
Atita

On Tue, Oct 23, 2018 at 1:58 AM Rohan Kasat wrote:

> Thanks Shawn for the update.
> I am going ahead with the standard aliases approach, which suits my
> use case.
>
> Regards,
> Rohan Kasat
Re: Query to multiple collections
Thanks Shawn for the update.
I am going ahead with the standard aliases approach, which suits my use case.

Regards,
Rohan Kasat

On Mon, Oct 22, 2018 at 4:49 PM Shawn Heisey wrote:

> On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> > There weren't any particular problems we ran into since the client that
> > makes the queries to multiple collections previously would query multiple
> > cores using the 'shards' parameter before we moved to solrcloud. We didn't
> > have any complicated sorting or scoring requirements fortunately.
> >
> > The one thing I remember looking into was what solr would do when two
> > documents with the same id were found in both collections. I believe it
> > just non-deterministically picked one, probably the one that came in first
> > or last.
>
> Yes, that is how it works. I do not know whether it is the first one to
> respond or the last one to respond that ends up in the results. Solr is
> designed to work with data where the uniqueKey field really is unique
> across everything that is being queried. Results can vary when you have
> the same uniqueKey value in more than one place and you query both of
> them at once.
>
> > Depending on how many collections you need to query simultaneously, it's
> > worth looking into using aliases for lists of collections as Alex
> > mentioned.
> >
> > Unfortunately, in our use case, it wasn't worth the headache of managing
> > aliases for every possible combination of collections that needed to be
> > queried, but we would have preferred to use aliases.
>
> Aliases are the cleanest option. This syntax also works, sorta blew my
> mind when somebody told me about it:
>
> http://host:port/solr/current,archive2,archive4/select?q=*:*
>
> If you're using a Solr client library, it might not be possible to
> control the URL like that, but if you're building URLs yourself, you
> could use it.
>
> I recently filed an issue related to alias handling, some unexpected
> behavior:
>
> https://issues.apache.org/jira/browse/SOLR-12849
>
> Thanks,
> Shawn
Re: Query to multiple collections
On 10/22/2018 1:26 PM, Chris Ulicny wrote:

> There weren't any particular problems we ran into since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to solrcloud. We didn't
> have any complicated sorting or scoring requirements fortunately.
>
> The one thing I remember looking into was what solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.

Yes, that is how it works. I do not know whether it is the first one to respond or the last one to respond that ends up in the results. Solr is designed to work with data where the uniqueKey field really is unique across everything that is being queried. Results can vary when you have the same uniqueKey value in more than one place and you query both of them at once.

> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.

Aliases are the cleanest option. This syntax also works, sorta blew my mind when somebody told me about it:

http://host:port/solr/current,archive2,archive4/select?q=*:*

If you're using a Solr client library, it might not be possible to control the URL like that, but if you're building URLs yourself, you could use it.

I recently filed an issue related to alias handling, some unexpected behavior:

https://issues.apache.org/jira/browse/SOLR-12849

Thanks,
Shawn
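For anyone building URLs by hand, the comma-separated form above could be assembled roughly like this. This is only a sketch: the function name and the localhost base URL are made up for illustration, and nothing is sent to Solr.

```python
from urllib.parse import urlencode


def multi_collection_select_url(base_url, collections, params):
    """Build a /select URL that names several collections in the path.

    base_url is the Solr root (for example "http://localhost:8983/solr",
    an assumed value); collections is a list of collection names.
    """
    if not collections:
        raise ValueError("at least one collection is required")
    path = ",".join(collections)
    # Keep '*' and ':' unescaped so q=*:* stays readable in the URL.
    query = urlencode(params, safe="*:")
    return f"{base_url.rstrip('/')}/{path}/select?{query}"


url = multi_collection_select_url(
    "http://localhost:8983/solr",
    ["current", "archive2", "archive4"],
    {"q": "*:*"},
)
print(url)
```

Whether a given client library lets you control the path this way is a separate question, as noted above.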
Re: Query to multiple collections
Thanks Chris. This helps.

Regards,
Rohan

On Mon, Oct 22, 2018 at 12:26 PM Chris Ulicny wrote:

> There weren't any particular problems we ran into since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to solrcloud. We didn't
> have any complicated sorting or scoring requirements fortunately.
>
> The one thing I remember looking into was what solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.
>
> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.
Re: Query to multiple collections
There weren't any particular problems we ran into since the client that makes the queries to multiple collections previously would query multiple cores using the 'shards' parameter before we moved to solrcloud. We didn't have any complicated sorting or scoring requirements fortunately.

The one thing I remember looking into was what solr would do when two documents with the same id were found in both collections. I believe it just non-deterministically picked one, probably the one that came in first or last.

Depending on how many collections you need to query simultaneously, it's worth looking into using aliases for lists of collections as Alex mentioned.

Unfortunately, in our use case, it wasn't worth the headache of managing aliases for every possible combination of collections that needed to be queried, but we would have preferred to use aliases.

On Mon, Oct 22, 2018 at 2:27 PM Rohan Kasat wrote:

> Thanks Alex.
> I checked aliases but didn't focus on them much; I will try to relate
> them more closely to my use case and have another look.
> I guess specifying the collections in the query should be useful.
>
> Regards,
> Rohan Kasat
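Since duplicate ids across collections give unpredictable results, it may be worth checking for them before querying collections together. A rough client-side sketch, assuming you already have the documents from each collection as lists of dicts and that "id" is the uniqueKey field (both are assumptions, and the helper name is made up):

```python
from collections import defaultdict


def find_shared_ids(docs_by_collection, key="id"):
    """Return {id: [collections]} for ids present in more than one collection.

    docs_by_collection maps a collection name to a list of document dicts,
    e.g. the "docs" arrays taken from separate per-collection queries.
    """
    seen = defaultdict(set)
    for collection, docs in docs_by_collection.items():
        for doc in docs:
            seen[doc[key]].add(collection)
    return {doc_id: sorted(cols) for doc_id, cols in seen.items() if len(cols) > 1}


shared = find_shared_ids({
    "current": [{"id": "1"}, {"id": "2"}],
    "archive2": [{"id": "2"}, {"id": "3"}],
})
print(shared)
```

If the returned dict is non-empty, those ids are the ones for which a combined query could return either copy.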
Re: Query to multiple collections
Thanks Alex.
I checked aliases but didn't focus on them much; I will try to relate them more closely to my use case and have another look.
I guess specifying the collections in the query should be useful.

Regards,
Rohan Kasat

On Mon, Oct 22, 2018 at 11:21 AM Alexandre Rafalovitch wrote:

> Have you tried using aliases?
>
> http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api
>
> You can also, I think, specify a list of shards/collections directly
> in the query, but there may be edge cases with that (not sure).
>
> Regards,
> Alex.
Re: Query to multiple collections
Thanks Chris for the update.
I was thinking along the same lines; I just wanted to check whether you faced any specific issues.

Regards,
Rohan Kasat

On Mon, Oct 22, 2018 at 11:20 AM Chris Ulicny wrote:

> Rohan,
>
> I do not remember where I came across it or what restrictions exist on it,
> but it works for our use case of querying multiple archived collections
> with identical schemas in the same SolrCloud cluster. The queries have the
> following form:
>
> http::/solr/current/select?collection=current,archive2,archive4&q=...
>
> It seems like it might work for your use case, but you might need to tread
> carefully depending on your requirements for the returned results. Sorting
> and duplicate unique keys come to mind.
>
> Best,
> Chris
Re: Query to multiple collections
Have you tried using aliases?

http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api

You can also, I think, specify a list of shards/collections directly in the query, but there may be edge cases with that (not sure).

Regards,
Alex.

On Mon, 22 Oct 2018 at 13:49, Rohan Kasat wrote:

> Hi All,
>
> I have a SolrCloud setup with multiple collections.
> I have created, say, two collections here, as the data sources for the
> two collections are different and hence I wanted to store them separately.
> There is a use case where I need to query both collections and show
> unified search results.
> The fields in the schema are the same (say: title, description, date).
> Is there any specific way I can do this directly with the Collections API
> or something like that?
> Or do I need to write a federator that sends the search to the
> respective collections and then unifies the results?
>
> --
> Regards,
> Rohan
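As a rough illustration of the alias route: the Collections API has a CREATEALIAS action that takes a name and a comma-separated collections list. The sketch below only builds the admin URL; the helper name, the alias name "all_docs", and the localhost base URL are assumptions for the example.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates (or updates) an alias.

    base_url is the Solr root, e.g. "http://localhost:8983/solr" (assumed).
    """
    query = urlencode(
        {
            "action": "CREATEALIAS",
            "name": alias,
            "collections": ",".join(collections),
        },
        safe=",",  # keep the comma-separated list readable
    )
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


alias_url = create_alias_url(
    "http://localhost:8983/solr", "all_docs", ["current", "archive2", "archive4"]
)
print(alias_url)
```

Once the alias exists, queries would go to /solr/all_docs/select just as they would for a single collection.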
Re: Query to multiple collections
Rohan,

I do not remember where I came across it or what restrictions exist on it, but it works for our use case of querying multiple archived collections with identical schemas in the same SolrCloud cluster. The queries have the following form:

http::/solr/current/select?collection=current,archive2,archive4&q=...

It seems like it might work for your use case, but you might need to tread carefully depending on your requirements for the returned results. Sorting and duplicate unique keys come to mind.

Best,
Chris

On Mon, Oct 22, 2018 at 1:49 PM Rohan Kasat wrote:

> Hi All,
>
> I have a SolrCloud setup with multiple collections.
> I have created, say, two collections here, as the data sources for the
> two collections are different and hence I wanted to store them separately.
> There is a use case where I need to query both collections and show
> unified search results.
> The fields in the schema are the same (say: title, description, date).
> Is there any specific way I can do this directly with the Collections API
> or something like that?
> Or do I need to write a federator that sends the search to the
> respective collections and then unifies the results?
>
> --
> Regards,
> Rohan
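A small sketch of building that kind of request with the collection parameter. The helper name and localhost base URL are invented for the example. The optional sort with an id tiebreaker is only a suggestion for making tie ordering explicit, it assumes "id" is the uniqueKey, and it does nothing about duplicate keys across collections.

```python
from urllib.parse import urlencode


def collection_param_url(base_url, home, collections, q, sort=None):
    """Build a /select URL on one collection that fans out via 'collection='.

    home is the collection named in the path; collections is the full list
    to search, passed as the comma-separated 'collection' parameter.
    """
    params = {"collection": ",".join(collections), "q": q}
    if sort:
        params["sort"] = sort
    # Keep ',', '*' and ':' unescaped for readability; spaces become '+'.
    query = urlencode(params, safe=",*:")
    return f"{base_url.rstrip('/')}/{home}/select?{query}"


fanout_url = collection_param_url(
    "http://localhost:8983/solr",
    "current",
    ["current", "archive2", "archive4"],
    "*:*",
    sort="score desc,id asc",
)
print(fanout_url)
```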
Query to multiple collections
Hi All,

I have a SolrCloud setup with multiple collections.
I have created, say, two collections here, as the data sources for the two collections are different and hence I wanted to store them separately.
There is a use case where I need to query both collections and show unified search results.
The fields in the schema are the same (say: title, description, date).
Is there any specific way I can do this directly with the Collections API or something like that?
Or do I need to write a federator that sends the search to the respective collections and then unifies the results?

--
Regards,
Rohan
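For comparison, the hand-written "federator" option could look something like the sketch below: query each collection separately, then merge the already-sorted result lists on the client. This is an assumption-laden illustration, not a recommendation. It expects each list to be sorted by descending "score" (a field name assumed here), and scores computed independently in different collections may not be directly comparable.

```python
import heapq


def merge_by_score(*result_lists):
    """Merge per-collection result lists into one list, highest score first.

    Each input list must already be sorted by descending score.
    """
    return list(
        heapq.merge(*result_lists, key=lambda doc: doc["score"], reverse=True)
    )


products = [{"id": "a1", "score": 3.0}, {"id": "a2", "score": 1.0}]
articles = [{"id": "b1", "score": 2.0}]
merged = merge_by_score(products, articles)
print([doc["id"] for doc in merged])
```

Paging, faceting, and duplicate ids would all need extra handling in such a federator, which is part of why a single multi-collection query or an alias tends to be simpler.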