Re: Query to multiple collections

2018-10-25 Thread Atita Arora
Hi,

This is similar to a problem I was facing recently.
In my use case I need to show (collated) spellcheck suggestions
from two different collections.
I should also mention that both collections use the same schema, but they
need to be kept separate because of the business purposes they serve.

I considered the aliasing approach too, but was a little unsure whether
it would work for me.
Oddly, even the standard select URL is giving me trouble, and I run into
the following exception in my browser:

http://:8983/solr/products.1,products.3/select?q=*:*

{
  "responseHeader": {
    "zkConnected": true,
    "status": 500,
    "QTime": 24,
    "params": {
      "q": "*:*"
    }
  },
  "error": {
    "trace": "java.lang.NullPointerException\n\tat
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)\n\tat
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)\n\tat
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)\n\tat
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:423)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:534)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
    "code": 500
  }
}

I would really appreciate it if someone could tell me what might be
happening.

Thanks,
Atita

On Tue, Oct 23, 2018 at 1:58 AM Rohan Kasat  wrote:

> Thanks Shawn for the update.
> I am going ahead with the standard aliases approach , suits my use case.
>
> Regards,
> Rohan Kasat
>
>
> On Mon, Oct 22, 2018 at 4:49 PM Shawn Heisey  wrote:
>
> > On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> > > There weren't any particular problems we ran into since the client that
> > > makes the queries to multiple collections previously would query
> multiple
> > > cores using the 'shards' parameter before we moved to solrcloud. We
> > didn't
> > > have any complicated sorting or scoring requirements fortunately.
> > >
> > > The one thing I remember looking into was what solr would do when two
> > > documents with the same id were found in both collections. I believe it
> > > just non-deterministic

Re: Query to multiple collections

2018-10-22 Thread Rohan Kasat
Thanks Shawn for the update.
I am going ahead with the standard aliases approach; it suits my use case.

Regards,
Rohan Kasat


On Mon, Oct 22, 2018 at 4:49 PM Shawn Heisey  wrote:

> On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> > There weren't any particular problems we ran into since the client that
> > makes the queries to multiple collections previously would query multiple
> > cores using the 'shards' parameter before we moved to solrcloud. We
> didn't
> > have any complicated sorting or scoring requirements fortunately.
> >
> > The one thing I remember looking into was what solr would do when two
> > documents with the same id were found in both collections. I believe it
> > just non-deterministically picked one, probably the one that came in
> first
> > or last.
>
> Yes, that is how it works.  I do not know whether it is the first one to
> respond or the last one to respond that ends up in the results.  Solr is
> designed to work with data where the uniqueKey field really is unique
> across everything that is being queried.  Results can vary when you have
> the same uniqueKey value in more than one place and you query both of
> them at once.
>
> > Depending on how many collections you need to query simultaneously, it's
> > worth looking into using aliases for lists of collections as Alex
> > mentioned.
> >
> > Unfortunately, in our use case, it wasn't worth the headache of managing
> > aliases for every possible combination of collections that needed to be
> > queried, but we would have preferred to use aliases.
>
> Aliases are the cleanest option.  This syntax also works, sorta blew my
> mind when somebody told me about it:
>
> http://host:port/solr/current,archive2,archive4/select?q=*:*
>
> If you're using a Solr client library, it might not be possible to
> control the URL like that, but if you're building URLs yourself, you
> could use it.
>
> I recently filed an issue related to alias handling, some unexpected
> behavior:
>
> https://issues.apache.org/jira/browse/SOLR-12849
>
> Thanks,
> Shawn
>
>

-- 

*Regards,Rohan Kasat*


Re: Query to multiple collections

2018-10-22 Thread Shawn Heisey

On 10/22/2018 1:26 PM, Chris Ulicny wrote:

> There weren't any particular problems we ran into since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to solrcloud. We didn't
> have any complicated sorting or scoring requirements fortunately.
>
> The one thing I remember looking into was what solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.


Yes, that is how it works.  I do not know whether it is the first one to 
respond or the last one to respond that ends up in the results.  Solr is 
designed to work with data where the uniqueKey field really is unique 
across everything that is being queried.  Results can vary when you have 
the same uniqueKey value in more than one place and you query both of 
them at once.
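
As a rough illustration of the uniqueKey caveat above, here is a minimal Python sketch of deduplicating documents on the client side, so that the surviving copy is chosen by an explicit rule rather than by response timing. It assumes each collection is queried separately and that the uniqueKey field is called `id`; the function name `merge_unique` and the sample documents are hypothetical and not part of Solr.

```python
from typing import Dict, Iterable, List


def merge_unique(result_sets: Iterable[List[Dict]], key: str = "id") -> List[Dict]:
    """Merge document lists, keeping the first document seen for each key.

    result_sets is ordered by priority: a document from an earlier list
    wins over a later document with the same key value.
    """
    seen = set()
    merged = []
    for docs in result_sets:
        for doc in docs:
            doc_key = doc[key]
            if doc_key in seen:
                continue  # a higher-priority list already supplied this key
            seen.add(doc_key)
            merged.append(doc)
    return merged


# Hypothetical documents as they might come back from two collections.
current = [{"id": "1", "title": "from current"}, {"id": "2", "title": "only in current"}]
archive = [{"id": "1", "title": "from archive"}, {"id": "3", "title": "only in archive"}]

merged = merge_unique([current, archive])
print([d["id"] for d in merged])
```

This only controls which duplicate is kept; it does not re-sort the merged list, so any ordering across collections would still need to be handled separately.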



> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.


Aliases are the cleanest option.  This syntax also works, sorta blew my 
mind when somebody told me about it:


http://host:port/solr/current,archive2,archive4/select?q=*:*

If you're using a Solr client library, it might not be possible to 
control the URL like that, but if you're building URLs yourself, you 
could use it.
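
For anyone building URLs by hand, the comma-separated path form above could be assembled roughly as follows. This is only a sketch: the helper name `multi_collection_url` and the `localhost:8983` address are assumptions for illustration, and the collection names are the ones from the example URL.

```python
from urllib.parse import urlencode


def multi_collection_url(base_url, collections, params):
    """Build a select URL whose path lists several collections, comma-separated."""
    path = ",".join(collections)
    # urlencode percent-encodes the query values (for example '*' and ':').
    return f"{base_url}/solr/{path}/select?{urlencode(params)}"


url = multi_collection_url(
    "http://localhost:8983",  # assumed host and port
    ["current", "archive2", "archive4"],
    {"q": "*:*"},
)
print(url)
```

The query string comes out percent-encoded, which is equivalent to the literal `q=*:*` shown above once the server decodes it.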


I recently filed an issue related to alias handling, some unexpected 
behavior:


https://issues.apache.org/jira/browse/SOLR-12849

Thanks,
Shawn



Re: Query to multiple collections

2018-10-22 Thread Rohan Kasat
Thanks Chris.

This helps.

Regards,
Rohan

On Mon, Oct 22, 2018 at 12:26 PM Chris Ulicny  wrote:

> There weren't any particular problems we ran into since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to solrcloud. We didn't
> have any complicated sorting or scoring requirements fortunately.
>
> The one thing I remember looking into was what solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.
>
> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.
>
> On Mon, Oct 22, 2018 at 2:27 PM Rohan Kasat  wrote:
>
> > Thanks Alex.
> > I check aliases but dint focused much , will try to relate more to my use
> > case and have a look again at the same.
> > I guess the specification of collection in the query should be useful.
> >
> > Regards,
> > Rohan Kasat
> >
> > On Mon, Oct 22, 2018 at 11:21 AM Alexandre Rafalovitch <
> arafa...@gmail.com
> > >
> > wrote:
> >
> > > Have you tried using aliases:
> > >
> > >
> >
> http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api
> > >
> > > You can also - I think - specify a collection of shards/collections
> > > directly in the query, but there may be side edge-cases with that (not
> > > sure).
> > >
> > > Regards,
> > > Alex.
> > > On Mon, 22 Oct 2018 at 13:49, Rohan Kasat 
> wrote:
> > > >
> > > > Hi All ,
> > > >
> > > > I have a SolrCloud setup with multiple collections.
> > > > I have created say -  two collections here as the data source for the
> > > both
> > > > collections are different and hence wanted to store them differently.
> > > > There is a use case , where i need to query both the collections and
> > show
> > > > unified search results.
> > > > The fields in the schema are same. ( say - title , description ,
> date )
> > > > Is there any specific way i can do this directly with the collections
> > API
> > > > or something like that?
> > > > Or i need to write a federator and combine results from search to the
> > > > respective collections and then unify them?
> > > >
> > > > --
> > > >
> > > > *Regards,Rohan*
> > >
> >
> >
> > --
> >
> > *Regards,Rohan Kasat*
> >
>
-- 

*Regards,Rohan Kasat*


Re: Query to multiple collections

2018-10-22 Thread Chris Ulicny
We didn't run into any particular problems, since the client that queries
multiple collections had previously queried multiple cores using the
'shards' parameter before we moved to SolrCloud. Fortunately, we didn't
have any complicated sorting or scoring requirements.

The one thing I remember looking into was what Solr would do when two
documents with the same id were found in both collections. I believe it
just non-deterministically picked one, probably the one that came in first
or last.

Depending on how many collections you need to query simultaneously, it's
worth looking into using aliases for lists of collections as Alex
mentioned.

Unfortunately, in our use case, it wasn't worth the headache of managing
aliases for every possible combination of collections that needed to be
queried, but we would have preferred to use aliases.

On Mon, Oct 22, 2018 at 2:27 PM Rohan Kasat  wrote:

> Thanks Alex.
> I check aliases but dint focused much , will try to relate more to my use
> case and have a look again at the same.
> I guess the specification of collection in the query should be useful.
>
> Regards,
> Rohan Kasat
>
> On Mon, Oct 22, 2018 at 11:21 AM Alexandre Rafalovitch  >
> wrote:
>
> > Have you tried using aliases:
> >
> >
> http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api
> >
> > You can also - I think - specify a collection of shards/collections
> > directly in the query, but there may be side edge-cases with that (not
> > sure).
> >
> > Regards,
> > Alex.
> > On Mon, 22 Oct 2018 at 13:49, Rohan Kasat  wrote:
> > >
> > > Hi All ,
> > >
> > > I have a SolrCloud setup with multiple collections.
> > > I have created say -  two collections here as the data source for the
> > both
> > > collections are different and hence wanted to store them differently.
> > > There is a use case , where i need to query both the collections and
> show
> > > unified search results.
> > > The fields in the schema are same. ( say - title , description , date )
> > > Is there any specific way i can do this directly with the collections
> API
> > > or something like that?
> > > Or i need to write a federator and combine results from search to the
> > > respective collections and then unify them?
> > >
> > > --
> > >
> > > *Regards,Rohan*
> >
>
>
> --
>
> *Regards,Rohan Kasat*
>


Re: Query to multiple collections

2018-10-22 Thread Rohan Kasat
Thanks, Alex.
I checked aliases but didn't focus on them much; I will try to relate them
to my use case and take another look.
I guess specifying the collections in the query should be useful.

Regards,
Rohan Kasat

On Mon, Oct 22, 2018 at 11:21 AM Alexandre Rafalovitch 
wrote:

> Have you tried using aliases:
>
> http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api
>
> You can also - I think - specify a collection of shards/collections
> directly in the query, but there may be side edge-cases with that (not
> sure).
>
> Regards,
> Alex.
> On Mon, 22 Oct 2018 at 13:49, Rohan Kasat  wrote:
> >
> > Hi All ,
> >
> > I have a SolrCloud setup with multiple collections.
> > I have created say -  two collections here as the data source for the
> both
> > collections are different and hence wanted to store them differently.
> > There is a use case , where i need to query both the collections and show
> > unified search results.
> > The fields in the schema are same. ( say - title , description , date )
> > Is there any specific way i can do this directly with the collections API
> > or something like that?
> > Or i need to write a federator and combine results from search to the
> > respective collections and then unify them?
> >
> > --
> >
> > *Regards,Rohan*
>


-- 

*Regards,Rohan Kasat*


Re: Query to multiple collections

2018-10-22 Thread Rohan Kasat
Thanks, Chris, for the update.
I was thinking along the same lines; I just wanted to check whether you
faced any specific issues.

Regards,
Rohan Kasat


On Mon, Oct 22, 2018 at 11:20 AM Chris Ulicny  wrote:

> Rohan,
>
> I do not remember where I came across it or what restrictions exist on it,
> but it works for our use case of querying multiple archived collections
> with identical schemas in the same SolrCloud cluster. The queries have the
> following form:
>
>
> http://host:port/solr/current/select?collection=current,archive2,archive4&q=...
>
>
> It seems like it might work for your use case, but you might need to tread
> carefully depending on your requirements for the returned results. Sorting
> and duplicate unique keys come to mind.
>
> Best,
> Chris
>
> On Mon, Oct 22, 2018 at 1:49 PM Rohan Kasat  wrote:
>
> > Hi All ,
> >
> > I have a SolrCloud setup with multiple collections.
> > I have created say -  two collections here as the data source for the
> both
> > collections are different and hence wanted to store them differently.
> > There is a use case , where i need to query both the collections and show
> > unified search results.
> > The fields in the schema are same. ( say - title , description , date )
> > Is there any specific way i can do this directly with the collections API
> > or something like that?
> > Or i need to write a federator and combine results from search to the
> > respective collections and then unify them?
> >
> > --
> >
> > *Regards,Rohan*
> >
>


-- 

*Regards,Rohan Kasat*


Re: Query to multiple collections

2018-10-22 Thread Alexandre Rafalovitch
Have you tried using aliases?
http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api

You can also - I think - specify a list of shards/collections
directly in the query, but there may be edge cases with that (not
sure).
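
In case it helps, an alias covering several collections is created through the Collections API's CREATEALIAS action. Below is a small Python sketch that only builds the request URL; the helper name `create_alias_url`, the alias name `all_docs`, the collection names, and the `localhost:8983` address are all assumptions for illustration.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API URL that creates an alias over several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,  # hypothetical alias name
        "collections": ",".join(collections),
    }
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


alias_url = create_alias_url(
    "http://localhost:8983",  # assumed host and port
    "all_docs",
    ["current", "archive2"],
)
print(alias_url)
```

Once such an alias exists, queries would go to `/solr/all_docs/select` as if it were a single collection.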

Regards,
Alex.
On Mon, 22 Oct 2018 at 13:49, Rohan Kasat  wrote:
>
> Hi All ,
>
> I have a SolrCloud setup with multiple collections.
> I have created say -  two collections here as the data source for the both
> collections are different and hence wanted to store them differently.
> There is a use case , where i need to query both the collections and show
> unified search results.
> The fields in the schema are same. ( say - title , description , date )
> Is there any specific way i can do this directly with the collections API
> or something like that?
> Or i need to write a federator and combine results from search to the
> respective collections and then unify them?
>
> --
>
> *Regards,Rohan*


Re: Query to multiple collections

2018-10-22 Thread Chris Ulicny
Rohan,

I do not remember where I came across it or what restrictions exist on it,
but it works for our use case of querying multiple archived collections
with identical schemas in the same SolrCloud cluster. The queries have the
following form:

http://host:port/solr/current/select?collection=current,archive2,archive4&q=...


It seems like it might work for your use case, but you might need to tread
carefully depending on your requirements for the returned results. Sorting
and duplicate unique keys come to mind.
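
A query of the form above could be put together along these lines. It is only a sketch under assumptions: the helper name `collection_param_url`, the `localhost:8983` address, and the sample query `title:solr` are made up for illustration.

```python
from urllib.parse import urlencode


def collection_param_url(base_url, request_collection, collections, query):
    """Build a select URL that names the collections to search in a parameter."""
    params = {
        "collection": ",".join(collections),  # collections to search
        "q": query,
    }
    return f"{base_url}/solr/{request_collection}/select?{urlencode(params)}"


query_url = collection_param_url(
    "http://localhost:8983",  # assumed host and port
    "current",
    ["current", "archive2", "archive4"],
    "title:solr",  # hypothetical query
)
print(query_url)
```

The request still goes to one collection's handler (`current` here), and the `collection` parameter lists everything that should be searched.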

Best,
Chris

On Mon, Oct 22, 2018 at 1:49 PM Rohan Kasat  wrote:

> Hi All ,
>
> I have a SolrCloud setup with multiple collections.
> I have created say -  two collections here as the data source for the both
> collections are different and hence wanted to store them differently.
> There is a use case , where i need to query both the collections and show
> unified search results.
> The fields in the schema are same. ( say - title , description , date )
> Is there any specific way i can do this directly with the collections API
> or something like that?
> Or i need to write a federator and combine results from search to the
> respective collections and then unify them?
>
> --
>
> *Regards,Rohan*
>


Query to multiple collections

2018-10-22 Thread Rohan Kasat
Hi All,

I have a SolrCloud setup with multiple collections.
I have created, say, two collections, because the data sources for the two
collections are different and I wanted to store them separately.
There is a use case where I need to query both collections and show
unified search results.
The fields in the schema are the same (say: title, description, date).
Is there a specific way I can do this directly with the Collections API
or something like that?
Or do I need to write a federator that sends the search to the
respective collections, combines the results, and unifies them?

-- 

*Regards,Rohan*