RE: Using Multiple collections with streaming expressions
Many thanks for the info Joel --ufuk
Re: Using Multiple collections with streaming expressions
The multiple collection syntax has been implemented for only a few stream sources: search, timeseries, facet and stats. Eventually it will be implemented for all stream sources.

Joel Bernstein
http://joelsolr.blogspot.com/
RE: Using Multiple collections with streaming expressions
Thanks again Erick, that’s a good idea!

Alternatively, I use an alias covering multiple collections in these situations, but there may be too many combinations of collections, so it’s not always suitable.

Merged significantTerms streams will have meaningless scores in their tuples, I think; it would be comparing apples and oranges. But in this case I’m only interested in getting the foreground counts, so that’s another day’s problem.

What seemed strange to me was that the source code for the streams appeared to be handling this case.
Re: Using Multiple collections with streaming expressions
You need to open multiple streams, one to each collection, then combine them. For instance, open a significantTerms stream to collection1, another to collection2, and wrap both in a merge stream.

Best,
Erick
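A sketch of that shape, reusing the significantTerms parameters from the original question (the `on` ordering is an assumption; merge requires a sort order that both substreams share, so you may need to wrap each source in a sort() first):

```
merge(
  significantTerms(collection1, q="body:Solr", field="author",
                   limit="50", minDocFreq="10", maxDocFreq=".20", minTermLength="5"),
  significantTerms(collection2, q="body:Solr", field="author",
                   limit="50", minDocFreq="10", maxDocFreq=".20", minTermLength="5"),
  on="term asc"
)
```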
Using Multiple collections with streaming expressions
For example, the streaming expression significantTerms:

https://lucene.apache.org/solr/guide/8_4/stream-source-reference.html#significantterms

significantTerms(collection1,
    q="body:Solr",
    field="author",
    limit="50",
    minDocFreq="10",
    maxDocFreq=".20",
    minTermLength="5")

Solr supports querying multiple collections at once, but I can’t figure out how to do that with streaming expressions. When I try enclosing them in quotes, like:

significantTerms("collection1, collection2",
    q="body:Solr",
    field="author",
    limit="50",
    minDocFreq="10",
    maxDocFreq=".20",
    minTermLength="5")

it gives the error: "EXCEPTION":"java.io.IOException: Slices not found for \"collection1, collection2\"". I think Solr treats the quotes as part of the collection names, hence it can’t find slices for them.

When I just use it without quotes:

significantTerms(collection1, collection2, …

it gives the error: "EXCEPTION":"invalid expression significantTerms(collection1, collection2, …

I tried single quotes and escaping the quotation marks, but nothing works.

Any ideas?

Best, ufuk
fetch streaming expression multiple collections problem
Hello all,

When I try to use the "search" streaming expression with multiple collections, it works without any problems, like:

search(
    "collection1,collection2",
    q="*:*",
    fl="field1,field2",
    qt="/export",
    sort="field1 desc"
)

but when I try to use the "fetch" expression similarly:

fetch("collection1,collection2"

it gives me an error saying:

"EXCEPTION": "java.io.IOException: Slices not found for \"collection1,collection2\""

When I use it without quotes that problem is resolved, but another one arises:

fetch(collection1,collection2

fetches fields only from collection1 and returns empty for documents residing in collection2.

I took a look at the source code of the fetch and search expressions; they both get the collection parameter exactly the same way, using:

String collectionName = factory.getValueOperand(expression, 0);

I'm lost. When I use an alias in place of the multiple collections it works as desired, but we have many collections and queries are generated dynamically, so we would need many combinations of aliases.

Need help. Regards

-- uyilmaz
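Since fetch() behaves with a single alias, one workaround for dynamically generated queries is to create a short-lived alias per combination through the Collections API before running the expression. A minimal sketch of building that request (the host, the alias naming scheme, and the collection names are illustrative assumptions):

```python
from urllib.parse import urlencode

def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS URL for an ad-hoc
    combination of collections (e.g. one alias per dynamic query)."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{solr_base}/admin/collections?{params}"

url = create_alias_url("http://localhost:8983/solr", "combo_c1_c2",
                       ["collection1", "collection2"])
print(url)
```

The alias can be deleted with `action=DELETEALIAS` once the expression has run.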
Re: Multiple Collections in a Alias.
There may be other ways, but the easiest is to write a script that gets the cluster status; for each replica of each collection you will have details like:

"collections":{
  "collection1":{
    "pullReplicas":"0",
    "replicationFactor":"1",
    "shards":{
      "shard1":{
        "range":"8000-8ccb",
        "state":"active",
        "replicas":{"core_node33":{
          "core":"collection1_shard1_replica_n30",
          "base_url":"http://host:port/solr",
          "node_name":"host:port",
          "state":"active",
          "type":"NRT",
          "force_set_state":"false",
          "leader":"true"}}},

For each replica of each shard, make a localized call for the number of records:

base_url/core/select?q=*:*&shards=shardX&distrib=false&rows=0

If you have replicas that disagree with each other on the number of records per shard, then you have an issue with replicas not being in sync for a collection. This is what I meant when I said "replicas out of sync".

Your situation was actually very simple :) one of your collections has less data. You seem to have a sync requirement between collections, which is interesting, but that's beyond Solr. Your inter-collection sync script most likely needs some debugging :)
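The per-replica check described above can be scripted: read CLUSTERSTATUS, query each replica's core with distrib=false&rows=0, and compare the counts per shard. A sketch of the comparison step, with the HTTP calls left out and made-up counts:

```python
def out_of_sync_shards(counts):
    """counts maps shard -> {replica_core: numFound from a
    distrib=false query}.  Return the shards whose replicas
    disagree on the number of records."""
    return sorted(shard for shard, replicas in counts.items()
                  if len(set(replicas.values())) > 1)

counts = {"shard1": {"core_node33": 1000, "core_node34": 1000},
          "shard2": {"core_node35": 500, "core_node36": 498}}
print(out_of_sync_shards(counts))  # → ['shard2']
```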
Re: Multiple Collections in a Alias.
Glad you nailed the out-of-sync one :)
Re: Multiple Collections in a Alias.
I found the root cause. I have 3 collections assigned to an alias, and one of them is NOT synched by the alias.
Re: Multiple Collections in a Alias.
Different absolute scores from different collections are OK, because the exact values depend on the number of deleted documents. For the set of documents that are in different orders from different collections, are the scores of that set identical? If they are, then it is normal to have a different order from different collections.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
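Walter's test can be made mechanical: two per-collection rankings are consistent if they contain the same (id, score) pairs and differ in order only where scores tie. A small sketch with invented ids and scores:

```python
def differs_only_by_ties(a, b):
    """a and b are ranked result lists of (doc_id, score) tuples.
    True when both hold the same (doc_id, score) pairs and every
    position where the doc ids differ is a score tie."""
    return (sorted(a) == sorted(b) and
            all(x[1] == y[1] for x, y in zip(a, b) if x[0] != y[0]))

a = [("d1", 3.2), ("d2", 2.5), ("d3", 2.5)]
b = [("d1", 3.2), ("d3", 2.5), ("d2", 2.5)]
print(differs_only_by_ties(a, b))  # → True
```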
Re: Multiple Collections in a Alias.
Good question. How can I validate if the replicas are all synched?
Re: Multiple Collections in a Alias.
numFound is the same, but the scores are different.
Re: Multiple Collections in a Alias.
Try a simple test of querying each collection 5 times in a row; if numFound differs for a single collection within those 5 calls then you have it. Please try it, because what you may think is sync'd may actually not be. How do you validate correct sync?
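That repeat-query test is easy to script: collect numFound from 5 identical queries per collection and flag any collection whose counts vary (the collection names and counts below are made up):

```python
def unstable_collections(samples):
    """samples maps collection -> [numFound from 5 repeated identical
    queries].  Return the collections whose counts varied across the
    repeats -- a sign that replicas behind them are out of sync."""
    return sorted(c for c, counts in samples.items()
                  if len(set(counts)) > 1)

samples = {"col1": [100, 100, 100, 100, 100],
           "col2": [100, 98, 100, 98, 100]}
print(unstable_collections(samples))  # → ['col2']
```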
Re: Multiple Collections in a Alias.
Are the scores the same for the documents that are ordered differently?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Multiple Collections in a Alias.
The replicas are all synched, and there were no updates while I was testing.
Re: Multiple Collections in a Alias.
Most likely you have 1 or more collections behind the alias that have replicas out of sync :)

Try querying each collection to find the one out of sync.
Multiple Collections in a Alias.
I have 10 collections in a single alias, and I am getting different result sets every time with the same query.

Is this by design, or am I missing something?

The configuration and schema for all 10 collections are identical.

Thanks,
Jae
Re: Reload synonyms without reloading the multiple collections
Sorry, I see that it may have been confusing. My webapp calls the reload of all the affected Collections (about a dozen of them) sequentially, using the Collections API.

Ideally I would be able to write some QueryTimeSynonymFilterFactory that would, periodically or when told, reload the synonyms file from ZK, which is the file the system edits when a user changes some synonyms. I understand that a Collection needs to be reloaded if the synonyms were used at indexing time, but this is not my case.

The managed API is in the same situation; basically it does what I am doing on my own right now. In the end, there has to be a reload of the affected Collections.

Regards,
Simón
Re: Reload synonyms without reloading the multiple collections
On 12/29/2018 5:55 AM, Simón de Frosterus Pokrzywnicki wrote:
> The problem is that when the user changes the synonyms, it automatically
> triggers a sequential reload of all the Collections.

What exactly is being done when you say "the user changes the synonyms"? Just uploading a new synonyms definition file to ZooKeeper would *NOT* result in a reload of *ANY* collection. As far as I am aware, collection reloads only happen when they are explicitly requested. Usage of the managed APIs to change aspects of the schema could cause a reload, but it's only going to happen on the collection where the API is used, not all collections.

Basically, I cannot imagine any situation that would cause a reload of all collections, other than explicitly asking Solr to do those reloads.

Thanks,
Shawn
Reload synonyms without reloading the multiple collections
Hello,

I have a SolrCloud setup with multiple Collections based on the same configset. One of the features I have is that the user can define their own synonyms in order to improve their search experience, which has worked fine until recently. Lately the platform has grown, and the user has several dozen Collections, most of them with 200k or more documents of non-trivial size.

The problem is that when the user changes the synonyms, it automatically triggers a sequential reload of all the Collections. This is now always causing problems, to the point where the platform becomes unstable and may need a restart of Solr, which means we have to access the platform and manually stabilize it.

The synonyms are only used at query time, so there is no need to reindex anything, and it seems like overkill to reload the Collections just to change the synonyms.

I have tried creating my own CustomSynonymGraphFilter and having it call the loadSynonyms() method as needed, but this causes some weird behavior where queries sometimes have the newly added synonyms working fine, but sometimes not. I get the impression that there may be N "threads" handling the queries but I only change the SynonymMap for one of them, so when the query hits the right "thread" it works, but in most cases it does not.

My custom fieldType looks like this:

I would like to know if there is some other class I can redefine to make sure the new SynonymMap is used in all cases.

Thanks,
Simón

PS: I have upgraded to Solr 7.6.
Re: Query to multiple collections
Hi,

This was kind of one of the problems I was facing recently. In my use case I am supposed to show collated spellcheck suggestions from two different collections. To mention it as well: both collections use the same schema, but they need to be segregated because of the business purposes they serve. I considered the aliasing approach too, but was a little unsure whether it would work for me.

Weirdly, the standard select URL itself is trouble for me, and I run into the following exception in my browser:

http://:8983/solr/products.1,products.3/select?q=*:*

{
  "responseHeader": {
    "zkConnected": true,
    "status": 500,
    "QTime": 24,
    "params": {
      "q": "*:*"
    }
  },
  "error": {
    "trace": "java.lang.NullPointerException
      at org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)
      at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)
      at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)
      at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)
      at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:423)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
      at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
      at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
      ... (Jetty request-handling frames omitted) ...
      at java.lang.Thread.run(Thread.java:748)",
    "code": 500
  }
}

I would really appreciate it if someone could tell me what might be happening.

Thanks,
Atita
Re: Query to multiple collections
Thanks Shawn for the update. I am going ahead with the standard aliases approach; it suits my use case.

Regards,
Rohan Kasat
Re: Query to multiple collections
On 10/22/2018 1:26 PM, Chris Ulicny wrote:
> There weren't any particular problems we ran into, since the client that
> makes the queries to multiple collections previously would query multiple
> cores using the 'shards' parameter before we moved to SolrCloud. We didn't
> have any complicated sorting or scoring requirements, fortunately.
>
> The one thing I remember looking into was what Solr would do when two
> documents with the same id were found in both collections. I believe it
> just non-deterministically picked one, probably the one that came in first
> or last.

Yes, that is how it works. I do not know whether it is the first one to respond or the last one to respond that ends up in the results. Solr is designed to work with data where the uniqueKey field really is unique across everything that is being queried. Results can vary when you have the same uniqueKey value in more than one place and you query both of them at once.

> Depending on how many collections you need to query simultaneously, it's
> worth looking into using aliases for lists of collections as Alex
> mentioned.
>
> Unfortunately, in our use case, it wasn't worth the headache of managing
> aliases for every possible combination of collections that needed to be
> queried, but we would have preferred to use aliases.

Aliases are the cleanest option. This syntax also works; sorta blew my mind when somebody told me about it:

http://host:port/solr/current,archive2,archive4/select?q=*:*

If you're using a Solr client library, it might not be possible to control the URL like that, but if you're building URLs yourself, you could use it.

I recently filed an issue related to alias handling, some unexpected behavior:

https://issues.apache.org/jira/browse/SOLR-12849

Thanks,
Shawn
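If the same uniqueKey can occur in more than one of the collections you query together, you can make the outcome deterministic by deduplicating on the client side, since which copy Solr itself returns is unspecified. A sketch (field names are illustrative):

```python
def dedupe_by_unique_key(docs, key="id"):
    """Merge docs returned from several collections, keeping the first
    doc seen for each uniqueKey value so the surviving copy is
    predictable on the client side."""
    seen, merged = set(), []
    for doc in docs:
        if doc[key] not in seen:
            seen.add(doc[key])
            merged.append(doc)
    return merged

docs = [{"id": "1", "src": "current"}, {"id": "2", "src": "current"},
        {"id": "1", "src": "archive2"}]
merged = dedupe_by_unique_key(docs)
print(merged)
```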
Re: Query to multiple collections
Thanks Chris. This helps.

Regards,
Rohan
Re: Query to multiple collections
There weren't any particular problems we ran into, since the client that makes the queries to multiple collections previously queried multiple cores using the 'shards' parameter before we moved to SolrCloud. We didn't have any complicated sorting or scoring requirements, fortunately.

The one thing I remember looking into was what Solr would do when two documents with the same id were found in both collections. I believe it just non-deterministically picked one, probably the one that came in first or last.

Depending on how many collections you need to query simultaneously, it's worth looking into using aliases for lists of collections as Alex mentioned.

Unfortunately, in our use case, it wasn't worth the headache of managing aliases for every possible combination of collections that needed to be queried, but we would have preferred to use aliases.
Re: Query to multiple collections
Thanks Alex. I checked aliases but didn't focus on them much; I will try to relate them to my use case and have another look. I guess specifying the collection in the query should be useful.

Regards,
Rohan Kasat
Re: Query to multiple collections
Thanks Chris for the update. I was thinking along the same lines; I just wanted to check whether you had faced any specific issues.

Regards,
Rohan Kasat
Re: Query to multiple collections
Have you tried using aliases?

http://lucene.apache.org/solr/guide/7_5/collections-api.html#collections-api

You can also, I think, specify a set of shards/collections directly in the query, but there may be edge cases with that (not sure).

Regards,
Alex.
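For reference, an alias covering several collections is created with the Collections API's CREATEALIAS action, as documented at the link above. A minimal sketch of composing that request URL (host, alias name, and collection names are placeholders):

```python
from urllib.parse import urlencode

def create_alias_url(base, alias, collections):
    """Compose a Collections API CREATEALIAS request.

    Queries against the alias then span every member collection.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base}/solr/admin/collections?{urlencode(params)}"
```

As noted later in the thread, maintaining an alias per possible combination of collections may not scale if the combinations are many.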
Re: Query to multiple collections
Rohan,

I do not remember where I came across it or what restrictions exist on it, but it works for our use case of querying multiple archived collections with identical schemas in the same SolrCloud cluster. The queries have the following form:

http://host:port/solr/current/select?collection=current,archive2,archive4&q=...

It seems like it might work for your use case, but you might need to tread carefully depending on your requirements for the returned results. Sorting and duplicate unique keys come to mind.

Best,
Chris
Query to multiple collections
Hi all,

I have a SolrCloud setup with multiple collections. I created, say, two collections because their data sources are different and I therefore wanted to store them separately. There is a use case where I need to query both collections and show unified search results. The fields in the schema are the same (say: title, description, date).

Is there a specific way I can do this directly, with the Collections API or something like that? Or do I need to write a federator that queries the respective collections and then unifies the results?

--
*Regards, Rohan*
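If a client-side federator turns out to be necessary, the core of it is a merge-and-dedup step over the per-collection result lists. This is a naive sketch with hypothetical document dicts; note the earlier caveat that relevance scores from different collections are not directly comparable, so sorting on a shared field such as date is usually safer.

```python
def federate(results_a, results_b, key="date", rows=10):
    """Merge two collections' result lists into one unified page.

    Duplicate ids are dropped (first occurrence after sorting wins),
    then the merged list is re-sorted descending on a shared field.
    """
    seen, merged = set(), []
    for doc in sorted(results_a + results_b, key=lambda d: d[key], reverse=True):
        if doc["id"] not in seen:
            seen.add(doc["id"])
            merged.append(doc)
    return merged[:rows]
```

For deep paging across collections this naive approach over-fetches: each collection must return `rows` candidates so the merged top `rows` is correct.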
Re: Multiple collections for a write-alias
We are actually very close to doing what Shawn has suggested. Emir has a good point about new collections failing on deletes/updates of older documents which are not present in the new collection. But even if this feature can only be implemented for an append-only log, it would make a good feature IMO.

The use case for re-indexing everything is generally an attribute change, like enabling "indexed" or "docValues" on a field, or adding a new field to the schema. While the reading client code sits behind a flag until it can start using the new attribute/field, we have to re-index all the data without stopping reads in the older format.

Currently, we have to do dual writes to the new collections or play catch-up-after-a-bootstrap. Note that catch-up-after-a-bootstrap is not easy either (it is very similar to the process described by Shawn). If this special place is Kafka or some table in the DB, then we have to do dual writes to the regular source of truth and to this special place. Dual writes to the DB and Kafka are transaction-less (and thus lack consistency), while dual writes to the DB increase the load on the DB. Having created_date/modified_date fields and querying the DB to find live-traffic documents has its own problems and taxes the DB again.

Dual writes directly to multiple Solr collections are the simplest thing for a client to implement, and that is exactly what this new feature could be. With a dual-write collection alias, the client need not implement any of the above if the alias does the following:

- Deletes of documents missing from the new collection are simply ignored.
- Incremental (atomic) updates throw an error, as unsupported on a multi-write collection alias.
- Regular updates (i.e., delete-then-insert) work fine, because they treat the document as a brand-new one, and versioning strategies can take care of out-of-order updates.
SG
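The three bullet-point semantics proposed above can be made concrete with a small in-memory model. This is not a real Solr feature; the class below only illustrates, under the stated assumptions, how a dual-write alias would behave for deletes, atomic updates, and full-document updates.

```python
class DualWriteAlias:
    """Model of the proposed dual-write alias semantics (hypothetical).

    Each member collection is modeled as a dict mapping id -> document.
    """

    def __init__(self, *collections):
        self.collections = collections

    def add(self, doc):
        # Delete-then-insert semantics: the doc overwrites in every member.
        for c in self.collections:
            c[doc["id"]] = doc

    def delete(self, doc_id):
        # An id missing from a (newer, partially built) member is ignored.
        for c in self.collections:
            c.pop(doc_id, None)

    def atomic_update(self, doc_id, fields):
        # Incremental updates cannot be applied consistently across members.
        raise NotImplementedError(
            "atomic updates unsupported on a dual-write alias")
```

Under these rules, out-of-order full updates are left to a versioning strategy, exactly as the proposal sketches.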
Re: Multiple collections for a write-alias
This approach could work only for an append-only index. If you have updates/deletes, you have to process them in order; otherwise you will get incorrect results. I suspect that is one of the reasons it is not supported: not too useful.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
Re: Multiple collections for a write-alias
On 11/9/2017 11:09 AM, S G wrote:
> However, re-ingestion takes several hours to complete and during that time,
> the customer has to write to both the collections - previous collection and
> the one being bootstrapped.
> This dual-write is harder to do from the client side (because client needs
> to have a retry logic to ensure any update does not succeed in one
> collection and fails in another - consistency problem) and it would be a
> real welcome addition if collection aliasing can support this.

Let me explain how I handle this situation. I'm not running in cloud mode, but I use the "swap" feature of CoreAdmin to do much the same thing you're describing with collection aliases.

My source data (a MySQL database) has a way to track the last new document that was added, as well as which deletes have been applied and which documents need to be reinserted. I use these pointers to decide what data to retrieve on each indexing cycle, and then I update them to new positions when the indexing cycle completes successfully.

When I do a full rebuild, I grab the current positions for new docs, deletes, and reinserts, and store that information in a special place. Then I start building indexes in the "build" cores. In the meantime, I continue to update all the "live" cores, so users are unaware that anything special is happening.

When the rebuild finishes (which can take a day or more), I go to that special place where I stored all the position information and proceed to run a "catchup" indexing process on the build cores: all the deletes, new documents, and reinserts that happened since the full rebuild started. When that completes, I swap the build cores with the live cores and resume normal operation.

Doing it this way, I do not need to worry about the normal indexing cycle handling writes to both the old index and the new index; the ongoing cycle just updates the current live cores.
> Proposal:
> If can enhance the write alias to point to multiple collections such that
> any update to the alias is written to all the collections it points to, it
> would help the client to avoid dual writes and also issue just a single
> http call from the client instead of multiple. It would also reduce the
> retry logic inside the client code used to keep the collections consistent.

Imagine an index with time-series data, where there is an alias called "today" that includes up to 24 hourly collections. If you were to write to that alias with the idea you've proposed, the data would end up in the wrong places and would in fact be incorrectly duplicated many times. The way it currently works, writes only go to the FIRST collection in the alias, which can be arranged to always be the "current" collection.

Your proposal is an interesting idea, but it would require some development work. Errors during indexing could be a major source of headaches, especially errors that don't affect all collections in the alias equally. So as not to change how users expect Solr to work currently, aliases would need a special flag indicating that writes *should* be duplicated to all collections in the alias, or maybe there would need to be two different kinds of aliases.

Since such a feature is probably not going to happen quickly even if it is something we agree to work on, would you be able to use something like the method I outlined above?

Thanks,
Shawn
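The snapshot-rebuild-catchup-swap procedure described above can be sketched end to end. Everything here is an in-memory stand-in: the change log plays the role of the MySQL position tracking, and the dicts play the role of the build and live cores.

```python
class RebuildCoordinator:
    """Sketch of a swap-based full rebuild with a catch-up pass.

    changelog is an ordered stream of documents; positions into it stand
    in for the stored new-doc/delete/reinsert pointers described above.
    """

    def __init__(self):
        self.live, self.build = {}, {}
        self.changelog = []

    def index_into(self, core, docs):
        for doc in docs:
            core[doc["id"]] = doc

    def full_rebuild(self, updates_during_rebuild):
        snapshot = len(self.changelog)                  # store positions first
        self.index_into(self.build, self.changelog[:snapshot])  # long rebuild
        # Meanwhile the live core keeps receiving the ongoing updates...
        self.changelog.extend(updates_during_rebuild)
        self.index_into(self.live, updates_during_rebuild)
        # Catch-up: replay everything logged since the snapshot into build.
        self.index_into(self.build, self.changelog[snapshot:])
        # Swap build and live; users never saw a partially built index.
        self.live, self.build = self.build, {}
```

The key property is that the ordinary indexing cycle only ever touches the live core; the build core is reconciled once, just before the swap.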
Re: Multiple collections for a write-alias
Aliases can already point to multiple collections; have you tried that? I'm not totally sure what the behavior would be, but nothing you've written indicates you tried, so I thought I'd point it out.

It's not clear to me how useful this is, though, or what failure messages are returned. Or how you'd figure out which collection failed. Or how you'd take remedial action.

Best,
Erick
Multiple collections for a write-alias
Hi,

We have a use case: re-create a Solr collection by re-ingesting everything, without tolerating downtime while that happens.

We use the collection alias feature to point to the new collection once it has been fully re-ingested.

However, re-ingestion takes several hours to complete, and during that time the customer has to write to both collections: the previous collection and the one being bootstrapped. This dual write is harder to do from the client side (the client needs retry logic to ensure no update succeeds in one collection but fails in the other: a consistency problem), and it would be a real welcome addition if collection aliasing could support this.

Proposal: if we can enhance the write alias to point to multiple collections, such that any update to the alias is written to all the collections it points to, it would help the client avoid dual writes and issue just a single HTTP call instead of multiple. It would also reduce the retry logic inside the client code used to keep the collections consistent.

Thanks,
SG
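The client-side retry logic the proposal wants to eliminate looks roughly like this. It is a sketch only: `clients` is a hypothetical mapping from collection name to an indexing callable, standing in for whatever Solr client library is in use.

```python
def dual_write(doc, clients, retries=3):
    """Write the same doc to every collection, retrying per collection.

    Returns a dict of collection -> final exception; an empty dict means
    the collections stayed consistent. A non-empty dict is the consistency
    problem the proposal describes: the doc landed in some collections
    but not others.
    """
    failed = {}
    for name, send in clients.items():
        for attempt in range(retries):
            try:
                send(doc)
                break
            except Exception as exc:
                if attempt == retries - 1:
                    failed[name] = exc
    return failed
```

A server-side multi-collection write alias would collapse this into one HTTP call and move the partial-failure handling into Solr.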
Re: Multiple collections vs multiple shards for multitenancy
Well, it's not either/or. And you haven't said how many tenants we're talking about here. Solr startup times for a single _instance_ of Solr when there are thousands of collections can be slow. But note what I am talking about here: A single Solr on a single node where there are hundreds and hundreds of collections (or replicas for that matter). I know of very large installations with 100s of thousands of _replicas_ that run. Admittedly with a lot of care and feeding... Sharding a single large collection and using custom routing to push tenants to a single shard will be an administrative problem for you. I'm assuming you have the typical multi-tenant problems, a bunch of tenants have around N docs, some smaller percentage have 3N and a few have 100N. Now you're having to keep track of how many docs are on each shard, do the routing yourself, etc. Plus you can't commit individually, a commit on one will _still_ commit on all so you're right back where you started. I've seen people use a hybrid approach: experiment with how many _documents_ you can have in a collection (however you partition that up) and use the multi-tenant approach. So you have N collections and each collection has a (varying) number of tenants. This also tends to flatten out the update process on the assumption that your smaller tenants also don't update their data as often. However, I really have to question one of your basic statements: "This works fine with aggressive autowarming, but I have a need to reduce my NRT search capabilities to seconds as opposed to the minutes it is at now,"... The implication here is that your autowarming takes minutes. Very often people severely overdo the warmup by setting their autowarm counts to 100s or 1000s. This is rarely necessary, especially if you use docValues fields appropriately. Very often much of autowarming is "uninverting" fields (look in your Solr log). Essentially for any field you see this, use docValues and loading will be much faster. 
You also haven't said how many documents you have in a shard at present. This is actually the metric I use most often to size hardware. I claim you can find a sweet spot where minimal autowarming will give you good enough performance, and that number is what you should design to. Of course YMMV.

Finally: push back really hard on how aggressive NRT support needs to be. Often "requirements" like this are made without much thought, as in "faster is better, let's make it 1 second!". There are situations where that's true, but it comes at a cost. Users may be better served by a predictable but fast system than one that's fast but unpredictable. "Documents may take up to 5 minutes to appear and searches will usually take less than a second" is nice and concise; I have my expectations. "Documents are searchable in 1 second, but the results may not come back for between 1 and 10 seconds" is much more frustrating.

FWIW,
Erick

On Sat, May 6, 2017 at 5:12 AM, Chris Troullis <cptroul...@gmail.com> wrote:
> Hi,
>
> I use Solr to serve multiple tenants and currently all tenants' data
> resides in one large collection, and queries have a tenant identifier. This
> works fine with aggressive autowarming, but I have a need to reduce my NRT
> search capabilities to seconds as opposed to the minutes it is at now,
> which will mean drastically reducing if not eliminating my autowarming. As
> such I am considering splitting my index out by tenant so that when one
> tenant modifies their data it doesn't blow away all of the searcher-based
> caches for all tenants on soft commit.
>
> I have done a lot of research on the subject and it seems like SolrCloud
> can have problems handling large numbers of collections. I'm obviously
> going to have to run some tests to see how it performs, but my main
> question is this: are there pros and cons to splitting the index into
> multiple collections vs having 1 collection but splitting into multiple
> shards?
> In my case I would have a shard per tenant and use implicit routing
> to route to that specific shard. As I understand it a shard is basically
> its own Lucene index, so I would still be eating that overhead with either
> approach. What I don't know is if there are any other overheads involved
> WRT collections vs shards, routing, zookeeper, etc.
>
> Thanks,
>
> Chris
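The hybrid approach Erick describes upthread — N collections, each holding a varying number of tenants, with a cap on documents per collection — is essentially a bin-packing problem. A rough sketch in Python; the tenant names, document counts, and the per-collection cap are all hypothetical:

```python
def pack_tenants(tenant_docs, max_docs_per_collection):
    """Greedily assign tenants (largest first) to collection buckets,
    keeping each bucket's total document count under the cap."""
    buckets = []  # each bucket: [doc_count, [tenant, ...]]
    for tenant, docs in sorted(tenant_docs.items(), key=lambda kv: -kv[1]):
        for bucket in buckets:
            if bucket[0] + docs <= max_docs_per_collection:
                bucket[0] += docs
                bucket[1].append(tenant)
                break
        else:
            # no existing collection has room: start a new one
            buckets.append([docs, [tenant]])
    return [b[1] for b in buckets]
```

This also illustrates Erick's point that the big tenants (100N) dominate: they end up alone in their own collections, while the many small tenants share.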
Multiple collections vs multiple shards for multitenancy
Hi,

I use Solr to serve multiple tenants and currently all tenants' data resides in one large collection, and queries have a tenant identifier. This works fine with aggressive autowarming, but I have a need to reduce my NRT search capabilities to seconds as opposed to the minutes it is at now, which will mean drastically reducing if not eliminating my autowarming. As such I am considering splitting my index out by tenant so that when one tenant modifies their data it doesn't blow away all of the searcher-based caches for all tenants on soft commit.

I have done a lot of research on the subject and it seems like SolrCloud can have problems handling large numbers of collections. I'm obviously going to have to run some tests to see how it performs, but my main question is this: are there pros and cons to splitting the index into multiple collections vs having 1 collection but splitting into multiple shards? In my case I would have a shard per tenant and use implicit routing to route to that specific shard. As I understand it a shard is basically its own Lucene index, so I would still be eating that overhead with either approach. What I don't know is if there are any other overheads involved WRT collections vs shards, routing, zookeeper, etc.

Thanks,
Chris
Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?
The thing is that back in Solr 4.8, when I was using Solr standalone and I had to make a distributed query among multiple shards, I found that for each shard in the param "shards" it makes a request (which is the correct behaviour, I know), but when I put just one shard in the "shards" param it makes two identical requests. So, now that I'm using SolrCloud, I replaced "shards" with the "collection" param and I was wondering if it would have the same erratic behaviour. Now I tried and I found that it has the correct behaviour. Thanks, and sorry for asking before testing it.

2016-07-14 15:26 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
> bq: if using the param "collection" is the same
>
> Did you just try it? If so what happened?
>
> Not sure what you're asking here. It's the name of the
> collection you want to query against. It's only
> necessary when you want to go against a
> collection that isn't the default which you can set with
> setDefaultCollection()
>
> Best,
> Erick
>
> On Thu, Jul 14, 2016 at 10:51 AM, Pablo Anzorena
> <anzorena.f...@gmail.com> wrote:
> > I was using
> > public QueryResponse query(ModifiableSolrParams params, METHOD method)
> >
> > And my actual code is parsing that object. I can change it to your method,
> > but before that let me ask you if using the param "collection" is the same.
> >
> > Actually I am using the param "collection" only when I need to request to
> > multiple collections.
> >
> > Thanks.
> >
> > 2016-07-14 14:15 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> Just use the
> >>
> >> public NamedList request(SolrRequest request, String collection)
> >>
> >> method on the SolrCloudClient?
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena <anzorena.f...@gmail.com>
> >> wrote:
> >> > Hey,
> >> > So the question is quite simple, Is it possible to use Singleton Pattern
> >> > with SolrCloudClient instantiation and then reuse that instance to handle
> >> > multiple requests concurrently accessing different collections?
> >> >
> >> > Thanks.
Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?
bq: if using the param "collection" is the same

Did you just try it? If so what happened?

Not sure what you're asking here. It's the name of the collection you want to query against. It's only necessary when you want to go against a collection that isn't the default, which you can set with setDefaultCollection().

Best,
Erick

On Thu, Jul 14, 2016 at 10:51 AM, Pablo Anzorena <anzorena.f...@gmail.com> wrote:
> I was using
> public QueryResponse query(ModifiableSolrParams params, METHOD method)
>
> And my actual code is parsing that object. I can change it to your method,
> but before that let me ask you if using the param "collection" is the same.
>
> Actually I am using the param "collection" only when I need to request to
> multiple collections.
>
> Thanks.
>
> 2016-07-14 14:15 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
>
>> Just use the
>>
>> public NamedList request(SolrRequest request, String collection)
>>
>> method on the SolrCloudClient?
>>
>> Best,
>> Erick
>>
>> On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena <anzorena.f...@gmail.com>
>> wrote:
>> > Hey,
>> > So the question is quite simple, Is it possible to use Singleton Pattern
>> > with SolrCloudClient instantiation and then reuse that instance to handle
>> > multiple requests concurrently accessing different collections?
>> >
>> > Thanks.
Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?
I was using

public QueryResponse query(ModifiableSolrParams params, METHOD method)

And my actual code is parsing that object. I can change it to your method, but before that let me ask you if using the param "collection" is the same.

Actually I am using the param "collection" only when I need to request to multiple collections.

Thanks.

2016-07-14 14:15 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:
> Just use the
>
> public NamedList request(SolrRequest request, String collection)
>
> method on the SolrCloudClient?
>
> Best,
> Erick
>
> On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena <anzorena.f...@gmail.com>
> wrote:
> > Hey,
> > So the question is quite simple, Is it possible to use Singleton Pattern
> > with SolrCloudClient instantiation and then reuse that instance to handle
> > multiple requests concurrently accessing different collections?
> >
> > Thanks.
Re: Is SolrCloudClient Singleton Pattern possible with multiple collections?
Just use the

public NamedList request(SolrRequest request, String collection)

method on the SolrCloudClient?

Best,
Erick

On Thu, Jul 14, 2016 at 9:18 AM, Pablo Anzorena wrote:
> Hey,
> So the question is quite simple, Is it possible to use Singleton Pattern
> with SolrCloudClient instantiation and then reuse that instance to handle
> multiple requests concurrently accessing different collections?
>
> Thanks.
Is SolrCloudClient Singleton Pattern possible with multiple collections?
Hey,

So the question is quite simple: is it possible to use the Singleton Pattern with SolrCloudClient instantiation and then reuse that instance to handle multiple requests concurrently, accessing different collections?

Thanks.
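The thread's answer is effectively yes: one shared client instance, with the collection chosen per request. The lazily-initialized shared instance can be sketched language-agnostically; here in Python, with `factory` standing in for constructing the SolrJ client (the construction details are hypothetical):

```python
import threading

# One shared client, created lazily and reused by every thread.
_client = None
_lock = threading.Lock()

def get_client(factory):
    global _client
    if _client is None:          # fast path once initialized, no lock taken
        with _lock:
            if _client is None:  # double-checked under the lock
                _client = factory()
    return _client
```

Every caller then passes the target collection on the request itself (the `request(SolrRequest, String collection)` overload Erick points at) rather than creating a new client per collection.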
Re: SolrCloud multiple collections each with unique schema via SolrJ?
Got it! I now use uploadConfig to load the default config for each new collection I create, and then modify the schema. Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-multiple-collections-each-with-unique-schema-via-SolrJ-tp4277397p4277406.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud multiple collections each with unique schema via SolrJ?
On 5/17/2016 7:00 PM, Boman wrote:
> I load the default config using scripts/cloud-scripts/zkcli.sh -cmd upconfig
> after which collections are created programmatically and the schema modified
> as per each collection's requirements.
>
> I now notice that it is the SAME "default" original schema that holds ALL
> the modifications (new fields). What I really want is that during collection
> creation time (using SolrJ) as follows:
>
> CollectionAdminRequest.Create createRequest = new
> CollectionAdminRequest.Create();
> createRequest.setConfigName("default-config");
>
> the new collection would "inherit" a copy of the default schema, and
> following any updates to that schema, it should remain collection-specific.
>
> Any suggestions on how to achieve this programmatically? Thanks.

If you want a different config/schema combo for each collection, you need to upload a different configset for every collection. When your collections are all using the same config, any change that you make for one of them will affect them all (after reload). You can't share just part of the configset -- it's a cohesive unit covering the solrconfig.xml, the schema, and all the other files in the configset.

Thanks,
Shawn
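Shawn's point — a separate configset per collection so schema changes stay isolated — can be scripted. A sketch that generates one zkcli `upconfig` invocation per collection; the ZK host, config directory layout, and `-config` naming convention here are hypothetical, while the zkcli.sh flags (`-zkhost`, `-cmd upconfig`, `-confdir`, `-confname`) are the ones shipped with Solr:

```python
def upconfig_commands(zkhost, base_confdir, collections):
    """Build one zkcli upconfig command per collection, each uploading a
    dedicated copy of the config under a per-collection confname."""
    return [
        f"scripts/cloud-scripts/zkcli.sh -zkhost {zkhost} "
        f"-cmd upconfig -confdir {base_confdir}/{c} -confname {c}-config"
        for c in collections
    ]
```

Each collection is then created with `collection.configName` pointing at its own confname, so modifying one schema no longer touches the others.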
SolrCloud multiple collections each with unique schema via SolrJ?
I load the default config using scripts/cloud-scripts/zkcli.sh -cmd upconfig, after which collections are created programmatically and the schema modified as per each collection's requirements.

I now notice that it is the SAME "default" original schema that holds ALL the modifications (new fields). What I really want is that during collection creation time (using SolrJ) as follows:

CollectionAdminRequest.Create createRequest = new CollectionAdminRequest.Create();
createRequest.setConfigName("default-config");

the new collection would "inherit" a copy of the default schema, and following any updates to that schema, it should remain collection-specific.

Any suggestions on how to achieve this programmatically? Thanks.

--Boman.
Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
F*ck. I switched from normal Solr to SolrCloud thanks to the feature that allows creating cores (collections) on-the-fly with the API, without having to tell Solr where to find a schema.xml / a solrconfig.xml and letting it create them itself from a pre-defined configset.

If I understand well, there is actually no way to create a core or a collection from the API, with a defined-at-once configset, without having to run some CLI commands on the remote server?

Thanks for your reply,
Ben
Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
On 12/7/2015 9:46 AM, bengates wrote:
> If I understand well, there is actually no way to create a core or a
> collection from the API, with a defined-at-once configset, without having to
> do some CLI commands on the remote server?

With SolrCloud, the only step that requires the command line is uploading the configuration to zookeeper, which is done with the zkcli script included with Solr. This script talks to zookeeper over the TCP network socket, so it can be run from anywhere with network access to the zookeeper servers. You do not need to run it directly on the remote Solr server.

With a zookeeper client that's not Solr-specific, you may be able to have even more control, but it won't be as easy as zkcli. I've used the zookeeper plugin for eclipse, but their website seems to be broken. Here's the URL, I hope it starts working at some point:

http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper

Thanks,
Shawn
Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
You have to upload the different configset with the zookeeper client (this is done for you when you do the examples) using zkcli; see the "upconfig" command here: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities

Similarly, you need to make changes locally (perhaps after doing a "downconfig") and push them back up. The new Admin UI does allow you to manipulate schemas from the UI, but you have to both have them be "managed" and do the initial upconfig(s) yourself. Now, apart from this step, the rest of the collection operations are available through the API.

Best,
Erick

On Sat, Dec 5, 2015 at 12:56 AM, bengates <benga...@aliceadsl.fr> wrote:
> I understand.
>
> How to do this via the API?
Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
I understand.

How to do this via the API?
Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
Hello,

I'm having usage issues with SolrCloud.

What I want to do:
- Manage a Solr server only with the API (create / reload / delete collections, create / replace / delete fields, etc).
- A new collection should start with pre-defined default fields, fieldTypes and copyFields (let's say, field1 and field2 for fields).
- Each collection must have its own schema.

What I've set up yet:
- Installed Solr 5.3.1 in /opt/solr on an Ubuntu 14.04 server
- Installed Zookeeper 3.4.6 in /opt/zookeeper as described in the Solr wiki
- Added line "server.1=127.0.0.1:2888:3888" in /opt/zookeeper/conf/zoo.cfg
- Added line "127.0.0.1:2181" in /var/solr/data/solr.xml
- Told Solr or Zookeeper somewhere (don't remember where I set this up) to use /home/me/configSet/managed-schema.xml and /home/me/configSet/solrconfig.xml for the configSet
- Run Solr on port 8983

My /home/me/configSet/managed-schema.xml contains field1 and field2.

Now let's create a collection:
http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1
- collection1 is created, with field1 and field2. Perfect.

Let's create another collection:
http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=1
- collection2 is created, with field1 and field2. Perfect.

Now, if I add some fields to collection1 by POSTing to http://my.remote.addr:8983/solr/collection1/schema:
- field3 and field4 are successfully added to collection1
- ... but they are also added to collection2 (verified by GETting http://my.remote.addr:8983/solr/collection2/schema/fields)

How to prevent this behavior, since my collections have different kinds of data, and may have the same field names but not the same types?

Thanks,
Ben
Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas
If you want two different collections to have two different schemas, those collections need to reference two different configsets. So you need another copy of your config available using a different name, and to reference that other name when you create the second collection.

On 12/4/15, 6:26 AM, "bengates" <benga...@aliceadsl.fr> wrote:

>Hello,
>
>I'm having usage issues with SolrCloud.
>
>What I want to do:
>- Manage a Solr server only with the API (create / reload / delete
>collections, create / replace / delete fields, etc).
>- A new collection should start with pre-defined default fields, fieldTypes
>and copyFields (let's say, field1 and field2 for fields).
>- Each collection must have its own schema.
>
>What I've set up yet:
>- Installed Solr 5.3.1 in /opt/solr on an Ubuntu 14.04 server
>- Installed Zookeeper 3.4.6 in /opt/zookeeper as described in the solr wiki
>- Added line "server.1=127.0.0.1:2888:3888" in /opt/zookeeper/conf/zoo.cfg
>- Added line "127.0.0.1:2181" in /var/solr/data/solr.xml
>- Told solr or zookeeper somewhere (don't remember where I setup this) to
>use /home/me/configSet/managed-schema.xml and
>/home/me/configSet/solrconfig.xml for configSet
>- Run solr on port 8983
>
>My /home/me/configSet/managed-schema.xml contains field1 and field2.
>
>Now let's create a collection:
>http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1
>- collection1 is created, with field1 and field2. Perfect.
>
>Let's create another collection:
>http://my.remote.addr:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=1
>- collection2 is created, with field1 and field2. Perfect.
>
>Now, if I add some fields to collection1 by POSTing to
>http://my.remote.addr:8983/solr/collection1/schema:
>
>- field3 and field4 are successfully added to collection1
>- ... but they are also added to collection2 (verified by GETting
>http://my.remote.addr:8983/solr/collection2/schema/fields)
>
>How to prevent this behavior, since my collections have different kinds of
>data, and may have the same field names but not the same types?
>
>Thanks,
>Ben
Solrcloud - How to merge multiple collections to a single collection
Is it possible to merge multiple collections into a single collection in SolrCloud 5.x? Say we index daily logs to a collection per day and merge 7 day-collections into a week collection.
Spellcheck across multiple collections
Hi,

Is there a way to collate the spellcheck across different collections? I understand that for a select query, this can be done by setting collection=collection1,collection2 in the query. However, when I do that for spellcheck, Solr does not return me any result on the spellcheck when I enter a wrong spelling in the query. I can only get results when I search on a single collection. I'm using Solr 5.1.

Regards,
Edwin
Query multiple collections together
Hi,

Would like to check, is there a way to query multiple collections together in a single query and return the results in one result set? For example, I have 2 collections and I want to search for records with the word 'solr' in both of the collections. Is there a query to do that, or must I query both collections separately, and get two different result sets?

Regards,
Edwin
Re: Query multiple collections together
You can query multiple collections by specifying the list of collections, e.g.:

http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3

On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> Hi,
>
> Would like to check, is there a way to query multiple collections together
> in a single query and return the results in one result set? For example, I
> have 2 collections and I want to search for records with the word 'solr' in
> both of the collections. Is there a query to do that, or must I query both
> collections separately, and get two different result sets?
>
> Regards,
> Edwin

--
Anshum Gupta
Re: Query multiple collections together
Thank you for the query. Just to confirm, for the 'gettingstarted' in the query, does it matter which collection name I put?

Regards,
Edwin

On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:
> You can query multiple collections by specifying the list of collections, e.g.:
> http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3
>
> On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> > Hi,
> > Would like to check, is there a way to query multiple collections together in a single query and return the results in one result set? For example, I have 2 collections and I want to search for records with the word 'solr' in both of the collections. Is there a query to do that, or must I query both collections separately, and get two different result sets?
> > Regards,
> > Edwin

--
Anshum Gupta
Re: Query multiple collections together
FWIR, you just need to make sure that it's a valid collection. It doesn't have to be one from the list of collections that you want to query, but the collection name you use in the URL should exist. E.g., assuming you have 2 collections foo (10 docs) and bar (5 docs):

/solr/foo/select?q=*:*&collection=bar  #results: 5
/solr/xyz/select?q=*:*&collection=bar  will lead to a HTTP 404 response
/solr/foo/select?q=*:*  #results: 10

On Mon, May 11, 2015 at 12:59 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> Thank you for the query. Just to confirm, for the 'gettingstarted' in the query, does it matter which collection name I put?
>
> Regards,
> Edwin
>
> On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:
> > You can query multiple collections by specifying the list of collections, e.g.:
> > http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3
> >
> > On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> > > Hi,
> > > Would like to check, is there a way to query multiple collections together in a single query and return the results in one result set? For example, I have 2 collections and I want to search for records with the word 'solr' in both of the collections. Is there a query to do that, or must I query both collections separately, and get two different result sets?
> > > Regards,
> > > Edwin

--
Anshum Gupta
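The rule Anshum describes — the collection in the URL path must exist, while the `collection` parameter decides what is actually searched — reduces to simple URL construction. A sketch; the host, port, and collection names are hypothetical:

```python
from urllib.parse import urlencode

def multi_collection_query(base, path_collection, q, collections):
    """Build a select URL that searches the given collections, using
    path_collection (which must exist) only as the request path."""
    params = urlencode({"q": q, "collection": ",".join(collections)})
    return f"{base}/solr/{path_collection}/select?{params}"
```

For example, `multi_collection_query("http://localhost:8983", "foo", "*:*", ["foo", "bar"])` searches both foo and bar even though the path names only foo.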
Re: Query multiple collections together
Ok, thank you so much.

Regards,
Edwin

On 11 May 2015 16:15, Anshum Gupta ans...@anshumgupta.net wrote:
> FWIR, you just need to make sure that it's a valid collection. It doesn't have to be one from the list of collections that you want to query, but the collection name you use in the URL should exist. E.g., assuming you have 2 collections foo (10 docs) and bar (5 docs):
>
> /solr/foo/select?q=*:*&collection=bar  #results: 5
> /solr/xyz/select?q=*:*&collection=bar  will lead to a HTTP 404 response
> /solr/foo/select?q=*:*  #results: 10
>
> On Mon, May 11, 2015 at 12:59 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> > Thank you for the query. Just to confirm, for the 'gettingstarted' in the query, does it matter which collection name I put?
> >
> > Regards,
> > Edwin
> >
> > On 11 May 2015 15:51, Anshum Gupta ans...@anshumgupta.net wrote:
> > > You can query multiple collections by specifying the list of collections, e.g.:
> > > http://hostname:port/solr/gettingstarted/select?q=test&collection=collection1,collection2,collection3
> > >
> > > On Sun, May 10, 2015 at 11:49 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:
> > > > Hi,
> > > > Would like to check, is there a way to query multiple collections together in a single query and return the results in one result set? For example, I have 2 collections and I want to search for records with the word 'solr' in both of the collections. Is there a query to do that, or must I query both collections separately, and get two different result sets?
> > > > Regards,
> > > > Edwin

--
Anshum Gupta
Re: Unable to setup solr cloud with multiple collections.
You're still mixing master/slave with SolrCloud. Do _not_ reconfigure the replication. If you want your core (we call them replicas in SolrCloud) to appear on various nodes in your cluster, either create the collection with the nodes specified (createNodeSet) or, once the collection is created on any node (or set of nodes), do an ADDREPLICA (again with the Collections API) where you want replicas to appear.

The rest is automatic, i.e. the replica's index will be copied from the leader, all updates will be forwarded, etc., without you doing any other configuration. I think you're shooting yourself in the foot by trying to fiddle with replication. Or I misunderstand your problem entirely.

Best,
Erick

On Tue, Mar 24, 2015 at 8:09 PM, sthita sthit...@gmail.com wrote:
> Thanks Erick for your reply. I am trying to create a new core, i.e. dict_cn,
> which is totally different in terms of index data, configs etc. from the
> existing core abc. The core is created successfully on my master (i.e. mail)
> and I can run Solr queries on this newly created core. All the config files
> (schema.xml and solrconfig.xml) are on the mail server and ZooKeeper helps
> me share all config files with the other collections. I did the similar
> setup on the other collection, so that the newly created core should be
> available to all the collections, but it is still showing down.
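Erick's ADDREPLICA suggestion is just a Collections API call. A sketch of building that request URL; the host, collection, shard, and node names are hypothetical, while `action`, `collection`, `shard`, and `node` are the documented parameter names:

```python
def addreplica_url(base, collection, shard, node=None):
    """Build a Collections API ADDREPLICA URL; if node is given, the
    replica is placed on that specific node, otherwise Solr chooses."""
    url = (f"{base}/solr/admin/collections?action=ADDREPLICA"
           f"&collection={collection}&shard={shard}")
    if node:
        url += f"&node={node}"
    return url
```

After this call, the new replica's index is copied from the leader automatically, which is exactly the behavior the hand-configured master/slave replication was (harmfully) duplicating.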
Re: Unable to setup solr cloud with multiple collections.
Why are you doing this in the first place? SolrCloud and master/slave are fundamentally different. When running in SolrCloud mode, there is no need whatsoever to configure replication as per the Wiki link you've outlined above; that's for the older-style master/slave setups. Just change it back and watch the magic would be my advice.

So if you'd tell us why you thought this was necessary, perhaps we can suggest alternatives, because from a quick glance it looks unnecessary, and in fact harmful.

Best,
Erick

On Mon, Mar 23, 2015 at 10:08 PM, sthita sthit...@gmail.com wrote:
> I have newly created a new collection and activated the replication for 4
> nodes (including masters). After doing the config changes as suggested on
> http://wiki.apache.org/solr/SolrReplication the nodes of the newly created
> collection are down on Solr Cloud. We are not able to add or remove any
> document on the newly created core, i.e. dict_cn in our case. All the
> configuration files look ok on Solr Cloud:
> http://lucene.472066.n3.nabble.com/file/n4194833/solr_issue.png
>
> This is my replication change in solrconfig.xml:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">solrconfig_cn.xml,schema_cn.xml</str>
>   </lst>
>   <lst name="slave">
>     <str name="masterUrl">http://mail:8983/solr/dict_cn</str>
>   </lst>
> </requestHandler>
>
> Note: I am using Solr 4.4.0, zookeeper-3.4.5. Can anyone help me on this?
Re: Unable to setup solr cloud with multiple collections.
Thanks Erick for your reply. I am trying to create a new core, i.e. dict_cn, which is totally different in terms of index data, configs etc. from the existing core abc. The core is created successfully on my master (i.e. mail) and I can run Solr queries on this newly created core.

All the config files (schema.xml and solrconfig.xml) are on the mail server and ZooKeeper helps me share all config files with the other collections. I did the similar setup on the other collection, so that the newly created core should be available to all the collections, but it is still showing down.
Unable to setup solr cloud with multiple collections.
I have newly created a new collection and activated the replication for 4 nodes (including masters). After doing the config changes as suggested on http://wiki.apache.org/solr/SolrReplication the nodes of the newly created collection are down on Solr Cloud. We are not able to add or remove any document on the newly created core, i.e. dict_cn in our case. All the configuration files look ok on Solr Cloud:

http://lucene.472066.n3.nabble.com/file/n4194833/solr_issue.png

This is my replication change in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">solrconfig_cn.xml,schema_cn.xml</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://mail:8983/solr/dict_cn</str>
  </lst>
</requestHandler>

Note: I am using Solr 4.4.0, zookeeper-3.4.5.

Can anyone help me on this?
Re: Can a single SolrServer instance update multiple collections?
@Shawn, I can definitely upgrade to SolrJ 4.x and would prefer that, so as to target 4.x cores as well. I'm already on Java 7. One attempt I made was this:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.setParam("collection", collectionName);
updateRequest.setMethod(SolrRequest.METHOD.POST);
updateRequest.add(solrdoc);
UpdateResponse updateResponse = updateRequest.process(solrServer);

but I kept getting Bad Request, which I suspect was a SOLR/SolrJ version conflict. I'm all ears!

Dan
Can a single SolrServer instance update multiple collections?
I have a SolrJ application that reads from a Redis queue and updates different collections based on the message content. New collections are added without my knowledge, so I am creating SolrServer objects on the fly as follows:

def solrHost = "http://myhost/solr/" (defined at startup)
def solrTarget = solrHost + collectionName
SolrServer solrServer = new CommonsHttpSolrServer(solrTarget)
updateResponse = solrServer.add(solrdoc)

This does work but obviously creates a new CommonsHttpSolrServer instance for each message. I assume GC will eliminate these, but is there a way to do this with a single SolrServer object? The SOLR host is version 3.5 and I am using the 3.5 jars for my application (not sure if that is necessary).
Re: Can a single SolrServer instance update multiple collections?
On 3/11/2015 12:23 PM, tuxedomoon wrote:
> I have a SolrJ application that reads from a Redis queue and updates different collections based on the message content. New collections are added without my knowledge, so I am creating SolrServer objects on the fly as follows:
>
> def solrHost = "http://myhost/solr/"  // defined at startup
> def solrTarget = solrHost + collectionName
> SolrServer solrServer = new CommonsHttpSolrServer(solrTarget)
> updateResponse = solrServer.add(solrdoc)
>
> This does work but obviously creates a new CommonsHttpSolrServer instance for each message. I assume GC will eliminate these, but is there a way to do this with a single SolrServer object? The Solr host is version 3.5 and I am using the 3.5 jars for my application (not sure if that is necessary).

What you want to accomplish should be possible, with some attention to how SolrJ code is used. We won't talk about SolrCloud, since you're not running Solr 4.x or 5.0. Upgrading the server side is generally more involved than upgrading the client side, and switching to SolrCloud can be a fairly major conceptual leap. To do what I'm thinking about, you will need to upgrade SolrJ. When SolrCloud is not involved, cross-version compatibility between Solr and SolrJ is pretty good, although there can be some hiccups when crossing the 3.x/4.x barrier relating to the update handlers. Those hiccups are normally easy to fix, but they are something you need to be aware of. Once you've decided whether you're upgrading Solr and which version of SolrJ you will upgrade to, we can get down to the actual Java code you'll need. Note that recent 4.x and 5.0 versions require Java 7, so if you're still on Java 6, you'll be limited to version 4.7.2.
It might even be possible to do this with SolrJ 3.5, but I am already pretty familiar with how you can do it using new features in 4.x, and since you're going to need to change the source code anyway, you might as well take advantage of more modern client functionality that will make the code easier to understand. Just FYI, there are changes coming (currently planned for SolrJ 5.1) that will make this VERY easy. Thanks, Shawn
Re: Can a single SolrServer instance update multiple collections?
On 3/11/2015 3:35 PM, tuxedomoon wrote:
> I can definitely upgrade to SolrJ 4.x and would prefer that, so as to target 4.x cores as well. I'm already on Java 7. One attempt I made was this:
>
> UpdateRequest updateRequest = new UpdateRequest();
> updateRequest.setParam("collection", collectionName);
> updateRequest.setMethod(SolrRequest.METHOD.POST);
> updateRequest.add(solrdoc);
> UpdateResponse updateResponse = updateRequest.process(solrServer);
>
> but I kept getting "Bad Request", which I suspect was a Solr/SolrJ version conflict. I'm all ears!

Can you share the full stacktrace? If you can't see it on the client, grab it from the server log. The "collection" request parameter is only useful if you're running SolrCloud. The 3.x versions, and 4.x/5.x in non-cloud mode, should ignore it. UpdateRequest objects are created by default with a POST method, so you don't need to set that. When I have some time to actually work on the code, I'm going to write it using 4.x classes because that's what I have immediate access to, but if you use 5.x, SolrServer becomes SolrClient, and HttpSolrServer becomes HttpSolrClient. I think everything else will be the same. If I'm wrong about that, it very likely will not be very hard to fix. Thanks, Shawn
Re: Can a single SolrServer instance update multiple collections?
@Shawn I'm getting the Bad Request again, with the original code snippet I posted; it appears to be an 'illegal' string field.

SOLR log -

INFO: {add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7
Mar 12, 2015 12:15:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30] multiple values encountered for non multiValued field image_url_s: [mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg, mgid:file:gsp:movie-assets:/movie-assets/cc/images/shows/miami-beach/episode-thumbnails/specials/iamstupid-the-movie_4x3.jpg]
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)

SolrJ log shows the doc being sent (this is the offending field only):

<field name="image_url_s"></field>

I will investigate on the feeds side; the existing SolrJ code is not the culprit. But I'd still like a more elegant solution. If a SolrJ 5 client can talk to a 3.5 host I'm willing to go there. I know I'm not the only one who would like to address collections on the fly. thx Dan
Re: Can a single SolrServer instance update multiple collections?
On 3/11/2015 4:28 PM, Shawn Heisey wrote: When I have some time to actually work on the code, I'm going to write it using 4.x classes because that's what I have immediate access to, but if you do 5.x, SolrServer becomes SolrClient, and HttpSolrServer becomes HttpSolrClient. At the URL below is the code I came up with. It shows how to do an add, a commit, and a query where the Solr core (collection) is specified as part of the request, rather than the server connection: http://apaste.info/lRi I did test this code successfully, although there was one difference in that code (/update instead of /update/javabin) because my dev Solr server is running 4.9.1, not 3.5. The code I've shared uses SolrJ 4.x, but is tailored to a server running 3.x with a typical 3.x config. I hope this code will work as-is ... and if it doesn't, that it will be easy for you to figure out what I did wrong. If you want to figure out how to use SolrRequest to implement a query with a specific handler path, you could probably implement all of this in SolrJ 3.5, where SolrQuery#setRequestHandler does not exist. I'm sure that if you look at the SolrQuery class and the CommonsHttpSolrServer#query method from the 3.5 source code, you could piece together how to do this. It might be a good idea to abstract these procedures for add, commit, and query into your own local methods that include the collection parameter. If you need it, you can also implement UpdateRequest.ACTION.OPTIMIZE in a similar manner to the way that I used UpdateRequest.ACTION.COMMIT. See the following issue for the recent work that will go into a new 5.x version (probably 5.1), which adds the capability you are seeking directly to HttpSolrClient, implementing abstract methods from SolrClient: https://issues.apache.org/jira/browse/SOLR-7201 Thanks, Shawn
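The pasted code at that URL is no longer retrievable, but the technique Shawn describes — naming the core/collection as part of each request rather than baking it into the server connection — boils down to building a per-request handler path against one shared base URL. A toy sketch of that path construction, in Python for illustration (the handler names shown are common Solr defaults, not taken from the lost paste):

```python
# One shared base connection to the Solr root; the target collection is
# chosen per request by prefixing the handler path with its name.
def handler_path(collection, handler):
    # e.g. handler_path("collection1", "/update") -> "/collection1/update"
    return "/" + collection.strip("/") + handler

def full_url(base, collection, handler):
    return base.rstrip("/") + handler_path(collection, handler)

print(full_url("http://localhost:8983/solr", "collection1", "/update"))
print(full_url("http://localhost:8983/solr", "collection1", "/select"))
```

In SolrJ terms, this is what passing a request path such as `/collection1/update` on each request accomplishes: one client object, any number of collections.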
Use multiple collections having different configuration
Hello, I have a scenario where I want to create/use 2 collections in the same Solr, named collection1 and collection2. I want to use distributed servers. Each collection has multiple shards, and each collection has a different configuration (solrconfig.xml and schema.xml). How can I do this? And if I later want to re-configure one collection, how do I do that? As I understand it, with a single collection having multiple shards we use the upconfig command:

example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir example/solr/collection1/conf -confname default

and restart all the nodes. With 2 collections in the same Solr, how can I re-configure?
Re: Use multiple collections having different configuration
On 2/20/2015 4:06 AM, Nitin Solanki wrote:
> I have a scenario where I want to create/use 2 collections in the same Solr, named collection1 and collection2. I want to use distributed servers. Each collection has multiple shards, and each collection has a different configuration (solrconfig.xml and schema.xml). How can I do this? And if I later want to re-configure one collection, how do I do that? As I understand it, with a single collection having multiple shards we use the upconfig command:
>
> example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir example/solr/collection1/conf -confname default
>
> and restart all the nodes. With 2 collections in the same Solr, how can I re-configure?

First, upload your two different configurations with zkcli upconfig, using two different names. Create your collections with the Collections API, and tell each one to use a different collection.configName. If the collection already exists, use the zkcli linkconfig command and reload the collection. If you need to change a config, edit the config on disk, re-do the zkcli upconfig, and then reload the collection with the Collections API. Alternately, you could upload a whole new config and then link it to the existing collection. The Collections API is not yet exposed in the admin interface, so you will need to make those calls yourself. If you're doing this with SolrJ, there are some objects inside CollectionAdminRequest that let you do all the API actions. Thanks, Shawn
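Shawn's recipe can be written down as a concrete command sequence. A sketch that assembles the calls, in Python for illustration (the config names conf1/conf2, the confdir paths, and the hosts are all hypothetical placeholders):

```python
# Two collections, two config sets: upload each config under its own
# name, then point each CREATE at its config via collection.configName.
# After editing a config, re-run upconfig and RELOAD the collection.
def upconfig_cmd(zkhost, confdir, confname):
    return ("zkcli.sh -zkhost %s -cmd upconfig -confdir %s -confname %s"
            % (zkhost, confdir, confname))

def create_url(solr, name, num_shards, confname):
    return ("%s/admin/collections?action=CREATE&name=%s"
            "&numShards=%d&collection.configName=%s"
            % (solr, name, num_shards, confname))

def reload_url(solr, name):
    return "%s/admin/collections?action=RELOAD&name=%s" % (solr, name)

solr = "http://localhost:8983/solr"
print(upconfig_cmd("localhost:9983", "conf/collection1", "conf1"))
print(upconfig_cmd("localhost:9983", "conf/collection2", "conf2"))
print(create_url(solr, "collection1", 2, "conf1"))
print(create_url(solr, "collection2", 2, "conf2"))
print(reload_url(solr, "collection1"))
```

This is only the shape of the sequence; the zkcli invocation and Collections API parameters should be checked against the Solr version actually in use.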
Re: Use multiple collections having different configuration
Thanks Shawn.. On Fri, Feb 20, 2015 at 7:53 PM, Shawn Heisey apa...@elyograg.org wrote: On 2/20/2015 4:06 AM, Nitin Solanki wrote: I have scenario where I want to create/use 2 collection into same Solr named as collection1 and collection2. I want to use distributed servers. Each collection has multiple shards. Each collection contains different configurations(solrconfig.xml and schema.xml). How can I do? In between, If I want to re-configure any collection then how to do that? As I know, If we use single collection which having multiple shards then we need to use this upconfig link - * example/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir example/solr/collection1/conf -confname default * and restart all the nodes. For 2 collections into same solr. How can I do re-configure? First, upload your two different configurations with zkcli upconfig using two different names. Create your collections with the Collections API, and tell each one to use a different collection.configName. If the collection already exists, use the zkcli linkconfig command, and reload the collection. If you need to change a config, edit the config on disk and re-do the zkcli upconfig. Then reload the collection with the Collections API. Alternately you could upload a whole new config and then link it to the existing collection. The Collections API is not yet exposed in the admin interface, you will need to do those calls yourself. If you're doing this with SolrJ, there are some objects inside CollectionAdminRequest that let you do all the API actions. Thanks, Shawn
Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
Hi, got a nice talk on IRC about this. The right thing to do is to start with a clean Solr cluster (no cores) and then create all the proper collections with the Collections API. Ugo

On Thu, Mar 20, 2014 at 7:26 PM, Jeff Wartes jwar...@whitepages.com wrote:

Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don't be confused if you can't find it yet. See https://issues.apache.org/jira/browse/SOLR-5130

On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote:

You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/ It uses the collections API to create your collection with zero nodes, then shows how to assign your leaders to specific machines (well, at least specify the nodes the leaders will be created on; it doesn't show how to assign, for instance, shard1 to nodeX). It also shows a way to assign specific replicas on specific nodes to specific shards, although as Mark says this is a transitional technique. I know there's an addreplica command in the works for the collections API that should make this easier, but that's not released yet. Best, Erick

On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:

Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using -DnumShards=2, but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated.
I can't find a config file in which to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best-practice way to do this? Ugo
Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using:

-DnumShards=2

but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file in which to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best-practice way to do this? Ugo
Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
Honestly, the best approach is to start with no collections defined and use the Collections API. If you want to preconfigure (which has its warts and will likely go away as an option), it's tricky to do with different numShards, as that is a global property per node. You would basically set -DnumShards=1 and start your cluster with Foo defined. Then you stop the cluster, define Bar, and start with -DnumShards=3. The ability to preconfigure and bootstrap like this was kind of a transitional system, meant to help people who knew Solr pre-SolrCloud get something up quickly, back before we had a Collections API. The Collections API is much better if you want multiple collections, and it's the future. -- Mark Miller about.me/markrmiller

On March 20, 2014 at 10:24:18 AM, Ugo Matrangolo (ugo.matrang...@gmail.com) wrote:

Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using -DnumShards=2, but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file in which to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best-practice way to do this? Ugo
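Concretely, the Collections API lets each collection carry its own shard and replica counts in the CREATE request itself, which is exactly what the global -DnumShards bootstrap cannot express. A sketch building the two requests for the Foo/Bar example (Python for illustration; the host and the replicationFactor values are hypothetical):

```python
from urllib.parse import urlencode

def create_collection_url(base, name, num_shards, replication_factor):
    # Each collection gets its own sharding/replication in the request
    # itself -- no global per-node numShards property is involved.
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base}/admin/collections?{params}"

foo = create_collection_url("http://localhost:8983/solr", "Foo", 1, 4)
bar = create_collection_url("http://localhost:8983/solr", "Bar", 3, 2)
print(foo)
print(bar)
```

Issuing these two calls against a clean cluster yields the mixed layout the original question asks for, with no stop/start dance between collections.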
Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/ It uses the collections API to create your collection with zero nodes, then shows how to assign your leaders to specific machines (well, at least specify the nodes the leaders will be created on; it doesn't show how to assign, for instance, shard1 to nodeX). It also shows a way to assign specific replicas on specific nodes to specific shards, although as Mark says this is a transitional technique. I know there's an addreplica command in the works for the collections API that should make this easier, but that's not released yet. Best, Erick

On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:

Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using -DnumShards=2, but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file in which to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best-practice way to do this? Ugo
Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don't be confused if you can't find it yet. See https://issues.apache.org/jira/browse/SOLR-5130

On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote:

You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/ It uses the collections API to create your collection with zero nodes, then shows how to assign your leaders to specific machines (well, at least specify the nodes the leaders will be created on; it doesn't show how to assign, for instance, shard1 to nodeX). It also shows a way to assign specific replicas on specific nodes to specific shards, although as Mark says this is a transitional technique. I know there's an addreplica command in the works for the collections API that should make this easier, but that's not released yet. Best, Erick

On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:

Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using -DnumShards=2, but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file in which to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best-practice way to do this? Ugo
Re: SolrCloud: Programmatically create multiple collections?
Hey Shawn, thanks for your reply. I just want to access the base_url easily via a short instanceDir name.
Re: SolrCloud: Programmatically create multiple collections?
Thank you Ani.
Re: SolrCloud: Programmatically create multiple collections?
On 8/14/2013 12:34 AM, xinwu wrote: Hey Shawn .Thanks for your reply. I just want to access the base_url easily by a short instanceDir name. For index updates and queries, you *can* access it by the /solr/mycollection name. Although there may be no core by that name, the base URL will work. Just now, I also tried /solr/mycollection/admin/system, which I expected would NOT work because I have the collection_shardN_replicaN core names. On my 4.2.1 production cloud, this DOES work. Your email had given me the idea of filing a feature request to allow this shortcut, but it appears that it's already a feature. In situations where maxShardsPerNode is used, you wouldn't be able to use that shortcut to get all the info, but you could get most of it. I can think of a workaround for the maxShardsPerNode limitation: If you access /solr/admin/cores on a machine before asking for further info, your program will know what cores exist on that machine, so you'd be able to get ALL info. Thanks, Shawn
Re: SolrCloud: Programmatically create multiple collections?
Hi, Mark. When I manage collections via the Collections API, how can I set the 'instanceDir' name? e.g.: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 My instanceDir is 'mycollection_shard2_replica1'. How can I change it to 'mycollection'?
Re: SolrCloud: Programmatically create multiple collections?
On 8/13/2013 3:07 AM, xinwu wrote:
> When I manage collections via the Collections API, how can I set the 'instanceDir' name? e.g.: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 My instanceDir is 'mycollection_shard2_replica1'. How can I change it to 'mycollection'?

I don't think the Collections API can do this, and to be honest, I don't know why you would want to. It would make it impossible to have more than one shard per Solr node, a capability that many people require. The question of "why would you want to?" is something I'm genuinely asking here. Admin URLs accessed directly by client programs are the only logical reason I can think of. For querying and updating the index, you can use /solr/mycollection as a base URL to access your index, even though the shard names are different. As for the admin URLs that let you access system information, SOLR-4943 will make most of that available without a core name in Solr 4.5. To access core-specific information, you need to use the actual core name, but it should be possible to gather information about which machine has which core in an automated way.

That said, if you create your collection a different way, you should be able to do exactly what you want. What you would want to do is use the zkcli command linkconfig to link a new collection with an already uploaded config set, and then create the individual cores in your collection using the CoreAdmin API instead of the Collections API.

http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin

Thanks, Shawn
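A sketch of the alternative Shawn outlines: after linking the config set, create each core with an explicit name via the CoreAdmin API. In Python for illustration only; the node address, core name, and shard value are hypothetical, and the exact parameter set should be checked against the CoreAdmin documentation for your Solr version:

```python
from urllib.parse import urlencode

def coreadmin_create_url(node, core_name, collection, shard):
    # CoreAdmin CREATE lets you pick the core name yourself, unlike the
    # Collections API, which generates names like
    # mycollection_shard2_replica1.
    params = urlencode({
        "action": "CREATE",
        "name": core_name,
        "collection": collection,
        "shard": shard,
    })
    return f"{node}/solr/admin/cores?{params}"

url = coreadmin_create_url("http://node1:8983", "mycollection",
                           "mycollection", "shard1")
print(url)
```

As Shawn notes, this trades away the Collections API's automatic placement, so you would issue one such call per core, per node, yourself.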
Re: SolrCloud: Programmatically create multiple collections?
At this point you would need a higher-level service sitting on top of Solr clusters, which also talks to your zk setup, in order to create custom collections on the fly. It's not super difficult, but it seems out of scope for SolrCloud now. Let me know if others have a different opinion. thanks, Ani

On Tue, Aug 13, 2013 at 9:52 AM, Shawn Heisey s...@elyograg.org wrote:

On 8/13/2013 3:07 AM, xinwu wrote:
> When I manage collections via the Collections API, how can I set the 'instanceDir' name? e.g.: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 My instanceDir is 'mycollection_shard2_replica1'. How can I change it to 'mycollection'?

I don't think the Collections API can do this, and to be honest, I don't know why you would want to. It would make it impossible to have more than one shard per Solr node, a capability that many people require. The question of "why would you want to?" is something I'm genuinely asking here. Admin URLs accessed directly by client programs are the only logical reason I can think of. For querying and updating the index, you can use /solr/mycollection as a base URL to access your index, even though the shard names are different. As for the admin URLs that let you access system information, SOLR-4943 will make most of that available without a core name in Solr 4.5. To access core-specific information, you need to use the actual core name, but it should be possible to gather information about which machine has which core in an automated way. That said, if you create your collection a different way, you should be able to do exactly what you want. What you would want to do is use the zkcli command linkconfig to link a new collection with an already uploaded config set, and then create the individual cores in your collection using the CoreAdmin API instead of the Collections API.
http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin

Thanks, Shawn

-- Anirudha P. Jadhav
Re: Querying multiple collections in SolrCloud
I'd _guess_ that this is unsupported across collections if for no other reason than scores really aren't comparable across collections and the default ordering within groups is score. This is really a federated search type problem. But if it makes sense to use N collections for other reasons, it's really the same thing as grouping functionally, you just send a separate request to each collection and combine the results of those N requests rather than from N groups in a single query. If the collections are hosted on different machines for instance, you might get quicker overall response by firing off parallel queries, It Depends (tm)... Best Erick On Wed, Jun 26, 2013 at 1:46 PM, Chris Toomey ctoo...@gmail.com wrote: Thanks Erick, that's a very helpful answer. Regarding the grouping option, does that require all the docs to be put into a single collection, or could it be done with across N collections (assuming each collection had a common type field for grouping on)? Chris On Wed, Jun 26, 2013 at 7:01 AM, Erick Erickson erickerick...@gmail.com wrote: bq: Would the above setup qualify as multiple compatible collections No. While there may be enough fields in common to form a single query, the TF/IDF calculations will not be compatible and the scores from the various collections will NOT be comparable. So simply getting the list of top N docs will probably be dominated by the docs from a single type. bq: How does SolrCloud combine the query results from multiple collections? It doesn't. SolrCloud sorts the results from multiple nodes in the _same_ collection according to whatever sort criteria are specified, defaulting to score. Say you ask for the top 20 docs. A node from each shard returns the top 20 docs for that shard. The node processing them just merges all the returned lists and only keeps the top 20. I don't think your last two questions are really relevant, SolrCloud isn't built to query multiple collections and return the results coherently. 
The root problem here is that you're trying to compare docs from different collections for goodness to return the top N. This isn't actually hard _except_ when goodness is the score, then it just doesn't work. You can't even compare scores from different queries on the _same_ collection, much less different ones. Consider two collections, books and songs. One consists of lots and lots of text and the ter frequency and inverse doc freq (TF/IDF) will be hugely different than songs. Not to mention field length normalization. Now, all that aside there's an option. Index all the docs in a single collection and use grouping (aka field collapsing) to get a single response that has the top N docs from each type (they'll be in different sections of the original response) and present them to the user however makes sense. You'll get hands on experience in why this isn't something that's easy to do automatically if you try to sort these into a single list by relevance G... Best Erick On Tue, Jun 25, 2013 at 3:35 PM, Chris Toomey ctoo...@gmail.com wrote: Thanks Jack for the alternatives. The first is interesting but has the downside of requiring multiple queries to get the full matching docs. The second is interesting and very simple, but has the downside of not being modular and being difficult to configure field boosting when the collections have overlapping field names with different boosts being needed for the same field in different document types. I'd still like to know about the viability of my original approach though too. Chris On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky j...@basetechnology.com wrote: One simple scenario to consider: N+1 collections - one collection per document type with detailed fields for that document type, and one common collection that indexes a subset of the fields. The main user query would be an edismax over the common fields in that main collection. You can then display summary results from the common collection. 
You can also then support drill down into the type-specific collection based on a type field for each document in the main collection. Or, sure, you actually CAN index multiple document types in the same collection - add all the fields to one schema - there is no time or space penalty if most of the field are empty for most documents. -- Jack Krupansky -Original Message- From: Chris Toomey Sent: Tuesday, June 25, 2013 6:08 PM To: solr-user@lucene.apache.org Subject: Querying multiple collections in SolrCloud Hi, I'm investigating using SolrCloud for querying documents of different but similar/related types, and have read through docs. on the wiki and done many searches in these archives, but still have some questions. Thanks in advance for your help
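The client-side combination described above — query each collection separately, then merge the per-collection sorted result lists and keep only the global top N — can be sketched as a straightforward k-way merge. A toy illustration in Python (the documents, the `date` sort key, and the list sizes are made up; each input list must already be sorted by the chosen criterion, since raw scores are not comparable across collections):

```python
import heapq

def merge_top_n(result_lists, n, key):
    # k-way merge of per-collection result lists, each already sorted
    # descending by `key`; keeps only the global top n, mirroring how a
    # SolrCloud node merges per-shard lists within one collection.
    merged = heapq.merge(*result_lists, key=key, reverse=True)
    return [doc for _, doc in zip(range(n), merged)]

books = [{"id": "b1", "date": 9}, {"id": "b2", "date": 5}]
songs = [{"id": "s1", "date": 8}, {"id": "s2", "date": 7}, {"id": "s3", "date": 1}]
top3 = merge_top_n([books, songs], 3, key=lambda d: d["date"])
print([d["id"] for d in top3])   # ['b1', 's1', 's2']
```

This only works cleanly with an externally meaningful sort key (a date, a price, a popularity count); as Erick stresses, merging on relevance score across collections is exactly the case that does not work.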
Re: Querying multiple collections in SolrCloud
bq: Would the above setup qualify as multiple compatible collections

No. While there may be enough fields in common to form a single query, the TF/IDF calculations will not be compatible and the scores from the various collections will NOT be comparable. So simply getting the list of top N docs will probably be dominated by the docs from a single type.

bq: How does SolrCloud combine the query results from multiple collections?

It doesn't. SolrCloud sorts the results from multiple nodes in the _same_ collection according to whatever sort criteria are specified, defaulting to score. Say you ask for the top 20 docs. A node from each shard returns the top 20 docs for that shard. The node processing them just merges all the returned lists and only keeps the top 20. I don't think your last two questions are really relevant; SolrCloud isn't built to query multiple collections and return the results coherently.

The root problem here is that you're trying to compare docs from different collections for "goodness" to return the top N. This isn't actually hard _except_ when goodness is the score, and then it just doesn't work. You can't even compare scores from different queries on the _same_ collection, much less different ones. Consider two collections, books and songs. One consists of lots and lots of text, so the term frequency and inverse doc freq (TF/IDF) will be hugely different than for songs. Not to mention field length normalization.

Now, all that aside, there's an option. Index all the docs in a single collection and use grouping (aka field collapsing) to get a single response that has the top N docs from each type (they'll be in different sections of the original response) and present them to the user however makes sense. You'll get hands-on experience in why this isn't something that's easy to do automatically if you try to sort these into a single list by relevance <G>...
Best
Erick
Re: Querying multiple collections in SolrCloud
Thanks Erick, that's a very helpful answer. Regarding the grouping option, does that require all the docs to be put into a single collection, or could it be done across N collections (assuming each collection had a common type field for grouping on)?

Chris
Querying multiple collections in SolrCloud
Hi, I'm investigating using SolrCloud for querying documents of different but similar/related types, and have read through docs on the wiki and done many searches in these archives, but still have some questions. Thanks in advance for your help.

Setup:
* Say that I have N distinct types of documents and I want to do queries that return the best matches regardless of document type. I.e., something akin to a Google search where I'd like to get the best matches from the web, news, images, and maps.
* Our main use case is supporting simple user-entered searches, which would just contain terms / phrases and wouldn't specify fields.
* The document types will not all have the same fields, though there may be some overlap in the fields.
* We plan to use a separate collection for each document type, and to use the eDisMax query parser. Each collection would have a document-specific schema configuration with appropriate defaults for query fields and boosts, etc.

Questions:
* Would the above setup qualify as "multiple compatible collections", such that we could search all N collections with a single SolrCloud query, as in the example query http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN? Again, we're not querying against specific fields.
* How does SolrCloud combine the query results from multiple collections? Does it re-sort the combined result set, or does it just return the concatenation of the (unmerged) results from each of the collections?
* Does SolrCloud impose any restrictions on querying multiple, sharded collections? I know it supports querying say all 3 shards of a single collection, so want to make sure it would also support say all Nx3 shards of N collections.
* When SolrCloud queries multiple shards/collections, it queries them concurrently vs. serially, correct?

thanks much,
Chris
Re: Querying multiple collections in SolrCloud
One simple scenario to consider: N+1 collections - one collection per document type with detailed fields for that document type, and one common collection that indexes a subset of the fields. The main user query would be an edismax over the common fields in that main collection. You can then display summary results from the common collection. You can also then support drill down into the type-specific collection based on a type field for each document in the main collection.

Or, sure, you actually CAN index multiple document types in the same collection - add all the fields to one schema - there is no time or space penalty if most of the fields are empty for most documents.

-- Jack Krupansky
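To make Jack's N+1 layout concrete, here is a minimal, hypothetical sketch: a first edismax query against an assumed common collection, then a drill-down query against a per-type collection chosen from a hit's type field. The base URL, collection names, and `qf` fields are all invented for the example:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL

def main_search_url(q):
    # edismax search over the common collection's shared fields
    params = {"q": q, "defType": "edismax", "qf": "title text", "wt": "json"}
    return f"{SOLR}/common/select?" + urlencode(params)

def drill_down_url(doc_type, doc_id):
    # follow-up query against the type-specific collection for full detail
    params = {"q": f"id:{doc_id}", "wt": "json"}
    return f"{SOLR}/{doc_type}/select?" + urlencode(params)

print(main_search_url("apple pie"))
print(drill_down_url("books", "1571"))
```

This is the source of the "requires multiple queries" downside Chris notes in his reply: every detail view costs a second round trip.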
Re: Querying multiple collections in SolrCloud
Thanks Jack for the alternatives. The first is interesting but has the downside of requiring multiple queries to get the full matching docs. The second is interesting and very simple, but has the downside of not being modular, and of making it difficult to configure field boosting when the collections have overlapping field names with different boosts needed for the same field in different document types.

I'd still like to know about the viability of my original approach though too.

Chris
Re: Search across multiple collections
You pretty much need to issue separate queries against each collection and creatively combine them. All of Solr's distributed search stuff pre-supposes two things:
1 the schemas are very similar
2 the types of docs in each collection are also very similar.

2 is a bit subtle. If you store different kinds of docs in different cores, then the statistics for term frequency etc. will be different. There's some work being done (I think) to support distributed tf/idf. But anyway, in this case the scores of the docs from one collection will tend to dominate the result set.

Or, if you're talking about joining, see Anria's comments.

Best
Erick
Search across multiple collections
I am not sure of the best way to search across multiple collections using Solr 4.3. Suppose each collection has its own config files and I perform various operations on collections individually, but when I search I want the search to happen across all collections. Can someone let me know how to perform a search on multiple collections? Do I need to use sharding again?

--
View this message in context: http://lucene.472066.n3.nabble.com/Search-across-multiple-collections-tp4068469.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search across multiple collections
hi

I've successfully searched over several separate collections (cores with unique schemas) using this kind of syntax. This demonstrates a 2-core search:

http://localhost:8983/solr/collection1/select?
  q=my phrase to search on
  &start=0
  &rows=25
  &fl=*,score
  &fq={!join+fromIndex=collection2+from=sku+to=sku}id:1571

I've split up the parameters so you can see them easily:

fq={!join+fromIndex=collection2+from=sku+to=sku}id:1571

-- collection1/select = use the select requestHandler out of collection1 as a base
-- collection2 is the 2nd core: the equivalent of a table join in SQL
-- sku is the field shared by both collection1 and collection2
-- id is the field I want to find the id=1571 in.

Hope this helps
Anria
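A small sketch of how the cross-core join filter in Anria's message might be assembled programmatically. The core names, fields, and id value are taken from her example; the helper functions themselves are hypothetical:

```python
from urllib.parse import urlencode

def join_fq(from_index, from_field, to_field, query):
    # Solr's join query parser: {!join fromIndex=... from=... to=...}<query>
    return f"{{!join fromIndex={from_index} from={from_field} to={to_field}}}{query}"

def search_with_join(base="http://localhost:8983/solr/collection1/select"):
    # Main query runs on collection1; the fq restricts results to docs
    # whose sku matches a collection2 doc with id:1571.
    params = {
        "q": "my phrase to search on",
        "start": 0,
        "rows": 25,
        "fl": "*,score",
        "fq": join_fq("collection2", "sku", "sku", "id:1571"),
    }
    return base + "?" + urlencode(params)

print(search_with_join())
```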
Searching across multiple collections (cores)
I've been looking all over for a clear answer to this question and can't seem to find one. It seems like a very basic concept to me though so maybe I'm using the wrong terminology. I want to be able to search across multiple collections (as it is now called in SolrCloud world, previously called Cores). I want the scoring, sorting, faceting etc. to be blended, that is to be relevant to data from all the collections, not just a set of independent results per collection. Is that possible? A real-world example would be a merchandise site that has books, movies and music. The index for each of those is quite different and they would have their own schema.xml (and therefore be their own Collection). When in the 'books' area of a website the users could search on fields specific to books (ISBN for example). However on a 'home' page a search would span across all 3 product lines, and the results should be scored relative to each other, not just relative to other items in their specific collection. Is this possible in v4.0? I'm pretty sure it wasn't in v1.4.1. But it seems to be a fundamentally useful concept, I was wondering if it had been addressed yet. Thanks, Ken

--
View this message in context: http://lucene.472066.n3.nabble.com/Searching-across-multiple-collections-cores-tp4047457.html
Re: Searching across multiple collections (cores)
Yes, with SolrCloud, it's just the collection param (as long as the schemas are compatible for this): http://wiki.apache.org/solr/SolrCloud#Distributed_Requests

- Mark
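A hedged sketch of the multi-collection request Mark is referring to; the collection names are placeholders, and the request can be sent to any one collection, with the collection param fanning it out:

```python
from urllib.parse import urlencode

def multi_collection_url(collections, q,
                         base="http://localhost:8983/solr/collection1/select"):
    # The collection param lists every collection the query should span;
    # scores are only meaningful if the schemas (and stats) are compatible.
    params = {"q": q, "collection": ",".join(collections)}
    return base + "?" + urlencode(params)

print(multi_collection_url(["c1", "c2", "c3"], "apple pie"))
```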
Re: Multiple Collections in one Zookeeper
Ok, I'm a little confused. I had originally bootstrapped zookeeper using a solr.xml file which specified the following cores: cats, dogs, birds.

In my /solr/#/cloud?view=tree view I see that I have:

/collections
  /cats
  /dogs
  /birds
/configs
  /cats
  /dogs
  /birds

When I launch a new server and connect it to zookeeper, it creates all three collections. What I'd like to do is move cats to its own set of boxes. When I run:

java -DzkHost=zookeeper:9893/cats -jar start.jar

or

java -DzkHost=zookeeper:9893,zookeeper:9893/cats -jar start.jar

I get this error:

SEVERE: Could not create Overseer node

For simplicity, I'd like to only have one zookeeper ensemble.

--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Collections-in-one-Zookeeper-tp4045936p4045981.html
Re: Multiple Collections in one Zookeeper
You want to create both under different root nodes in zk, so that you would have /cluster1 and /cluster2. Then you start up with addresses of:

zookeeper:{port1},zookeeper:{port2}/cluster1
zookeeper:{port1},zookeeper:{port2}/cluster2

If you are using one of the bootstrap calls on startup, it should create those for you with Solr 4.1; otherwise you have to create the root nodes ahead of time (you can use the zkcli tool we provide).

- mark
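A tiny sketch of composing the chroot-suffixed connect strings Mark describes; hostnames and ports are placeholders. The key detail is that the chroot path goes once at the end of the whole ensemble list, not on each host:

```python
def zk_connect_string(hosts, chroot=None):
    """Join ZooKeeper host:port pairs, optionally appending a chroot path."""
    s = ",".join(hosts)
    if chroot:
        s += "/" + chroot.lstrip("/")  # single chroot suffix for the ensemble
    return s

hosts = ["zookeeper:2181", "zookeeper:2182"]
print(zk_connect_string(hosts, "/cluster1"))
print(zk_connect_string(hosts, "/cluster2"))
```

This would produce the `-DzkHost=` values for each cluster's startup command.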
Multiple Collections in one Zookeeper
Hi, I have a solrcloud cluster running several cores and pointing at one zookeeper. For performance reasons, I'd like to move one of the cores onto its own dedicated cluster of servers. Can I use the same zookeeper to keep track of both clusters?

Thanks! Jim

--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Collections-in-one-Zookeeper-tp4045936.html
Re: Multiple Collections in one Zookeeper
Yes, but you'll need to append a sub path onto the zookeeper path for your second cluster. For ex:

zookeeper1.example.com,zookeeper2.example.com,zookeeper3.example.com/subpath