SolrCloud Questions for MultiCore Setup

2011-01-22 Thread Em

Hello list,

I want to experiment with the new SolrCloud feature. So far, I have
absolutely no experience with distributed search in Solr.
However, some things remain unclear to me:

1 ) What is the use case of a collection?
As far as I understand, a collection is the same as a core, but in a
distributed sense: it contains a set of cores on one or multiple machines.
It makes sense that all the cores in a collection have the same schema and
solrconfig, right?
Can someone tell me whether I have understood the concept of a collection correctly?

2 ) The wiki says this will cause an update:
-Durl=http://localhost:8983/solr/collection1/update
However, as far as I know, this causes an update to a CORE named "collection1"
at localhost:8983, not to the full collection. Am I correct here?
So do *I* have to take care of consistency between the different replicas inside
my cloud?
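
For illustration, a minimal Python sketch of what posting to that URL amounts to: an HTTP POST of an XML <add> message to the update handler of whatever is mounted at /solr/collection1 on that one host. The document fields (id, name) and the helper name build_update_request are assumptions made for this example, and the request is only constructed, not sent.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def build_update_request(update_url, doc):
    """Build (but do not send) a POST carrying one document to a Solr update handler."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(str(value)))
        for name, value in doc.items()
    )
    body = "<add><doc>%s</doc></add>" % fields
    return Request(
        update_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# The URL is the one from the wiki; the field names are hypothetical.
req = build_update_request(
    "http://localhost:8983/solr/collection1/update",
    {"id": "doc1", "name": "hello"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Nothing in this request names any other host, so on its own it only addresses the single core at localhost:8983.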

3 ) If I have replicas of the same shard inside a collection, how does
SolrCloud determine that two documents in a result set are equal? Is it
necessary to define a unique key? Is it random which of the two documents
is picked for the final result set?

---
I think these are my most basic questions.
However, there is one more tricky thing:

If I understood the collection idea correctly: what happens if I create two
cores, each belonging to a different collection, and THEN I do a SWAP?
Say: core1->collection1, core2->collection2
SWAP core1,core2
Does core2 now map to collection1?
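
For reference, the SWAP meant here is the CoreAdmin action. A small Python sketch that builds the request URL for it follows. It assumes the default /admin/cores admin path and the localhost:8983 host from the example above, and swap_url is a made-up helper name. It says nothing about how SolrCloud treats the collection mapping afterwards, which is the open question.

```python
from urllib.parse import urlencode


def swap_url(solr_base, core, other):
    """Return the CoreAdmin URL that swaps the names of two cores."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    # Assumes the CoreAdmin handler is mounted at the default /admin/cores path.
    return "%s/admin/cores?%s" % (solr_base.rstrip("/"), query)


url = swap_url("http://localhost:8983/solr", "core1", "core2")
print(url)
```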

Thank you!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2309443.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Questions for MultiCore Setup

2011-01-22 Thread Lance Norskog
A "collection" is your data, like newspaper articles or movie titles.
It is a user-level concept, not really a Solr design concept.

A "core" is a Solr/Lucene index. It is addressable as
solr/collection-name on one machine.

You can use a core to store a collection, or you can break it up among
multiple cores (usually for performance reasons). When you use a core
like this, it is called a "shard". All of the different shards of a
collection form the collection.

Solr has a feature called Distributed Search that presents the
separate shards as if they were one Solr collection. You should set up
Distributed Search first. It does not use SolrCloud, but it shows you how
these ideas work. After that, SolrCloud will make more sense.
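
As a rough sketch of what a Distributed Search request looks like: the query goes to one core, and the shards parameter lists every shard (as host:port/path, without the http:// prefix) whose results should be merged. The host names, the core name "articles" and the helper distributed_query_url below are assumptions for the example only.

```python
from urllib.parse import urlencode


def distributed_query_url(entry_point, shards, query):
    """Build a select URL asking one core to fan the query out to all listed shards."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return "%s/select?%s" % (entry_point.rstrip("/"), urlencode(params, safe=":/,"))


# Hypothetical hosts and core name.
shards = [
    "host1:8983/solr/articles",
    "host2:8983/solr/articles",
]
url = distributed_query_url("http://host1:8983/solr/articles", shards, "title:solr")
print(url)
```

Each listed shard is expected to hold a different slice of the same collection, with the same schema.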

Lance




-- 
Lance Norskog
goks...@gmail.com


Re: SolrCloud Questions for MultiCore Setup

2011-01-23 Thread Em

Hi Lance,

thanks for your explanation.

As far as I know, in distributed search I have to tell Solr which other shards
it has to query. So, if I want to query a specific core that is present in all my
shards, I can tell Solr this by using the shards param and specifying the core
on each shard.

Using SolrCloud's distrib=true feature (does it set all the known shards
automatically?), a collection should consist of only one type of
core schema, correct?
How does SolrCloud know that shard_x and shard_y are replicas of each other
(I took a look at the possibility of specifying alternative shards in case one is
not available)? If it does not know that they are replicas of each other, I
should use the syntax for specifying alternative shards for failover, for
performance reasons, because querying 2 identical and available cores seems
to be wasted capacity, no?
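
To make the alternative-shards syntax concrete, here is a minimal Python sketch, assuming the form described on the SolrCloud wiki: '|' separates equivalent replicas of one shard, of which only one is queried, and ',' separates distinct shards. The host names and the helper shards_param are hypothetical.

```python
def shards_param(shard_groups):
    """Join replicas of one shard with '|' and distinct shards with ','."""
    return ",".join("|".join(replicas) for replicas in shard_groups)


# Hypothetical hosts: two shards, each with one replica.
groups = [
    ["host1:8983/solr", "host2:8983/solr"],  # shard_x and its replica
    ["host3:8983/solr", "host4:8983/solr"],  # shard_y and its replica
]
value = shards_param(groups)
print(value)
```

With a value built this way, a request would touch two cores (one per shard) rather than all four.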

Thank you!


Re: SolrCloud Questions for MultiCore Setup

2011-01-23 Thread Em

By the way: although I am explicitly asking about SolrCloud again, I will take
your advice and try distributed search first to understand the concept
better.

Regards




Re: SolrCloud Questions for MultiCore Setup

2011-01-24 Thread Em

Hi,

just wanted to push this topic again.

Thank you!




Re: SolrCloud Questions for MultiCore Setup

2011-01-27 Thread Em

Hi,

excuse me for pushing this for a second time, but I can't figure it out by
looking at the source code...

Thanks!





Re: SolrCloud Questions for MultiCore Setup

2011-01-27 Thread Lance Norskog
Hello-

I have not used SolrCloud.


-- 
Lance Norskog
goks...@gmail.com