Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Aha!  See, Kuli, I wasn't making it up! ;)

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>
> From: Robert Stewart
> To: solr-user@lucene.apache.org
> Sent: Monday, May 14, 2012 11:23 AM
> Subject: Re: Solr Shards multi core slower then single big core
>
> We used to have one large index, then moved to 10 shards (7 million docs
> each) searched in parallel, and we get better performance that way. We use
> a 40-core box with 128 GB of RAM. We do a lot of faceting, so maybe that is
> why, since facets can be built in parallel on different threads/cores. We
> also have our indexes on fast local disks (six 15K RPM disks in a RAID stripe).

Re: Solr Shards multi core slower then single big core

2012-05-14 Thread arjit
Robert, can you explain what you mean when you say "We do a lot of faceting so
maybe that is why since facets can be built in parallel on different
threads/cores"? I am a novice in Solr. Where can I read about it?
Thanks,
Arjit



On Mon, May 14, 2012 at 8:54 PM, Robert Stewart [via Lucene] <
ml-node+s472066n3983692...@n3.nabble.com> wrote:

> We used to have one large index, then moved to 10 shards (7 million docs
> each) searched in parallel, and we get better performance that way. We use
> a 40-core box with 128 GB of RAM. We do a lot of faceting, so maybe that is
> why, since facets can be built in parallel on different threads/cores. We
> also have our indexes on fast local disks (six 15K RPM disks in a RAID
> stripe).


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3983697.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Robert Stewart
We used to have one large index, then moved to 10 shards (7 million docs each)
searched in parallel, and we get better performance that way. We use a 40-core
box with 128 GB of RAM. We do a lot of faceting, so maybe that is why, since
facets can be built in parallel on different threads/cores. We also have our
indexes on fast local disks (six 15K RPM disks in a RAID stripe).
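To illustrate the faceting point: with several shards, each shard can count its own facet values independently (potentially on its own thread/core), and the per-shard counts are then summed. The Java sketch below is a simplified model, not Solr's implementation: each "document" is reduced to a single hypothetical facet value, and it ignores facet.limit and any refinement a real distributed facet request may need.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetMergeSketch {

    // Counts facet values within one shard. Each "document" is reduced to a single
    // facet value here (hypothetical simplification of a real faceted field).
    public static Map<String, Integer> countShard(List<String> shardDocs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : shardDocs) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // Sums two count maps into a new map; the inputs are left untouched.
    public static Map<String, Integer> mergeCounts(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new HashMap<>(a);
        b.forEach((value, n) -> out.merge(value, n, Integer::sum));
        return out;
    }

    // Counts each shard independently (a parallel stream may run these on
    // different threads/cores), then merges the per-shard counts.
    public static Map<String, Integer> facetInParallel(List<List<String>> shards) {
        return shards.parallelStream()
                .map(FacetMergeSketch::countShard)
                .reduce(Map.of(), FacetMergeSketch::mergeCounts);
    }

    public static void main(String[] args) {
        List<List<String>> shards = List.of(
                List.of("fiction", "history", "fiction"),
                List.of("history", "science"),
                List.of("fiction"));
        Map<String, Integer> counts = facetInParallel(shards);
        // prints fiction=3 history=2 science=1
        System.out.println("fiction=" + counts.get("fiction")
                + " history=" + counts.get("history")
                + " science=" + counts.get("science"));
    }
}
```

Because mergeCounts always builds a new map, the shared identity value is never mutated, which keeps the parallel reduction safe.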


On May 14, 2012, at 10:42 AM, Michael Della Bitta wrote:

> Hi, all,
> 
> I've been running into murmurs about this idea elsewhere:
> 
> http://stackoverflow.com/questions/8698762/run-multiple-big-solr-shard-instances-on-one-physical-machine
> 
> http://java.dzone.com/articles/optimizing-solr-or-how-7x-your?mz=33057-solr_lucene
> 
> Michael



Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Della Bitta
Hi, all,

I've been running into murmurs about this idea elsewhere:

http://stackoverflow.com/questions/8698762/run-multiple-big-solr-shard-instances-on-one-physical-machine

http://java.dzone.com/articles/optimizing-solr-or-how-7x-your?mz=33057-solr_lucene

Michael

On Mon, May 14, 2012 at 10:29 AM, Otis Gospodnetic
 wrote:
> Hi Kuli,
>
> As long as there are enough CPUs with spare cycles and disk IO is not a 
> bottleneck, this works faster.  This was 12+ months ago.
>
> Otis


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Hi Kuli,

As long as there are enough CPUs with spare cycles and disk IO is not a 
bottleneck, this works faster.  This was 12+ months ago.

Otis 




>
> From: Michael Kuhlmann
> To: solr-user@lucene.apache.org
> Sent: Monday, May 14, 2012 10:21 AM
> Subject: Re: Solr Shards multi core slower then single big core
>
> I want to believe you, but I also want to understand. Can you explain
> why? And did this only happen for single requests, or even under heavy load?
>
> Greetings,
> Kuli

Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 16:18, Otis Gospodnetic wrote:

> Hi Kuli,
>
> In a client engagement, I did see this (N shards on 1 beefy box with lots of
> RAM and CPU cores) be faster than 1 big index.



I want to believe you, but I also want to understand. Can you explain 
why? And did this only happen for single requests, or even under heavy load?


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Hi Kuli,

In a client engagement, I did see this (N shards on 1 beefy box with lots of 
RAM and CPU cores) be faster than 1 big index.

Otis 




>
> From: Michael Kuhlmann
> To: solr-user@lucene.apache.org
> Sent: Monday, May 14, 2012 7:56 AM
> Subject: Re: Solr Shards multi core slower then single big core
>
> Do you have an example?
>
> This is hard to believe. If you have several shards on the same machine,
> you'll need so much memory that each shard has enough for all its caches
> and such. With that much memory, a single Solr core should be really fast.
>
> If dividing the index is the reason, then a software RAID 0 (striping)
> should be much better.
>
> The only advantage I see is the concurrent search within one request. For
> large requests this might outweigh the sharding overhead, but only for
> long-running requests without disk I/O, which I can only imagine with very
> complicated query functions. And this only holds as long as you don't run
> multiple concurrent requests.
>
> Greetings,
> Kuli

Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 13:22, Sami Siren wrote:

>> Sharding is (nearly) always slower than using one big index with sufficient
>> hardware resources. Only use sharding when your index is too large to fit
>> on a single machine.
>
> If you're not constrained by CPU or I/O, in other words you have plenty of
> CPU cores available together with, for example, separate hard disks for
> each shard, splitting your index into smaller shards can in some cases
> make a huge difference on one box too.


Do you have an example?

This is hard to believe. If you have several shards on the same machine,
you'll need so much memory that each shard has enough for all its caches
and such. With that much memory, a single Solr core should be really
fast.


If dividing the index is the reason, then a software RAID 0 (striping) 
should be much better.


The only advantage I see is the concurrent search within one request. For
large requests this might outweigh the sharding overhead, but only for
long-running requests without disk I/O, which I can only imagine with very
complicated query functions. And this only holds as long as you don't run
multiple concurrent requests.
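Kuli's argument, and Otis's condition that there must be CPUs with spare cycles, can be made concrete with a deliberately idealized toy model. It assumes a CPU-bound request with no disk I/O, work that splits perfectly across shards, CPUs shared fairly between concurrent requests, and a fixed coordination cost per shard. All parameter names and numbers below are hypothetical.

```java
public class ShardLatencyModel {

    // Idealized latency (ms) of one CPU-bound request with no disk I/O.
    //   workMs             total CPU time the request needs on one core
    //   shards             number of shards on the box (1 = one big index)
    //   cpuCores           cores available on the box
    //   concurrent         requests running at the same time (CPUs shared fairly)
    //   perShardOverheadMs hypothetical fixed coordination cost per shard
    public static double latencyMs(double workMs, int shards, int cpuCores,
                                   int concurrent, double perShardOverheadMs) {
        double coresPerRequest = (double) cpuCores / concurrent;
        // A request can use at most one core per shard, and at most its fair share.
        double usableCores = Math.min(shards, coresPerRequest);
        double overhead = shards > 1 ? shards * perShardOverheadMs : 0.0;
        return workMs / usableCores + overhead;
    }

    public static void main(String[] args) {
        int[] loads = {1, 4, 40, 80};
        for (int concurrent : loads) {
            double single = latencyMs(100, 1, 40, concurrent, 1.0);
            double sharded = latencyMs(100, 10, 40, concurrent, 1.0);
            System.out.printf("concurrent=%d  single index=%.1f ms  10 shards=%.1f ms%n",
                    concurrent, single, sharded);
        }
    }
}
```

With one request on a 40-core box, 10 shards cut the modelled latency from 100 ms to 20 ms; once 40 or more requests run at the same time, every core is already busy and the sharded layout only adds its overhead (110 ms vs. 100 ms at 40 concurrent requests).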


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Sami Siren
> Sharding is (nearly) always slower than using one big index with sufficient
> hardware resources. Only use sharding when your index is too large to fit
> on a single machine.

If you're not constrained by CPU or I/O, in other words you have plenty of
CPU cores available together with, for example, separate hard disks for
each shard, splitting your index into smaller shards can in some cases
make a huge difference on one box too.

--
 Sami Siren


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 05:56, arjit wrote:

> Thanks, Erick, for the reply.
> I have 6 cores which don't contain duplicated data; every core has some
> unique data. What I thought was that a query would read the 6 cores in
> parallel, join the results and return them, and that this would be more
> efficient than reading one big core.


No, it's not. When you request 10 documents from Solr, it can't know in
advance how many of those documents each shard contains. Each shard might
only need to contribute one or two documents to the result, but it might
also be that a single shard contains all ten. Therefore, Solr needs to
request 10 documents from each shard, then take the top 10 of those 60 and
drop the rest. And it gets worse when you set an offset of, say, 100: then
each shard has to return its top 110 documents.
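A minimal Java sketch of the merge Kuli describes, assuming each shard reports (id, score) pairs: every shard contributes its own top (start + rows) hits, and the coordinator sorts all candidates and keeps only the requested page. The class and method names are hypothetical, and real Solr distributed search also has a separate step that fetches stored fields for the final page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShardMergeSketch {

    // A hit as a shard would report it: document id plus relevance score.
    public record Hit(String id, double score) {}

    // Returns the page [start, start + rows) of the global ranking.
    // Each shard must supply its own top (start + rows) hits, because any of
    // them could belong on the requested page; everything else is discarded.
    public static List<Hit> mergePage(List<List<Hit>> shardResults, int start, int rows) {
        int perShard = start + rows;
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble(Hit::score).reversed();
        List<Hit> candidates = new ArrayList<>();
        for (List<Hit> shard : shardResults) {
            shard.stream().sorted(byScoreDesc).limit(perShard).forEach(candidates::add);
        }
        candidates.sort(byScoreDesc);
        int from = Math.min(start, candidates.size());
        int to = Math.min(start + rows, candidates.size());
        return new ArrayList<>(candidates.subList(from, to));
    }

    // Total hits the coordinator collects before it can pick the page.
    public static int candidatesFetched(int shards, int start, int rows) {
        return shards * (start + rows);
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
                List.of(new Hit("a", 0.9), new Hit("b", 0.5)),
                List.of(new Hit("c", 0.8), new Hit("d", 0.7)),
                List.of(new Hit("e", 0.1)));
        System.out.println(mergePage(shards, 0, 2));         // first page of 2
        System.out.println(mergePage(shards, 1, 2));         // page at offset 1
        System.out.println(candidatesFetched(6, 0, 10));     // 60
        System.out.println(candidatesFetched(6, 100, 10));   // 660
    }
}
```

With 6 shards and rows=10 the coordinator handles 60 candidates; at start=100 it handles 660, which is the "it gets worse" case above.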


Sharding is (nearly) always slower than using one big index with
sufficient hardware resources. Only use sharding when your index is too
large to fit on a single machine.


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-13 Thread arjit
Thanks, Erick, for the reply.
I have 6 cores which don't contain duplicated data; every core has some
unique data. What I thought was that a query would read the 6 cores in
parallel, join the results and return them, and that this would be more
efficient than reading one big core.
My question is: doesn't Solr read from the shards in parallel when a query
is fired at it?

Please let me know if I am assuming something wrong.

Thanks,
Arjit



On Sun, May 13, 2012 at 12:44 AM, Erick Erickson [via Lucene] <
ml-node+s472066n3982950...@n3.nabble.com> wrote:

> One of the points of sharding is to use more _machines_. Running multiple
> shards on a single machine is not magically going to make things faster.
> In fact I'd expect your process to consume more resources, since the
> cores no longer share common data (i.e. a word that appears in two cores
> is stored twice, once in each core).
>
> Best
> Erick


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3983601.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Shards multi core slower then single big core

2012-05-12 Thread Erick Erickson
One of the points of sharding is to use more _machines_. Running multiple
shards on a single machine is not magically going to make things faster. In
fact I'd expect your process to consume more resources, since the cores no
longer share common data (i.e. a word that appears in two cores is stored
twice, once in each core).
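Erick's point about duplicated terms can be shown with a small count: if every shard keeps its own term dictionary, a term that occurs in several shards is stored once per shard, whereas one big index stores it once. The sketch models each shard's dictionary as a plain set of terms, a hypothetical simplification of a real Lucene index; the class and method names are illustrative only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermDuplicationSketch {

    // Dictionary entries when every shard keeps its own term dictionary.
    public static int entriesAcrossShards(List<Set<String>> shardTerms) {
        int total = 0;
        for (Set<String> terms : shardTerms) {
            total += terms.size();
        }
        return total;
    }

    // Dictionary entries when all documents live in one index.
    public static int entriesInSingleIndex(List<Set<String>> shardTerms) {
        Set<String> union = new HashSet<>();
        for (Set<String> terms : shardTerms) {
            union.addAll(terms);
        }
        return union.size();
    }

    public static void main(String[] args) {
        List<Set<String>> shards = List.of(
                Set.of("solr", "shard", "index"),
                Set.of("solr", "facet", "index"),
                Set.of("solr", "query"));
        System.out.println("per-shard dictionaries: " + entriesAcrossShards(shards)); // 8
        System.out.println("single index: " + entriesInSingleIndex(shards));          // 5
    }
}
```

Here "solr" is stored three times and "index" twice across the shards, so the sharded layout needs 8 entries where a single index needs 5.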

Best
Erick



Re: Solr Shards multi core slower then single big core

2012-05-12 Thread arjit
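My query code is:

  SolrQuery sQuery = new SolrQuery(query.getQueryStr());
  // sets qt=dismax, i.e. selects the request handler named "dismax"
  sQuery.setQueryType("dismax");
  sQuery.setRows(100);

  // search only the chosen fields unless the default field is requested
  if (!query.isSearchOnDefaultField()) {
      sQuery.setParam("qf", queryFields.toArray(new String[queryFields.size()]));
  }
  sQuery.setFields(visibleFields.toArray(new String[visibleFields.size()]));

  // mm=1: a single matching clause is enough (OR semantics)
  if (query.isORQuery()) {
      sQuery.setParam("mm", "1");
  }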

My search handler configuration is:

  dismax
  explicit
  0.01
  shards = localhost:9090/solr/book1,localhost:9090/solr/book2,localhost:9090/solr/book3,localhost:9090/solr/book4,localhost:9090/solr/book5,localhost:9090/solr/book6
  text^2.0
  title item_id author titleMinusAuthor
  4
  *:*
  text features name
  0
  name
  regex
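Because the shards value sits in the handler configuration, every request to that handler becomes a distributed request across all six cores, even though they all run in the same localhost:9090 instance. The stdlib-only sketch below builds a select URL that passes the same shards list explicitly. The host and core names come from the configuration above; the base URL, the q value, and the helper names are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ShardsUrlSketch {

    // Builds the comma-separated shards value: host:port/solr/<core> for each core.
    public static String shardsParam(String hostPort, List<String> cores) {
        return cores.stream()
                .map(core -> hostPort + "/solr/" + core)
                .collect(Collectors.joining(","));
    }

    // Appends /select and the URL-encoded parameters to a core's base URL.
    public static String selectUrl(String coreUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return coreUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        List<String> cores = List.of("book1", "book2", "book3", "book4", "book5", "book6");
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "lucene");                            // hypothetical query text
        params.put("qt", "dismax");                           // handler name; SolrJ's setQueryType sets qt
        params.put("rows", "100");
        params.put("shards", shardsParam("localhost:9090", cores));
        System.out.println(selectUrl("http://localhost:9090/solr/book1", params));
    }
}
```

Each of the six listed cores then receives its own sub-request, which is where the per-shard overhead discussed in this thread comes from.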
   
   
  


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3979243.html
Sent from the Solr - User mailing list archive at Nabble.com.