Re: rough maximum cores (shards) per machine?
Just to give a specific answer to the original question, I would say that dozens of cores (collections) is certainly fine (assuming the total data load and query rate is reasonable), maybe 50 or even 100. Low hundreds of cores/collections MAY work, but isn't advisable. Thousands, if it works at all, is probably just asking for trouble and likely to be far more hassle than it could possibly be worth. Whether the number for you ends up being 37, 50, 75, 100, 237, or 1273, you will have to do a proof of concept implementation to validate it. I'm not sure where we are at these days for lazy-loading of cores. That may work for you with hundreds (thousands?!) of cores/collections for tenants who are mostly idle or dormant, but if the server is running long enough, it may build up a lot of memory usage for collections that were active but have gone idle after days or weeks. -- Jack Krupansky On Wed, Mar 25, 2015 at 2:49 AM, Shai Erera wrote: > While it's hard to answer this question because as others have said, "it > depends", I think it will be good if we can quantify or assess the cost of > running a SolrCore. > > For instance, let's say that a server can handle a load of 10M indexed > documents (I omit search load on purpose for now) in a single SolrCore. > Would the same server be able to handle the same number of documents, if we > indexed 1000 docs per SolrCore, in total of 10,000 SolrCores? If the answer > is no, then it means there is some cost that comes w/ each SolrCore, and we > may at least be able to give an upper bound --- on a server with X amount > of storage, Y GB RAM and Z cores you can run up to maxSolrCores(X, Y, Z). > > Another way to look at it, if I were to create empty SolrCores, would I be > able to create an infinite number of cores if storage was infinite? Or even > empty cores have their toll on CPU and RAM? 
> > I know from the Lucene side of things that each SolrCore (carries a Lucene > index) there is a toll to an index -- the lexicon, IW's RAM buffer, Codecs > that store things in memory etc. For instance, one downside of splitting a > 10M core into 10,000 cores is that the cost of the holding the total > lexicon (dictionary of indexed words) goes up drastically, since now every > word (just the byte[] of the word) is potentially represented in memory > 10,000 times. > > What other RAM/CPU/Storage costs does a SolrCore carry with it? There are > the caches of course, which really depend on how many documents are > indexed. Any other non-trivial or constant cost? > > So yes, there isn't a single answer to this question. It's just like > someone would ask how many documents can a single Lucene index handle > efficiently. But if we can come up with basic numbers as I outlined above, > it might help people doing rough estimates. That doesn't mean people > shouldn't benchmark, as that upper bound may be wy too high for their > data set, query workload and search needs. > > Shai > > On Wed, Mar 25, 2015 at 5:25 AM, Damien Kamerman > wrote: > > > From my experience on a high-end sever (256GB memory, 40 core CPU) > testing > > collection numbers with one shard and two replicas, the maximum that > would > > work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps > > half of that), depending on your startup-time requirements. (Though I > have > > settled on 6,000 collection maximum with some patching. See SOLR-7191). > You > > could create multiple clouds after that, and choose the cloud least used > to > > create your collection. > > > > Regarding memory usage I'd pencil in 6MB overheard (no docs) java heap > per > > collection. > > > > On 25 March 2015 at 13:46, Ian Rose wrote: > > > > > First off thanks everyone for the very useful replies thus far. > > > > > > Shawn - thanks for the list of items to check. 
#1 and #2 should be > fine > > > for us and I'll check our ulimit for #3. > > > > > > To add a bit of clarification, we are indeed using SolrCloud. Our > > current > > > setup is to create a new collection for each customer. For now we > allow > > > SolrCloud to decide for itself where to locate the initial shard(s) but > > in > > > time we expect to refine this such that our system will automatically > > > choose the least loaded nodes according to some metric(s). > > > > > > Having more than one business entity controlling the configuration of a > > > > single (Solr) server is a recipe for disaster. Solr works well if > there > > > is > > > > an architect for the system. > > > > > > > > > Jack, can you explain a bit what you mean here? It looks like Toke > > caught > > > your meaning but I'm afraid it missed me. What do you mean by > "business > > > entity"? Is your concern that with automatic creation of collections > > they > > > will be distributed willy-nilly across the cluster, leading to uneven > > load > > > across nodes? If it is relevant, the schema and solrconfig are > > controlled > > > entirely by me and is the same for all collections
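Shai's hypothetical maxSolrCores(X, Y, Z) from the quoted message can be sketched as a back-of-envelope heap budget. The constants below (the ~6MB empty-core heap figure Damien pencils in elsewhere in this thread, and an assumed per-million-docs heap cost) are illustrative assumptions, not measured Solr numbers:

```python
# Rough upper bound on cores per machine, in the spirit of Shai's
# maxSolrCores(X, Y, Z). Both constants are assumptions for
# illustration only; real values depend on schema, codec, and version.

HEAP_PER_EMPTY_CORE_MB = 6       # Damien's "6MB overhead (no docs)" pencil figure
HEAP_PER_MILLION_DOCS_MB = 50    # assumed per-doc heap cost, schema-dependent

def max_solr_cores(heap_mb, docs_per_core):
    """Upper bound on cores a given heap could hold, ignoring caches,
    query load, GC headroom, and everything else that matters in practice."""
    per_core = HEAP_PER_EMPTY_CORE_MB + \
        HEAP_PER_MILLION_DOCS_MB * (docs_per_core / 1_000_000)
    return int(heap_mb // per_core)

print(max_solr_cores(8192, 10_000))      # many tiny cores -> 1260
print(max_solr_cores(8192, 10_000_000))  # few large cores -> 16
```

Under these (made-up) constants, an 8GB heap tops out around 1260 tiny cores but only 16 cores of 10M docs each, which at least illustrates why "it depends" is the standard answer.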
Re: rough maximum cores (shards) per machine?
On 25/03/15 15:03, Ian Rose wrote:
> Per - Wow, 1 trillion documents stored is pretty impressive. One clarification: when you say that you have 2 replica per collection on each machine, what exactly does that mean? Do you mean that each collection is sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards per machine)?
Yes
> Or are some of these slave replicas (e.g. 25x sharding with 1 replica per shard)?
No replication. It does not work very well, at least in 4.4.0. Besides that, I am not a big fan of two (or more) machines having to do all the indexing work and making sure to keep synchronized. Use a distributed file-system supporting multiple copies of every piece of data (like HDFS) for HA on the data-level. Have only one Solr-node handle the indexing into a particular shard - if this Solr-node breaks down, let another Solr-node take over the indexing "leadership" on this shard. Besides the indexing Solr-node, several other Solr-nodes can serve data from this shard - just watching the data-folder (and commits) done by the indexing-leader of this particular shard will give you HA on the service-level. That is probably how we are going to do HA - pretty soon. But that is another story.
> Thanks!
No problem
Re: rough maximum cores (shards) per machine?
Per - Wow, 1 trillion documents stored is pretty impressive. One clarification: when you say that you have 2 replica per collection on each machine, what exactly does that mean? Do you mean that each collection is sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards per machine)? Or are some of these slave replicas (e.g. 25x sharding with 1 replica per shard)? Thanks! On Wed, Mar 25, 2015 at 5:13 AM, Per Steffensen wrote: > In one of our production environments we use 32GB, 4-core, 3T RAID0 > spinning disk Dell servers (do not remember the exact model). We have about > 25 collections with 2 replica (shard-instances) per collection on each > machine - 25 machines. Total of 25 coll * 2 replica/coll/machine * 25 > machines = 1250 replica. Each replica contains about 800 million pretty > small documents - thats about 1000 billion (do not know the english word > for it) documents all in all. We index about 1.5 billion new documents > every day (mainly into one of the collections = 50 replica across 25 > machine) and keep a history of 2 years on the data. Shifting the "index > into" collection every month. We can fairly easy keep up with the indexing > load. We have almost non of the data on the heap, but of course a small > fraction of the data in the files will at any time be in OS file-cache. > Compared to our indexing frequency we do not do a lot of searches. We have > about 10 users searching the system from time to time - anything from major > extracts to small quick searches. Depending on the nature of the search we > have response-times between 1 sec and 5 min. But of course that is very > dependent on "clever" choice on each field wrt index, store, doc-value etc. > BUT we are not using out-of-box Apache Solr. We have made quit a lot of > performance tweaks ourselves. 
> Please note that, even though you disable all Solr caches, each replica > will use heap-memory linearly dependent on the number of documents (and > their size) in that replica. But not much, so you can get pretty far with > relatively little RAM. > Our version of Solr is based on Apache Solr 4.4.0, but I expect/hope it > did not get worse in newer releases. > > Just to give you some idea of what can at least be achieved - in the > high-end of #replica and #docs, I guess > > Regards, Per Steffensen > > > On 24/03/15 14:02, Ian Rose wrote: > >> Hi all - >> >> I'm sure this topic has been covered before but I was unable to find any >> clear references online or in the mailing list. >> >> Are there any rules of thumb for how many cores (aka shards, since I am >> using SolrCloud) is "too many" for one machine? I realize there is no one >> answer (depends on size of the machine, etc.) so I'm just looking for a >> rough idea. Something like the following would be very useful: >> >> * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) >> server without any problems. >> * I have never heard of anyone successfully running X cores/shards on a >> single machine, even if you throw a lot of hardware at it. >> >> Thanks! >> - Ian >> >> >
Re: rough maximum cores (shards) per machine?
In one of our production environments we use 32GB, 4-core, 3T RAID0 spinning disk Dell servers (do not remember the exact model). We have about 25 collections with 2 replicas (shard-instances) per collection on each machine - 25 machines. Total of 25 coll * 2 replica/coll/machine * 25 machines = 1250 replicas. Each replica contains about 800 million pretty small documents - that's about 1,000 billion (one trillion) documents all in all. We index about 1.5 billion new documents every day (mainly into one of the collections = 50 replicas across 25 machines) and keep a history of 2 years on the data. Shifting the "index into" collection every month. We can fairly easily keep up with the indexing load. We have almost none of the data on the heap, but of course a small fraction of the data in the files will at any time be in OS file-cache. Compared to our indexing frequency we do not do a lot of searches. We have about 10 users searching the system from time to time - anything from major extracts to small quick searches. Depending on the nature of the search we have response-times between 1 sec and 5 min. But of course that is very dependent on "clever" choices on each field wrt index, store, doc-value etc. BUT we are not using out-of-the-box Apache Solr. We have made quite a lot of performance tweaks ourselves. Please note that, even though you disable all Solr caches, each replica will use heap-memory linearly dependent on the number of documents (and their size) in that replica. But not much, so you can get pretty far with relatively little RAM. Our version of Solr is based on Apache Solr 4.4.0, but I expect/hope it did not get worse in newer releases. Just to give you some idea of what can at least be achieved - in the high-end of #replica and #docs, I guess. Regards, Per Steffensen On 24/03/15 14:02, Ian Rose wrote: Hi all - I'm sure this topic has been covered before but I was unable to find any clear references online or in the mailing list. 
Are there any rules of thumb for how many cores (aka shards, since I am using SolrCloud) is "too many" for one machine? I realize there is no one answer (depends on size of the machine, etc.) so I'm just looking for a rough idea. Something like the following would be very useful: * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) server without any problems. * I have never heard of anyone successfully running X cores/shards on a single machine, even if you throw a lot of hardware at it. Thanks! - Ian
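Per's fleet arithmetic above can be checked directly; the replica count and the "1,000 billion" (one trillion) total both follow from the stated numbers:

```python
# Sanity-check of Per's cluster arithmetic from the post above.
collections = 25
replicas_per_collection_per_machine = 2
machines = 25

# 25 coll * 2 replica/coll/machine * 25 machines = 1250 replicas
total_replicas = collections * replicas_per_collection_per_machine * machines

# ~800 million small documents per replica
docs_per_replica = 800_000_000
total_docs = total_replicas * docs_per_replica

print(total_replicas)            # 1250
print(f"{total_docs:.2e}")       # 1.00e+12, i.e. one trillion documents
```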
Re: rough maximum cores (shards) per machine?
On Wed, 2015-03-25 at 03:46 +0100, Ian Rose wrote: > Thus theoretically we could actually just use one single collection for > all of our customers (adding a 'customer:' type fq to all > queries) but since we never need to query across customers it seemed > more performant (as well as safer - less chance of accidentally > leaking data across customers) to use separate collections. If only a few customers are active at a given time, it is more performant to use a collection/customer. If many of them are active, the more performant option is to lump them together and filter on a field, due to the redundancy-reduction of larger indexes. The 1 collection/customer solution has another edge as ranking will be calculated based on the corpus of the customer and not based on all customers. If the number of customers is low enough to get the individual collections solution to work, that would be the preferable solution. - Toke Eskildsen, State and University Library, Denmark
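The "one collection, filter per tenant" option Ian and Toke describe boils down to attaching an fq to every query. A minimal sketch of building such a request URL, assuming a field named `customer` and a standard /select handler (both assumptions for illustration):

```python
# Sketch of tenant-filtered querying against a single shared collection.
# The 'customer' field name and the core URL are assumed, not from Solr docs.
from urllib.parse import urlencode

def tenant_query(base_url, customer_id, user_query):
    # fq is a filter query: it restricts results without affecting
    # scoring and is cached independently of q, which is what makes
    # it a reasonable fit for tenant isolation.
    params = urlencode({
        "q": user_query,
        "fq": f"customer:{customer_id}",
        "wt": "json",
    })
    return f"{base_url}/select?{params}"

print(tenant_query("http://localhost:8983/solr/docs", "acme-42", "title:report"))
```

The safety concern Ian raises is real, though: a single code path that forgets the fq leaks data across tenants, which is exactly why separate collections feel safer.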
Re: rough maximum cores (shards) per machine?
I've tried (very simplistically) hitting a collection with a good variety of searches and looking at the collection's heap memory and working out the bytes / doc. I've seen results around 100 bytes / doc, and as low as 3 bytes / doc for collections with small docs. It's still a work-in-progress - not sure if it will scale with docs - or is too simplistic. On 25 March 2015 at 17:49, Shai Erera wrote: > While it's hard to answer this question because as others have said, "it > depends", I think it will be good of we can quantify or assess the cost of > running a SolrCore. > > For instance, let's say that a server can handle a load of 10M indexed > documents (I omit search load on purpose for now) in a single SolrCore. > Would the same server be able to handle the same number of documents, If we > indexed 1000 docs per SolrCore, in total of 10,000 SorClores? If the answer > is no, then it means there is some cost that comes w/ each SolrCore, and we > may at least be able to give an upper bound --- on a server with X amount > of storage, Y GB RAM and Z cores you can run up to maxSolrCores(X, Y, Z). > > Another way to look at it, if I were to create empty SolrCores, would I be > able to create an infinite number of cores if storage was infinite? Or even > empty cores have their toll on CPU and RAM? > > I know from the Lucene side of things that each SolrCore (carries a Lucene > index) there is a toll to an index -- the lexicon, IW's RAM buffer, Codecs > that store things in memory etc. For instance, one downside of splitting a > 10M core into 10,000 cores is that the cost of the holding the total > lexicon (dictionary of indexed words) goes up drastically, since now every > word (just the byte[] of the word) is potentially represented in memory > 10,000 times. > > What other RAM/CPU/Storage costs does a SolrCore carry with it? There are > the caches of course, which really depend on how many documents are > indexed. Any other non-trivial or constant cost? 
> > So yes, there isn't a single answer to this question. It's just like > someone would ask how many documents can a single Lucene index handle > efficiently. But if we can come up with basic numbers as I outlined above, > it might help people doing rough estimates. That doesn't mean people > shouldn't benchmark, as that upper bound may be wy too high for their > data set, query workload and search needs. > > Shai > > On Wed, Mar 25, 2015 at 5:25 AM, Damien Kamerman > wrote: > > > From my experience on a high-end sever (256GB memory, 40 core CPU) > testing > > collection numbers with one shard and two replicas, the maximum that > would > > work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps > > half of that), depending on your startup-time requirements. (Though I > have > > settled on 6,000 collection maximum with some patching. See SOLR-7191). > You > > could create multiple clouds after that, and choose the cloud least used > to > > create your collection. > > > > Regarding memory usage I'd pencil in 6MB overheard (no docs) java heap > per > > collection. > > > > On 25 March 2015 at 13:46, Ian Rose wrote: > > > > > First off thanks everyone for the very useful replies thus far. > > > > > > Shawn - thanks for the list of items to check. #1 and #2 should be > fine > > > for us and I'll check our ulimit for #3. > > > > > > To add a bit of clarification, we are indeed using SolrCloud. Our > > current > > > setup is to create a new collection for each customer. For now we > allow > > > SolrCloud to decide for itself where to locate the initial shard(s) but > > in > > > time we expect to refine this such that our system will automatically > > > choose the least loaded nodes according to some metric(s). > > > > > > Having more than one business entity controlling the configuration of a > > > > single (Solr) server is a recipe for disaster. Solr works well if > there > > > is > > > > an architect for the system. 
> > > > > > > > > Jack, can you explain a bit what you mean here? It looks like Toke > > caught > > > your meaning but I'm afraid it missed me. What do you mean by > "business > > > entity"? Is your concern that with automatic creation of collections > > they > > > will be distributed willy-nilly across the cluster, leading to uneven > > load > > > across nodes? If it is relevant, the schema and solrconfig are > > controlled > > > entirely by me and is the same for all collections. Thus theoretically > > we > > > could actually just use one single collection for all of our customers > > > (adding a 'customer:' type fq to all queries) but since we > > never > > > need to query across customers it seemed more performant (as well as > > safer > > > - less chance of accidentally leaking data across customers) to use > > > separate collections. > > > > > > Better to give each tenant a separate Solr instance that you spin up > and > > > > spin down based on demand. > > > > > > > > > Regarding this, if by tenant you mean "cus
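Damien's bytes-per-doc measurement above can be sketched as a simple difference against an empty-collection baseline. The numbers below are made up for illustration; his observed range was roughly 3-100 bytes/doc:

```python
# Minimal sketch of the measurement Damien describes: compare a
# collection's heap footprint with and without documents and derive
# a bytes-per-doc figure. Sample values are illustrative only.

def heap_bytes_per_doc(heap_with_docs, heap_empty, num_docs):
    """Marginal heap cost per document, ignoring fixed per-core overhead."""
    if num_docs <= 0:
        raise ValueError("need a non-empty collection")
    return (heap_with_docs - heap_empty) / num_docs

# e.g. ~6 MB empty-core overhead plus ~100 bytes/doc over 10M docs
empty = 6 * 1024 * 1024
loaded = empty + 100 * 10_000_000
print(heap_bytes_per_doc(loaded, empty, 10_000_000))  # 100.0
```

As Damien notes, it is not obvious that a single bytes/doc figure scales linearly with document count, so this is a rough estimator at best.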
Re: rough maximum cores (shards) per machine?
While it's hard to answer this question because as others have said, "it depends", I think it will be good if we can quantify or assess the cost of running a SolrCore. For instance, let's say that a server can handle a load of 10M indexed documents (I omit search load on purpose for now) in a single SolrCore. Would the same server be able to handle the same number of documents, if we indexed 1000 docs per SolrCore, in total of 10,000 SolrCores? If the answer is no, then it means there is some cost that comes w/ each SolrCore, and we may at least be able to give an upper bound --- on a server with X amount of storage, Y GB RAM and Z cores you can run up to maxSolrCores(X, Y, Z). Another way to look at it, if I were to create empty SolrCores, would I be able to create an infinite number of cores if storage was infinite? Or do even empty cores have their toll on CPU and RAM? I know from the Lucene side of things that for each SolrCore (which carries a Lucene index) there is a toll to an index -- the lexicon, IW's RAM buffer, Codecs that store things in memory etc. For instance, one downside of splitting a 10M core into 10,000 cores is that the cost of holding the total lexicon (dictionary of indexed words) goes up drastically, since now every word (just the byte[] of the word) is potentially represented in memory 10,000 times. What other RAM/CPU/Storage costs does a SolrCore carry with it? There are the caches of course, which really depend on how many documents are indexed. Any other non-trivial or constant cost? So yes, there isn't a single answer to this question. It's just like someone would ask how many documents can a single Lucene index handle efficiently. But if we can come up with basic numbers as I outlined above, it might help people doing rough estimates. That doesn't mean people shouldn't benchmark, as that upper bound may be way too high for their data set, query workload and search needs. 
Shai On Wed, Mar 25, 2015 at 5:25 AM, Damien Kamerman wrote: > From my experience on a high-end sever (256GB memory, 40 core CPU) testing > collection numbers with one shard and two replicas, the maximum that would > work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps > half of that), depending on your startup-time requirements. (Though I have > settled on 6,000 collection maximum with some patching. See SOLR-7191). You > could create multiple clouds after that, and choose the cloud least used to > create your collection. > > Regarding memory usage I'd pencil in 6MB overheard (no docs) java heap per > collection. > > On 25 March 2015 at 13:46, Ian Rose wrote: > > > First off thanks everyone for the very useful replies thus far. > > > > Shawn - thanks for the list of items to check. #1 and #2 should be fine > > for us and I'll check our ulimit for #3. > > > > To add a bit of clarification, we are indeed using SolrCloud. Our > current > > setup is to create a new collection for each customer. For now we allow > > SolrCloud to decide for itself where to locate the initial shard(s) but > in > > time we expect to refine this such that our system will automatically > > choose the least loaded nodes according to some metric(s). > > > > Having more than one business entity controlling the configuration of a > > > single (Solr) server is a recipe for disaster. Solr works well if there > > is > > > an architect for the system. > > > > > > Jack, can you explain a bit what you mean here? It looks like Toke > caught > > your meaning but I'm afraid it missed me. What do you mean by "business > > entity"? Is your concern that with automatic creation of collections > they > > will be distributed willy-nilly across the cluster, leading to uneven > load > > across nodes? If it is relevant, the schema and solrconfig are > controlled > > entirely by me and is the same for all collections. 
Thus theoretically > we > > could actually just use one single collection for all of our customers > > (adding a 'customer:' type fq to all queries) but since we > never > > need to query across customers it seemed more performant (as well as > safer > > - less chance of accidentally leaking data across customers) to use > > separate collections. > > > > Better to give each tenant a separate Solr instance that you spin up and > > > spin down based on demand. > > > > > > Regarding this, if by tenant you mean "customer", this is not viable for > us > > from a cost perspective. As I mentioned initially, many of our customers > > are very small so dedicating an entire machine to each of them would not > be > > economical (or efficient). Or perhaps I am not understanding what your > > definition of "tenant" is? > > > > Cheers, > > Ian > > > > > > > > On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen > > wrote: > > > > > Jack Krupansky [jack.krupan...@gmail.com] wrote: > > > > I'm sure that I am quite unqualified to describe his hypothetical > > setup. > > > I > > > > mean, he's the one using the term multi-tenancy,
Re: rough maximum cores (shards) per machine?
From my experience on a high-end server (256GB memory, 40 core CPU) testing collection numbers with one shard and two replicas, the maximum that would work is 3,000 cores (1,500 collections). I'd recommend much less (perhaps half of that), depending on your startup-time requirements. (Though I have settled on 6,000 collections maximum with some patching. See SOLR-7191). You could create multiple clouds after that, and choose the cloud least used to create your collection. Regarding memory usage I'd pencil in 6MB overhead (no docs) java heap per collection. On 25 March 2015 at 13:46, Ian Rose wrote: > First off thanks everyone for the very useful replies thus far. > > Shawn - thanks for the list of items to check. #1 and #2 should be fine > for us and I'll check our ulimit for #3. > > To add a bit of clarification, we are indeed using SolrCloud. Our current > setup is to create a new collection for each customer. For now we allow > SolrCloud to decide for itself where to locate the initial shard(s) but in > time we expect to refine this such that our system will automatically > choose the least loaded nodes according to some metric(s). > > Having more than one business entity controlling the configuration of a > > single (Solr) server is a recipe for disaster. Solr works well if there > is > > an architect for the system. > > > Jack, can you explain a bit what you mean here? It looks like Toke caught > your meaning but I'm afraid it missed me. What do you mean by "business > entity"? Is your concern that with automatic creation of collections they > will be distributed willy-nilly across the cluster, leading to uneven load > across nodes? If it is relevant, the schema and solrconfig are controlled > entirely by me and is the same for all collections. 
Thus theoretically we > could actually just use one single collection for all of our customers > (adding a 'customer:' type fq to all queries) but since we never > need to query across customers it seemed more performant (as well as safer > - less chance of accidentally leaking data across customers) to use > separate collections. > > Better to give each tenant a separate Solr instance that you spin up and > > spin down based on demand. > > > Regarding this, if by tenant you mean "customer", this is not viable for us > from a cost perspective. As I mentioned initially, many of our customers > are very small so dedicating an entire machine to each of them would not be > economical (or efficient). Or perhaps I am not understanding what your > definition of "tenant" is? > > Cheers, > Ian > > > > On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen > wrote: > > > Jack Krupansky [jack.krupan...@gmail.com] wrote: > > > I'm sure that I am quite unqualified to describe his hypothetical > setup. > > I > > > mean, he's the one using the term multi-tenancy, so it's for him to be > > > clear. > > > > It was my understanding that Ian used them interchangeably, but of course > > Ian it the only one that knows. > > > > > For me, it's a question of who has control over the config and schema > and > > > collection creation. Having more than one business entity controlling > the > > > configuration of a single (Solr) server is a recipe for disaster. > > > > Thank you. Now your post makes a lot more sense. I will not argue against > > that. > > > > - Toke Eskildsen > > > -- Damien Kamerman
Re: rough maximum cores (shards) per machine?
First off thanks everyone for the very useful replies thus far. Shawn - thanks for the list of items to check. #1 and #2 should be fine for us and I'll check our ulimit for #3. To add a bit of clarification, we are indeed using SolrCloud. Our current setup is to create a new collection for each customer. For now we allow SolrCloud to decide for itself where to locate the initial shard(s) but in time we expect to refine this such that our system will automatically choose the least loaded nodes according to some metric(s). Having more than one business entity controlling the configuration of a > single (Solr) server is a recipe for disaster. Solr works well if there is > an architect for the system. Jack, can you explain a bit what you mean here? It looks like Toke caught your meaning but I'm afraid it missed me. What do you mean by "business entity"? Is your concern that with automatic creation of collections they will be distributed willy-nilly across the cluster, leading to uneven load across nodes? If it is relevant, the schema and solrconfig are controlled entirely by me and are the same for all collections. Thus theoretically we could actually just use one single collection for all of our customers (adding a 'customer:' type fq to all queries) but since we never need to query across customers it seemed more performant (as well as safer - less chance of accidentally leaking data across customers) to use separate collections. Better to give each tenant a separate Solr instance that you spin up and > spin down based on demand. Regarding this, if by tenant you mean "customer", this is not viable for us from a cost perspective. As I mentioned initially, many of our customers are very small so dedicating an entire machine to each of them would not be economical (or efficient). Or perhaps I am not understanding what your definition of "tenant" is? 
Cheers, Ian On Tue, Mar 24, 2015 at 4:51 PM, Toke Eskildsen wrote: > Jack Krupansky [jack.krupan...@gmail.com] wrote: > > I'm sure that I am quite unqualified to describe his hypothetical setup. > I > > mean, he's the one using the term multi-tenancy, so it's for him to be > > clear. > > It was my understanding that Ian used them interchangeably, but of course > Ian it the only one that knows. > > > For me, it's a question of who has control over the config and schema and > > collection creation. Having more than one business entity controlling the > > configuration of a single (Solr) server is a recipe for disaster. > > Thank you. Now your post makes a lot more sense. I will not argue against > that. > > - Toke Eskildsen >
Re: rough maximum cores (shards) per machine?
Test Test: From Hossman's apache page: When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. Also, please format your stack trace for readability. On a quick glance, you probably have mis-matched jars in your classpath. On Tue, Mar 24, 2015 at 1:35 PM, Test Test wrote: > Hi there, > I'm trying to create my own TokenizerFactory (from tamingtext's book).After > setting schema.xml and have adding path in solrconfig.xml, i start solr.I > have this error message : Caused by: org.apache.solr.common.SolrException: > Plugin init failure for [schema.xml] fieldType "text": Plugin init failure > for [schema.xml] analyzer/tokenizer: class > com.tamingtext.texttamer.solr.SentenceTokenizerFactory. Schema file is > .../conf/schema.xml at > org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:595) at > org.apache.solr.schema.IndexSchema.(IndexSchema.java:166) at > org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) > at > org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) > at > org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:90) > at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62) > ... 7 moreCaused by: org.apache.solr.common.SolrException: Plugin init > failure for [schema.xml] fieldType "text": Plugin init failure for > [schema.xml] analyzer/tokenizer: class > com.tamingtext.texttamer.solr.SentenceTokenizerFactory at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:486) ... 
> 12 moreCaused by: org.apache.solr.common.SolrException: Plugin init failure > for [schema.xml] analyzer/tokenizer: class > com.tamingtext.texttamer.solr.SentenceTokenizerFactory at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) > at > org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) > ... 13 moreCaused by: java.lang.ClassCastException: class > com.tamingtext.texttamer.solr.SentenceTokenizerFactory at > java.lang.Class.asSubclass(Class.java:3208) at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593) > at > org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342) > at > org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) > Someone can help? > Thanks.Regards. > > > Le Mardi 24 mars 2015 21h24, Jack Krupansky a > écrit : > > > I'm sure that I am quite unqualified to describe his hypothetical setup. I > mean, he's the one using the term multi-tenancy, so it's for him to be > clear. > > For me, it's a question of who has control over the config and schema and > collection creation. Having more than one business entity controlling the > configuration of a single (Solr) server is a recipe for disaster. Solr > works well if there is an architect for the system. Ever hear the old > saying "Too many cooks spoil the stew"? 
> > -- Jack Krupansky > > On Tue, Mar 24, 2015 at 3:54 PM, Toke Eskildsen > wrote: > >> Jack Krupansky [jack.krupan...@gmail.com] wrote: >> > Don't confuse customers and tenants. >> >> Perhaps you could explain what you mean by multi-tenant in the context of >> Ian's setup? It is not clear to me what the distinction is in this case. >> >> - Toke Eskildsen >> > > >
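The "mis-matched jars" diagnosis above is the usual cause of this kind of ClassCastException: the custom SentenceTokenizerFactory was compiled against one version of Solr/Lucene and is being loaded alongside another, so the class no longer subclasses the TokenizerFactory that Solr sees. One quick check is to scan the lib directories for the same artifact appearing with more than one version. A rough sketch, not an official tool; the directory paths you pass in are assumptions about your own layout:

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Matches "artifact-1.2.3.jar" -> ("artifact", "1.2.3")
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")

def find_version_conflicts(paths):
    """Return {artifact: {versions}} for jars seen with more than one version."""
    versions = defaultdict(set)
    for root in paths:
        for jar in Path(root).rglob("*.jar"):
            m = JAR_RE.match(jar.name)
            if m:
                versions[m.group("name")].add(m.group("version"))
    return {name: vs for name, vs in versions.items() if len(vs) > 1}

if __name__ == "__main__":
    # Example (paths are placeholders for your install):
    #   python check_jars.py server/solr-webapp path/to/core/lib
    for name, vs in sorted(find_version_conflicts(sys.argv[1:]).items()):
        print(f"{name}: {sorted(vs)}")
```

If this reports something like `solr-core: ['4.10.4', '5.0.0']`, the fix is to rebuild the plugin against the jars actually shipped with the running Solr and remove the stale copies.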
RE: rough maximum cores (shards) per machine?
Jack Krupansky [jack.krupan...@gmail.com] wrote:
> I'm sure that I am quite unqualified to describe his hypothetical setup. I
> mean, he's the one using the term multi-tenancy, so it's for him to be
> clear.

It was my understanding that Ian used them interchangeably, but of course Ian is the only one who knows.

> For me, it's a question of who has control over the config and schema and
> collection creation. Having more than one business entity controlling the
> configuration of a single (Solr) server is a recipe for disaster.

Thank you. Now your post makes a lot more sense. I will not argue against that.

- Toke Eskildsen
Re: rough maximum cores (shards) per machine?
Hi there,

I'm trying to create my own TokenizerFactory (from tamingtext's book). After setting schema.xml and adding the path in solrconfig.xml, I start Solr. I have this error message:

Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text": Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory. Schema file is .../conf/schema.xml
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:595)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:90)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 7 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text": Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:486)
    ... 12 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 13 more
Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at java.lang.Class.asSubclass(Class.java:3208)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)

Someone can help?
Thanks. Regards.

On Tuesday, March 24, 2015 at 9:24 PM, Jack Krupansky wrote:

I'm sure that I am quite unqualified to describe his hypothetical setup. I mean, he's the one using the term multi-tenancy, so it's for him to be clear.

For me, it's a question of who has control over the config and schema and collection creation. Having more than one business entity controlling the configuration of a single (Solr) server is a recipe for disaster. Solr works well if there is an architect for the system. Ever hear the old saying "Too many cooks spoil the stew"?

-- Jack Krupansky

On Tue, Mar 24, 2015 at 3:54 PM, Toke Eskildsen wrote:
> Jack Krupansky [jack.krupan...@gmail.com] wrote:
> > Don't confuse customers and tenants.
>
> Perhaps you could explain what you mean by multi-tenant in the context of
> Ian's setup? It is not clear to me what the distinction is in this case.
>
> - Toke Eskildsen
Re: rough maximum cores (shards) per machine?
I'm sure that I am quite unqualified to describe his hypothetical setup. I mean, he's the one using the term multi-tenancy, so it's for him to be clear. For me, it's a question of who has control over the config and schema and collection creation. Having more than one business entity controlling the configuration of a single (Solr) server is a recipe for disaster. Solr works well if there is an architect for the system. Ever hear the old saying "Too many cooks spoil the stew"? -- Jack Krupansky On Tue, Mar 24, 2015 at 3:54 PM, Toke Eskildsen wrote: > Jack Krupansky [jack.krupan...@gmail.com] wrote: > > Don't confuse customers and tenants. > > Perhaps you could explain what you mean by multi-tenant in the context of > Ian's setup? It is not clear to me what the distinction is in this case. > > - Toke Eskildsen >
RE: rough maximum cores (shards) per machine?
Jack Krupansky [jack.krupan...@gmail.com] wrote: > Don't confuse customers and tenants. Perhaps you could explain what you mean by multi-tenant in the context of Ian's setup? It is not clear to me what the distinction is in this case. - Toke Eskildsen
Re: rough maximum cores (shards) per machine?
On 3/24/2015 11:22 AM, Ian Rose wrote:
> Let me give a bit of background. Our Solr cluster is multi-tenant, where
> we use one collection for each of our customers. In many cases, these
> customers are very tiny, so their collection consists of just a single
> shard on a single Solr node. In fact, a non-trivial number of them are
> totally empty (e.g. trial customers that never did anything with their
> trial account). However there are also some customers that are larger,
> requiring their collection to be sharded. Our strategy is to try to keep
> the total documents in any one shard under 20 million (honestly not sure
> where my coworker got that number from - I am open to alternatives but I
> realize this is heavily app-specific).
>
> So my original question is not related to indexing or query traffic, but
> just the sheer number of cores. For example, if I have 10 active cores on
> a machine and everything is working fine, should I expect that everything
> will still work fine if I add 10 nearly-idle cores to that machine? What
> about 100? 1000? I figure the overhead of each core is probably fairly
> low but at some point starts to matter.

One resource that may be exhausted faster than any other when you have a lot of cores on a Solr instance (especially when they are not idle) is Java heap memory, so you might need to increase the Java heap.

Memory in the server is one of the most important resources you have for Solr performance, and here I am talking about memory that is *not* used in the Java heap (or any other program) -- the OS must be able to effectively cache your index data or Solr performance will be terrible.

You have said "Solr cluster" and "collection" ... so that makes me think you're running SolrCloud. In cloud mode, you can't really use the LotsOfCores functionality, where you mark cores transient and tell Solr how many cores you'd like to have resident at the same time.
If you are NOT in cloud mode, then you can use this feature: http://wiki.apache.org/solr/LotsOfCores

In general, there are three resources other than memory which might become exhausted with a large number of cores:

One resource is the "maximum open files" limit in the operating system, which typically defaults to 1024. Each core will typically have several dozen files in its index, so it's very easy to reach 1024 open files.

The second resource is the maximum allowed threads in your servlet container config -- each core you add requires more threads. The default maxThreads value in most containers is 200. The Jetty container included in the Solr download is preconfigured with a maxThreads value of 10000, effectively removing the limit for most setups.

The third resource is related to the second -- some operating systems implement threads as hidden processes, and many operating systems will limit the number of processes that a user may start. On Linux, this limit is typically 1024, and may need to be increased.

I really need to add this kind of info to the wiki.

Thanks, Shawn
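The open-file and process limits described above can be checked from code before they bite. A minimal sketch using Python's resource module (Unix only; the 4096 thresholds are arbitrary illustrative floors, not values from this thread):

```python
import resource

def check_limits(min_files=4096, min_procs=4096):
    """Return warnings if soft limits look too low for a many-core Solr node."""
    warnings = []
    soft_files, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_files < min_files:
        warnings.append(
            f"open-files soft limit is {soft_files}; with several dozen "
            f"index files per core this is easy to exhaust (raise ulimit -n)")
    # On Linux, threads count against the per-user process limit.
    if hasattr(resource, "RLIMIT_NPROC"):
        soft_procs, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if soft_procs != resource.RLIM_INFINITY and soft_procs < min_procs:
            warnings.append(
                f"max-user-processes soft limit is {soft_procs}; each core "
                f"adds threads (raise ulimit -u)")
    return warnings

if __name__ == "__main__":
    for w in check_limits():
        print("WARNING:", w)
```

Running this on the Solr host (or from a wrapper script at startup) gives an early signal before you hit "too many open files" errors under load.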
Re: rough maximum cores (shards) per machine?
Don't confuse customers and tenants. -- Jack Krupansky On Tue, Mar 24, 2015 at 2:24 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Sorry Jack. That doesn't scale when you have millions of customers. And > these are good problems to have! > > On Tue, Mar 24, 2015 at 10:47 AM, Jack Krupansky > > wrote: > > > Multi-tenancy is a bad idea for a single solr Cluster. Better to give > each > > tenant a separate Solr instance that you spin up and spin down based on > > demand. > > > > Think about it: If there are a small number of tenants, just giving each > > their own machine will be cheaper than the effort spent managing a > > multi-tenant cluster, and if there are a large number of tenants of even > a > > moderate number of large tenants, you can't expect them to all run > > reasonably on a relatively small cluster. Think about scalability. > > > > > > -- Jack Krupansky > > > > On Tue, Mar 24, 2015 at 1:22 PM, Ian Rose wrote: > > > > > Let me give a bit of background. Our Solr cluster is multi-tenant, > where > > > we use one collection for each of our customers. In many cases, these > > > customers are very tiny, so their collection consists of just a single > > > shard on a single Solr node. In fact, a non-trivial number of them are > > > totally empty (e.g. trial customers that never did anything with their > > > trial account). However there are also some customers that are larger, > > > requiring their collection to be sharded. Our strategy is to try to > keep > > > the total documents in any one shard under 20 million (honestly not > sure > > > where my coworker got that number from - I am open to alternatives but > I > > > realize this is heavily app-specific). > > > > > > So my original question is not related to indexing or query traffic, > but > > > just the sheer number of cores. 
For example, if I have 10 active cores > > on > > > a machine and everything is working fine, should I expect that > everything > > > will still work fine if I add 10 nearly-idle cores to that machine? > What > > > about 100? 1000? I figure the overhead of each core is probably > fairly > > > low but at some point starts to matter. > > > > > > Does that make sense? > > > - Ian > > > > > > > > > On Tue, Mar 24, 2015 at 11:12 AM, Jack Krupansky < > > jack.krupan...@gmail.com > > > > > > > wrote: > > > > > > > Shards per collection, or across all collections on the node? > > > > > > > > It will all depend on: > > > > > > > > 1. Your ingestion/indexing rate. High, medium or low? > > > > 2. Your query access pattern. Note that a typical query fans out to > all > > > > shards, so having more shards than CPU cores means less parallelism. > > > > 3. How many collections you will have per node. > > > > > > > > In short, it depends on what you want to achieve, not some limit of > > Solr > > > > per se. > > > > > > > > Why are you even sharding the node anyway? Why not just run with a > > single > > > > shard per node, and do sharding by having separate nodes, to maximize > > > > parallel processing and availability? > > > > > > > > Also be careful to be clear about using the Solr term "shard" (a > slice, > > > > across all replica nodes) as distinct from the Elasticsearch term > > "shard" > > > > (a single slice of an index for a single replica, analogous to a Solr > > > > "core".) > > > > > > > > > > > > -- Jack Krupansky > > > > > > > > On Tue, Mar 24, 2015 at 9:02 AM, Ian Rose > > wrote: > > > > > > > > > Hi all - > > > > > > > > > > I'm sure this topic has been covered before but I was unable to > find > > > any > > > > > clear references online or in the mailing list. > > > > > > > > > > Are there any rules of thumb for how many cores (aka shards, since > I > > am > > > > > using SolrCloud) is "too many" for one machine? 
I realize there is > > no > > > > one > > > > > answer (depends on size of the machine, etc.) so I'm just looking > > for a > > > > > rough idea. Something like the following would be very useful: > > > > > > > > > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 > > core) > > > > > server without any problems. > > > > > * I have never heard of anyone successfully running X cores/shards > > on a > > > > > single machine, even if you throw a lot of hardware at it. > > > > > > > > > > Thanks! > > > > > - Ian > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. >
Re: rough maximum cores (shards) per machine?
Sorry Jack. That doesn't scale when you have millions of customers. And these are good problems to have! On Tue, Mar 24, 2015 at 10:47 AM, Jack Krupansky wrote: > Multi-tenancy is a bad idea for a single solr Cluster. Better to give each > tenant a separate Solr instance that you spin up and spin down based on > demand. > > Think about it: If there are a small number of tenants, just giving each > their own machine will be cheaper than the effort spent managing a > multi-tenant cluster, and if there are a large number of tenants of even a > moderate number of large tenants, you can't expect them to all run > reasonably on a relatively small cluster. Think about scalability. > > > -- Jack Krupansky > > On Tue, Mar 24, 2015 at 1:22 PM, Ian Rose wrote: > > > Let me give a bit of background. Our Solr cluster is multi-tenant, where > > we use one collection for each of our customers. In many cases, these > > customers are very tiny, so their collection consists of just a single > > shard on a single Solr node. In fact, a non-trivial number of them are > > totally empty (e.g. trial customers that never did anything with their > > trial account). However there are also some customers that are larger, > > requiring their collection to be sharded. Our strategy is to try to keep > > the total documents in any one shard under 20 million (honestly not sure > > where my coworker got that number from - I am open to alternatives but I > > realize this is heavily app-specific). > > > > So my original question is not related to indexing or query traffic, but > > just the sheer number of cores. For example, if I have 10 active cores > on > > a machine and everything is working fine, should I expect that everything > > will still work fine if I add 10 nearly-idle cores to that machine? What > > about 100? 1000? I figure the overhead of each core is probably fairly > > low but at some point starts to matter. > > > > Does that make sense? 
> > - Ian > > > > > > On Tue, Mar 24, 2015 at 11:12 AM, Jack Krupansky < > jack.krupan...@gmail.com > > > > > wrote: > > > > > Shards per collection, or across all collections on the node? > > > > > > It will all depend on: > > > > > > 1. Your ingestion/indexing rate. High, medium or low? > > > 2. Your query access pattern. Note that a typical query fans out to all > > > shards, so having more shards than CPU cores means less parallelism. > > > 3. How many collections you will have per node. > > > > > > In short, it depends on what you want to achieve, not some limit of > Solr > > > per se. > > > > > > Why are you even sharding the node anyway? Why not just run with a > single > > > shard per node, and do sharding by having separate nodes, to maximize > > > parallel processing and availability? > > > > > > Also be careful to be clear about using the Solr term "shard" (a slice, > > > across all replica nodes) as distinct from the Elasticsearch term > "shard" > > > (a single slice of an index for a single replica, analogous to a Solr > > > "core".) > > > > > > > > > -- Jack Krupansky > > > > > > On Tue, Mar 24, 2015 at 9:02 AM, Ian Rose > wrote: > > > > > > > Hi all - > > > > > > > > I'm sure this topic has been covered before but I was unable to find > > any > > > > clear references online or in the mailing list. > > > > > > > > Are there any rules of thumb for how many cores (aka shards, since I > am > > > > using SolrCloud) is "too many" for one machine? I realize there is > no > > > one > > > > answer (depends on size of the machine, etc.) so I'm just looking > for a > > > > rough idea. Something like the following would be very useful: > > > > > > > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 > core) > > > > server without any problems. > > > > * I have never heard of anyone successfully running X cores/shards > on a > > > > single machine, even if you throw a lot of hardware at it. > > > > > > > > Thanks! 
> > > > - Ian > > > > > > > > > > -- Regards, Shalin Shekhar Mangar.
Re: rough maximum cores (shards) per machine?
Multi-tenancy is a bad idea for a single solr Cluster. Better to give each tenant a separate Solr instance that you spin up and spin down based on demand. Think about it: If there are a small number of tenants, just giving each their own machine will be cheaper than the effort spent managing a multi-tenant cluster, and if there are a large number of tenants of even a moderate number of large tenants, you can't expect them to all run reasonably on a relatively small cluster. Think about scalability. -- Jack Krupansky On Tue, Mar 24, 2015 at 1:22 PM, Ian Rose wrote: > Let me give a bit of background. Our Solr cluster is multi-tenant, where > we use one collection for each of our customers. In many cases, these > customers are very tiny, so their collection consists of just a single > shard on a single Solr node. In fact, a non-trivial number of them are > totally empty (e.g. trial customers that never did anything with their > trial account). However there are also some customers that are larger, > requiring their collection to be sharded. Our strategy is to try to keep > the total documents in any one shard under 20 million (honestly not sure > where my coworker got that number from - I am open to alternatives but I > realize this is heavily app-specific). > > So my original question is not related to indexing or query traffic, but > just the sheer number of cores. For example, if I have 10 active cores on > a machine and everything is working fine, should I expect that everything > will still work fine if I add 10 nearly-idle cores to that machine? What > about 100? 1000? I figure the overhead of each core is probably fairly > low but at some point starts to matter. > > Does that make sense? > - Ian > > > On Tue, Mar 24, 2015 at 11:12 AM, Jack Krupansky > > wrote: > > > Shards per collection, or across all collections on the node? > > > > It will all depend on: > > > > 1. Your ingestion/indexing rate. High, medium or low? > > 2. Your query access pattern. 
Note that a typical query fans out to all > > shards, so having more shards than CPU cores means less parallelism. > > 3. How many collections you will have per node. > > > > In short, it depends on what you want to achieve, not some limit of Solr > > per se. > > > > Why are you even sharding the node anyway? Why not just run with a single > > shard per node, and do sharding by having separate nodes, to maximize > > parallel processing and availability? > > > > Also be careful to be clear about using the Solr term "shard" (a slice, > > across all replica nodes) as distinct from the Elasticsearch term "shard" > > (a single slice of an index for a single replica, analogous to a Solr > > "core".) > > > > > > -- Jack Krupansky > > > > On Tue, Mar 24, 2015 at 9:02 AM, Ian Rose wrote: > > > > > Hi all - > > > > > > I'm sure this topic has been covered before but I was unable to find > any > > > clear references online or in the mailing list. > > > > > > Are there any rules of thumb for how many cores (aka shards, since I am > > > using SolrCloud) is "too many" for one machine? I realize there is no > > one > > > answer (depends on size of the machine, etc.) so I'm just looking for a > > > rough idea. Something like the following would be very useful: > > > > > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) > > > server without any problems. > > > * I have never heard of anyone successfully running X cores/shards on a > > > single machine, even if you throw a lot of hardware at it. > > > > > > Thanks! > > > - Ian > > > > > >
Re: rough maximum cores (shards) per machine?
Let me give a bit of background. Our Solr cluster is multi-tenant, where we use one collection for each of our customers. In many cases, these customers are very tiny, so their collection consists of just a single shard on a single Solr node. In fact, a non-trivial number of them are totally empty (e.g. trial customers that never did anything with their trial account). However there are also some customers that are larger, requiring their collection to be sharded. Our strategy is to try to keep the total documents in any one shard under 20 million (honestly not sure where my coworker got that number from - I am open to alternatives but I realize this is heavily app-specific). So my original question is not related to indexing or query traffic, but just the sheer number of cores. For example, if I have 10 active cores on a machine and everything is working fine, should I expect that everything will still work fine if I add 10 nearly-idle cores to that machine? What about 100? 1000? I figure the overhead of each core is probably fairly low but at some point starts to matter. Does that make sense? - Ian On Tue, Mar 24, 2015 at 11:12 AM, Jack Krupansky wrote: > Shards per collection, or across all collections on the node? > > It will all depend on: > > 1. Your ingestion/indexing rate. High, medium or low? > 2. Your query access pattern. Note that a typical query fans out to all > shards, so having more shards than CPU cores means less parallelism. > 3. How many collections you will have per node. > > In short, it depends on what you want to achieve, not some limit of Solr > per se. > > Why are you even sharding the node anyway? Why not just run with a single > shard per node, and do sharding by having separate nodes, to maximize > parallel processing and availability? 
> > Also be careful to be clear about using the Solr term "shard" (a slice, > across all replica nodes) as distinct from the Elasticsearch term "shard" > (a single slice of an index for a single replica, analogous to a Solr > "core".) > > > -- Jack Krupansky > > On Tue, Mar 24, 2015 at 9:02 AM, Ian Rose wrote: > > > Hi all - > > > > I'm sure this topic has been covered before but I was unable to find any > > clear references online or in the mailing list. > > > > Are there any rules of thumb for how many cores (aka shards, since I am > > using SolrCloud) is "too many" for one machine? I realize there is no > one > > answer (depends on size of the machine, etc.) so I'm just looking for a > > rough idea. Something like the following would be very useful: > > > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) > > server without any problems. > > * I have never heard of anyone successfully running X cores/shards on a > > single machine, even if you throw a lot of hardware at it. > > > > Thanks! > > - Ian > > >
Re: rough maximum cores (shards) per machine?
Shards per collection, or across all collections on the node? It will all depend on: 1. Your ingestion/indexing rate. High, medium or low? 2. Your query access pattern. Note that a typical query fans out to all shards, so having more shards than CPU cores means less parallelism. 3. How many collections you will have per node. In short, it depends on what you want to achieve, not some limit of Solr per se. Why are you even sharding the node anyway? Why not just run with a single shard per node, and do sharding by having separate nodes, to maximize parallel processing and availability? Also be careful to be clear about using the Solr term "shard" (a slice, across all replica nodes) as distinct from the Elasticsearch term "shard" (a single slice of an index for a single replica, analogous to a Solr "core".) -- Jack Krupansky On Tue, Mar 24, 2015 at 9:02 AM, Ian Rose wrote: > Hi all - > > I'm sure this topic has been covered before but I was unable to find any > clear references online or in the mailing list. > > Are there any rules of thumb for how many cores (aka shards, since I am > using SolrCloud) is "too many" for one machine? I realize there is no one > answer (depends on size of the machine, etc.) so I'm just looking for a > rough idea. Something like the following would be very useful: > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) > server without any problems. > * I have never heard of anyone successfully running X cores/shards on a > single machine, even if you throw a lot of hardware at it. > > Thanks! > - Ian >
Re: rough maximum cores (shards) per machine?
Well, there's a ticket out there for "thousands of collections on a single machine", although this is way out there. I often see 10-20 small cores on a 4-8 core machine if they're reasonably small (a few million docs). I see a single replica strain a 128G 16-core machine if it has 300M docs. Which is a way of saying "ya gotta test with your data/query mix". Wish there was a better answer.

Erick

On Tue, Mar 24, 2015 at 6:02 AM, Ian Rose wrote: > Hi all - > > I'm sure this topic has been covered before but I was unable to find any > clear references online or in the mailing list. > > Are there any rules of thumb for how many cores (aka shards, since I am > using SolrCloud) is "too many" for one machine? I realize there is no one > answer (depends on size of the machine, etc.) so I'm just looking for a > rough idea. Something like the following would be very useful: > > * People commonly run up to X cores/shards on a mid-sized (4 or 8 core) > server without any problems. > * I have never heard of anyone successfully running X cores/shards on a > single machine, even if you throw a lot of hardware at it. > > Thanks! > - Ian
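The "ya gotta test with your data/query mix" advice above can be put into practice with a small latency probe run against a proof-of-concept cluster. A rough sketch, not a real benchmark harness; the base URL, collection name, and query list are placeholders for your own setup:

```python
import time
import urllib.parse
import urllib.request

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[k]

def time_query(base_url, collection, q):
    """Return wall-clock seconds for one Solr select request."""
    params = urllib.parse.urlencode({"q": q, "rows": 10})
    url = f"{base_url}/solr/{collection}/select?{params}"
    start = time.monotonic()
    urllib.request.urlopen(url).read()
    return time.monotonic() - start

if __name__ == "__main__":
    # Placeholders: point at your cluster and a representative query mix.
    queries = ["*:*", "title:test", "body:solr"] * 50
    latencies = [time_query("http://localhost:8983", "mycollection", q)
                 for q in queries]
    print("p50:", percentile(latencies, 50), "p95:", percentile(latencies, 95))
```

Repeating this while adding (mostly idle) cores is one way to see where per-core overhead starts to show up in tail latency, which is exactly the question the thread is circling.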