Re: a core for every user, lots of users... are there issues
I don't know of anyone who's tried and failed to combine transient cores and SolrCloud. I also don't know of anyone who's tried and succeeded. What I'm saying is that the transient core stuff has been thoroughly tested in non-cloud mode, and people have been working with it for a couple of releases now. I know of no a priori reason it wouldn't work in SolrCloud, but I haven't personally done it, nor do I know of anyone who has. It might "just work", but the proof is in the pudding.

I've heard some scuttlebutt that the combination of SolrCloud and transient cores is being, or soon will be, investigated, as in testing and writing test cases. Being a pessimist by nature on these things, I suspect (but don't know) that something will come up.

For instance, SolrCloud tries to keep track of the states of all the nodes. I _think_ (but don't know for sure) that this just means keeping contact with the JVM, not with particular cores. But what if there's something I don't know about that pings the individual cores? That would keep them constantly loading and unloading, which might crop up in unexpected ways. I have to emphasize that this is an unknown (at least to me), but it's an example of something that could crop up. I'm sure there are other possibilities.

Or distributed updates. For those, every core on every node for a shard in collectionX must process the update. So if the cores are transient, each and every core in each and every shard might have to be loaded for the update to succeed. Does this happen fast enough in all cases that a timeout doesn't cause the update to fail? Or the node to be marked as down? What about combining that with a heavy query load? I just don't know.

It's uncharted territory is all. I'd love it for you to volunteer to be the first :). There's certainly committer interest in making this case work, so you wouldn't be left hanging all alone.
If I were planning a product, though, I'd either treat the combination of transient cores and SolrCloud as an R&D project or go with non-cloud mode until I had some reassurance that transient cores and SolrCloud played nicely together.

All that said, I don't want to paint too bleak a picture. All the transient core stuff is local to a particular node; SolrCloud and ZooKeeper shouldn't be interested in the details. It _should_ "just work". It's just that I can't point to any examples where that's been tried.

Best,
Erick
Re: a core for every user, lots of users... are there issues
Oh my... when you say "I don't know anyone who's combined the two," do you mean that those that have tried have failed, or that no one has gotten around to trying? It sounds like you are saying you have some specific knowledge that right now these won't work; otherwise you wouldn't say "committers will be addressing this sometime soon", right?

I'm worried, as we need to make a practical decision here, and it sounds like maybe we should stick with Solr for now... is that what you are saying?

--
blog: whydoeseverythingsuck.com
Re: a core for every user, lots of users... are there issues
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play nicely together. I think some of the committers will be addressing this sometime soon.

I'm not saying that this will certainly fail; OTOH, I don't know anyone who's combined the two.

Erick
Re: a core for every user, lots of users... are there issues
Super helpful. Thanks.
Re: a core for every user, lots of users... are there issues
If you are NOT in SolrCloud mode, the config files need to already exist on disk in order to create new cores. This is the case with all versions of Solr.

If you're running in SolrCloud mode, the core is associated with a collection. Collections have a link to a config in ZooKeeper; the config is not stored with the core on disk.

Thanks,
Shawn
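To make the non-cloud case concrete, here is a sketch of what that constraint looks like in practice (the core and path names are hypothetical): the instance directory and its conf/ files must already be on disk, and only then can the CoreAdmin CREATE action register the core.

```
solr_home/
  usercore1/
    conf/
      solrconfig.xml
      schema.xml

Then, once the files exist, something like:
http://localhost:8983/solr/admin/cores?action=CREATE&name=usercore1&instanceDir=usercore1
```

CREATE does not generate the config files for you, which is likely why the 4.2 attempt described above appeared "broken" when the conf/ directory wasn't pre-populated.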
Re: a core for every user, lots of users... are there issues
Ok, one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we were *trying* to use the REST API's "create" action to create cores without having to manually mess with files on the server. Is this what "create" was supposed to do? If so, it was broken, or we weren't using it right. In any case, in 4.6 is that the right way to programmatically add cores in discovery mode?
Re: a core for every user, lots of users... are there issues
bq: Do you have any sense of what a good upper limit might be, or how we might figure that out?

As always, "it depends" (tm). And the biggest thing it depends upon is the number of simultaneous users you have and the size of their indexes. So we've arrived at the black box of estimating size again. Sigh... I'm afraid the only way is to test and establish some rules of thumb.

The transient core constraint will limit the number of cores loaded at once. If you allow too many cores at once, you'll get OOM errors when all the users pile on at the same time.

Let's say you've determined that 100 is the limit for transient cores. What I suspect you'll see if this limit is too low is degrading response times. Say 110 users are signed on, and say they submit queries perfectly in order, one after the other. Every request will require a core to be opened, and that'll take a bit. So that'll be a flag.

Or maybe that's a fine limit, but your users have added more and more documents and you're coming under memory pressure.

As you can tell, I don't have any good answers. I've seen between 10M and 300M documents on a single machine.

BTW, in a _very_ casual test I found that about 1,000 cores/second were discovered in discovery mode. While they aren't loaded if they're transient, it's still a consideration if you have tens of thousands.

Best,
Erick
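The "limit of 100" in the message above corresponds to the transient core cache size. As a hedged sketch (the value 100 is just the example number from this thread, not a recommendation), in the discovery-mode solr.xml of Solr 4.4+ the cap would be set along these lines:

```xml
<solr>
  <!-- Keep at most 100 transient cores loaded at once; when the cap is
       reached, the least-recently-used transient core is closed and
       unloaded to make room for the next one requested. -->
  <int name="transientCacheSize">100</int>
</solr>
```

Tuning this value is exactly the test-and-measure exercise described above: too low and response times degrade from constant core open/close churn; too high and concurrent users can push the heap into OOM territory.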
Re: a core for every user, lots of users... are there issues
Sorry, I see that we are up to Solr 4.6. I missed that.
Re: a core for every user, lots of users... are there issues
Also, I see that the LotsOfCores stuff is for Solr 4.4 and above. What is the state of the 4.4 codebase? Could we start using it now? Is it safe?
Re: a core for every user, lots of users... are there issues
On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson wrote:
> You probably want to look at "transient cores", see:
> http://wiki.apache.org/solr/LotsOfCores

Wow. Thanks for that great link. Yes, we are sharding, so it's not like there would be millions of cores on one machine or even one cluster. And since the cores are one per user, this is a totally clean approach. But we still want to make sure that we are not overloading the machine. Do you have any sense of what a good upper limit might be, or how we might figure that out?
Re: a core for every user, lots of users... are there issues
You probably want to look at "transient cores"; see:
http://wiki.apache.org/solr/LotsOfCores

But millions will be "interesting" for a single node; you must have some kind of partitioning in mind?

Best,
Erick
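As a sketch of the per-core setup the LotsOfCores page describes (the core name here is hypothetical): in discovery mode, each user's core gets a core.properties file marking it transient, so it is loaded lazily on first request and is eligible for eviction instead of being kept warm:

```properties
# core.properties for one per-user core
name=user_12345
transient=true
loadOnStartup=false
```

With loadOnStartup=false, startup only has to discover the cores rather than open them, which is what makes tens of thousands of per-user cores on a node plausible at all.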
a core for every user, lots of users... are there issues
We are building a system where there is a core for every user. There will be many tens, or perhaps ultimately hundreds of thousands or millions, of users. We do not need each of those users to have “warm” data in memory. In fact, doing so would consume lots of memory unnecessarily for users that might not have logged in in a long time.

So my question is: is the default behavior of Solr to try to keep all of our cores warm, and if so, can we stop it? Also, given the number of cores that we will likely have, is there anything else we should be keeping in mind to maximize performance and minimize memory usage?