Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
I don't know of anyone who's tried and failed to combine transient cores
and SolrCloud. I also don't know of anyone who's tried and succeeded.

I'm saying that the transient core stuff has been thoroughly tested in
non-cloud mode. And people have been working with it for a couple of
releases now. I know of no a priori reason it wouldn't work in SolrCloud.
But I haven't personally done it, nor do I know of anyone who has. It might
"just work", but the proof is in the pudding.

I've heard some scuttlebutt that the combination of SolrCloud and transient
cores is being, or will be soon, investigated. As in testing and writing
test cases. Being a pessimist by nature on these things, I suspect (but
don't know) that something will come up.

For instance, SolrCloud tries to keep track of all the states of all the
nodes. I _think_ (but don't know for sure) that this is just keeping
contact with the JVM, not particular cores. But what if there's something I
don't know about that pings the individual cores? That would keep them
constantly loading/unloading, which might crop up in unexpected ways. I've
got to emphasize that this is an unknown (at least to me), but an example
of something that could crop up. I'm sure there are other possibilities.

Or distributed updates. For that, every core on every node for a shard in
collectionX must process the update. So for updates, each and every core in
each and every shard might have to be loaded for the update to succeed if
the core is transient. Does this happen fast enough in all cases that a
timeout doesn't cause the update to fail, or the node to be marked as down?
What about combining that with a heavy query load? I just don't know.

It's uncharted territory is all. I'd love it for you to volunteer to be the
first :). There's certainly committer interest in making this case work so
you wouldn't be left hanging all alone. If I were planning a product
though, I'd either treat the combination of transient cores and SolrCloud
as an R&D project or go with non-cloud mode until I had some reassurance
that transient cores and SolrCloud played nicely together.

All that said, I don't want to paint too bleak a picture. All the transient
core stuff is local to a particular node. SolrCloud and ZooKeeper shouldn't
be interested in the details. It _should_ "just work". It's just that I
can't point to any examples where that's been tried.
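
To make the "transient core stuff" concrete: in discovery mode each per-user
core would typically be marked transient in its core.properties, roughly like
this (the core name and values are just an illustration):

    # core.properties for one per-user core (illustrative values)
    name=user_12345
    # the core may be unloaded again once the transient cache fills up
    transient=true
    # don't open the core at startup; wait until the first request
    loadOnStartup=false

All of that bookkeeping happens inside the single node that owns the core,
which is why I'd expect ZooKeeper not to care about it.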

Best,
Erick


On Wed, Dec 4, 2013 at 5:08 PM, hank williams  wrote:

> Oh my... when you say "I don't know anyone who's combined the two." do you
> mean that those that have tried have failed or that no one has gotten
> around to trying? It sounds like you are saying you have some specific
> knowledge that right now these won't work, otherwise you wouldn't say
> "committers
> will be addressing this sometime soon", right?
>
> I'm worried as we need to make a practical decision here and it sounds like
> maybe we should stick with Solr for now... is that what you are saying?
>
>
> On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson wrote:
>
> > Hank:
> >
> > I should add that lots of cores and SolrCloud aren't guaranteed to play
> > nice together. I think some of the committers will be addressing this
> > sometime soon.
> >
> > I'm not saying that this will certainly fail, OTOH I don't know anyone
> > who's combined the two.
> >
> > Erick
> >
> >
> > On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:
> >
> > > Super helpful. Thanks.
> > >
> > >
> > > On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey 
> wrote:
> > >
> > > > On 12/4/2013 12:34 PM, hank williams wrote:
> > > >
> > > >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In
> 4.2
> > we
> > > >> were *trying* to use the rest API function "create" to create cores
> > > >> without
> > > >> having to manually mess with files on the server. Is this what
> > "create"
> > > >> was
> > > >> supposed to do? If so it was broken or we weren't using it right. In
> > any
> > > >> case in 4.6 is that the right way to programmatically add cores in
> > > >> discovery mode?
> > > >>
> > > >
> > > > If you are NOT in SolrCloud mode, in order to create new cores, the
> > > config
> > > > files need to already exist on the disk.  This is the case with all
> > > > versions of Solr.
> > > >
> > > > If you're running in SolrCloud mode, the core is associated with a
> > > > collection.  Collections have a link to a config in ZooKeeper.  The
> > config
> > > > is not stored with the core on the disk.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> > >
> > > --
> > > blog: whydoeseverythingsuck.com
> > >
> >
>
>
>
> --
> blog: whydoeseverythingsuck.com
>


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Oh my... when you say "I don't know anyone who's combined the two." do you
mean that those that have tried have failed or that no one has gotten
around to trying? It sounds like you are saying you have some specific
knowledge that right now these won't work, otherwise you wouldn't say "committers
will be addressing this sometime soon", right?

I'm worried as we need to make a practical decision here and it sounds like
maybe we should stick with Solr for now... is that what you are saying?


On Wed, Dec 4, 2013 at 5:01 PM, Erick Erickson wrote:

> Hank:
>
> I should add that lots of cores and SolrCloud aren't guaranteed to play
> nice together. I think some of the committers will be addressing this
> sometime soon.
>
> I'm not saying that this will certainly fail, OTOH I don't know anyone
> who's combined the two.
>
> Erick
>
>
> On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:
>
> > Super helpful. Thanks.
> >
> >
> > On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:
> >
> > > On 12/4/2013 12:34 PM, hank williams wrote:
> > >
> > >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2
> we
> > >> were *trying* to use the rest API function "create" to create cores
> > >> without
> > >> having to manually mess with files on the server. Is this what
> "create"
> > >> was
> > >> supposed to do? If so it was broken or we weren't using it right. In
> any
> > >> case in 4.6 is that the right way to programmatically add cores in
> > >> discovery mode?
> > >>
> > >
> > > If you are NOT in SolrCloud mode, in order to create new cores, the
> > config
> > > files need to already exist on the disk.  This is the case with all
> > > versions of Solr.
> > >
> > > If you're running in SolrCloud mode, the core is associated with a
> > > collection.  Collections have a link to a config in ZooKeeper.  The
> config
> > > is not stored with the core on the disk.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> >
> > --
> > blog: whydoeseverythingsuck.com
> >
>



-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play
nice together. I think some of the committers will be addressing this
sometime soon.

I'm not saying that this will certainly fail, OTOH I don't know anyone
who's combined the two.

Erick


On Wed, Dec 4, 2013 at 3:18 PM, hank williams  wrote:

> Super helpful. Thanks.
>
>
> On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:
>
> > On 12/4/2013 12:34 PM, hank williams wrote:
> >
> >> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
> >> were *trying* to use the rest API function "create" to create cores
> >> without
> >> having to manually mess with files on the server. Is this what "create"
> >> was
> >> supposed to do? If so it was broken or we weren't using it right. In any
> >> case in 4.6 is that the right way to programmatically add cores in
> >> discovery mode?
> >>
> >
> > If you are NOT in SolrCloud mode, in order to create new cores, the
> config
> > files need to already exist on the disk.  This is the case with all
> > versions of Solr.
> >
> > If you're running in SolrCloud mode, the core is associated with a
> > collection.  Collections have a link to a config in ZooKeeper.  The config
> > is not stored with the core on the disk.
> >
> > Thanks,
> > Shawn
> >
> >
>
>
> --
> blog: whydoeseverythingsuck.com
>


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Super helpful. Thanks.


On Wed, Dec 4, 2013 at 2:53 PM, Shawn Heisey  wrote:

> On 12/4/2013 12:34 PM, hank williams wrote:
>
>> Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
>> were *trying* to use the rest API function "create" to create cores
>> without
>> having to manually mess with files on the server. Is this what "create"
>> was
>> supposed to do? If so it was broken or we weren't using it right. In any
>> case in 4.6 is that the right way to programmatically add cores in
>> discovery mode?
>>
>
> If you are NOT in SolrCloud mode, in order to create new cores, the config
> files need to already exist on the disk.  This is the case with all
> versions of Solr.
>
> If you're running in SolrCloud mode, the core is associated with a
> collection.  Collections have a link to a config in ZooKeeper.  The config
> is not stored with the core on the disk.
>
> Thanks,
> Shawn
>
>


-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Shawn Heisey

On 12/4/2013 12:34 PM, hank williams wrote:

Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the REST API function "create" to create cores without
having to manually mess with files on the server. Is this what "create" was
supposed to do? If so, it was broken or we weren't using it right. In any
case, in 4.6 is that the right way to programmatically add cores in
discovery mode?


If you are NOT in SolrCloud mode, in order to create new cores, the 
config files need to already exist on the disk.  This is the case with 
all versions of Solr.
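
Concretely, the CoreAdmin CREATE call can only point at config that is
already on disk.  A rough sketch (host, core name and paths are just
placeholders):

    # instanceDir/conf/solrconfig.xml and schema.xml must already exist on the node
    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=user_12345&instanceDir=/var/solr/cores/user_12345&config=solrconfig.xml&schema=schema.xml"

If the instanceDir or the config files are missing, the call fails rather
than creating them for you.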


If you're running in SolrCloud mode, the core is associated with a 
collection.  Collections have a link to a config in ZooKeeper.  The
config is not stored with the core on the disk.
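
In that case the usual way to create a per-user index is the Collections
API, pointing at a config set that has already been uploaded to ZooKeeper.
A rough sketch (collection and config names are placeholders):

    # "shared_conf" must already be in ZooKeeper, e.g. uploaded with zkcli.sh upconfig
    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=user_12345&numShards=1&replicationFactor=1&collection.configName=shared_conf"

Solr then creates the underlying cores on the participating nodes for you.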


Thanks,
Shawn



Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Ok one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the REST API function "create" to create cores without
having to manually mess with files on the server. Is this what "create" was
supposed to do? If so, it was broken or we weren't using it right. In any
case, in 4.6 is that the right way to programmatically add cores in
discovery mode?


On Tue, Dec 3, 2013 at 7:37 PM, Erick Erickson wrote:

> bq: Do you have any sense of what a good upper limit might be, or how we
> might figure that out?
>
> As always, "it depends" (tm). And the biggest thing it depends upon is the
> number of simultaneous users you have and the size of their indexes. And
> we've arrived at the black box of estimating size again. Siiihh... I'm
> afraid that the only way is to test and establish some rules of thumb.
>
> The transient core constraint will limit the number of cores loaded at
> once. If you allow too many cores at once, you'll get OOM errors when all
> the users pile on at the same time.
>
> Let's say you've determined that 100 is the limit for transient cores. What
> I suspect you'll see is degrading response times if this is too low. Say
> 110 users are signed on and say they submit queries perfectly in order, one
> after the other. Every request will require the core to be opened and it'll
> take a bit. So that'll be a flag.
>
> Or that's a fine limit but your users have added more and more documents
> and you're coming under memory pressure.
>
> As you can tell I don't have any good answers. I've seen between 10M and
> 300M documents on a single machine
>
> BTW, on a _very_ casual test I found about 1000 cores/second were found in
> discovery mode. While they aren't loaded if they're transient, it's still a
> consideration if you have 10s of thousands.
>
> Best,
> Erick
>
>
>
> On Tue, Dec 3, 2013 at 3:33 PM, hank williams  wrote:
>
> > On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson wrote:
> >
> > > You probably want to look at "transient cores", see:
> > > http://wiki.apache.org/solr/LotsOfCores
> > >
> > > But millions will be "interesting" for a single node, you must have
> some
> > > kind of partitioning in mind?
> > >
> > >
> > Wow. Thanks for that great link. Yes we are sharding so its not like
> there
> > would be millions of cores on one machine or even cluster. And since the
> > cores are one per user, this is a totally clean approach. But still we
> want
> > to make sure that we are not overloading the machine. Do you have any
> sense
> > of what a good upper limit might be, or how we might figure that out?
> >
> >
> >
> > > Best,
> > > Erick
> > >
> > >
> > > On Tue, Dec 3, 2013 at 2:38 PM, hank williams 
> wrote:
> > >
> > > >  We are building a system where there is a core for every user. There
> > > will
> > > > be many tens or perhaps ultimately hundreds of thousands or millions
> of
> > > > users. We do not need each of those users to have “warm” data in
> > memory.
> > > In
> > > > fact doing so would consume lots of memory unnecessarily, for users
> > that
> > > > might not have logged in in a long time.
> > > >
> > > > So my question is, is the default behavior of Solr to try to keep all
> > of
> > > > our cores warm, and if so, can we stop it? Also given the number of
> > cores
> > > > that we will likely have is there anything else we should be keeping
> in
> > > > mind to maximize performance and minimize memory usage?
> > > >
> > >
> >
> >
> >
> > --
> > blog: whydoeseverythingsuck.com
> >
>



-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
bq: Do you have any sense of what a good upper limit might be, or how we
might figure that out?

As always, "it depends" (tm). And the biggest thing it depends upon is the
number of simultaneous users you have and the size of their indexes. And
we've arrived at the black box of estimating size again. Siiigh... I'm
afraid that the only way is to test and establish some rules of thumb.

The transient core constraint will limit the number of cores loaded at
once. If you allow too many cores at once, you'll get OOM errors when all
the users pile on at the same time.
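
The knob for that limit is the transient core cache size in solr.xml; with
the 4.4+ discovery-mode format it looks roughly like this (100 is just an
example number you'd have to tune):

    <solr>
      <!-- keep at most 100 transient cores loaded at any one time -->
      <int name="transientCacheSize">100</int>
    </solr>

Every core marked transient=true competes for those slots; the least
recently used core gets closed when a new one has to be opened.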

Let's say you've determined that 100 is the limit for transient cores. What
I suspect you'll see is degrading response times if this is too low. Say
110 users are signed on and say they submit queries perfectly in order, one
after the other. Every request will require the core to be opened and it'll
take a bit. So that'll be a flag.

Or that's a fine limit but your users have added more and more documents
and you're coming under memory pressure.

As you can tell I don't have any good answers. I've seen between 10M and
300M documents on a single machine.

BTW, in a _very_ casual test, core discovery ran at about 1,000 cores/second.
While the cores aren't loaded if they're transient, discovery time is still a
consideration if you have tens of thousands of them.

Best,
Erick



On Tue, Dec 3, 2013 at 3:33 PM, hank williams  wrote:

> On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson wrote:
>
> > You probably want to look at "transient cores", see:
> > http://wiki.apache.org/solr/LotsOfCores
> >
> > But millions will be "interesting" for a single node, you must have some
> > kind of partitioning in mind?
> >
> >
> Wow. Thanks for that great link. Yes we are sharding so its not like there
> would be millions of cores on one machine or even cluster. And since the
> cores are one per user, this is a totally clean approach. But still we want
> to make sure that we are not overloading the machine. Do you have any sense
> of what a good upper limit might be, or how we might figure that out?
>
>
>
> > Best,
> > Erick
> >
> >
> > On Tue, Dec 3, 2013 at 2:38 PM, hank williams  wrote:
> >
> > >  We are building a system where there is a core for every user. There
> > will
> > > be many tens or perhaps ultimately hundreds of thousands or millions of
> > > users. We do not need each of those users to have “warm” data in
> memory.
> > In
> > > fact doing so would consume lots of memory unnecessarily, for users
> that
> > > might not have logged in in a long time.
> > >
> > > So my question is, is the default behavior of Solr to try to keep all
> of
> > > our cores warm, and if so, can we stop it? Also given the number of
> cores
> > > that we will likely have is there anything else we should be keeping in
> > > mind to maximize performance and minimize memory usage?
> > >
> >
>
>
>
> --
> blog: whydoeseverythingsuck.com
>


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Sorry, I see that we are up to Solr 4.6. I missed that.


On Tue, Dec 3, 2013 at 3:53 PM, hank williams  wrote:

> Also, I see that the "lotsofcores" stuff is for solr 4.4 and above. What
> is the state of the 4.4 codebase? Could we start using it now? Is it safe?
>
>
> On Tue, Dec 3, 2013 at 3:33 PM, hank williams  wrote:
>
>>
>>
>>
>> On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson 
>> wrote:
>>
>>> You probably want to look at "transient cores", see:
>>> http://wiki.apache.org/solr/LotsOfCores
>>>
>>> But millions will be "interesting" for a single node, you must have some
>>> kind of partitioning in mind?
>>>
>>>
>> Wow. Thanks for that great link. Yes we are sharding so its not like
>> there would be millions of cores on one machine or even cluster. And since
>> the cores are one per user, this is a totally clean approach. But still we
>> want to make sure that we are not overloading the machine. Do you have any
>> sense of what a good upper limit might be, or how we might figure that out?
>>
>>
>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Tue, Dec 3, 2013 at 2:38 PM, hank williams  wrote:
>>>
>>> >  We are building a system where there is a core for every user. There
>>> will
>>> > be many tens or perhaps ultimately hundreds of thousands or millions of
>>> > users. We do not need each of those users to have “warm” data in
>>> memory. In
>>> > fact doing so would consume lots of memory unnecessarily, for users
>>> that
>>> > might not have logged in in a long time.
>>> >
>>> > So my question is, is the default behavior of Solr to try to keep all
>>> of
>>> > our cores warm, and if so, can we stop it? Also given the number of
>>> cores
>>> > that we will likely have is there anything else we should be keeping in
>>> > mind to maximize performance and minimize memory usage?
>>> >
>>>
>>
>>
>>
>> --
>> blog: whydoeseverythingsuck.com
>>
>
>
>
> --
> blog: whydoeseverythingsuck.com
>



-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Also, I see that the "lotsofcores" stuff is for Solr 4.4 and above. What is
the state of the 4.4 codebase? Could we start using it now? Is it safe?


On Tue, Dec 3, 2013 at 3:33 PM, hank williams  wrote:

>
>
>
> On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson wrote:
>
>> You probably want to look at "transient cores", see:
>> http://wiki.apache.org/solr/LotsOfCores
>>
>> But millions will be "interesting" for a single node, you must have some
>> kind of partitioning in mind?
>>
>>
> Wow. Thanks for that great link. Yes we are sharding so its not like there
> would be millions of cores on one machine or even cluster. And since the
> cores are one per user, this is a totally clean approach. But still we want
> to make sure that we are not overloading the machine. Do you have any sense
> of what a good upper limit might be, or how we might figure that out?
>
>
>
>> Best,
>> Erick
>>
>>
>> On Tue, Dec 3, 2013 at 2:38 PM, hank williams  wrote:
>>
>> >  We are building a system where there is a core for every user. There
>> will
>> > be many tens or perhaps ultimately hundreds of thousands or millions of
>> > users. We do not need each of those users to have “warm” data in
>> memory. In
>> > fact doing so would consume lots of memory unnecessarily, for users that
>> > might not have logged in in a long time.
>> >
>> > So my question is, is the default behavior of Solr to try to keep all of
>> > our cores warm, and if so, can we stop it? Also given the number of
>> cores
>> > that we will likely have is there anything else we should be keeping in
>> > mind to maximize performance and minimize memory usage?
>> >
>>
>
>
>
> --
> blog: whydoeseverythingsuck.com
>



-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
On Tue, Dec 3, 2013 at 3:20 PM, Erick Erickson wrote:

> You probably want to look at "transient cores", see:
> http://wiki.apache.org/solr/LotsOfCores
>
> But millions will be "interesting" for a single node, you must have some
> kind of partitioning in mind?
>
>
Wow. Thanks for that great link. Yes we are sharding so it's not like there
would be millions of cores on one machine or even cluster. And since the
cores are one per user, this is a totally clean approach. But still we want
to make sure that we are not overloading the machine. Do you have any sense
of what a good upper limit might be, or how we might figure that out?



> Best,
> Erick
>
>
> On Tue, Dec 3, 2013 at 2:38 PM, hank williams  wrote:
>
> >  We are building a system where there is a core for every user. There
> will
> > be many tens or perhaps ultimately hundreds of thousands or millions of
> > users. We do not need each of those users to have “warm” data in memory.
> In
> > fact doing so would consume lots of memory unnecessarily, for users that
> > might not have logged in in a long time.
> >
> > So my question is, is the default behavior of Solr to try to keep all of
> > our cores warm, and if so, can we stop it? Also given the number of cores
> > that we will likely have is there anything else we should be keeping in
> > mind to maximize performance and minimize memory usage?
> >
>



-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
You probably want to look at "transient cores", see:
http://wiki.apache.org/solr/LotsOfCores

But millions will be "interesting" for a single node, you must have some
kind of partitioning in mind?

Best,
Erick


On Tue, Dec 3, 2013 at 2:38 PM, hank williams  wrote:

>  We are building a system where there is a core for every user. There will
> be many tens or perhaps ultimately hundreds of thousands or millions of
> users. We do not need each of those users to have “warm” data in memory. In
> fact doing so would consume lots of memory unnecessarily, for users that
> might not have logged in in a long time.
>
> So my question is, is the default behavior of Solr to try to keep all of
> our cores warm, and if so, can we stop it? Also given the number of cores
> that we will likely have is there anything else we should be keeping in
> mind to maximize performance and minimize memory usage?
>


a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
 We are building a system where there is a core for every user. There will
be many tens or perhaps ultimately hundreds of thousands or millions of
users. We do not need each of those users to have “warm” data in memory. In
fact doing so would consume lots of memory unnecessarily, for users that
might not have logged in in a long time.

So my question is, is the default behavior of Solr to try to keep all of
our cores warm, and if so, can we stop it? Also given the number of cores
that we will likely have is there anything else we should be keeping in
mind to maximize performance and minimize memory usage?