Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Ok, one more simple question. We just upgraded to 4.6 from 4.2. In 4.2 we
were *trying* to use the REST API's create command to add cores without
having to manually mess with files on the server. Is this what create was
supposed to do? If so, it was broken or we weren't using it right. In any
case, in 4.6 is that the right way to programmatically add cores in
discovery mode?


-- 
blog: whydoeseverythingsuck.com


Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Shawn Heisey



If you are NOT in SolrCloud mode, the config files need to already exist
on disk in order to create new cores. This is the case with all versions
of Solr.


If you're running in SolrCloud mode, the core is associated with a
collection. Collections have a link to a config in ZooKeeper. The config
is not stored with the core on disk.
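To make that concrete: in discovery mode (non-cloud), a CoreAdmin CREATE call can register a new core as long as the instance directory and its conf/ files already exist on the server's disk. A minimal sketch of building such a request, assuming a Solr 4.x server on localhost:8983; the core name and path here are placeholders:

```python
from urllib.parse import urlencode

def create_core_url(name, instance_dir,
                    base="http://localhost:8983/solr/admin/cores"):
    """Build a CoreAdmin CREATE request URL.

    The directory `instance_dir` (with conf/solrconfig.xml and
    conf/schema.xml inside it) must already exist on the server's disk,
    or Solr will reject the CREATE.
    """
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return base + "?" + urlencode(params)

url = create_core_url("user123", "cores/user123")
print(url)
# Issue it with urllib.request.urlopen(url) against a running Solr instance.
```

In SolrCloud mode you would go through the Collections API instead, since the config lives in ZooKeeper rather than next to the core.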


Thanks,
Shawn



Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Super helpful. Thanks.




Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
Hank:

I should add that lots of cores and SolrCloud aren't guaranteed to play
nice together. I think some of the committers will be addressing this
sometime soon.

I'm not saying that this will certainly fail; OTOH, I don't know anyone
who's combined the two.

Erick





Re: a core for every user, lots of users... are there issues

2013-12-04 Thread hank williams
Oh my... when you say "I don't know anyone who's combined the two," do you
mean that those who have tried have failed, or that no one has gotten
around to trying? It sounds like you have some specific knowledge that
right now these won't work; otherwise you wouldn't say "committers will be
addressing this sometime soon," right?

I'm worried, as we need to make a practical decision here, and it sounds
like maybe we should stick with non-cloud Solr for now... is that what you
are saying?




Re: a core for every user, lots of users... are there issues

2013-12-04 Thread Erick Erickson
I don't know of anyone who's tried and failed to combine transient cores
and SolrCloud. I also don't know of anyone who's tried and succeeded.

I'm saying that the transient core stuff has been thoroughly tested in
non-cloud mode, and people have been working with it for a couple of
releases now. I know of no a priori reason it wouldn't work in SolrCloud.
But I haven't personally done it, nor do I know of anyone who has. It might
just work, but the proof is in the pudding.

I've heard some scuttlebutt that the combination of SolrCloud and transient
cores is being, or will be soon, investigated. As in testing and writing
test cases. Being a pessimist by nature on these things, I suspect (but
don't know) that something will come up.

For instance, SolrCloud tries to keep track of all the states of all the
nodes. I _think_ (but don't know for sure) that this is just keeping
contact with the JVM, not particular cores. But what if there's something I
don't know about that pings the individual cores? That would keep them
constantly loading/unloading, which might crop up in unexpected ways. I've
got to emphasize that this is an unknown (at least to me), but an example
of something that could crop up. I'm sure there are other possibilities.

Or distributed updates. For that, every core on every node for a shard in
collectionX must process the update. So for updates, each and every core in
each and every shard might have to be loaded for the update to succeed if
the core is transient. Does this happen fast enough in all cases so a
timeout doesn't cause the update to fail? Or the node to be marked as down?
What about combining that with a heavy query load? I just don't know.

It's uncharted territory is all. I'd love it for you to volunteer to be the
first :). There's certainly committer interest in making this case work, so
you wouldn't be left hanging all alone. If I were planning a product,
though, I'd either treat the combination of transient cores and SolrCloud
as an R&D project or go with non-cloud mode until I had some reassurance
that transient cores and SolrCloud played nicely together.

All that said, I don't want to paint too bleak a picture. All the transient
core stuff is local to a particular node. SolrCloud and ZooKeeper shouldn't
be interested in the details. It _should_ just work. It's just that I
can't point to any examples where that's been tried.

Best,
Erick





a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
We are building a system where there is a core for every user. There will
be many tens of thousands, and perhaps ultimately hundreds of thousands or
millions, of users. We do not need each of those users to have “warm” data
in memory. In fact, doing so would consume lots of memory unnecessarily for
users who might not have logged in in a long time.

So my question is: is the default behavior of Solr to try to keep all of
our cores warm, and if so, can we stop it? Also, given the number of cores
that we will likely have, is there anything else we should be keeping in
mind to maximize performance and minimize memory usage?


Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
You probably want to look at transient cores; see:
http://wiki.apache.org/solr/LotsOfCores

But millions will be interesting for a single node; you must have some
kind of partitioning in mind?

Best,
Erick
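For reference, the LotsOfCores setup that wiki page describes comes down to two small pieces of configuration in discovery mode (4.4+). A sketch, with an illustrative (not recommended) cache size:

```xml
<!-- solr.xml (discovery format): cap how many transient cores are kept
     loaded at once; least-recently-used cores beyond this are unloaded -->
<solr>
  <int name="transientCacheSize">100</int>
</solr>
```

and, in each user core's `core.properties`, `transient=true` plus `loadOnStartup=false`, so the core is discovered at startup but only opened on first request and is eligible for eviction afterward.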





Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams


Wow, thanks for that great link. Yes, we are sharding, so it's not like
there would be millions of cores on one machine or even one cluster. And
since the cores are one per user, this is a totally clean approach. But we
still want to make sure that we are not overloading the machine. Do you
have any sense of what a good upper limit might be, or how we might figure
that out?





Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Also, I see that the LotsOfCores stuff is for Solr 4.4 and above. What is
the state of the 4.4 codebase? Could we start using it now? Is it safe?




Re: a core for every user, lots of users... are there issues

2013-12-03 Thread hank williams
Sorry, I see that we are up to Solr 4.6. I missed that.




Re: a core for every user, lots of users... are there issues

2013-12-03 Thread Erick Erickson
bq: Do you have any sense of what a good upper limit might be, or how we
might figure that out?

As always, it depends (tm). And the biggest thing it depends on is the
number of simultaneous users you have and the size of their indexes. And
we've arrived at the black box of estimating size again. Sigh... I'm
afraid the only way is to test and establish some rules of thumb.

The transient core constraint will limit the number of cores loaded at
once. If you allow too many cores at once, you'll get OOM errors when all
the users pile on at the same time.

Let's say you've determined that 100 is the limit for transient cores. What
I suspect you'll see if this is too low is degrading response times. Say
110 users are signed on and they submit queries in strict order, one after
the other: every request will require a core to be opened, and that takes
a bit. So that'll be a flag.
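That thrashing effect is easy to see with a toy LRU-cache simulation (the function and numbers below are illustrative, not Solr code; Solr's transient-core cache is LRU-like but this is only a model):

```python
from collections import OrderedDict

def core_loads(limit, requests):
    """Count how many requests force a core open under an LRU cache
    holding at most `limit` cores at once."""
    cache, loads = OrderedDict(), 0
    for core in requests:
        if core in cache:
            cache.move_to_end(core)  # cache hit: mark most recently used
        else:
            loads += 1               # cache miss: core must be opened
            cache[core] = True
            if len(cache) > limit:
                cache.popitem(last=False)  # evict least-recently-used core
    return loads

users = [f"user{i}" for i in range(110)]
# 110 users querying strictly round-robin against a 100-core cache:
print(core_loads(100, users * 3))  # 330: every single request misses
print(core_loads(110, users * 3))  # 110: only the first pass loads cores
```

With a cyclic access pattern just slightly larger than the cache, LRU degrades to a 0% hit rate, which is exactly the "every request requires the core to be opened" flag described above.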

Or that's a fine limit but your users have added more and more documents
and you're coming under memory pressure.

As you can tell, I don't have any good answers. I've seen between 10M and
300M documents on a single machine.

BTW, in a _very_ casual test, core discovery found about 1,000
cores/second. While transient cores aren't loaded at discovery time, it's
still a consideration if you have tens of thousands of them.
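A back-of-the-envelope check on what that rate means for startup time (the ~1,000/sec figure is the casual measurement above, not a benchmark):

```python
# Rough startup cost of walking N core directories at ~1,000 cores/second.
def discovery_seconds(num_cores, cores_per_second=1000):
    return num_cores / cores_per_second

print(discovery_seconds(10_000))   # 10.0 seconds of startup scanning
print(discovery_seconds(100_000))  # 100.0 seconds just to discover cores
```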

Best,
Erick


