Re: Large Number of Collections takes down Solr 7.3

2019-01-29 Thread Hendrik Haddorp
How much memory do the Solr instances have? Any more details on what 
happens when the Solr instances start to fail?

We are using multiple Solr clouds to keep the collection count low(er).
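A rough SolrJ sketch of that kind of routing, assuming a hypothetical
customer-to-cluster mapping and made-up URLs:

    // Sketch only: route each customer to one of several Solr clouds.
    // The mapping and base URLs below are hypothetical.
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class CloudRouter {
        private static final Map<String, String> CLOUD_BY_CUSTOMER = new HashMap<>();
        static {
            CLOUD_BY_CUSTOMER.put("customerA", "http://solr-cloud-1:8983/solr");
            CLOUD_BY_CUSTOMER.put("customerB", "http://solr-cloud-2:8983/solr");
        }

        // Returns a client pointed at the cloud that owns this customer's data.
        public static SolrClient clientFor(String customerId) {
            String baseUrl = CLOUD_BY_CUSTOMER.get(customerId);
            if (baseUrl == null) {
                throw new IllegalArgumentException("unknown customer: " + customerId);
            }
            return new HttpSolrClient.Builder(baseUrl).build();
        }
    }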

On 29.01.2019 06:53, Gus Heck wrote:
> Does it all have to be in a single cloud? [...]





Re: Large Number of Collections takes down Solr 7.3

2019-01-28 Thread Gus Heck
Does it all have to be in a single cloud?

On Mon, Jan 28, 2019, 10:34 PM Shawn Heisey wrote:
> At this time, SolrCloud will not handle what you're trying to throw at
> it. [...]


Re: Large Number of Collections takes down Solr 7.3

2019-01-28 Thread Shawn Heisey

On 1/28/2019 8:12 PM, Monica Skidmore wrote:

> I would have to negotiate with the middle-ware teams - but, we've used a core
> per customer in master-slave mode for about 3 years now, with great
> success. [...]


At this time, SolrCloud will not handle what you're trying to throw at 
it.  Without Cloud, Solr can fairly easily handle thousands of indexes, 
because there is no communication between nodes about cluster state. 
The immensity of that communication (handled via ZooKeeper) is why 
SolrCloud can't scale to thousands of shard replicas.


The solution to this problem will be twofold:  1) Reduce the number of 
work items in the Overseer queue.  2) Make the Overseer do its job a lot 
faster.  There have been small incremental improvements towards these 
goals, but as you've noticed, we're definitely not there yet.
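You can watch the Overseer struggle with the OVERSEERSTATUS action of the
collections API.  A SolrJ sketch (I'm going from memory on the exact factory
method, so treat the call as an assumption):

    // Sketch: print the Overseer leader and per-operation stats.
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class OverseerCheck {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                // OVERSEERSTATUS reports queue and operation statistics.
                System.out.println(
                    CollectionAdminRequest.getOverseerStatus().process(client).getResponse());
            }
        }
    }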


On the subject of a customer forgetting to add a filter ... your systems 
should be handling that for them ... if the customer has direct access 
to Solr, then all bets are off... they'll be able to do just about 
anything they want.  It is possible to configure a proxy to limit what 
somebody can get to, but it would be pretty complicated to come up with 
a proxy configuration that fully locks things down.


Using shards is completely possible without SolrCloud.  But SolrCloud 
certainly does make it a lot easier.
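Without SolrCloud you list the shards on each request yourself; a sketch
(hosts and core names hypothetical):

    // Sketch: manual distributed search via the "shards" parameter.
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ManualShards {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://solr1:8983/solr").build()) {
                SolrQuery query = new SolrQuery("*:*");
                // Each entry is host:port/solr/corename; Solr fans the query
                // out to every listed core and merges the results.
                query.set("shards",
                    "solr1:8983/solr/customer_a1,solr2:8983/solr/customer_a2");
                QueryResponse rsp = client.query("customer_a1", query);
                System.out.println(rsp.getResults().getNumFound());
            }
        }
    }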


How many records in your largest customer indexes?  How big are those 
indexes on disk?


Thanks,
Shawn


Re: Large Number of Collections takes down Solr 7.3

2019-01-28 Thread Monica Skidmore
I would have to negotiate with the middle-ware teams - but, we've used a core 
per customer in master-slave mode for about 3 years now, with great success.  
Our pool of data is very large, so limiting a customer's searches to just their 
core keeps query times fast (or at least reduces the chances of one customer 
impacting another with expensive queries).  There is also a little security 
added - since the customer is required to provide the core to search, there is 
less chance that they'll see another customer's data in their responses (like 
they might if they 'forgot' to add a filter to their query).  We were hoping 
that moving to Cloud would help our management of the largest customers - some 
of which we'd like to sub-shard with the cloud tooling.  We expected cloud to 
support as many cores/collections as our 2-versions-old Solr instances - but we 
didn't count on all the increased network traffic or the extra complications of 
bringing up a large cloud cluster.

Monica Skidmore

On 1/22/19, 10:06 PM, "Dave" wrote:

> Do you mind if I ask why so many collections, rather than a field in one
> collection with a filter query applied per customer to restrict the result
> set, assuming you’re the one controlling the middleware?

> On Jan 22, 2019, at 4:43 PM, Monica Skidmore wrote:
> 
> > We have been running Solr 5.4 in master-slave mode with ~4500 cores for a
> > couple of years very successfully. [...]




Re: Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Dave
Do you mind if I ask why so many collections, rather than a field in one 
collection with a filter query applied per customer to restrict the result 
set, assuming you’re the one controlling the middleware?
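I.e. the middleware attaches the filter server-side so the customer can never
forget it.  A sketch (field and collection names hypothetical):

    // Sketch: middleware enforces a per-customer filter query.
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.util.ClientUtils;

    public class CustomerSearch {
        // "customer_id" is a hypothetical field in the shared collection.
        public static QueryResponse search(SolrClient client, String customerId,
                                           String userQuery) throws Exception {
            SolrQuery query = new SolrQuery(userQuery);
            // Added here, not by the caller, so it cannot be omitted.
            query.addFilterQuery(
                "customer_id:" + ClientUtils.escapeQueryChars(customerId));
            return client.query("shared_collection", query);
        }
    }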

> On Jan 22, 2019, at 4:43 PM, Monica Skidmore wrote:
> 
> We have been running Solr 5.4 in master-slave mode with ~4500 cores for a
> couple of years very successfully. [...]


Re: Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Shawn Heisey

On 1/22/2019 2:43 PM, Monica Skidmore wrote:

> Is anyone successfully running Solr 7.x in cloud mode with thousands of
> collections or more?  Are there some configurations we should be taking a
> closer look at to make this feasible?  Should we try a different replica
> type?  (We do want NRT-like query latency, but we also index heavily – this
> cluster will have tens of millions of documents.)


That many collections will overwhelm SolrCloud.  This issue is marked as 
fixed, but it's actually not fixed:


https://issues.apache.org/jira/browse/SOLR-7191

SolrCloud simply will not scale to that many collections. I wish I had 
better news for you.  I would like to be able to solve the problem, but 
I am not familiar with that particular code.  Getting familiar with the 
code is a major undertaking.


Thanks,
Shawn



Large Number of Collections takes down Solr 7.3

2019-01-22 Thread Monica Skidmore
We have been running Solr 5.4 in master-slave mode with ~4500 cores for a 
couple of years very successfully.  The cores represent individual customer 
data, so they can vary greatly in size, and some of them have gotten too large 
to be manageable.

We are trying to upgrade to Solr 7.3 in cloud mode, with ~4500 collections, 2 
NRT replicas total per collection.  We have experimented with additional servers 
and ZK nodes as a part of this move.  We can create up to ~4000 collections, 
with a slow-down to ~20s per collection to create, but if we go much beyond 
that, the time to create collections shoots up, some collections fail to be 
created, and we see some of the nodes crash.  Autoscaling brings nodes back 
into the cluster, but they don’t have all the replicas created on them that 
they should – we’re pretty sure this is related to the challenge of adding the 
large number of collections on those nodes as they come up.
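For concreteness, a SolrJ sketch of the kind of creation loop involved, with
per-collection timing (collection and config names hypothetical):

    // Sketch: create many collections, logging how long each takes.
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class BulkCreate {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                for (int i = 0; i < 4500; i++) {
                    String name = "customer_" + i;  // hypothetical naming scheme
                    long start = System.nanoTime();
                    CollectionAdminRequest
                        .createCollection(name, "customer_conf", 1, 2) // 1 shard, 2 NRT replicas
                        .process(client);
                    long ms = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(name + " created in " + ms + " ms");
                }
            }
        }
    }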

There are some approaches we could take that don’t separate our customers into 
collections, but we get some benefits from this approach that we’d like to 
keep.  We’d also like to add the benefits of cloud, like balancing where 
collections are placed and the ability to split large collections.
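Splitting would use the SPLITSHARD action of the collections API; a sketch
(collection and shard names hypothetical):

    // Sketch: split one shard of an oversized customer collection in two.
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SplitBigCustomer {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                // SPLITSHARD divides shard1 of "customer_big" into two sub-shards.
                CollectionAdminRequest.splitShard("customer_big")
                    .setShardName("shard1")
                    .process(client);
            }
        }
    }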

Is anyone successfully running Solr 7.x in cloud mode with thousands of 
collections or more?  Are there some configurations we should be taking a closer 
look at to make this feasible?  Should we try a different replica type?  (We do 
want NRT-like query latency, but we also index heavily – this cluster will have 
tens of millions of documents.)

I should note that the problems are not due to the number of documents – the 
problems occur on a new cluster while we’re creating the collections we know 
we’ll need.

Monica Skidmore