Re: How to Prevent Recovery?

2020-09-08 Thread Anshuman Singh
Hi,

I noticed that when I created the TLOG replicas using the ADDREPLICA API, I
called the API in parallel for all the shards, because of which all the
replicas ended up on a single node, i.e. the replicas were not distributed
evenly across the nodes.

After fixing that, I am getting better indexing performance than with NRT
replicas, and I am no longer facing any recovery issues.
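
For anyone hitting the same issue, this is roughly the shape of the calls that
worked for me: one ADDREPLICA call per shard, issued sequentially, with the
target node given explicitly. The collection, shard and node names below are
placeholders, not our real ones.

# Hypothetical example: add one TLOG replica per shard, choosing the node yourself
curl "http://solr-node1:8983/solr/admin/collections?action=ADDREPLICA\
&collection=mycollection&shard=shard1&type=TLOG&node=solr-node3:8983_solr"
curl "http://solr-node1:8983/solr/admin/collections?action=ADDREPLICA\
&collection=mycollection&shard=shard2&type=TLOG&node=solr-node4:8983_solr"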

Thanks,
Anshuman

On Mon, Aug 31, 2020 at 1:02 PM Dominique Bejean wrote:

> Hi,
>
> Even if it is not the root cause, I suggest trying to respect some basic
> best practices and so not having "2 Zk running on the
> same nodes where Solr is running". Maybe you can achieve this by just
> stopping these 2 Zk (and moving them later). Did you increase
> ZK_CLIENT_TIMEOUT to 3?
>
> Did you check your GC logs? Any consecutive full GCs? How big is your Solr
> heap size? Not too big?
>
> The last time I saw such long commits, it was due to slow segment merges
> related to docValues and dynamic fields. Are you intensively using dynamic
> fields with docValues?
>
> Can you enable Lucene detailed debug information
> (<infoStream>true</infoStream>)?
>
> https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings
>
> With this Lucene debug information enabled, are there any lines like this
> in your logs?
>
> 2020-05-03 16:22:38.139 INFO  (qtp1837543557-787) [   x:###]
> o.a.s.u.LoggingInfoStream [MS][qtp1837543557-787]: too many merges;
> stalling...
> 2020-05-03 16:24:58.318 INFO  (commitScheduler-19-thread-1) [   x:###]
> o.a.s.u.DirectUpdateHandler2 start
>
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 2020-05-03 16:24:59.005 INFO  (commitScheduler-19-thread-1) [   x:###]
> o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many
> merges; stalling...
> 2020-05-03 16:31:31.402 INFO  (Lucene Merge Thread #55) [   x:###]
> o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to
> merge doc values [464265 docs]
>
>
> Regards
>
> Dominique
>
>
>
>
>
> On Sun, Aug 30, 2020 at 8:44 PM, Anshuman Singh wrote:
>
> > Hi,
> >
> > I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG
> replicas
> > using the ADDREPLICA API and then deleting the NRT replicas.
> > But now, these replicas are going into recovery even more frequently
> during
> > indexing. Same errors are observed.
> > Also, commit is taking a lot of time compared to NRT replicas.
> > Can this be due to the fact that most of the indexes are on disk and not
> in
> > RAM, and therefore copying index from leader is causing high disk
> > utilisation and causing poor performance?
> > Do I need to tweak the auto commit settings? Right now it is 30 seconds
> max
> > time and 100k max docs.
> >
> > Regards,
> > Anshuman
> >
> > On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson wrote:
> >
> > > Commits should absolutely not be taking that much time, that’s where
> I’d
> > > focus first.
> > >
> > > Some sneaky places things go wonky:
> > > 1> you have a suggester configured that builds whenever there’s a
> commit.
> > > 2> you send commits from the client
> > > 3> you’re optimizing on commit
> > > 4> you have too much data for your hardware
> > >
> > > My guess though is that the root cause of your recovery is that the
> > > followers
> > > get backed up. If there are enough merge threads running, the
> > > next update can block until at least one is done. Then the scenario
> > > goes something like this:
> > >
> > > leader sends doc to follower
> > > follower does not index the document in time
> > > leader puts follower into “leader initiated recovery”.
> > >
> > > So one thing to look for if that scenario is correct is whether there
> are
> > > messages
> > > in your logs with "leader-initiated recovery”. I’d personally grep my
> logs
> > > for
> > >
> > > grep initiated logfile | grep recovery | grep leader
> > >
> > > ‘cause I never remember whether that’s the exact form. If it is this,
> you
> > > can
> > > lengthen the timeouts, look particularly for:
> > > • distribUpdateConnTimeout
> > > • distribUpdateSoTimeout
> > >
> > > All that said, your symptoms are consistent with a lot of merging going
> > > on. With NRT
> > > nodes, all replicas do all indexing and thus merging. Have you
> considered
> > > using TLOG/PULL replicas? In your case they could even all be TLOG
> > > replicas. In that
> > > case, only the leader does the indexing, the other TLOG replicas of a
> > > shard just stuff
> > > the documents into their local tlogs without indexing at all.
> > >
> > > Speaking of which, you could reduce some of the disk pressure if you
> can
> > > put your
> > > tlogs on another drive, don’t know if that’s possible. Ditto the Solr
> > logs.
> > >
> > > Beyond that, it may be a matter of increasing the hardware. You’re
> really
> > > indexing
> > > 120K records a second ((1 leader + 2 followers) * 40K)/sec.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Aug 25, 2020, at 12:0

Re: How to Prevent Recovery?

2020-08-31 Thread Dominique Bejean
Hi,

Even if it is not the root cause, I suggest trying to respect some basic
best practices and so not having "2 Zk running on the
same nodes where Solr is running". Maybe you can achieve this by just
stopping these 2 Zk (and moving them later). Did you increase
ZK_CLIENT_TIMEOUT to 3?
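
For reference, a minimal sketch of the relevant solr.in.sh settings; the
30000 ms value and the host list below are illustrative assumptions, not your
actual configuration:

# solr.in.sh (example values only)
ZK_HOST="zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181/solr"
ZK_CLIENT_TIMEOUT="30000"   # zkClientTimeout in milliseconds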

Did you check your GC logs? Any consecutive full GCs? How big is your Solr
heap size? Not too big?
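
As a quick check, something like this against the GC logs usually shows whether
full GCs are happening; the log path and the "Pause Full" pattern assume Solr
8's default G1 and unified JVM logging, so adjust for your setup:

# Count full GC pauses in the Solr GC logs (path and pattern are assumptions)
grep -c "Pause Full" /var/solr/logs/solr_gc.log*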

The last time I saw such long commits, it was due to slow segment merges
related to docValues and dynamic fields. Are you intensively using dynamic
fields with docValues?

Can you enable Lucene detailed debug information
(<infoStream>true</infoStream>)?
https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings
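
Enabling it is roughly this in solrconfig.xml (a sketch of the indexConfig
section only, not a full config):

<!-- solrconfig.xml: write Lucene's InfoStream output to the Solr log -->
<indexConfig>
  <infoStream>true</infoStream>
</indexConfig>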

With this Lucene debug information enabled, are there any lines like this in
your logs?

2020-05-03 16:22:38.139 INFO  (qtp1837543557-787) [   x:###]
o.a.s.u.LoggingInfoStream [MS][qtp1837543557-787]: too many merges;
stalling...
2020-05-03 16:24:58.318 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2020-05-03 16:24:59.005 INFO  (commitScheduler-19-thread-1) [   x:###]
o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many
merges; stalling...
2020-05-03 16:31:31.402 INFO  (Lucene Merge Thread #55) [   x:###]
o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to
merge doc values [464265 docs]


Regards

Dominique





On Sun, Aug 30, 2020 at 8:44 PM, Anshuman Singh wrote:

> Hi,
>
> I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas
> using the ADDREPLICA API and then deleting the NRT replicas.
> But now, these replicas are going into recovery even more frequently during
> indexing. Same errors are observed.
> Also, commit is taking a lot of time compared to NRT replicas.
> Can this be due to the fact that most of the indexes are on disk and not in
> RAM, and therefore copying index from leader is causing high disk
> utilisation and causing poor performance?
> Do I need to tweak the auto commit settings? Right now it is 30 seconds max
> time and 100k max docs.
>
> Regards,
> Anshuman
>
> On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson 
> wrote:
>
> > Commits should absolutely not be taking that much time, that’s where I’d
> > focus first.
> >
> > Some sneaky places things go wonky:
> > 1> you have a suggester configured that builds whenever there’s a commit.
> > 2> you send commits from the client
> > 3> you’re optimizing on commit
> > 4> you have too much data for your hardware
> >
> > My guess though is that the root cause of your recovery is that the
> > followers
> > get backed up. If there are enough merge threads running, the
> > next update can block until at least one is done. Then the scenario
> > goes something like this:
> >
> > leader sends doc to follower
> > follower does not index the document in time
> > leader puts follower into “leader initiated recovery”.
> >
> > So one thing to look for if that scenario is correct is whether there are
> > messages
> > in your logs with "leader-initiated recovery”. I’d personally grep my logs
> > for
> >
> > grep initiated logfile | grep recovery | grep leader
> >
> > ‘cause I never remember whether that’s the exact form. If it is this, you
> > can
> > lengthen the timeouts, look particularly for:
> > • distribUpdateConnTimeout
> > • distribUpdateSoTimeout
> >
> > All that said, your symptoms are consistent with a lot of merging going
> > on. With NRT
> > nodes, all replicas do all indexing and thus merging. Have you considered
> > using TLOG/PULL replicas? In your case they could even all be TLOG
> > replicas. In that
> > case, only the leader does the indexing, the other TLOG replicas of a
> > shard just stuff
> > the documents into their local tlogs without indexing at all.
> >
> > Speaking of which, you could reduce some of the disk pressure if you can
> > put your
> > tlogs on another drive, don’t know if that’s possible. Ditto the Solr
> logs.
> >
> > Beyond that, it may be a matter of increasing the hardware. You’re really
> > indexing
> > 120K records a second ((1 leader + 2 followers) * 40K)/sec.
> >
> > Best,
> > Erick
> >
> > > On Aug 25, 2020, at 12:02 PM, Anshuman Singh <singhanshuma...@gmail.com>
> > > wrote:
> > >
> > > Hi,
> > >
> > > We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster
> > with
> > > 50 shards, rf 2 (NRT replicas), 7B docs, We have 5 Zk with 2 running on
> > the
> > > same nodes where Solr is running. Our use case requires continuous
> > > ingestions (updates mostly). If we ingest at 40k records per sec, after
> > > 10-15mins some replicas go into recovery with the errors observed given
> > in
> > > the end. We also observed high CPU during these ingestions (60-70%) and
> > > disks frequently reach 100% utilization.
> > >
> > > We know our hardware is limited but this system will be

Re: How to Prevent Recovery?

2020-08-30 Thread Anshuman Singh
Hi,

I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas
using the ADDREPLICA API and then deleting the NRT replicas.
But now these replicas are going into recovery even more frequently during
indexing, and the same errors are observed.
Also, commits are taking a lot more time compared to NRT replicas.
Can this be because most of the index is on disk and not in RAM, so that
copying the index from the leader causes high disk utilisation and poor
performance?
Do I need to tweak the auto commit settings? Right now it is 30 seconds max
time and 100k max docs.
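
For context, this is roughly what those settings look like in our
solrconfig.xml; a sketch only, and the openSearcher=false line is the usual
recommendation for hard commits rather than something I have verified here:

<!-- solrconfig.xml: current auto commit settings (sketch) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>30000</maxTime>       <!-- 30 seconds -->
    <maxDocs>100000</maxDocs>      <!-- 100k docs -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>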

Regards,
Anshuman

On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson 
wrote:

> Commits should absolutely not be taking that much time, that’s where I’d
> focus first.
>
> Some sneaky places things go wonky:
> 1> you have a suggester configured that builds whenever there’s a commit.
> 2> you send commits from the client
> 3> you’re optimizing on commit
> 4> you have too much data for your hardware
>
> My guess though is that the root cause of your recovery is that the
> followers
> get backed up. If there are enough merge threads running, the
> next update can block until at least one is done. Then the scenario
> goes something like this:
>
> leader sends doc to follower
> follower does not index the document in time
> leader puts follower into “leader initiated recovery”.
>
> So one thing to look for if that scenario is correct is whether there are
> messages
> in your logs with "leader-initiated recovery”. I’d personally grep my logs
> for
>
> grep initiated logfile | grep recovery | grep leader
>
> ‘cause I never remember whether that’s the exact form. If it is this, you
> can
> lengthen the timeouts, look particularly for:
> • distribUpdateConnTimeout
> • distribUpdateSoTimeout
>
> All that said, your symptoms are consistent with a lot of merging going
> on. With NRT
> nodes, all replicas do all indexing and thus merging. Have you considered
> using TLOG/PULL replicas? In your case they could even all be TLOG
> replicas. In that
> case, only the leader does the indexing, the other TLOG replicas of a
> shard just stuff
> the documents into their local tlogs without indexing at all.
>
> Speaking of which, you could reduce some of the disk pressure if you can
> put your
> tlogs on another drive, don’t know if that’s possible. Ditto the Solr logs.
>
> Beyond that, it may be a matter of increasing the hardware. You’re really
> indexing
> 120K records a second ((1 leader + 2 followers) * 40K)/sec.
>
> Best,
> Erick
>
> > On Aug 25, 2020, at 12:02 PM, Anshuman Singh 
> wrote:
> >
> > Hi,
> >
> > We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster
> with
> > 50 shards, rf 2 (NRT replicas), 7B docs, We have 5 Zk with 2 running on
> the
> > same nodes where Solr is running. Our use case requires continuous
> > ingestions (updates mostly). If we ingest at 40k records per sec, after
> > 10-15mins some replicas go into recovery with the errors observed given
> in
> > the end. We also observed high CPU during these ingestions (60-70%) and
> > disks frequently reach 100% utilization.
> >
> > We know our hardware is limited but this system will be used by only a
> few
> > users and search times taking a few minutes and slow ingestions are fine
> so
> > we are trying to run with these specifications for now but recovery is
> > becoming a bottleneck.
> >
> > So to prevent recovery which I'm thinking could be due to high CPU/Disk
> > during ingestions, we reduced the data rate to 10k records per sec. Now
> CPU
> > usage is not high and recovery is not that frequent but it can happen in
> a
> > long run of 2-3 hrs. We further reduced the rate to 4k records per sec
> but
> > again it happened after 3-4 hrs. Logs were filled with the below error on
> > the instance on which recovery happened. Seems like reducing data rate is
> > not helping with recovery.
> >
> > *2020-08-25 12:16:11.008 ERROR (qtp1546693040-235) [c:collection
> s:shard41
> > r:core_node565 x:collection_shard41_replica_n562] o.a.s.s.HttpSolrCall
> > null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> > timeout expired: 30/30 ms*
> >
> > Solr thread dump showed commit threads taking upto 10-15 minutes.
> Currently
> > auto commit happens at 10M docs or 30seconds.
> >
> > Can someone point me in the right direction? Also can we perform
> > core-binding for Solr processes?
> >
> > *2020-08-24 12:32:55.835 WARN  (zkConnectionManagerCallback-11-thread-1)
> [
> >  ] o.a.s.c.c.ConnectionManager Watcher
> > org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
> > ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
> > WatchedEvent state:Disconnected type:None path:null path: null type:
> None*
> >
> > *2020-08-24 12:41:02.005 WARN  (main-SendThread(x.x.x.8:2181)) [   ]
> > o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> > 0

Re: How to Prevent Recovery?

2020-08-25 Thread Erick Erickson
Commits should absolutely not be taking that much time, that’s where I’d focus 
first.

Some sneaky places things go wonky:
1> you have a suggester configured that builds whenever there’s a commit (see the sketch after this list).
2> you send commits from the client
3> you’re optimizing on commit
4> you have too much data for your hardware
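
For the suggester case, this is roughly the kind of solrconfig.xml entry to look
for; the component name, lookup implementation and field below are illustrative,
not taken from your config:

<!-- A suggester that rebuilds on every commit can make commits very slow -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="buildOnCommit">true</str>   <!-- this is the expensive part -->
  </lst>
</searchComponent>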

My guess though is that the root cause of your recovery is that the followers
get backed up. If there are enough merge threads running, the
next update can block until at least one is done. Then the scenario
goes something like this:

leader sends doc to follower
follower does not index the document in time
leader puts follower into “leader initiated recovery”.

So one thing to look for if that scenario is correct is whether there are 
messages
in your logs with "leader-initiated recovery”. I’d personally grep my logs for

grep initiated logfile | grep recovery | grep leader

‘cause I never remember whether that’s the exact form. If it is this, you can
lengthen the timeouts, look particularly for:
• distribUpdateConnTimeout
• distribUpdateSoTimeout
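
Those live in the <solrcloud> section of solr.xml; a minimal sketch, with
example values rather than recommendations:

<!-- solr.xml: lengthen the distributed update timeouts (values in ms) -->
<solrcloud>
  <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
  <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
</solrcloud>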

All that said, your symptoms are consistent with a lot of merging going on. 
With NRT
nodes, all replicas do all indexing and thus merging. Have you considered
using TLOG/PULL replicas? In your case they could even all be TLOG replicas. In 
that
case, only the leader does the indexing, the other TLOG replicas of a shard 
just stuff
the documents into their local tlogs without indexing at all.

Speaking of which, you could reduce some of the disk pressure if you can put 
your
tlogs on another drive, don’t know if that’s possible. Ditto the Solr logs.
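
If it is possible, moving the tlogs is roughly a matter of pointing the update
log at another mount; the path below is a placeholder, and in practice you’d
set it per core (e.g. ulogDir in core.properties) rather than sharing one
directory:

<!-- solrconfig.xml: transaction logs on a separate drive (example path) -->
<updateLog>
  <str name="dir">${solr.ulog.dir:/mnt/fastdisk/solr/tlogs}</str>
</updateLog>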

Beyond that, it may be a matter of increasing the hardware. You’re really 
indexing
120K records a second ((1 leader + 2 followers) * 40K)/sec.

Best,
Erick

> On Aug 25, 2020, at 12:02 PM, Anshuman Singh  
> wrote:
> 
> Hi,
> 
> We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster with
> 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 Zk, with 2 running on
> the same nodes where Solr is running. Our use case requires continuous
> ingestions (mostly updates). If we ingest at 40k records per sec, after
> 10-15 mins some replicas go into recovery, with the errors given at the end
> of this mail. We also observed high CPU (60-70%) during these ingestions, and
> the disks frequently reach 100% utilization.
> 
> We know our hardware is limited, but this system will be used by only a few
> users, and search times of a few minutes and slow ingestions are fine, so
> we are trying to run with these specifications for now. However, recovery is
> becoming a bottleneck.
> 
> So, to prevent recovery, which I'm thinking could be due to high CPU/disk
> usage during ingestions, we reduced the data rate to 10k records per sec. Now
> CPU usage is not high and recovery is not as frequent, but it can still happen
> in a long run of 2-3 hrs. We further reduced the rate to 4k records per sec,
> but again it happened after 3-4 hrs. Logs were filled with the below error on
> the instance on which recovery happened. It seems like reducing the data rate
> is not helping with recovery.
> 
> *2020-08-25 12:16:11.008 ERROR (qtp1546693040-235) [c:collection s:shard41
> r:core_node565 x:collection_shard41_replica_n562] o.a.s.s.HttpSolrCall
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 30/30 ms*
> 
> Solr thread dump showed commit threads taking up to 10-15 minutes. Currently
> auto commit happens at 10M docs or 30 seconds.
> 
> Can someone point me in the right direction? Also can we perform
> core-binding for Solr processes?
> 
> *2020-08-24 12:32:55.835 WARN  (zkConnectionManagerCallback-11-thread-1) [
>  ] o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
> ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
> WatchedEvent state:Disconnected type:None path:null path: null type: None*
> 
> *2020-08-24 12:41:02.005 WARN  (main-SendThread(x.x.x.8:2181)) [   ]
> o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> 0x273f9a8fb229269 has expired
> 2020-08-24 12:41:06.177 WARN  (MetricsHistoryHandler-8-thread-1) [   ]
> o.a.s.h.a.MetricsHistoryHandler Could not obtain overseer's address, skipping.
> => org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /overseer_elect/leader
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:134)
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /overseer_elect/leader
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[?:?]
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2131) ~[?:?]
> 2020-08-24 12:41:13.365 WARN  (zkConnectionManagerCallback-11-thread-1) [   ]
> o.a.s.c.c.ConnectionM

Re: How to Prevent Recovery?

2020-08-25 Thread Houston Putman
Are you able to use TLOG replicas? That should reduce the time it takes to
recover significantly. It doesn't seem like you have a hard need for
near-real-time, since slow ingestions are fine.
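
For what it's worth, a collection can also be created with only TLOG replicas
from the start; a rough sketch of the Collections API call, with placeholder
names and counts:

curl "http://solr-node1:8983/solr/admin/collections?action=CREATE\
&name=mycollection&numShards=50&nrtReplicas=0&tlogReplicas=2\
&maxShardsPerNode=10"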

- Houston

On Tue, Aug 25, 2020 at 12:03 PM Anshuman Singh 
wrote:

> Hi,
>
> We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster with
> 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 Zk, with 2 running on
> the same nodes where Solr is running. Our use case requires continuous
> ingestions (mostly updates). If we ingest at 40k records per sec, after
> 10-15 mins some replicas go into recovery, with the errors given at the end
> of this mail. We also observed high CPU (60-70%) during these ingestions, and
> the disks frequently reach 100% utilization.
>
> We know our hardware is limited, but this system will be used by only a few
> users, and search times of a few minutes and slow ingestions are fine, so
> we are trying to run with these specifications for now. However, recovery is
> becoming a bottleneck.
>
> So, to prevent recovery, which I'm thinking could be due to high CPU/disk
> usage during ingestions, we reduced the data rate to 10k records per sec. Now
> CPU usage is not high and recovery is not as frequent, but it can still happen
> in a long run of 2-3 hrs. We further reduced the rate to 4k records per sec,
> but again it happened after 3-4 hrs. Logs were filled with the below error on
> the instance on which recovery happened. It seems like reducing the data rate
> is not helping with recovery.
>
> *2020-08-25 12:16:11.008 ERROR (qtp1546693040-235) [c:collection s:shard41
> r:core_node565 x:collection_shard41_replica_n562] o.a.s.s.HttpSolrCall
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> timeout expired: 30/30 ms*
>
> Solr thread dump showed commit threads taking up to 10-15 minutes. Currently
> auto commit happens at 10M docs or 30 seconds.
>
> Can someone point me in the right direction? Also can we perform
> core-binding for Solr processes?
>
> *2020-08-24 12:32:55.835 WARN  (zkConnectionManagerCallback-11-thread-1) [
>   ] o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
> ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
> WatchedEvent state:Disconnected type:None path:null path: null type: None*
>
> *2020-08-24 12:41:02.005 WARN  (main-SendThread(x.x.x.8:2181)) [   ]
> o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> 0x273f9a8fb229269 has expired
> 2020-08-24 12:41:06.177 WARN  (MetricsHistoryHandler-8-thread-1) [   ]
> o.a.s.h.a.MetricsHistoryHandler Could not obtain overseer's address, skipping.
> => org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /overseer_elect/leader
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:134)
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /overseer_elect/leader
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[?:?]
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2131) ~[?:?]
> 2020-08-24 12:41:13.365 WARN  (zkConnectionManagerCallback-11-thread-1) [   ]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
> ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
> WatchedEvent state:Expired type:None path:null path: null type: None
> 2020-08-24 12:41:13.366 WARN  (zkConnectionManagerCallback-11-thread-1) [   ]
> o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
> Attempting to reconnect to recover relationship with ZooKeeper...
> 2020-08-24 12:41:16.705 ERROR (qtp1546693040-163255)
> [c:collection s:shard31 r:core_node525 x:collection_shard31_replica_n522]
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot
> talk to ZooKeeper - Updates are disabled*
>