Solr NRT Replicas Out of Sync

2021-03-03 Thread Anshuman Singh
Hi,

In our Solr 7.4 cluster, we have noticed that some replicas of some of our
Collections are out of sync: the follower replica has more records than the
leader. This results in a different number of records on subsequent queries
against the same Collection. Issuing a commit does not help either.

I'm able to reproduce the issue using the steps below (a sketch of the setup
and of checking the per-replica counts follows the list):

   1. Create a collection with 1 shard and a replication factor of 2
   2. Ingest 10k records into the collection
   3. Turn down the node hosting replica 2
   4. Ingest 10k records into the collection
   5. Turn down replica 1
   6. Turn up replica 2 and wait till it becomes the leader
   7. Ingest 20k records on replica 2
   8. Turn down replica 2
   9. Turn up replica 1 and wait till it becomes the leader, or use the
   FORCELEADER action of the Collections API
   10. Turn up replica 2
   11. Now replica 2 has 30k records and replica 1 has 20k records, and they
   never sync
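
For reference, a minimal sketch of steps 1-2 and of how the mismatch can be
observed, assuming a local SolrCloud at http://localhost:8983/solr, the
default configset, and illustrative collection/core names:

import requests

SOLR = "http://localhost:8983/solr"

# Step 1: create a collection with 1 shard and a replication factor of 2.
requests.get(f"{SOLR}/admin/collections", params={
    "action": "CREATE", "name": "sync_test",
    "numShards": 1, "replicationFactor": 2,
}).raise_for_status()

# Step 2: ingest 10k records and commit.
docs = [{"id": str(i), "field1_s": "value"} for i in range(10000)]
requests.post(f"{SOLR}/sync_test/update", params={"commit": "true"},
              json=docs).raise_for_status()

# After the restart sequence above, compare per-replica counts by querying
# each core directly with distrib=false (core names depend on the cluster).
for core in ("sync_test_shard1_replica_n1", "sync_test_shard1_replica_n2"):
    rsp = requests.get(f"{SOLR}/{core}/select",
                       params={"q": "*:*", "rows": 0, "distrib": "false"}).json()
    print(core, rsp["response"]["numFound"])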

I tried the same steps with TLOG replicas; in that case both replicas ended
up with 20k records and were in sync, but 10k records were lost.

Is there any way to sync the replicas? I am looking for a lightweight
solution that doesn't require re-creating the index.

Regards,
Anshuman


Re: How to Prevent Recovery?

2020-09-08 Thread Anshuman Singh
Hi,

I noticed that when I created TLOG replicas using the ADDREPLICA API, I
called the API in parallel for all the shards, because of which all the
replicas were created on a single node, i.e., the replicas were not
distributed evenly across the nodes.

After fixing that, I am getting better indexing performance than with NRT
replicas, and I am not facing any recovery issues either.
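
For anyone hitting the same thing, a rough sketch of adding the TLOG
replicas one at a time with an explicit target node (collection, shard, and
node names below are placeholders):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # placeholder

# Desired placement: shard -> node (use your real node names here).
placement = {
    "shard1": "host1:8983_solr",
    "shard2": "host2:8983_solr",
}

# Issue ADDREPLICA sequentially and pin each replica to a node, so the
# placement logic never has to handle several concurrent requests.
for shard, node in placement.items():
    r = requests.get(f"{SOLR}/admin/collections", params={
        "action": "ADDREPLICA",
        "collection": COLLECTION,
        "shard": shard,
        "type": "TLOG",
        "node": node,
    })
    r.raise_for_status()
    print(shard, "->", node)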

Thanks,
Anshuman

On Mon, Aug 31, 2020 at 1:02 PM Dominique Bejean 
wrote:

> Hi,
>
> Even if it is not the root cause, I suggest trying to respect some basic
> best practices and so not have "2 Zk running on the
> same nodes where Solr is running". Maybe you can achieve this by just
> stopping these 2 Zk (and moving them later). Did you increase
> ZK_CLIENT_TIMEOUT to 3 ?
>
> Did you check your GC logs ? Any consecutive full GC ? How big is your Solr
> heap size ? Not too big ?
>
> The last time I saw such long commits, it was due to slow segment merges
> related docValues and dynamicfield. Are you intensively using DynamicFields
> with docValues ?
>
> Can you enable Lucene detailed debug information
> (infoStream set to true in solrconfig.xml)?
>
> https://lucene.apache.org/solr/guide/8_5/indexconfig-in-solrconfig.html#other-indexing-settings
>
> With these Lucene debug information, are there any lines like this in your
> logs ?
>
> 2020-05-03 16:22:38.139 INFO  (qtp1837543557-787) [   x:###]
> o.a.s.u.LoggingInfoStream [MS][qtp1837543557-787]: too many merges;
> stalling...
> 2020-05-03 16:24:58.318 INFO  (commitScheduler-19-thread-1) [   x:###]
> o.a.s.u.DirectUpdateHandler2 start
>
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 2020-05-03 16:24:59.005 INFO  (commitScheduler-19-thread-1) [   x:###]
> o.a.s.u.LoggingInfoStream [MS][commitScheduler-19-thread-1]: too many
> merges; stalling...
> 2020-05-03 16:31:31.402 INFO  (Lucene Merge Thread #55) [   x:###]
> o.a.s.u.LoggingInfoStream [SM][Lucene Merge Thread #55]: 1291879 msec to
> merge doc values [464265 docs]
>
>
> Regards
>
> Dominique
>
>
>
>
>
> On Sun, Aug 30, 2020 at 20:44, Anshuman Singh wrote:
>
> > Hi,
> >
> > I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG
> replicas
> > using the ADDREPLICA API and then deleting the NRT replicas.
> > But now, these replicas are going into recovery even more frequently
> during
> > indexing. Same errors are observed.
> > Also, commit is taking a lot of time compared to NRT replicas.
> > Can this be due to the fact that most of the indexes are on disk and not
> in
> > RAM, and therefore copying index from leader is causing high disk
> > utilisation and causing poor performance?
> > Do I need to tweak the auto commit settings? Right now it is 30 seconds
> max
> > time and 100k max docs.
> >
> > Regards,
> > Anshuman
> >
> > On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson  >
> > wrote:
> >
> > > Commits should absolutely not be taking that much time, that’s where
> I’d
> > > focus first.
> > >
> > > Some sneaky places things go wonky:
> > > 1> you have a suggester configured that builds whenever there’s a
> > > commit.
> > > 2> you send commits from the client
> > > 3> you’re optimizing on commit
> > > 4> you have too much data for your hardware
> > >
> > > My guess though is that the root cause of your recovery is that the
> > > followers
> > > get backed up. If there are enough merge threads running, the
> > > next update can block until at least one is done. Then the scenario
> > > goes something like this:
> > >
> > > leader sends doc to follower
> > > follower does not index the document in time
> > > leader puts follower into “leader initiated recovery”.
> > >
> > > So one thing to look for if that scenario is correct is whether there
> are
> > > messages
> > > in your logs with "leader-initiated recovery” I’d personally grep my
> logs
> > > for
> > >
> > > grep logfile initiated | grep recovery | grep leader
> > >
> > > ‘cause I never remember whether that’s the exact form. If it is this,
> you
> > > can
> > > lengthen the timeouts, look particularly for:
> > > • distribUpdateConnTimeout
> > > • distribUpdateSoTimeout
> > >
> > > All that said, your symptoms are consistent with a lot of merging going
> > > on. With NRT
> > > nodes, all replicas do all indexing and thus merging. Have you
> considered
> > > using TLOG/PULL replicas? In your 

Re: How to Prevent Recovery?

2020-08-30 Thread Anshuman Singh
Hi,

I changed all the replicas, 50x2, from NRT to TLOG by adding TLOG replicas
using the ADDREPLICA API and then deleting the NRT replicas.
But now these replicas are going into recovery even more frequently during
indexing, and the same errors are observed.
Also, commits are taking much longer than with NRT replicas.
Could this be because most of the index is on disk rather than in RAM, so
copying the index from the leader causes high disk utilisation and poor
performance?
Do I need to tweak the autocommit settings? Right now they are 30 seconds
max time and 100k max docs.
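
For context, the swap was done roughly like this (a sketch only; collection,
shard, and replica names are placeholders, and in a real run you would wait
for each new replica to become active before deleting the old one):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # placeholder

# Old NRT replica (core_node name) to remove per shard; take the real
# names from CLUSTERSTATUS before deleting anything.
to_swap = {"shard1": "core_node3", "shard2": "core_node5"}

for shard, old_nrt_replica in to_swap.items():
    # Add a TLOG replica to the shard...
    requests.get(f"{SOLR}/admin/collections", params={
        "action": "ADDREPLICA", "collection": COLLECTION,
        "shard": shard, "type": "TLOG",
    }).raise_for_status()
    # ...then drop the old NRT replica.
    requests.get(f"{SOLR}/admin/collections", params={
        "action": "DELETEREPLICA", "collection": COLLECTION,
        "shard": shard, "replica": old_nrt_replica,
    }).raise_for_status()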

Regards,
Anshuman

On Tue, Aug 25, 2020 at 10:23 PM Erick Erickson 
wrote:

> Commits should absolutely not be taking that much time, that’s where I’d
> focus first.
>
> Some sneaky places things go wonky:
> 1> you have a suggester configured that builds whenever there’s a commit.
> 2> you send commits from the client
> 3> you’re optimizing on commit
> 4> you have too much data for your hardware
>
> My guess though is that the root cause of your recovery is that the
> followers
> get backed up. If there are enough merge threads running, the
> next update can block until at least one is done. Then the scenario
> goes something like this:
>
> leader sends doc to follower
> follower does not index the document in time
> leader puts follower into “leader initiated recovery”.
>
> So one thing to look for if that scenario is correct is whether there are
> messages
> in your logs with "leader-initiated recovery” I’d personally grep my logs
> for
>
> grep logfile initiated | grep recovery | grep leader
>
> ‘cause I never remember whether that’s the exact form. If it is this, you
> can
> lengthen the timeouts, look particularly for:
> • distribUpdateConnTimeout
> • distribUpdateSoTimeout
>
> All that said, your symptoms are consistent with a lot of merging going
> on. With NRT
> nodes, all replicas do all indexing and thus merging. Have you considered
> using TLOG/PULL replicas? In your case they could even all be TLOG
> replicas. In that
> case, only the leader does the indexing, the other TLOG replicas of a
> shard just stuff
> the documents into their local tlogs without indexing at all.
>
> Speaking of which, you could reduce some of the disk pressure if you can
> put your
> tlogs on another drive, don’t know if that’s possible. Ditto the Solr logs.
>
> Beyond that, it may be a matter of increasing the hardware. You’re really
> indexing
> 120K records/second ((1 leader + 2 followers) * 40K/sec).
>
> Best,
> Erick
>
> > On Aug 25, 2020, at 12:02 PM, Anshuman Singh 
> wrote:
> >
> > Hi,
> >
> > We have a 10 node (150G RAM, 1TB SAS HDD, 32 cores) Solr 8.5.1 cluster
> with
> > 50 shards, rf 2 (NRT replicas), 7B docs, We have 5 Zk with 2 running on
> the
> > same nodes where Solr is running. Our use case requires continuous
> > ingestions (updates mostly). If we ingest at 40k records per sec, after
> > 10-15mins some replicas go into recovery with the errors observed given
> in
> > the end. We also observed high CPU during these ingestions (60-70%) and
> > disks frequently reach 100% utilization.
> >
> > We know our hardware is limited but this system will be used by only a
> few
> > users and search times taking a few minutes and slow ingestions are fine
> so
> > we are trying to run with these specifications for now but recovery is
> > becoming a bottleneck.
> >
> > So to prevent recovery which I'm thinking could be due to high CPU/Disk
> > during ingestions, we reduced the data rate to 10k records per sec. Now
> CPU
> > usage is not high and recovery is not that frequent but it can happen in
> a
> > long run of 2-3 hrs. We further reduced the rate to 4k records per sec
> but
> > again it happened after 3-4 hrs. Logs were filled with the below error on
> > the instance on which recovery happened. Seems like reducing data rate is
> > not helping with recovery.
> >
> > *2020-08-25 12:16:11.008 ERROR (qtp1546693040-235) [c:collection
> s:shard41
> > r:core_node565 x:collection_shard41_replica_n562] o.a.s.s.HttpSolrCall
> > null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
> > timeout expired: 30/30 ms*
> >
> > Solr thread dump showed commit threads taking upto 10-15 minutes.
> Currently
> > auto commit happens at 10M docs or 30seconds.
> >
> > Can someone point me in the right direction? Also can we perform
> > core-binding for Solr processes?
> >
> > *2020-08-24 12:32:55.835 WARN  (zkConnectionManagerCallback-11-thread-1)
> [
> >  ] o.a.s.c.c.Connecti

How to Prevent Recovery?

2020-08-25 Thread Anshuman Singh
Hi,

We have a 10-node (150G RAM, 1TB SAS HDD, 32 cores per node) Solr 8.5.1
cluster with 50 shards, rf 2 (NRT replicas), and 7B docs. We have 5 ZK
nodes, 2 of them running on the same nodes where Solr is running. Our use
case requires continuous ingestion (mostly updates). If we ingest at 40k
records per sec, after 10-15 mins some replicas go into recovery, with the
errors given at the end. We also observe high CPU during these ingestions
(60-70%), and disks frequently reach 100% utilization.

We know our hardware is limited, but this system will be used by only a few
users, and search times of a few minutes and slow ingestion are acceptable,
so we are trying to run with these specifications for now. Recovery,
however, is becoming a bottleneck.

To prevent recovery, which I suspect is due to high CPU/disk usage during
ingestion, we reduced the data rate to 10k records per sec. Now CPU usage is
not high and recovery is less frequent, but it can still happen over a long
run of 2-3 hrs. We further reduced the rate to 4k records per sec, but it
happened again after 3-4 hrs. The logs on the instance where recovery
happened were filled with the error below. It seems that reducing the data
rate is not helping with recovery.

*2020-08-25 12:16:11.008 ERROR (qtp1546693040-235) [c:collection s:shard41
r:core_node565 x:collection_shard41_replica_n562] o.a.s.s.HttpSolrCall
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 30/30 ms*

A Solr thread dump showed commit threads taking up to 10-15 minutes.
Currently, autocommit happens at 10M docs or 30 seconds.

Can someone point me in the right direction? Also, can we perform core
binding (CPU pinning) for the Solr processes?
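
In case it is useful while debugging, a small sketch that lists replicas not
in the active state via the CLUSTERSTATUS action (the collection name is a
placeholder):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "collection"  # placeholder

status = requests.get(f"{SOLR}/admin/collections", params={
    "action": "CLUSTERSTATUS", "collection": COLLECTION,
}).json()

shards = status["cluster"]["collections"][COLLECTION]["shards"]
for shard_name, shard in shards.items():
    for replica_name, replica in shard["replicas"].items():
        # Anything not "active" (recovering, down, recovery_failed) is flagged.
        if replica["state"] != "active":
            print(shard_name, replica_name, replica["state"], replica["node_name"])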

*2020-08-24 12:32:55.835 WARN  (zkConnectionManagerCallback-11-thread-1) [
  ] o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
WatchedEvent state:Disconnected type:None path:null path: null type: None*

*2020-08-24 12:41:02.005 WARN  (main-SendThread(x.x.x.8:2181)) [   ]
o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
0x273f9a8fb229269 has expired*

*2020-08-24 12:41:06.177 WARN  (MetricsHistoryHandler-8-thread-1) [   ]
o.a.s.h.a.MetricsHistoryHandler Could not obtain overseer's address,
skipping. => org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer_elect/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:134)
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /overseer_elect/leader
at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[?:?]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[?:?]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2131) ~[?:?]*

*2020-08-24 12:41:13.365 WARN  (zkConnectionManagerCallback-11-thread-1) [   ]
o.a.s.c.c.ConnectionManager Watcher
org.apache.solr.common.cloud.ConnectionManager@372ea2bc name:
ZooKeeperConnection Watcher:x.x.x.7:2181,x.x.x.8:2181/solr got event
WatchedEvent state:Expired type:None path:null path: null type: None*

*2020-08-24 12:41:13.366 WARN  (zkConnectionManagerCallback-11-thread-1)
[   ] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was
expired. Attempting to reconnect to recover relationship with ZooKeeper...*

*2020-08-24 12:41:16.705 ERROR (qtp1546693040-163255)
[c:collection s:shard31 r:core_node525 x:collection_shard31_replica_n522]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot
talk to ZooKeeper - Updates are disabled*


Re: Replicas in Recovery During Atomic Updates

2020-08-19 Thread Anshuman Singh
Hi,

Does anyone have any idea about this issue?
Apart from the errors in the previous email, I am frequently seeing the
errors below:

2020-08-19 11:56:09.467 ERROR (qtp1546693040-32) [c:collection_4 s:shard3
r:core_node13 x:collection_4_shard3_replica_n10] o.a.s.u.SolrCmdDistributor
java.io.IOException: Request processing has stalled for 20017ms with 100
remaining elements in the queue.

2020-08-19 11:56:16.243 ERROR (qtp1546693040-72) [c:collection_4 s:shard3
r:core_node13 x:collection_4_shard3_replica_n10] o.a.s.h.RequestHandlerBase
java.io.IOException: Task queue processing has stalled for 20216 ms with 0
remaining elements to process.

2020-08-19 11:56:22.584 ERROR (qtp1546693040-32) [c:collection_4 s:shard3
r:core_node13 x:collection_4_shard3_replica_n10]
o.a.s.u.p.DistributedZkUpdateProcessor Setting up to try to start recovery
on replica core_node11 with url
http://x.x.x.25:8983/solr/collection_4_shard3_replica_n8/ by increasing
leader term => java.io.IOException: Request processing has stalled for
20017ms with 100 remaining elements in the queue.

2020-08-19 11:56:16.064 ERROR (updateExecutor-5-thread-8-processing-null) [
  ] o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
SolrCmdDistributor$Req:
cmd=delete{_version_=-1675454745405292544,query=`{!cache=false}_expire_at_:[*
TO 2020-08-19T11:55:47.604Z]`,commitWithin=-1}; node=ForwardNode:
http://x.x.x.24:8983/solr/collection_4_shard3_replica_n10/ to
http://x.x.x.24:8983/solr/collection_4_shard3_replica_n10/ =>
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://x.x.x.24:8983/solr/collection_4_shard3_replica_n10/:
null


On Tue, Aug 11, 2020 at 2:08 AM Anshuman Singh 
wrote:

> Just to give you an idea, this is how we are ingesting:
>
> {"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30}, "field3": 40,
> "field4": "some string"}
>
> We are using Solr-8.5.1. We have not configured any update processor. Hard
> commit happens every minute or at 100k docs, soft commit happens every 10
> mins.
> We have an external ZK setup with 5 nodes.
>
> Open files hard/soft limit is 65k and "max user processes" is unlimited.
>
> These are the different ERROR logs I found in the log files:
>
> ERROR (qtp1546693040-2637) [c:collection s:shard27 r:core_node109
> x:collection_shard27_replica_n106] o.a.s.s.HttpSolrCall
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> Async exception during distributed update: java.net.ConnectException:
> Connection refused
>
> ERROR (qtp1546693040-1136) [c:collection s:shard101 r:core_node405
> x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
> null:java.io.IOException: java.lang.InterruptedException
>
> ERROR (qtp1546693040-2704) [c:collection s:shard101 r:core_node405
> x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
> null:org.eclipse.jetty.io.EofException: Reset cancel_stream_error
>
> ERROR (qtp1546693040-1344) [c:collection s:shard20 r:core_node79
> x:collection_shard20_replica_n76] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: No registered leader was found after
> waiting for 4000ms , collection: collection slice: shard48 saw
> state=DocCollection(collection//collections/collection/state.json/96434)={
>
> ERROR (qtp1546693040-2928) [c:collection s:shard80 r:core_node319
> x:collection_shard80_replica_n316] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> ERROR (updateExecutor-5-thread-47-processing-n:192.100.20.19:8985_solr
> x:collection_shard161_replica_n641 c:collection s:shard161 r:core_node646)
> [c:collection s:shard161 r:core_node646 x:collection_shard161_replica_n641]
> o.a.s.u.SolrCmdDistributor
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
> Error from server at null: Expected mime type application/octet-stream but
> got application/json
>
> ERROR (recoveryExecutor-7-thread-16-processing-n:192.100.20.33:8984_solr
> x:collection_shard80_replica_n47 c:collection s:shard80 r:core_node48)
> [c:collection s:shard80 r:core_node48 x:collection_shard80_replica_n47]
> o.a.s.c.RecoveryStrategy Error while trying to recover.
> core=collection_shard80_replica_n47:java.util.concurrent.ExecutionException:
> org.apache.solr.client.solrj.SolrServerException: IOException occurred when
> talking to server at: http://192.100.20.34:8984/solr
>
> ERROR (zkCallback-10-thread-22) [c:collection s:shard19 r:core_node322
> x:collection_shard19_replica_n321] o.a.s.c.ShardLeaderElectionContext There
> was a problem trying to register as the
> leader:org.apache.solr.common.AlreadyClosedException
>
> ERROR
> (O

Re: Replicas in Recovery During Atomic Updates

2020-08-10 Thread Anshuman Singh
Just to give you an idea, this is how we are ingesting:

{"id": 1, "field1": {"inc": 20}, "field2": {"inc": 30}, "field3": 40,
"field4": "some string"}

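Concretely, a batch of such updates is sent roughly like this (a sketch; the
collection name, commitWithin value, and field values are illustrative):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "collection_4"  # placeholder

# One atomic-update document: "inc" adds to the existing value; the other
# fields are sent as plain values, as in the example above.
doc = {
    "id": "1",
    "field1": {"inc": 20},
    "field2": {"inc": 30},
    "field3": 40,
    "field4": "some string",
}

requests.post(f"{SOLR}/{COLLECTION}/update", json=[doc],
              params={"commitWithin": 60000}).raise_for_status()
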
We are using Solr-8.5.1. We have not configured any update processor. Hard
commit happens every minute or at 100k docs, soft commit happens every 10
mins.
We have an external ZK setup with 5 nodes.

Open files hard/soft limit is 65k and "max user processes" is unlimited.

These are the different ERROR logs I found in the log files:

ERROR (qtp1546693040-2637) [c:collection s:shard27 r:core_node109
x:collection_shard27_replica_n106] o.a.s.s.HttpSolrCall
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: java.net.ConnectException:
Connection refused

ERROR (qtp1546693040-1136) [c:collection s:shard101 r:core_node405
x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
null:java.io.IOException: java.lang.InterruptedException

ERROR (qtp1546693040-2704) [c:collection s:shard101 r:core_node405
x:collection_shard101_replica_n402] o.a.s.s.HttpSolrCall
null:org.eclipse.jetty.io.EofException: Reset cancel_stream_error

ERROR (qtp1546693040-1344) [c:collection s:shard20 r:core_node79
x:collection_shard20_replica_n76] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: No registered leader was found after
waiting for 4000ms , collection: collection slice: shard48 saw
state=DocCollection(collection//collections/collection/state.json/96434)={

ERROR (qtp1546693040-2928) [c:collection s:shard80 r:core_node319
x:collection_shard80_replica_n316] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

ERROR (updateExecutor-5-thread-47-processing-n:192.100.20.19:8985_solr
x:collection_shard161_replica_n641 c:collection s:shard161 r:core_node646)
[c:collection s:shard161 r:core_node646 x:collection_shard161_replica_n641]
o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:
Error from server at null: Expected mime type application/octet-stream but
got application/json

ERROR (recoveryExecutor-7-thread-16-processing-n:192.100.20.33:8984_solr
x:collection_shard80_replica_n47 c:collection s:shard80 r:core_node48)
[c:collection s:shard80 r:core_node48 x:collection_shard80_replica_n47]
o.a.s.c.RecoveryStrategy Error while trying to recover.
core=collection_shard80_replica_n47:java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.SolrServerException: IOException occurred when
talking to server at: http://192.100.20.34:8984/solr

ERROR (zkCallback-10-thread-22) [c:collection s:shard19 r:core_node322
x:collection_shard19_replica_n321] o.a.s.c.ShardLeaderElectionContext There
was a problem trying to register as the
leader:org.apache.solr.common.AlreadyClosedException

ERROR
(OverseerStateUpdate-176461820351853980-192.100.20.34:8985_solr-n_002357)
[   ] o.a.s.c.Overseer Overseer could not process the current clusterstate
state update message, skipping the message: {

ERROR (main-EventThread) [   ] o.a.z.ClientCnxn Error while calling watcher
 => java.lang.OutOfMemoryError: unable to create new native thread

ERROR 
(coreContainerWorkExecutor-2-thread-1-processing-n:192.100.20.34:8986_solr)
[   ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on
startup => org.apache.solr.cloud.ZkController$NotInClusterStateException:
coreNodeName core_node638 does not exist in shard shard105, ignore the
exception if the replica was deleted

ERROR (qtp836220863-249) [c:collection s:shard162 r:core_node548
x:collection_shard162_replica_n547] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: No registered leader was found after
waiting for 4000ms , collection: collection slice: shard162 saw
state=DocCollection(collection//collections/collection/state.json/43121)={

Regards,
Anshuman

On Mon, Aug 10, 2020 at 9:19 PM Jörn Franke  wrote:

> How do you ingest it exactly with atomic updates? Is there an update
> processor in-between?
>
> What are your settings for hard/soft commit?
>
> For the shards going into recovery - do you have a log entry or something ?
>
> What is the Solr version?
>
> How do you setup ZK?
>
> > On Aug 10, 2020, at 16:24, Anshuman Singh wrote:
> >
> > Hi,
> >
> > We have a SolrCloud cluster with 10 nodes. We have 6B records ingested in
> > the Collection. Our use case requires atomic updates ("inc") on 5 fields.
> > Now almost 90% documents are atomic updates and as soon as we start our
> > ingestion pipelines, multiple shards start going into recovery, sometimes
> > all replicas of some shards go into down state.
> > The ingestion rate is also too slow with atomic updates, 4-5k per second.
> > We were abl

Replicas in Recovery During Atomic Updates

2020-08-10 Thread Anshuman Singh
Hi,

We have a SolrCloud cluster with 10 nodes and 6B records ingested into the
Collection. Our use case requires atomic updates ("inc") on 5 fields.
Almost 90% of the documents are now atomic updates, and as soon as we start
our ingestion pipelines, multiple shards start going into recovery;
sometimes all replicas of some shards go into the down state.
The ingestion rate is also very slow with atomic updates, 4-5k per second,
whereas we were able to ingest records without atomic updates at 50k
records per second without any issues.

I suspect that these "inc" atomic updates requiring a fetch of the existing
fields before indexing explains the slow rate, but what I don't understand
is why the replicas are going into recovery.

Regards,
Anshuman


Ext4 or XFS

2020-08-05 Thread Anshuman Singh
Hi,

Which file system would be better for Solr, ext4 or XFS?

Regards,
Anshuman


Custom Snitch for Rack Awareness

2020-07-31 Thread Anshuman Singh
Hi,

I'm using Solr-7.4 and I want to create collections in my cluster such that
no two replicas are assigned to the same rack.

I read about Rule-based Replica Placement
(https://lucene.apache.org/solr/guide/7_4/rule-based-replica-placement.html).
What I understood is that I have to create a tag/snitch which marks which
node belongs to which rack, and then I can specify the rule in the
Collection creation API. But I didn't understand how to create this custom
snitch.

It is unclear to me how to create a custom Snitch. Can someone tell me how
to do it?
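
In case it helps, one option that may not require a custom snitch at all is
the sysprop tag of the built-in ImplicitSnitch: start each Solr node with a
system property such as -Drack=rack1 and reference sysprop.rack in the rule.
A rough, untested sketch of the collection creation call under that
assumption:

import requests

SOLR = "http://localhost:8983/solr"

# Assumes every Solr node was started with e.g. -Drack=rack1 / -Drack=rack2.
# The rule asks for fewer than 2 replicas of any shard per rack value.
requests.get(f"{SOLR}/admin/collections", params={
    "action": "CREATE",
    "name": "rack_aware",
    "numShards": 4,
    "replicationFactor": 2,
    "rule": "shard:*,replica:<2,sysprop.rack:*",
    "snitch": "class:ImplicitSnitch",
}).raise_for_status()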

Regards,
Anshuman


Case insensitive search on String field

2020-07-25 Thread Anshuman Singh
Hi,

We missed the fact that case insensitive search doesn't work with
field type "string". We have 3B docs indexed and we cannot reindex the data.

Now, as schema changes require reindexing, is there any other way to
achieve case insensitive search on string fields?

Regards,
Anshuman


Solr Backup/Restore

2020-07-21 Thread Anshuman Singh
Hi,

I'm using Solr-7.4.0 and I want to export 4TB of data from our current Solr
cluster to a different cluster. The new cluster has twice as many nodes as
the current cluster, and I want the data to be distributed among all the
nodes. Is this possible with the Backup/Restore feature, considering that I
want to increase the number of shards in the new Collections?

From the official docs:
*"Support for backups when running SolrCloud is provided with
the Collections API. This allows the backups to be generated across multiple
shards, and restored to the same number of shards and replicas as the
original collection."*

I tried to create a backup of one collection using this feature, but it
gives me the error described in
https://issues.apache.org/jira/browse/SOLR-12523.

Can someone guide me on this? Is there any other way to do this exercise
that would take less time?
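
For reference, a minimal sketch of the two Collections API calls involved;
the location must be a path on a filesystem shared by all nodes, and the
names are placeholders:

import requests

SOLR = "http://localhost:8983/solr"

# Back up the source collection to a directory visible to every Solr node
# (NFS or similar), then restore it under a new name on the target cluster.
requests.get(f"{SOLR}/admin/collections", params={
    "action": "BACKUP",
    "name": "mybackup",
    "collection": "source_collection",
    "location": "/mnt/solr_backups",  # shared filesystem, placeholder
}).raise_for_status()

requests.get(f"{SOLR}/admin/collections", params={
    "action": "RESTORE",
    "name": "mybackup",
    "collection": "restored_collection",
    "location": "/mnt/solr_backups",
}).raise_for_status()

As the quoted documentation says, the restore keeps the original number of
shards, so on its own it will not spread the data over more shards.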

Thanks,
Anshuman


Prevent Re-indexing if Doc Fields are Same

2020-06-26 Thread Anshuman Singh
I was reading about in-place updates
(https://lucene.apache.org/solr/guide/7_4/updating-parts-of-documents.html).
In my use case I have to update only the field "LASTUPDATETIME"; all other
fields stay the same. Updates are very frequent and I can't bear the cost of
deleted docs.

If I provide all the fields, Solr deletes the document and re-indexes it.
But if I just "set" the "LASTUPDATETIME" field (a non-indexed, non-stored,
docValues field), it does an in-place update without deletion. But the
problem is that I don't know whether the document is already present or I'm
indexing it for the first time.
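
For context, the update that takes the in-place path looks roughly like this
(a sketch; the collection name and the epoch value are illustrative):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # placeholder

# "set" on a single-valued, non-indexed, non-stored, docValues=true field
# is eligible for an in-place update, i.e. no delete + full re-index.
requests.post(f"{SOLR}/{COLLECTION}/update",
              json=[{"id": "doc1", "LASTUPDATETIME": {"set": 1593150000}}],
              params={"commitWithin": 60000}).raise_for_status()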

Is there a way to prevent re-indexing if other fields are the same?

*P.S. I'm looking for a solution that doesn't require looking up whether the
doc is present in the Collection or not.*


Replicas going into recovery

2020-06-11 Thread Anshuman Singh
We are running a test case, ingesting 2B records into a collection over 24
hrs. The collection is spread across 10 Solr nodes with a replication factor
of 2.

We are noticing many replicas going into recovery while indexing, and it is
degrading indexing performance.
We are observing errors like:

*org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://host:8983/solr/test_shard13_replica_n50:
Expected mime type application/octet-stream but got application/json.*

*o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No
registered leader was found after waiting for 4000ms*

Sometimes both replicas of a shard go into recovery, and the error log
shows something related to ZooKeeper not being able to elect a leader.

Also, related to indexing performance: when we left a run going overnight,
we saw in the morning that indexing performance had degraded from <1000 ms
to >10 s for 10k-batch insertions.
But we have noticed that restarting Solr on all nodes brings back better
performance. We are using Solr 7.4. What can be the issue here?


Re: Limit Solr Disk IO

2020-06-07 Thread Anshuman Singh
Hi Erick,

Thanks for your reply!
I have one more question which I think you missed in my previous email.
*"When our core size becomes ~100 G, indexing becomes really slow. Why is
this happening? Do we need to put a limit on how large each core can grow?"*

This question is unrelated to segments. I think I missed setting the
context properly in my previous email.

We have a collection with 20 shards and rf 2. Basically we want to hold
500M documents in each shard. Depending on our avg doc size (~1KB), it will
grow up to 400G. Is this shard size feasible or should we split it?

On Sat, Jun 6, 2020 at 10:50 PM Erick Erickson 
wrote:

> New segments are created when
> 1> the RAMBufferSizeMB is exceeded
> or
> 2> a commit happens.
>
> The maximum segment size defaults to 5G, but TieredMergePolicy can be
> configured in solrconfig.xml to have larger max sizes by setting
> maxMergedSegmentMB
>
> Depending on your indexing rate, requiring commits every 100K records may
> be too frequent, I have no idea what your indexing rate is. In general I
> prefer a time based autocommit policy. Say, for some reason, you stop
> indexing after 50K records. They’ll never be searchable unless you have a
> time-based commit. Besides, it’s much easier to explain to users “it may
> take 60 seconds for your doc to be searchable” than “well, depending on the
> indexing rate, it may be between 10 seconds and 6 hours for your docs to be
> searchable”. Of course if you’re indexing at a very fast rate, that may not
> matter.
>
> There’s no such thing as “low disk read during segment merging”. If 5
> segments need to be read, they all must be read in their entirety and the
> new segment must be completely written out. At best you can try to cut down
> on the number of times segment merges happen, but from what you’re
> describing that may not be feasible.
>
> Attachments are aggressively stripped by the mail server, your graph did
> not come through.
>
> Once a segment grows to the max size (5g by default), it is not merged
> again unless and until it accumulates quite a number of deleted documents.
> So one question is whether you update existing documents frequently. Is
> that the case? If not, then the index size really shouldn’t matter and your
> problem is something else.
>
> And I sincerely hope that part of your indexing does _NOT_ include
> optimize/forcemerge or expungeDeletes. Those are very expensive operations,
> and prior to Solr 7.5 would leave your index in an awkward state, see:
> https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/.
> There’s a link for how this is different in Solr 7.5+ in that article.
>
> But something smells fishy about this situation. Segment merging is
> typically not very noticeable. Perhaps you just have too much data on too
> small hardware? You’ve got some evidence that segment merging is the root
> cause, but I wonder if what’s happening is you’re just swapping instead?
> Segment merging will certainly increase the I/O pressure, but by and large
> that shouldn’t really affect search speed if the OS memory space is large
> enough to hold the important portions of your index. If the OS isn’t large
> enough, the additional I/O pressure from merging may be enough to start
> your system swapping which is A Bad Thing.
>
> See:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> for how Lucene uses MMapDirectory...
>
> Best,
> Erick
>
> > On Jun 6, 2020, at 11:29 AM, Anshuman Singh 
> wrote:
> >
> > Hi Eric,
> >
> > We are looking into TLOG/PULL replicas. But I have some doubts regarding
> segments. Can you explain what causes creation of a new segment and how
> large it can grow?
> > And this is my index config:
> > maxMergeAtOnce - 20
> > segmentsPerTier - 20
> > ramBufferSizeMB - 512 MB
> >
> > Can I configure these settings optimally for low disk read during
> segment merging? Like increasing segmentsPerTier may help but a large
> number of segments may impact search. And as per the documentation,
> ramBufferSizeMB can trigger segment merging so maybe that can be tweaked.
> >
> > One more question:
> > This graph is representing index time wrt core size (0-100G). Commits
> were happening automatically at every 100k records.
> >
> >
> >
> > As you can see the density of spikes is increasing as the core size is
> increasing. When our core size becomes ~100 G, indexing becomes really
> slow. Why is this happening? Do we need to put a limit on how large each
> core can grow?
> >
> >
> > On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson 
> wrote:
> > Have you considered TLOG/PULL replicas rather than NRT repli

Re: Limit Solr Disk IO

2020-06-06 Thread Anshuman Singh
Hi Erick,

We are looking into TLOG/PULL replicas. But I have some doubts regarding
segments. Can you explain what causes creation of a new segment and how
large it can grow?
And this is my index config:
maxMergeAtOnce - 20
segmentsPerTier - 20
ramBufferSizeMB - 512 MB

Can I configure these settings optimally for low disk reads during segment
merging? For example, increasing segmentsPerTier may help, but a large
number of segments may impact search. And as per the documentation,
ramBufferSizeMB can trigger segment merging, so maybe that can be tweaked.
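
As a side note, the per-core Segments API shows how many segments a core has
and how big each one is, which may help when judging the effect of these
settings (a sketch; the core name is a placeholder and the exact field names
may vary slightly by Solr version):

import requests

SOLR = "http://localhost:8983/solr"
CORE = "mycollection_shard1_replica_n1"  # placeholder core name

info = requests.get(f"{SOLR}/{CORE}/admin/segments", params={"wt": "json"}).json()

# Print each segment's doc count and on-disk size, largest first.
segments = info["segments"].values()
for seg in sorted(segments, key=lambda s: s["sizeInBytes"], reverse=True):
    print(seg["name"], seg["size"], "docs,", round(seg["sizeInBytes"] / 1e6, 1), "MB")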

One more question:
This graph represents index time w.r.t. core size (0-100G). Commits were
happening automatically at every 100k records.

[image: image.png]

As you can see, the density of spikes increases as the core size increases.
When our core size reaches ~100 G, indexing becomes really slow. Why is this
happening? Do we need to put a limit on how large each core can grow?


On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson 
wrote:

> Have you considered TLOG/PULL replicas rather than NRT replicas?
> That way, all the indexing happens on a single machine and you can
> use shards.preference to confine the searches happen on the PULL replicas,
> see:  https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
>
> No, you can’t really limit the number of segments. While that seems like a
> good idea, it quickly becomes counter-productive. Say you require that you
> have 10 segments. Say each one becomes 10G. What happens when the 11th
> segment is created and it’s 100M? Do you rewrite one of the 10G segments
> just
> to add 100M? Your problem gets worse, not better.
>
>
> Best,
> Erick
>
> > On Jun 5, 2020, at 1:41 AM, Anshuman Singh 
> wrote:
> >
> > Hi Nicolas,
> >
> > Commit happens automatically at 100k documents. We don't commit
> explicitly.
> > We didn't limit the number of segments. There are 35+ segments in each
> core.
> > But unrelated to the question, I would like to know if we can limit the
> > number of segments in the core. I tried it in the past but the merge
> > policies don't allow that.
> > The TieredMergePolicy has two parameters, maxMergeAtOnce and
> > segmentsPerTier. It seems like we cannot control the total number of
> > segments but only the segments per tier.(
> >
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> > )
> >
> >
> > On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck 
> > wrote:
> >
> >> The real questions are:
> >>
> >> * how much often do you commit (either explicitly or automatically)?
> >> * how much segments do you allow? If you only allow 1 segment,
> >>  then that whole segment is recreated using the old documents and the
> >> updates.
> >>  And yes, that requires reading the old segment.
> >>  It is common to allow multiple segments when you update often,
> >>  so updating does not interfere with reading the index too often.
> >>
> >>
> >>> On 4 Jun 2020, at 14:08, Anshuman Singh 
> >> wrote:
> >>>
> >>> I noticed that while indexing, when commit happens, there is high disk
> >> read
> >>> by Solr. The problem is that it is impacting search performance when
> the
> >>> index is loaded from the disk with respect to the query, as the disk
> read
> >>> speed is not quite good and the whole index is not cached in RAM.
> >>>
> >>> When no searching is performed, I noticed that disk is usually read
> >> during
> >>> commit operations and sometimes even without commit at low rate. I
> guess
> >> it
> >>> is read due to segment merge operations. Can it be something else?
> >>> If it is merging, can we limit disk IO during merging?
> >>
> >>
>
>


Re: Limit Solr Disk IO

2020-06-04 Thread Anshuman Singh
Hi Nicolas,

Commit happens automatically at 100k documents. We don't commit explicitly.
We didn't limit the number of segments. There are 35+ segments in each core.
But unrelated to the question, I would like to know if we can limit the
number of segments in the core. I tried it in the past but the merge
policies don't allow that.
The TieredMergePolicy has two parameters, maxMergeAtOnce and
segmentsPerTier. It seems like we cannot control the total number of
segments but only the segments per tier (see
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html).


On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck 
wrote:

> The real questions are:
>
> * how often do you commit (either explicitly or automatically)?
> * how many segments do you allow? If you only allow 1 segment,
>   then that whole segment is recreated using the old documents and the
> updates.
>   And yes, that requires reading the old segment.
>   It is common to allow multiple segments when you update often,
>   so updating does not interfere with reading the index too often.
>
>
> > On 4 Jun 2020, at 14:08, Anshuman Singh 
> wrote:
> >
> > I noticed that while indexing, when commit happens, there is high disk
> read
> > by Solr. The problem is that it is impacting search performance when the
> > index is loaded from the disk with respect to the query, as the disk read
> > speed is not quite good and the whole index is not cached in RAM.
> >
> > When no searching is performed, I noticed that disk is usually read
> during
> > commit operations and sometimes even without commit at low rate. I guess
> it
> > is read due to segment merge operations. Can it be something else?
> > If it is merging, can we limit disk IO during merging?
>
>


Limit Solr Disk IO

2020-06-04 Thread Anshuman Singh
I noticed that while indexing, when a commit happens, there is a high disk
read rate by Solr. The problem is that it impacts search performance when
the index has to be loaded from disk for a query, as the disk read speed is
not very good and the whole index is not cached in RAM.

When no searching is performed, I notice that the disk is usually read
during commit operations, and sometimes even without a commit at a low rate.
I guess this is due to segment merge operations. Could it be something else?
If it is merging, can we limit disk IO during merging?


Re: Solr multi core query too slow

2020-05-30 Thread Anshuman Singh
Thanks again, Erick, for pointing us in the right direction.

Yes, I am seeing heavy disk I/O while querying. I queried a single
collection. A query for 10 rows can cause 100-150 MB of disk reads on each
node. When querying for 1000 rows, disk reads are in the range of 2-7 GB per
node.

Is this normal? I didn't quite get what is happening behind the scenes. I
mean, just 1000 rows causing up to 2-7 GB of disk reads? It may be
something basic, but it would be helpful if you could shed some light on it.

It seems like disk I/O is the bottleneck here. With the amount of data we
are dealing with, is increasing the number of hosts the only option, or have
we missed something in configuring Solr properly?

On Fri, May 29, 2020 at 5:13 PM Erick Erickson 
wrote:

> Right, you’re running into the “laggard” problem, you can’t get the overall
> result back until every shard has responded. There’s an interesting
> parameter, "shards.info=true", which will give you some information about
> the time taken by the sub-search on each shard.
>
> But given your numbers, I think your root problem is that
> your hardware is overmatched. In total, you have 14B documents, correct?
> and a replication factor of 2. Meaning each of your 10 machines has 2.8
> billion docs in 128G total memory. Another way to look at it is that you
> have 7 x 20 x 2 = 280 replicas each with a 40G index. So each node has
> 28 replicas/node and handles over a terabyte of index in aggregate. At
> first
> blush, you’ve overloaded your hardware. My guess here is that one node or
> the other has to do a lot of swapping/gc/whatever quite regularly when
> you query. Given that you’re on HDDs, this can be quite expensive.
>
> I think you can simplify your analysis problem a lot by concentrating on
> a single machine, load it up and analyze it heavily. Monitor I/O, analyze
> GC and CPU utilization. My first guess is you’ll see heavy I/O. Once the
> index
> is read into memory, a well-sized Solr installation won’t see much I/O,
> _especially_ if you simplify the query to only ask for, say, some docValues
> field or rows=0. I think that your OS is swapping significant segments of
> the
> Lucene index in and out to satisfy your queries.
>
> GC is always a place to look. You should have GC logs available to see if
> you spend lots of CPU cycles in GC. I doubt that GC tuning will fix your
> performance issues, but it’s something to look at.
>
> A quick-n-dirty way to see if it’s swapping as I suspect is to monitor
> CPU utilization. A red flag is if it’s low and your queries _still_ take
> a long time. That’s a “smoking gun” that you’re swapping. While
> indicative, that’s not definitive since I’ve also seen CPU pegged because
> of GC, so if CPUs are running hot, you have to dig deeper...
>
>
> So to answer your questions:
>
> 1> I doubt decreasing the number of shards will help. I think you
>  simply have too many docs per node, changing the number of
>  shards isn’t going to change that.
>
> 2> A strongly suspect that increasing replicas will make the
>  problem _worse_ since it’ll just add more docs per node.
>
> 3> It Depends (tm). I wrote “the sizing blog” a long time ago because
>  this question comes up so often and is impossible to answer in the
>  abstract. Well, it’s possible in theory, but in practice there are
> just
>  too many variables. What you need to do to fully answer this in your
>  situation with your data is set up a node and “test it to
> destruction”.
>
>
> https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> I want to emphasize that you _must_ have realistic queries when
> you test as in that blog, and you _must_ have a bunch of them
> in order to not get fooled by hitting, say, your queryResultCache. I
> had one client who “stress tested” with the same query and was
> getting 3ms response times because, after the first one, they never
> needed to do any searching at all, everything was in caches. Needless
> to say that didn’t hold up when they used a realistic query mix.
>
> Best,
> Erick
>
> > On May 29, 2020, at 4:53 AM, Anshuman Singh 
> wrote:
> >
> > Thanks for your reply, Erick. You helped me in improving my understanding
> > of how Solr distributed requests work internally.
> >
> > Actually my ultimate goal is to improve search performance in one of our
> > test environment where the queries are taking upto 60 seconds to execute.
> > *We want to fetch at least the first top 100 rows in seconds (< 5
> seconds).
> > *
> >
> > Right now, we have 7 Collections across 10 Solr nodes, each Collection
> > having approx 2B records equally distributed across 20 shards with rf 2

Re: Solr multi core query too slow

2020-05-29 Thread Anshuman Singh
Thanks for your reply, Erick. You helped me improve my understanding
of how Solr's distributed requests work internally.

Actually, my ultimate goal is to improve search performance in one of our
test environments, where queries are taking up to 60 seconds to execute.
*We want to fetch at least the top 100 rows in seconds (< 5 seconds).*

Right now, we have 7 Collections across 10 Solr nodes, each Collection
having approx 2B records equally distributed across 20 shards with rf 2.
Each replica/core is ~40 GB in size. The number of users is very small
(<10). We are using HDDs and each host has 128 GB RAM. The Solr JVM heap
size is 24 GB. In the actual production environment, we are planning for 100
such machines, and we will be ingesting ~2B records daily. We will retain
data for up to 3 months.

I followed your suggestion of not querying more than 100 rows, and this is
my observation: running queries with the debugQuery param, I found that the
overall response time depends on the worst-performing shard, as some shards
take longer to execute the query than others.
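
A sketch of how the per-shard contribution can be broken down with the
shards.info parameter, which can be cross-checked against the debugQuery
output (the collection name is a placeholder):

import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # placeholder

rsp = requests.get(f"{SOLR}/{COLLECTION}/select", params={
    "q": "*:*",
    "rows": 100,
    "shards.info": "true",  # adds a per-shard section to the response
}).json()

# Each key is a shard address; "time" shows which shard is the laggard.
for shard_url, info in rsp["shards.info"].items():
    print(shard_url, "time =", info["time"], "numFound =", info["numFound"])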

Here are my questions:

   1.  Is decreasing the number of shards going to help us, as there will be
   fewer shards to query?
   2.  Is increasing the number of replicas going to help us, as there will be
   load balancing across them?
   3.  How many records should we keep in each Collection or in each
   replica/core? Will we face performance issues if the core size becomes too
   big?

Any other suggestions are appreciated.

On Wed, May 27, 2020 at 9:23 PM Erick Erickson 
wrote:

> First of all, asking for that many rows will spend a lot of time
> gathering the document fields. Assuming you have stored fields,
> each doc requires
> 1> the aggregator node getting the candidate 10 docs from each shard
>
> 2> The aggregator node sorting those 10 docs from each shard into the
> true top 10 based on the sort criteria (score by default)
>
> 3> the aggregator node going back to the shards and asking them for those
> docs of that 10 that are resident on that shard
>
> 4> the aggregator node assembling the final docs to be sent to the client
> and sending them.
>
> So my guess is that when you fire requests at a particular replica that
> has to get them from the other shard’s replica on another host, the network
> back-and-forth is killing your perf. It’s not that your network is having
> problems, just that you’re pushing a lot of data back and forth in your
> poorly-performing cases.
>
> So first of all, specifying 100K rows is an anti-pattern. Outside of
> streaming, Solr is built on the presumption that you’re after the top few
> rows (< 100, say). The times vary a lot depending on whether you need to
> read stored fields BTW.
>
> Second, I suspect your test is bogus. If you run the tests in the order
> you gave, the first one will read the necessary data from disk and probably
> have it in the OS disk cache for the second and subsequent. And/or you’re
> getting results from your queryResultCache (although you’d have to have a
> big one). Specifying the exact same query when trying to time things is
> usually a mistake.
>
> If your use-case requires 100K rows, you should be using streaming or
> cursorMark. While that won’t make the end-to-end time shorter, but won’t
> put such a strain on the system.
>
> Best,
> Erick
>
> > On May 27, 2020, at 10:38 AM, Anshuman Singh 
> wrote:
> >
> > I have a Solr cloud setup (Solr 7.4) with a collection "test" having two
> > shards on two different nodes. There are 4M records equally distributed
> > across the shards.
> >
> > If I query the collection like below, it is slow.
> > http://localhost:8983/solr/*test*/select?q=*:*=10
> > QTime: 6930
> >
> > If I query a particular shard like below, it is also slow.
> > http://localhost:8983/solr/*test_shard1_replica_n2*
> > /select?q=*:*=10=*shard2*
> > QTime: 5494
> > *Notice shard2 in shards parameter and shard1 in the core being queried.*
> >
> > But this is faster:
> > http://localhost:8983/solr/*test_shard1_replica_n2*
> > /select?q=*:*=10=*shard1*
> > QTime: 57
> >
> > This is also faster:
> > http://localhost:8983/solr/*test_shard2_replica_n4*
> > /select?q=*:*=10=*shard2*
> > QTime: 71
> >
> > I don't think it is the network as I performed similar tests with a
> single
> > node setup as well. If you query a particular core and the corresponding
> > logical shard, it is much faster than querying a different shard or core.
> >
> > Why is this behaviour? How to make the first two queries work as fast as
> > the last two queries?
>
>


Solr multi core query too slow

2020-05-27 Thread Anshuman Singh
I have a Solr cloud setup (Solr 7.4) with a collection "test" having two
shards on two different nodes. There are 4M records equally distributed
across the shards.

If I query the collection like below, it is slow.
http://localhost:8983/solr/*test*/select?q=*:*&rows=10
QTime: 6930

If I query a particular shard like below, it is also slow.
http://localhost:8983/solr/*test_shard1_replica_n2*
/select?q=*:*&rows=10&shards=*shard2*
QTime: 5494
*Notice shard2 in the shards parameter and shard1 in the core being queried.*

But this is faster:
http://localhost:8983/solr/*test_shard1_replica_n2*
/select?q=*:*&rows=10&shards=*shard1*
QTime: 57

This is also faster:
http://localhost:8983/solr/*test_shard2_replica_n4*
/select?q=*:*&rows=10&shards=*shard2*
QTime: 71

I don't think it is the network as I performed similar tests with a single
node setup as well. If you query a particular core and the corresponding
logical shard, it is much faster than querying a different shard or core.

Why does this happen? How can I make the first two queries as fast as the
last two?


Why Solr query time is more in case the searched value frequency is more even if no sorting is applied, for the same number of rows?

2020-05-11 Thread Anshuman Singh
Suppose I have two phone numbers, P1 and P2, and the number of records with
P1 is X while the number with P2 is 2X. If I query for R rows for P1 and for
P2, the QTime for P2 is higher. I am not specifying any sort parameter, and
the number of rows I'm asking for is the same in both cases, so why such a
difference?

I understand that if I sort on some field, Solr has to go through all the
matching documents and sort them before returning the requested rows. But
without sorting, can't it just read the first R matching documents from the
index? In that case, I would expect the QTime to depend not on the total
number of matching documents but on the requested number of rows.