[jira] [Commented] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583657#comment-14583657 ]

Erick Erickson commented on SOLR-6875:
--------------------------------------

Do any of the logs on the leaders mention leader-initiated recovery? And how fast are you sending documents to Solr? I've seen situations where flooding Solr with too many updates causes some wonky behavior, because there are some inefficiencies in how leaders talk to replicas. See Tim Potter's blog here: http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/

The symptom I saw was two-fold:
1. The leader forced the follower into recovery. No errors were reported on the follower, just a timeout on the leader.
2. There were a bazillion updates coming in as fast as possible, and a lot of threads from ConcurrentUpdateSolrServer were outstanding on the leader.

Not saying this is your problem, but if you see something like this, it'd be good to know when tracking this down. If you don't have followers going down, then this isn't the issue.

No data integrity between replicas
----------------------------------

Key: SOLR-6875
URL: https://issues.apache.org/jira/browse/SOLR-6875
Project: Solr
Issue Type: Bug
Affects Versions: 4.10.2
Environment: One replica is @ Linux solr1.devops.wegohealth.com 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Solr is running with the following options:
* -Xms12G
* -Xmx16G
* -XX:+UseConcMarkSweepGC
* -XX:+UseLargePages
* -XX:+CMSParallelRemarkEnabled
* -XX:+ParallelRefProcEnabled
* -XX:+UseLargePages
* -XX:+AggressiveOpts
* -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.
Attachments: replica1.png, replica2.png

Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
With indexing stopped, one replica of a shard (Solr1) shows 45 574 039 docs and the other (Solr1.1) shows 45 574 038 docs. Solr1 is the leader; these errors appeared in the logs:
{code}
ERROR - 2014-12-20 09:54:38.783; org.apache.solr.update.StreamingSolrServers$1; error
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:196)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
	at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
	at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
	at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
WARN - 2014-12-20 09:54:38.787; org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending update
java.net.SocketException: Connection reset
	at
[jira] [Commented] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272877#comment-14272877 ]

Alexander S. commented on SOLR-6875:
------------------------------------

Now we have 4 shards, each with 2 replicas (8 nodes in total), and this is what we see:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445
Shard 2:
  Replica 1: 14 496 609
  Replica 2: 14 496 609
Shard 3:
  Replica 1: 14 492 812
  Replica 2: 14 492 812
Shard 4:
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}
How could this happen? We didn't see anything like this before upgrading from 4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?
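Per-replica counts like these are only comparable if each number comes from a single replica's own index, taken after a commit with indexing stopped. A minimal sketch in Java (9+, for List.of), assuming the numbers were fetched by querying each core directly with distrib=false; the base URL, the core name, and the helper names countUrl and spread are hypothetical placeholders, not taken from this cluster. It reuses the counts reported for the 4-shard setup and flags any shard whose replicas disagree.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplicaCountCheck {

    // Hypothetical helper: builds a per-core count query. distrib=false asks the core
    // to answer from its own index only, so numFound reflects that single replica.
    static String countUrl(String baseUrl, String coreName) {
        return baseUrl + "/" + coreName + "/select?q=*:*&rows=0&distrib=false&wt=json";
    }

    // Largest minus smallest count among one shard's replicas; 0 means they agree.
    static long spread(List<Long> replicaCounts) {
        if (replicaCounts.isEmpty()) {
            return 0L;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long c : replicaCounts) { // each Long is unboxed to long
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Per-replica doc counts from the comment above, keyed by shard (order preserved).
        Map<String, List<Long>> counts = new LinkedHashMap<>();
        counts.put("shard1", List.of(14_486_089L, 14_496_445L));
        counts.put("shard2", List.of(14_496_609L, 14_496_609L));
        counts.put("shard3", List.of(14_492_812L, 14_492_812L));
        counts.put("shard4", List.of(14_488_755L, 14_488_755L));

        for (Map.Entry<String, List<Long>> entry : counts.entrySet()) {
            long diff = spread(entry.getValue());
            String status = diff == 0 ? "in sync" : "OUT OF SYNC by " + diff + " docs";
            System.out.println(entry.getKey() + ": " + status);
        }

        // Example of the request one would send to each replica (placeholder host/core).
        System.out.println(countUrl("http://solr1:8983/solr", "collection1_shard1_replica1"));
    }
}
```

Run as-is, it flags only shard1 (a spread of 10356 docs) and reports the other three shards as in sync. Using max minus min rather than a pairwise comparison keeps the check valid for any replication factor.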