Re: Solr Cloud Hangs consistently .
Thanks for letting us know!

Erick

On Wed, Jun 19, 2013 at 7:18 AM, Rishi Easwaran rishi.easwa...@aol.com wrote:

Update!!

Got SOLR cloud working. I was able to do 90k document inserts with replicationFactor=2 with my jmeter script; previously it was getting stuck at 3k inserts or fewer. After some investigation, I figured out that the ulimits for my process were not being set properly and the OS defaults were kicking in, which are very small for a server app. One of our install scripts had changed. I had to raise the ulimits (-n, -u, -v), and for now no other issues have been seen.
Re: Solr Cloud Hangs consistently .
Update!!

Got SOLR cloud working. I was able to do 90k document inserts with replicationFactor=2 with my jmeter script; previously it was getting stuck at 3k inserts or fewer. After some investigation, I figured out that the ulimits for my process were not being set properly and the OS defaults were kicking in, which are very small for a server app. One of our install scripts had changed. I had to raise the ulimits (-n, -u, -v), and for now no other issues have been seen.

-Original Message-
From: Rishi Easwaran rishi.easwa...@aol.com
To: solr-user solr-user@lucene.apache.org
Sent: Tue, Jun 18, 2013 10:40 am
Subject: Re: Solr Cloud Hangs consistently .

Mark,

All I am doing is inserts; afaik search-side deadlocks should not be an issue. I am using JMeter, the standard test driver we use for most of our benchmarks and stats collection.
My jmeter.jmx file: http://apaste.info/79IS (maybe I overlooked something).

Is there a benchmark script that the Solr community uses (preferably with JMeter)? We are write heavy, so at the moment we are focusing on inserts only.

Thanks,
Rishi.
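The ulimit fix described in the update above can be checked from inside a process. The following is a minimal sketch, assuming a Unix host and Python's standard `resource` module; the mapping of the `-n`, `-u` and `-v` flags to `RLIMIT_NOFILE`, `RLIMIT_NPROC` and `RLIMIT_AS` is the usual shell convention, and the helper names (`current_limits`, `report`) are made up for illustration. It reports the limits of the Python process itself, not of a running Solr JVM.

```python
import resource

# ulimit flags named in the message, mapped to their rlimit constants:
#   -n -> open file descriptors
#   -u -> processes/threads per user
#   -v -> virtual address space
LIMITS = {
    "-n": resource.RLIMIT_NOFILE,
    "-u": resource.RLIMIT_NPROC,
    "-v": resource.RLIMIT_AS,
}


def fmt(value):
    """Render an rlimit value the way `ulimit` does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {flag: (soft, hard)} for the current process."""
    return {flag: resource.getrlimit(res) for flag, res in LIMITS.items()}


def report():
    """Build one human-readable line per limit."""
    return [
        f"ulimit {flag}: soft={fmt(soft)} hard={fmt(hard)}"
        for flag, (soft, hard) in current_limits().items()
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

Because limits are inherited from the launching shell or init script, running this from the same install script that starts the servlet container shows what the JVM would inherit. On Linux, the limits of an already running process can also be read from `/proc/<pid>/limits`.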
Re: Solr Cloud Hangs consistently .
Mark,

All I am doing is inserts; afaik search-side deadlocks should not be an issue. I am using JMeter, the standard test driver we use for most of our benchmarks and stats collection.
My jmeter.jmx file: http://apaste.info/79IS (maybe I overlooked something).

Is there a benchmark script that the Solr community uses (preferably with JMeter)? We are write heavy, so at the moment we are focusing on inserts only.

Thanks,
Rishi.

-Original Message-
From: Yago Riveiro yago.rive...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jun 17, 2013 6:19 pm
Subject: Re: Solr Cloud Hangs consistently .

I do all the indexing through HTTP POST. With replicationFactor=1 there is no problem; if it is higher, deadlock problems can appear.

A stack trace like this one is what I get: http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862

--
Yago Riveiro
Re: Solr Cloud Hangs consistently .
Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times now - I think usually with a replication factor greater than 2, but I can't be sure. The deadlock involves sending docs concurrently to replicas and I wouldn't have expected it to be so easily hit with only 2 replicas per shard. I should be able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

Hi All,

I am trying to benchmark SOLR Cloud and it consistently hangs. Nothing in the logs, no stack trace, no errors, no warnings; it just seems stuck.

A little bit about my setup. I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each.
JVM configs: http://apaste.info/57Ai
My cluster has 12 shards with replication factor 2: http://apaste.info/09sA

I originally started with SOLR 4.2, tomcat 5 and jdk 6, as we are already running this configuration in production in non-cloud form. It got stuck repeatedly. I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK7 and tomcat7. It still shows the same behaviour and hangs through the test.

My test schema and config:
Schema.xml - http://apaste.info/imah
SolrConfig.xml - http://apaste.info/ku4F

The test is pretty simple. It's a jmeter test with the update command via SOAP rpc (round robin requests across every node), adding in 5 fields from a csv file: id, guid, subject, body, compositeID (guid!id). Number of jmeter threads = 150, loop count = 20, number of messages to add per guid = 3; total 150*3*20 = 9000 documents.

When the cloud gets stuck, I don't get anything in the logs, but when I run netstat I see the following. Sample netstat on a stuck run: http://apaste.info/hr0O
hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.

At the moment my benchmarking efforts are at a standstill. Any help from the community would be great. I got some heap dumps and stack dumps, but haven't found a smoking gun yet. If I can provide anything else to diagnose this issue, just let me know.

Thanks,
Rishi.
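The workload arithmetic in the original message (150 threads, 20 loops, 3 messages per guid, 9000 documents) and the compositeID layout (guid!id) can be sketched as follows. This is only an illustration: the thread, loop and message counts and the five field names come from the message, while the guid and id naming scheme, the placeholder subject/body text, and the assumption that each thread uses one guid per loop iteration are invented here.

```python
THREADS = 150        # jmeter threads
LOOPS = 20           # loop count per thread
MSGS_PER_GUID = 3    # messages added per guid

# Cluster layout figures from the message.
HOSTS, NODES_PER_HOST = 3, 8
SHARDS, REPLICATION_FACTOR = 12, 2


def composite_id(guid, doc_id):
    """Join guid and id with '!' as in the compositeID field (guid!id)."""
    return f"{guid}!{doc_id}"


def generate_docs():
    """Yield one dict per document with the five fields from the message."""
    for thread in range(THREADS):
        for loop in range(LOOPS):
            guid = f"guid-{thread}-{loop}"          # hypothetical guid scheme
            for msg in range(MSGS_PER_GUID):
                doc_id = f"{thread}-{loop}-{msg}"   # hypothetical id scheme
                yield {
                    "id": doc_id,
                    "guid": guid,
                    "subject": f"subject {doc_id}",
                    "body": f"body {doc_id}",
                    "compositeID": composite_id(guid, doc_id),
                }


docs = list(generate_docs())
total_nodes = HOSTS * NODES_PER_HOST
total_replicas = SHARDS * REPLICATION_FACTOR

print(len(docs), total_nodes, total_replicas)
```

With these figures the run is small: 9000 documents spread over 24 nodes, and 12 shards with 2 replicas each also comes to 24 replicas, which would be one replica per node if they are spread evenly.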
Re: Solr Cloud Hangs consistently .
Mark,

I got a few stack dumps of the instance that was stuck, ssdtest-d03:8011:
http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

I can get dumps of others if needed.

Thanks,
Rishi.

-Original Message-
From: Mark Miller markrmil...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jun 17, 2013 1:57 pm
Subject: Re: Solr Cloud Hangs consistently .

Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times now - I think usually with a replication factor greater than 2, but I can't be sure. The deadlock involves sending docs concurrently to replicas and I wouldn't have expected it to be so easily hit with only 2 replicas per shard. I should be able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now though - but I still hope to look at it soon).

- Mark
Re: Solr Cloud Hangs consistently .
FYI, you can ignore the http4ClientExpiryService thread in the stack dump. It's a dummy executor service I created to test out something, unrelated to this issue.

-Original Message-
From: Rishi Easwaran rishi.easwa...@aol.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jun 17, 2013 2:54 pm
Subject: Re: Solr Cloud Hangs consistently .

Mark,

I got a few stack dumps of the instance that was stuck, ssdtest-d03:8011:
http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

I can get dumps of others if needed.

Thanks,
Rishi.
Re: Solr Cloud Hangs consistently .
I can confirm that the deadlock happens with only 2 replicas per shard. I need to shut down one node that hosts a replica of the shard to recover indexing capability.

--
Yago Riveiro

On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:

Hi All,

I am trying to benchmark SOLR Cloud and it consistently hangs. Nothing in the logs, no stack trace, no errors, no warnings; it just seems stuck.

A little bit about my setup. I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and 1TB SSD. Each host is configured to have 8 SOLR cloud nodes running at 4GB each.
JVM configs: http://apaste.info/57Ai
My cluster has 12 shards with replication factor 2: http://apaste.info/09sA

I originally started with SOLR 4.2, tomcat 5 and jdk 6, as we are already running this configuration in production in non-cloud form. It got stuck repeatedly. I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK7 and tomcat7. It still shows the same behaviour and hangs through the test.

My test schema and config:
Schema.xml - http://apaste.info/imah
SolrConfig.xml - http://apaste.info/ku4F

The test is pretty simple. It's a jmeter test with the update command via SOAP rpc (round robin requests across every node), adding in 5 fields from a csv file: id, guid, subject, body, compositeID (guid!id). Number of jmeter threads = 150, loop count = 20, number of messages to add per guid = 3; total 150*3*20 = 9000 documents.

When the cloud gets stuck, I don't get anything in the logs, but when I run netstat I see the following. Sample netstat on a stuck run: http://apaste.info/hr0O
hycl-d20 is my jmeter host. ssd-d01/2/3 are my cloud hosts.

At the moment my benchmarking efforts are at a standstill. Any help from the community would be great. I got some heap dumps and stack dumps, but haven't found a smoking gun yet. If I can provide anything else to diagnose this issue, just let me know.

Thanks,
Rishi.
Re: Solr Cloud Hangs consistently .
Update!!

This happens with replicationFactor=1. Just for kicks I created a collection with 24 shards and replicationFactor=1 on my existing benchmark env. Same behaviour, SOLR cloud just hangs. Nothing in the logs; top/heap/cpu, most metrics look fine. The only indication seems to be netstat showing incoming requests not being read in.

Yago, I saw your previous post (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631). Following it, last week I upgraded to SOLR 4.3 to see if the issue gets fixed, but no luck. Looks like this is a dominant and easily reproducible issue on SOLR cloud.

Thanks,
Rishi.

-Original Message-
From: Yago Riveiro yago.rive...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Mon, Jun 17, 2013 5:15 pm
Subject: Re: Solr Cloud Hangs consistently .

I can confirm that the deadlock happens with only 2 replicas per shard. I need to shut down one node that hosts a replica of the shard to recover indexing capability.

--
Yago Riveiro
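The netstat symptom mentioned in this update (incoming requests not being read in) usually shows up as a non-zero Recv-Q on established connections: the kernel has received bytes that the application has not yet read. The sketch below filters such sockets out of Linux-style `netstat -tn` output. The sample text is invented for illustration and is not the content of the linked paste; only the host names and the 8011 port echo the thread, and `unread_sockets` is a made-up helper name.

```python
# Invented sample in the column layout of Linux `netstat -tn`;
# the queue sizes and client ports are NOT real data from the thread.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp      742      0 ssd-d01:8011            hycl-d20:51234          ESTABLISHED
tcp        0      0 ssd-d01:8011            hycl-d20:51240          ESTABLISHED
tcp     1460      0 ssd-d01:8012            hycl-d20:51302          ESTABLISHED
"""


def unread_sockets(netstat_text):
    """Return (local, remote, recv_q) for established TCP sockets with unread bytes."""
    results = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a full TCP row.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        recv_q = int(parts[1])
        if recv_q > 0 and parts[5] == "ESTABLISHED":
            results.append((parts[3], parts[4], recv_q))
    return results


for local, remote, queued in unread_sockets(SAMPLE):
    print(f"{local} <- {remote}: {queued} bytes waiting to be read")
```

A growing list of such sockets while the logs stay silent suggests the server has stopped accepting or reading requests, which fits a process that has run out of some resource rather than one that is throwing errors.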
Re: Solr Cloud Hangs consistently .
If it actually happens with replicationFactor=1, it likely doesn't have anything to do with the update handler issue I'm referring to.

In some cases like these, people have better luck with Jetty than Tomcat - we test it much more. For instance, it's set up to help avoid search-side distributed deadlocks.

In any case, there is something special about it - I have done and seen a lot of heavy indexing to SolrCloud, by me and others, without running into this, both with replicationFactor=1 and greater. So there is something specific in how the load is being done or what features/methods are being used that likely causes it or makes it easier to cause.

But again, the issue I know about involves threads that are not even created in the replicationFactor=1 case, so that could be a first report afaik.

- Mark

On Jun 17, 2013, at 5:52 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

Update!!

This happens with replicationFactor=1. Just for kicks I created a collection with 24 shards and replicationFactor=1 on my existing benchmark env. Same behaviour, SOLR cloud just hangs. Nothing in the logs; top/heap/cpu, most metrics look fine. The only indication seems to be netstat showing incoming requests not being read in.

Yago, I saw your previous post (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631). Following it, last week I upgraded to SOLR 4.3 to see if the issue gets fixed, but no luck. Looks like this is a dominant and easily reproducible issue on SOLR cloud.

Thanks,
Rishi.
Re: Solr Cloud Hangs consistently .
I do all the indexing through HTTP POST. With replicationFactor=1 there is no problem; if it is higher, deadlock problems can appear.

A stack trace like this one is what I get: http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862

--
Yago Riveiro

On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote:

If it actually happens with replicationFactor=1, it likely doesn't have anything to do with the update handler issue I'm referring to.

In some cases like these, people have better luck with Jetty than Tomcat - we test it much more. For instance, it's set up to help avoid search-side distributed deadlocks.

In any case, there is something special about it - I have done and seen a lot of heavy indexing to SolrCloud, by me and others, without running into this, both with replicationFactor=1 and greater. So there is something specific in how the load is being done or what features/methods are being used that likely causes it or makes it easier to cause.

But again, the issue I know about involves threads that are not even created in the replicationFactor=1 case, so that could be a first report afaik.

- Mark
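Yago's message mentions indexing through plain HTTP POST. As a rough sketch of what such a request can look like, the snippet below builds (but does not send) a JSON update request with Python's standard library. The base URL, the collection name `benchmark` and the helper `build_update_request` are hypothetical, and the handler path and accepted body format are assumptions that depend on the Solr version and solrconfig.xml in use.

```python
import json
import urllib.request


def build_update_request(base_url, collection, docs):
    """Build a POST request carrying a JSON array of documents.

    Assumes an update handler at <base_url>/<collection>/update that
    accepts JSON; check this against the actual Solr configuration.
    """
    url = f"{base_url}/{collection}/update"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and collection, with fields named as in the thread.
request = build_update_request(
    "http://localhost:8983/solr",
    "benchmark",
    [{"id": "1", "guid": "g1", "subject": "s", "body": "b", "compositeID": "g1!1"}],
)
print(request.get_method(), request.full_url)
```

Sending it would be `urllib.request.urlopen(request, timeout=30)`. Passing a timeout is worth doing in a load test like the one discussed here, since it turns a hung node into a client-side error instead of a thread that blocks forever.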