Re: Solr Cloud Hangs consistently .

2013-06-21 Thread Erick Erickson
Thanks for letting us know!

Erick

On Wed, Jun 19, 2013 at 7:18 AM, Rishi Easwaran rishi.easwa...@aol.com wrote:
 Update!!

 Got SOLR Cloud working. I was able to do 90k document inserts with
 replicationFactor=2 using my JMeter script; previously it was getting stuck
 at 3k inserts or fewer.
 After some investigation, I figured out that the ulimits for my process were
 not being set properly, so the OS defaults were kicking in, and those are
 very small for a server app.
 One of our install scripts had changed.
 I had to raise the ulimits (-n, -u, -v), and for now no other issues have
 been seen.







Re: Solr Cloud Hangs consistently .

2013-06-19 Thread Rishi Easwaran
Update!!

Got SOLR Cloud working. I was able to do 90k document inserts with
replicationFactor=2 using my JMeter script; previously it was getting stuck
at 3k inserts or fewer.
After some investigation, I figured out that the ulimits for my process were
not being set properly, so the OS defaults were kicking in, and those are
very small for a server app.
One of our install scripts had changed.
I had to raise the ulimits (-n, -u, -v), and for now no other issues have
been seen.
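
A minimal sketch of how the three limits named above could be inspected from inside a running process. It assumes a Unix host and uses Python's standard `resource` module, which is not something used in this thread; the mapping of `-n`, `-u` and `-v` to `RLIMIT_NOFILE`, `RLIMIT_NPROC` and `RLIMIT_AS` follows the usual shell convention. The helper names are made up for illustration.

```python
import resource

# ulimit flag -> (description, resource constant).
# Note: the shell reports -v in kilobytes, while RLIMIT_AS is in bytes.
ULIMIT_FLAGS = {
    "-n": ("open files", resource.RLIMIT_NOFILE),
    "-u": ("max user processes", resource.RLIMIT_NPROC),
    "-v": ("virtual memory", resource.RLIMIT_AS),
}


def _fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {flag: (description, soft, hard)} for the current process."""
    limits = {}
    for flag, (label, const) in ULIMIT_FLAGS.items():
        soft, hard = resource.getrlimit(const)
        limits[flag] = (label, _fmt(soft), _fmt(hard))
    return limits


if __name__ == "__main__":
    for flag, (label, soft, hard) in current_limits().items():
        print(f"ulimit {flag} ({label}): soft={soft} hard={hard}")
```

Running a check like this under the same account and launcher that starts the server is what matters, since limits set in an interactive shell do not necessarily apply to a process started by an install or init script.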


 

 


Re: Solr Cloud Hangs consistently .

2013-06-18 Thread Rishi Easwaran
Mark,

All I am doing is inserts, so afaik search-side deadlocks should not be an
issue.

I am using JMeter, the standard test driver we use for most of our benchmarks
and stats collection.
My jmeter.jmx file is at http://apaste.info/79IS ; maybe I overlooked
something.

Is there a benchmark script that the Solr community uses (preferably with
JMeter)? We are write-heavy, so at the moment we are focusing on inserts only.

Thanks,

Rishi.

 

 


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Mark Miller
Could you give a simple stack trace dump as well?

It's likely the distributed update deadlock that has been reported a few times 
now - I think usually with a replication factor greater than 2, but I can't be 
sure. The deadlock involves sending docs concurrently to replicas and I 
wouldn't have expected it to be so easily hit with only 2 replicas per shard. I 
should be able to tell from a stack trace though.

If it is that, it's on my short list to investigate (been there a long time now 
though - but I still hope to look at it soon).

- Mark

On Jun 17, 2013, at 1:44 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

 
 
 Hi All,
 
 I am trying to benchmark SOLR Cloud and it consistently hangs.
 There is nothing in the logs: no stack trace, no errors, no warnings. It just
 seems stuck.
 
 A little bit about my setup.
 I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and a 1TB SSD. Each host
 is configured to have 8 SOLR Cloud nodes running at 4GB each.
 JVM configs: http://apaste.info/57Ai
 
 My cluster has 12 shards with replication factor 2: http://apaste.info/09sA
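
The numbers in the setup described above can be cross-checked with a short calculation. This is only a sketch: it assumes that "4GB each" refers to the JVM heap per node and that replicas are spread evenly across nodes, neither of which is confirmed in the post. The function name is hypothetical.

```python
def cluster_layout(hosts=3, nodes_per_host=8, heap_gb_per_node=4,
                   ram_gb_per_host=96, shards=12, replication_factor=2):
    """Derive node, core and heap totals from the figures in the post."""
    total_nodes = hosts * nodes_per_host
    total_cores = shards * replication_factor
    return {
        "total_nodes": total_nodes,
        "total_cores": total_cores,
        # Assumes an even spread of cores over nodes.
        "cores_per_node": total_cores / total_nodes,
        # Assumes 4GB is the heap size of each node's JVM.
        "heap_gb_per_host": nodes_per_host * heap_gb_per_node,
        "ram_left_gb_per_host": ram_gb_per_host - nodes_per_host * heap_gb_per_node,
    }


if __name__ == "__main__":
    for key, value in cluster_layout().items():
        print(f"{key}: {value}")
```

Under those assumptions, 24 nodes host 24 cores, so each node carries a single core and each host commits 32GB of its 96GB to heap.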
 
 I originally started with SOLR 4.2, Tomcat 5 and JDK 6, as we are already
 running this configuration in production in non-Cloud form.
 It got stuck repeatedly.
 
 I decided to upgrade to the latest and greatest of everything: SOLR 4.3, JDK 7
 and Tomcat 7.
 It still shows the same behaviour and hangs during the test.
 
 My test schema and config.
 Schema.xml - http://apaste.info/imah
 SolrConfig.xml - http://apaste.info/ku4F
 
 The test is pretty simple. It's a JMeter test that sends the update command
 via SOAP RPC (round-robin requests across every node), adding 5 fields from a
 CSV file: id, guid, subject, body, compositeID (guid!id).
 Number of JMeter threads = 150, loop count = 20, number of messages to add per
 guid = 3; total 150*3*20 = 9000 documents.
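
The test-plan arithmetic and the guid!id composite key above can be sketched as follows. The function names and the sample field values are hypothetical; only the five field names, the `!` separator and the thread, loop and message counts come from the post.

```python
def planned_document_count(threads=150, loops=20, messages_per_guid=3):
    """Total documents the plan adds if every request succeeds."""
    return threads * loops * messages_per_guid


def composite_id(guid, doc_id):
    """Join guid and id with '!' in the guid!id form described above."""
    return f"{guid}!{doc_id}"


def build_document(guid, doc_id, subject, body):
    """Assemble the five fields listed in the post for one document."""
    return {
        "id": doc_id,
        "guid": guid,
        "subject": subject,
        "body": body,
        "compositeID": composite_id(guid, doc_id),
    }


if __name__ == "__main__":
    print(planned_document_count())
    # Made-up values, purely for illustration.
    print(build_document("guid-1", "1", "hello", "test body"))
```

Comparing the planned count with the number of documents actually indexed gives a quick measure of how far a run got before it stalled.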
 
 When the cloud gets stuck, I don't get anything in the logs, but when I run
 netstat I see the following.
 Sample netstat on a stuck run: http://apaste.info/hr0O
 hycl-d20 is my JMeter host; ssd-d01/2/3 are my cloud hosts.
 
 At the moment my benchmarking efforts are at a standstill.
 
 Any help from the community would be great. I got some heap dumps and stack
 dumps, but haven't found a smoking gun yet.
 If I can provide anything else to diagnose this issue, just let me know.
 
 Thanks,
 
 Rishi.
 
 
 
 
 
 
 
 



Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
Mark,

I got a few stack dumps of the instance that was stuck (ssdtest-d03:8011):

http://apaste.info/cofK
http://apaste.info/sv4M
http://apaste.info/cxUf

I can get dumps of the others if needed.

Thanks,

Rishi.

 


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
FYI: you can ignore the http4ClientExpiryService thread in the stack dump.
It's a dummy executor service I created to test something out, unrelated to
this issue.
 

 

 


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Yago Riveiro
I can confirm that the deadlock happens with only 2 replicas per shard. I need
to shut down one node that hosts a replica of the shard to recover indexing
capability.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)



Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Rishi Easwaran
Update!!

This happens with replicationFactor=1 as well.
Just for kicks, I created a collection with 24 shards and replicationFactor=1
on my existing benchmark env.
Same behaviour: SOLR Cloud just hangs. There is nothing in the logs, and
top/heap/CPU and most other metrics look fine.
The only indication seems to be netstat showing incoming requests not being
read in.
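
One way to spot the symptom described above is to filter `netstat -tn` output for sockets with a non-zero Recv-Q. The sketch below assumes that "not being read in" shows up as data waiting in Recv-Q and that the output uses the usual Linux column layout; the sample text and its addresses are invented for illustration and are not taken from the linked paste.

```python
def stalled_sockets(netstat_output, min_recv_q=1):
    """Return (local, remote, recv_q, state) for TCP rows with unread data."""
    stalled = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        if not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        recv_q = int(parts[1])
        if recv_q >= min_recv_q:
            stalled.append((parts[3], parts[4], recv_q, parts[5]))
    return stalled


# Invented sample in `netstat -tn` layout, not real output from this cluster.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:8011           10.0.0.20:51000         ESTABLISHED
tcp     4096      0 10.0.0.1:8011           10.0.0.20:51001         ESTABLISHED
"""

if __name__ == "__main__":
    for row in stalled_sockets(SAMPLE):
        print(row)
```

A Recv-Q that stays high on the server port across repeated samples suggests the application has stopped reading from the socket, which is consistent with a process that cannot get more threads or file descriptors.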
 
Yago,

I saw your previous post
(http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631).
Following it, I upgraded to SOLR 4.3 last week to see if the issue would be
fixed, but no luck.
Looks like this is a dominant and easily reproducible issue on SOLR Cloud.


Thanks,

Rishi. 





 

 

 


Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Mark Miller
If it actually happens with replicationFactor=1, it likely doesn't have
anything to do with the update handler issue I'm referring to. In some cases
like these, people have better luck with Jetty than Tomcat - we test it much
more. For instance, it's set up to help avoid search-side distributed deadlocks.

In any case, there is something special about it - I have seen a lot of
heavy indexing to SolrCloud by me and others without running into this, both
with replicationFactor=1 and greater. So there is something specific in how
the load is being done or what features/methods are being used that likely
causes it or makes it easier to cause.

But again, the issue I know about involves threads that are not even created in 
the replicationFactor = 1 case, so that could be a first report afaik.

- Mark

On Jun 17, 2013, at 5:52 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:

 Update!!
 
 This happens with replicationFactor=1
 Just for kicks I created a collection with a 24 shards, replicationfactor=1 
 cluster on my exisiting benchmark env.
 Same behaviour, SOLR cloud just hangs. Nothing in the logs, top/heap/cpu most 
 metrics looks fine.
 Only indication seems to be netstat showing incoming request not being read 
 in.
 
 Yago,
 
 I saw your previous post 
 (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
 Following it, Last week, I upgraded to SOLR 4.3, to see if the issue gets 
 fixed, but no luck.
 Looks like this is a dominant and easily reproducible issue on SOLR cloud.
 
 
 Thanks,
 
 Rishi. 
 
 
 
 
 
 
 
 
 
 
 
 -Original Message-
 From: Yago Riveiro yago.rive...@gmail.com
 To: solr-user solr-user@lucene.apache.org
 Sent: Mon, Jun 17, 2013 5:15 pm
 Subject: Re: Solr Cloud Hangs consistently .
 
 
 I can confirm that the deadlock happen with only 2 replicas by shard. I need 
 shutdown one node that host a replica of the shard to recover the indexation 
 capability.
 
 -- 
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
 On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:
 
 
 
 Hi All,
 
 I am trying to benchmark SOLR Cloud and it consistently hangs. 
 Nothing in the logs, no stack trace, no errors, no warnings, just seems 
 stuck.
 
 A little bit about my set up. 
 I have 3 benchmark hosts, each with 96GB RAM, 24 CPU's and 1TB SSD. Each 
 host 
 is configured to have 8 SOLR cloud nodes running at 4GB each.
 JVM configs: http://apaste.info/57Ai
 
 My cluster has 12 shards with replication factor 2- http://apaste.info/09sA
 
 I originally stated with SOLR 4.2., tomcat 5 and jdk 6, as we are already 
 running this configuration in production in Non-Cloud form. 
 It got stuck repeatedly.
 
 I decided to upgrade to the latest and greatest of everything, SOLR 4.3, 
 JDK7 
 and tomcat7. 
 It still shows same behaviour and hangs through the test.
 
 My test schema and config.
 Schema.xml - http://apaste.info/imah
 SolrConfig.xml - http://apaste.info/ku4F
 
 The test is pretty simple. It's a JMeter test that sends update commands via 
 SOAP RPC (round-robin requests across every node), adding 5 fields from a 
 CSV file: id, guid, subject, body, compositeID (guid!id).
 Number of JMeter threads = 150, loop count = 20, number of messages to add 
 per guid = 3; total 150*3*20 = 9000 documents. 
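
The load arithmetic and the compositeID layout described above can be sketched as follows. This is only an illustration under assumptions: the function and constant names are hypothetical and are not taken from the actual jmeter.jmx file; only the numbers and the five field names come from the message.

```python
# Hypothetical reconstruction of the test parameters from the message;
# none of these names come from the real JMeter script.
JMETER_THREADS = 150
LOOP_COUNT = 20
MESSAGES_PER_GUID = 3


def total_documents(threads=JMETER_THREADS, loops=LOOP_COUNT,
                    per_guid=MESSAGES_PER_GUID):
    """Total documents the run is expected to insert."""
    return threads * per_guid * loops


def composite_id(guid, doc_id):
    """Build the compositeID field as guid!id, as described in the message."""
    return "{}!{}".format(guid, doc_id)


def make_document(guid, doc_id, subject, body):
    """Assemble the five fields listed in the message into one document."""
    return {
        "id": doc_id,
        "guid": guid,
        "subject": subject,
        "body": body,
        "compositeID": composite_id(guid, doc_id),
    }


if __name__ == "__main__":
    print(total_documents())
    print(make_document("guid-1", "msg-1", "hello", "test body"))
```

Comparing the expected total against the document count actually indexed gives a quick indication of how far a stuck run got before it hung.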
 
 When the cloud gets stuck, I don't get anything in the logs, but when I run 
 netstat I see the following.
 Sample netstat on a stuck run: http://apaste.info/hr0O 
 hycl-d20 is my JMeter host; ssd-d01/2/3 are my cloud hosts.
 
 At the moment my benchmarking efforts are at a standstill.
 
 Any help from the community would be great. I got some heap dumps and stack 
 dumps, but haven't found a smoking gun yet.
 If I can provide anything else to diagnose this issue, just let me know.
 
 Thanks,
 
 Rishi. 
 
 
 



Re: Solr Cloud Hangs consistently .

2013-06-17 Thread Yago Riveiro
I do all the indexing through HTTP POST. With replicationFactor=1 there is no 
problem; if it is higher, deadlock problems can appear.

A stack trace like this one is what I get: 
http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067862
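
For readers who want to reproduce indexing through HTTP POST, here is a minimal sketch. It rests on assumptions: the host, port and collection name are placeholders, and it assumes the node's /update handler accepts a JSON array of documents when the Content-Type is application/json. The request is only built here; sending it is left commented out.

```python
import json
import urllib.request

# Placeholder values, not taken from the thread.
SOLR_BASE_URL = "http://localhost:8983/solr"
COLLECTION = "benchmark"


def build_update_request(docs, base_url=SOLR_BASE_URL, collection=COLLECTION):
    """Build (but do not send) an HTTP POST carrying a batch of documents.

    Assumes the update handler lives at <base_url>/<collection>/update and
    accepts a JSON array of documents.
    """
    url = "{}/{}/update".format(base_url, collection)
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_update_request([{"id": "1", "subject": "hello"}])
    print(request.get_method(), request.full_url)
    # To actually send it, with a timeout so a hung node does not block
    # the client forever:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(response.status)
```

Setting a client-side timeout matters here, since a hung cluster otherwise leaves the indexing client blocked with no error.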

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 17, 2013 at 11:03 PM, Mark Miller wrote:

 If it actually happens with replicationFactor=1, it likely doesn't have 
 anything to do with the update handler issue I'm referring to. In some cases 
 like these, people have better luck with Jetty than Tomcat; we test it much 
 more. For instance, it's set up to help avoid search-side distributed 
 deadlocks.
 
 In any case, there is something special about it. I have seen, and still see, 
 a lot of heavy indexing to SolrCloud by me and others without running into 
 this, both with replicationFactor=1 and greater. So there is something 
 specific in how the load is being done or what features/methods are being 
 used that likely causes it or makes it easier to cause.
 
 But again, the issue I know about involves threads that are not even created 
 in the replicationFactor = 1 case, so that could be a first report afaik.
 
 - Mark
 
 On Jun 17, 2013, at 5:52 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:
 
  Update!!
  
  This happens with replicationFactor=1.
  Just for kicks I created a collection with 24 shards and replicationFactor=1 
  on my existing benchmark env.
  Same behaviour, SOLR cloud just hangs. Nothing in the logs; top/heap/CPU and 
  most metrics look fine.
  The only indication seems to be netstat showing incoming requests not being 
  read in.
  
  Yago,
  
  I saw your previous post 
  (http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html#a4067631)
  Following it, I upgraded to SOLR 4.3 last week to see if the issue was 
  fixed, but no luck.
  Looks like this is a dominant and easily reproducible issue on SOLR cloud.
  
  
  Thanks,
  
  Rishi.
  
  -Original Message-
  From: Yago Riveiro yago.rive...@gmail.com
  To: solr-user solr-user@lucene.apache.org
  Sent: Mon, Jun 17, 2013 5:15 pm
  Subject: Re: Solr Cloud Hangs consistently .
  
  
  I can confirm that the deadlock happens with only 2 replicas per shard. I 
  need to shut down one node that hosts a replica of the shard to recover 
  indexing capability.
  
  -- 
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
  
  
  On Monday, June 17, 2013 at 6:44 PM, Rishi Easwaran wrote:
  
   
   
   Hi All,
   
   I am trying to benchmark SOLR Cloud and it consistently hangs. 
   Nothing in the logs, no stack trace, no errors, no warnings, just seems 
   stuck.
   
   A little bit about my setup. 
   I have 3 benchmark hosts, each with 96GB RAM, 24 CPUs and 1TB SSD. Each 
   host is configured to have 8 SOLR cloud nodes running at 4GB each.
   JVM configs: http://apaste.info/57Ai
   
   My cluster has 12 shards with replication factor 2- 
   http://apaste.info/09sA
   
   I originally started with SOLR 4.2, Tomcat 5 and JDK 6, as we are already 
   running this configuration in production in non-cloud form. 
   It got stuck repeatedly.
   
   I decided to upgrade to the latest and greatest of everything: SOLR 4.3, 
   JDK 7 and Tomcat 7. 
   It still shows the same behaviour and hangs during the test.
   
   My test schema and config.
   Schema.xml - http://apaste.info/imah
   SolrConfig.xml - http://apaste.info/ku4F
   
   The test is pretty simple. It's a JMeter test that sends update commands 
   via SOAP RPC (round-robin requests across every node), adding 5 fields 
   from a CSV file: id, guid, subject, body, compositeID (guid!id).
   Number of JMeter threads = 150, loop count = 20, number of messages to add 
   per guid = 3; total 150*3*20 = 9000 documents. 
   
   When the cloud gets stuck, I don't get anything in the logs, but when I 
   run netstat I see the following.
   Sample netstat on a stuck run: http://apaste.info/hr0O 
   hycl-d20 is my JMeter host; ssd-d01/2/3 are my cloud hosts.
   
   At the moment my benchmarking efforts are at a standstill.
   
   Any help from the community would be great. I got some heap dumps and 
   stack dumps, but haven't found a smoking gun yet.
   If I can provide anything else to diagnose this issue, just let me know.
   
   Thanks,
   
   Rishi.