Re: Replica shard stuck at initializing after client and data node restart

2014-03-24 Thread Glenn Snead
Mark, thank you for responding.  It is quite odd, but what happened this 
morning is stranger.  

I dropped and recreated the replica shard on the unused index. Now the 
in-use index shows Green.
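
For reference, the drop and recreate was done with the index settings API, 
roughly like this ("unused_index" is just a placeholder for the real index 
name):

  curl -XPUT localhost:9200/unused_index/_settings -d '{
    "index" : { "number_of_replicas" : 0 }
  }'

  curl -XPUT localhost:9200/unused_index/_settings -d '{
    "index" : { "number_of_replicas" : 1 }
  }'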

FYI, we're running ES version 0.90, and my in-use index is 717 GB with 
135M+ documents.

On Friday I ran status reports on each index and compared the two. Nothing 
showed as failed, red, or otherwise wrong, so I left it alone over the 
weekend. When I came in today the cluster was still Yellow.
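
By "status reports" I just mean the stock health and status endpoints, 
something along these lines (index name is a placeholder):

  curl -XGET 'localhost:9200/_cluster/health?pretty=true'
  curl -XGET 'localhost:9200/my_index/_status?pretty=true'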

Any idea whether recreating the other index's replica shard caused the 
cluster's status to go Green? It feels like a fluke, but I'm new to ES.

If this is indeed expected ES behavior, I'll add this step to my restore 
procedures.

On Friday, March 21, 2014 9:48:27 PM UTC-4, Mark Walkom wrote:

 What version are you running?

 It's odd this would happen if, when you set replicas to zero, the cluster 
 state is green and your index is OK.



Replica shard stuck at initializing after client and data node restart

2014-03-21 Thread Glenn Snead
I have a six-node cluster: 2 master nodes and 4 client/data nodes. I have 
two indices: one with data and one set aside for future use. I'm having 
trouble with the index that is in use.
After making some limits.conf configuration changes and restarting the 
impacted nodes, one of my indices' replica shards will not complete 
initialization.
I wasn't in charge of the node restarts, but here is the sequence of events:
- Shut down the client and data nodes on each of the four servers.
- Start the client and data node on each server.
I don't believe enough time was allowed for the cluster to reallocate or 
move shards between those steps (see the note below).
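
From what I've read since, the usual approach for this kind of restart is to 
disable shard allocation first and re-enable it once the nodes are back. If 
I have the 0.90 setting name right, that would be something like:

  curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : { "cluster.routing.allocation.disable_allocation" : true }
  }'

  (restart the nodes and wait for them to rejoin)

  curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : { "cluster.routing.allocation.disable_allocation" : false }
  }'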

limits.conf changes:
- memlock unlimited
hard nofile 32000
soft nofile 32000
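
As I understand it, the memlock line only matters if mlockall is turned on 
in elasticsearch.yml (bootstrap.mlockall: true on 0.90), and the new nofile 
limit only takes effect once the ES process is restarted under it. What the 
running nodes actually see can be checked with the nodes info API, e.g.:

  curl -XGET 'localhost:9200/_nodes?pretty=true'

and looking for max_file_descriptors under each node's process section.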

Here's what I have tried thus far:

Attempt 1:
- Drop the replica shard, which brings the cluster status to Green.
- Verify the cluster's status: no replication, no reallocation, etc.
- Re-add the replica shard.

Attempt 2:
- Drop the replica shard and stop the data nodes that were to carry the 
replica shard.
- Verify the cluster's status.
- Start the data nodes and allow the cluster to reallocate primary shards. 
The cluster's status is Green.
- Add the replica shard to the index. The replica shard never completes 
initialization, even over a 24-hour period.
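
While it sits like that, the shard-level cluster health shows the replica 
stuck in INITIALIZING the whole time, e.g.:

  curl -XGET 'localhost:9200/_cluster/health?level=shards&pretty=true'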

I've checked the transaction log files on each node and they are all 
zero-length files.
The nodes holding the replica shards also host the primary shards for the 
unused index.
Those nodes initially copied their matching primaries' index size (as seen 
in Paramedic), but now Paramedic shows an index size of only a few bytes. 
The index folder on the replica shard servers still has the data.
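
The cluster state tells the same story: the routing table shows that shard 
copy sitting in INITIALIZING on the replica node, e.g.:

  curl -XGET 'localhost:9200/_cluster/state?pretty=true'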

Unknown to me, my target system was put online, and my leadership doesn't 
want to schedule an outage window. Most of my research suggests that I drop 
the impacted index and re-initialize it. I can replace the data, but this 
would impact the user interface while the index re-ingests the documents. 
This issue has occurred before on my test system, and the fix was to 
rebuild the index. However, I never learned why the replica shard had the 
issue in the first place.

My questions are:
- Does the replica-hosting server's index size (shown in Paramedic) 
indicate a course of action?
- Is it possible to resolve this without dropping the index and rebuilding? 
I'd hate to resort to that each time we attempt ES server maintenance or 
configuration changes.
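
As a concrete example of what I mean by resolving it in place: would 
cancelling the stuck recovery with the cluster reroute API, so that it 
retries, be a sane thing to try? I haven't tested this, and I'm not certain 
the cancel command is available on my 0.90.x, but I gather it looks 
something like this (index name, shard number, and node name below are 
placeholders):

  curl -XPOST localhost:9200/_cluster/reroute -d '{
    "commands" : [
      { "cancel" : { "index" : "my_index", "shard" : 0, "node" : "replica_node" } }
    ]
  }'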



Re: Replica shard stuck at initializing after client and data node restart

2014-03-21 Thread Mark Walkom
What version are you running?

It's odd this would happen if, when you set replicas to zero, the cluster
state is green and your index is OK.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



