Disk Watermark issues with 1.4.0

2014-12-01 Thread Chris Neal
Hi all,

I'm running 1.4.0 and using the default settings for:

cluster.routing.allocation.disk.watermark.low
and
cluster.routing.allocation.disk.watermark.high
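
(For reference, both settings can be checked and changed at runtime through the cluster settings API.  The values below are just the documented 1.4 defaults as I understand them -- 85% used / 15% free for low, 90% used for high -- and localhost:9200 is assumed:)

# show any settings that have been overridden from the defaults
curl -s 'localhost:9200/_cluster/settings?pretty'

# set the watermarks explicitly (illustrative values only)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}'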

I hit an OOME that forced me to restart (cycle) a node, and all shards that
should live on that node stayed unallocated once I brought it back up.


There was no notification anywhere that I had hit any disk space limits, at
least none that I could find.  I tried cycling the node again; nothing.  It wasn't
until I tried to manually reroute one of the shards that I got an indication of
what was going on:


root@ip-10-0-0-45:bddevw07[1038]:~ ./reroute
{"error":"RemoteTransportException[[elasticsearch-ip-10-0-0-12][inet[/10.0.0.12:9300]][cluster:admin/reroute]];
nested: ElasticsearchIllegalArgumentException[[allocate] allocation of
[derbysoft-20141130][0] on node
[elasticsearch-ip-10-0-0-45][Li1yyXUHR8qQn6QHCSahCg][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}
is not allowed, reason: [YES(shard is not allocated to same node or
host)][YES(node passes include/exclude/require filters)][YES(primary is
already active)][YES(below shard recovery limit of [2])][YES(allocation
disabling is ignored)][YES(allocation disabling is ignored)][YES(no
allocation awareness enabled)][YES(total shard limit disabled: [-1] =
0)][YES(target node version [1.4.0] is same or newer than source node
version [1.4.0])][NO(less than required [15.0%] free disk on node, free:
[15.0%])][YES(shard not primary or relocation disabled)]]; ","status":400}
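
(For anyone curious, ./reroute itself isn't shown here; a manual allocation like the one rejected above would normally be done with the cluster reroute API, something like the following -- the index, shard, and node values are the ones from the error, and localhost:9200 is assumed:)

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "allocate": {
        "index": "derbysoft-20141130",
        "shard": 0,
        "node": "elasticsearch-ip-10-0-0-45"
      }
    }
  ]
}'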

Then I cleaned up some disk space, but there was no automatic re-allocation
afterwards.  Once I tried again to manually re-route a single shard, ALL of
them began rerouting.
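
(A few commands that are useful at this point, in case it helps anyone hitting the same thing -- the _cat endpoints show per-node disk usage and any unassigned shards, and an empty reroute, as far as I know, just triggers an allocation pass without naming a specific shard; localhost:9200 assumed:)

# per-node disk usage and shard counts
curl -s 'localhost:9200/_cat/allocation?v'

# any shards still waiting to be assigned
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED

# kick off an allocation pass without moving a specific shard
curl -XPOST 'localhost:9200/_cluster/reroute'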

My questions are:

   - Is there a notification log message somewhere that I missed that would
   have let me know what was going on?  If not, there sure should be!
   - Should the shard allocation process have started automatically once I
   got the disk space issue resolved?


Thanks!
Chris



Re: Disk Watermark issues with 1.4.0

2014-12-01 Thread Nikolas Everett
On Mon, Dec 1, 2014 at 11:28 AM, Chris Neal chris.n...@derbysoft.net
wrote:

 My questions are:

- Is there a notification log message somewhere that I missed that
would have let me know what was going on?  If not, there sure should be!


A WARN log message, emitted every 30 seconds, was added in the very latest release.
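
(I don't remember the exact wording, but grepping the node logs for the watermark / disk threshold decider is a reasonable way to spot it; /var/log/elasticsearch is just the common default path, yours may differ:)

grep -iE 'watermark|disk threshold' /var/log/elasticsearch/*.log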



- Should the shard allocation process have started automatically once
I got the disk space issue resolved?


If you have unallocated shards it should kick in after a few seconds.  It
takes a few seconds for the cluster to notice the change in free disk space.  If
there aren't unallocated shards I've sometimes found that I need to manually
shift a shard around to prime the pump.  I'm not sure if that has been
fixed recently though.
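
(If memory serves, how quickly the cluster notices the freed space is governed by the disk-usage polling interval, cluster.info.update.interval, which defaults to roughly 30s.  Shortening it as a transient setting is one way to test that; the 10s value is just illustrative:)

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.info.update.interval": "10s" }
}'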

I don't think that disk space should prevent a shard from coming up on a
node that already has it though.  I imagine that depends on how much data
has to be copied to that node but I'm not sure.

Nik



Re: Disk Watermark issues with 1.4.0

2014-12-01 Thread Chris Neal
Thanks for the quick reply Nik :)

I've got updating to 1.4.1 on my TODO list for today, as I see there were
some entries in the release notes pertaining to this as well.  I might let
things fill up again in Dev and see what happens.

Maybe I wasn't patient enough for the rerouting to start on its own.  It
seems like I waited several minutes before I did it manually, but I'll pay
more attention the next time.

Thanks again for the input.
Chris

