Disk Watermark issues with 1.4.0
Hi all,

I'm running 1.4.0 and using the default settings for cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high.

I hit an OOME which forced me to cycle a node, and then all shards that should live on that node stayed unallocated once I brought it back up. There was no notification anywhere, at least that I could find, that I had hit any disk space limits. I tried cycling again; nothing. It wasn't until I tried to manually reroute one of the shards that I got an indication of what was going on:

root@ip-10-0-0-45:bddevw07[1038]:~ ./reroute
{"error":"RemoteTransportException[[elasticsearch-ip-10-0-0-12][inet[/10.0.0.12:9300]][cluster:admin/reroute]]; nested: ElasticsearchIllegalArgumentException[[allocate] allocation of [derbysoft-20141130][0] on node [elasticsearch-ip-10-0-0-45][Li1yyXUHR8qQn6QHCSahCg][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true} is not allowed, reason: [YES(shard is not allocated to same node or host)][YES(node passes include/exclude/require filters)][YES(primary is already active)][YES(below shard recovery limit of [2])][YES(allocation disabling is ignored)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)][YES(total shard limit disabled: [-1] = 0)][YES(target node version [1.4.0] is same or newer than source node version [1.4.0])][NO(less than required [15.0%] free disk on node, free: [15.0%])][YES(shard not primary or relocation disabled)]","status":400}

Then I cleaned up some disk space, but there was no auto re-allocation afterwards. Once I again tried to manually re-route a shard, then ALL of them began rerouting.

My questions are:
- Is there a notification log message somewhere that I missed that would have let me know what was going on? If not, there sure should be!
- Should the shard allocation process have started automatically once I resolved the disk space issue?

Thanks!
Chris -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAND3DphhmWn-amiDBrmYi4rB_tYZa7%3Dn2M9PF5jVY%3DfhPTqMpg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
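[Editor's note: the watermark settings mentioned above are dynamic, so they can be inspected and adjusted at runtime. A minimal sketch with curl, assuming a node reachable at localhost:9200 (host/port and the 90% value are placeholders, not from the thread); these endpoints require a running cluster:

```shell
# Per-node disk usage as the allocation decider sees it.
curl -s 'localhost:9200/_cat/allocation?v'

# Any watermark values that have been set dynamically (defaults come
# from elasticsearch.yml and won't show here).
curl -s 'localhost:9200/_cluster/settings?pretty'

# Temporarily raise the low watermark (the "don't allocate here" line)
# as a transient setting, e.g. while cleaning up disk space.
curl -s -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%"
  }
}'
```

The "less than required [15.0%] free disk" in the error above corresponds to the default low watermark of 85% disk used.]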
Re: Disk Watermark issues with 1.4.0
On Mon, Dec 1, 2014 at 11:28 AM, Chris Neal chris.n...@derbysoft.net wrote:
> [snip]
> Is there a notification log message somewhere that I missed that would
> have let me know what was going on? If not, there sure should be!
A WARN log message, repeated every 30 seconds, was added in the very last release.

> Should the shard allocation process have started automatically once I
> got the disk space issue resolved?

If you have unallocated shards it should kick in after a few seconds. It takes a few seconds for the cluster to notice the change in free disk. If there aren't unallocated shards I've sometimes found that I need to manually shift a shard around to prime the pump. I'm not sure if that has been fixed recently, though.

I don't think that disk space should prevent a shard from coming up on a node that already has it, though. I imagine that depends on how much data has to be copied to that node, but I'm not sure.

Nik
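[Editor's note: the "prime the pump" workaround can be done through the cluster reroute API. An empty reroute asks the master to re-run allocation without moving anything in particular; a sketch, assuming a node at localhost:9200 (host/port are placeholders) and using the index/node names from the error output above purely as examples:

```shell
# Nudge the master to retry allocation of unassigned shards.
curl -s -XPOST 'localhost:9200/_cluster/reroute'

# Or explicitly move one shard to get things going.
curl -s -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "move": {
        "index": "derbysoft-20141130", "shard": 0,
        "from_node": "elasticsearch-ip-10-0-0-12",
        "to_node": "elasticsearch-ip-10-0-0-45"
    } }
  ]
}'
```
]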
Re: Disk Watermark issues with 1.4.0
Thanks for the quick reply Nik :)

I've got updating to 1.4.1 on my TODO list for today, as I see there were some updates in the release notes pertaining to this as well. I might let things fill up again in Dev and see what happens. Maybe I wasn't patient enough for the rerouting to start on its own. It seems like I waited several minutes before I did it manually, but I'll pay more attention the next time.

Thanks again for the input.
Chris

On Mon, Dec 1, 2014 at 10:35 AM, Nikolas Everett nik9...@gmail.com wrote:
> [snip]
> If you have unallocated shards it should kick in after a few seconds. It
> takes a few seconds for the cluster to notice the change in disk free. If
> there aren't unallocated shards I've sometime found that I need to
> manually shift a shard around to prime the pump.