We've restarted that node and it *seemed* to be working its way back to normality...
But the LockReleaseFailedException is here to stay:

    [2013-12-17 04:43:53,962][WARN ][cluster.action.shard ] [Porcupine] [zapier_legacy][0] sending failed shard for [zapier_legacy][0], node[QToCnTWtQLCWySMnbjm2IQ], [P], s[INITIALIZING], indexUUID [pzWL-WO_SsaGbuWfn2IQaw], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[zapier_legacy][0] failed recovery]; nested: EngineCreationFailureException[[zapier_legacy][0] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /var/data/elasticsearch/Rage Against the Machine/nodes/0/indices/zapier_legacy/0/index/write.lock]; ]]

Any thoughts?

On Monday, December 16, 2013 8:25:52 PM UTC-8, Bryan Helmig wrote:
>
> Here are some logs from the start of the incident:
>
> https://gist.github.com/bryanhelmig/3c17edfe5c4e9065e5a3
>
> And basically these logs over and over:
>
> https://gist.github.com/bryanhelmig/cfb9303bc033a1183701
>
> A little background:
>
> The cluster is 3 nodes on AWS & EBS, 100 shards (50 primaries & 50
> replicas), and just this single shard (so far) got corrupted (?). We're at
> about 800gb of data and we're using routing keys to keep it all (mostly)
> sane among shards. Here is the topograph of the cluster from ES Head:
>
> http://i.imgur.com/zJa9Beh.png
>
> I think it happened as it tried to relocate a shard. Now it refuses to
> start the engine?
>
> Thanks!
> -bryan
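In case it helps anyone hitting the same thing: before removing a Lucene write.lock by hand, it's worth confirming that no live process actually holds it. A rough sketch below (the LOCK path here is a stand-in for illustration; the real path is the one from the exception in the log above, and the lock should only ever be removed with the node fully stopped):

```shell
# Sketch only: check whether a Lucene write.lock is still held before
# touching it. LOCK is a stand-in path; substitute the path from the
# LockReleaseFailedException message.
LOCK="${TMPDIR:-/tmp}/write.lock"
touch "$LOCK"   # simulate a leftover lock file

if command -v lsof >/dev/null 2>&1 && lsof -- "$LOCK" >/dev/null 2>&1; then
  echo "lock is held by a live process -- do NOT delete it"
else
  echo "lock appears stale"
  # With the node fully stopped, a stale lock can be removed:
  # rm -- "$LOCK"
fi
```

If lsof reports a holder, the safer move is stopping that Elasticsearch process cleanly rather than force-unlocking.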
