Well, I don't know what happened, but suddenly the shard replicated. I was trying to copy the index to a new one with stream2es, and when the copy started the cluster state turned green.
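For anyone landing here later: stream2es was the tool used for the copy. Assuming its documented `es` source/target mode, and with placeholder host and index names, the copy would look roughly like this (a sketch, not the exact commands used):

```shell
# Sketch: copy MY_INDEX into a fresh index with stream2es.
# Host and index names are placeholders; adjust --source/--target to taste.
curl -O https://download.elasticsearch.org/stream2es/stream2es && chmod +x stream2es
./stream2es es \
  --source http://localhost:9200/MY_INDEX \
  --target http://localhost:9200/MY_INDEX_COPY
```

Copying every document re-indexes it through the target's primaries, which is presumably why the copy coincided with the replica recovering.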
The problem, for me, is solved.

On Thursday, July 24, 2014 11:13:41 AM UTC-3, Antonio Augusto Santos wrote:
>
> Dear,
>
> I upgraded from 1.2.2 to 1.3.0 today and ran into one issue.
> I have a 3-node cluster (one node without data) running on CentOS 6.5. I did
> a rolling upgrade: first I upgraded the no-data node (server_0), then
> shut down server_1, upgraded it (with the RPM), and restarted it. All shards
> came back to life with no problem, and my cluster was green.
> Then I shut down node 3 (server_2), upgraded it, and restarted ES. After that,
> almost everything is back to normal except one shard of one index, and
> I'm getting the following error on server_2:
>
> [2014-07-24 10:47:59,575][WARN ][cluster.action.shard ] [server_2] [MY_INDEX][0] received shard failed for [MY_INDEX][0], node[Y0EJ2oh2QI-cdh2Jxi9z4A], [R], s[INITIALIZING], indexUUID [OR_0aHy6TIiHZVK_9PaPBQ],
> reason [Failed to start shard, message [RecoveryFailedException[[MY_INDEX][0]: Recovery failed from [server_1][pkRqLLmtS8iUxF80uQJzFw][server_1][inet[/XXX.XXX.XXX.001:9300]]{master=true} into [server_2][Y0EJ2oh2QI-cdh2Jxi9z4A][server_2][inet[/XXX.XXX.XXX.002:9300]]{master=true}];
> nested: RemoteTransportException[[server_1][inet[/XXX.XXX.XXX.001:9300]][index/shard/recovery/startRecovery]];
> nested: RecoveryEngineException[[MY_INDEX][0] Phase[2] Execution failed];
> nested: RemoteTransportException[[server_2][inet[/XXX.XXX.XXX.002:9300]][index/shard/recovery/prepareTranslog]];
> nested: EngineCreationFailureException[[MY_INDEX][0] failed to open reader on writer];
> nested: FileNotFoundException[No such file [_drr_Lucene45_0.dvm]]; ]]
>
> From what I gather from the message, server_1 is trying to send
> *_drr_Lucene45_0.dvm* to server_2 but can't find it. I looked in
> */var/lib/elasticsearch/MY_CLUSTER/nodes/0/indices/MY_INDEX/0/index* on
> server_1, and there is no such file, but there is a *_drr_Lucene49_0.dvm*.
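As an editorial aside: those Lucene docvalues filenames encode a segment name, a codec format, and a generation, which is what makes the mismatch visible. A small illustrative helper (the function and regex are mine, not part of any Elasticsearch tool) shows how to pull the parts apart and compare what recovery asked for against what is actually on disk:

```python
import re

# Lucene docvalues files are named _<segment>_<Format>_<gen>.<ext>,
# e.g. _drr_Lucene45_0.dvm. This hypothetical helper splits one apart.
DV_FILE = re.compile(
    r"^_(?P<segment>[^_]+)_(?P<codec>Lucene\d+)_(?P<gen>\d+)\.(?P<ext>dvd|dvm)$"
)

def parse_dv_filename(name):
    m = DV_FILE.match(name)
    if not m:
        raise ValueError("not a docvalues file: %s" % name)
    return m.groupdict()

wanted  = parse_dv_filename("_drr_Lucene45_0.dvm")  # what recovery asked for
on_disk = parse_dv_filename("_drr_Lucene49_0.dvm")  # what server_1 actually has

# Same segment (_drr), same generation (0), but a different docvalues codec:
# the segment on server_1 was written with the newer Lucene49 format.
assert wanted["segment"] == on_disk["segment"]
assert wanted["gen"] == on_disk["gen"]
assert wanted["codec"] != on_disk["codec"]
```

That pattern (the upgraded primary holding a Lucene49-format file while the recovering replica still expects the Lucene45 name) is consistent with the two nodes briefly disagreeing about the segment's format during the rolling upgrade.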
> I've checked, and both servers are running 1.3.0:
>
> # ps -ef | grep elastic
> 498 121131 1 55 10:46 ? 00:11:41 /usr/bin/java -Xms5g -Xmx5g -Xss256k
>   -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>   -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>   -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC
>   -Djna.tmpdir=/usr/share/elasticsearch/tmp
>   -Djava.io.tmpdir=/usr/share/elasticsearch/tmp -Delasticsearch
>   -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid
>   -Des.path.home=/usr/share/elasticsearch
>   -cp :/usr/share/elasticsearch/lib/elasticsearch-1.3.0.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>   -Des.default.path.home=/usr/share/elasticsearch
>   -Des.default.path.logs=/var/log/elasticsearch
>   -Des.default.path.data=/var/lib/elasticsearch
>   -Des.default.path.work=/usr/share/elasticsearch/tmp
>   -Des.default.path.conf=/etc/elasticsearch
>   org.elasticsearch.bootstrap.Elasticsearch
>
> # ps -ef | grep elastic
> 498 25573 1 59 10:41 ? 00:15:15 /usr/bin/java -Xms5g -Xmx5g -Xss256k
>   (same flags and classpath as above, including elasticsearch-1.3.0.jar)
>
> I've already restarted ES on server_2, to no avail.
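As a sanity check beyond eyeballing `ps` output, the running version can be read straight out of the classpath, since the Elasticsearch jar is named after its version. A small illustrative snippet (the helper and regex are mine):

```python
import re

# The elasticsearch jar on the classpath names the running version,
# e.g. .../lib/elasticsearch-1.3.0.jar. Extract it from a ps(1) line.
def es_version_from_ps(ps_line):
    m = re.search(r"elasticsearch-(\d+\.\d+\.\d+)\.jar", ps_line)
    return m.group(1) if m else None

ps_line = ("/usr/bin/java -Xms5g -Xmx5g -cp "
           ":/usr/share/elasticsearch/lib/elasticsearch-1.3.0.jar:"
           "/usr/share/elasticsearch/lib/*")
print(es_version_from_ps(ps_line))  # -> 1.3.0
```

Alternatively, `curl localhost:9200` on each node reports `version.number` directly from the running process, which avoids any doubt about which jar was actually loaded.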
> I haven't restarted server_1 because I'm afraid of losing the data that is
> there (my searches appear to be working OK and returning the expected
> results).
>
> Maybe something went wrong during the update? Any suggestions on how to
> fix this problem?
>
> Cheers

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8f632464-d749-4794-a5b4-fca31e475470%40googlegroups.com.
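In case anyone else hits the same stuck replica: what effectively resolved it here was forcing the replica to be rebuilt from the primary. A more direct way to do that with the 1.x settings API is to drop the index's replica count to 0 (deleting the broken copy) and then raise it back (a sketch; the host and index name are placeholders, and it assumes the primary on server_1 is healthy):

```shell
# Sketch only: drop replicas so the broken replica copy is discarded...
curl -XPUT 'http://localhost:9200/MY_INDEX/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'
# ...then restore them so a fresh replica is copied from the primary.
curl -XPUT 'http://localhost:9200/MY_INDEX/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
# Watch the cluster go yellow -> green while the new replica recovers.
curl 'http://localhost:9200/_cluster/health?pretty'
```

Because only the replica was failing (the primary kept serving searches, as noted above), this does not touch the good copy of the data.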