Regarding the corruption: that is a Lucene problem that we are working to address. The corruption might have happened a while ago already; you only happen to be seeing it now. There are no integrity checks in Lucene yet, so if an index is corrupted due to a hardware failure, you will also replicate the corruption if you recover from one of the corrupted shards - which is why the same shard can end up corrupted on every node. It's hard to address this at the Elasticsearch level at this point; we will have to fix it in Lucene in the very near future.
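If you want to confirm whether a given shard's Lucene index is damaged, you can point Lucene's CheckIndex tool at the shard directory shown in the stack trace. This is a hedged sketch, not an official procedure: the jar path and Lucene version below are assumptions for an Elasticsearch 1.0.1 install (which ships Lucene 4.6.x), and the node must be stopped (or the index closed) before you run it.

```shell
# Stop the node first; CheckIndex must never run against a live index.
# The jar location and version are assumptions -- adjust to your install.
java -cp /opt/elasticsearch/lib/lucene-core-4.6.1.jar \
  org.apache.lucene.index.CheckIndex \
  /opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index

# If it reports broken segments and you accept losing the documents in
# them, re-run with -fix to drop those segments. Back up the directory
# first -- -fix is destructive:
#   java -cp /opt/elasticsearch/lib/lucene-core-4.6.1.jar \
#     org.apache.lucene.index.CheckIndex <index dir> -fix
```

Note that this only diagnoses or truncates the damage on one copy; since replicas were recovered from the corrupted primary, each copy would need to be checked separately.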
On Saturday, March 22, 2014 2:04:56 PM UTC+1, Andrey Perminov wrote:
>
> We are using a small Elasticsearch cluster of three nodes, version 1.0.1.
> Each node has 7 GB RAM. Our software creates daily indexes for storing its
> data. A daily index is around 5 GB. Unfortunately, for some reason,
> Elasticsearch eats up all RAM and hangs the node, even though the heap
> size is set to 6 GB max. So we decided to use monit to restart it on
> reaching a memory limit of 90%. It works, but sometimes we get errors
> like this:
>
> [2014-03-22 16:56:04,943][DEBUG][action.search.type       ] [es-00]
> [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]:
> Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
> org.elasticsearch.search.fetch.FetchPhaseExecutionException:
> [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery(filtered(history.created:[1392574921000 TO *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[<custom:"history.created": org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]:
> Fetch Failed [Failed to fetch doc id [7263214]]
>     at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
>     at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
>     at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
>     at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
>     at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
>     at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
>     at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
>     ... 9 more
> [2014-03-22 16:56:04,944][DEBUG][action.search.type       ] [es-00] All
> shards failed for phase: [query_fetch]
>
> According to our logs, this might happen when one or two nodes get
> restarted. More strangely, the same shard got corrupted on all nodes of
> the cluster. Why could this happen? How can we fix it? Can you suggest
> how we might fix the memory usage?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group. To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/add5f603-4088-4344-ac1d-1accfc7a487b%40googlegroups.com.