Regarding the corruption: that is a Lucene problem that we are working to address. The corruption might have happened a while ago already; you only happen to be seeing it now. There are no integrity checks in Lucene yet, so if an index is corrupted due to a hardware failure, you will also replicate the corruption if you recover from one of the corrupted shards - which is why the same shard can end up corrupted on every node. It's hard to address this at the Elasticsearch level at this point; we will have to fix it in Lucene in the very near future.
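If you want to confirm whether a given shard's Lucene index is damaged, you can point Lucene's CheckIndex tool at the shard directory shown in the stack trace. This is a hedged sketch, not an official procedure: the jar path and Lucene version below are assumptions for an Elasticsearch 1.0.1 install (which ships Lucene 4.6.x), and the node must be stopped (or the index closed) before you run it.

```shell
# Stop the node first; CheckIndex must never run against a live index.
# The jar location and version are assumptions -- adjust to your install.
java -cp /opt/elasticsearch/lib/lucene-core-4.6.1.jar \
  org.apache.lucene.index.CheckIndex \
  /opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index

# If it reports broken segments and you accept losing the documents in
# them, re-run with -fix to drop those segments. Back up the directory
# first -- -fix is destructive:
#   java -cp /opt/elasticsearch/lib/lucene-core-4.6.1.jar \
#     org.apache.lucene.index.CheckIndex <index dir> -fix
```

Note that this only diagnoses or truncates the damage on one copy; since replicas were recovered from the corrupted primary, each copy would need to be checked separately.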
On Saturday, March 22, 2014 2:04:56 PM UTC+1, Andrey Perminov wrote:
>
> We are using a small Elasticsearch cluster of three nodes, version 1.0.1.
> Each node has 7 GB RAM. Our software creates daily indexes for storing its
> data. A daily index is around 5 GB. Unfortunately, for some reason,
> Elasticsearch eats up all RAM and hangs the node, even though the heap
> size is set to 6 GB max. So we decided to use monit to restart it on
> reaching a memory limit of 90%. It works, but sometimes we get errors
> like this:
>
> [2014-03-22 16:56:04,943][DEBUG][action.search.type       ] [es-00]
> [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]:
> Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
> org.elasticsearch.search.fetch.FetchPhaseExecutionException:
> [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery(filtered(history.created:[1392574921000 TO *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[<custom:"history.created": org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]:
> Fetch Failed [Failed to fetch doc id [7263214]]
>     at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
>     at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
>     at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
>     at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
>     at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
>     at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
>     at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
>     at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
>     ... 9 more
> [2014-03-22 16:56:04,944][DEBUG][action.search.type       ] [es-00] All
> shards failed for phase: [query_fetch]
>
> According to our logs, this might happen when one or two nodes get
> restarted. More strangely, the same shard got corrupted on all nodes of
> the cluster. Why could this happen? How can we fix it? Can you suggest
> how we might fix the memory usage?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group. To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/add5f603-4088-4344-ac1d-1accfc7a487b%40googlegroups.com.