Hi, here is the full stack trace (taken from the pending/outstanding tasks API). We are using ES 1.4.1.
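For anyone wanting to pull the same output themselves: the entries below are the flat `field : value` dump that the cluster pending-tasks endpoint returns. A rough, offline sketch of splitting that dump back into per-task records (plain Python, no client library; `parse_pending_tasks` is just an illustrative helper name, not an API):

```python
import re

def parse_pending_tasks(text):
    """Split a flat 'field : value' pending-tasks dump (like the one
    pasted below) into one dict per task entry. Each entry begins at
    an 'insert_order' field."""
    tasks = []
    for chunk in re.split(r"(?=insert_order\s*:)", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Simple scalar fields: one token after the colon.
        fields = dict(re.findall(
            r"(insert_order|priority|executing|time_in_queue_millis|time_in_queue)\s*:\s*(\S+)",
            chunk))
        # 'source' can span many words (it embeds the whole failure reason),
        # so capture everything up to the following 'executing :' field.
        m = re.search(r"source\s*:\s*(.*?)(?=\s*executing\s*:)", chunk, re.S)
        if m:
            fields["source"] = m.group(1).strip()
        if fields:
            tasks.append(fields)
    return tasks
```

This only restructures the text for readability; it does not interpret the failure reasons.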
insert_order : 69862
priority : HIGH
source : shard-failed ([agora_v1][24], node[SEIBtFznTtGpLFPgCLgW4w], [R], s[INITIALIZING]), reason [Failed to start shard, message [CorruptIndexException[[agora_v1][24] Preexisting corrupted index [corrupted_LrKHKRF7Q2KuL15TT_hPvw] caused by: CorruptIndexException[Read past EOF while reading segment infos] EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")]
org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
    at org.elasticsearch.index.store.Store.access$400(Store.java:80)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:568)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
    at org.elasticsearch.index.store.Store.getMetadataOrEmpty(Store.java:150)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:152)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:138)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:59)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:278)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:269)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:81)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:85)
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:124)
    ...
14 more ]]]
executing : True
time_in_queue_millis : 52865
time_in_queue : 52.8s

insert_order : 69863
priority : HIGH
source : shard-failed ([agora_v1][24], node[SEIBtFznTtGpLFPgCLgW4w], [R], s[INITIALIZING]), reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[agora_v1][24] Preexisting corrupted index [corrupted_LrKHKRF7Q2KuL15TT_hPvw] caused by: CorruptIndexException[Read past EOF while reading segment infos] EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")]
org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
    at org.elasticsearch.index.store.Store.access$400(Store.java:80)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:568)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
    at org.elasticsearch.index.store.Store.getMetadataOrEmpty(Store.java:150)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:152)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:138)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:59)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:278)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:269)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:81)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:85)
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:124)
    ...
14 more ]]]
executing : False
time_in_queue_millis : 52862
time_in_queue : 52.8s

insert_order : 69865
priority : HIGH
source : shard-failed ([kibana-int][88], node[adjp-WHHSP6kWEiPd3HkeQ], [R], s[INITIALIZING]), reason [Failed to start shard, message [RecoveryFailedException[[kibana-int][88]: Recovery failed from [Quasimodo][spfLOfnjTeiGwrYPMIiRjg][CH1SCH060021734][inet[/10.46.208.169:9300]] into [Hyperion][adjp-WHHSP6kWEiPd3HkeQ][CH1SCH050051642][inet[/10.46.216.169:9300]]]; nested: RemoteTransportException[[Quasimodo][inet[/10.46.208.169:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][88] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[kibana-int][88] Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\kibana-int\88\index\segments_2]; ]]
executing : False
time_in_queue_millis : 52860
time_in_queue : 52.8s

On Friday, January 9, 2015 at 5:50:44 PM UTC+5:30, Robert Muir wrote:
>
> Why did you snip the stack trace? Can you provide all the information?
>
> On Thu, Jan 8, 2015 at 10:37 PM, Darshat <dar...@outlook.com> wrote:
> > Hi,
> > We have a 98-node ES cluster where each node has 32GB RAM; 16GB is reserved for ES via the config file. The index has 98 shards with 2 replicas.
> >
> > On this cluster we are loading a large number of documents (about 10 billion when done). In this use case about 40 million documents are generated per hour, and we are pre-loading several days' worth of documents to prototype how ES will scale and what its query performance will be.
> >
> > Right now we are facing problems getting the data loaded. Indexing is turned off. We use the NEST client with a batch size of 10k. To speed up the data load, we distribute the hourly data to each of the 98 nodes to insert in parallel.
> > This worked OK for a few hours, until we reached 4.5B documents in the cluster.
> >
> > After that the cluster state went red. The outstanding tasks CAT API shows errors like the ones below. CPU/disk/memory all seem fine on the nodes.
> >
> > Why are we getting these errors? Any help is greatly appreciated, since this blocks prototyping ES for our use case.
> >
> > thanks
> > Darshat
> >
> > Sample errors:
> >
> > source : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]), reason [Failed to start shard, message [RecoveryFailedException[[agora_v1][24]: Recovery failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]] into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]]; nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]
> >
> > AND
> >
> > source : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]), reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[agora_v1][95] failed to fetch index version after copying it over]; nested: CorruptIndexException[[agora_v1][95] Preexisting corrupted index [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by: CorruptIndexException[Read past EOF while reading segment infos] EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
> > org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
> >     at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
> >     at org.elasticsearch.index.store.Store.access$400(Store.java:80)
> >     at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
> > ---snip more stack trace-----
> >
> > --
> > View this message in context: http://elasticsearch-users.115913.n3.nabble.com/Index-corruption-when-upload-large-number-of-documents-4billion-tp4068742.html
> > Sent from the ElasticSearch Users mailing list archive at Nabble.com.
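For reference, the load pattern described in the quoted message (NEST client, 10k-document batches fanned out across the 98 nodes) comes down to posting newline-delimited `_bulk` bodies. A minimal Python sketch of building such bodies (the `agora_v1`/`doc` names here are illustrative; this only constructs payloads and does not send them):

```python
import json

def bulk_batches(docs, index, doc_type, batch_size=10000):
    """Yield newline-delimited JSON bodies for the _bulk API, one body per
    batch_size documents. Each document contributes two lines: an action
    line ({"index": ...}) followed by the document source."""
    batch = []
    for doc in docs:
        batch.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        batch.append(json.dumps(doc))
        if len(batch) >= 2 * batch_size:
            yield "\n".join(batch) + "\n"  # _bulk bodies must end with a newline
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"
```

Smaller batch sizes (and throttling the fan-out) are often the first knob to turn when a heavy parallel load starts destabilizing a cluster; the sketch keeps the batch size as a parameter for that reason.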