Hi Mark,
We restarted our data upload process (with replicas set to 0 and the batch 
size set to 5k). It ran for about 24 hours; some time after that we started 
to see issues with the master not being present in the cluster.

The log on the master node has lines like the ones below. It appears there 
might have been a network issue: some nodes were not pingable, the number of 
reachable master-eligible nodes fell below the quorum required for master 
election, and the master effectively demoted itself. What's not clear is why 
a new master did not get elected for several hours after this. The cluster 
cannot be queried. What's the best way to recover from this situation?
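
In case it helps, these are the checks we've been running against the nodes, 
and the quorum knob we suspect is involved (a sketch; the minimum_master_nodes 
value below is a placeholder, not our actual setting, and presumably a 
settings update can't even be applied while no master is elected):

    # is any node currently acting as master?
    curl -s 'http://localhost:9200/_cat/master?v'

    # which nodes does the cluster currently see?
    curl -s 'http://localhost:9200/_cat/nodes?v'

    # dynamically adjust the election quorum (placeholder value)
    curl -s -XPUT 'http://localhost:9200/_cluster/settings' -d '
    { "transient": { "discovery.zen.minimum_master_nodes": 2 } }'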

[2015-01-11 19:12:12,612][WARN ][discovery.zen            ] [Earthquake] not enough master nodes, current nodes:
{[Eliminator][8_j19XFyQcupAkRUYCjI6w][CH1SCH050103339][inet[/10.47.44.70:9300]],
 [Dragoness][5wL9nCAESDOs3jLMBzvunQ][CH1SCH060022834][inet[/10.47.50.155:9300]],
 [Ludi][7zEEioZIRMigz25356K7Yg][CH1SCH050051103][inet[/10.46.196.234:9300]],
 [The Angel][BbFDPq87TMiVgMeNWeJIFg][CH1SCH060040431][inet[/10.47.33.247:9300]],
 [Helio][5UekTGE2Qq6YcyDb-daxmw][CH1SCH050103338][inet[/10.46.238.181:9300]],
 [Stacy X][_sCSZZVtTGmoowmg-1ggVQ][CH1SCH050051544][inet[/10.46.178.74:9300]],
 [Thunderfist][wKHKYQ9qT9GPLdt6tCQ3Rg][CH1SCH060033137][inet[/10.47.58.209:9300]],
 [Dominic Petros][E85WKXuLRDC4_WXzoyaTnQ][CH1SCH060021537][inet[/10.46.234.92:9300]],
 [Umbo][YFysv6iRSLWSJBax5ObMHg][CH1SCH060021735][inet[/10.46.243.90:9300]],
 [Earthquake][-8C9YSAbQMy4THZIgV7qUw][CH1SCH060051026][inet[/10.46.153.118:9300]],
 [Atum][0DWVWBrZR2OFIEQTq9QBBw][CH1SCH050051545][inet[/10.47.50.232:9300]],
 [Baron Von Blitzschlag][mADN2KUYS2uPiuSl4qMHIw][CH1SCH060040630][inet[/10.47.42.139:9300]],
 [Foolkiller][Iw9PyArIR2OCDaCm1gW-Vw][CH1SCH060040629][inet[/10.46.149.174:9300]],
 [The Grip][38K3nW5wR7aUCkR6i8BCiQ][CH1SCH050051645][inet[/10.46.203.126:9300]],
 [Shamrock][aObajuGRShKIA9TXF5_yQg][CH1SCH050071331][inet[/10.46.194.122:9300]],
 [Orphan][dCC94_8VSHqP7_hFf3r3Qg][CH1SCH060051535][inet[/10.47.46.65:9300]],
 [Tombstone][dq1NNI8IREuPig2qaT27Xw][CH1SCH050102843][inet[/10.47.54.208:9300]],
 [Logan][C5bigQQ1QomGpExc9NQq-w][CH1SCH060040333][inet[/10.46.164.129:9300]],
 [Rama-Tut][gLnUtgR6Q12r4oJKUiKGZQ][CH1SCH050051642][inet[/10.46.216.169:9300]],
 [Grotesk][Dc61zzt9QRC6mYDeswKugw][CH1SCH050051640][inet[/10.47.9.152:9300]],
 [Machine Teen][O39QiBsaQ6iq1kJvOw_QDA][CH1SCH060021439][inet[/10.46.242.110:9300]],
 [Edwin Jarvis][31cJWsbnRDuvWNJ8X5ybJQ][CH1SCH050063532][inet[/10.46.190.247:9300]],
 [Harry Osborn][wBaDgNdJSkmW5vCAfODQXA][CH1SCH060021440][inet[/10.47.20.183:9300]],
 [Brain Drain][Oy4Mo_2HSO6Y_fX3C11YBw][CH1SCH060040433][inet[/10.47.0.136:9300]],
 [Romany Wisdom][20Tc5x2ATdC3o6AdD7f67w][CH1SCH060031345][inet[/10.47.3.169:9300]],
 [Jekyll][lS_ugcBORj-485okkVScPg][CH1SCH060033044][inet[/10.46.41.179:9300]],
 [Ajaxis][q4Shk4fhSWuYiZVq2oCnWw][CH1SCH060021243][inet[/10.46.221.243:9300]]{master=false},
 [Cheetah][GH_6-7OQQ7CH1YrObU6wVA][CH1SCH060051532][inet[/10.46.150.153:9300]],
 [Marduk Kurios][Aqfc5EUIQYmvacnY6sGEpg][CH1SCH050051546][inet[/10.46.162.240:9300]],
 [Left Hand][VloLsymETGmy9Yu14TkygA][CH1SCH060051521][inet[/10.47.63.194:9300]],
 [Washout][tHz2w8mGRmKO9W7vUljXrA][CH1SCH050063531][inet[/10.47.54.30:9300]],
 [Fixer][HH3gxjFoSxeHbvF4SMnVsw][CH1SCH050051643][inet[/10.46.173.163:9300]],
 [Tyrannus][g1znl166QaO84xJjsABsfQ][CH1SCH060033139][inet[/10.46.42.62:9300]],
 [Master Pandemonium][p7ALngKoTrW9NaTKIyP1XA][CH1SCH060051421][inet[/10.47.23.193:9300]],
 [Digitek][z5SzLFtlTWCL0R5cJQWqUw][CH1SCH050051641][inet[/10.46.154.21:9300]],
 [Jane Kincaid][UtjfYoxjQ2mqa40vX_eGhQ][CH1SCH060041325][inet[/10.47.2.144:9300]],
 [Baron Samedi][a-gBYy2vTFKgdjwG6jWeSA][CH1SCH060033138][inet[/10.47.63.100:9300]],
 [Fan Boy][-iehboPZTHud5kttbYlQyQ][CH1SCH060051438][inet[/10.46.153.84:9300]],
 [Unseen][PB4SZuwUTwaJtp43iMIArg][CH1SCH060021739][inet[/10.46.217.50:9300]],
 [Elektro][ZsVkXHPHRJ-VCmvtbb_PHA][CH1SCH050051542][inet[/10.46.179.1:9300]],
 [Vashti][K2BO32gNTru8xUPqETOL-g][CH1SCH050063239][inet[/10.46.213.4:9300]]}

[2015-01-11 19:12:12,612][INFO ][cluster.service          ] [Earthquake] removed
{[Blackheart][yVhuAFprS3SwXZoKsfbEPg][CH1SCH060040527][inet[/10.46.150.20:9300]]},
reason: zen-disco-node_failed([Blackheart][yVhuAFprS3SwXZoKsfbEPg][CH1SCH060040527][inet[/10.46.150.20:9300]]),
reason failed to ping, tried [3] times, each with maximum [30s] timeout

[2015-01-11 19:12:12,633][ERROR][cluster.action.shard     ] [Earthquake] unexpected failure during
[shard-failed ([agora_v1][23], node[g1znl166QaO84xJjsABsfQ], relocating [F9SdmzilSHiuib4jihKstw], [P], s[RELOCATING]),
reason [Failed to perform [indices:data/write/bulk[s]] on replica, message
[SendRequestTransportException[[Klaw][inet[/10.46.158.143:9300]][indices:data/write/bulk[s][r]]];
nested: NodeNotConnectedException[[Klaw][inet[/10.46.158.143:9300]] Node not connected]; ]]]
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: no longer master.
source: [shard-failed ([agora_v1][23], node[g1znl166QaO84xJjsABsfQ], relocating [F9SdmzilSHiuib4jihKstw], [P], s[RELOCATING]),
reason [Failed to perform [indices:data/write/bulk[s]] on replica, message
[SendRequestTransportException[[Klaw][inet[/10.46.158.143:9300]][indices:data/write/bulk[s][r]]];
nested: NodeNotConnectedException[[Klaw][inet[/10.46.158.143:9300]] Node not connected]; ]]]
    at org.elasticsearch.cluster.ClusterStateUpdateTask.onNoLongerMaster(ClusterStateUpdateTask.java:53)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:324)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


On Friday, January 9, 2015 at 11:33:58 PM UTC+5:30, Darshat Shah wrote:

> Hi Mark, 
> Thanks for the reply. I need to prototype and demonstrate at this scale 
> first to ensure feasibility. Once we've proven ES works for this use case, 
> it's quite possible that we'd engage with support for production. 
>   
> Regarding your questions: 
> *What version of ES, Java?*
> [Darshat] ES 1.4.1, JVM 1.8.0 (the latest I found from Oracle for Win64) 
>
>
> *What are you using to monitor your cluster?  *
> Not much, really. I tried installing Marvel after we ran into issues, but 
> mostly we've been looking at the cat APIs and the indices APIs. 
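> For reference, these are the sorts of checks we've been running (a sketch; 
> standard cat endpoints in 1.4):
>
>     curl -s 'http://localhost:9200/_cat/health?v'
>     curl -s 'http://localhost:9200/_cat/indices?v'
>     curl -s 'http://localhost:9200/_cat/pending_tasks?v'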
>
>
> *How many GB is that index?*
> About 1 GB for every million entries. We don't have string fields, but we 
> have many numeric fields on which aggregations are needed. With 4.5 billion 
> docs, the total index size was about 4.5 TB spread over the 98 nodes. 
>
>
> *Is it in one massive index?*
> Yes, one hell of an index if it works! We need aggregations over this data 
> across multiple fields. 
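> To give a flavour of the queries we need (a sketch; the field names below 
> are hypothetical placeholders, not our real schema):
>
>     curl -s 'http://localhost:9200/agora_v1/_search' -d '
>     {
>       "size": 0,
>       "aggs": {
>         "by_region": {
>           "terms": { "field": "region_id" },
>           "aggs": { "latency": { "stats": { "field": "latency_ms" } } }
>         }
>       }
>     }'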
>
>
> *How many GB in your data in total?*
> It can be on the order of 30 TB. 
>
>
> *Why do you have 2 replicas?*
> Data loss is generally considered not OK. However, for our next run, as you 
> suggest, we will start with 0 replicas and raise the count after the bulk 
> load is done. 
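> Concretely, once the load completes we'd restore replicas with the index 
> settings API, something like:
>
>     curl -s -XPUT 'http://localhost:9200/agora_v1/_settings' -d '
>     { "index": { "number_of_replicas": 2 } }'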
>
>
> *Are you searching while indexing, or just indexing the data? If it's the 
> latter then you might want to try disabling replicas, setting the index 
> refresh rate to -1 for the index, inserting your data, and then turning 
> refresh back on and letting the data index. That's best practice for large 
> amounts of indexing.*
> Just indexing, to get the historical data up there. We did set the refresh 
> interval to -1 before we began the upload. 
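> For completeness, a sketch of the equivalent REST call we used before the 
> upload (and the inverse we'll apply to turn refresh back on afterwards):
>
>     curl -s -XPUT 'http://localhost:9200/agora_v1/_settings' -d '
>     { "index": { "refresh_interval": "-1" } }'
>
>     curl -s -XPUT 'http://localhost:9200/agora_v1/_settings' -d '
>     { "index": { "refresh_interval": "1s" } }'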
>
>
> *Also, consider dropping your bulk size down to 5K; that's generally 
> considered the upper limit for bulk API batches.*
>
>
> We are going to attempt another upload with these changes. I also set the 
> index_buffer_size to 40% in case it helps. 
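> For reference, that buffer change is the node-level setting in 
> elasticsearch.yml (it isn't dynamic in 1.4, so it takes a restart to apply):
>
>     indices.memory.index_buffer_size: 40%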
>
>
>
>
>
> On Friday, January 9, 2015 at 3:29:48 PM UTC+5:30, Mark Walkom wrote:
>
>> Honestly, with this sort of scale you should be thinking about support 
>> (disclaimer: I work for Elasticsearch support).
>>
>> However, let's see what we can do:
>> What version of ES, Java?
>> What are you using to monitor your cluster?
>> How many GB is that index?
>> Is it in one massive index?
>> How many GB in your data in total?
>> Why do you have 2 replicas?
>> Are you searching while indexing, or just indexing the data? If it's the 
>> latter then you might want to try disabling replicas, setting the index 
>> refresh rate to -1 for the index, inserting your data, and then turning 
>> refresh back on and letting the data index. That's best practice for large 
>> amounts of indexing.
>>
>> Also, consider dropping your bulk size down to 5K; that's generally 
>> considered the upper limit for bulk API batches.
>>
>> On 9 January 2015 at 14:44, Darshat Shah <dar...@gmail.com> wrote:
>>
>>> Hi, 
>>> We have a 98-node ES cluster, with 32 GB RAM on each node; 16 GB is 
>>> reserved for ES via the config file. The index has 98 shards with 2 
>>> replicas. 
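>>> (For anyone reproducing this: ES 1.x reads the heap size from the 
>>> ES_HEAP_SIZE environment variable; a sketch of how it can be pinned on 
>>> Windows, before starting the node:
>>>
>>>     set ES_HEAP_SIZE=16g
>>>
>>> We set the equivalent via our config.) 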
>>>
>>> On this cluster we are loading a large number of documents (about 10 
>>> billion when done). About 40 million documents are generated per hour, 
>>> and we are pre-loading several days' worth of documents to prototype how 
>>> ES will scale and what its query performance looks like. 
>>>
>>> Right now we are facing problems getting the data pre-loaded. Index 
>>> refresh is turned off. We use the NEST client with a batch size of 10k. 
>>> To speed up the data load, we distribute the hourly data across the 98 
>>> nodes to insert in parallel. This worked OK for a few hours, until we got 
>>> to 4.5B documents in the cluster. 
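>>> (Each batch goes through the bulk API; a minimal sketch of the raw 
>>> request shape, with hypothetical placeholder fields rather than our real 
>>> schema. Note --data-binary, so curl preserves the newlines the bulk 
>>> format requires.)
>>>
>>>     curl -s -XPOST 'http://localhost:9200/agora_v1/_bulk' --data-binary '
>>>     { "index": {} }
>>>     { "metric_a": 42, "metric_b": 3.14, "ts": "2015-01-09T00:00:00Z" }
>>>     { "index": {} }
>>>     { "metric_a": 7, "metric_b": 2.72, "ts": "2015-01-09T00:00:01Z" }
>>>     '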
>>>
>>> After that, the cluster state went red. The outstanding-tasks cat API 
>>> shows errors like the ones below. CPU, disk, and memory all seem fine on 
>>> the nodes. 
>>>
>>> Why are we getting these errors, and is there a best practice to follow? 
>>> Any help is greatly appreciated, since this blocks prototyping ES for our 
>>> use case. 
>>>
>>> thanks 
>>> Darshat 
>>>
>>> Sample errors: 
>>>
>>> source : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]),
>>>   reason [Failed to start shard, message [RecoveryFailedException[[agora_v1][24]:
>>>   Recovery failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]]
>>>   into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]];
>>>   nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]];
>>>   nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed];
>>>   nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]];
>>>   nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]
>>>
>>>
>>> AND 
>>>
>>> source : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]),
>>>   reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[agora_v1][95]
>>>   failed to fetch index version after copying it over];
>>>   nested: CorruptIndexException[[agora_v1][95] Preexisting corrupted index [corrupted_1wegvS7BSKSbOYQkX9zJSw]
>>>   caused by: CorruptIndexException[Read past EOF while reading segment infos]
>>>   EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
>>>   org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
>>>       at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
>>>       at org.elasticsearch.index.store.Store.access$400(Store.java:80)
>>>       at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
>>>   ---snip more stack trace----- 
>>>
>>>
>>>   
>>>
