Thanks for the rest of the info; that helps rule out a couple of 
possibilities.  Unfortunately, I was hoping you had fiddled with the merge 
settings and that was causing the problem...but it looks like everything is 
at the defaults (which is good!).  Back to the drawing board.

Would it be possible to get a heap dump and store it somewhere I can access 
it?  At this point, I think that is our best chance of debugging this 
problem.


> Which bulk size do you prefer, and how many threads can/should process 
> bulks at the same time? As you can see in _nodes.txt we have 12 available 
> processors...
> Should we maybe slow down the bulk loader by adding a wait of a few seconds?


Bulks tend to be most efficient around 5-15mb in size.  For your machine, I 
would start with 12 concurrent threads and slowly increase from there.  If 
you start running into rejections from ES, that's the point where you stop 
increasing threads because you've filled the bulk queue with pending tasks 
and ES cannot keep up anymore.  With rejections, the general pattern is to 
wait a random time (1-5s) and then retry all the rejected actions in a new, 
smaller bulk.
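
In case it helps, here is a rough sketch of that retry loop using the 
standard Java client bulk API (this assumes your bulk loader is Java-based 
and builds its own IndexRequests; the BulkLoaderSketch class and 
bulkWithRetry method are just illustrative names, so adapt the idea to 
however your loader is actually structured):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;

public class BulkLoaderSketch {

    private static final Random RANDOM = new Random();

    // Sends one bulk and retries the actions that failed, waiting a random
    // 1-5s between attempts so ES has a chance to drain its bulk queue.
    static void bulkWithRetry(Client client, List<IndexRequest> actions) throws InterruptedException {
        while (!actions.isEmpty()) {
            BulkRequestBuilder bulk = client.prepareBulk();
            for (IndexRequest action : actions) {
                bulk.add(action);
            }
            BulkResponse response = bulk.execute().actionGet();
            if (!response.hasFailures()) {
                return;  // everything was accepted
            }

            // Keep only the failed items and retry them in a new, smaller bulk.
            // (A real loader should also inspect the failure message and only
            // retry genuine rejections, not e.g. mapping errors, and should
            // cap the number of retries.)
            List<IndexRequest> retries = new ArrayList<IndexRequest>();
            BulkItemResponse[] items = response.getItems();
            for (int i = 0; i < items.length; i++) {
                if (items[i].isFailed()) {
                    retries.add(actions.get(i));
                }
            }
            actions = retries;

            Thread.sleep(1000 + RANDOM.nextInt(4000));  // back off 1-5 seconds
        }
    }
}

The bulk response items come back in the same order as the actions you 
submitted, which is what makes the index-based pairing above work.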

-Zach



On Tuesday, March 18, 2014 8:44:00 AM UTC-4, Alexander Ott wrote:
>
> Attached are the elasticsearch.yml and the output of curl -XGET 'localhost:9200/_nodes/'.
> We have 5 shards per index and we have not enabled any codecs.
>
> Which bulk size do you prefer, and how many threads can/should process 
> bulks at the same time? As you can see in _nodes.txt we have 12 available 
> processors...
> Should we maybe slow down the bulk loader by adding a wait of a few seconds?
>
> On Tuesday, March 18, 2014 at 13:22:57 UTC+1, Zachary Tong wrote:
>>
>> My observations from your Node Stats
>>
>>    - Your node tends to have around 20-25 merges happening at any given 
>>    time.  The default max is 10...have you changed any of the merge policy 
>>    settings?  Can you attach your elasticsearch.yml?
>>    - At one point, your segments were using 24gb of the heap (due to 
>>    associated memory structures like bloom filters, etc.).  How many primary 
>>    shards are in your index?
>>    - Your bulks look mostly OK, but you are getting rejections.  I'd 
>>    slow the bulk loader down a little bit (rejections mean ES is overloaded).
>>
>> If you can take a heap dump, I would be willing to load it up and look 
>> through the allocated objects.  That would be the fastest way to identify 
>> what is eating your heap and start to work out why.  To take a heap dump, 
>> run the following, then zip up the file and save it somewhere:  
>> jmap -dump:format=b,file=dump.bin <javaProcessIdHere>
>>
>> As an aside, it's hard to help debug when you don't answer all of the 
>> questions I've asked :P
>>
>> Unanswered questions from upthread:
>>
>>    - Have you enabled any codecs and/or changed the `posting_format` of 
>>    any fields in your document?
>>    - curl -XGET 'localhost:9200/_nodes/'
>>
>>
>> Hope we can get this sorted for you soon!
>> -Zach
>>
>>
>> On Tuesday, March 18, 2014 5:29:40 AM UTC-4, Alexander Ott wrote:
>>>
>>> Attached are the captured node stats and, again, the newest es_log.
>>> I changed the garbage collector from UseParNewGC to UseG1GC, with the 
>>> result that the OutOfMemoryError no longer occurs. But as you can see in 
>>> the attached es_log file, the monitor.jvm warnings are still present.
>>>
>>>
>>> On Monday, March 17, 2014 at 14:32:29 UTC+1, Zachary Tong wrote:
>>>>
>>>> Ah, sorry, I misread your JVM stats dump (thought it was one long list, 
>>>> instead of multiple calls to the same API).  With a single node cluster, 
>>>> 20 concurrent bulks may be too many.  Bulk requests have to sit in memory 
>>>> while they are waiting to be processed, so it is possible to eat up your 
>>>> heap with many pending bulk requests just hanging out, especially if they 
>>>> are very large.  I'll know more once I can see the Node Stats output.
>>>>
>>>> More questions! :)
>>>>
>>>>    - How big are your documents on average?
>>>>    - Have you enabled any codecs and/or changed the `posting_format` 
>>>>    of any fields in your document?
>>>>    - Are you using warmers?
>>>>    
>>>>
>>>>
>>>>
>>>> On Monday, March 17, 2014 8:36:04 AM UTC-4, Alexander Ott wrote:
>>>>>
>>>>> At the moment I can provide only the JVM stats ... I will capture the 
>>>>> other stats as soon as possible.
>>>>>
>>>>> We use 5-20 threads which process bulks with a max size of 100 entries.
>>>>> We only use one node/machine for development, so we have no cluster 
>>>>> for development...
>>>>> The machine has 64gb RAM, and we increased the heap from 16gb to 32gb...
>>>>>  
>>>>>
>>>>>> On Monday, March 17, 2014 at 12:21:09 UTC+1, Zachary Tong wrote:
>>>>>>
>>>>>> Can you attach the full Node Stats and Node Info output?  There were 
>>>>>> other stats/metrics that I wanted to check (such as field data, bulk 
>>>>>> queue/size, etc).
>>>>>>
>>>>>>    - How large (physically, in kb/mb) are your bulk indexing 
>>>>>>    requests?  Bulks should be 5-15mb in size
>>>>>>    - How many concurrent bulks are you performing?  Given your 
>>>>>>    cluster size, a good number should probably be around 20-30
>>>>>>    - Are you distributing bulks evenly across the cluster?
>>>>>>    - I see that your heap is 32gb.  How big are these machines?
>>>>>>
>>>>>>
>>>>>> -Zach
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Monday, March 17, 2014 5:33:30 AM UTC-4, Alexander Ott wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Attached you can find the es_log and the captured node JVM stats.
>>>>>>> We are only indexing at this time, and we use bulk requests.
>>>>>>>
>>>>>>> As you can see at log entry "2014-03-14 21:18:59,873" in es_log, at 
>>>>>>> that point our indexing process finished, and afterwards the OOM 
>>>>>>> occurred...
>>>>>>>
>>>>>>>
>>>>>>> On Friday, March 14, 2014 at 14:47:14 UTC+1, Zachary Tong wrote:
>>>>>>>>
>>>>>>>> Are you running searches at the same time, or only indexing?  Are 
>>>>>>>> you bulk indexing?  How big (in physical kb/mb) are your bulk requests?
>>>>>>>>
>>>>>>>> Can you attach the output of these APIs (preferably during memory 
>>>>>>>> buildup but before the OOM):
>>>>>>>>
>>>>>>>>    - curl -XGET 'localhost:9200/_nodes/'
>>>>>>>>    - curl -XGET 'localhost:9200/_nodes/stats'
>>>>>>>>
>>>>>>>> I would recommend downgrading your JVM to Java 1.7.0_u25.  There are 
>>>>>>>> known sigsegv bugs in the most recent versions of the JVM which have 
>>>>>>>> not been fixed yet.  It should be unrelated to your problem, but best 
>>>>>>>> to rule the JVM out.
>>>>>>>>
>>>>>>>> I would not touch any of those configs.  In general, when debugging 
>>>>>>>> problems it is best to restore as many of the configs to their default 
>>>>>>>> settings as possible.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday, March 14, 2014 5:46:12 AM UTC-4, Alexander Ott wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> we always run into an OutOfMemoryError while indexing documents or 
>>>>>>>>> shortly afterwards.
>>>>>>>>> We only have one instance of Elasticsearch version 1.0.1 (no 
>>>>>>>>> cluster).
>>>>>>>>>
>>>>>>>>> Index information:
>>>>>>>>> size: 203G (203G)
>>>>>>>>> docs: 237.354.313 (237.354.313)
>>>>>>>>>
>>>>>>>>> Our JVM settings are as follows:
>>>>>>>>>
>>>>>>>>> /usr/lib/jvm/java-7-oracle/bin/java -Xms16g -Xmx16g -Xss256k 
>>>>>>>>> -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
>>>>>>>>> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
>>>>>>>>> -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
>>>>>>>>> -Des.pidfile=/var/run/elasticsearch.pid 
>>>>>>>>> -Des.path.home=/usr/share/elasticsearch 
>>>>>>>>> -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* 
>>>>>>>>> -Des.default.config=/etc/elasticsearch/elasticsearch.yml 
>>>>>>>>> -Des.default.path.home=/usr/share/elasticsearch 
>>>>>>>>> -Des.default.path.logs=/var/log/elasticsearch 
>>>>>>>>> -Des.default.path.data=/var/lib/elasticsearch 
>>>>>>>>> -Des.default.path.work=/tmp/elasticsearch 
>>>>>>>>> -Des.default.path.conf=/etc/elasticsearch 
>>>>>>>>> org.elasticsearch.bootstrap.Elasticsearch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OutOfMemoryError:
>>>>>>>>> [2014-03-12 01:27:27,964][INFO ][monitor.jvm              ] [Stiletto] [gc][old][32451][309] duration [5.1s], collections [1]/[5.9s], total [5.1s]/[3.1m], memory [15.8gb]->[15.7gb]/[15.9gb], all_pools {[young] [665.6mb]->[583.7mb]/[665.6mb]}{[survivor] [32.9mb]->[0b]/[83.1mb]}{[old] [15.1gb]->[15.1gb]/[15.1gb]}
>>>>>>>>> [2014-03-12 01:28:23,822][INFO ][monitor.jvm              ] [Stiletto] [gc][old][32466][322] duration [5s], collections [1]/[5.9s], total [5s]/[3.8m], memory [15.8gb]->[15.8gb]/[15.9gb], all_pools {[young] [652.5mb]->[663.8mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] [15.1gb]->[15.1gb]/[15.1gb]}
>>>>>>>>> [2014-03-12 01:33:29,814][WARN ][index.merge.scheduler    ] [Stiletto] [myIndex][0] failed to merge
>>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:83)
>>>>>>>>>         at org.apache.lucene.util.fst.FST.<init>(FST.java:282)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.util.fst.Builder.<init>(Builder.java:163)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:420)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:569)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:544)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
>>>>>>>>>         at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:166)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:383)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4071)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3668)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>>>>>>>
>>>>>>>>> We also increased the heap to 32g, but with the same result:
>>>>>>>>> [2014-03-12 22:39:53,817][INFO ][monitor.jvm              ] [Charcoal] [gc][old][32895][86] duration [6.9s], collections [1]/[7.3s], total [6.9s]/[19.6s], memory [20.5gb]->[12.7gb]/[31.9gb], all_pools {[young] [654.9mb]->[1.9mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] [19.8gb]->[12.7gb]/[31.1gb]}
>>>>>>>>> [2014-03-12 23:11:07,015][INFO ][monitor.jvm              ] [Charcoal] [gc][old][34750][166] duration [8s], collections [1]/[8.6s], total [8s]/[29.1s], memory [30.9gb]->[30.9gb]/[31.9gb], all_pools {[young] [660.6mb]->[1mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] [30.2gb]->[30.9gb]/[31.1gb]}
>>>>>>>>> [2014-03-12 23:12:18,117][INFO ][monitor.jvm              ] [Charcoal] [gc][old][34812][182] duration [7.1s], collections [1]/[8.1s], total [7.1s]/[36.6s], memory [31.5gb]->[31.5gb]/[31.9gb], all_pools {[young] [655.6mb]->[410.3mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] [30.9gb]->[31.1gb]/[31.1gb]}
>>>>>>>>> [2014-03-12 23:12:56,294][INFO ][monitor.jvm              ] [Charcoal] [gc][old][34844][193] duration [7.1s], collections [1]/[7.1s], total [7.1s]/[43.9s], memory [31.9gb]->[31.9gb]/[31.9gb], all_pools {[young] [665.6mb]->[665.2mb]/[665.6mb]}{[survivor] [81.9mb]->[82.8mb]/[83.1mb]}{[old] [31.1gb]->[31.1gb]/[31.1gb]}
>>>>>>>>> [2014-03-12 23:13:11,836][WARN ][index.merge.scheduler    ] [Charcoal] [myIndex][3] failed to merge
>>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:228)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:188)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:159)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:516)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:232)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:127)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4071)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3668)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
>>>>>>>>>         at 
>>>>>>>>> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>>>>>>>
>>>>>>>>> *java version: *
>>>>>>>>> java version "1.7.0_51" 
>>>>>>>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13) 
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>>>>>>>>
>>>>>>>>> *Elasticsearch.yml*: settings which we perhaps should enable?
>>>>>>>>> #indices.memory.index_buffer_size: 40%
>>>>>>>>> #indices.store.throttle.type: merge
>>>>>>>>> #indices.store.throttle.max_bytes_per_sec: 50mb
>>>>>>>>> #index.refresh_interval: 2s
>>>>>>>>> #index.fielddata.cache: soft
>>>>>>>>> #index.store.type: mmapfs
>>>>>>>>> #index.fielddata.cache.size: 20%
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any ideas how to solve this problem? Why doesn't the old gen get 
>>>>>>>>> cleaned up? Shouldn't it?
>>>>>>>>>
>>>>>>>>
