Hi Michael,
   Did you get a chance to look at the hot_threads and iostat output?

   I also tried an EBS provisioned-IOPS SSD volume with 4000 IOPS, and with
that I was able to ingest only around 30K docs per second before
EsRejectedExecutionExceptions appeared. There were 4 elasticsearch instances
of type c3.2xlarge. CPU utilization was around 650% (out of 800%). The
iostat output on the instances looks like this:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.66   0.00     0.14     0.15    0.04  98.01

Device:     tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
xvdep1     7.86       36.95      266.05    392378   2825424
xvdf       0.03        0.20        0.00      2146         8
xvdg       0.03        0.21        0.07      2178       736
*xvdj     52.53        0.33     2693.62      3506  28605624*

   On an instance store SSD I can go up to 48K per second, with occasional
occurrences of EsRejectedExecutionException. Do you think I should try
storage-optimized instances like i2.xlarge or i2.2xlarge to handle this
kind of load?

Regards,
Srinath.
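(A rough sanity check, using the ~300-byte average document size mentioned
further down the thread: 30K docs/sec x 300 bytes is only about 9 MB/sec of
raw document data across the cluster, and the highlighted xvdj line shows
only ~53 tps against 4000 provisioned IOPS. Note, though, that iostat's
first report shows averages since boot; to see behavior under load, run it
with an interval, e.g. "iostat -x 5", and watch %iowait and the per-device
numbers while the rejections are occurring.)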
On Wed, Jul 16, 2014 at 5:57 PM, Srinath C <srinat...@gmail.com> wrote:

> Hi Michael,
>    You were right. It's the IO that was the bottleneck. The data was
> being written to a standard EBS device - no provisioned IOPS.
>
> After redirecting data to the local instance store SSD, I was able to
> get to a rate of around 50-55K without any EsRejectedExecutionExceptions.
> The CPU utilization too is not very high - around 200-400%. I have
> attached the hot_threads output with this email. After running for around
> 1.5 hrs I could see a lot of EsRejectedExecutionExceptions for certain
> periods of time.
>
> std_ebs_all_fine.txt - when using standard EBS. Around 25K docs per
> second. No EsRejectedExecutionExceptions.
> std_ebs_bulk_rejects.txt - when using standard EBS. Around 28K docs per
> second. Bulk rejections (EsRejectedExecutionExceptions) were seen.
>
> instance_ssd_40K.txt - when using instance store SSD. Around 40K docs
> per second. No EsRejectedExecutionExceptions.
> instance_ssd_60K_few_rejects.txt - when using instance store SSD. Around
> 60K docs per second. Some EsRejectedExecutionExceptions were seen.
> instance_ssd_60K_lot_of_rejects.txt - when using instance store SSD.
> Around 60K docs per second. A lot of EsRejectedExecutionExceptions were
> seen.
>
> Also attaching the iostat output for these instances.
>
> Regards,
> Srinath.
>
>
> On Wed, Jul 16, 2014 at 3:34 PM, joergpra...@gmail.com <
> joergpra...@gmail.com> wrote:
>
>> Adding to these recommendations, I would suggest running the iostat
>> tool to monitor for any suspicious "%iowait" states while the
>> EsRejectedExecutionExceptions arise.
>>
>> Jörg
>>
>>
>> On Wed, Jul 16, 2014 at 11:53 AM, Michael McCandless <
>> m...@elasticsearch.com> wrote:
>>
>>> Where is the index stored on your EC2 instances? Is it just
>>> EBS-attached storage (magnetic or SSD? Provisioned IOPS or the
>>> default)?
>>>
>>> Maybe try putting the index on the SSD instance storage instead? I
>>> realize this is not a long-term solution (limited storage, and it's
>>> cleared on reboot), but it would be a simple test of whether the IO
>>> limitations of EBS are the bottleneck here.
>>>
>>> Can you capture the hot threads output when you're at 200% CPU after
>>> indexing for a while?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
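For anyone reproducing this test, a minimal sketch of one way to capture
the hot threads output Mike asks for, assuming an ES 1.x node on
localhost:9200 (the _nodes/hot_threads endpoint; the file name and poll
interval here are arbitrary choices, not from the thread):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class HotThreadsPoller {
        public static void main(String[] args) throws Exception {
            // _nodes/hot_threads returns a plain-text thread sample.
            URL url = new URL("http://localhost:9200/_nodes/hot_threads");
            while (true) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (InputStream in = conn.getInputStream()) {
                    // Append each snapshot so spikes during a long run are kept.
                    Files.write(Paths.get("hot_threads.txt"), in.readAllBytes(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                } finally {
                    conn.disconnect();
                }
                Thread.sleep(10_000L); // one capture every 10 seconds
            }
        }
    }

Running this alongside the load generator makes it easy to line up reject
spikes with whatever the bulk and merge threads were doing at the time.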
>>> On Wed, Jul 16, 2014 at 3:03 AM, Srinath C <srinat...@gmail.com> wrote:
>>>
>>>> Hi Joe/Michael,
>>>>    I tried all your suggestions and found a remarkable difference in
>>>> the way elasticsearch is able to handle the bulk indexing.
>>>> Right now, I'm able to ingest at the rate of 25K per second with the
>>>> same setup, but occasionally there are still some
>>>> EsRejectedExecutionExceptions being raised. The CPU utilization on the
>>>> elasticsearch nodes is so low (around 200% on an 8-core system) that
>>>> it seems something else is wrong. I have also tried increasing
>>>> queue_size, but that just delays the EsRejectedExecutionExceptions.
>>>>
>>>> Any more suggestions on how to handle this?
>>>>
>>>> *Current setup*: 4 c3.2xlarge instances of ES 1.2.2.
>>>> *Current configurations*:
>>>> index.codec.bloom.load: false
>>>> index.compound_format: false
>>>> index.compound_on_flush: false
>>>> index.merge.policy.max_merge_at_once: 4
>>>> index.merge.policy.max_merge_at_once_explicit: 4
>>>> index.merge.policy.max_merged_segment: 1gb
>>>> index.merge.policy.segments_per_tier: 4
>>>> index.merge.policy.type: tiered
>>>> index.merge.scheduler.max_thread_count: 4
>>>> index.merge.scheduler.type: concurrent
>>>> index.refresh_interval: 10s
>>>> index.translog.flush_threshold_ops: 50000
>>>> index.translog.interval: 10s
>>>> index.warmer.enabled: false
>>>> indices.memory.index_buffer_size: 50%
>>>> indices.store.throttle.type: none
>>>>
>>>> On Tue, Jul 15, 2014 at 6:24 PM, Srinath C <srinat...@gmail.com> wrote:
>>>>
>>>>> Thanks Joe, Michael and all. Really appreciate your help.
>>>>> I'll try out your suggestions and run the tests. Will post back on
>>>>> my progress.
>>>>>
>>>>> On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
>>>>> m...@elasticsearch.com> wrote:
>>>>>
>>>>>> First off, upgrade ES to the latest (1.2.2) release; there have been
>>>>>> a number of bulk indexing improvements since 1.1.
>>>>>>
>>>>>> Second, disable merge IO throttling.
>>>>>>
>>>>>> Third, use the default settings, but increase index.refresh_interval
>>>>>> to perhaps 5s, and set index.translog.flush_threshold_ops to maybe
>>>>>> 50000: this decreases the frequency of Lucene-level commits
>>>>>> (= filesystem fsyncs).
>>>>>>
>>>>>> If possible, use SSDs: they are much faster for merging.
>>>>>>
>>>>>> Mike McCandless
>>>>>>
>>>>>> http://blog.mikemccandless.com
>>>>>>
>>>>>> On Mon, Jul 14, 2014 at 11:03 PM, Srinath C <srinat...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Each document is around 300 bytes on average, so that brings the
>>>>>>> data rate to around 17 MB per sec.
>>>>>>> This is running on ES version 1.1.1. I have been trying out
>>>>>>> different values for these configurations. queue_size was increased
>>>>>>> when I got EsRejectedExecutionExceptions due to the queue filling
>>>>>>> up (default size of 50). segments_per_tier was picked up from some
>>>>>>> articles on scaling. What would be a reasonable value based on my
>>>>>>> data rate?
>>>>>>>
>>>>>>> If 60K seems to be too high, are there any benchmarks available for
>>>>>>> Elasticsearch?
>>>>>>>
>>>>>>> Thanks all for your replies.
>>>>>>>
>>>>>>> On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:
>>>>>>>
>>>>>>>> index.merge.policy.segments_per_tier: 100 and
>>>>>>>> threadpool.bulk.queue_size: 500 are extreme settings that should
>>>>>>>> be avoided, as they allocate a lot of resources. The
>>>>>>>> UnavailableShardsException / "No Nodes" errors you see are
>>>>>>>> congestion caused by such extreme values.
>>>>>>>>
>>>>>>>> What ES version is this? Why don't you use the default settings?
>>>>>>>>
>>>>>>>> Jörg
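Of the settings Mike suggests above, index.refresh_interval and
index.translog.flush_threshold_ops are dynamic index settings, and the
merge IO throttle is a dynamic node-level setting, so all of them can be
applied to a live cluster. A minimal sketch, assuming the ES 1.x Java
client ("events" is a placeholder index name):

    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.settings.ImmutableSettings;

    public final class IndexTuning {
        static void applyBulkIndexingSettings(Client client) {
            // Disable merge IO throttling (dynamic node-level setting).
            client.admin().cluster().prepareUpdateSettings()
                    .setTransientSettings(ImmutableSettings.settingsBuilder()
                            .put("indices.store.throttle.type", "none")
                            .build())
                    .execute().actionGet();

            // Refresh less often and flush the translog less frequently.
            client.admin().indices().prepareUpdateSettings("events")
                    .setSettings(ImmutableSettings.settingsBuilder()
                            .put("index.refresh_interval", "5s")
                            .put("index.translog.flush_threshold_ops", 50000)
                            .build())
                    .execute().actionGet();
        }
    }

The transient cluster setting reverts on a full cluster restart, which
suits a load-test experiment like this one.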
>>>>>>>> On Mon, Jul 14, 2014 at 4:46 AM, Srinath C <srin...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>    I'm having a tough time keeping elasticsearch running
>>>>>>>>> healthily for even 20-30 mins in my setup. At an indexing rate of
>>>>>>>>> 28-36K docs per second, the CPU utilization soon drops to 100%
>>>>>>>>> and never recovers. All client requests fail with
>>>>>>>>> UnavailableShardsException or "No Nodes" exceptions. The logs
>>>>>>>>> show warnings from "monitor.jvm" saying that GC did not free up
>>>>>>>>> much memory.
>>>>>>>>>
>>>>>>>>> The ultimate requirement is to import data into the ES cluster at
>>>>>>>>> around 60K docs per second on the setup described below. The only
>>>>>>>>> operation being performed is bulk import of documents. Soon the
>>>>>>>>> ES nodes become unresponsive and the CPU utilization drops to
>>>>>>>>> 100% (from 400-500%). They don't seem to recover even after the
>>>>>>>>> bulk import operations have ceased.
>>>>>>>>>
>>>>>>>>> Any suggestions on how to tune the GC for my requirements? What
>>>>>>>>> other information would be needed to look into this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Srinath.
>>>>>>>>>
>>>>>>>>> The setup:
>>>>>>>>>   - Cluster: a 4-node cluster of c3.2xlarge instances on aws-ec2.
>>>>>>>>>   - Load: the only operation during this test is bulk import of
>>>>>>>>>     data. The documents are small, around ~200-500 bytes, and are
>>>>>>>>>     bulk imported into the cluster using storm.
>>>>>>>>>   - Bulk import: a total of 7-9 storm workers, each using a
>>>>>>>>>     single BulkProcessor to import data into the ES cluster. As
>>>>>>>>>     seen from the logs, each worker imports around 4K docs per
>>>>>>>>>     second, i.e. around 28-36K docs per second in total.
>>>>>>>>>   - JVM args: around 8G of heap; tried with the CMS collector as
>>>>>>>>>     well as the G1 collector.
>>>>>>>>>   - ES configuration:
>>>>>>>>>     - "mlockall": true
>>>>>>>>>     - "threadpool.bulk.size": 20
>>>>>>>>>     - "threadpool.bulk.queue_size": 500
>>>>>>>>>     - "indices.memory.index_buffer_size": "50%"
>>>>>>>>>     - "index.refresh_interval": "30s"
>>>>>>>>>     - "index.merge.policy.segments_per_tier": 100
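For reference, a minimal sketch of the kind of per-worker BulkProcessor
described above, assuming the ES 1.x Java client; the flush sizes are
illustrative, not taken from the thread. The listener counts item-level
rejections so the storm topology can throttle its spouts instead of
enlarging queue_size:

    import java.util.concurrent.atomic.AtomicLong;

    import org.elasticsearch.action.bulk.BulkItemResponse;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.unit.TimeValue;

    public final class BulkImport {
        static final AtomicLong rejected = new AtomicLong();

        static BulkProcessor build(Client client) {
            return BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override
                public void beforeBulk(long id, BulkRequest request) {
                    // nothing to do before a bulk is sent
                }

                @Override
                public void afterBulk(long id, BulkRequest request,
                                      BulkResponse response) {
                    if (response.hasFailures()) {
                        // Item-level failures include EsRejectedExecutionException
                        // when a node's bulk thread pool queue is full.
                        for (BulkItemResponse item : response.getItems()) {
                            if (item.isFailed()) {
                                rejected.incrementAndGet();
                            }
                        }
                    }
                }

                @Override
                public void afterBulk(long id, BulkRequest request,
                                      Throwable failure) {
                    // Whole-request failure, e.g. NoNodeAvailableException.
                    failure.printStackTrace();
                }
            })
            .setBulkActions(5000)     // flush after 5000 docs (illustrative)
            .setConcurrentRequests(2) // at most 2 bulks in flight per worker
            .setFlushInterval(TimeValue.timeValueSeconds(5))
            .build();
        }
    }

Backing off when the rejected counter climbs keeps the cluster out of the
congestion Jörg describes, whereas a larger queue_size only postpones it.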