Re: Bulk indexing creates a lot of disk read OPS

David Pilato Fri, 24 Apr 2015 01:03:02 -0700

Could segment merging be the cause here?

David
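A toy model of why merging turns into disk reads. It assumes every flush writes one equal-sized segment and that, as soon as `merge_factor` segments of the same size exist, they are all read back and merged into one larger segment. This is a simplified log-structured scheme for illustration, not Lucene's actual TieredMergePolicy, and the function name and "units" are hypothetical.

```python
def simulate_merges(num_flushes: int, merge_factor: int = 10) -> tuple[int, int]:
    """Toy tiered-merge model (NOT Lucene's real merge policy).

    Each flush writes one segment of size 1 unit at tier 0. Whenever a tier
    holds `merge_factor` segments, they are all read back and merged into one
    segment at the next tier (size merge_factor ** (tier + 1)).
    Returns (units_flushed, units_read_by_merges).
    """
    tiers: dict[int, int] = {}  # tier -> number of segments currently at that tier
    flushed = 0
    merge_read = 0
    for _ in range(num_flushes):
        flushed += 1
        tiers[0] = tiers.get(0, 0) + 1
        tier = 0
        # Cascade: a full tier is merged into a single segment one tier up.
        while tiers.get(tier, 0) == merge_factor:
            merge_read += merge_factor * merge_factor ** tier  # read every input segment
            tiers[tier] = 0
            tier += 1
            tiers[tier] = tiers.get(tier, 0) + 1
    return flushed, merge_read


for n in (10, 100, 1000, 10000):
    flushed, read = simulate_merges(n)
    print(f"{n:>6} flushes -> {read / flushed:.1f} units read per unit indexed")
```

In this model the volume read per unit indexed grows with the logarithm of the index size (1.0, 2.0, 3.0, 4.0 for the runs above). On an index of 1.5 billion documents whose segments likely no longer fit in the page cache, merge reads can plausibly dominate disk I/O even when the workload is write-only.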


> On 24 Apr 2015, at 09:54, Eran <era...@gmail.com> wrote:
> 
> Forgot some stats:
> 
> I have 10 shards, no replicas, all on the same machine.
> At the moment, there are about 1.5 billion records in the index.
> 
> 
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>> Here are the attachments.
>> 
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>> Hello,
>>> 
>>> I've created an index I use for logging.
>>> 
>>> This means there are mostly writes, and some searches once in a while.
>>> During the initial load, I'm using several clients to concurrently index
>>> documents with the bulk API.
>>> 
>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>> Over time, the indexing time increases to 1000-4500 ms.
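Those latencies expressed as per-client throughput, assuming each client sends its bulks synchronously, one at a time (that assumption and the function name are mine, not from the thread):

```python
def docs_per_second(batch_size: int, latency_ms: float) -> float:
    """Throughput of one client that waits for each bulk response before sending the next."""
    return batch_size * 1000 / latency_ms


for latency_ms in (200, 1000, 4500):
    rate = docs_per_second(5000, latency_ms)
    print(f"{latency_ms:>5} ms per bulk -> {rate:,.0f} docs/s per client")
```

Under that assumption, per-client throughput falls from 25,000 docs/s to between about 5,000 and 1,111 docs/s, a 5x to 22.5x slowdown.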
>>> 
>>> I am using an EC2 c3.8xl machine with 32 cores and 60 GB of memory, with a
>>> provisioned-IOPS volume set to 7000 IOPS.
>>> 
>>> Looking at the metrics, I see that the CPU and memory are fine and the write
>>> IOPS are at 300, but the read IOPS have slowly climbed to 7000.
>>> 
>>> Why are most of the IOPS reads when I'm only indexing?
>>> 
>>> I am attaching some screen captures from the BigDesk plugin that show the
>>> two states of the index. About 20% of the way into the graphs is the point
>>> where I stopped the clients, so you can see the load drop off.
>>> 
>>> My settings are:
>>> 
>>> threadpool.bulk.type: fixed
>>> threadpool.bulk.size: 32                 # availableProcessors
>>> threadpool.bulk.queue_size: 1000
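A rough model of what a fixed bulk pool of 32 threads with a 1000-slot queue implies: up to 32 bulk tasks run at once, up to 1000 more wait, and anything beyond that is rejected. The function is a hypothetical illustration of a fixed pool with a bounded queue, not Elasticsearch code; my understanding is that each client bulk is split into per-shard tasks, so with 10 shards the task count could be up to 10x the number of client requests.

```python
def fixed_pool_admission(in_flight: int, size: int = 32,
                         queue_size: int = 1000) -> tuple[int, int, int]:
    """Split concurrent tasks into (running, queued, rejected) for a fixed pool with a bounded queue."""
    running = min(in_flight, size)
    queued = min(max(in_flight - size, 0), queue_size)
    rejected = max(in_flight - size - queue_size, 0)
    return running, queued, rejected


for n in (10, 100, 1100):
    print(n, fixed_pool_admission(n))
```

A large queue avoids rejections, but queued tasks still wait, so while the disk is saturated the backlog would tend to show up as rising bulk latency rather than as errors.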
>>> 
>>> # Indices settings
>>> indices.memory.index_buffer_size: 50%
>>> indices.cache.filter.expire: 6h
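`indices.memory.index_buffer_size: 50%` is a share of the JVM heap, not of the machine's 60 GB, and it is divided among the shards that are actively indexing. The heap size isn't given in the thread, so the 30 GB below is hypothetical. The 512 MB per-shard maximum and 4 MB minimum are my recollection of the 1.x defaults for `indices.memory.max_shard_index_buffer_size` and `indices.memory.min_shard_index_buffer_size`; treat them as assumptions and check them against your version.

```python
def per_shard_index_buffer_mb(heap_mb: float, fraction: float, active_shards: int,
                              shard_min_mb: float = 4.0,    # assumed 1.x default
                              shard_max_mb: float = 512.0,  # assumed 1.x default
                              ) -> float:
    """Indexing buffer each active shard gets, clamped to assumed per-shard min/max."""
    share = heap_mb * fraction / active_shards
    return max(shard_min_mb, min(share, shard_max_mb))


heap_mb = 30 * 1024  # hypothetical heap size; not stated in the thread
per_shard = per_shard_index_buffer_mb(heap_mb, 0.50, active_shards=10)
print(f"per shard: {per_shard:.0f} MB, total usable: {per_shard * 10:.0f} MB "
      f"of {heap_mb * 0.50:.0f} MB configured")
```

If that cap applies, each of the 10 shards would get 512 MB, so only 5120 MB of the 15360 MB reserved would actually be used for buffering.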
>>> 
>>> bootstrap.mlockall: true
>>> 
>>> 
>>> and I've changed the index settings to:
>>> 
>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>>> I also tried "refresh_interval":"-1" (refresh disabled).
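The index settings can be rebuilt as a Python dict and serialized into the body sent to the update-settings endpoint (`PUT /<index>/_settings`; the index name isn't given in the thread). The second part estimates how often `flush_threshold_ops` fires, under the hypothetical assumption that each 5000-document bulk is spread evenly across the 10 shards, with each shard keeping its own translog. The `1gb` size threshold depends on document size, which isn't stated, so it is ignored here.

```python
import json

# Index settings from the message above, in the same key order.
settings = {
    "index": {
        "refresh_interval": "60m",
        "translog": {"flush_threshold_size": "1gb", "flush_threshold_ops": "50000"},
    }
}
body = json.dumps(settings, separators=(",", ":"))  # compact, matches the one-liner
print(body)


def bulks_per_flush(flush_threshold_ops: int, bulk_size: int, shards: int) -> float:
    """Bulk requests until one shard's translog reaches the ops threshold (even routing assumed)."""
    ops_per_shard_per_bulk = bulk_size / shards
    return flush_threshold_ops / ops_per_shard_per_bulk


ops = int(settings["index"]["translog"]["flush_threshold_ops"])
print(bulks_per_flush(ops, bulk_size=5000, shards=10))
```

Under even routing, each shard would flush roughly once every 100 bulk requests, counted across all clients combined. So even with refresh disabled, shards would keep committing new segments regularly, which in turn gives the merge scheduler more work.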
>>> 
>>> 
>>> Please let me know if I need to provide anything else (settings, logs,
>>> metrics).
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/590AAAE0-75D2-45D2-B105-864444DF6521%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.
