Attached is the indexing rate (using bigdesk): <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
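For reference, the counters bigdesk is charting can also be read straight from the node stats API; a minimal sketch, assuming the default single local node on localhost:9200:

    curl -s 'http://localhost:9200/_nodes/stats/indices?pretty'
    # indices.indexing.index_total and indices.indexing.index_time_in_millis are
    # the raw counters; sampling them twice and taking the delta gives the same
    # requests-per-second and time-per-second figures shown in the screenshot.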
The indexing requests per second are around 2K and the indexing time per second is around 3K.

On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
> Answers inline.
>
> Regarding the slow I/O: when I analyzed the creation of the Lucene index
> files, I saw that they are created without any special flags (such as no
> buffering or write-through). This means we are paying the cost twice:
> when we write a file we cache the data in Windows' Cache Manager, which
> consumes a lot of memory (memory that is then not available to the
> application itself), but when we read the file we don't actually read it
> through the cache, which makes the operation slow. *Any ideas?*
>
> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>
>> Shooting in the dark here, but here it goes:
>>
>> 1. Do you have anything else running on the system? For example, AVs are
>> known to cause slow-downs for such services, and other I/O- or memory-heavy
>> services could cause thrashing or just a general slowdown.
>>
> No, nothing else is running on that machine. Initially it was fast; it got
> slower as the amount of data in the index grew. Also, is there a way to
> increase the buffer size for the Lucene index files (.tim, .doc, and .pos)
> from 8K to something much bigger?
>
>> 2. What JVM version are you running this with?
>>
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> OS_NAME="Windows"
> OS_VERSION="5.2"
> OS_ARCH="amd64"
>
>> 3. If you changed any of the default settings for merge factors etc. - can
>> you revert that and try again?
>>
> Tried that before; same behavior.
>
>> 4. Can you try with embedded=false and see if it makes a difference?
>>
> Tried that before; same behavior.
>
>> --
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <ikess...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have configured a single-node ES with logstash 1.4.0 (8GB memory) with
>>> the following configuration:
>>>
>>> - index.number_of_shards: 7
>>> - number_of_replicas: 0
>>> - refresh_interval: -1
>>> - translog.flush_threshold_ops: 100000
>>> - merge.policy.merge_factor: 30
>>> - codec.bloom.load: false
>>> - min_shard_index_buffer_size: 12m
>>> - compound_format: true
>>> - indices.fielddata.cache.size: 15%
>>> - indices.fielddata.cache.expire: 5m
>>> - indices.cache.filter.size: 15%
>>> - indices.cache.filter.expire: 5m
>>>
>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>> OS: 64-bit Windows Server 2012 R2
>>>
>>> My raw data is a CSV file, which I parse with a grok filter and the
>>> output configuration (elasticsearch { embedded => true flush_size =>
>>> 100000 idle_flush_time => 30 }).
>>> The raw data is about 100GB of events per day, which ES tries to ingest
>>> into one index (with 7 shards).
>>>
>>> At the beginning the inserts were fast, but after a while they got
>>> extremely slow: 1.5K docs in 8K seconds :(
>>>
>>> Currently the index has around 140 million docs, with a size of 55GB.
>>>
>>> When I analyzed the writes to disk with ProcMon, I saw that the process
>>> writes in an interleaved manner to three kinds of files (.tim, .doc, and
>>> .pos) in 4K and 8K chunks, instead of batching the writes into some
>>> reasonable size.
>>>
>>> Appreciate the help.
>>>
>>> All the best,
>>> Yitzhak
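A quick way to cross-check the ProcMon observation from the Elasticsearch side is the indices segments API; a minimal sketch, assuming the default single local node on localhost:9200:

    curl -s 'http://localhost:9200/_segments?pretty'
    # lists every Lucene segment per shard with num_docs and size_in_bytes;
    # a large number of small segments would match the small interleaved
    # .tim/.doc/.pos writes seen in ProcMon and could point at merging
    # (merge.policy.merge_factor: 30 above) not keeping up.

The merge settings listed above can then be adjusted and the segment counts re-checked after a while of indexing.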