Attached is the indexing rate (using bigdesk): <https://lh5.googleusercontent.com/-Jve-j75qB9o/U0WgK5ZMvMI/AAAAAAAAAFo/5_WZuCryeRw/s1600/bigdesk.png>
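For reference, the counters bigdesk is charting can also be read straight from the node stats API; a minimal sketch, assuming the default single local node on localhost:9200:

    curl -s 'http://localhost:9200/_nodes/stats/indices?pretty'
    # indices.indexing.index_total and indices.indexing.index_time_in_millis are
    # the raw counters; sampling them twice and taking the delta gives the same
    # requests-per-second and time-per-second figures shown in the screenshot.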
The indexing requests per second are around 2K and the indexing time per second is around 3K.

On Wednesday, April 9, 2014 9:36:12 PM UTC+3, Yitzhak Kesselman wrote:
> Answers inline.
>
> Regarding the slow I/O: when I analyzed the creation of the Lucene index
> files, I saw that they are created without any special flags (such as no
> buffering or write-through). This means we are paying the cost twice:
> when we write a file we cache the data in Windows' Cache Manager, which
> consumes a lot of memory (memory that is then not available to the
> application itself), but when we read the file we don't actually read it
> through the cache, which makes the operation slow. *Any ideas?*
>
> On Wednesday, April 9, 2014 5:28:11 PM UTC+3, Itamar Syn-Hershko wrote:
>
>> Shooting in the dark here, but here it goes:
>>
>> 1. Do you have anything else running on the system? For example, AVs are
>> known to cause slow-downs for such services, and other I/O- or memory-heavy
>> services could cause thrashing or just a general slowdown.
>>
> No, nothing else is running on that machine. Initially it was fast; it got
> slower as the amount of data in the index grew. Also, is there a way to
> increase the buffer size for the Lucene index files (.tim, .doc, and .pos)
> from 8K to something much bigger?
>
>> 2. What JVM version are you running this with?
>>
> java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
> OS_NAME="Windows"
> OS_VERSION="5.2"
> OS_ARCH="amd64"
>
>> 3. If you changed any of the default settings for merge factors etc. - can
>> you revert that and try again?
>>
> Tried that before; same behavior.
>
>> 4. Can you try with embedded=false and see if it makes a difference?
>>
> Tried that before; same behavior.
>
>> --
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>> On Wed, Apr 9, 2014 at 4:11 PM, Yitzhak Kesselman <ikess...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have configured a single-node ES with logstash 1.4.0 (8GB memory) with
>>> the following configuration:
>>>
>>> - index.number_of_shards: 7
>>> - number_of_replicas: 0
>>> - refresh_interval: -1
>>> - translog.flush_threshold_ops: 100000
>>> - merge.policy.merge_factor: 30
>>> - codec.bloom.load: false
>>> - min_shard_index_buffer_size: 12m
>>> - compound_format: true
>>> - indices.fielddata.cache.size: 15%
>>> - indices.fielddata.cache.expire: 5m
>>> - indices.cache.filter.size: 15%
>>> - indices.cache.filter.expire: 5m
>>>
>>> Machine: 16GB RAM, Intel i7-2600 CPU @ 3.4GHz.
>>> OS: 64-bit Windows Server 2012 R2
>>>
>>> My raw data is a CSV file, which I parse with a grok filter and the
>>> output configuration (elasticsearch { embedded => true flush_size =>
>>> 100000 idle_flush_time => 30 }).
>>> The raw data is about 100GB of events per day, which ES tries to ingest
>>> into one index (with 7 shards).
>>>
>>> At the beginning the inserts were fast, but after a while they got
>>> extremely slow: 1.5K docs in 8K seconds :(
>>>
>>> Currently the index has around 140 million docs, with a size of 55GB.
>>>
>>> When I analyzed the writes to disk with ProcMon, I saw that the process
>>> writes in an interleaved manner to three kinds of files (.tim, .doc, and
>>> .pos) in 4K and 8K chunks, instead of batching the writes into some
>>> reasonable size.
>>>
>>> Appreciate the help.
>>>
>>> All the best,
>>> Yitzhak
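A quick way to cross-check the ProcMon observation from the Elasticsearch side is the indices segments API; a minimal sketch, assuming the default single local node on localhost:9200:

    curl -s 'http://localhost:9200/_segments?pretty'
    # lists every Lucene segment per shard with num_docs and size_in_bytes;
    # a large number of small segments would match the small interleaved
    # .tim/.doc/.pos writes seen in ProcMon and could point at merging
    # (merge.policy.merge_factor: 30 above) not keeping up.

The merge settings listed above can then be adjusted and the segment counts re-checked after a while of indexing.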