Hi Jörg, Thanks for your reply - that's given me a number of leads to follow up on.

> Errors in allocating direct buffers will result in Java errors. You mention Linux memory errors but unfortunately you do not quote it, so I have to guess.

We see nothing useful in the Elasticsearch logs. What we do see is either the console saying, "Out of memory: Kill process ... score 1 or sacrifice child" or, once, "Loading dm-mirror.ko module, Waiting for required block device discovery, Waiting for 2 sda-like device(s)...Kernel panic - not syncing: Out of memory and no killable processes". I understand the first message as the OOM killer coming out to whack a process on the head. I don't understand the second one. I have screenshots of these if required.

> You should have enabled memory mapped files by index store mmapfs (default on RHEL)

We haven't changed this setting, so I expect it is the default. I looked for a way to verify this, but the ES API does not appear to return it.

> bootstrap.mlockall = true...set memlock to unlimited

Yes - both done.

> If you still encounter issues from Linux OS errors it is most probably because of VMware limitations

Is there a way to get evidence to show this? I reviewed the VMware event log and there was no ballooning in there (assuming we were looking in the right spot).

> If you run a VM, you should assign at most 50% of the configured guest OS memory to ES.

We use the elasticsearch Puppet module, but I modified it with a version of the code from the elasticsearch Chef cookbook to assign the heap size automatically, and that code appears to assign 60%. I was surprised by this too, but I copied it on the assumption that the cookbook writer knew what they were doing. I've raised an issue to ask the question: https://github.com/elasticsearch/cookbook-elasticsearch/issues/209

For the curious: I've set up some monitoring to capture /proc/meminfo, the line count of /proc/<pid>/maps for Elasticsearch and Flume, and the top few entries in top by memory usage. Now I'm just waiting for the next failure.

Thanks for any help provided.
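
In case it helps anyone following along, the /proc part of that monitoring could be sketched roughly as below. This is only an illustration under my own assumptions, not the exact script in use: the function names (parse_meminfo, count_mappings, snapshot) and the choice of fields to keep are made up for the example, and it leaves out the capture of the top few entries from top.

```python
import time


def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of field name -> integer.

    Most fields are reported in kB; a few (for example HugePages_Total)
    are plain counts with no unit.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def count_mappings(pid):
    """Count memory mappings of a process (one line per mapping in /proc/<pid>/maps)."""
    with open("/proc/%d/maps" % pid) as f:
        return sum(1 for _ in f)


def snapshot(pids, fields=("MemTotal", "MemFree", "Buffers", "Cached", "Mlocked")):
    """Take one sample: selected meminfo fields plus a mapping count per pid.

    The field selection is an assumption; adjust it to whatever you want to track.
    """
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    sample = {"time": time.time()}
    for field in fields:
        sample[field] = mem.get(field)
    for pid in pids:
        try:
            sample["maps_%d" % pid] = count_mappings(pid)
        except (IOError, OSError):
            # The process may have exited (or been OOM-killed) between samples.
            sample["maps_%d" % pid] = None
    return sample
```

Calling snapshot() with the Elasticsearch and Flume pids once a minute (from cron, say) and appending the result to a file should leave a trail leading up to the next OOM kill.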

Cheers,
Edward

On Tuesday, May 6, 2014 3:23:10 PM UTC-7, Jörg Prante wrote:
>
> Yes, of course Elasticsearch is using off-heap memory. All the Lucene index I/O is using direct buffers in native OS memory.
>
> Errors in allocating direct buffers will result in Java errors. You mention Linux memory errors but unfortunately you do not quote it, so I have to guess.
>
> You should have enabled memory mapped files by index store mmapfs (default on RHEL) so all files that are read by ES are mapped into virtual address space of the OS VM management.
>
> And also bootstrap.mlockall = true, so you also need to set memlock to unlimited in /etc/security/limits.conf, because RHEL/CentOS memlockable memory is limited to 25% of RAM by default. In that case, Java should throw an IOException "Map failed".
>
> Note, because of the memory page lock support of the host OS, you should also check what kind of virtualization you have enabled for the guest, it should be HW (full) virtualization, not paravirtualization.
>
> If you still encounter issues from Linux OS errors it is most probably because of VMware limitations, so you should disable the bootstrap.mlockall setting.
>
> As a side note, the recommended heap size is 50% of the RAM that is available to the ES process. If you run a VM, you should assign at most 50% of the configured guest OS memory to ES.
>
> Jörg
>
> On Tue, May 6, 2014 at 10:35 PM, Edward Sargisson <ejs...@gmail.com> wrote:
>
>> Hi all,
>> We have a problem where our ES nodes will fail with an out of memory error from Linux (note, not Java). Our ES processes are configured with a fixed amount of heap (60% of total RAM - just as in the elasticsearch Chef cookbook).
>>
>> So, something is consuming all of the memory available to Linux.
>>
>> Is there any other memory that ES can use? Does it lock OS cache or buffer memory so that it can't be released? If it opens lots of files does it use up too much RAM? Is it doing off-heap allocation? (I'm pretty sure the answer is no to the last).
>>
>> We're struggling to find the exact memory resource being used up.
>>
>> For the record, this is ES 1.1.0 on CentOS 6.4 running in VMware.
>>
>> Thanks!
>> Edward

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f5fe5ed6-7bfc-4ba2-ba81-cc56a4007a74%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.