Thanks Mike. I’m still a bit unclear on these comments:

> IndexReader requires some RAM for each segment to hold structures like live
> docs, terms index, index data structures for doc values fields, and holds
> open a number of file descriptors in proportion to how many segments are in
> the index.

> There is also a per-indexed-field cost in Lucene; if you have a great many
> unique indexed fields that may matter.

Aren’t these structures dependent on the size of the “Lucene index”? Say I have 1 large Lucene index vs. 10 small Lucene indices (with not much duplicated data across indices): wouldn’t the total memory used be the same? I understand that there will be more file descriptors because there will be more segments.

> IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold
> recently indexed/deleted documents, and periodically opens readers (10 at a
> time by default) to do merging, which bumps up RAM usage and file descriptors
> while the merge runs.

According to the doc at https://github.com/elasticsearch/elasticsearch/blob/master/docs/reference/modules/indices.asciidoc, it seems that indices.memory.index_buffer_size is the “total” size of the buffer for all the shards on a node, so I’m not sure how this would matter when there are too many shards. I understand that there will be more file descriptors and a lot more “smaller” merge jobs running.

I’m going to test this myself, but I just wanted to understand the model better first so that my tests are more accurate.

Thanks again,

Drew

> On Jan 23, 2015, at 2:18 AM, Michael McCandless <m...@elasticsearch.com> wrote:
>
> There is definitely a non-trivial per-index cost.
>
> From Lucene's standpoint, ES holds an IndexReader (for searching) and
> IndexWriter (for indexing) open.
>
> IndexReader requires some RAM for each segment to hold structures like live
> docs, terms index, index data structures for doc values fields, and holds
> open a number of file descriptors in proportion to how many segments are in
> the index.
>
> IndexWriter has a RAM buffer (indices.memory.index_buffer_size in ES) to hold
> recently indexed/deleted documents, and periodically opens readers (10 at a
> time by default) to do merging, which bumps up RAM usage and file descriptors
> while the merge runs.
>
> There is also a per-indexed-field cost in Lucene; if you have a great many
> unique indexed fields that may matter.
>
> If you use field data, it's entirely RAM resident (doc values is a better
> choice since it uses much less RAM).
>
> ES has common thread pools on the node which are shared for all ops across
> all shards on that node, so I don't think more indices translates to more
> threads.
>
> Net/net you really should just conduct your own tests to get a feel of
> resource consumption in your use case...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jan 22, 2015 at 4:07 PM, Drew Kutcharian <d...@venarc.com> wrote:
> > Hi,
> >
> > I just came across this blog post:
> > http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html
> >
> > Seems like there has been a lot of work done on Lucene to reduce its memory
> > requirements and even more on Lucene 5.0. This is specifically interesting to
> > me since I’m working on a project that uses Elasticsearch and we are planning
> > on using 1 index per customer model (each with 1 or maybe 2 shards and no
> > replicas) and shard allocation, mainly because:
> >
> > 1. We are going to have a few thousand customers at most
> >
> > 2. Each customer will only need access to their own data (no global queries)
> >
> > 3. The indices are going to be relatively large (each with millions of small
> > docs)
> >
> > 4. We are going to need to do a lot of parent/child type queries (and ES
> > doesn’t support cross-shard parent/child relationships and the parent id
> > cache seems not that efficient, see
> > http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/parent-child.html
> > and
> > https://github.com/elasticsearch/elasticsearch/issues/3516#issuecomment-23081662).
> > This is the main reason we feel we can’t use time based (daily, monthly, …)
> > indices.
> >
> > 5. Being able to easily “drop” an index if a customer leaves the initial
> > trial.
> >
> > I wanted to better understand the overheads of an Elasticsearch shard. Is it
> > just memory or CPU/threads too? Where can I find more information about this?
> >
> > Thanks,
> >
> > Drew
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to elasticsearch+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/elasticsearch/F59813A2-904C-4B29-BBC9-6174DD3C8DAF%40venarc.com.
> > For more options, visit https://groups.google.com/d/optout.
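
On the indices.memory.index_buffer_size point, a rough back-of-the-envelope sketch may help when planning the tests. It assumes (this is an assumption, not something confirmed in this thread) that the node-wide buffer is split evenly across the shards that are actively indexing, and that the buffer is set to 10% of the heap. The function name, the 8 GiB heap, and the shard counts are purely illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def per_shard_index_buffer(heap_bytes, active_shards, buffer_percent=10):
    """Return each shard's share of a node-wide indexing buffer, in bytes.

    Assumption: the total buffer (buffer_percent of the heap) is divided
    evenly among the shards that are actively indexing.
    """
    if active_shards <= 0:
        raise ValueError("active_shards must be positive")
    # Integer arithmetic keeps the result exact for large heap sizes.
    total_buffer = heap_bytes * buffer_percent // 100
    return total_buffer // active_shards


if __name__ == "__main__":
    heap = 8 * GIB  # hypothetical node heap size
    for shards in (10, 100, 2000):
        share = per_shard_index_buffer(heap, shards)
        print(f"{shards:>5} active shards -> {share / MIB:.2f} MiB each")
```

If the even split roughly holds, a fixed node-wide budget gives each shard a smaller buffer as the number of active shards grows, which would plausibly mean more frequent flushes of small segments and therefore more merge work. That is worth checking against real measurements rather than taking this arithmetic at face value.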