All that really matters here (at least, on a high level) is the size of the index.
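To make the sizing point concrete, here is a rough back-of-the-envelope sketch. It uses the figures quoted in the thread below: 50GB/day of primary data, 6 primaries with 1 replica on 6 nodes, at least 20x growth, and a ceiling of roughly 50GB per shard. It assumes documents spread evenly across primary shards. The constant and function names are purely illustrative and are not part of any Elasticsearch API.

```python
import math

# Figures quoted in this thread (approximate; primaries only, replicas excluded).
DAILY_INDEX_GB = 50        # current daily index size
CURRENT_PRIMARIES = 6      # primary shards per daily index
REPLICAS = 1               # one replica of each primary
NODES = 6                  # servers in the cluster
GROWTH_FACTOR = 20         # "at least 20x more" data expected
MAX_SHARD_GB = 50          # rule-of-thumb per-shard ceiling suggested in the thread


def shard_size_gb(index_gb: float, primaries: int) -> float:
    """Average size of one primary shard, assuming an even spread of documents."""
    return index_gb / primaries


def min_primaries(index_gb: float, max_shard_gb: float) -> int:
    """Smallest number of primaries that keeps each shard at or below the ceiling."""
    return math.ceil(index_gb / max_shard_gb)


def shards_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Total shard copies (primaries plus replicas) divided across the nodes."""
    return primaries * (1 + replicas) / nodes


if __name__ == "__main__":
    future_gb = DAILY_INDEX_GB * GROWTH_FACTOR
    print(f"today: {shard_size_gb(DAILY_INDEX_GB, CURRENT_PRIMARIES):.1f} GB per shard")
    print(f"at 20x: {shard_size_gb(future_gb, CURRENT_PRIMARIES):.1f} GB per shard")

    needed = min_primaries(future_gb, MAX_SHARD_GB)
    print(f"primaries needed at 20x: {needed}")
    print(f"shard copies per node: {shards_per_node(needed, REPLICAS, NODES):.1f}")

    # Splitting by log file (45GB + 5GB per day today, per the thread), scaled 20x.
    split_gb = [45 * GROWTH_FACTOR, 5 * GROWTH_FACTOR]
    split_total = sum(min_primaries(gb, MAX_SHARD_GB) for gb in split_gb)
    print(f"primaries if split into two indices: {split_total}")
```

Under these assumptions, today's ~8.3GB shards would grow to roughly 167GB at 20x. Keeping each shard under 50GB needs at least 20 primaries, which works out to about 6.7 shard copies per node with one replica. Splitting log files A and B into separate indices still adds up to 20 primaries, because the total data volume, not how it is grouped, sets the shard count. Rounding up per index can add a shard or two when the sizes don't divide evenly.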
On 31 January 2015 at 02:21, Chris Neal <chris.n...@derbysoft.net> wrote:

> Thank you for the reply, Mark.
>
> Heaps are adjusted to 30GB (I liked round numbers :)).
>
> 50GB is a good max shard size to keep in mind, and I'll adjust index
> groupings as needed based on that.
>
> With regard to the number of indexes, here is what I was thinking; please
> tell me if I'm off base here.
>
> With all log files going to a single daily index, assume log file A is
> 45GB of data (in its own _type), and log file B is 5GB of data (in its own
> _type). Searching for data in log file B is "penalized" in terms of search
> performance because ES loads terms from the index (based on some predictive
> algorithm). The heap is also "penalized", because it now has terms loaded
> from this large index that it probably will not need.
>
> If log file B is instead gathered into its own index, then searching it is
> faster, and there is less pressure on the heap because ES loads far fewer
> terms.
>
> Maybe my assumptions about how ES does its work are incorrect, and all I
> *really* care about is raw index size? Perhaps both the predictive term
> loading done by ES and its search logic are savvy enough to restrict
> themselves to the _type specified in the query?
>
> Thank you again for your help! I'm getting a better understanding for
> sure. :)
> Chris
>
> On Tue, Jan 27, 2015 at 7:01 PM, Mark Walkom <markwal...@gmail.com> wrote:
>
>> Be aware that we do not yet officially support G1GC. You should also
>> reduce your heap to 31GB.
>>
>> Ideally you want to keep shard size below 50GB, so you will need to
>> adjust things as you grow. Be careful creating a lot of indices, though:
>> each one takes overhead, and if you increase the number of indices and the
>> amount of data you have in them, you could be wasting resources.
>>
>> However, when querying, 100 indices with 1 shard is the same as 1 index
>> with 100 shards.
>>
>> On 28 January 2015 at 10:11, Chris Neal <chris.n...@derbysoft.net> wrote:
>>
>>> Hi all,
>>>
>>> I've seen lots of posts about this and want to make sure I'm
>>> understanding correctly.
>>>
>>> Background:
>>>
>>> - Our cluster has 6 servers. They are Dell R720xd with 64GB RAM,
>>>   2xE5-2600v2 CPU (2 sockets, 6 cores/socket), 16TB disk.
>>> - Elasticsearch is set to have 6 shards and 1 replica, giving two
>>>   shards per server. I'm giving ES 32GB heaps on Java 1.7 with G1 GC.
>>>
>>> I'm concerned about the size of our indexes. Right now, we store all
>>> data in one index per day, with various types within that to separate data.
>>>
>>> The indexes are averaging about 50GB/day (not including replicas).
>>> Each shard is about 8GB.
>>>
>>> We have a LOT more data to index. At least 20x more. Should I be
>>> concerned with indexes of that size (~1000GB) and shards of that size
>>> (~160GB)? Is it merely a question of having enough hardware, or is there
>>> more to it?
>>>
>>> I'm considering switching to a different indexing strategy so that each
>>> index is smaller, but there are more of them. The total amount of data
>>> would be the same, so I'm not sure whether that will help.
>>>
>>> If I'm optimizing for searching, does querying multiple smaller indices
>>> perform better than querying fewer larger ones?
>>>
>>> Thank you for your time.
>>> Chris
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com
>>> For more options, visit https://groups.google.com/d/optout.