All that really matters here (at least, at a high level) is the size of the
index.
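
As a rough illustration of the sizing arithmetic in the thread quoted below,
here is a minimal Python sketch. It uses only the figures mentioned in the
thread (50GB/day today, roughly 20x growth, 6 primary shards, and a 50GB
per-shard guideline). The function names are made up for this example, and it
assumes data is spread evenly across shards, which may not hold in practice.

```python
import math


def shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Average primary shard size, assuming data is spread evenly."""
    return index_size_gb / primary_shards


def shards_needed(index_size_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count that keeps each shard at or below max_shard_gb."""
    if index_size_gb <= 0:
        return 1
    return max(1, math.ceil(index_size_gb / max_shard_gb))


# Figures taken from the thread; the 50GB ceiling is a guideline, not a hard limit.
current_daily_gb = 50
projected_daily_gb = current_daily_gb * 20  # "at least 20x more"

print(round(shard_size_gb(current_daily_gb, 6), 1))    # about 8GB per shard today
print(round(shard_size_gb(projected_daily_gb, 6), 1))  # about 160GB per shard at 6 shards
print(shards_needed(projected_daily_gb))               # shards needed to stay under 50GB
```

Under these assumptions, a 1000GB daily index would need about 20 primary
shards to stay under the guideline, whether those shards live in one index or
are split across several.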

On 31 January 2015 at 02:21, Chris Neal <chris.n...@derbysoft.net> wrote:

> Thank you for the reply, Mark.
>
> Heaps are adjusted to 30GB (I liked round numbers :)).
>
> 50GB is a good max shard size to keep in mind, and I'll adjust index
> groupings as needed based on that.
>
> With regards to number of indexes, here is what I was thinking, and please
> tell me if I'm off base here.
>
> With all log files going to a single daily index, assume log file A is
> 45GB of data (in its own _type), and log file B is 5GB of data (in its own
> _type).  Searching for data in log file B is "penalized" in terms of search
> performance because ES loads terms from the index (based on some predictive
> algorithm).  Also the heap is "penalized" because it now has terms loaded
> from this large index that it probably will not need.
>
> If log file B is instead gathered into its own index, then searches on it
> are faster and there is less pressure on the heap, because far fewer terms
> are loaded by ES.
>
> Maybe I'm incorrect in my assumptions about how ES does its work, though,
> and all I *really* care about is raw index size?  Perhaps both the
> predictive term loading done by ES and its search logic are savvy enough to
> restrict themselves to the _type specified in the query?
>
> Thank you again for your help!  I'm getting a better understanding for
> sure. :)
> Chris
>
> On Tue, Jan 27, 2015 at 7:01 PM, Mark Walkom <markwal...@gmail.com> wrote:
>
>> Be aware that we do not yet officially support G1GC. You should also
>> reduce your heap to 31GB.
>>
>> Ideally you want to keep shard size below 50GB, so you will need to
>> adjust things as you grow. Be careful about creating a lot of indices,
>> though: each one carries overhead, and if you increase both the number of
>> indices and the amount of data in them, you could be wasting resources.
>>
>> However when querying, 100 indices with 1 shard is the same as 1 index
>> with 100 shards.
>>
>> On 28 January 2015 at 10:11, Chris Neal <chris.n...@derbysoft.net> wrote:
>>
>>> Hi all,
>>>
>>> I've seen lots of posts about this, and want to make sure I'm
>>> understanding correctly.
>>>
>>> Background:
>>>
>>>    - Our cluster has 6 servers.  They are Dell R720xd with 64GB RAM,
>>>    2xE5-2600v2 CPU (2 sockets, 6 cores/socket), 16TB disk
>>>    - Elasticsearch is set to have 6 shards, and 1 replica, giving two
>>>    shards per server.  I'm giving ES 32GB heaps on Java 1.7 with G1 GC.
>>>
>>>
>>> I'm concerned about the size of our indexes.  Right now, we store all
>>> data in one index per day, with various types within that to separate data.
>>>
>>>
>>> The indexes are averaging about 50GB/day (not including replicas).
>>> Shard size is 8GB each.
>>>
>>> We have a LOT more data to index.  At least 20x more.  Should I be
>>> concerned with indexes of that size (~1000GB) and shards of that size
>>> (~160GB)?  Is it merely a question of having enough hardware, or is there
>>> more to it?
>>>
>>> I'm considering splitting the data into a different indexing strategy so
>>> that the index size is smaller, but there are more of them.  The result is
>>> the amount of data is the same, so I'm not sure if that will do anything or
>>> not.
>>>
>>> If I'm optimizing for searching, does querying multiple smaller indices
>>> perform better than querying fewer larger ones?
>>>
>>> Thank you for your time.
>>> Chris
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/elasticsearch/CAND3Dpgr78LJ%3DcWb0ZbyHZqMin4tDSVPvjG%3D_PYgsQym9EzZ%3Dg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
