Re: powerful cluster is not able to handle 1.5Tb of data, how to optimize?

Mark Walkom Fri, 12 Sep 2014 02:11:49 -0700

The answer is that it depends on your use case.
But if you are seeing problems like these, it is usually because the
cluster is at capacity and needs more resources.
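
To put a number on "at capacity": the breaker line quoted further down
already shows how far the terms request on message.raw overshot. Below is a
minimal Python sketch that pulls the two byte counts out of that one log
line. The regular expression and the helper name parse_breaker_line are my
own, written only for this message format, and reading ES_HEAP_SIZE=15g as
15 GiB is an assumption.

```python
import re

# Breaker line copied from the log excerpt quoted later in this thread.
LOG_LINE = (
    "[2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] "
    "New used memory 6499531395 [6gb] from field [message.raw] would be larger "
    "than configured breaker: 6414558822 [5.9gb], breaking"
)

# Hypothetical pattern, written for this one message format only.
BREAKER_RE = re.compile(
    r"New used memory (?P<used>\d+) \[[^\]]+\] from field \[(?P<field>[^\]]+)\] "
    r"would be larger than configured breaker: (?P<limit>\d+)"
)

# Assumption: ES_HEAP_SIZE=15g from the original post, read as 15 GiB.
HEAP_BYTES = 15 * 2**30


def parse_breaker_line(line):
    """Return (field, requested_bytes, limit_bytes), or None if no match."""
    match = BREAKER_RE.search(line)
    if match is None:
        return None
    return match.group("field"), int(match.group("used")), int(match.group("limit"))


field, used, limit = parse_breaker_line(LOG_LINE)
over = used - limit           # bytes by which the request exceeded the limit
heap_share = limit / HEAP_BYTES  # breaker limit as a fraction of the assumed heap

print(f"field={field} requested={used} limit={limit}")
print(f"over the limit by {over} bytes ({over / 2**20:.1f} MiB)")
print(f"breaker limit is {heap_share:.0%} of the assumed heap")
```

The overshoot works out to 84972573 bytes (about 81 MiB), so the field data
needed for that one panel was already filling essentially the whole breaker
budget.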


You may find it cheaper to move to more numerous and smaller nodes that you
can distribute the load across, as that is where ES excels and also how
many other big data platforms operate.
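
As a rough illustration of spreading the load, here is a small Python sketch
that splits the figures from your first mail evenly over different node
counts. The node counts 6 and 12 are arbitrary examples, not a
recommendation, and the even split ignores replicas, shard skew and disk
headroom, since none of those are stated in this thread.

```python
# Figures from the original post; everything else is an assumption.
TOTAL_DOCS = 1_323_957_069
TOTAL_BYTES = 1.64 * 10**12   # "1.64TB", read here as decimal terabytes
HEAP_PER_NODE = 15 * 10**9    # ES_HEAP_SIZE=15g, read as decimal GB for a rough ratio


def per_node(node_count, total_bytes=TOTAL_BYTES, total_docs=TOTAL_DOCS):
    """Evenly split data and docs across node_count data nodes.

    Ignores replicas, shard skew and disk headroom.
    """
    return {
        "nodes": node_count,
        "gb_per_node": total_bytes / node_count / 10**9,
        "docs_per_node": total_docs // node_count,
    }


# 3 is the current cluster; 6 and 12 are illustrative values only.
for n in (3, 6, 12):
    row = per_node(n)
    print(f"{row['nodes']:>2} nodes: {row['gb_per_node']:7.1f} GB and "
          f"{row['docs_per_node']:,} docs per node")

# On-disk data per unit of heap on the current 3-node layout.
current = per_node(3)
ratio = current["gb_per_node"] * 10**9 / HEAP_PER_NODE
print(f"data-to-heap ratio today: about {ratio:.0f}:1")
```

The useful number is less the absolute size than how much data each node's
heap has to serve; adding nodes lowers that figure only if the heap per node
does not shrink by the same factor.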

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 12 September 2014 19:01, Pavel P <pa...@kredito.de> wrote:

> Java version is "1.7.0_55"
> Elasticsearch is 1.3.1
>
> Well, the cost of the whole setup is the question.
> Currently it's about $1000 per month on AWS. Do we really need
> to pay a lot more than $1000/month to support 1.5Tb of data?
>
> Could you briefly describe how many nodes you would expect to need for
> that much data?
>
> A side question: how do the really Big Data solutions work when they
> search or aggregate over data far larger than 1.5Tb? Or is that also
> just a matter of the size of the architecture?
>
> Regards,
>
> On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:
>>
>> That's a lot of data for 3 nodes!
>> You really need to adjust your infrastructure; add more nodes, more ram,
>> or alternatively remove some old indexes (delete or close).
>>
>> What ES and java version are you running?
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: ma...@campaignmonitor.com
>> web: www.campaignmonitor.com
>>
>>
>> On 12 September 2014 18:48, Pavel P <pa...@kredito.de> wrote:
>>
>>> Hi,
>>>
>>> Again I have an issue with the power of the cluster.
>>>
>>> I have a cluster of 3 servers, each with 30 GB of RAM, 8 CPUs and a 1Tb
>>> disk attached.
>>>
>>>
>>> <https://lh4.googleusercontent.com/-W1AVatn9Cq0/VBKzYgR3QKI/AAAAAAAAAJc/S3TWMBqqqX0/s1600/ES_cluster.png>
>>>
>>>
>>> There are 1323957069 docs (1.64TB) in it; the document distribution is
>>> as follows:
>>>
>>>
>>> <https://lh5.googleusercontent.com/-kjlQG7xBfIw/VBKwCt8sKQI/AAAAAAAAAJQ/s8kuqouFUkQ/s1600/Screen%2BShot%2B2014-09-12%2Bat%2B11.33.49%2BAM.png>
>>>
>>> All 3 nodes are data nodes.
>>>
>>> The indexing throughput is about 10-20k documents per minute.
>>> (It's a logstash -> elasticsearch setup; we store different logs in the
>>> cluster.)
>>>
>>> My concerns are as follows:
>>>
>>> 1. When I load the index page of kibana, loading the document types
>>> panel takes about a minute. Is that ok?
>>> 2. For the document type user_account, when I try to build the terms
>>> panel for the field "message.raw" (a string of 20-30 characters), my
>>> cluster gets stuck.
>>> In the logs I find the following:
>>>
>>> [2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius]
>>>> New used memory 6499531395 [6gb] from field [message.raw] would be larger
>>>> than configured breaker: 6414558822 [5.9gb], breaking
>>>
>>>
>>> But despite the breakers, when it tries to calculate that terms pie,
>>> it stops indexing the incoming documents and the queue grows. Then I
>>> see heap exceptions, and the only way I have found to resolve them
>>> is to reboot the cluster.
>>>
>>> *My question is the following:*
>>>
>>> It looks like I have quite powerful servers and the correct
>>> configuration (my ES_HEAP_SIZE is set to 15g), yet they are still not
>>> able to process 1.5Tb of information, or do so quite slowly.
>>> Do you have any advice on how to overcome that and make my cluster
>>> respond faster? How should I adjust the infrastructure?
>>>
>>> What hardware would I need to work with 1.5Tb in a reasonable
>>> amount of time?
>>>
>>> Any thoughts are welcome.
>>>
>>> Regards,
>>>
>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit https://groups.google.com/d/
>>> msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%
>>> 40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

