I have just created an issue: https://github.com/elasticsearch/elasticsearch/issues/6463
Regards.

On Thursday, June 5, 2014 at 8:02:05 PM UTC-3, Mark Walkom wrote:
>
> This would probably be worth raising as a GitHub issue -
> https://github.com/elasticsearch/
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 5 June 2014 22:38, Marcelo Paes Rech <marcelo...@gmail.com> wrote:
>
>> Hi Jörg. Thanks for your reply again.
>>
>> As I said, I had already used the ids filter, but I got the same behaviour.
>>
>> I realized what was wrong. Maybe it is a bug in ES, maybe not. When I executed the filter I included the "from" and "size" attributes. In this case "size" was 999999, but the final result would be just 10 documents. Apparently ES pre-allocates the objects I say I will use (maybe for performance reasons), but if the final result is smaller than the requested total (999999), ES doesn't release the remaining pre-allocated objects until the memory (heap) is full.
>>
>> I changed the size attribute to 10 and the heap became stable.
>>
>> That's it. Thanks.
>>
>> Regards.
>>
>> On Wednesday, June 4, 2014 at 7:54:15 PM UTC-3, Jörg Prante wrote:
>>>
>>> Why do you use a terms filter on the _id field and not the ids filter? The ids filter is more efficient, since it reuses the _uid field, which is cached by default.
>>>
>>> Do the terms in the query vary from query to query? If so, caching might kill your heap.
>>>
>>> Another possible issue is that your query is not distributed to all shards. If the query does not vary from user to user in your test, you have created a "hot spot": all the load from the 100 users would go to a limited number of nodes with a limited shard count.
>>>
>>> The search thread pool seems small at 50 if you execute searches for 100 users in parallel; this can lead to congestion of the search module. Why don't you use 100 (at least)?
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Jun 4, 2014 at 2:40 PM, Marcelo Paes Rech <marcelo...@gmail.com> wrote:
>>>
>>>> Hi Jörg. Thanks for your reply.
>>>>
>>>> Here is my filter:
>>>>
>>>> {
>>>>   "filter" : {
>>>>     "terms" : {
>>>>       "_id" : [ "QSxrbEM8TKe5zr8931xBjA", "wj63ghegRwC6qLsWq2chkA",
>>>>         "hYEhDbAqQwSRxhYfvDgFkg", "4bZmPE1fTYqijphRyyWiuQ",
>>>>         "Fhq53yYyT3CEw6vclKu_NA", "XL2atBraTEyx57MefjFVhA",
>>>>         "951i0dZkT064FlQkzHnnWA", "O8Ixbir1TrGT_IA3wKfsHg",
>>>>         "8k4U7KsuTmsThqxy-5YaKw", "GNOoQTHglf22kzcE7EOf8g",
>>>>         "-RQeY48fTg2kYnh2M4E1cQ", "u8DGBdfVR9WRVj6d9E4Ebw",
>>>>         "WFHSXd7UQvCMYFBhFcTsng", "qnQ7q7FyTsg397lM1EWgqA",
>>>>         "wRQtUzdMRy2qOkMCNxdpgA", "Ll83iglxSUS_Gs7mjkMt8w",
>>>>         "d2sxZ1oBTfuvAfov5EJ0iw", "cyht-vB4Q-mMSg9N5jcGXg",
>>>>         "bNSVaO47QTOCkfJhWo0qjg", "BHuhm55IRerKnynJ8WgFTw",
>>>>         "fHKA4PF2QteWm8E7dW7CAw", "DLE6A7tyQJ-zcKcCa6IPSA",
>>>>         "qfelTW7-SuGRQ0GKbngARA", "R7VHHJhYsUqfuxYof8BJ8w",
>>>>         "W4PqiJfPSlSFjVKFsGkA4Q", "Juq62zOsRdheuW3O6Gb2KA",
>>>>         "U9v0IKj_RrgRNjE31ZTt2g", "uNHa0kOOT5qjPpzxZcs35A",
>>>>         "SwOgVNgIRwyVU3pEEycBuQ", "LaEpxFGIQgCArsNZ2rd4Pw",
>>>>         "CiJ9gouZsbmTtxTWx7w6lA", "TaQV_I01RfCq3B6uAtIBoQ",
>>>>         "9Jpjo5k-RlGfLVLF6nDgze", "57YpjRdASsrrae-RD3spog",
>>>>         "bmA4EWFSTiKUaDzaNcCFKQ", "Fui9z_UbRe6AY1VhAr8Crw",
>>>>         "2PORr5BzSDOmBXgmQkO5Zg", "snfwTmtuTv-uj5mOWSJpgA",
>>>>         "0nHIrtePSaeW8aWArh_Mrg", "s0g9QHnjTgWX3rCIu1g0Hg",
>>>>         "Jl67fACuQvCFgZxXAFtDOg" ],
>>>>       "_cache" : true,
>>>>       "_cache_key" : "my_terms_cache"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> I already used the *ids filter* but I got the same behaviour.
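For reference, a minimal sketch of the same lookup with both suggestions from this thread applied: the ids filter Jörg recommends, wrapped in a filtered query so it is applied during the query phase rather than as a post filter, and a size that matches the number of documents actually needed, which is the change Marcelo describes above as making the heap stable. Only the first two IDs from the filter above are shown; the exact request shape is an assumption based on Elasticsearch 1.x syntax, not something posted in the thread:

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : {
        "ids" : {
          "values" : [ "QSxrbEM8TKe5zr8931xBjA", "wj63ghegRwC6qLsWq2chkA" ]
        }
      }
    }
  }
}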
>>>> One thing that I realized is that one of the cluster's nodes keeps growing its search thread pool (something like Queue: 50 and Count: 47) while the others don't (something like Queue: 0 and Count: 1). If I remove this node from the cluster, another one starts showing the same problem.
>>>>
>>>> My current environment is:
>>>> - 7 data nodes with 16GB (8GB for ES) and 8 cores each;
>>>> - 4 load-balancer nodes (no data, no master) with 4GB (3GB for ES) and 8 cores each;
>>>> - 4 master nodes (master only, no data) with 4GB (3GB for ES) and 8 cores each;
>>>> - search thread pool size 47 (the other pools are at the standard config);
>>>> - index with 7 shards and 2 replicas;
>>>> - 14.6GB index size (14,524,273 documents).
>>>>
>>>> I'm executing this filter with 50 concurrent users.
>>>>
>>>> Regards
>>>>
>>>> On Tuesday, June 3, 2014 at 8:33:45 PM UTC-3, Jörg Prante wrote:
>>>>>
>>>>> Can you show your test code?
>>>>>
>>>>> You seem to be looking at the wrong settings - by adjusting node count, shard count, and replica count alone, you cannot find the maximum node performance. Concurrency settings, index optimizations, query optimizations, thread pooling, and most of all fast disk subsystem I/O are important.
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 12:18 AM, Marcelo Paes Rech <marcelo...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for your reply, Nikolas. It helps a lot.
>>>>>>
>>>>>> And what about the number of documents, or the size, of each shard? And the need for no-data nodes or master-only nodes - when are they necessary?
>>>>>>
>>>>>> In some tests I did, when I increased the number of requests (e.g. 100 users at the same moment, repeated again and again), 5 nodes with 1 shard and 2 replicas each and 16GB RAM (8GB for ES and 8GB for the OS) weren't enough. The response time started to exceed 5s (I think less than 1s would be acceptable in this case).
>>>>>>
>>>>>> This test had a lot of documents (something like 14 million).
>>>>>>
>>>>>> Thanks. Regards.
>>>>>>
>>>>>> On Monday, June 2, 2014 at 5:09:04 PM UTC-3, Nikolas Everett wrote:
>>>>>>>
>>>>>>> On Mon, Jun 2, 2014 at 3:52 PM, Marcelo Paes Rech <marcelo...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I'm looking for an article or a guide on the best cluster configuration. I have read a lot of articles along the lines of "change this configuration" and "you must create X shards per node", but I haven't seen anything like an official Elasticsearch guide to creating a cluster.
>>>>>>>>
>>>>>>>> What I would like to know is information like:
>>>>>>>> - How do I calculate how many shards will be good for the cluster?
>>>>>>>> - How many shards do we need per node? And if this is variable, how do I calculate it?
>>>>>>>> - How much memory do I need per node, and how many nodes?
>>>>>>>>
>>>>>>>> I think Elasticsearch is well documented, but the documentation is very fragmented.
>>>>>>>
>>>>>>> For some of these, that is because "it depends" is the answer. For example, you'll want larger heaps for aggregations and faceting.
>>>>>>>
>>>>>>> There are some rules of thumb:
>>>>>>> 1. Set Elasticsearch's heap memory to 1/2 of RAM, but not more than 30GB. Bigger than that and the JVM can't do pointer compression, so you effectively lose RAM.
>>>>>>> 2. #1 implies that having much more than 60GB of RAM on each node doesn't make a big difference. It helps, but it's not really as good as having more nodes.
>>>>>>> 3. The most efficient way of sharding is likely one shard on each node. So if you have 9 nodes and a replication factor of 2 (so 3 total copies), then 3 shards are likely to be more efficient than 2 or 4. But this only really matters when those shards get lots of traffic.
>>>>>>>
>>>>>>> And it breaks down a bit when you get lots of nodes, and in the presence of routing. It's complicated.
>>>>>>>
>>>>>>> But these are really just starting points, safe-ish defaults.
>>>>>>>
>>>>>>> Nik
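To make Nikolas's rules concrete: for rule 1, a 64GB machine would get roughly a 30GB heap (on the 1.x series typically set via the ES_HEAP_SIZE environment variable), leaving the rest to the OS file cache; for rule 3, with 9 data nodes and 2 replicas, 3 primary shards give 3 x (1 + 2) = 9 shard copies, i.e. one per node. A sketch of index settings for that layout follows - the 9-node cluster and the create-index request around this body are assumptions for illustration, not details from the thread:

{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 2
    }
  }
}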
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c727d421-6426-4c52-b3ce-f8533e3dc68b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.