Hello

This is something I still struggle with, though not to the degree that I 
once did.  I've been in production for several months now with limited 
issues, though I still don't consider it a solved problem, as it requires 
regular manual maintenance.

First, I threw more hardware at it.  I moved from two full-time nodes to 
three.  This helped quite a bit, and I definitely needed it once more users 
started hitting the cluster and more data was cached (I also added more 
warmers once usage patterns became clearer).
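
In case it's useful to compare notes: the warmers are nothing fancy, just 
registered searches that sort or filter on the fields that turned out to be 
hot, so the field data is loaded as new segments appear rather than on the 
first real query.  Roughly like this - the index, warmer name, and sort 
field are placeholders, not my real ones:

curl -XPUT 'localhost:9200/my_index/_warmer/warm_hot_sort' -d '{
  "query": { "match_all": {} },
  "sort": [ "last_updated" ]
}'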

Second, I've fine-tuned my sync process quite a bit to avoid unnecessary 
reindexing.

Third, since I'm running this cluster on EC2 instances, I just spin up more 
nodes when I need to clean things up, then drop back down to the normal 
count afterwards.  I had been going up to four, but now I sometimes add a 
fifth depending upon shard allocation - sometimes the most active shards 
seem to accumulate on the same node, and I still run into memory issues.  I 
also drop the filter cache to almost nothing before I run the 
optimize/delete step.  For the most part, this gets me through with minimal 
memory issues, though I'd be screwed if I had to do this during the day.  
Also, there IS overhead to moving shards across nodes, and I see long query 
times when (I presume) the shards become active on the new nodes and any 
non-warmed fields are loaded.  
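
For what it's worth, the purge itself is just the stock APIs; a rough 
sketch, with a placeholder index name (clearing the filter cache is one way 
to get it down to almost nothing without touching the config):

# empty the filter cache so the purge has more heap to work with
curl -XPOST 'localhost:9200/my_index/_cache/clear?filter=true'

# rewrite only segments that contain deleted docs, rather than forcing
# everything down to one giant segment
curl -XPOST 'localhost:9200/my_index/_optimize?only_expunge_deletes=true'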

So, not a perfect solution by any means, but it's working.

Which version of ES are you on?  I'm still on 1.1.2, with plans to upgrade 
soon, and am very much hoping that the upgrade will help things become more 
hands-off.  The bloom filter being rendered unnecessary should free up some 
memory, plus there are general performance improvements I can't remember 
offhand.  Being able to actually update the merge settings dynamically will 
also be a big help in testing various configs.
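
Assuming the merge settings really do become dynamic after the upgrade, 
trying out a config should shrink to a single update-settings call instead 
of an edit-and-restart, e.g. (index name and values here are just 
examples):

# only accepted on versions where the merge policy settings are dynamic
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.merge.policy.max_merged_segment": "5gb",
  "index.merge.policy.segments_per_tier": 10
}'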

Hope something in there helps.  I'm definitely open to suggestions on ways 
to improve things.


On Tuesday, December 2, 2014 5:54:13 PM UTC-5, Govind Chandrasekhar wrote:
>
> Jonathan,
> Did you find a solution to this? I've been facing pretty much the same 
> issue since I added nested documents to my index - the deleted-document 
> percentage gets really high, and an explicit optimize leads to an OOM. 
> Thanks.
>
> On Saturday, August 23, 2014 8:08:32 AM UTC-7, Jonathan Foy wrote:
>>
>> Hello
>>
>> I was a bit surprised to see the number of deleted docs grow so large, 
>> but I won't rule out my having something set up wrong.  Non-default merge 
>> settings are below; by all means let me know if I've done something stupid:
>>
>> # disable store (merge) I/O throttling - SSD-backed storage
>> indices.store.throttle.type: none
>> # favour merging segments that contain lots of deleted docs
>> index.merge.policy.reclaim_deletes_weight: 6.0
>> # max number of segments merged together in one normal merge
>> index.merge.policy.max_merge_at_once: 5
>> # allowed segments per tier; lower means more aggressive merging
>> index.merge.policy.segments_per_tier: 5
>> # cap on the size of segments produced by normal merges
>> index.merge.policy.max_merged_segment: 2gb
>> # max concurrent merge threads
>> index.merge.scheduler.max_thread_count: 3
>>
>> I make extensive use of nested documents, and to a lesser degree child 
>> docs.  Right now things are hovering around 15% deleted after a cleanup on 
>> Wednesday.  I've also cleaned up my mappings a lot since I saw the 45% 
>> deleted number (less redundant data, and I broke some things off into 
>> child docs to maintain separately), but it was back up to 30% this past 
>> weekend.  When I looked in the past, back when I was seeing the 40+% 
>> numbers, the segments in the largest tier (2 GB) would sometimes have 50+% 
>> deleted docs in them; the smaller segments all seemed pretty contained, 
>> which I guess makes sense as they didn't stick around for nearly as long.
>>
>> As for where the memory is spent: according to ElasticHQ, right now on 
>> one server I have a 20 GB heap (out of 30.5 GB, which I know is above the 
>> 50% suggested - I'm just trying to get things to work), and I'm using 90% 
>> of it as follows:
>>
>> Field cache: 5.9 GB
>> Filter cache: 4.0 GB (I had reduced this before the last restart, but 
>> forgot to make the change permanent.  I do use a lot of filters though, so 
>> I'd like to be able to keep using the cache.)
>> ID cache: 3.5 GB
>>
>> Node stats "Segments: memory_in_bytes": 6.65 GB (I'm not exactly sure how 
>> this one contributes to the total heap number).
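>>
>> (All of those numbers come straight from the node stats API, which is what 
>> ElasticHQ is reading anyway - the fielddata, filter_cache, id_cache and 
>> segments sections of:
>>
>> curl 'localhost:9200/_nodes/stats/indices?pretty&human'
>>
>> in case anyone wants to check the same breakdown without the UI.)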
>>
>> As for the disk-based "doc values", I don't know how I have not come 
>> across them thus far, but that sounds quite promising.  I'm a little late 
>> in the game to be changing everything yet again, but it may be a good idea 
>> regardless, and is definitely something I'll read more about and consider 
>> going forward.  Thank you for bringing it to my attention.
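>>
>> From the little I've read so far, it looks like it's a per-field mapping 
>> change (plus a reindex so existing data picks it up), something along the 
>> lines of the following - index, type, and field names are obviously just 
>> placeholders:
>>
>> curl -XPUT 'localhost:9200/my_new_index/_mapping/my_type' -d '{
>>   "my_type": {
>>     "properties": {
>>       "last_updated": {
>>         "type": "date",
>>         "fielddata": { "format": "doc_values" }
>>       }
>>     }
>>   }
>> }'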
>>
>> Anyway, my current plan, since I'm running in AWS and have the 
>> flexibility, is just to add another r3.xlarge node to the cluster over the 
>> weekend, try the deleted-doc purge, and then pull the node back out after 
>> moving all shards off of it.  I'm hoping this will allow me to clean things 
>> up with extra horsepower, but not increase costs too much throughout the 
>> week.
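>>
>> For the pull-it-back-out part, my understanding is that the allocation 
>> exclude setting will drain the shards off the temporary node before I 
>> terminate it - the node name below is just whatever the new instance ends 
>> up being called:
>>
>> curl -XPUT 'localhost:9200/_cluster/settings' -d '{
>>   "transient": { "cluster.routing.allocation.exclude._name": "temp-node-1" }
>> }'
>>
>> Once the shards have moved, the node can be shut down and the setting 
>> removed.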
>>
>> Thanks for your input; it's very much appreciated.
>>
>>
>> On Friday, August 22, 2014 7:14:18 PM UTC-4, Adrien Grand wrote:
>>>
>>> Hi Jonathan,
>>>
>>> The default merge policy is already supposed to quite aggressively merge 
>>> segments that contain lots of deleted documents, so it is a bit surprising 
>>> that you are seeing that many deleted documents, even with merge 
>>> throttling disabled.
>>>
>>> You mention having memory pressure because of the number of documents in 
>>> your index - do you know what causes this memory pressure? If it is due 
>>> to field data, maybe you could consider storing field data on disk (what 
>>> we call "doc values")?
>>>
>>>
>>>
>>> On Fri, Aug 22, 2014 at 5:27 AM, Jonathan Foy <the...@gmail.com> wrote:
>>>
>>>> Hello
>>>>
>>>> I'm in the process of putting a two-node Elasticsearch cluster (1.1.2) 
>>>> into production, but I'm having a bit of trouble keeping it stable enough 
>>>> for comfort.  Specifically, I'm trying to figure out the best way to keep 
>>>> the number of deleted documents under control.
>>>>
>>>> Both nodes are r3.xlarge EC2 instances (4 cores, 30.5 GB RAM).  The ES 
>>>> cluster mirrors the primary data store, a MySQL database.  Relevant 
>>>> updates 
>>>> to the database are caught via triggers which populate a table that's 
>>>> monitored by an indexing process.  This results in what I'd consider a 
>>>> lot of reindexing any time the primary data is updated.  Search and indexing 
>>>> performance thus far has been in line with expectations when the number of 
>>>> deleted documents is small, but as it grows (up to 30-40%), the amount of 
>>>> available RAM becomes limited, ultimately causing memory problems.  If I 
>>>> optimize/purge deletes then things return to normal, though I usually end 
>>>> up having to restart at least one server if not both due to OOM problems 
>>>> and shard failures during optimization.  When ES becomes the source of all 
>>>> searches for the application, I can't really afford this downtime.
>>>>
>>>> What would be the preferred course of action here?  I do have a window 
>>>> over the weekend where I could work with somewhat reduced capacity; I was 
>>>> thinking perhaps I could pull one node out of search rotation, optimize 
>>>> it, swap it with the other, optimize it, and then go on my way.  However, 
>>>> I don't know that I CAN pull one node out of rotation (it seems like the 
>>>> search API lets me specify a node, but nothing to say "Node X doesn't need 
>>>> any searches"), nor does it appear that I can optimize an index on one 
>>>> node without doing the same to the other.
>>>>
>>>> I've tried tweaking the merge settings to favour segments containing 
>>>> large numbers of deletions, but it doesn't seem to make enough of a 
>>>> difference.  I've also disabled merge throttling (I do have SSD-backed 
>>>> storage).  Is there any safe way to perform regular maintenance on the 
>>>> cluster, preferably one node at a time, without causing TOO many problems? 
>>>> Am I just trying to do too much with the hardware I have?
>>>>
>>>> Any advice is appreciated.  Let me know what info I left out that would 
>>>> help.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Adrien Grand
>>>  
>>
