Interesting...does the very large max_merged_segment not result in memory 
issues when the largest segments are merged?  When I run the cleanup 
command (_optimize?only_expunge_deletes) I see a steep spike in memory as 
each merge is completing, followed by an immediate drop, presumably as the 
new segment is fully initialized and then the old ones are subsequently 
dropped.  I'd be worried that I'd run out of memory when initializing the 
larger segments.  That being said, I only notice the large spikes when 
merging via the explicit optimize/only_expunge_deletes command; the 
continuous merging throughout the day results in very mild spikes by 
comparison.
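
For reference, the cleanup call is just the expunge-deletes form of 
optimize, something like this (host and index name are placeholders):

curl -XPOST 'http://localhost:9200/my_index/_optimize?only_expunge_deletes=true'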

I guess I could always add a single node with the higher settings in order 
to test, and just drop it if it becomes problematic (since, though dynamic, 
prior to 1.4 the merge settings only take effect on shard initialization, if 
I remember correctly).
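
If I do test it, I assume the change itself is just the index settings 
update API with the values you describe below; index name and numbers are 
placeholders:

curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index.merge.policy.max_merged_segment": "30gb",
  "index.merge.policy.reclaim_deletes_weight": 3.0
}'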

Thanks for the advice though, I'll definitely try that.

Jonathan


On Tuesday, December 2, 2014 11:30:08 PM UTC-5, Nikolas Everett wrote:
>
> I've had some issues with high IO exacerbated by lots of deleted docs as 
> well.  I'd get deleted docs in the 30%-40% range on some indexes.  We 
> attacked the problem in two ways:
> 1.  Hardware.  More ram and better SSDs really really helped.  No consumer 
> grade SSDs for me.
> 2.  Tweak some merge settings:
> The most important setting is index.merge.policy.max_merged_segment.  You 
> never want your segments to get near that size, so set it to 30GB or 
> something stupid huge.  The way the merge policy works, segments near 
> max_merged_segment in size will end up with tons and tons of deletes before 
> they are considered for merging, and even then the merge policy will still 
> shy away from merging them.
> I raised reclaim_deletes_weight slightly (2.5 or 3 or so) and lowered 
> segments_per_tier and max_merge_at_once to get slightly better search 
> performance.  These were likely less important.
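>
> As a rough sketch of the shape of those settings (exact numbers will vary; 
> these are only illustrative):
>
> index.merge.policy.max_merged_segment: 30gb
> index.merge.policy.reclaim_deletes_weight: 3.0
> index.merge.policy.segments_per_tier: 8
> index.merge.policy.max_merge_at_once: 8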
>
> I hope that helps some!
>
> Nik
>
>
> On Tue, Dec 2, 2014 at 9:38 PM, Jonathan Foy <the...@gmail.com> wrote:
>
>> Hello
>>
>> This is something I still struggle with, though not to the degree that I 
>> once did.  I've been in production for several months now with limited 
>> issues, though I still don't consider it to be a solved problem for myself, 
>> as it requires regular manual maintenance.
>>
>> First, I threw more hardware at it.  I moved from two full-time nodes to 
>> three.  This helped quite a bit, and I definitely needed it once more users 
>> started hitting the cluster and more data was cached (I also added more 
>> warmers once usage patterns became clearer).
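>>
>> The warmers themselves are nothing exotic; registering one looks roughly 
>> like this (index, warmer name, and sort field are placeholders):
>>
>> curl -XPUT 'http://localhost:9200/my_index/_warmer/warm_sorts' -d '{
>>   "query": { "match_all": {} },
>>   "sort": [ "some_date_field" ]
>> }'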
>>
>> Second, I've fine-tuned my sync process quite a bit to avoid unnecessary 
>> reindexing.
>>
>> Third, since I'm running this cluster on EC2 instances, I just spin up 
>> more nodes when I need to clean things up, then drop back down afterwards 
>> for normal use.  I had been moving to four, but now I sometimes add a fifth 
>> depending upon shard allocation - sometimes I seem to accumulate the most 
>> active shards on the same node, and I still run into memory issues.  I also 
>> drop the filter cache to almost nothing before I run the optimize/delete 
>> step.  For the most part, this gets me through with minimal memory issues, 
>> though I'd be screwed if I had to do this during the day.  Also, there IS 
>> overhead to moving shards across nodes, and long query times when (I 
>> presume) the shards become active on the new nodes and any non-warmed 
>> fields are loaded.  
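>>
>> For the filter cache step, releasing what's already cached before the 
>> optimize can be done with the clear-cache API (index name is a 
>> placeholder); shrinking the configured size itself means lowering 
>> indices.cache.filter.size on the nodes:
>>
>> curl -XPOST 'http://localhost:9200/my_index/_cache/clear?filter=true'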
>>
>> So, not a perfect solution by any means, but it's working.
>>
>> Which version of ES are you on?  I'm still on 1.1.2, with plans to update 
>> soon, and am very much hoping that the update will help things become more 
>> hands-off.  The bloom filter being rendered unnecessary should free memory, 
>> plus general performance improvements; I can't remember them all offhand.  
>> Being able to actually update the merge settings dynamically will also be a 
>> big help in testing various configs.
>>
>> Hope something in there helps.  I'm definitely open to suggestions on 
>> ways to improve things.
>>
>>
>>
>> On Tuesday, December 2, 2014 5:54:13 PM UTC-5, Govind Chandrasekhar wrote:
>>>
>>> Jonathan,
>>> Did you find a solution to this? I've been facing pretty much the same 
>>> issue since I've added nested documents to my index - delete percentage 
>>> goes really high and an explicit optimize leads to an OOM.
>>> Thanks.
>>>
>>> On Saturday, August 23, 2014 8:08:32 AM UTC-7, Jonathan Foy wrote:
>>>>
>>>> Hello
>>>>
>>>> I was a bit surprised to see the number of deleted docs grow so large, 
>>>> but I won't rule out my having something set up wrong.  Non-default merge 
>>>> settings are below; by all means let me know if I've done something stupid:
>>>>
>>>> indices.store.throttle.type: none
>>>> index.merge.policy.reclaim_deletes_weight: 6.0
>>>> index.merge.policy.max_merge_at_once: 5
>>>> index.merge.policy.segments_per_tier: 5
>>>> index.merge.policy.max_merged_segment: 2gb
>>>> index.merge.scheduler.max_thread_count: 3
>>>>
>>>> I make extensive use of nested documents and, to a smaller degree, child 
>>>> docs.  Right now things are hovering around 15% deleted after a cleanup on 
>>>> Wednesday.  I've also cleaned up my mappings a lot since I saw the 45% 
>>>> deleted number (less redundant data, broke some things off into child docs 
>>>> to maintain separately), but it was up to 30% this last weekend.  When I've 
>>>> looked in the past, at the 40+% numbers, the segments in the largest 
>>>> tier (2 GB) would sometimes have up to 50+% deleted docs in them; the 
>>>> smaller segments all seemed pretty contained, which I guess makes sense as 
>>>> they didn't stick around for nearly as long.
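>>>>
>>>> (The per-segment deleted-doc numbers come from eyeballing the segments 
>>>> API, something like the following; index name is a placeholder:
>>>>
>>>> curl -XGET 'http://localhost:9200/my_index/_segments?pretty'
>>>>
>>>> Each segment in the response reports num_docs and deleted_docs.)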
>>>>
>>>> As for where the memory is spent, according to ElasticHQ, right now on 
>>>> one server I have a 20 GB heap (out of 30.5 GB, which I know is above the 
>>>> 50% suggested; I'm just trying to get things to work), and I'm using 90% 
>>>> of it as follows:
>>>>
>>>> Field cache: 5.9 GB
>>>> Filter cache: 4.0 GB (I had reduced this before the last restart, but 
>>>> forgot to make the changes permanent.  I do use a lot of filters, though, 
>>>> so I would like to be able to use the cache.)
>>>> ID cache: 3.5 GB
>>>>
>>>> Node stats "segments: memory_in_bytes": 6.65 GB (I'm not exactly sure 
>>>> how this one contributes to the total heap number).
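>>>>
>>>> (The same figures show up in the node stats API if anyone wants to check 
>>>> without ElasticHQ, e.g.:
>>>>
>>>> curl -XGET 'http://localhost:9200/_nodes/stats/indices?pretty'
>>>>
>>>> under fielddata, filter_cache, id_cache and segments.)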
>>>>
>>>> As for the disk-based "doc values", I don't know how I have not come 
>>>> across them thus far, but that sounds quite promising.  I'm a little late 
>>>> in the game to be changing everything yet again, but it may be a good idea 
>>>> regardless, and is definitely something I'll read more about and consider 
>>>> going forward.  Thank you for bringing it to my attention.
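>>>>
>>>> If I understand the docs correctly, on 1.x it would be a mapping change 
>>>> along these lines, per field (index, type, and field names are 
>>>> placeholders), and it would only apply to newly written segments:
>>>>
>>>> curl -XPUT 'http://localhost:9200/my_index/_mapping/my_type' -d '{
>>>>   "properties": {
>>>>     "some_numeric_field": {
>>>>       "type": "long",
>>>>       "fielddata": { "format": "doc_values" }
>>>>     }
>>>>   }
>>>> }'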
>>>>
>>>> Anyway, my current plan, since I'm running in AWS and have the 
>>>> flexibility, is just to add another r3.xlarge node to the cluster over the 
>>>> weekend, try the deleted-doc purge, and then pull the node back out after 
>>>> moving all shards off of it.  I'm hoping this will allow me to clean things 
>>>> up with extra horsepower, but not increase costs too much throughout the 
>>>> week.
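>>>>
>>>> For the pull-it-back-out step, I'm assuming allocation filtering is the 
>>>> way to drain the extra node before terminating it (IP is a placeholder):
>>>>
>>>> curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
>>>>   "transient": {
>>>>     "cluster.routing.allocation.exclude._ip": "10.0.0.99"
>>>>   }
>>>> }'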
>>>>
>>>> Thanks for your input, it's very much appreciated.
>>>>
>>>>
>>>> On Friday, August 22, 2014 7:14:18 PM UTC-4, Adrien Grand wrote:
>>>>>
>>>>> Hi Jonathan,
>>>>>
>>>>> The default merge policy is already supposed to merge segments that 
>>>>> contain lots of deleted documents quite aggressively, so it is a bit 
>>>>> surprising that you see that many deleted documents, even with merge 
>>>>> throttling disabled.
>>>>>
>>>>> You mention having memory pressure because of the number of documents 
>>>>> in your index; do you know what causes this memory pressure? In case it 
>>>>> is due to field data, maybe you could consider storing field data on 
>>>>> disk (what we call "doc values")?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 22, 2014 at 5:27 AM, Jonathan Foy <the...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I'm in the process of putting a two-node Elasticsearch cluster 
>>>>>> (1.1.2) into production, but I'm having a bit of trouble keeping it 
>>>>>> stable 
>>>>>> enough for comfort.  Specifically, I'm trying to figure out the best way 
>>>>>> to 
>>>>>> keep the number of deleted documents under control.
>>>>>>
>>>>>> Both nodes are r3.xlarge EC2 instances (4 cores, 30.5 GB RAM).  The 
>>>>>> ES cluster mirrors the primary data store, a MySQL database.  Relevant 
>>>>>> updates to the database are caught via triggers which populate a table 
>>>>>> that's monitored by an indexing process.  This results in what I'd 
>>>>>> consider a lot of reindexing, any time the primary data is updated.  
>>>>>> Search and indexing performance thus far has been in line with 
>>>>>> expectations when the number of deleted documents is small, but as it 
>>>>>> grows (up to 30-40%), the amount of available RAM becomes limited, 
>>>>>> ultimately causing memory problems.  If I optimize/purge deletes then 
>>>>>> things return to normal, though I usually end up having to restart at 
>>>>>> least one server, if not both, due to OOM problems and shard failures 
>>>>>> during optimization.  When ES becomes the source of all searches for the 
>>>>>> application, I can't really afford this downtime.
>>>>>>
>>>>>> What would be the preferred course of action here?  I do have a 
>>>>>> window over the weekend where I could work with somewhat reduced 
>>>>>> capacity; I was thinking perhaps I could pull one node out of search 
>>>>>> rotation, optimize it, swap it with the other, optimize it, and then go 
>>>>>> on my way.  However, I don't know that I CAN pull one node out of 
>>>>>> rotation (it seems 
>>>>>> like the search API lets me specify a node, but nothing to say "Node X 
>>>>>> doesn't need any searches"), nor does it appear that I can optimize an 
>>>>>> index on one node without doing the same to the other.
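>>>>>>
>>>>>> By "specify a node" I mean the per-request preference parameter, e.g. 
>>>>>> something like the following (node id is a placeholder), which still 
>>>>>> has to be set by every client rather than on the cluster side:
>>>>>>
>>>>>> curl 'http://localhost:9200/my_index/_search?preference=_only_node:xyz'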
>>>>>>
>>>>>> I've tried tweaking the merge settings to favour segments containing 
>>>>>> large numbers of deletions, but it doesn't seem to make enough of a 
>>>>>> difference.  I've also disabled merge throttling (I do have SSD-backed 
>>>>>> storage).  Is there any safe way to perform regular maintenance on the 
>>>>>> cluster, preferably one node at a time, without causing TOO many 
>>>>>> problems?  
>>>>>> Am I just trying to do too much with the hardware I have?
>>>>>>
>>>>>> Any advice is appreciated.  Let me know what info I left out that 
>>>>>> would help.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Adrien Grand
>>>>>  
>>
>
>
