Peter, just want to say I've also seen very similar behavior with JVM heap sizes ~16GB. I feel like I've seen multiple "failure" modes with THP, but most alarmingly we observed brief system-wide lockups in some cases, similar to those described in: https://access.redhat.com/solutions/1560893. (Don't quite recall if we saw that exact "soft lockup" message, but do recall something similar -- and around the time we saw that message we also observed gaps in the output of a separate shell script that was periodically writing a message to a file every 5 seconds.)
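(For anyone curious, that kind of watchdog is trivial to reproduce; a minimal sketch of the idea -- the log path is just a placeholder -- would be:

  while true; do date '+%s' >> /tmp/heartbeat.log; sleep 5; done

Gaps of much more than 5 seconds between consecutive timestamps point at a system-wide stall rather than a pause in any one process.)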
I'm probably just scarred from the experience, but to me the question of whether to leave THP=always in such environments feels more like "do I want to gamble on this pathological behavior occurring?" than some dial for fine-tuning performance. Maybe it's better in more recent RHEL kernels, but I've never really had a reason to roll the dice on it. (This shouldn't scare folks off [non-transparent] hugepages entirely though -- we had much better results with those.)

On Wed, Aug 23, 2017 at 3:52 AM, Peter Booth <pboot...@gmail.com> wrote:
> Some points:
>
> Those of us working in large corporate settings are likely to be running close to vanilla RHEL 7.3 or 6.9, with kernel versions 3.10.0-514 or 2.6.32-696 respectively.
>
> I have seen the THP issue first-hand, in dramatic fashion. One Java trading application I supported ran with heaps that ranged from 32GB to 64GB, running on Azul Zing, with no appreciable GC pauses. It was migrated from Westmere hardware on RHEL 5.6 to (faster) Ivy Bridge hardware on RHEL 6.4. In non-production environments only, the application suddenly began showing occasional pauses of up to a few seconds. Occasional meaning only four or five out of 30 instances showed a pause, and they might only have one, two or three pauses in a day. These instances ran a workload that replicated a production workload. I noticed that the only difference between these hosts and the healthy production hosts was that, due to human error, THP was disabled on the production hosts but not on the non-prod hosts. As soon as we disabled THP on the non-prod hosts the pauses disappeared.
>
> This was a reactive discovery -- I haven't done any proactive investigation of the effects of THP. This was sufficient for me to rule it out for today.
>
> On Sunday, August 20, 2017 at 10:32:45 AM UTC-4, Alexandr Nikitin wrote:
>>
>> Thank you for the feedback! I appreciate it. Yes, you are right. The intention was not to show that THP is an awesome feature, but to share techniques to measure and control the risks. I made the changes <https://github.com/alexandrnikitin/blog/compare/2139c405f0c50a3ab907fb2530421bf352caa412...3e58094386b14d19e06752d9faa0435be2cbe651> to highlight the purpose and risks.
>>
>> The experiment is indeed interesting. I believe the "defer" option should help in that environment. I'm really keen to try the latest kernel (related not only to THP).
>>
>> *Frankly, I still don't have a strong opinion about huge latency spikes in the allocation path in general. I'm not sure whether it's a THP issue or the application/environment itself. Likely it's high memory pressure in general that causes the spikes. Or the root of the issue is something else entirely, e.g. the jemalloc case.*
>>
>> On Friday, August 18, 2017 at 6:32:40 PM UTC+3, Gil Tene wrote:
>>>
>>> This is very well written and quite detailed. It has all the makings of a great post I'd point people to. However, as currently stated, I'd worry that it would (mis)lead readers into using THP with the "always" setting in /sys/kernel/mm/transparent_hugepage/defrag (instead of "defer"), and/or on older (pre-4.6) kernels, with a false sense that the many-msec slow-path allocation latency problems many people warn about don't actually exist. You do link to the discussions on the subject, but the measurements and summary conclusion of the posting alone would not end up warning people who don't actually follow those links.
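>>>
>>> (For reference: the active defrag mode is the bracketed entry when the file is read, and it can be switched at runtime as root. The option list shown below is illustrative and varies by kernel version -- "defer" only appears on 4.6 and later:
>>>
>>>   cat /sys/kernel/mm/transparent_hugepage/defrag
>>>   always defer [madvise] never
>>>
>>>   echo defer > /sys/kernel/mm/transparent_hugepage/defrag
>>> )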
>>>
>>> I assume your intention is not to have the reader conclude that "there is lots of advice out there telling you to turn off THP, and it is wrong. Turning it on is perfectly safe, and may significantly speed up your application", but that you are instead aiming for something like "THP used to be problematic enough to cause wide-ranging recommendations to simply turn it off, but this has changed with recent Linux kernels. It is now safe to use in widely applicable ways (with the right settings) and can really help application performance without risking huge stalls". Unfortunately, I think that many readers would understand the current text as the former, not the latter.
>>>
>>> Here is what I'd change to improve on the current text:
>>>
>>> 1. Highlight the risk of high slow-path allocation latencies with the "always" (and even "madvise") setting in /sys/kernel/mm/transparent_hugepage/defrag, the fact that the "defer" option is intended to address those risks, and that the defer option is available with Linux kernel versions 4.6 or later.
>>>
>>> 2. Create an environment that would actually demonstrate these very high (many-msec or worse) latencies in the allocation slow path with defrag set to "always". This is the part that will probably take some extra work, but it will also be a very valuable contribution. The issues are so widely reported (into the 100s of msec or more, and with a wide variety of workloads, as your links show) that intentional reproduction *should* be possible. And being able to demonstrate it actually happening will also allow you to demonstrate how newer kernels address it with the defer setting.
>>>
>>> 3. Show how changing the defrag setting to "defer" removes the high latencies seen by the allocation slow path under the same conditions.
>>>
>>> For (2) above, I'd look to induce a situation where the allocation slow path can't find a free 2MB page without having to defragment one directly. E.g.:
>>> - I'd start by significantly slowing down the background defragmentation in khugepaged (e.g. set /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 3600000). I'd avoid turning it off completely, in order to make sure you are still measuring the system in a configuration that believes it does background defragmentation.
>>> - I'd add some static physical memory pressure (e.g. allocate and touch a bunch of anonymous memory in a process that would just sit on it) such that the system would only have 2-3GB free for buffers and your Netty workload's heap. A sleeping JVM launched with an empirically sized, big enough -Xmx and -Xms and with AlwaysPreTouch on is an easy way to do that.
>>> - I'd then create an intentional and spiky fragmentation load (e.g. perform spikes of scanning through a 20GB file every minute or so).
>>> - With all that in place, I'd then repeatedly launch and run your Netty workload without the PreTouch flag, in order to try to induce situations where an on-demand-allocated 2MB heap page hits the slow path, and the effect shows up in your Netty latency measurements.
>>>
>>> All of the above are obviously experimentation starting points, and may take some iteration to actually induce the high latencies we are looking to demonstrate.
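>>>
>>> (A rough command-level sketch of that setup; the sysfs knob and the 3600000 value are as above, while the heap size, the SleepForever class, and the file path are illustrative placeholders to be sized per machine:
>>>
>>>   # as root: slow khugepaged's background scanning way down, but leave it enabled
>>>   echo 3600000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
>>>
>>>   # static memory pressure: a sleeping JVM that pre-touches its whole heap up front
>>>   java -Xms48g -Xmx48g -XX:+AlwaysPreTouch SleepForever &
>>>
>>>   # spiky fragmentation load: stream through a large file once a minute or so
>>>   while true; do cat /data/20GB.file > /dev/null; sleep 60; done &
>>> )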
>>>
>>> But once you are able to demonstrate the impact of on-demand allocation doing direct (synchronous) compaction, both in your application latency measurement and in your kernel tracing data, you would then be able to try the same experiment with the defrag setting set to "defer" to show how newer kernels and this new setting now make it safe (or at least much safer) to use THP. And with that actually demonstrated, everything about THP recommendations for freeze-averse applications can change, making for a really great posting.
>>>
>>> Sent from my iPad
>>>
>>> On Aug 18, 2017, at 3:00 AM, Alexandr Nikitin <nikitin.a...@gmail.com> wrote:
>>>
>>> I decided to write a post about measuring the performance impact (otherwise it stays in my messy notes forever). Any feedback is appreciated.
>>> https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
>>>
>>> On Saturday, August 12, 2017 at 1:01:31 PM UTC+3, Alexandr Nikitin wrote:
>>>>
>>>> I played with Transparent Hugepages some time ago and I want to share some numbers based on real-world high-load applications.
>>>> We have a JVM application: a high-load TCP server based on Netty. There is no single clear bottleneck; CPU, memory and network are all equally highly loaded. The amount of work depends on request content.
>>>> The following numbers are based on normal server load, ~40% of the maximum number of requests one server can handle.
>>>>
>>>> *When THP is off:*
>>>> End-to-end application latency in microseconds:
>>>> "p50" : 718.891,
>>>> "p95" : 4110.26,
>>>> "p99" : 7503.938,
>>>> "p999" : 15564.827,
>>>>
>>>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>>>> ...
>>>> ... 25,164,369 iTLB-load-misses
>>>> ... 81,154,170 dTLB-load-misses
>>>> ...
>>>>
>>>> *When THP is always on:*
>>>> End-to-end application latency in microseconds:
>>>> "p50" : 601.196,
>>>> "p95" : 3260.494,
>>>> "p99" : 7104.526,
>>>> "p999" : 11872.642,
>>>>
>>>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>>>> ...
>>>> ... 21,400,513 dTLB-load-misses
>>>> ... 4,633,644 iTLB-load-misses
>>>> ...
>>>>
>>>> As you can see, the THP performance impact is measurable and too significant to ignore: 4.1 ms vs 3.2 ms at p95, and ~100M vs ~25M TLB misses.
>>>> I also used SystemTap to measure a few kernel functions such as collapse_huge_page, clear_huge_page and split_huge_page. There were no significant spikes with THP in use.
>>>> AFAIR that was a 3.10 kernel, which is 4 years old now. I can repeat the experiments with newer kernels if there's interest. (I don't know what was changed there, though.)
>>>>
>>>> On Monday, August 7, 2017 at 6:42:21 PM UTC+3, Peter Veentjer wrote:
>>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I'm failing to understand the problem with transparent huge pages.
>>>>>
>>>>> I 'understand' how normal pages work. A page is typically 4KB in a virtual address space; each process has its own.
>>>>>
>>>>> I understand how the TLB fits in: a cache providing a mapping of virtual to physical addresses to speed up address translation.
>>>>>
>>>>> I understand that using a large page, e.g. 2MB instead of a 4KB page, can reduce pressure on the TLB.
>>>>>
>>>>> So far it looks like huge pages make a lot of sense; of course at the expense of wasting memory if only a small section of a page is being used.
>>>>> The first part I don't understand is: why is it called transparent huge pages? What is transparent about it?
>>>>>
>>>>> The second part I'm failing to understand is: why can it cause problems? There are quite a few applications that recommend disabling THP, and I recently helped a customer that was helped by disabling it. It seems there is more going on behind the scenes than just an increased page size. Is it caused by fragmentation? That is, if a new huge page is needed and memory is fragmented (into smaller pages), do those small pages need to be compacted before the huge page can be allocated? But if that were the only thing, it shouldn't be a problem once all pages for the application have been touched and all pages are retained.
>>>>>
>>>>> So I'm probably missing something simple.

--
Tom Lee / http://tomlee.co / @tglee <http://twitter.com/tglee>