Peter, just want to say I've also seen very similar behavior with JVM heap sizes ~16GB. I feel like I've seen multiple "failure" modes with THP, but most alarmingly we observed brief system-wide lockups in some cases, similar to those described in: https://access.redhat.com/solutions/1560893. (Don't quite recall if we saw that exact "soft lockup" message, but do recall something similar -- and around the time we saw that message we also observed gaps in the output of a separate shell script that was periodically writing a message to a file every 5 seconds.)
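(For anyone curious, that kind of watchdog is trivial to reproduce; a minimal sketch of the idea -- the log path is just a placeholder -- would be:

  while true; do date '+%s' >> /tmp/heartbeat.log; sleep 5; done

Gaps of much more than 5 seconds between consecutive timestamps point at a system-wide stall rather than a pause in any one process.)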
I'm probably just scarred from the experience, but to me the question of whether to leave THP=always in such environments feels more like "do I want to gamble on this pathological behavior occurring?" than some dial for fine-tuning performance. Maybe it's better in more recent RHEL kernels, but I've never really had a reason to roll the dice on it. (This shouldn't scare folks off [non-transparent] hugepages entirely though -- we had much better results with those.)

On Wed, Aug 23, 2017 at 3:52 AM, Peter Booth <pboot...@gmail.com> wrote:
> Some points:
>
> Those of us working in large corporate settings are likely to be running close to vanilla RHEL 7.3 or 6.9, with kernel versions 3.10.0-514 or 2.6.32-696 respectively.
>
> I have seen the THP issue first-hand, in dramatic fashion. One Java trading application I supported ran with heaps that ranged from 32GB to 64GB, running on Azul Zing, with no appreciable GC pauses. It was migrated from Westmere hardware on RHEL 5.6 to (faster) Ivy Bridge hardware on RHEL 6.4. In non-production environments only, the application suddenly began showing occasional pauses of up to a few seconds. Occasional meaning only four or five out of 30 instances showed a pause, and they might only have one, two or three pauses in a day. These instances ran a workload that replicated a production workload. I noticed that the only difference between these hosts and the healthy production hosts was that, due to human error, THP was disabled on the production hosts but not on the non-prod hosts. As soon as we disabled THP on the non-prod hosts the pauses disappeared.
>
> This was a reactive discovery -- I haven't done any proactive investigation of the effects of THP. This was sufficient for me to rule it out for today.
>
> On Sunday, August 20, 2017 at 10:32:45 AM UTC-4, Alexandr Nikitin wrote:
>>
>> Thank you for the feedback! I appreciate it. Yes, you are right. The intention was not to show that THP is an awesome feature, but to share techniques to measure and control the risks. I made the changes <https://github.com/alexandrnikitin/blog/compare/2139c405f0c50a3ab907fb2530421bf352caa412...3e58094386b14d19e06752d9faa0435be2cbe651> to highlight the purpose and risks.
>>
>> The experiment is indeed interesting. I believe the "defer" option should help in that environment. I'm really keen to try the latest kernel (related not only to THP).
>>
>> *Frankly, I still don't have a strong opinion about huge latency spikes in the allocation path in general. I'm not sure whether it's a THP issue or the application/environment itself. Likely it's high memory pressure in general that causes the spikes. Or the root of the issue is something else entirely, e.g. the jemalloc case.*
>>
>> On Friday, August 18, 2017 at 6:32:40 PM UTC+3, Gil Tene wrote:
>>>
>>> This is very well written and quite detailed. It has all the makings of a great post I'd point people to. However, as currently stated, I'd worry that it would (mis)lead readers into using THP with the "always" setting in /sys/kernel/mm/transparent_hugepage/defrag (instead of "defer"), and/or on older (pre-4.6) kernels, with a false sense that the many-msec slow-path allocation latency problems many people warn about don't actually exist. You do link to the discussions on the subject, but the measurements and summary conclusion of the posting alone would not end up warning people who don't actually follow those links.
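>>>
>>> (For reference: the active defrag mode is the bracketed entry when the file is read, and it can be switched at runtime as root. The option list shown below is illustrative and varies by kernel version -- "defer" only appears on 4.6 and later:
>>>
>>>   cat /sys/kernel/mm/transparent_hugepage/defrag
>>>   always defer [madvise] never
>>>
>>>   echo defer > /sys/kernel/mm/transparent_hugepage/defrag
>>> )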
>>>
>>> I assume your intention is not to have the reader conclude that "there is lots of advice out there telling you to turn off THP, and it is wrong. Turning it on is perfectly safe, and may significantly speed up your application", but that you are instead aiming for something like "THP used to be problematic enough to cause wide-ranging recommendations to simply turn it off, but this has changed with recent Linux kernels. It is now safe to use in widely applicable ways (with the right settings) and can really help application performance without risking huge stalls". Unfortunately, I think that many readers would understand the current text as the former, not the latter.
>>>
>>> Here is what I'd change to improve on the current text:
>>>
>>> 1. Highlight the risk of high slow-path allocation latencies with the "always" (and even "madvise") setting in /sys/kernel/mm/transparent_hugepage/defrag, the fact that the "defer" option is intended to address those risks, and that the defer option is available with Linux kernel versions 4.6 or later.
>>>
>>> 2. Create an environment that would actually demonstrate these very high (many-msec or worse) latencies in the allocation slow path with defrag set to "always". This is the part that will probably take some extra work, but it will also be a very valuable contribution. The issues are so widely reported (into the 100s of msec or more, and with a wide variety of workloads, as your links show) that intentional reproduction *should* be possible. And being able to demonstrate it actually happening will also allow you to demonstrate how newer kernels address it with the defer setting.
>>>
>>> 3. Show how changing the defrag setting to "defer" removes the high latencies seen by the allocation slow path under the same conditions.
>>>
>>> For (2) above, I'd look to induce a situation where the allocation slow path can't find a free 2MB page without having to defragment one directly. E.g.:
>>> - I'd start by significantly slowing down the background defragmentation in khugepaged (e.g. set /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 3600000). I'd avoid turning it off completely, in order to make sure you are still measuring the system in a configuration that believes it does background defragmentation.
>>> - I'd add some static physical memory pressure (e.g. allocate and touch a bunch of anonymous memory in a process that would just sit on it) such that the system would only have 2-3GB free for buffers and your Netty workload's heap. A sleeping JVM launched with an empirically sized, big enough -Xmx and -Xms and with AlwaysPreTouch on is an easy way to do that.
>>> - I'd then create an intentional and spiky fragmentation load (e.g. perform spikes of scanning through a 20GB file every minute or so).
>>> - With all that in place, I'd then repeatedly launch and run your Netty workload without the PreTouch flag, in order to try to induce situations where an on-demand-allocated 2MB heap page hits the slow path, and the effect shows up in your Netty latency measurements.
>>>
>>> All of the above are obviously experimentation starting points, and may take some iteration to actually induce the high latencies we are looking to demonstrate.
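>>>
>>> (A rough command-level sketch of that setup; the sysfs knob and the 3600000 value are as above, while the heap size, the SleepForever class, and the file path are illustrative placeholders to be sized per machine:
>>>
>>>   # as root: slow khugepaged's background scanning way down, but leave it enabled
>>>   echo 3600000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
>>>
>>>   # static memory pressure: a sleeping JVM that pre-touches its whole heap up front
>>>   java -Xms48g -Xmx48g -XX:+AlwaysPreTouch SleepForever &
>>>
>>>   # spiky fragmentation load: stream through a large file once a minute or so
>>>   while true; do cat /data/20GB.file > /dev/null; sleep 60; done &
>>> )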
>>>
>>> But once you are able to demonstrate the impact of on-demand allocation doing direct (synchronous) compaction, both in your application latency measurement and in your kernel tracing data, you would then be able to try the same experiment with the defrag setting set to "defer" to show how newer kernels and this new setting now make it safe (or at least much safer) to use THP. And with that actually demonstrated, everything about THP recommendations for freeze-averse applications can change, making for a really great posting.
>>>
>>> Sent from my iPad
>>>
>>> On Aug 18, 2017, at 3:00 AM, Alexandr Nikitin <nikitin.a...@gmail.com> wrote:
>>>
>>> I decided to write a post about measuring the performance impact (otherwise it stays in my messy notes forever). Any feedback is appreciated.
>>> https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
>>>
>>> On Saturday, August 12, 2017 at 1:01:31 PM UTC+3, Alexandr Nikitin wrote:
>>>>
>>>> I played with Transparent Hugepages some time ago and I want to share some numbers based on real-world high-load applications.
>>>> We have a JVM application: a high-load TCP server based on Netty. There is no single clear bottleneck; CPU, memory and network are all equally highly loaded. The amount of work depends on request content.
>>>> The following numbers are based on normal server load, ~40% of the maximum number of requests one server can handle.
>>>>
>>>> *When THP is off:*
>>>> End-to-end application latency in microseconds:
>>>> "p50" : 718.891,
>>>> "p95" : 4110.26,
>>>> "p99" : 7503.938,
>>>> "p999" : 15564.827,
>>>>
>>>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>>>> ...
>>>> ... 25,164,369 iTLB-load-misses
>>>> ... 81,154,170 dTLB-load-misses
>>>> ...
>>>>
>>>> *When THP is always on:*
>>>> End-to-end application latency in microseconds:
>>>> "p50" : 601.196,
>>>> "p95" : 3260.494,
>>>> "p99" : 7104.526,
>>>> "p999" : 11872.642,
>>>>
>>>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>>>> ...
>>>> ... 21,400,513 dTLB-load-misses
>>>> ... 4,633,644 iTLB-load-misses
>>>> ...
>>>>
>>>> As you can see, the THP performance impact is measurable and too significant to ignore: 4.1 ms vs 3.2 ms at p95, and ~100M vs ~25M TLB misses.
>>>> I also used SystemTap to measure a few kernel functions such as collapse_huge_page, clear_huge_page and split_huge_page. There were no significant spikes with THP in use.
>>>> AFAIR that was a 3.10 kernel, which is 4 years old now. I can repeat the experiments with newer kernels if there's interest. (I don't know what was changed there, though.)
>>>>
>>>> On Monday, August 7, 2017 at 6:42:21 PM UTC+3, Peter Veentjer wrote:
>>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> I'm failing to understand the problem with transparent huge pages.
>>>>>
>>>>> I 'understand' how normal pages work. A page is typically 4KB in a virtual address space; each process has its own.
>>>>>
>>>>> I understand how the TLB fits in: a cache providing a mapping of virtual to physical addresses to speed up address translation.
>>>>>
>>>>> I understand that using a large page, e.g. 2MB instead of a 4KB page, can reduce pressure on the TLB.
>>>>>
>>>>> So far it looks like huge pages make a lot of sense; of course at the expense of wasting memory if only a small section of a page is being used.
>>>>> The first part I don't understand is: why is it called transparent huge pages? What is transparent about it?
>>>>>
>>>>> The second part I'm failing to understand is: why can it cause problems? There are quite a few applications that recommend disabling THP, and I recently helped a customer that was helped by disabling it. It seems there is more going on behind the scenes than just an increased page size. Is it caused by fragmentation? That is, if a new huge page is needed and memory is fragmented (into smaller pages), do those small pages need to be compacted before the huge page can be allocated? But if that were the only thing, it shouldn't be a problem once all pages for the application have been touched and all pages are retained.
>>>>>
>>>>> So I'm probably missing something simple.

--
Tom Lee / http://tomlee.co / @tglee <http://twitter.com/tglee>