[
https://issues.apache.org/jira/browse/TS-4897?focusedWorklogId=29795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-29795
]
ASF GitHub Bot logged work on TS-4897:
--------------------------------------
Author: ASF GitHub Bot
Created on: 27/Sep/16 04:29
Start Date: 27/Sep/16 04:29
Worklog Time Spent: 10m
Work Description: Github user canselcik commented on the issue:
https://github.com/apache/trafficserver/pull/1050
I remember seeing that the glibc implementation of `posix_madvise` basically
does the `madvise` syscall as well, but their behavior really does seem to
differ on both our production and deployment systems, which are RHEL 6.5.
We also suspected transparent huge pages, and we can elaborate on that
potential explanation if this one ends up having shortcomings.
- When `ats_madvise` is a no-op, the map count doesn't grow unbounded
(regardless of the `ssl_ticket_enabled` value).
- When this change is made and `posix_madvise` is called with `0`, the map
count doesn't grow unbounded (regardless of the `ssl_ticket_enabled` value).
- If `ats_madvise` calls `madvise` instead of `posix_madvise` (our systems
have both), the map count doesn't grow unbounded.
Calls to `ats_madvise` when `ssl_ticket_enabled=0`:
```
ats_madvice(addr=0x7fc4bc022000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4b801f000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4cc01f000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4dc017000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4c8005000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4d41e5000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4a4003000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4b0019000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4a8013000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4a4006000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4e0022000, len=32768, flags=16) = 0
ats_madvice(addr=0x1dd7000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4c8028000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4c408b000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4b8003000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4b401c000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4d801d000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4b8028000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4b0022000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4a401e000, len=32768, flags=16) = 0
ats_madvice(addr=0x7fc4a0003000, len=8192, flags=0) = 0
ats_madvice(addr=0x7fc4d41ee000, len=32768, flags=16) = 0
<pattern repeats as long as more connections are accepted and handled>
```
Calls to `ats_madvise` when `ssl_ticket_enabled=1` (the pattern does not
repeat even though more connections are accepted and handled):
```
ats_madvice(addr=0x7f05aea43000, len=131072, flags=16) = 0
ats_madvice(addr=0x7f057c003000, len=8192, flags=0) = 0
ats_madvice(addr=0x7f057c01e000, len=4096, flags=0) = 0
ats_madvice(addr=0x7f057c022000, len=20480, flags=0) = 0
ats_madvice(addr=0x7f057c028000, len=94208, flags=0) = 0
ats_madvice(addr=0x7f057c040000, len=28672, flags=0) = 0
ats_madvice(addr=0x7f058c02e000, len=32768, flags=0) = 0
ats_madvice(addr=0x7f05aea01000, len=262144, flags=0) = 0
ats_madvice(addr=0x7f05ae9bf000, len=262144, flags=16) = 0
ats_madvice(addr=0x7f05ae97d000, len=262144, flags=0) = 0
ats_madvice(addr=0x7f055c01d000, len=12288, flags=0) = 0
ats_madvice(addr=0x7f05ae93b000, len=262144, flags=16) = 0
ats_madvice(addr=0x7f05ae8f9000, len=262144, flags=16) = 0
ats_madvice(addr=0x7f05ae8b7000, len=262144, flags=16) = 0
ats_madvice(addr=0x7f055c021000, len=12288, flags=0) = 0
```
Even with the non-stop calls to `ats_madvise` in the `ssl_ticket_enabled=0`
case, if `ats_madvise` is a no-op, uses `flags=0`, or calls `madvise` rather
than `posix_madvise`, we don't observe the growth in memory maps.
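The third variant above (calling `madvise` directly) could be sketched roughly as follows. This is a minimal illustration and not the ATS code: `advise_dontdump` and `dontdump_roundtrip` are hypothetical names, and the sketch assumes a Linux/glibc system whose `<sys/mman.h>` defines `MADV_DONTDUMP`; where it is missing, the helper falls back to a no-op.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose madvise(), MADV_* and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not the real ats_madvise: asks the kernel to keep
 * the region out of core dumps by calling madvise() directly, the
 * interface that actually defines MADV_DONTDUMP. */
static int
advise_dontdump(void *addr, size_t len)
{
  assert(addr != NULL && len > 0); /* caller must pass a real mapping */
#ifdef MADV_DONTDUMP
  return madvise(addr, len, MADV_DONTDUMP);
#else
  (void)addr;
  (void)len;
  return 0; /* advice not available on this platform: behave as a no-op */
#endif
}

/* Maps an anonymous region, applies the advice, and unmaps it again.
 * Returns 0 on success and -1 on any failure. */
static int
dontdump_roundtrip(size_t len)
{
  void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  int rc = advise_dontdump(p, len);
  munmap(p, len);
  return rc == 0 ? 0 : -1;
}
```

Calling `dontdump_roundtrip(32768)` exercises the same region size that appears most often in the trace above. The sketch only shows the call path; it does not by itself reproduce or explain the map-count growth.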
Issue Time Tracking
-------------------
Worklog Id: (was: 29795)
Time Spent: 40m (was: 0.5h)
> Unbound growth of number of memory maps for traffic_server under SSL
> termination load when ssl_ticket_enabled=0
> ---------------------------------------------------------------------------------------------------------------
>
> Key: TS-4897
> URL: https://issues.apache.org/jira/browse/TS-4897
> Project: Traffic Server
> Issue Type: Bug
> Components: TLS
> Reporter: Can Selcik
> Assignee: Phil Sorber
> Priority: Blocker
> Fix For: 7.1.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The number of {{\[anon\]}} memory regions mapped to the {{traffic_server}}
> process grows without bound until the kernel thresholds are reached and
> the process is terminated.
> This happens when ATS is used to terminate SSL and {{ssl_ticket_enabled=0}}
> in {{ssl_multicert.config}}.
> We've experienced this issue on our staging and production hosts and were
> able to replicate it with the above configuration under high-volume HTTPS
> load. We didn't experience this with {{5.2.x}}; the reason becomes clear at
> the end.
> While generating {{https}} traffic with {{siege}} or {{ab}}, the issue can be
> observed with:
> {{watch "pmap $(pidof traffic_server) | wc -l"}}
> {{git bisect}} pointed us to: <TS-3883: Fix madvise>
> Turns out a no-op {{ats_madvise}} hides the symptoms of the issue.
> Digging deeper, we realized that the {{ssl_ticket_enabled}} option is
> relevant because, after enabling the {{ssl.session_cache}} tag, we saw that
> ATS doesn't manage its own session cache for SSL; the library does it
> instead. In that case, the code path doing the problematic allocation within
> ATS doesn't get executed often, since OpenSSL takes care of the session tokens.
> But why does this happen? It happens because {{MADV_DONTDUMP}} is passed to
> {{posix_madvise}}, even though {{MADV_DONTDUMP}} is not a valid flag for
> {{posix_madvise}}, which is not a drop-in replacement for {{madvise}}.
> Looking at {{<bits/mman.h>}}:
> {noformat}
> /* Advice to `madvise'. */
> #ifdef __USE_BSD
> # define MADV_NORMAL      0    /* No further special treatment. */
> # define MADV_RANDOM      1    /* Expect random page references. */
> # define MADV_SEQUENTIAL  2    /* Expect sequential page references. */
> # define MADV_WILLNEED    3    /* Will need these pages. */
> # define MADV_DONTNEED    4    /* Don't need these pages. */
> # define MADV_REMOVE      9    /* Remove these pages and resources. */
> # define MADV_DONTFORK    10   /* Do not inherit across fork. */
> # define MADV_DOFORK      11   /* Do inherit across fork. */
> # define MADV_MERGEABLE   12   /* KSM may merge identical pages. */
> # define MADV_UNMERGEABLE 13   /* KSM may not merge identical pages. */
> # define MADV_DONTDUMP    16   /* Explicity exclude from the core dump,
>                                   overrides the coredump filter bits. */
> # define MADV_DODUMP      17   /* Clear the MADV_DONTDUMP flag. */
> # define MADV_HWPOISON    100  /* Poison a page for testing. */
> #endif
> {noformat}
> However {{posix_madvise}} takes:
> {noformat}
> # define POSIX_MADV_NORMAL      0  /* No further special treatment. */
> # define POSIX_MADV_RANDOM      1  /* Expect random page references. */
> # define POSIX_MADV_SEQUENTIAL  2  /* Expect sequential page references. */
> # define POSIX_MADV_WILLNEED    3  /* Will need these pages. */
> # define POSIX_MADV_DONTNEED    4  /* Don't need these pages. */
> {noformat}
> Also, {{posix_madvise}} and {{madvise}} can both be present on the same
> system, but they do not have the same capabilities. That's why the
> {{Explicity exclude from the core dump, overrides the coredump filter bits}}
> functionality isn't achievable through {{posix_madvise}}.
> Will post a PR momentarily.
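The mismatch described in the quoted issue can be made concrete with a small guard. This is only a sketch under stated assumptions: `is_posix_advice` and `checked_posix_madvise` are hypothetical names that do not exist in ATS or glibc, and the literal `16` is the `MADV_DONTDUMP` value from the header excerpt quoted in the issue.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* expose posix_madvise() and POSIX_MADV_* on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: true only for the five advice values that
 * POSIX defines for posix_madvise(). */
static bool
is_posix_advice(int advice)
{
  return advice == POSIX_MADV_NORMAL || advice == POSIX_MADV_RANDOM ||
         advice == POSIX_MADV_SEQUENTIAL || advice == POSIX_MADV_WILLNEED ||
         advice == POSIX_MADV_DONTNEED;
}

/* Hypothetical wrapper: rejects values outside the POSIX set instead of
 * forwarding them. Like posix_madvise(), it returns an error number
 * rather than setting errno. */
static int
checked_posix_madvise(void *addr, size_t len, int advice)
{
  if (!is_posix_advice(advice))
    return EINVAL;
  return posix_madvise(addr, len, advice);
}

/* Small self-check: 16 is the MADV_DONTDUMP value from the quoted header. */
static void
advice_self_check(void)
{
  assert(is_posix_advice(POSIX_MADV_WILLNEED));
  assert(!is_posix_advice(16));
}
```

With a guard like this, a `MADV_DONTDUMP` request would fail loudly with `EINVAL` instead of being handed to an interface that does not define it. Whether rejecting or silently ignoring such values is preferable is a design choice left open here.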