Re: httpd 2.4.25, mpm_event, ssl: segfaults
Every write to a Linux socket means the kernel copies into 2 KiB buffers used by SKBs. The data is copied somewhere into the middle of that 2 KiB buffer, so that TCP/IP headers can be prepended by the kernel. Even with TCP Segmentation Offload, 2 KiB buffers are still used; it just means that the TCP/IP headers need to be calculated once for an array of buffers, and then the kernel puts an array of pointers in the network card's ring buffer. The kernel will only put on the wire as much data as the current TCP congestion window allows, but it has to keep each packet in its buffers until the remote side ACKs that packet.

On Mon, Feb 27, 2017 at 2:25 PM, William A Rowe Jr wrote:
> On Mon, Feb 27, 2017 at 12:16 PM, Jacob Champion wrote:
>> On 02/23/2017 04:48 PM, Yann Ylavic wrote:
>>> On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote:
>>>> IOW: read(): Three copies: copy from filesystem cache to httpd
>>>> read() buffer to encrypted-data buffer to kernel socket buffer.
>>>
>>> Not really, "copy from filesystem cache to httpd read() buffer" is
>>> likely mapping to userspace, so no copy (on read) here.
>>
>> Oh, cool. Which kernels do this? It seems like the VM tricks would
>> have to be incredibly intricate for this to work; reads typically
>> don't happen in page-sized chunks, nor to aligned addresses. Linux in
>> particular has comments in the source explaining that they *don't* do
>> it for other syscalls (e.g. vmsplice)... but I don't have much
>> experience with non-Linux systems.
>
> I don't understand this claim.
>
> If read() returned an API-provisioned buffer, it could point wherever
> it liked, including a 4k page. As things stand, the void* (or char*) of
> the read() buffer is at an arbitrary offset; no common OS I'm familiar
> with maps a page to a non-page-aligned address.
>
> The kernel socket send[v]() call might avoid a copy in the direct-send
> case, depending on the implementation.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Mon, Feb 27, 2017 at 12:16 PM, Jacob Champion wrote: > > On 02/23/2017 04:48 PM, Yann Ylavic wrote: >> On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote: >>> >>> >>> IOW: read():Three copies: copy from filesystem cache to httpd >>> read() buffer to encrypted-data buffer to kernel socket buffer. > >> >> Not really, "copy from filesystem cache to httpd read() buffer" is >> likely mapping to userspace, so no copy (on read) here. > > Oh, cool. Which kernels do this? It seems like the VM tricks would have to > be incredibly intricate for this to work; reads typically don't happen in > page-sized chunks, nor to aligned addresses. Linux in particular has > comments in the source explaining that they *don't* do it for other syscalls > (e.g. vmsplice)... but I don't have much experience with non-Linux systems. I don't understand this claim. If read() returned an API-provisioned buffer, it could point wherever it liked, including a 4k page. As things stand the void* (or char*) of the read() buffer is at an arbitrary offset, no common OS I'm familiar with maps a page to a non-page-aligned address. The kernel socket send[v]() call might avoid copy in the direct-send case, depending on the implementation.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
[combining two replies]

On 02/23/2017 04:47 PM, Yann Ylavic wrote:
> On Thu, Feb 23, 2017 at 7:16 PM, Jacob Champion wrote:
>> Power users can break the system, and this is a power tool, right?
>
> It's not about power users, I don't think we can recommend anyone to
> use 4MB buffers even if they (seem to) have RAM for it.

Even if this is the case (and I'm still skeptical), there's a middle ground between "don't let users configure it at all" and "let users configure it to dangerous extents".

> I'm not talking about hardcoding anything, nor reading or sending
> hard-limited sizes on filesystems/sockets.
> I'm proposing that the configuration relates to how much we "coalesce"
> data on output, and all buffers' reuses (though each of fixed size)
> should follow the needs.

I consider the fixed block size to be a major tuning point. That's what I'm referring to with "hardcoding". Coalescing doesn't help performance if we're bottlenecking on the overhead of tiny blocks. (Even if *everything* else can be vectored -- and right now, OpenSSL can't be -- allocation itself would still be a fixed overhead.)

>>> I've no idea how much it costs to have 8K vs 16K records, though.
>>> Maybe in the mod_ssl case we'd want 16K buffers, still reasonable?
>>
>> We can't/shouldn't hardcode this especially. People who want maximum
>> throughput may want nice big records, but IIRC users who want
>> progressive rendering need smaller records so that they don't have to
>> wait as long for the first decrypted chunk. It needs to be tunable,
>> possibly per-location.
>
> Well, the limit is 16K at the TLS level.

Maximum limit, yes. We also need to not set a minimum limit, which is, IIUC, what we're currently doing. mod_ssl calls SSL_write() once for every chunk read from apr_bucket_read().

On 02/23/2017 04:48 PM, Yann Ylavic wrote:
> On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote:
>> IOW: read(): Three copies: copy from filesystem cache to httpd
>> read() buffer to encrypted-data buffer to kernel socket buffer.
>
> Not really, "copy from filesystem cache to httpd read() buffer" is
> likely mapping to userspace, so no copy (on read) here.

Oh, cool. Which kernels do this? It seems like the VM tricks would have to be incredibly intricate for this to work; reads typically don't happen in page-sized chunks, nor to aligned addresses. Linux in particular has comments in the source explaining that they *don't* do it for other syscalls (e.g. vmsplice)... but I don't have much experience with non-Linux systems.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Fri, 24 Feb 2017, Yann Ylavic wrote:
> On Thu, Feb 23, 2017 at 8:50 PM, Jacob Champion wrote:
>>> Going off on a tangent here: For those of you who actually know how
>>> the ssl stuff really works, is it possible to get multiple threads
>>> involved in doing the encryption, or do you need the results from
>>> the previous block in order to do the next one?
>>
>> I'm not a cryptographer, but I think how parallelizable it is depends
>> on the ciphersuite in use. Like you say, some ciphersuites require
>> one block to be fed into the next as an input; others don't.
>
> Yes, and the cost of scheduling threads for a non-dedicated crypto
> device is not worth it, I think. But mainly there is not only one
> stream involved in a typical HTTP server, so probably multiple
> simultaneous connections already saturate the AES-NI...

Actually, the AES-NI capability can be seen as a dedicated crypto device of sorts... It's just a bit more versatile with a CPU core stuck in there as well ;-)

I would much prefer if httpd could be able to push full-bandwidth single-stream https using multiple cores instead of enticing users to use silly "parallel get" clients, download accelerators and whatnot. Granted, the use cases are perhaps not the standard serve-many-files-to-the-public ones, but they do exist. And depending on which way the computing trends blow we might start seeing more competing low-power CPUs with more cores and less capability, requiring more threads to perform on single/few-stream workloads.

In any event I don't think the basic idea of multiple-thread-crypto should be dismissed lightly, especially if someone (definitely not me) figures out a neat way to do it :-)

Personally, it's the angst! of having to wait more than 10 seconds for a DVD-sized Linux-distro.iso download when I KNOW that there are 7 cores idling, and knowing that without the single-core bottleneck I would have 6 additional seconds of time to spend on something useful 8-()

/Nikke - thinking that the Subject is not that accurate anymore...
-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se --- Sattinger's Law: It works better if you plug it in. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Fri, 24 Feb 2017, Yann Ylavic wrote:
>>> The issue is potentially the huge (order big-n) allocations which
>>> finally may hurt the system (fragmentation, OOM...).
>>
>> Is this a real or theoretical problem?
>
> Both. Fragmentation is a hard issue, but a constant is: the more you
> ask for big allocs, the more likely you won't be serviced one day or
> another (or another task will be killed for that, until yours).
> Long-living (or close to memory limits) systems suffer from this, no
> matter what allocator; small and large allocations fragment the memory
> (httpd is likely not the only program on the system), and the only
> "remedy" is small-order allocations (2^order pages, a "sane" order
> being lower than 2, hence order-1 on a system with 4K pages is 8K
> bytes).

I've only seen this class of issues on Linux systems where vm.min_free_kbytes is left at default in combination with better-than-GigE networking. Since we started to tune vm.min_free_kbytes to be in the order of 0.5s bursts at maximum network speed (i.e. 512M for 10GigE) we haven't seen it in production. I think our working theory was that too little space to handle bursts left the kernel unable to figure out which file cache pages to throw out in time, but I think we never got to figuring out the exact reason...

>> However, for large file performance I really don't buy into the
>> notion that it's a good idea to break everything into tiny puny
>> blocks. The potential for wasting CPU cycles on this micro-management
>> is rather big...
>
> I don't think that a readv/writev of 16 iovecs of 8KB is (noticeably)
> slower than a read/write of a contiguous 128K; both might well end up
> in a scatterlist for the kernel/hardware.

Ah, true. Scatter/gather is magic... I do find iovecs useful, it's the small blocks that get me into skeptic mode...

> Small blocks are not for networking, they're for internal use only.
> And remember that TLS records are 16K max anyway: give 128KB to
> openssl's SSL_write and it will output 8 chunks of 16KB.

Oh, I had completely missed that limit on TLS record size...

>> Kinda related: We also have the support for larger page sizes with
>> modern CPUs. Has anyone investigated if it makes sense allocating
>> memory pools in chunks that fit those large pages? I think PPC64 have
>> 64K pages already.
>
> APR pools are already based on the page size IIRC.

I was thinking of the huge/large pages available on x86:s, 2 MiB and maybe 1 GiB IIRC. My thought was that doing 2 MiB allocations for the large memory pools instead of 4k might make sense for configurations where you have a lot of threads, resulting in allocating that much memory eventually: one page instead of lots. On Linux, transparent huge page support, when enabled, can take advantage of this, leading to fewer TLB entries/misses.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      | ni...@acc.umu.se
---
 * <- Tribble
 * <- Tribble having Safe Sex
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 7:48 PM, Yann Ylavic wrote:
> On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote:
>>
>> IOW:
>> read(): Three copies: copy from filesystem cache to httpd read()
>> buffer to encrypted-data buffer to kernel socket buffer.
>
> Not really, "copy from filesystem cache to httpd read() buffer" is
> likely mapping to userspace, so no copy (on read) here.
>
>> mmap(): Two copies: filesystem page already mapped into httpd, so
>> just copy from filesystem (cached) page to encrypted-data buffer to
>> kernel socket buffer.
>
> So, as you said earlier the "write to socket" isn't a copy either,
> hence both read() and mmap() implementations could work with a single
> copy when mod_ssl is involved (this is more than a copy but you
> counted it above so), and no copy at all without it.

When you do a write() system call to a socket, the kernel must copy the data from the userspace buffer to its own buffers, because of data lifetime. When the write() system call returns, userspace is free to modify the buffer (which it owns). But the data from the last write() call must live a long time in the kernel: the kernel needs to keep a copy of it until the remote system ACKs all of it. The data is referenced first in the kernel transmission control system, then in the network card's ring buffers. If the remote system's feedback indicates that a packet was dropped or corrupted, the kernel may send it multiple times. The data has a different lifetime than the userspace buffer, so the kernel must copy it to a buffer it owns.

On the userspace high-order memory allocations, I still don't see what the problem is. Say you're using 64kiB buffers. When you free the buffers (e.g., at the end of the http request), they go into the memory allocator's 64kiB free-list, and they're available to be allocated again (e.g., by another http request). The memory allocator won't use the 64kiB free chunks for smaller allocations unless the free-lists for the smaller orders are emptied out. That'd mean there was a surge in demand for smaller-size allocations, so it'd make sense to start using the higher-order free chunk instead of calling brk(). Only if there are no more high-order free chunks left will the allocator have to call brk(). When the kernel gets the brk() request, if the system is short of high-order contiguous memory, it doesn't have to give contiguous physical pages for that brk() call: the Page Table Entries for that request can be composed of many individual 4kiB pages scattered all over physical memory. That's hidden from userspace; userspace sees a contiguous range of virtual memory.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 10:06 PM, Daniel Lescohier wrote:
> Why would high-order memory allocations be a problem in userspace
> code, which is using virtual memory? I thought high-order allocations
> are a big problem in kernel space, which has to deal with physical
> pages.

Well, both in kernel and user space, the difficulty is finding large contiguous memory. With virtual memory (admittedly virtually larger than physical memory), it needs more "active" regions to fail, but still it can fail if many heterogeneous chunks are to be mapped at a time, and the OOM killer will do its job. It depends on how close to the resident memory limit you are, of course (it won't happen if some memory can be compressed or swapped), but large chunks are no better with lots of RAM either.

> But when you write to a socket, doesn't the kernel scatter the
> userspace buffer into multiple SKBs? SKBs on order-0 pages allocated
> by the kernel?

Right, the Linux network stack (or its drivers) is mainly using (or is moving to) order-0 or order-1 chunks (with scatterlists when needed). But this is where httpd's job ends; what we are talking about happens before this :)

From the other message:

On Wed, Feb 22, 2017 at 8:55 PM, Daniel Lescohier wrote:
>
> IOW:
> read(): Three copies: copy from filesystem cache to httpd read()
> buffer to encrypted-data buffer to kernel socket buffer.

Not really, "copy from filesystem cache to httpd read() buffer" is likely mapping to userspace, so no copy (on read) here.

> mmap(): Two copies: filesystem page already mapped into httpd, so just
> copy from filesystem (cached) page to encrypted-data buffer to kernel
> socket buffer.

So, as you said earlier the "write to socket" isn't a copy either, hence both read() and mmap() implementations could work with a single copy when mod_ssl is involved (this is more than a copy but you counted it above so), and no copy at all without it.

Regards,
Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 8:50 PM, Jacob Champion wrote:
> On 02/22/2017 02:16 PM, Niklas Edmundsson wrote:
>
> I don't think s_server is particularly optimized for performance
> anyway.
>
> Oh, and just to complete my local testing table:
>
> - test server, writing from memory: 1.2 GiB/s
> - test server, mmap() from disk: 1.1 GiB/s
> - test server, 64K read()s from disk: 1.0 GiB/s
> - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s
> - httpd trunk with `EnableMMAP off`: 580 MiB/s
> - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s
>
> My test server's read() implementation is a really naive "block on
> read, then block on write, repeat" loop, so there's probably some
> improvement to be had there, but this is enough proof in my mind that
> there are major gains to be made regardless.

Does no-mmap-2MB-block beat MMap on?

>>> Going off on a tangent here:
>>>
>>> For those of you who actually know how the ssl stuff really works,
>>> is it possible to get multiple threads involved in doing the
>>> encryption, or do you need the results from the previous block in
>>> order to do the next one?
>>
>> I'm not a cryptographer, but I think how parallelizable it is depends
>> on the ciphersuite in use. Like you say, some ciphersuites require
>> one block to be fed into the next as an input; others don't.

Yes, and the cost of scheduling threads for a non-dedicated crypto device is not worth it, I think. But mainly there is not only one stream involved in a typical HTTP server, so probably multiple simultaneous connections already saturate the AES-NI...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 7:16 PM, Jacob Champion wrote:
> On 02/23/2017 08:34 AM, Yann Ylavic wrote:
>> Actually I'm not very pleased with this solution (or the final one
>> that would make this size open / configurable).
>> The issue is potentially the huge (order big-n) allocations which
>> finally may hurt the system (fragmentation, OOM...).
>
> Power users can break the system, and this is a power tool, right?

It's not about power users, I don't think we can recommend anyone to use 4MB buffers even if they (seem to) have RAM for it.

> And we have HugeTLB kernels and filesystems to play with, with 2MB and
> bigger pages... Making all these assumptions for 90% of users is
> perfect for the out-of-the-box experience, but hardcoding them so that
> no one can fix broken assumptions seems Bad.

I'm not talking about hardcoding anything, nor reading or sending hard-limited sizes on filesystems/sockets. I'm proposing that the configuration relates to how much we "coalesce" data on output, and all buffers' reuses (though each of fixed size) should follow the needs.

> (And don't get me wrong, I think applying vectored I/O to the brigade
> would be a great thing to try out and benchmark. I just think it's a
> long-term and heavily architectural fix, when a short-term change to
> get rid of some #defined constants could have immediate benefits.)

Of course, apr_bucket_file_set_read_size() is a quick patch (I dispute it for the general case, but it may be useful, or not, for the 16K SSL case, let's see), and so is another for configuring core_output_filter constants, but we don't need them for testing, right?

>> I've no idea how much it costs to have 8K vs 16K records, though.
>> Maybe in the mod_ssl case we'd want 16K buffers, still reasonable?
>
> We can't/shouldn't hardcode this especially. People who want maximum
> throughput may want nice big records, but IIRC users who want
> progressive rendering need smaller records so that they don't have to
> wait as long for the first decrypted chunk. It needs to be tunable,
> possibly per-location.

Well, the limit is 16K at the TLS level.

Regards,
Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 7:15 PM, Niklas Edmundsson wrote:
> On Thu, 23 Feb 2017, Yann Ylavic wrote:
>
>>>> Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it
>>>> just defines a new buffer size for use with the file bucket. It's a
>>>> little less than 64K, I assume to make room for an allocation
>>>> header:
>>>>
>>>>     #define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64)
>>>
>>> Actually I'm not very pleased with this solution (or the final one
>>> that would make this size open / configurable).
>>> The issue is potentially the huge (order big-n) allocations which
>>> finally may hurt the system (fragmentation, OOM...).
>
> Is this a real or theoretical problem?

Both. Fragmentation is a hard issue, but a constant is: the more you ask for big allocs, the more likely you won't be serviced one day or another (or another task will be killed for that, until yours). Long-living (or close to memory limits) systems suffer from this, no matter what allocator; small and large allocations fragment the memory (httpd is likely not the only program on the system), and the only "remedy" is small-order allocations (2^order pages, a "sane" order being lower than 2, hence order-1 on a system with 4K pages is 8K bytes).

> Our large-file cache module does 128k allocs to get a sane block size
> when copying files to the cache. The only potential drawback we
> noticed was httpd processes becoming bloated due to the default
> MaxMemFree 2048, so we're running with MaxMemFree 256 now.

With MaxMemFree 256 (KB per allocator), each APR allocator reclaims but mainly returns a lot more memory chunks to the system's allocator, which does a better job of recycling many differently sized chunks than APR's (which is pretty basic in this regard; its role is more about quickly recycling common sizes). With MaxMemFree 2048 (2MB), there is more builtin handling of "exotic" chunks, which may be painful. That might be something else, though...

> Granted, doing alloc/free for all outgoing data means way more
> alloc/free:s, so we might just miss the issues because cache fills
> aren't as common.

That's why reuse of common and reasonably sized chunks in the APR allocator can help.

> However, for large file performance I really don't buy into the notion
> that it's a good idea to break everything into tiny puny blocks. The
> potential for wasting CPU cycles on this micro-management is rather
> big...

I don't think that a readv/writev of 16 iovecs of 8KB is (noticeably) slower than a read/write of a contiguous 128K; both might well end up in a scatterlist for the kernel/hardware.

> I do find iovecs useful, it's the small blocks that get me into
> skeptic mode...

Small blocks are not for networking, they're for internal use only. And remember that TLS records are 16K max anyway: give 128KB to openssl's SSL_write and it will output 8 chunks of 16KB.

> Kinda related: We also have the support for larger page sizes with
> modern CPUs. Has anyone investigated if it makes sense allocating
> memory pools in chunks that fit those large pages? I think PPC64 have
> 64K pages already.

APR pools are already based on the page size IIRC.

Regards,
Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Why would high-order memory allocations be a problem in userspace code, which is using virtual memory? I thought high-order allocations are a big problem in kernel space, which has to deal with physical pages. But when you write to a socket, doesn't the kernel scatter the userspace buffer into multiple SKBs? SKBs on order-0 pages allocated by the kernel?

On Thu, Feb 23, 2017 at 1:16 PM, Jacob Champion wrote:
> On 02/23/2017 08:34 AM, Yann Ylavic wrote:
>> Actually I'm not very pleased with this solution (or the final one
>> that would make this size open / configurable).
>> The issue is potentially the huge (order big-n) allocations which
>> finally may hurt the system (fragmentation, OOM...).
>
> Power users can break the system, and this is a power tool, right? And
> we have HugeTLB kernels and filesystems to play with, with 2MB and
> bigger pages... Making all these assumptions for 90% of users is
> perfect for the out-of-the-box experience, but hardcoding them so that
> no one can fix broken assumptions seems Bad.
>
> (And don't get me wrong, I think applying vectored I/O to the brigade
> would be a great thing to try out and benchmark. I just think it's a
> long-term and heavily architectural fix, when a short-term change to
> get rid of some #defined constants could have immediate benefits.)
>
>> I've no idea how much it costs to have 8K vs 16K records, though.
>> Maybe in the mod_ssl case we'd want 16K buffers, still reasonable?
>
> We can't/shouldn't hardcode this especially. People who want maximum
> throughput may want nice big records, but IIRC users who want
> progressive rendering need smaller records so that they don't have to
> wait as long for the first decrypted chunk. It needs to be tunable,
> possibly per-location.
>
> --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/22/2017 02:16 PM, Niklas Edmundsson wrote:
> Any joy with something simpler like gprof? (Caveat: haven't used it in
> ages so I don't know if it's even applicable nowadays).

Well, if I had thought about it a little more, I would have remembered that instrumenting profilers don't profile syscalls very well, and they especially mess with I/O timing. Valgrind was completely inconclusive on the read() vs. mmap() front. :(

(...except that it showed that a good 25% of my test server's CPU time was spent inside OpenSSL in a memcpy(). Interesting...)

> So httpd isn't beat by the naive openssl s_server approach at least
> ;-)

I don't think s_server is particularly optimized for performance anyway. Oh, and just to complete my local testing table:

- test server, writing from memory: 1.2 GiB/s
- test server, mmap() from disk: 1.1 GiB/s
- test server, 64K read()s from disk: 1.0 GiB/s
- httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s
- httpd trunk with `EnableMMAP off`: 580 MiB/s
- httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s

My test server's read() implementation is a really naive "block on read, then block on write, repeat" loop, so there's probably some improvement to be had there, but this is enough proof in my mind that there are major gains to be made regardless.

> Going off on a tangent here:
>
> For those of you who actually know how the ssl stuff really works, is
> it possible to get multiple threads involved in doing the encryption,
> or do you need the results from the previous block in order to do the
> next one?

I'm not a cryptographer, but I think how parallelizable it is depends on the ciphersuite in use. Like you say, some ciphersuites require one block to be fed into the next as an input; others don't.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/23/2017 08:34 AM, Yann Ylavic wrote: > Actually I'm not very pleased with this solution (or the final one > that would make this size open / configurable). > The issue is potentially the huge (order big-n) allocations which > finally may hurt the system (fragmentation, OOM...). Power users can break the system, and this is a power tool, right? And we have HugeTLB kernels and filesystems to play with, with 2MB and bigger pages... Making all these assumptions for 90% of users is perfect for the out-of-the-box experience, but hardcoding them so that no one can fix broken assumptions seems Bad. (And don't get me wrong, I think applying vectored I/O to the brigade would be a great thing to try out and benchmark. I just think it's a long-term and heavily architectural fix, when a short-term change to get rid of some #defined constants could have immediate benefits.) I've no idea how much it costs to have 8K vs 16K records, though. Maybe in the mod_ssl case we'd want 16K buffers, still reasonable? We can't/shouldn't hardcode this especially. People who want maximum throughput may want nice big records, but IIRC users who want progressive rendering need smaller records so that they don't have to wait as long for the first decrypted chunk. It needs to be tunable, possibly per-location. --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 23 Feb 2017, Yann Ylavic wrote:
>> Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it
>> just defines a new buffer size for use with the file bucket. It's a
>> little less than 64K, I assume to make room for an allocation header:
>>
>>     #define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64)
>
> Actually I'm not very pleased with this solution (or the final one
> that would make this size open / configurable).
> The issue is potentially the huge (order big-n) allocations which
> finally may hurt the system (fragmentation, OOM...).

Is this a real or theoretical problem?

Our large-file cache module does 128k allocs to get a sane block size when copying files to the cache. The only potential drawback we noticed was httpd processes becoming bloated due to the default MaxMemFree 2048, so we're running with MaxMemFree 256 now. I don't know if things got much better, but it isn't breaking anything either...

Granted, doing alloc/free for all outgoing data means way more alloc/free:s, so we might just miss the issues because cache fills aren't as common.

However, for large file performance I really don't buy into the notion that it's a good idea to break everything into tiny puny blocks. The potential for wasting CPU cycles on this micro-management is rather big... I can see it working for a small-file workload where files aren't much bigger than tens of kB anyway, but not so much for large-file delivery.

A prudent way forward might be to investigate what impact different block sizes have wrt ssl/https first. As networking speeds go up it is kind of expected that block sizes need to go up as well, especially as per-core clock frequency isn't increasing much (it's been at 2-ish GHz base frequency for server CPUs for the last ten years now?) and we're relying more and more on various offload mechanisms in CPUs/NICs etc. to get us from 1 Gbps to 10 Gbps to 100 Gbps ...

I do find iovecs useful, it's the small blocks that get me into skeptic mode...

Kinda related: We also have the support for larger page sizes with modern CPUs. Has anyone investigated if it makes sense allocating memory pools in chunks that fit those large pages?

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      | ni...@acc.umu.se
---
 You need not worry about your future
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 5:34 PM, Yann Ylavic wrote:
> On Thu, Feb 23, 2017 at 4:58 PM, Stefan Eissing wrote:
>>
>>> Am 23.02.2017 um 16:38 schrieb Yann Ylavic:
>>>
>>> On Wed, Feb 22, 2017 at 6:36 PM, Jacob Champion wrote:
>>>> On 02/22/2017 12:00 AM, Stefan Eissing wrote:
>>>>>
>>>>> Just so I do not misunderstand:
>>>>>
>>>>> you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is
>>>>> what you are testing?
>>>>
>>>> Essentially, yes, *and* turn off mmap and sendfile. My hope is to
>>>> disable the mmap-optimization by default while still improving
>>>> overall performance for most users.
>>>>
>>>> Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it
>>>> just defines a new buffer size for use with the file bucket. It's a
>>>> little less than 64K, I assume to make room for an allocation
>>>> header:
>>>>
>>>>     #define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64)
>>>
>>> Actually I'm not very pleased with this solution (or the final one
>>> that would make this size open / configurable).
>>> The issue is potentially the huge (order big-n) allocations which
>>> finally may hurt the system (fragmentation, OOM...).
>>>
>>> So I'm thinking of another way to achieve the same with the current
>>> APR_BUCKET_BUFF_SIZE (2 pages) per alloc.
>>>
>>> The idea is to have a new apr_allocator_allocv() function which
>>> would fill an iovec with what's available in the allocator's
>>> freelist (i.e. spare apr_memnodes) of at least the given min_size
>>> bytes (possibly a max too but I don't see the need for now) and up
>>> to the size of the given iovec.
>>
>> Interesting. Not only for pure files maybe.
>>
>> It would be great if there'd be a SSL_writev()...
>
> Indeed, openssl would fully fill the TLS records.
>
>> but until there is, the TLS case will either turn every iovec into
>> its own TLS record
>
> Yes, but it's more a client issue to work with these records finally
> (because from there all is networking only, and coalescing will happen
> in the core output filter from a socket POV).
>
> Anyway mod_proxy(s) could be the client, so...
>
> I've no idea how much it costs to have 8K vs 16K records, though.
> Maybe in the mod_ssl case we'd want 16K buffers, still reasonable?
>
>> or one needs another copy before that. This last strategy is used by
>> mod_http2. Since there are 9 header bytes per frame, copying data
>> into a right-sized buffer gives better performance. So, it would be
>> nice to read n bytes from a bucket brigade and get iovecs back.
>> Which, as I understand it, you propose?
>
> I didn't think of apr_bucket_readv(), more focused on
> file_bucket_read() to do this internally/transparently, but once file
> buckets can do that I think we can generalize the concept, at worst
> filling only iovec[0] when it's ENOTIMPL and/or makes no sense...
>
> That'd help the mod_ssl case with something like
> apr_bucket_readv(min_size=16K), I'll try to think of it once/if I can
> have a simpler file_bucket_read() working ;)

Hm no, this needs to happen on the producer side, not at the final apr_bucket_read(). So for the mod_ssl case I think we could have a simple (new) apr_bucket_alloc_set_size(16K) in the first place, if it helps more than hurts.

As for your question about "iovecs back" (which I finally didn't answer), it all happens at the bucket alloc level, when buckets are deleted.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 23, 2017 at 4:58 PM, Stefan Eissing wrote: > >> Am 23.02.2017 um 16:38 schrieb Yann Ylavic : >> >> On Wed, Feb 22, 2017 at 6:36 PM, Jacob Champion wrote: >>> On 02/22/2017 12:00 AM, Stefan Eissing wrote: Just so I do not misunderstand: you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you are testing? >>> >>> >>> Essentially, yes, *and* turn off mmap and sendfile. My hope is to disable >>> the mmap-optimization by default while still improving overall performance >>> for most users. >>> >>> Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it just >>> defines a new buffer size for use with the file bucket. It's a little less >>> than 64K, I assume to make room for an allocation header: >>> >>>#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) >> >> Actually I'm not very pleased with this solution (or the final one >> that would make this size open / configurable). >> The issue is potentially the huge (order big-n) allocations which >> finally may hurt the system (fragmentation, OOM...). >> >> So I'm thinking of another way to achieve the same with the current >> APR_BUCKET_BUFF_SIZE (2 pages) per alloc. >> >> The idea is to have a new apr_allocator_allocv() function which would >> fill an iovec with what's available in the allocator's freelist (i.e. >> spare apr_memnodes) of at least the given min_size bytes (possibly a >> max too but I don't see the need for now) and up to the size of the >> given iovec. > > Interesting. Not only for pure files maybe. > > It would be great if there'd be a SSL_writev()... Indeed, openssl would fully fill the TLS records. > but until there is, the TLS case will either turn every iovec into > its own TLS record Yes, but it's more a client issue to work with these records finally (because from there all is networking only, and coalescing will happen in the core output filter from a socket POV). Anyway mod_proxy(s) could be the client, so... 
I've no idea how much it costs to have 8K vs 16K records, though. Maybe in the mod_ssl case we'd want 16K buffers, still reasonable? > or one needs another copy before that. This last strategy is used by > mod_http2. Since there are 9 header bytes per frame, copying data > into a right-sized buffer gives better performance. So, it would be > nice to read n bytes from a bucket brigade and get iovecs back. > Which, as I understand it, you propose? I didn't think of apr_bucket_readv(), more focused on file_bucket_read() to do this internally/transparently, but once file buckets can do that I think we can generalize the concept, at worst filling only iovec[0] when it's ENOTIMPL and/or makes no sense... That'd help the mod_ssl case with something like apr_bucket_readv(min_size=16K), I'll try to think of it once/if I can get a simpler file_bucket_read()-only version working ;)
Re: httpd 2.4.25, mpm_event, ssl: segfaults
> Am 23.02.2017 um 16:38 schrieb Yann Ylavic : > > On Wed, Feb 22, 2017 at 6:36 PM, Jacob Champion wrote: >> On 02/22/2017 12:00 AM, Stefan Eissing wrote: >>> >>> Just so I do not misunderstand: >>> >>> you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you >>> are testing? >> >> >> Essentially, yes, *and* turn off mmap and sendfile. My hope is to disable >> the mmap-optimization by default while still improving overall performance >> for most users. >> >> Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it just >> defines a new buffer size for use with the file bucket. It's a little less >> than 64K, I assume to make room for an allocation header: >> >>#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) > > Actually I'm not very pleased with this solution (or the final one > that would make this size open / configurable). > The issue is potentially the huge (order big-n) allocations which > finally may hurt the system (fragmentation, OOM...). > > So I'm thinking of another way to achieve the same with the current > APR_BUCKET_BUFF_SIZE (2 pages) per alloc. > > The idea is to have a new apr_allocator_allocv() function which would > fill an iovec with what's available in the allocator's freelist (i.e. > spare apr_memnodes) of at least the given min_size bytes (possibly a > max too but I don't see the need for now) and up to the size of the > given iovec. Interesting. Not only for pure files maybe. It would be great if there'd be a SSL_writev()...but until there is, the TLS case will either turn every iovec into its own TLS record or one needs another copy before that. This last strategy is used by mod_http2. Since there are 9 header bytes per frame, copying data into a right sized buffer gives better performance. So, it would be nice to read n bytes from a bucket brigade and get iovecs back. Which, as I understand it, you propose? 
> This function could be the base of a new apr_bucket_allocv() (and > possibly apr_p[c]allocv(), though out of scope here) which in turn > could be used by file_bucket_read() to get an iovec of available > buffers. > This iovec could then be passed to (new still) apr_file_readv() based > on the readv() syscall, which would allow to read much more data in > one go. > > With this the scheme we'd have iovec from end to end, well, sort of > since mod_ssl would be break the chain but still produce transient > buckets on output which anyway will end up in the core_output_filter's > brigade of aside heap buckets, for apr_socket_sendv() to finally > writev() them. > > We'd also have more recycled heap buckets (hence memnodes in the > allocator) as the core_output_filter retains buckets, all with > APR_BUCKET_BUFF_SIZE, up to THRESHOLD_MAX_BUFFER which, if > configurable and along with MaxMemFree, would be the real limiter of > recycling. > So it's also adaptative. > > Actually it looks like what we need, but I'd like to have feedbacks > before I go further the prototype I have so far (quite straightforward > apr_allocator changes...). > > Thoughts? > > > Regards, > Yann. Stefan Eissing bytes GmbH Hafenstrasse 16 48155 Münster www.greenbytes.de
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, Feb 22, 2017 at 6:36 PM, Jacob Champion wrote: > On 02/22/2017 12:00 AM, Stefan Eissing wrote: >> >> Just so I do not misunderstand: >> >> you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you >> are testing? > > > Essentially, yes, *and* turn off mmap and sendfile. My hope is to disable > the mmap-optimization by default while still improving overall performance > for most users. > > Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it just > defines a new buffer size for use with the file bucket. It's a little less > than 64K, I assume to make room for an allocation header: > > #define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) Actually I'm not very pleased with this solution (or the final one that would make this size open / configurable). The issue is potentially the huge (order big-n) allocations which finally may hurt the system (fragmentation, OOM...). So I'm thinking of another way to achieve the same with the current APR_BUCKET_BUFF_SIZE (2 pages) per alloc. The idea is to have a new apr_allocator_allocv() function which would fill an iovec with what's available in the allocator's freelist (i.e. spare apr_memnodes) of at least the given min_size bytes (possibly a max too but I don't see the need for now) and up to the size of the given iovec. This function could be the base of a new apr_bucket_allocv() (and possibly apr_p[c]allocv(), though out of scope here) which in turn could be used by file_bucket_read() to get an iovec of available buffers. This iovec could then be passed to a (new still) apr_file_readv() based on the readv() syscall, which would allow reading much more data in one go. With this scheme we'd have iovec from end to end, well, sort of, since mod_ssl would break the chain but still produce transient buckets on output which anyway will end up in the core_output_filter's brigade of set-aside heap buckets, for apr_socket_sendv() to finally writev() them.
We'd also have more recycled heap buckets (hence memnodes in the allocator) as the core_output_filter retains buckets, all with APR_BUCKET_BUFF_SIZE, up to THRESHOLD_MAX_BUFFER which, if configurable and along with MaxMemFree, would be the real limiter of recycling. So it's also adaptive. Actually it looks like what we need, but I'd like to have feedback before I go further than the prototype I have so far (quite straightforward apr_allocator changes...). Thoughts? Regards, Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, 22 Feb 2017, Jacob Champion wrote: To make results less confusing, any specific patches/branch I should test? My baseline is httpd-2.4.25 + httpd-2.4.25-deps --with-included-apr FWIW. 2.4.25 is just fine. We'll have to make sure there's nothing substantially different about it performance-wise before we backport patches anyway, so it'd be good to start testing it now. OK. - The OpenSSL test server, writing from memory: 1.2 GiB/s - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s - httpd trunk with 'EnableMMAP off': 580 MiB/s - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s At those speeds your results might be skewed by the latency of processing 10 MiB GET:s. Maybe, but keep in mind I care more about the difference between the numbers than the absolute throughput ceiling here. (In any case, I don't see significantly different numbers between 10 MiB and 1 GiB files. Remember, I'm testing via loopback.) Ah, right. Discard the results from the first warm-up access and your results delivering from memory or disk (cache) shouldn't differ. Ah, but they *do*, as Yann pointed out earlier. We can't just deliver the disk cache to OpenSSL for encryption; it has to be copied into some addressable buffer somewhere. That seems to be a major reason for the mmap() advantage, compared to a naive read() solution that just reads into a small buffer over and over again. (I am trying to set up Valgrind to confirm where the test server is spending most of its time, but it doesn't care for the large in-memory static buffer, or for OpenSSL's compressed debugging symbols, and crashes. :( ) Any joy with something simpler like gprof? (Caveat: haven't used it in ages to I don't know if its even applicable nowadays). Numbers on the "memcopy penalty" would indeed be interesting, especially any variation when the block size differs. As I said, our live server does 600 MB/s aes-128-gcm and can deliver 300 MB/s https without mmap. 
That's only a factor 2 difference between aes-128-gcm speed and delivered speed. Your results above are almost a factor 4 off, so something's fishy :-) Well, I can only report my methodology and numbers -- whether the numbers are actually meaningful has yet to be determined. ;D More testers are welcome! :-) I did some repeated tests and my initial results were actually a bit on the low side: Server CPU is an Intel E5606 (1st gen aes offload), openssl speed -evp says: The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes aes-128-gcm 208536.05k 452980.05k 567523.33k 607578.11k 619192.32k Single-stream https over a 10 Gbps link with 3ms RTT (useful routing SNAFU: when talking to stuff in the neighboring building, traffic takes the "shortcut" through a town 300 km away ;). Using wget -O /dev/null as a client, on a host with Intel E5-2630 CPU (960-ish MB/s aes-128-gcm on 8k blocks). http (sendfile): 1.07 GB/s (repeatedly) httpd (no mmap): 370-380 MB/s openssl s_server: 330-340 MB/s So httpd isn't beaten by the naive openssl s_server approach at least ;-) Going off on a tangent here: For those of you who actually know how the ssl stuff really works, is it possible to get multiple threads involved in doing the encryption, or do you need the results from the previous block in order to do the next one? Yes, I know this wouldn't make sense for most real setups but for a student computer club with old hardware and good connectivity this is a real problem ;-) On the other hand, you would need it to do 100 Gbps single-stream https even on latest&greatest CPUs 8-) /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se --- There may be a correlation between humor and sex. - Data =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, Feb 22, 2017 at 2:42 PM, Jacob Champion wrote: > Ah, but they *do*, as Yann pointed out earlier. We can't just deliver the > disk cache to OpenSSL for encryption; it has to be copied into some > addressable buffer somewhere. That seems to be a major reason for the > mmap() advantage, compared to a naive read() solution that just reads into > a small buffer over and over again. > IOW: read():Three copies: copy from filesystem cache to httpd read() buffer to encrypted-data buffer to kernel socket buffer. mmap(): Two copies: filesystem page already mapped into httpd, so just copy from filesystem (cached) page to encrypted-data buffer to kernel socket buffer.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/22/2017 10:34 AM, Niklas Edmundsson wrote: To make results less confusing, any specific patches/branch I should test? My baseline is httpd-2.4.25 + httpd-2.4.25-deps --with-included-apr FWIW. 2.4.25 is just fine. We'll have to make sure there's nothing substantially different about it performance-wise before we backport patches anyway, so it'd be good to start testing it now. - The OpenSSL test server, writing from memory: 1.2 GiB/s - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s - httpd trunk with 'EnableMMAP off': 580 MiB/s - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s At those speeds your results might be skewed by the latency of processing 10 MiB GET:s. Maybe, but keep in mind I care more about the difference between the numbers than the absolute throughput ceiling here. (In any case, I don't see significantly different numbers between 10 MiB and 1 GiB files. Remember, I'm testing via loopback.) Discard the results from the first warm-up access and your results delivering from memory or disk (cache) shouldn't differ. Ah, but they *do*, as Yann pointed out earlier. We can't just deliver the disk cache to OpenSSL for encryption; it has to be copied into some addressable buffer somewhere. That seems to be a major reason for the mmap() advantage, compared to a naive read() solution that just reads into a small buffer over and over again. (I am trying to set up Valgrind to confirm where the test server is spending most of its time, but it doesn't care for the large in-memory static buffer, or for OpenSSL's compressed debugging symbols, and crashes. :( ) As I said, our live server does 600 MB/s aes-128-gcm and can deliver 300 MB/s https without mmap. That's only a factor 2 difference between aes-128-gcm speed and delivered speed. Your results above are almost a factor 4 off, so something's fishy :-) Well, I can only report my methodology and numbers -- whether the numbers are actually meaningful has yet to be determined. 
;D More testers are welcome! --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Tue, 21 Feb 2017, Jacob Champion wrote: Is there interest in more real-life numbers with increasing FILE_BUCKET_BUFF_SIZE or are you already on it? Yes please! My laptop probably isn't representative of most servers; it can do nearly 3 GB/s AES-128-GCM. The more machines we test, the better. To make results less confusing, any specific patches/branch I should test? My baseline is httpd-2.4.25 + httpd-2.4.25-deps --with-included-apr FWIW. I have an older server that can do 600 MB/s aes-128-gcm per core, but is only able to deliver 300 MB/s https single-stream via its 10 GBps interface. My guess is too small blocks causing CPU cycles being spent not delivering data... Right. To give you an idea of where I am in testing at the moment: I have a basic test server written with OpenSSL. It sends a 10 MiB response body from memory (*not* from disk) for every GET it receives. I also have a copy of httpd trunk that's serving an actual 10 MiB file from disk. My test call is just `h2load --h1 -n 100 https://localhost/`, which should send 100 requests over a single TLS connection. The ciphersuite selected for all test cases is ECDHE-RSA-AES256-GCM-SHA384. For reference, I can do in-memory AES-256-GCM at 2.1 GiB/s. - The OpenSSL test server, writing from memory: 1.2 GiB/s - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s - httpd trunk with 'EnableMMAP off': 580 MiB/s - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s At those speeds your results might be skewed by the latency of processing 10 MiB GET:s. I'd go for multiple GiB files (whatever you can cache in RAM) and deliver files from disk. Discard the results from the first warm-up access and your results delivering from memory or disk (cache) shouldn't differ. So just bumping the block size gets me almost to the speed of mmap, without the downside of a potential SIGBUS. Meanwhile, the OpenSSL test server seems to suggest a performance ceiling about 50% above where we are now. 
I'm guessing that if you redo the tests with a bigger file you should see even more potential. As I said, our live server does 600 MB/s aes-128-gcm and can deliver 300 MB/s https without mmap. That's only a factor 2 difference between aes-128-gcm speed and delivered speed. Your results above are almost a factor 4 off, so something's fishy :-) Even with the test server serving responses from memory, that seems like plenty of room to grow. I'm working on a version of the test server that serves files from disk so that I'm not comparing apples to oranges, but my prior testing leads me to believe that disk access is not the limiting factor on my machine. Hmm. Perhaps I should just do a quick test with openssl s_server, just to see what numbers I get... /Nikke
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/22/2017 12:00 AM, Stefan Eissing wrote: Just so I do not misunderstand: you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you are testing? Essentially, yes, *and* turn off mmap and sendfile. My hope is to disable the mmap-optimization by default while still improving overall performance for most users. Technically, Yann's patch doesn't redefine APR_BUCKET_BUFF_SIZE, it just defines a new buffer size for use with the file bucket. It's a little less than 64K, I assume to make room for an allocation header: #define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
> Am 22.02.2017 um 00:14 schrieb Jacob Champion : > > On 02/19/2017 01:37 PM, Niklas Edmundsson wrote: >> On Thu, 16 Feb 2017, Jacob Champion wrote: >>> So, I had already hacked my O_DIRECT bucket case to just be a copy of >>> APR's file bucket, minus the mmap() logic. I tried making this change >>> on top of it... >>> >>> ...and holy crap, for regular HTTP it's *faster* than our current >>> mmap() implementation. HTTPS is still slower than with mmap, but >>> faster than it was without the change. (And the HTTPS performance has >>> been really variable.) >> >> I'm guessing that this is with a low-latency storage device, say a >> local SSD with low load? O_DIRECT on anything with latency would require >> way bigger blocks to hide the latency... You really want the OS >> readahead in the generic case, simply because it performs reasonably >> well in most cases. > > I described my setup really poorly. I've ditched O_DIRECT entirely. The > bucket type I created to use O_DIRECT has been repurposed to just be a copy > of the APR file bucket, with the mmap optimization removed entirely, and with > the new 64K bucket buffer limit. This new "no-mmap-plus-64K-block" file > bucket type performs better on my machine than the old "mmap-enabled" file > bucket type. > > (But yes, my testing is all local, with a nice SSD. Hopefully that gets a > little closer to isolating the CPU parts of this equation, which is the thing > we have the most influence over.) > >> I think the big win here is to use appropriate block sizes, you do more >> useful work and less housekeeping. I have no clue on when the block size >> choices were made, but it's likely that it was a while ago. Assuming >> that things will continue to evolve, I'd say making hard-coded numbers >> tunable is a Good Thing to do. > > Agreed. > >> Is there interest in more real-life numbers with increasing >> FILE_BUCKET_BUFF_SIZE or are you already on it? > > Yes please! 
My laptop probably isn't representative of most servers; it can > do nearly 3 GB/s AES-128-GCM. The more machines we test, the better. > >> I have an older server >> that can do 600 MB/s aes-128-gcm per core, but is only able to deliver >> 300 MB/s https single-stream via its 10 GBps interface. My guess is too >> small blocks causing CPU cycles being spent not delivering data... > > Right. To give you an idea of where I am in testing at the moment: I have a > basic test server written with OpenSSL. It sends a 10 MiB response body from > memory (*not* from disk) for every GET it receives. I also have a copy of > httpd trunk that's serving an actual 10 MiB file from disk. > > My test call is just `h2load --h1 -n 100 https://localhost/`, which should > send 100 requests over a single TLS connection. The ciphersuite selected for > all test cases is ECDHE-RSA-AES256-GCM-SHA384. For reference, I can do > in-memory AES-256-GCM at 2.1 GiB/s. > > - The OpenSSL test server, writing from memory: 1.2 GiB/s > - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s > - httpd trunk with 'EnableMMAP off': 580 MiB/s > - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s > > So just bumping the block size gets me almost to the speed of mmap, without > the downside of a potential SIGBUS. Meanwhile, the OpenSSL test server seems > to suggest a performance ceiling about 50% above where we are now. > > Even with the test server serving responses from memory, that seems like > plenty of room to grow. I'm working on a version of the test server that > serves files from disk so that I'm not comparing apples to oranges, but my > prior testing leads me to believe that disk access is not the limiting factor > on my machine. > > --Jacob Just so I do not misunderstand: you increased BUCKET_BUFF_SIZE in APR from 8000 to 64K? That is what you are testing?
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/19/2017 01:37 PM, Niklas Edmundsson wrote: On Thu, 16 Feb 2017, Jacob Champion wrote: So, I had already hacked my O_DIRECT bucket case to just be a copy of APR's file bucket, minus the mmap() logic. I tried making this change on top of it... ...and holy crap, for regular HTTP it's *faster* than our current mmap() implementation. HTTPS is still slower than with mmap, but faster than it was without the change. (And the HTTPS performance has been really variable.) I'm guessing that this is with a low-latency storage device, say a local SSD with low load? O_DIRECT on anything with latency would require way bigger blocks to hide the latency... You really want the OS readahead in the generic case, simply because it performs reasonably well in most cases. I described my setup really poorly. I've ditched O_DIRECT entirely. The bucket type I created to use O_DIRECT has been repurposed to just be a copy of the APR file bucket, with the mmap optimization removed entirely, and with the new 64K bucket buffer limit. This new "no-mmap-plus-64K-block" file bucket type performs better on my machine than the old "mmap-enabled" file bucket type. (But yes, my testing is all local, with a nice SSD. Hopefully that gets a little closer to isolating the CPU parts of this equation, which is the thing we have the most influence over.) I think the big win here is to use appropriate block sizes, you do more useful work and less housekeeping. I have no clue on when the block size choices were made, but it's likely that it was a while ago. Assuming that things will continue to evolve, I'd say making hard-coded numbers tunable is a Good Thing to do. Agreed. Is there interest in more real-life numbers with increasing FILE_BUCKET_BUFF_SIZE or are you already on it? Yes please! My laptop probably isn't representative of most servers; it can do nearly 3 GB/s AES-128-GCM. The more machines we test, the better. 
I have an older server that can do 600 MB/s aes-128-gcm per core, but is only able to deliver 300 MB/s https single-stream via its 10 GBps interface. My guess is too small blocks causing CPU cycles being spent not delivering data... Right. To give you an idea of where I am in testing at the moment: I have a basic test server written with OpenSSL. It sends a 10 MiB response body from memory (*not* from disk) for every GET it receives. I also have a copy of httpd trunk that's serving an actual 10 MiB file from disk. My test call is just `h2load --h1 -n 100 https://localhost/`, which should send 100 requests over a single TLS connection. The ciphersuite selected for all test cases is ECDHE-RSA-AES256-GCM-SHA384. For reference, I can do in-memory AES-256-GCM at 2.1 GiB/s. - The OpenSSL test server, writing from memory: 1.2 GiB/s - httpd trunk with `EnableMMAP on` and serving from disk: 850 MiB/s - httpd trunk with 'EnableMMAP off': 580 MiB/s - httpd trunk with my no-mmap-64K-block file bucket: 810 MiB/s So just bumping the block size gets me almost to the speed of mmap, without the downside of a potential SIGBUS. Meanwhile, the OpenSSL test server seems to suggest a performance ceiling about 50% above where we are now. Even with the test server serving responses from memory, that seems like plenty of room to grow. I'm working on a version of the test server that serves files from disk so that I'm not comparing apples to oranges, but my prior testing leads me to believe that disk access is not the limiting factor on my machine. --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Mon, 20 Feb 2017, Yann Ylavic wrote: On Sun, Feb 19, 2017 at 10:11 PM, Niklas Edmundsson wrote: On Thu, 16 Feb 2017, Yann Ylavic wrote: Here I am, localhost still, 21GB file (client wget -qO- [url] &>/dev/null). Output attached. Looks good with nice big writes if I interpret it correctly. Is this without patching anything, meaning that AP_MAX_SENDFILE has no effect, or did you fix things? That's with unpatched 2.4.x. Actually AP_MAX_SENDFILE seems to be used nowhere in 2.4.x code (but in mod_file_cache, which I know nothing about, and is not widely used IMHO). Goodie. Just a confusing leftover define and a confused Nikke then ;) Maybe you'd want to switch from 2.2.x? ;) ftp.acc.umu.se is running 2.4.25, so we're pretty good on that front. It would however help if I learn not to mistype/tab-expand myself into old directory trees and grep around in the historical archives when trying to look into the current state of things ;-) So, to conclude: All is good on the http/sendfile front, nothing to see here, move along :-) /Nikke - who apparently had Monday every day last week...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Sun, Feb 19, 2017 at 10:11 PM, Niklas Edmundsson wrote: > On Thu, 16 Feb 2017, Yann Ylavic wrote: >> >> Here I am, localhost still, 21GB file (client wget -qO- [url] >> &>/dev/null). >> Output attached. > > Looks good with nice big writes if I interpret it correctly. > > Is this without patching anything, meaning that AP_MAX_SENDFILE has no > effect, or did you fix things? That's with unpatched 2.4.x. Actually AP_MAX_SENDFILE seems to be used nowhere in 2.4.x code (but in mod_file_cache, which I know nothing about, and is not widely used IMHO). Maybe you'd want to switch from 2.2.x? ;) Regards, Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 16 Feb 2017, Jacob Champion wrote: On 02/16/2017 02:49 AM, Yann Ylavic wrote: +#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) /* > APR_BUCKET_BUFF_SIZE */ So, I had already hacked my O_DIRECT bucket case to just be a copy of APR's file bucket, minus the mmap() logic. I tried making this change on top of it... ...and holy crap, for regular HTTP it's *faster* than our current mmap() implementation. HTTPS is still slower than with mmap, but faster than it was without the change. (And the HTTPS performance has been really variable.) I'm guessing that this is with a low-latency storage device, say a local SSD with low load? O_DIRECT on anything with latency would require way bigger blocks to hide the latency... You really want the OS readahead in the generic case, simply because it performs reasonably well in most cases. Yes, you can avoid a memcpy using O_DIRECT, but compared to the SSL stuff a memcpy is rather cheap... Can you confirm that you see a major performance improvement with the new 64K file buffer? I'm pretty skeptical of my own results at this point... but if you see it too, I think we need to make *all* these hard-coded numbers tunable in the config. I think the big win here is to use appropriate block sizes, you do more useful work and less housekeeping. I have no clue on when the block size choices were made, but it's likely that it was a while ago. Assuming that things will continue to evolve, I'd say making hard-coded numbers tunable is a Good Thing to do. Is there interest in more real-life numbers with increasing FILE_BUCKET_BUFF_SIZE or are you already on it? I have an older server that can do 600 MB/s aes-128-gcm per core, but is only able to deliver 300 MB/s https single-stream via its 10 Gbps interface. My guess is too small blocks causing CPU cycles being spent not delivering data...
/Nikke
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 16 Feb 2017, Yann Ylavic wrote:

>>> Outputs (and the patch to produce them) attached.
>>>
>>> TL;DR:
>>> - http + EnableMMap => single write
>>> - http + !EnableMMap + EnableSendfile => single write
>>> - http + !EnableMMap + !EnableSendfile => 125KB writes
>>> - https + EnableMMap => 16KB writes
>>> - https + !EnableMMap => 8KB writes
>>
>> If you try larger file sizes you should start seeing things being
>> broken into chunks even for mmap/sendfile. For example we have
>> #define AP_MAX_SENDFILE 16777216 /* 2^24 */
>> which is unnecessarily low IMHO.
>
> Here I am, localhost still, 21GB file (client wget -qO- [url]
> &>/dev/null). Output attached.

Looks good with nice big writes if I interpret it correctly.

Is this without patching anything, meaning that AP_MAX_SENDFILE has no
effect, or did you fix things?

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      | ni...@acc.umu.se
---
 Fortunately... no one's in control.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Feb 17, 2017 2:52 PM, "William A Rowe Jr" wrote:

> On Feb 17, 2017 1:02 PM, "Jacob Champion" wrote:
>> `EnableMMAP on` appears to boost performance for static files, yes,
>> but is that because of mmap() itself, or because our bucket brigades
>> configure themselves more optimally in the mmap() code path? Yann's
>> research is starting to point towards the latter IMO.
>
> This may be as simple as the page manager caching and reusing the
> un-cleared page mapping on subsequent hits. You would need to
> overwhelmingly vary the page content served to test this theory.
>
> But the same caching wins for libld[l] ... which doesn't segv during
> OS updates. Probably due to copy-on-write mechanics.

(With traditional read and sendfile, you still have copy-once-on-read -
even if that file is sitting in FS cache.)
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Feb 17, 2017 1:02 PM, "Jacob Champion" wrote:

> `EnableMMAP on` appears to boost performance for static files, yes,
> but is that because of mmap() itself, or because our bucket brigades
> configure themselves more optimally in the mmap() code path? Yann's
> research is starting to point towards the latter IMO.

This may be as simple as the page manager caching and reusing the
un-cleared page mapping on subsequent hits. You would need to
overwhelmingly vary the page content served to test this theory.

But the same caching wins for libld[l] ... which doesn't segv during OS
updates. Probably due to copy-on-write mechanics.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/17/2017 07:04 AM, Daniel Lescohier wrote:
> Is the high-level issue that: for serving static content over HTTP,
> you can use sendfile() from the OS filesystem cache, avoiding extra
> userspace copying; but if it's SSL, or any other dynamic filtering of
> content, you have to do extra work in userspace?

Yes -- there are a bunch of potential high-level issues, but the one
you've highlighted is the reason that I wouldn't expect our HTTPS
implementation to ever get as fast as HTTP for static responses. At
least not given the current architecture.

(There are potential kernel-level encryption APIs that are popping up; I
keep hoping someone will start playing around with AF_ALG sockets. But
those aren't magic, either; we still have to layer TLS around the
encrypted data.)

That said, that's not what I'm trying to focus on with this thread. I
have a feeling our performance is being artificially limited.
`EnableMMAP on` appears to boost performance for static files, yes, but
is that because of mmap() itself, or because our bucket brigades
configure themselves more optimally in the mmap() code path? Yann's
research is starting to point towards the latter IMO.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Is the high-level issue that: for serving static content over HTTP, you can use sendfile() from the OS filesystem cache, avoiding extra userspace copying; but if it's SSL, or any other dynamic filtering of content, you have to do extra work in userspace? On Thu, Feb 16, 2017 at 6:01 PM, Yann Ylavic wrote: > On Thu, Feb 16, 2017 at 10:51 PM, Jacob Champion > wrote: > > On 02/16/2017 02:49 AM, Yann Ylavic wrote: > >> > >> +#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) /* > > APR_BUCKET_BUFF_SIZE > >> */ > > > > > > So, I had already hacked my O_DIRECT bucket case to just be a copy of > APR's > > file bucket, minus the mmap() logic. I tried making this change on top of > > it... > > > > ...and holy crap, for regular HTTP it's *faster* than our current mmap() > > implementation. HTTPS is still slower than with mmap, but faster than it > was > > without the change. (And the HTTPS performance has been really variable.) > > > > Can you confirm that you see a major performance improvement with the > with > > the new 64K file buffer? > > I can't test speed for now (stick with my laptop/localhost, which > won't be relevant enough I guess). > > > I'm pretty skeptical of my own results at this > > point... but if you see it too, I think we need to make *all* these > > hard-coded numbers tunable in the config. > > We could also improve the apr_bucket_alloc()ator to recycle more > order-n allocations possibilities (saving as much > {apr_allocator_,m}alloc() calls), along with configurable/higher > orders in httpd that'd be great I think. > > I can try this patch... >
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 10:51 PM, Jacob Champion wrote: > On 02/16/2017 02:49 AM, Yann Ylavic wrote: >> >> +#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) /* > APR_BUCKET_BUFF_SIZE >> */ > > > So, I had already hacked my O_DIRECT bucket case to just be a copy of APR's > file bucket, minus the mmap() logic. I tried making this change on top of > it... > > ...and holy crap, for regular HTTP it's *faster* than our current mmap() > implementation. HTTPS is still slower than with mmap, but faster than it was > without the change. (And the HTTPS performance has been really variable.) > > Can you confirm that you see a major performance improvement with the with > the new 64K file buffer? I can't test speed for now (stick with my laptop/localhost, which won't be relevant enough I guess). > I'm pretty skeptical of my own results at this > point... but if you see it too, I think we need to make *all* these > hard-coded numbers tunable in the config. We could also improve the apr_bucket_alloc()ator to recycle more order-n allocations possibilities (saving as much {apr_allocator_,m}alloc() calls), along with configurable/higher orders in httpd that'd be great I think. I can try this patch...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/16/2017 02:49 AM, Yann Ylavic wrote:
> +#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) /* > APR_BUCKET_BUFF_SIZE */

So, I had already hacked my O_DIRECT bucket case to just be a copy of
APR's file bucket, minus the mmap() logic. I tried making this change on
top of it...

...and holy crap, for regular HTTP it's *faster* than our current mmap()
implementation. HTTPS is still slower than with mmap, but faster than it
was without the change. (And the HTTPS performance has been really
variable.)

Can you confirm that you see a major performance improvement with the
new 64K file buffer? I'm pretty skeptical of my own results at this
point... but if you see it too, I think we need to make *all* these
hard-coded numbers tunable in the config.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/16/2017 02:48 AM, Niklas Edmundsson wrote:
> While I applaud the efforts to get https to behave performance-wise I
> would hate for http to be left out of being able to do top-notch on
> latest&greatest networking :-)

My intent in focusing there was to discover why disabling mmap() seemed
to be hitting our HTTPS implementation so badly compared to HTTP, in the
hopes that we could move the crashy mmap() optimization to a non-default
setting. (Reducing HTTPS performance by 30+% out of the box seems like a
non-starter to me.) But agreed that HTTP should not be left out.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/16/2017 03:41 AM, Yann Ylavic wrote:
> I can't reproduce it anymore, somehow I failed with my restarts
> between EnableMMap on=>off. Sorry for the noise...

This is suspiciously similar to what I've been fighting the last three
days. It's still entirely possible that you and I both messed up the
restarts independently... but if anyone else thinks to themselves "huh,
that's funny, didn't I restart?" please speak up. :)

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 11:41 AM, Plüm, Rüdiger, Vodafone Group wrote:
>
>> -----Original Message-----
>> From: Yann Ylavic [mailto:ylavic@gmail.com]
>> Sent: Thursday, 16 February 2017 11:35
>> To: httpd-dev
>> Subject: Re: httpd 2.4.25, mpm_event, ssl: segfaults
>>
>> On Thu, Feb 16, 2017 at 11:20 AM, Plüm, Rüdiger, Vodafone Group
>> wrote:
>> >
>> >> Please note that "EnableMMap on" avoids EnableSendfile (i.e.
>> >> "EnableMMap on" => "EnableSendfile off")
>> >
>> > Just for clarification: If you placed EnableMMap on in your test
>> > configuration you also put EnableSendfile off in your configuration,
>> > correct? Just putting in EnableMMap on does not automatically cause
>> > EnableSendfile to be set to off. I know that at least on trunk
>> > EnableSendfile on is no longer the default.
>>
>> If I "EnableMMap on", core_output_filter() will never use sendfile()
>> (whatever EnableSendfile is).
>>
>> I can try to figure out why (that's a really-all build/conf, so maybe
>> some module's filter is apr_bucket_read()ing in the chain unless
>> EnableMMap, unlikely though)...
>
> And this is what I don't understand and cannot read immediately from
> the code.

Weird. I can't reproduce it anymore, somehow I failed with my restarts
between EnableMMap on=>off. Sorry for the noise...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 11:48 AM, Niklas Edmundsson wrote: > On Thu, 16 Feb 2017, Yann Ylavic wrote: > >> Here are some SSL/core_write outputs (sizes) for me, with 2.4.x. >> This is with a GET for a 2MB file, on localhost... >> >> Please note that "EnableMMap on" avoids EnableSendfile (i.e. >> "EnableMMap on" => "EnableSendfile off"), which is relevant only in >> the http (non-ssl) case anyway. >> >> Outputs (and the patch to produce them) attached. >> >> TL;DR: >> - http + EnableMMap=> single write >> - http + !EnableMMap + EnableSendfile => single write >> - http + !EnableMMap + !EnableSendfile => 125KB writes >> - https + EnableMMap=> 16KB writes >> - https + !EnableMMap=> 8KB writes > > > If you try larger filesizes you should start seeing things being broken into > chunks even for mmap/sendfile. For example we have > #define AP_MAX_SENDFILE 16777216 /* 2^24 */ > which is unneccessarily low IMHO. Here I am, localhost still, 21GB file (client wget -qO- [url] &>/dev/null). Output attached. 
http + !EnableMMap + EnableSendfile (21GB): [Thu Feb 16 12:08:43.809832 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] writev_nonblocking(): 291/291 bytes [Thu Feb 16 12:08:43.812446 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 11219190/21478375424 bytes [Thu Feb 16 12:08:43.812843 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 2095456/21467156234 bytes [Thu Feb 16 12:08:43.812893 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 130966/21465060778 bytes [Thu Feb 16 12:08:43.812908 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/21464929812 bytes [Thu Feb 16 12:08:43.814166 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1964490/21464929812 bytes [Thu Feb 16 12:08:43.814205 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/21462965322 bytes [Thu Feb 16 12:08:43.815408 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1833524/21462965322 bytes [Thu Feb 16 12:08:43.815482 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 196449/21461131798 bytes [Thu Feb 16 12:08:43.815499 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/21460935349 bytes [Thu Feb 16 12:08:43.816843 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 2160939/21460935349 bytes [Thu Feb 16 12:08:43.816881 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/21458774410 bytes [Thu Feb 16 12:08:43.818319 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 
2160939/21458774410 bytes [...] [Thu Feb 16 12:08:44.538137 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1768041/18805599699 bytes [Thu Feb 16 12:08:44.538145 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18803831658 bytes [Thu Feb 16 12:08:44.538601 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1768041/18803831658 bytes [Thu Feb 16 12:08:44.538609 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18802063617 bytes [Thu Feb 16 12:08:44.539060 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1833524/18802063617 bytes [Thu Feb 16 12:08:44.539069 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18800230093 bytes [Thu Feb 16 12:08:44.539593 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1964490/18800230093 bytes [Thu Feb 16 12:08:44.539601 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18798265603 bytes [Thu Feb 16 12:08:44.540136 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1964490/18798265603 bytes [Thu Feb 16 12:08:44.540156 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18796301113 bytes [Thu Feb 16 12:08:44.540632 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 1964490/18796301113 bytes [Thu Feb 16 12:08:44.540639 2017] [core:notice] [pid 26960:tid 139674034870016] [client 127.0.0.1:56722] sendfile_nonblocking(): 0/18794336623 bytes [Thu Feb 16 12:08:44.541157 2017] [core:notice
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 11:48 AM, Niklas Edmundsson wrote: > On Thu, 16 Feb 2017, Yann Ylavic wrote: > >> Here are some SSL/core_write outputs (sizes) for me, with 2.4.x. >> This is with a GET for a 2MB file, on localhost... >> >> Please note that "EnableMMap on" avoids EnableSendfile (i.e. >> "EnableMMap on" => "EnableSendfile off"), which is relevant only in >> the http (non-ssl) case anyway. >> >> Outputs (and the patch to produce them) attached. >> >> TL;DR: >> - http + EnableMMap=> single write >> - http + !EnableMMap + EnableSendfile => single write >> - http + !EnableMMap + !EnableSendfile => 125KB writes >> - https + EnableMMap=> 16KB writes >> - https + !EnableMMap=> 8KB writes > > > If you try larger filesizes you should start seeing things being broken into > chunks even for mmap/sendfile. For example we have > #define AP_MAX_SENDFILE 16777216 /* 2^24 */ > which is unneccessarily low IMHO. Good point, I can try to make it configurable. > > Think being able to do 100 Gbps single-stream effectively, that's what we > need to optimize for now in order to have it in distros some years from now. > And you will waste a lot of CPU doing small writes at those speeds... > > While I applaud the efforts to get https to behave performance-wise I would > hate for http to be left out of being able to do top-notch on > latest&greatest networking :-) > > Granted, latest&greatest CPU:s "only" seem to be able to do approx 5 GB/s > AES128 per core so it will be hard to reach 100 Gbps single-stream https, > but I really think we should set the goal high while at it... Agreed, let's look at the whole scope. Regards, Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 11:01 AM, Yann Ylavic wrote:
> On Thu, Feb 16, 2017 at 10:49 AM, Yann Ylavic wrote:
>>
>> - http + !EnableMMap + !EnableSendfile => 125KB writes
>
> This is due to MAX_IOVEC_TO_WRITE being 16 in
> send_brigade_nonblocking(), 125KB is 16 * 8000B.
> So playing with MAX_IOVEC_TO_WRITE might also be worth a try for
> the !EnableMMap+!EnableSendfile case...

To minimize copying, this patch on APR(-util) could possibly help yet better:

Index: srclib/apr-util/buckets/apr_buckets_file.c
===================================================================
--- srclib/apr-util/buckets/apr_buckets_file.c    (revision 1732829)
+++ srclib/apr-util/buckets/apr_buckets_file.c    (working copy)
@@ -72,6 +72,8 @@ static int file_make_mmap(apr_bucket *e, apr_size_
 }
 #endif
 
+#define FILE_BUCKET_BUFF_SIZE (64 * 1024 - 64) /* > APR_BUCKET_BUFF_SIZE */
+
 static apr_status_t file_bucket_read(apr_bucket *e, const char **str,
                                      apr_size_t *len, apr_read_type_e block)
 {
@@ -108,8 +110,8 @@ static apr_status_t file_bucket_read(apr_bucket *e
     }
 #endif
 
-    *len = (filelength > APR_BUCKET_BUFF_SIZE)
-           ? APR_BUCKET_BUFF_SIZE
+    *len = (filelength > FILE_BUCKET_BUFF_SIZE)
+           ? FILE_BUCKET_BUFF_SIZE
            : filelength;
     *str = NULL;  /* in case we die prematurely */
     buf = apr_bucket_alloc(*len, e->list);
_

If that's the case, we could have an apr_bucket_file_bufsize_set() to
make this configurable, or even better, one more apr_bucket_alloc
freelist for such larger/fixed buffers...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 16 Feb 2017, Yann Ylavic wrote:

> Here are some SSL/core_write outputs (sizes) for me, with 2.4.x.
> This is with a GET for a 2MB file, on localhost...
>
> Please note that "EnableMMap on" avoids EnableSendfile (i.e.
> "EnableMMap on" => "EnableSendfile off"), which is relevant only in
> the http (non-ssl) case anyway.
>
> Outputs (and the patch to produce them) attached.
>
> TL;DR:
> - http + EnableMMap => single write
> - http + !EnableMMap + EnableSendfile => single write
> - http + !EnableMMap + !EnableSendfile => 125KB writes
> - https + EnableMMap => 16KB writes
> - https + !EnableMMap => 8KB writes

If you try larger file sizes you should start seeing things being broken
into chunks even for mmap/sendfile. For example we have
#define AP_MAX_SENDFILE 16777216 /* 2^24 */
which is unnecessarily low IMHO.

Think being able to do 100 Gbps single-stream effectively, that's what
we need to optimize for now in order to have it in distros some years
from now. And you will waste a lot of CPU doing small writes at those
speeds...

While I applaud the efforts to get https to behave performance-wise I
would hate for http to be left out of being able to do top-notch on
latest&greatest networking :-)

Granted, latest&greatest CPUs "only" seem to be able to do approx 5 GB/s
AES128 per core so it will be hard to reach 100 Gbps single-stream
https, but I really think we should set the goal high while at it...

/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      | ni...@acc.umu.se
---
 Memory allocation error: Reboot System!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 11:20 AM, Plüm, Rüdiger, Vodafone Group wrote: >> >> Please note that "EnableMMap on" avoids EnableSendfile (i.e. >> "EnableMMap on" => "EnableSendfile off") > > Just for clarification: If you placed EnableMMap on in your test > configuration you also put EnableSendfile off in your configuration, > correct? Just putting in EnableMMap on does not automatically cause > EnableSendfile set to off. I know that at least on trunk > EnableSendfile on is no longer the default. If I "EnableMMap on", core_output_filter() will never use sendfile() (whatever EnableSendfile is). I can try to figure out why (that's a really-all build/conf, so maybe some module's filter is apr_bucket_read()ing in the chain unless EnableMMap, unlikely though)... Regards, Yann
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 10:49 AM, Yann Ylavic wrote: > > - http + !EnableMMap + !EnableSendfile => 125KB writes This is due to MAX_IOVEC_TO_WRITE being 16 in send_brigade_nonblocking(), 125KB is 16 * 8000B. So playing with MAX_IOVEC_TO_WRITE might also be worth a try for !EnableMMap+!EnableSendfile case...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Here are some SSL/core_write outputs (sizes) for me, with 2.4.x. This is with a GET for a 2MB file, on localhost... Please note that "EnableMMap on" avoids EnableSendfile (i.e. "EnableMMap on" => "EnableSendfile off"), which is relevant only in the http (non-ssl) case anyway. Outputs (and the patch to produce them) attached. TL;DR: - http + EnableMMap=> single write - http + !EnableMMap + EnableSendfile => single write - http + !EnableMMap + !EnableSendfile => 125KB writes - https + EnableMMap=> 16KB writes - https + !EnableMMap=> 8KB writes http + EnableMMap: [Thu Feb 16 09:42:33.725297 2017] [core:notice] [pid 25096:tid 139655069091584] [client 127.0.0.1:56466] writev_nonblocking(): 2053620/2053620 bytes http + !EnableMMap + EnableSendfile: [Thu Feb 16 10:09:35.504327 2017] [core:notice] [pid 25598:tid 139674043262720] [client 127.0.0.1:56494] writev_nonblocking(): 284/284 bytes [Thu Feb 16 10:09:35.504733 2017] [core:notice] [pid 25598:tid 139674043262720] [client 127.0.0.1:56494] sendfile_nonblocking(): 2053336/2053336 bytes http + !EnableMMap + !EnableSendfile: [Thu Feb 16 09:40:41.377781 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 120284/120284 bytes [Thu Feb 16 09:40:41.377895 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.377973 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378069 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378145 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378215 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378291 2017] 
[core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378366 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378446 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378525 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378592 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378642 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378708 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378778 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.378931 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.379006 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 128000/128000 bytes [Thu Feb 16 09:40:41.379040 2017] [core:notice] [pid 25057:tid 139655060698880] [client 127.0.0.1:56464] writev_nonblocking(): 13336/13336 bytes https + EnableMMap: [Thu Feb 16 09:38:33.699387 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] bio_filter_out_write(): 2088 bytes [Thu Feb 16 09:38:33.699429 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] bio_filter_out_write(): flush [Thu Feb 16 09:38:33.699454 2017] [core:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] 
writev_nonblocking(): 2088/2088 bytes [Thu Feb 16 09:38:33.700821 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] bio_filter_out_write(): 274 bytes [Thu Feb 16 09:38:33.700835 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] bio_filter_out_write(): flush [Thu Feb 16 09:38:33.700853 2017] [core:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] writev_nonblocking(): 274/274 bytes [Thu Feb 16 09:38:33.701481 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] ssl_filter_write(): 284 bytes [Thu Feb 16 09:38:33.701497 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] bio_filter_out_write(): 313 bytes [Thu Feb 16 09:38:33.701502 2017] [ssl:notice] [pid 24930:tid 139655069091584] [client 127.0.0.1:58074] ssl_filter_write(): 2053336 bytes [Thu Feb 16 09:38:33.701524 2017] [ssl:notice] [pid 24
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Not at my computer, but the mod_http2 output has special handling for
file buckets, because apr_bucket_read returns a max of 8k and splits
itself. It instead grabs the file and reads the size it needs, if memory
serves me well. I assume when it's mmapped it does not make much of a
difference.

> On 16.02.2017 at 00:40, Yann Ylavic wrote:
>
>> On Thu, Feb 16, 2017 at 12:31 AM, Yann Ylavic wrote:
>>
>> Actually this is 16K (the maximum size of a TLS record)
>
> ... these are the outputs (records) split/produced by SSL_write()
> when given inputs (plain text) greater than 16K (at once).
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 12:31 AM, Yann Ylavic wrote:
>
> Actually this is 16K (the maximum size of a TLS record)

... these are the outputs (records) split/produced by SSL_write()
when given inputs (plain text) greater than 16K (at once).
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 16, 2017 at 12:06 AM, Jacob Champion wrote:
> On 02/15/2017 02:03 PM, Yann Ylavic wrote:
>
>> Assuming so :) there is also the fact that mod_ssl will encrypt/pass
>> 8K buckets at a time, while the core output filter tries to send the
>> whole mmap()ed file, keeping what remains after EAGAIN for the next
>> call (if any).
>
> Oh, right... we split on APR_MMAP_LIMIT in the mmap() case. Which is
> a nice big 4MiB instead of 8KiB. Could it really be as easy as tuning
> the default file bucket size up?

Actually this is 16K (the maximum size of a TLS record), so 16K buckets
are passed to the core output filter (which itself buffers buckets
smaller than THRESHOLD_MIN_WRITE=4K, up to THRESHOLD_MAX_BUFFER=64K,
before sending). Without SSL, sending is still done up to the size of
the mmap()ed file, which may make a difference.

Maybe you could try to play with THRESHOLD_MIN_WRITE/THRESHOLD_MAX_BUFFER
in server/core_filters.c for the SSL case (e.g. MIN=4M and MAX=16M), but
that'd still cost some transient-to-heap bucket copies which don't
happen in the non-SSL case...
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/15/2017 02:03 PM, Yann Ylavic wrote: On Wed, Feb 15, 2017 at 9:50 PM, Jacob Champion wrote: For the next step, I want to find out why TLS connections see such a big performance hit when I switch off mmap(), but unencrypted connections don't... it's such a huge difference that I feel like I must be missing something obvious. First, you did "EnableSendfile off", right? Yep :) But thanks for the reminder anyway; I would have kicked myself... Also, I have to retract an earlier claim I made: I am now seeing a difference between the performance of mmap'd and non-mmap'd responses for regular HTTP connections. I don't know if I actually changed something, or if the original lack of difference was tester error on my part. Assuming so :) there is also the fact that mod_ssl will encrypt/pass 8K buckets at a time, while the core output filter tries to send the whole mmap()ed file, keeping what remains after EAGAIN for the next call (if any). Oh, right... we split on APR_MMAP_LIMIT in the mmap() case. Which is a nice big 4MiB instead of 8KiB. Could it really be as easy as tuning the default file bucket size up? --Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, Feb 15, 2017 at 9:50 PM, Jacob Champion wrote: > > For the next step, I want to find out why TLS connections see such a big > performance hit when I switch off mmap(), but unencrypted connections > don't... it's such a huge difference that I feel like I must be missing > something obvious. First, you did "EnableSendfile off", right? Assuming so :) there is also the fact that mod_ssl will encrypt/pass 8K buckets at a time, while the core output filter tries to send the whole mmap()ed file, keeping what remains after EAGAIN for the next call (if any). That's I think a big difference too, especially on localhost or a fast/large bandwidth network. Regards, Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/07/2017 02:32 AM, Niklas Edmundsson wrote:
> O_DIRECT also bypasses any read-ahead logic, so you'll have to do
> nice and big IO etc to get good performance.

Yep, confirmed... my naive approach to O_DIRECT, which reads from the
file in the 8K chunks we're used to from the file bucket brigade,
absolutely mutilates our performance (80% slowdown) *and* rails the disk
during the load test. Not good.

(I was hoping that combining the O_DIRECT approach with in-memory
caching would give us the best of both worlds. Nope. A plain read() with
no explicit caching at all is much, much faster on my machine.)

> We've played around with O_DIRECT to optimize the caching process in
> our large-file caching module (our backing store is nfs). However,
> since all our hosts are running Linux we had much better results with
> doing plain reads and utilizing posix_fadvise with
> POSIX_FADV_WILLNEED to trigger read-ahead and POSIX_FADV_DONTNEED to
> drop the original file from cache when read (as future requests will
> be served from local disk cache). We're doing 8MB fadvise chunks to
> get full streaming performance when caching large files.

Hmm, I will keep the file advisory API in the back of my head, thanks
for that.

For the next step, I want to find out why TLS connections see such a big
performance hit when I switch off mmap(), but unencrypted connections
don't... it's such a huge difference that I feel like I must be missing
something obvious.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Here is how cache page replacement is done in Linux: https://linux-mm.org/PageReplacementDesign

On Tue, Feb 7, 2017 at 5:32 AM, Niklas Edmundsson wrote:
> On Mon, 6 Feb 2017, Jacob Champion wrote:
>> Considering the massive amount of caching that's built into the entire HTTP ecosystem already, O_DIRECT *might* be an effective way to do that (in which we give up filesystem optimizations and caching in return for a DMA into userspace). I have a PoC about halfway done, but I need to split my time this week between this and the FCGI stuff I've been neglecting.
>
> As O_DIRECT bypasses the cache, all your IO will hit your storage device. I think this makes it useful only in exotic use cases.
>
> For our workload that's using mod_cache with a local disk cache we still want to use the remaining RAM as a disk read cache; it doesn't make sense to force disk I/O for files that simply could have been served from RAM. We typically see 90-95% of the requests served by mod_cache actually being served from the disk cache in RAM rather than hitting the local disk cache devices. I'm suspecting the picture is similar for most occurrences of static file serving, regardless of using mod_cache etc or not.
>
> O_DIRECT also bypasses any read-ahead logic, so you'll have to do nice and big IO etc to get good performance.
>
> We've played around with O_DIRECT to optimize the caching process in our large-file caching module (our backing store is nfs). However, since all our hosts are running Linux we had much better results with doing plain reads and utilizing posix_fadvise with POSIX_FADV_WILLNEED to trigger read-ahead and POSIX_FADV_DONTNEED to drop the original file from cache when read (as future requests will be served from local disk cache). We're doing 8MB fadvise chunks to get full streaming performance when caching large files.
>
> But that's way out of scope for this discussion, I think ;-)
>
> In conclusion, I wouldn't expect any benefits of using O_DIRECT in the common httpd use cases. That said, I would gladly be proven wrong :)
>
> /Nikke
> --
> Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
> ---
> "The point is I am now a perfectly safe penguin!" -- Ford Prefect
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Mon, 6 Feb 2017, Jacob Champion wrote:
> Considering the massive amount of caching that's built into the entire HTTP ecosystem already, O_DIRECT *might* be an effective way to do that (in which we give up filesystem optimizations and caching in return for a DMA into userspace). I have a PoC about halfway done, but I need to split my time this week between this and the FCGI stuff I've been neglecting.

As O_DIRECT bypasses the cache, all your IO will hit your storage device. I think this makes it useful only in exotic use cases.

For our workload that's using mod_cache with a local disk cache we still want to use the remaining RAM as a disk read cache; it doesn't make sense to force disk I/O for files that simply could have been served from RAM. We typically see 90-95% of the requests served by mod_cache actually being served from the disk cache in RAM rather than hitting the local disk cache devices. I'm suspecting the picture is similar for most occurrences of static file serving, regardless of using mod_cache etc or not.

O_DIRECT also bypasses any read-ahead logic, so you'll have to do nice and big IO etc to get good performance.

We've played around with O_DIRECT to optimize the caching process in our large-file caching module (our backing store is nfs). However, since all our hosts are running Linux we had much better results with doing plain reads and utilizing posix_fadvise with POSIX_FADV_WILLNEED to trigger read-ahead and POSIX_FADV_DONTNEED to drop the original file from cache when read (as future requests will be served from local disk cache). We're doing 8MB fadvise chunks to get full streaming performance when caching large files.

But that's way out of scope for this discussion, I think ;-)

In conclusion, I wouldn't expect any benefits of using O_DIRECT in the common httpd use cases. That said, I would gladly be proven wrong :)

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
"The point is I am now a perfectly safe penguin!" -- Ford Prefect
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/03/2017 12:30 AM, Niklas Edmundsson wrote:
> Methinks this makes mmap+ssl a VERY bad combination if the thing SIGBUS:es due to a simple IO error. I'll proceed with disabling mmap and see if that is a viable way to go for our workload...

(Pulling from a parallel conversation, with permission.) The question has been raised: is our mmap() optimization really giving us the utility we want for the additional instability we pay? Stefan had this to say:

On 02/03/2017 08:32 AM, Stefan Eissing wrote:
> Experimented on my Ubuntu 14.04 image on Parallels on MacOS 10.12, MacBook Pro mid 2012. Loading a 10 MB file 1000 times over 8 connections:
>
>   h2load -c 8 -t 8 -n 1000 -m1 http://xxx/10mb.file
>
> using HTTP/1.1 and HTTP/2 (limit of 1 stream at a time per connection). Plain and with TLS1.2, transfer speeds in GByte/sec from localhost:
>
>   MMAP   H1Plain  H1SSL  H2Plain  H2SSL
>   on     4.3      1.5    3.8      1.3
>   off    3.5      1.1    3.8      1.3
>
> HTTP/2 seems rather unaffected, while HTTP/1.1 experiences significant differences.

and I replied:

On 03.02.2017 at 21:47, Jacob Champion wrote:
> Weird. I can't see any difference for plain HTTP/1.1 when just toggling EnableMMAP, even with EnableSendfile off. I *do* see a significant difference for TLS+HTTP/1.1. That doesn't really make sense to me; is there some other optimization kicking in? sendfile blows the mmap optimization out of the water, but naturally it can't kick in for TLS.
>
> I would be interested to see if an O_DIRECT-aware file bucket could speed up the TLS side of things without exposing people to mmap instability.

I was also interested to see if there was some mmap() flag we were missing that could fix the problem for us. Turns out a few systems (used to?) have one called MAP_COPY. Linus had a few choice words about it:

http://yarchive.net/comp/linux/map_copy.html

Linus-insult-rant aside, his point applies here too, I think. We're using mmap() as an optimized read(). We should be focusing on how to use read() in an optimized way. And surely read() for modern systems has come a long way since that thread in 2001?

Considering the massive amount of caching that's built into the entire HTTP ecosystem already, O_DIRECT *might* be an effective way to do that (in which we give up filesystem optimizations and caching in return for a DMA into userspace). I have a PoC about halfway done, but I need to split my time this week between this and the FCGI stuff I've been neglecting.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Mon, Feb 6, 2017 at 12:10 PM, Ruediger Pluem wrote:
>> What might happen in ssl_io_filter_output() is that buffered output data (already deleted but not cleared) end up being reused on shutdown.
>>
>> Could you please try the attached patch?
>
> Why would we need to handle filter_ctx->pssl == NULL the same way we handle META_BUCKETS? filter_ctx->pssl == NULL already causes ssl_filter_write to fail. Do I miss any code before that could crash in the data case with filter_ctx->pssl == NULL?

No, this hunk was not intended to be proposed/tested (the case should not happen though, so it's harmless imo), and anyway it was not committed in r1781582 ([1]).

However, I opened the thread "ssl_io_filter_output vs EOC" ([2]); maybe we could discuss this there? It seems that we can either error/fail or fall through the filter stack after the EOC, depending on whether subsequent buckets are in the same brigade or not. We probably should clarify (and be consistent on) what to do after the EOC when TLS is in place (i.e. send whatever follows, besides metadata, in the clear, or bail out?).

Regards,
Yann.

[1] http://svn.apache.org/r1781582
[2] https://lists.apache.org/thread.html/714ca91c918e7520b75fae664b2bdee28d7b2a9f9ef78e51d8765c96@%3Cdev.httpd.apache.org%3E
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/02/2017 11:04 AM, Yann Ylavic wrote:
> Hi Niklas,
>
> On Wed, Feb 1, 2017 at 7:02 PM, Niklas Edmundsson wrote:
>> We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on Ubuntu 14.04LTS. Not frequent, but none the less happening.
>>
>> #4 ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at ssl_engine_io.c:1746
>>         data = 0x7f5075518000 <error: Cannot access memory at address 0x7f5075518000>
>>         len = 4194304
>>         bucket = 0x7f4f840b1ba8
>>         status =
>>         filter_ctx = 0x7f507013cf88
>>         inctx =
>>         outctx = 0x7f507013d008
>>         rblock = APR_NONBLOCK_READ
>
> I suspect some cleanup ordering issue happening in ssl_io_filter_output(), when the EOC bucket is found.
>
>> Are we hitting a corner case of process cleanup that plays merry hell with https/ssl, or are we just having bad luck? Ideas? Suggestions?
>
> 2.4.25 is eager to terminate/shutdown keepalive connections more quickly (than previous versions) on graceful shutdown (e.g. MaxConnectionsPerChild reached).
>
> What might happen in ssl_io_filter_output() is that buffered output data (already deleted but not cleared) end up being reused on shutdown.
>
> Could you please try the attached patch?

Why would we need to handle filter_ctx->pssl == NULL the same way we handle META_BUCKETS? filter_ctx->pssl == NULL already causes ssl_filter_write to fail. Do I miss any code before that could crash in the data case with filter_ctx->pssl == NULL?

Regards
Rüdiger
Re: httpd 2.4.25, mpm_event, ssl: segfaults
>> Hmm, Linux raises SIGBUS if an mmap is used after the underlying file has been truncated (see [1]).
>
> See also https://bz.apache.org/bugzilla/show_bug.cgi?id=46688 .
>
> Niklas, just to clarify: you're not willfully truncating large files as they're being served, right? I *can* reproduce a SIGBUS if I start truncating files during my stress testing. But EnableMMAP's documentation calls that case out explicitly.

We're using our own large-file tuned cache module (I know, we should really publish it properly), and clean on oldest atime with a script that does rm. There's nothing cache-related that I know of that ever truncates files.

Bah. This particular server seems to have sporadic IO errors on the cache filesystem. I'm guessing that this could show up as a truncated file in the mmap sense of things, with SIGBUS and all. I should have caught on when I only saw the issue on one host :-/

Methinks this makes mmap+ssl a VERY bad combination if the thing SIGBUS:es due to a simple IO error. I'll proceed with disabling mmap and see if that is a viable way to go for our workload...

We do sendfile for vanilla http; I'm guessing that's why we've never hit this before. So, sorry for the noise. You learn something new every day...

Just to rule it out, I changed our httpd init script to leave the stacksize ulimit at Linux default (8MB) without changing anything else. I'll probably leave this in though, as I'm guessing this use case is what gets the bulk of testing...

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
This tagline SHAREWARE. Send $5.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 2 Feb 2017, Jacob Champion wrote:
>> We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on Ubuntu 14.04LTS. Not frequent, but none the less happening.
>>
>> #4 ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at ssl_engine_io.c:1746
>>         data = 0x7f5075518000 <error: Cannot access memory at address 0x7f5075518000>
>>         len = 4194304
>>         bucket = 0x7f4f840b1ba8
>>         status =
>>         filter_ctx = 0x7f507013cf88
>>         inctx =
>>         outctx = 0x7f507013d008
>>         rblock = APR_NONBLOCK_READ
>
> Idle thoughts: "Cannot access memory" in this case could be a red herring, if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable that the data in question could be mmap'd, given the nice round address and 4 MiB length (equal to APR_MMAP_LIMIT).

Oooh, hadn't even thought of that...

> That doesn't mean we're looking in the wrong place, though, since SIGBUS can also be generated by an out-of-bounds access to an mmap'd region.
>
> Niklas, what version of APR are you using? Are you serving large (> 4 MiB) static files? I have not been able to reproduce so far (Ubuntu 16.04, httpd 2.4.25 + mod_ssl + mpm_event).

Yes, this is a file archive offload target ONLY serving large files.

APR from httpd-2.4.25-deps.tar.bz2 and:

./configure --prefix=/lap/apache/2.4.25.sslcleanuppatch --sysconfdir=/var/conf/apache2 --with-included-apr --enable-nonportable-atomics=yes --enable-layout=GNU --enable-mpms-shared=all --with-gdbm --without-berkeley-db --enable-mods-shared=all --enable-cache=shared --enable-cache-disk=shared --enable-ssl=shared --enable-cgi=shared --enable-suexec --with-suexec-caller=www-srv --with-suexec-uidmin=1000 --with-suexec-gidmin=1000 CFLAGS=-O2

(yes, the cgi/suexec stuff isn't really used but we haven't gotten around to changing our builds). It's indeed using that APR according to /proc/pid/maps: libapr-1.so.0.5.2 in the httpd install tree.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
This tagline SHAREWARE. Send $5.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 2 Feb 2017, Jacob Champion wrote:
> On 02/02/2017 03:05 PM, Yann Ylavic wrote:
>> Hmm, Linux raises SIGBUS if an mmap is used after the underlying file has been truncated (see [1]).
>
> See also https://bz.apache.org/bugzilla/show_bug.cgi?id=46688 .
>
> Niklas, just to clarify: you're not willfully truncating large files as they're being served, right? I *can* reproduce a SIGBUS if I start truncating files during my stress testing. But EnableMMAP's documentation calls that case out explicitly.

We're using our own large-file tuned cache module (I know, we should really publish it properly), and clean on oldest atime with a script that does rm. There's nothing cache-related that I know of that ever truncates files.

Just to rule it out, I changed our httpd init script to leave the stacksize ulimit at Linux default (8MB) without changing anything else.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
This tagline SHAREWARE. Send $5.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/02/2017 03:05 PM, Yann Ylavic wrote:
> Couldn't htcacheclean or alike do something like this? "EnableMMAP off" could definitely help here.

(Didn't mean to ignore this part of your email, but I don't have much experience with htcacheclean yet so I can't really comment...)

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/02/2017 03:05 PM, Yann Ylavic wrote:
> Hmm, Linux raises SIGBUS if an mmap is used after the underlying file has been truncated (see [1]).

See also https://bz.apache.org/bugzilla/show_bug.cgi?id=46688 .

Niklas, just to clarify: you're not willfully truncating large files as they're being served, right? I *can* reproduce a SIGBUS if I start truncating files during my stress testing. But EnableMMAP's documentation calls that case out explicitly.

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 2, 2017 at 11:36 PM, Jacob Champion wrote: > On 02/02/2017 02:32 PM, Yann Ylavic wrote: >> >> On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion >> wrote: >>> >>> Idle thoughts: "Cannot access memory" in this case could be a red >>> herring, >>> if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems >>> reasonable >>> that the data in question could be mmap'd, given the nice round address >>> and >>> 4 MiB length (equal to APR_MMAP_LIMIT). >>> >>> That doesn't mean we're looking in the wrong place, though, since SIGBUS >>> can >>> also be generated by an out-of-bounds access to an mmap'd region. >> >> >> Right, looks like the memory has been unmapped though (SIGBUS) before >> being (re)used. > > Oh, I thought an access after an unmap would SIGSEGV instead of SIGBUS. I > haven't ever tested that out; I should try it... Hmm, Linux raises SIGBUS if an mmap is used after the underlying file has been truncated (see [1]). Couldn't htcacheclean or alike do something like this? "EnableMMAP off" could definitely help here. [1] http://man7.org/linux/man-pages/man2/mmap.2.html
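For reference, the "EnableMMAP off" workaround mentioned above is a one-line configuration change. The directory path below is purely illustrative; the directive is also valid at server, virtual-host, and .htaccess scope:

```
# Fall back from mmap() to plain read() when delivering files from this
# tree, avoiding SIGBUS if a mapped file is truncated while being served.
<Directory "/var/cache/httpd-disk-cache">
    EnableMMAP Off
</Directory>
```

The trade-off is the read()-path performance cost discussed elsewhere in this thread, particularly for TLS connections where sendfile cannot be used either.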
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/02/2017 02:32 PM, Yann Ylavic wrote:
> On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion wrote:
>> Idle thoughts: "Cannot access memory" in this case could be a red herring, if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable that the data in question could be mmap'd, given the nice round address and 4 MiB length (equal to APR_MMAP_LIMIT).
>>
>> That doesn't mean we're looking in the wrong place, though, since SIGBUS can also be generated by an out-of-bounds access to an mmap'd region.
>
> Right, looks like the memory has been unmapped though (SIGBUS) before being (re)used.

Oh, I thought an access after an unmap would SIGSEGV instead of SIGBUS. I haven't ever tested that out; I should try it...

--Jacob
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, Feb 2, 2017 at 11:19 PM, Jacob Champion wrote: > > Idle thoughts: "Cannot access memory" in this case could be a red herring, > if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable > that the data in question could be mmap'd, given the nice round address and > 4 MiB length (equal to APR_MMAP_LIMIT). > > That doesn't mean we're looking in the wrong place, though, since SIGBUS can > also be generated by an out-of-bounds access to an mmap'd region. Right, looks like the memory has been unmapped though (SIGBUS) before being (re)used. Does "EnableMMAP off" help or produce another backtrace? > > Niklas, what version of APR are you using? Are you serving large (> 4 MiB) > static files? I have not been able to reproduce so far (Ubuntu 16.04, httpd > 2.4.25 + mod_ssl + mpm_event). The original file bucket comes from mod_cache, and indeed looks larger than 4MB. If it were (htcache)cleaned while being served, SIGBUS shouldn't happen still since we hold an fd (and reference) on it... Regards, Yann.
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On 02/02/2017 02:04 AM, Yann Ylavic wrote:
> Hi Niklas,
>
> On Wed, Feb 1, 2017 at 7:02 PM, Niklas Edmundsson wrote:
>> We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on Ubuntu 14.04LTS. Not frequent, but none the less happening.
>>
>> #4 ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at ssl_engine_io.c:1746
>>         data = 0x7f5075518000 <error: Cannot access memory at address 0x7f5075518000>
>>         len = 4194304
>>         bucket = 0x7f4f840b1ba8
>>         status =
>>         filter_ctx = 0x7f507013cf88
>>         inctx =
>>         outctx = 0x7f507013d008
>>         rblock = APR_NONBLOCK_READ

Idle thoughts: "Cannot access memory" in this case could be a red herring, if Niklas' gdb can't peer into mmap'd memory spaces [1]. It seems reasonable that the data in question could be mmap'd, given the nice round address and 4 MiB length (equal to APR_MMAP_LIMIT).

That doesn't mean we're looking in the wrong place, though, since SIGBUS can also be generated by an out-of-bounds access to an mmap'd region.

Niklas, what version of APR are you using? Are you serving large (> 4 MiB) static files? I have not been able to reproduce so far (Ubuntu 16.04, httpd 2.4.25 + mod_ssl + mpm_event).

--Jacob

[1] https://stackoverflow.com/questions/654393/examining-mmaped-addresses-using-gdb
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 2 Feb 2017, Niklas Edmundsson wrote:
> On Thu, 2 Feb 2017, Yann Ylavic wrote:
>>> Are we hitting a corner case of process cleanup that plays merry hell with https/ssl, or are we just having bad luck? Ideas? Suggestions?
>>
>> 2.4.25 is eager to terminate/shutdown keepalive connections more quickly (than previous versions) on graceful shutdown (e.g. MaxConnectionsPerChild reached).
>>
>> What might happen in ssl_io_filter_output() is that buffered output data (already deleted but not cleared) end up being reused on shutdown.
>>
>> Could you please try the attached patch?
>
> Built and deployed, waiting for the most affected host to drain in order to restart it. I'll also lower MaxConnectionsPerChild a bit more in order to stress this.

Still seems to SIGBUS. Backtraces attached.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
New Borg Software Package : Locutus 1-2-3
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

[Attachments: bt.2201.bz2, bt.6429.bz2, bt.8071.bz2]
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Thu, 2 Feb 2017, Yann Ylavic wrote:
>> Are we hitting a corner case of process cleanup that plays merry hell with https/ssl, or are we just having bad luck? Ideas? Suggestions?
>
> 2.4.25 is eager to terminate/shutdown keepalive connections more quickly (than previous versions) on graceful shutdown (e.g. MaxConnectionsPerChild reached).
>
> What might happen in ssl_io_filter_output() is that buffered output data (already deleted but not cleared) end up being reused on shutdown.
>
> Could you please try the attached patch?

Built and deployed, waiting for the most affected host to drain in order to restart it. I'll also lower MaxConnectionsPerChild a bit more in order to stress this.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
"At last I'm organized", he sighed, and died.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
Hi Niklas,

On Wed, Feb 1, 2017 at 7:02 PM, Niklas Edmundsson wrote:
> We've started to see spurious segfaults with httpd 2.4.25, mpm_event, ssl on Ubuntu 14.04LTS. Not frequent, but none the less happening.
>
> #4 ssl_io_filter_output (f=0x7f507013cfe0, bb=0x7f4f840be168) at ssl_engine_io.c:1746
>         data = 0x7f5075518000 <error: Cannot access memory at address 0x7f5075518000>
>         len = 4194304
>         bucket = 0x7f4f840b1ba8
>         status =
>         filter_ctx = 0x7f507013cf88
>         inctx =
>         outctx = 0x7f507013d008
>         rblock = APR_NONBLOCK_READ

I suspect some cleanup ordering issue happening in ssl_io_filter_output(), when the EOC bucket is found.

> Are we hitting a corner case of process cleanup that plays merry hell with https/ssl, or are we just having bad luck? Ideas? Suggestions?

2.4.25 is eager to terminate/shutdown keepalive connections more quickly (than previous versions) on graceful shutdown (e.g. MaxConnectionsPerChild reached).

What might happen in ssl_io_filter_output() is that buffered output data (already deleted but not cleared) end up being reused on shutdown.

Could you please try the attached patch?

Regards,
Yann.

Index: modules/ssl/ssl_engine_io.c
===================================================================
--- modules/ssl/ssl_engine_io.c (revision 1781324)
+++ modules/ssl/ssl_engine_io.c (working copy)
@@ -138,6 +138,7 @@ static int bio_filter_out_pass(bio_filter_out_ctx_
     outctx->rc = ap_pass_brigade(outctx->filter_ctx->pOutputFilter->next,
                                  outctx->bb);
+    apr_brigade_cleanup(outctx->bb);
     /* Fail if the connection was reset: */
     if (outctx->rc == APR_SUCCESS && outctx->c->aborted) {
         outctx->rc = APR_ECONNRESET;
@@ -1699,13 +1700,12 @@ static apr_status_t ssl_io_filter_output(ap_filter
     while (!APR_BRIGADE_EMPTY(bb) && status == APR_SUCCESS) {
         apr_bucket *bucket = APR_BRIGADE_FIRST(bb);

-        if (APR_BUCKET_IS_METADATA(bucket)) {
+        if (APR_BUCKET_IS_METADATA(bucket) || !filter_ctx->pssl) {
             /* Pass through metadata buckets untouched.  EOC is
              * special; terminate the SSL layer first.
              */
             if (AP_BUCKET_IS_EOC(bucket)) {
                 ssl_filter_io_shutdown(filter_ctx, f->c, 0);
             }
-            AP_DEBUG_ASSERT(APR_BRIGADE_EMPTY(outctx->bb));

             /* Metadata buckets are passed one per brigade; it might
              * be more efficient (but also more complex) to use
@@ -1712,11 +1712,10 @@ static apr_status_t ssl_io_filter_output(ap_filter
              * outctx->bb as a true buffer and interleave these with
              * data buckets. */
             APR_BUCKET_REMOVE(bucket);
-            APR_BRIGADE_INSERT_HEAD(outctx->bb, bucket);
-            status = ap_pass_brigade(f->next, outctx->bb);
-            if (status == APR_SUCCESS && f->c->aborted)
-                status = APR_ECONNRESET;
-            apr_brigade_cleanup(outctx->bb);
+            APR_BRIGADE_INSERT_TAIL(outctx->bb, bucket);
+            if (bio_filter_out_pass(outctx) < 0) {
+                status = outctx->rc;
+            }
         }
         else {
             /* Filter a data bucket. */
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, 1 Feb 2017, Eric Covener wrote:
> On Wed, Feb 1, 2017 at 1:02 PM, Niklas Edmundsson wrote:
>> This might be due to processes being cleaned up due to hitting MaxSpareThreads or MaxConnectionsPerChild; these are tuned to not happen frequently. It's just a wild guess, but the reason for me suspecting this is the weird looking stacktraces that point towards use-after-free issues...
>
> The backtraces of the other threads in the process might give a hint if graceful proc shutdown is occurring -- e.g. one thread might have join_workers / apr_thread_join in the stack.

Nothing obvious to me (grep -i join finds nothing in the backtraces). I have 9 coredumps with the "thread apply all bt full" output summing to 1.3 MB, which feels a bit much to post here, although I guess they would be small if I bzip:ed and attached them...

Before we look too hard in that direction though, I remembered that our httpd init script sets the stacksize to 512k, down from the Linux default of 8MB (historical reasons). Might that be the easy explanation, ie threads overflowing the stack?

We only started serving https this autumn, and recently saw a bump in usage due to the LineageOS mirror. It is entirely possible that this is triggered by a usage pattern change exposing some of our arcane habits ;)

Another observation is that these dumps seem to happen in groups, ie:

[Tue Jan 24 19:32:52.277623 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 425377 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:55.281211 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 29545 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:56.282240 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 20749 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 24 19:32:58.285476 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679374 exit signal Bus error (7), possible coredump in /tmp
[Wed Jan 25 00:14:29.743371 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679441 exit signal Bus error (7), possible coredump in /tmp
[Wed Jan 25 00:14:38.753792 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 679547 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:04.767732 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 954024 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:09.773329 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 693010 exit signal Bus error (7), possible coredump in /tmp
[Tue Jan 31 15:29:18.782301 2017] [core:notice] [pid 2520:tid 139983763429184] AH00051: child pid 693899 exit signal Bus error (7), possible coredump in /tmp

Don't know what, if any, conclusions can be drawn from that though...

/Nikke - rambling a bit before falling asleep...
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---
It's a port of call, home away from home...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: httpd 2.4.25, mpm_event, ssl: segfaults
On Wed, Feb 1, 2017 at 1:02 PM, Niklas Edmundsson wrote: > This might be due to processes being cleaned up due to hitting > MaxSpareThreads or MaxConnectionsPerChild, these are tuned to not happen > frequently. It's just a wild guess, but the reason for me suspecting this is > the weird looking stacktraces that points towards use-after-free issues... The backtraces of the other threads in the process might give a hint if graceful proc shutdown is occurring -- e.g. one thread might have join_workers / apr_thread_join in the stack. -- Eric Covener cove...@gmail.com