OK, found the bug. Seems an update to the latest nghttp2 lib coincided with the check-in of your async filter changes. Everything is fine now.
At least I learned some more about core filters, which cannot hurt. Thanks for the help.

//Stefan

> On 07.10.2015 at 18:40, Graham Leggett <minf...@sharp.fm> wrote:
>
> On 07 Oct 2015, at 6:23 PM, Stefan Eissing <stefan.eiss...@greenbytes.de> wrote:
>
>>> Can you explain "non-multithreadability of apr_buckets" in more detail? I take it this is the problem with passing a bucket from one allocator to another?
>>>
>>> If so then the copy makes more sense.
>>
>> Yes, I wrote about this on the list a while ago. When the bucket is destroyed, its allocator tries to put it on the free list. There is no protection for that.
>
> It would be nice to fix this for the future, but that would be an APR fix.
>
>>>> Stream pool destruction is synched with
>>>> 1. slave connection being done and no longer writing to it
>>>
>>> How do you currently know the slave connection is done?
>>>
>>> Normally a connection is cleaned up by the MPM that spawned the connection; I suspect you'll need to replicate the same logic the MPMs use to tear down the connection using the c->aborted and c->keepalive flags.
>>>
>>> Crucially, the slave connection needs to tell you that it's done. If you kill a connection early, data will be lost.
>>>
>>> I suspect part of the problem is not implementing the algorithm that async MPMs use to kick filters with data in them. Without this kick, data in the slave stacks will never be sent. In theory, when the http2 filter receives a kick, it should pass the kick on to all slave connections.
>>
>> I am not sure what you mean by that "kick". I'd have to look at your async filter design some more…
>
> What the core network filter used to do was the following:
>
> - Apply an algorithm to determine how far into the brigade we should write using blocking writes. Flush buckets and safety limits get applied here.
> - Actually do the write.
> - As soon as the write returns EAGAIN, setaside the brigade in a buffer and leave.
> - The MPM "kicks" the core network filter by passing NULL to the filter, and we repeat the above.
>
> We now do this for any filter:
>
> - Apply the same safety algorithm to determine the flush-to point up to which we must do blocking writes.
> - Do writes until we reach the flush-to point.
> - Continue to do writes, calling ap_filter_should_yield() as a proxy for EAGAIN.
> - Setaside the remaining data in a buffer and leave, and add us to the set of filters that should be "kicked".
> - The MPM "kicks" all filters with setaside data in the c->filters set exactly once on each pass, and we repeat the above.
>
> Your code is effectively emulating an MPM, so it would need to implement the "kick" above.
>
>> I think you misunderstood me. mod_h2 uses ap_process_connection() just like core.
>
> I found that right at the start and confirmed you were doing so correctly.
>
>> Maybe these async changes just shine a light on a bug that has always been there but never surfaced due to timing. I will look some more tomorrow. Originally, I planned to do something else, but I am running out of subversion branches where I can work…
>
> Branches are cheap, creating more of them is not a problem.
>
> Can you confirm what happens when things go wrong?
>
> Do we see missing data, or do the requests hang?
>
> Regards,
> Graham
> —