Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 17:34 schrieb Joe Orton :
> 
> On Wed, Aug 25, 2021 at 05:08:06PM +0200, ste...@eissing.org wrote:
>> 
>> 
>>> Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
>>> 
>>> Thanks, looks good, it probably needs a backport to 2.4.x since the
>>> first travis failure was there (Build #1819).
>>> Let's see how it works after several runs before maybe.
>> 
>> +1
>> 
>> Let's see what Joe and Travis can do to wreck it.
> 
> Well I could accidentally cancel your builds, how's that for starters?
> 
> (I restarted the most recent one, was trying to cancel the older PR 
> build, but hit the wrong button on the wrong page, sorry!)
> 
> Nice job getting this one fixed.
> 
> Regards, Joe
> 
> p.s. did you forget the APLOGNO()?
> p.p.s just kidding

I see that we are getting into the right spirit here! ;-)


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Joe Orton
On Wed, Aug 25, 2021 at 05:08:06PM +0200, ste...@eissing.org wrote:
> 
> 
> > Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
> > 
> > Thanks, looks good, it probably needs a backport to 2.4.x since the
> > first travis failure was there (Build #1819).
> > Let's see how it works after several runs before maybe.
> 
> +1
> 
> Let's see what Joe and Travis can do to wreck it.

Well I could accidentally cancel your builds, how's that for starters?

(I restarted the most recent one, was trying to cancel the older PR 
build, but hit the wrong button on the wrong page, sorry!)

Nice job getting this one fixed.

Regards, Joe

p.s. did you forget the APLOGNO()?
p.p.s just kidding



Canceled: apache/httpd#1841 (trunk - 376e04c)

2021-08-25 Thread Travis CI
Build Update for apache/httpd
-

Build: #1841
Status: Canceled

Duration: ?
Commit: 376e04c (trunk)
Author: Stefan Eissing
Message:  * test/modules/http2: using stop/start instead of reload when 
changing apache configs
   to give reliable results. The new reload behaviour keeps old children around 
until
   very late and may answer on old configurations.



git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1892598 
13f79535-47bb-0310-9956-ffa450edef68

View the changeset: 
https://github.com/apache/httpd/compare/f390a459161e...376e04c95aed

View the full build log and details: 
https://app.travis-ci.com/github/apache/httpd/builds/236198374?utm_medium=notification_source=email

  Restart your build: 
https://app.travis-ci.com/github/apache/httpd/builds/236198374?utm_medium=notification_source=email

--

You can unsubscribe from build emails from the apache/httpd repository going to 
https://app.travis-ci.com/account/preferences/unsubscribe?repository=16806660_medium=notification_source=email.
Or unsubscribe from *all* email updating your settings at 
https://app.travis-ci.com/account/preferences/unsubscribe?utm_medium=notification_source=email.
Or configure specific recipients for build notifications in your .travis.yml 
file. See https://docs.travis-ci.com/user/notifications.



Canceled: apache/httpd#1842 (trunk - 3259f23)

2021-08-25 Thread Travis CI
Build Update for apache/httpd
-

Build: #1842
Status: Canceled

Duration: ?
Commit: 3259f23 (trunk)
Author: Stefan Eissing
Message: mod_http2: fixes a use-after-read of an integer value when
   passing a stream identifier for further IO checking. A
   non-issue since an int value matching no active stream
   will lead to no action.



git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1892599 
13f79535-47bb-0310-9956-ffa450edef68

View the changeset: 
https://github.com/apache/httpd/compare/376e04c95aed...3259f2346014

View the full build log and details: 
https://app.travis-ci.com/github/apache/httpd/builds/236201215?utm_medium=notification_source=email

  Restart your build: 
https://app.travis-ci.com/github/apache/httpd/builds/236201215?utm_medium=notification_source=email

--

You can unsubscribe from build emails from the apache/httpd repository going to 
https://app.travis-ci.com/account/preferences/unsubscribe?repository=16806660_medium=notification_source=email.
Or unsubscribe from *all* email updating your settings at 
https://app.travis-ci.com/account/preferences/unsubscribe?utm_medium=notification_source=email.
Or configure specific recipients for build notifications in your .travis.yml 
file. See https://docs.travis-ci.com/user/notifications.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 17:06 schrieb Yann Ylavic :
> 
> Thanks, looks good, it probably needs a backport to 2.4.x since the
> first travis failure was there (Build #1819).
> Let's see how it works after several runs before maybe.

+1

Let's see what Joe and Travis can do to wreck it.

> On Wed, Aug 25, 2021 at 5:02 PM ste...@eissing.org  wrote:
>> 
>> It should be fixed now with r1892599, I believe.
>> 
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
 
 apache / httpd
 
 trunk
 
 Build #1831 was broken
 21 mins and 33 secs
 Yann Ylavic
 243c5fa CHANGESET →
 
 mpm_{event,worker,prefork}: late stop of children processes on restart.
>>> 
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>> 
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>> 
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>> 
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>> 
>>> A better fix maybe, Stefan?
>>> 
>>> Cheers;
>>> Yann.
>> 



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
Thanks, looks good, it probably needs a backport to 2.4.x since the
first travis failure was there (Build #1819).
Let's see how it works after several runs before maybe.

On Wed, Aug 25, 2021 at 5:02 PM ste...@eissing.org  wrote:
>
> It should be fixed now with r1892599, I believe.
>
> > Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> >
> > On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
> >>
> >> apache / httpd
> >>
> >> trunk
> >>
> >> Build #1831 was broken
> >> 21 mins and 33 secs
> >> Yann Ylavic
> >> 243c5fa CHANGESET →
> >>
> >> mpm_{event,worker,prefork}: late stop of children processes on restart.
> >
> > (unrelated to this change, Build #1819 failed the same earlier).
> >
> > Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> >
> > It seems that when exiting a stream can be destroyed while in
> >  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> > when the mplx lock is released (thus stream->id faults).
> >
> > Should the caller(s) of mst_check_data_for() pass stream->id (under
> > the lock) instead?
> > This would avoid the fault but that's still a potentially destroyed
> > stream being fifo'ed, though we are exiting so it might not matter..
> >
> > A better fix maybe, Stefan?
> >
> > Cheers;
> > Yann.
>


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org
It should be fixed now with r1892599, I believe.

> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>> 
>> apache / httpd
>> 
>> trunk
>> 
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET →
>> 
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
> 
> (unrelated to this change, Build #1819 failed the same earlier).
> 
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> 
> It seems that when exiting a stream can be destroyed while in
>  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
> 
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
> 
> A better fix maybe, Stefan?
> 
> Cheers;
> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org
Ok, brute-forced it a bit. The test suite now always does
a stop/start cycle on config changes as reload now longer
works reliably. Takes a bit more time, but not overly so
on my machine.

Cheers,
Stefan

> Am 25.08.2021 um 16:22 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 16:16 schrieb Yann Ylavic :
>> 
>> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>>> 
>>> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  
>>> wrote:
 
 I could add that in the test config as header to a certain location.
 
 However, will all new connections get to new gen children once the
 new number has been spotted?
>>> 
>>> Yes, that's guaranteed by the MPM which increases the gen number just
>>> before creating new children (after having asked the old ones to
>>> stop), no old child will ever be created with a new generation number.
>>> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
>>> always return the one that was forked.
>> 
>> Hmm, you asked for a different question actually, the new
>> *connections* done after the gen bump are not guaranteed to be handled
>> by a new gen child, because it may take some (short) time before the
>> old gen listeners stop listening.
>> There is no way to guarantee where connections will go in the first
>> times of a restart I'm afraid.
>> At least your test client can know whether it's handled by an old or
>> new gen and drop the connection in the former case..
> 
> Thanks for confirming what I suspected. Concurrency is a bitch.
> 
> Hmm, hate to add timed waits here, as who knows how Travis will
> work with those. Will experiment a bit.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 16:16 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>> 
>> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  
>> wrote:
>>> 
>>> I could add that in the test config as header to a certain location.
>>> 
>>> However, will all new connections get to new gen children once the
>>> new number has been spotted?
>> 
>> Yes, that's guaranteed by the MPM which increases the gen number just
>> before creating new children (after having asked the old ones to
>> stop), no old child will ever be created with a new generation number.
>> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
>> always return the one that was forked.
> 
> Hmm, you asked for a different question actually, the new
> *connections* done after the gen bump are not guaranteed to be handled
> by a new gen child, because it may take some (short) time before the
> old gen listeners stop listening.
> There is no way to guarantee where connections will go in the first
> times of a restart I'm afraid.
> At least your test client can know whether it's handled by an old or
> new gen and drop the connection in the former case..

Thanks for confirming what I suspected. Concurrency is a bitch.

Hmm, hate to add timed waits here, as who knows how Travis will
work with those. Will experiment a bit.

Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 4:04 PM Yann Ylavic  wrote:
>
> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
> >
> > I could add that in the test config as header to a certain location.
> >
> > However, will all new connections get to new gen children once the
> > new number has been spotted?
>
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
> always return the one that was forked.

Hmm, you asked for a different question actually, the new
*connections* done after the gen bump are not guaranteed to be handled
by a new gen child, because it may take some (short) time before the
old gen listeners stop listening.
There is no way to guarantee where connections will go in the first
times of a restart I'm afraid.
At least your test client can know whether it's handled by an old or
new gen and drop the connection in the former case..


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 16:04 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
>> 
>> 
>> 
>>> Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  
>>> wrote:
 
 Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
 test suite, it works fine.
 
 Speculating: my test suite does commonly a GET after a reload to
 see if the reloaded server is responding. And then fires requests
 against the new configuration.
 
 Now, if the server is always responding, the test sill get - sometimes -
 send request to an old gen child which then answers incorrectly. So
 I see 404 responses where a 200 should be as the config does not match.
 
 In this new world of always responding servers, when is a test suite
 supposed to know that now the server will respond with the newly laoded
 config?
>>> 
>>> Maybe we could have access to the generation number
>>> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
>>> mod_headers could use to set in a response header?
>>> Or a test only module which does that specifically could be compiled
>>> by the framework and LoadModule'd?
>> 
>> I could add that in the test config as header to a certain location.
>> 
>> However, will all new connections get to new gen children once the
>> new number has been spotted?
> 
> Yes, that's guaranteed by the MPM which increases the gen number just
> before creating new children (after having asked the old ones to
> stop), no old child will ever be created with a new generation number.
> ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
> always return the one that was forked.

Thanks, I'll give this a shot. 

Joe: as long as that is not solved, the tests will not be reliable, I'm afraid.

- Stefan

Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 3:56 PM ste...@eissing.org  wrote:
>
>
>
> > Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
> >
> > On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  
> > wrote:
> >>
> >> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> >> test suite, it works fine.
> >>
> >> Speculating: my test suite does commonly a GET after a reload to
> >> see if the reloaded server is responding. And then fires requests
> >> against the new configuration.
> >>
> >> Now, if the server is always responding, the test sill get - sometimes -
> >> send request to an old gen child which then answers incorrectly. So
> >> I see 404 responses where a 200 should be as the config does not match.
> >>
> >> In this new world of always responding servers, when is a test suite
> >> supposed to know that now the server will respond with the newly laoded
> >> config?
> >
> > Maybe we could have access to the generation number
> > (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> > mod_headers could use to set in a response header?
> > Or a test only module which does that specifically could be compiled
> > by the framework and LoadModule'd?
>
> I could add that in the test config as header to a certain location.
>
> However, will all new connections get to new gen children once the
> new number has been spotted?

Yes, that's guaranteed by the MPM which increases the gen number just
before creating new children (after having asked the old ones to
stop), no old child will ever be created with a new generation number.
ap_mpm_query(AP_MPMQ_GENERATION, ) within a child will
always return the one that was forked.


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:54 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  wrote:
>> 
>> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
>> test suite, it works fine.
>> 
>> Speculating: my test suite does commonly a GET after a reload to
>> see if the reloaded server is responding. And then fires requests
>> against the new configuration.
>> 
>> Now, if the server is always responding, the test sill get - sometimes -
>> send request to an old gen child which then answers incorrectly. So
>> I see 404 responses where a 200 should be as the config does not match.
>> 
>> In this new world of always responding servers, when is a test suite
>> supposed to know that now the server will respond with the newly laoded
>> config?
> 
> Maybe we could have access to the generation number
> (AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
> mod_headers could use to set in a response header?
> Or a test only module which does that specifically could be compiled
> by the framework and LoadModule'd?

I could add that in the test config as header to a certain location.

However, will all new connections get to new gen children once the
new number has been spotted?

> 
> Cheers;
> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 3:37 PM ste...@eissing.org  wrote:
>
> Follow up: if I do a "time.sleep(2)" after each "apachectl" in my
> test suite, it works fine.
>
> Speculating: my test suite does commonly a GET after a reload to
> see if the reloaded server is responding. And then fires requests
> against the new configuration.
>
> Now, if the server is always responding, the test sill get - sometimes -
> send request to an old gen child which then answers incorrectly. So
> I see 404 responses where a 200 should be as the config does not match.
>
> In this new world of always responding servers, when is a test suite
> supposed to know that now the server will respond with the newly laoded
> config?

Maybe we could have access to the generation number
(AP_MPMQ_GENERATION) in a variable like "MPM_GENERATION" which
mod_headers could use to set in a response header?
Or a test only module which does that specifically could be compiled
by the framework and LoadModule'd?

Cheers;
Yann.


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:15 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 15:06 schrieb ste...@eissing.org:
>> 
>> 
>> 
>>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>>> 
>>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
 
 apache / httpd
 
 trunk
 
 Build #1831 was broken
 21 mins and 33 secs
 Yann Ylavic
 243c5fa CHANGESET →
 
 mpm_{event,worker,prefork}: late stop of children processes on restart.
>>> 
>>> (unrelated to this change, Build #1819 failed the same earlier).
>>> 
>>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>>> 
>>> It seems that when exiting a stream can be destroyed while in
>>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>>> when the mplx lock is released (thus stream->id faults).
>>> 
>>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>>> the lock) instead?
>>> This would avoid the fault but that's still a potentially destroyed
>>> stream being fifo'ed, though we are exiting so it might not matter..
>>> 
>>> A better fix maybe, Stefan?
>> 
>> I think the "stream->id" needs to be save before giving up the lock. 
>> I'll make the change, but currently my test suite throws several 
>> other errors in trunk. Something changed...
>> 
>> Btw. I am working on a real fix of this messy code in http2, using 
>> a apr_pollset for monitoring connections and ongoing requests. Also
>> cleaning up other parts of the code, reducing complexity.
>> 
>> 
>> 
>> I'll see what version of trunk still runs my tests and report back.
> 
> 
> Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587 
> broke things. Note that the http2 test suite does a lot of server
> reloads.

Follow up: if I do a "time.sleep(2)" after each "apachectl" in my 
test suite, it works fine.

Speculating: my test suite does commonly a GET after a reload to 
see if the reloaded server is responding. And then fires requests
against the new configuration.

Now, if the server is always responding, the test sill get - sometimes -
send request to an old gen child which then answers incorrectly. So
I see 404 responses where a 200 should be as the config does not match.

In this new world of always responding servers, when is a test suite
supposed to know that now the server will respond with the newly laoded
config?

- Stefan

> 
> Stefan
> 
>> 
>> - Stefan
>> 
>>> 
>>> Cheers;
>>> Yann.



Re: http2 test suite

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:30 schrieb Joe Orton :
> 
> On Mon, Aug 23, 2021 at 03:03:34PM +0200, ste...@eissing.org wrote:
>> 
>> 
>>> Am 23.08.2021 um 14:40 schrieb Joe Orton :
>>> 
>>> On Mon, Aug 23, 2021 at 12:59:19PM +0200, ste...@eissing.org wrote:
 The http2 test suites in https://github.com/icing/mod_h2 and svn trunk are 
 now nearly identical. I made some small adjustment and also have them
 running in debian sid and archlinux docker images (make docker-test in
 the github repro).
>>> 
>>> It works on my machine with r1892545, which I hope does not break things 
>>> on your machine(s)?
>> 
>> Nice! Don't see a problem with that change.
>> 
>>> Started an attempt at getting Travis working here: 
>>> https://github.com/apache/httpd/pull/262
>>> 
>>> ... but it's quite likely to need some tweaks to the installed package 
>>> set to get working at minimum.
>> 
>> 
>> In my Dockerfiles on https://github.com/icing/mod_h2/tree/master/docker I
>> install everything (plus the apach2 package). Maybe that helps.
> 
> I'm not sure whether all the packages in that are really needed, e.g. 
> tqdm and pycurl?
> 

No, I use tqdm in my load scenarios, not needed to the test suite itself.

> This is as far as I got so far in Travis: 
> 
> https://app.travis-ci.com/github/apache/httpd/builds/236173863#L2985

This looks like the recent "reload" optimization, as I see similar failures now 
on my machine.

> Any idea what causes those failures?
> 
> There is just one change to test/modules/http2 which I haven't pushed to 
> trunk, is this OK?
> 
> https://github.com/notroj/httpd/commit/72a037003cae0c8c634313d0707311222fe237e0

Looks fine.

> 
> Regards, Joe
> 
> 
>> 
>> - Stefan
>> 
>>> 
>>> Regards, Joe
>>> 
>>> 
 
 Cheers, Stefan
 
> Am 20.08.2021 um 18:08 schrieb ste...@eissing.org:
> 
> Done in r1892476. Looking forward on how this works on your machines.
> 
> - Stefan
> 
>> Am 20.08.2021 um 13:50 schrieb ste...@eissing.org:
>> 
>> 
>> 
>>> Am 20.08.2021 um 13:46 schrieb Joe Orton :
>>> 
>>> On Fri, Aug 20, 2021 at 11:35:45AM +0200, Stefan Eissing wrote:
 https://github.com/apache/httpd/pull/260 
 a PR with the http2 test suite in trunk/test/modules/http2
 
 How to use:
 
 • run configure again after you checked this out
 • the following components need to be installed on your system:
 • python3, pytest
 • curl, nghttp, h2load
 
 run the tests:
> make install
> cd test
> pytest
>>> 
>>> Awesome!
>> 
>> Thanks!
>> 
 This starts the installed httpd on local ports 
 40001 + 40002, runs the test suite and tears it down again. 
 To run individual test cases, use
 
> cd test/modules/http2
> pytest -k test_004   # run all tests in test_004_post.py
> pytest -k test_004_07# run test 07 in test_004_post.py
 
 Next would be the definition to run this in a Docker file via 
 Travis. If someone familiar with that setup could help me to 
 start this?
>>> 
>>> Does this need to run in Docker, or can it run directly in a Linux vm?  
>> 
>> For me, it runs directly on MacOS and I believe it should also on any 
>> linux
>> with the prerequisites installed.
>> 
>>> It would make sense to me to add separate job in Travis for this which 
>>> has the right Debian packages installed, and adjust test/travis*.sh to 
>>> make run the tests in a similar way to how TEST_SSL etc work.
>> 
>> That is probably a good start. When this works reliably, we may add
>> a flag to the common linux script to run it. The additional packages
>> should not be really a burden for docker, I believe.
>> 
 PS. I made a PR to not disturb our existing travis setup, but if
 trunk is the better place to refine this, just say so.
>>> 
>>> FWIW I'd say this is mostly personal preference, unless you expect to 
>>> break trunk and hold up others working there, it's always fine to work 
>>> on trunk. You get the cost/benefit of Travis for your changes either 
>>> way.
>> 
>> Ok, will merge it to trunk later today or quite soon.
>> 
>>> 
>>> Regards, Joe



Re: http2 test suite

2021-08-25 Thread Joe Orton
On Mon, Aug 23, 2021 at 03:03:34PM +0200, ste...@eissing.org wrote:
> 
> 
> > Am 23.08.2021 um 14:40 schrieb Joe Orton :
> > 
> > On Mon, Aug 23, 2021 at 12:59:19PM +0200, ste...@eissing.org wrote:
> >> The http2 test suites in https://github.com/icing/mod_h2 and svn trunk are 
> >> now nearly identical. I made some small adjustment and also have them
> >> running in debian sid and archlinux docker images (make docker-test in
> >> the github repro).
> > 
> > It works on my machine with r1892545, which I hope does not break things 
> > on your machine(s)?
> 
> Nice! Don't see a problem with that change.
> 
> > Started an attempt at getting Travis working here: 
> > https://github.com/apache/httpd/pull/262
> > 
> > ... but it's quite likely to need some tweaks to the installed package 
> > set to get working at minimum.
> 
> 
> In my Dockerfiles on https://github.com/icing/mod_h2/tree/master/docker I
> install everything (plus the apach2 package). Maybe that helps.

I'm not sure whether all the packages in that are really needed, e.g. 
tqdm and pycurl?

This is as far as I got so far in Travis: 

https://app.travis-ci.com/github/apache/httpd/builds/236173863#L2985

Any idea what causes those failures?

There is just one change to test/modules/http2 which I haven't pushed to 
trunk, is this OK?

https://github.com/notroj/httpd/commit/72a037003cae0c8c634313d0707311222fe237e0

Regards, Joe


> 
> - Stefan
> 
> > 
> > Regards, Joe
> > 
> > 
> >> 
> >> Cheers, Stefan
> >> 
> >>> Am 20.08.2021 um 18:08 schrieb ste...@eissing.org:
> >>> 
> >>> Done in r1892476. Looking forward on how this works on your machines.
> >>> 
> >>> - Stefan
> >>> 
>  Am 20.08.2021 um 13:50 schrieb ste...@eissing.org:
>  
>  
>  
> > Am 20.08.2021 um 13:46 schrieb Joe Orton :
> > 
> > On Fri, Aug 20, 2021 at 11:35:45AM +0200, Stefan Eissing wrote:
> >> https://github.com/apache/httpd/pull/260 
> >> a PR with the http2 test suite in trunk/test/modules/http2
> >> 
> >> How to use:
> >> 
> >> • run configure again after you checked this out
> >> • the following components need to be installed on your system:
> >> • python3, pytest
> >> • curl, nghttp, h2load
> >> 
> >> run the tests:
> >>> make install
> >>> cd test
> >>> pytest
> > 
> > Awesome!
>  
>  Thanks!
>  
> >> This starts the installed httpd on local ports 
> >> 40001 + 40002, runs the test suite and tears it down again. 
> >> To run individual test cases, use
> >> 
> >>> cd test/modules/http2
> >>> pytest -k test_004   # run all tests in test_004_post.py
> >>> pytest -k test_004_07# run test 07 in test_004_post.py
> >> 
> >> Next would be the definition to run this in a Docker file via 
> >> Travis. If someone familiar with that setup could help me to 
> >> start this?
> > 
> > Does this need to run in Docker, or can it run directly in a Linux vm?  
>  
>  For me, it runs directly on MacOS and I believe it should also on any 
>  linux
>  with the prerequisites installed.
>  
> > It would make sense to me to add separate job in Travis for this which 
> > has the right Debian packages installed, and adjust test/travis*.sh to 
> > make run the tests in a similar way to how TEST_SSL etc work.
>  
>  That is probably a good start. When this works reliably, we may add
>  a flag to the common linux script to run it. The additional packages
>  should not be really a burden for docker, I believe.
>  
> >> PS. I made a PR to not disturb our existing travis setup, but if
> >> trunk is the better place to refine this, just say so.
> > 
> > FWIW I'd say this is mostly personal preference, unless you expect to 
> > break trunk and hold up others working there, it's always fine to work 
> > on trunk. You get the cost/benefit of Travis for your changes either 
> > way.
>  
>  Ok, will merge it to trunk later today or quite soon.
>  
> > 
> > Regards, Joe
> >>> 
> >> 
> > 
> 



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 15:06 schrieb ste...@eissing.org:
> 
> 
> 
>> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
>> 
>> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>>> 
>>> apache / httpd
>>> 
>>> trunk
>>> 
>>> Build #1831 was broken
>>> 21 mins and 33 secs
>>> Yann Ylavic
>>> 243c5fa CHANGESET →
>>> 
>>> mpm_{event,worker,prefork}: late stop of children processes on restart.
>> 
>> (unrelated to this change, Build #1819 failed the same earlier).
>> 
>> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
>> 
>> It seems that when exiting a stream can be destroyed while in
>> h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
>> when the mplx lock is released (thus stream->id faults).
>> 
>> Should the caller(s) of mst_check_data_for() pass stream->id (under
>> the lock) instead?
>> This would avoid the fault but that's still a potentially destroyed
>> stream being fifo'ed, though we are exiting so it might not matter..
>> 
>> A better fix maybe, Stefan?
> 
> I think the "stream->id" needs to be save before giving up the lock. 
> I'll make the change, but currently my test suite throws several 
> other errors in trunk. Something changed...
> 
> Btw. I am working on a real fix of this messy code in http2, using 
> a apr_pollset for monitoring connections and ongoing requests. Also
> cleaning up other parts of the code, reducing complexity.
> 
> 
> 
> I'll see what version of trunk still runs my tests and report back.


Ok, r1892586 still runs the http2 tests nicely. Seems that r1892587 
broke things. Note that the http2 test suite does a lot of server
reloads.

Stefan

> 
> - Stefan
> 
>> 
>> Cheers;
>> Yann.



Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread ste...@eissing.org



> Am 25.08.2021 um 13:53 schrieb Yann Ylavic :
> 
> On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>> 
>> apache / httpd
>> 
>> trunk
>> 
>> Build #1831 was broken
>> 21 mins and 33 secs
>> Yann Ylavic
>> 243c5fa CHANGESET →
>> 
>> mpm_{event,worker,prefork}: late stop of children processes on restart.
> 
> (unrelated to this change, Build #1819 failed the same earlier).
> 
> Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103
> 
> It seems that when exiting a stream can be destroyed while in
>  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
> when the mplx lock is released (thus stream->id faults).
> 
> Should the caller(s) of mst_check_data_for() pass stream->id (under
> the lock) instead?
> This would avoid the fault but that's still a potentially destroyed
> stream being fifo'ed, though we are exiting so it might not matter..
> 
> A better fix maybe, Stefan?

I think the "stream->id" needs to be save before giving up the lock. 
I'll make the change, but currently my test suite throws several 
other errors in trunk. Something changed...

Btw. I am working on a real fix of this messy code in http2, using 
a apr_pollset for monitoring connections and ongoing requests. Also
cleaning up other parts of the code, reducing complexity.



I'll see what version of trunk still runs my tests and report back.

- Stefan

> 
> Cheers;
> Yann.



Re: Late(r) stop of children processes on restart

2021-08-25 Thread Rainer Jung

Thanks for the headroom explanation Yann, good reading!

Rainer

Am 25.08.2021 um 13:23 schrieb Yann Ylavic:

On Tue, Jun 29, 2021 at 3:00 PM Rainer Jung  wrote:


Am 29.06.2021 um 14:31 schrieb Stefan Eissing:

Can comment really on the diff, but totally agree on the goal to minimize the 
unresponsive time and make graceful less disruptive.

So +1 for that.


+1 on the intention as well.


Checked in trunk (r1892587 + r1892595).



Not sure, whether that means people would need more headroom in the
scoreboard (which would probably warrant a sentence in CHANGES or docs
about that) or whether it just means the duration during which that
headroom is used changes (which I wouldn't care about).


The restart delay between stop and start is now minimal (no reload in
between), but the headroom needed does not change AIUI.
We still have the situation where connections (worker threads) are
active for both the new and old generations of children processes, and
its duration depends mainly on the actual lifetime of the connections.
So the current tunings still hold I think.

What changes now is that for both graceful and ungraceful restarts the
main process fully consumes one CPU (to reload) while children are
actively running (the old generation keeps accepting/processing
connections during reload), whereas before the children were tearing
down thus easing the CPUs (but filling the sockets backlogs,
potentially until exhaustion..).
So there might be a greater load spike (overall) than before on reload.

A note on the headroom while at it:
mpm_event is possibly less consumer of children (hence scoreboard
slots) on restart, because when a child is dying it stops (and thus
doesn't account for) the worker threads above the remaining number of
connections, which will accurately create children of the new
generation to scale. mpm_worker never stops threads (this improvement
never made it there AFAICT), thus by accounting for inactive threads
as active it will finally create more children of the new generation
as connections arrive (eventually reaching the limits earlier, or
blocking/waiting for worker threads in the new generation of children
overflowed by incoming connections which the main process thinks are
evenly distributed across all the children, including old
generation's).
I don't know how hard/worthy it is to align mpm_worker with mpm_event
on this, just a note..


Cheers;
Yann.


Re: Broken: apache/httpd#1831 (trunk - 243c5fa)

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 1:14 AM Travis CI  wrote:
>
> apache / httpd
>
> trunk
>
> Build #1831 was broken
> 21 mins and 33 secs
> Yann Ylavic
> 243c5fa CHANGESET →
>
> mpm_{event,worker,prefork}: late stop of children processes on restart.

(unrelated to this change, Build #1819 failed the same earlier).

Here: https://app.travis-ci.com/github/apache/httpd/jobs/533578536#L4103

It seems that when exiting a stream can be destroyed while in
  h2_mplx_s_task_done::s_task_done()::mst_check_data_for()
when the mplx lock is released (thus stream->id faults).

Should the caller(s) of mst_check_data_for() pass stream->id (under
the lock) instead?
This would avoid the fault but that's still a potentially destroyed
stream being fifo'ed, though we are exiting so it might not matter..

A better fix maybe, Stefan?

Cheers;
Yann.


Re: Late(r) stop of children processes on restart

2021-08-25 Thread Yann Ylavic
On Tue, Jun 29, 2021 at 3:00 PM Rainer Jung  wrote:
>
> Am 29.06.2021 um 14:31 schrieb Stefan Eissing:
> > Can comment really on the diff, but totally agree on the goal to minimize 
> > the unresponsive time and make graceful less disruptive.
> >
> > So +1 for that.
>
> +1 on the intention as well.

Checked in trunk (r1892587 + r1892595).

>
> Not sure, whether that means people would need more headroom in the
> scoreboard (which would probably warrant a sentence in CHANGES or docs
> about that) or whether it just means the duration during which that
> headroom is used changes (which I wouldn't care about).

The restart delay between stop and start is now minimal (no reload in
between), but the headroom needed does not change AIUI.
We still have the situation where connections (worker threads) are
active for both the new and old generations of children processes, and
its duration depends mainly on the actual lifetime of the connections.
So the current tunings still hold I think.

What changes now is that for both graceful and ungraceful restarts the
main process fully consumes one CPU (to reload) while children are
actively running (the old generation keeps accepting/processing
connections during reload), whereas before the children were tearing
down thus easing the CPUs (but filling the sockets backlogs,
potentially until exhaustion..).
So there might be a greater load spike (overall) than before on reload.

A note on the headroom while at it:
mpm_event is possibly less consumer of children (hence scoreboard
slots) on restart, because when a child is dying it stops (and thus
doesn't account for) the worker threads above the remaining number of
connections, which will accurately create children of the new
generation to scale. mpm_worker never stops threads (this improvement
never made it there AFAICT), thus by accounting for inactive threads
as active it will finally create more children of the new generation
as connections arrive (eventually reaching the limits earlier, or
blocking/waiting for worker threads in the new generation of children
overflowed by incoming connections which the main process thinks are
evenly distributed across all the children, including old
generation's).
I don't know how hard/worthy it is to align mpm_worker with mpm_event
on this, just a note..


Cheers;
Yann.


Fixed: apache/httpd#1839 (trunk - f390a45)

2021-08-25 Thread Travis CI
Build Update for apache/httpd
-

Build: #1839
Status: Fixed

Duration: 20 mins and 38 secs
Commit: f390a45 (trunk)
Author: Yann Ylavic
Message: mpm_{event,worker,prefork}: follow up to r1892587: restore ungraceful 
on MPM change.

While at it add a comment for the rationale, including for MPM motorz.


git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1892595 
13f79535-47bb-0310-9956-ffa450edef68

View the changeset: 
https://github.com/apache/httpd/compare/243c5fad0a8d...f390a459161e

View the full build log and details: 
https://app.travis-ci.com/github/apache/httpd/builds/236173878?utm_medium=notification_source=email


--

You can unsubscribe from build emails from the apache/httpd repository going to 
https://app.travis-ci.com/account/preferences/unsubscribe?repository=16806660_medium=notification_source=email.
Or unsubscribe from *all* email updating your settings at 
https://app.travis-ci.com/account/preferences/unsubscribe?utm_medium=notification_source=email.
Or configure specific recipients for build notifications in your .travis.yml 
file. See https://docs.travis-ci.com/user/notifications.



Re: svn commit: r1892587 - in /httpd/httpd/trunk: os/unix/unixd.c server/listen.c server/mpm/event/event.c server/mpm/prefork/prefork.c server/mpm/worker/worker.c

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 11:26 AM Yann Ylavic  wrote:
>
> On Wed, Aug 25, 2021 at 9:53 AM Ruediger Pluem  wrote:
> >
> > Why is this no longer needed?
>
> Thanks for noticing, I'll revert.

r1892595.

>
>
> Regards;
> Yann.


Re: svn commit: r1892587 - in /httpd/httpd/trunk: os/unix/unixd.c server/listen.c server/mpm/event/event.c server/mpm/prefork/prefork.c server/mpm/worker/worker.c

2021-08-25 Thread Yann Ylavic
On Wed, Aug 25, 2021 at 9:53 AM Ruediger Pluem  wrote:
>
> On 8/25/21 12:22 AM, yla...@apache.org wrote:
>
> > @@ -3753,16 +3761,13 @@ static int event_pre_config(apr_pool_t *
> >  if (!retained) {
> >  retained = ap_retained_data_create(userdata_key, 
> > sizeof(*retained));
> >  retained->mpm = ap_unixd_mpm_get_retained_data();
> > +retained->mpm->baton = retained;
> >  retained->max_daemons_limit = -1;
> >  if (retained->mpm->module_loads) {
> >  test_atomics = 1;
> >  }
> >  }
> >  retained->mpm->mpm_state = AP_MPMQ_STARTING;
> > -if (retained->mpm->baton != retained) {
> > -retained->mpm->was_graceful = 0;
>
> Why is this no longer needed?

Argh, forgot to revert this change, first because it's unrelated and
second because it's bad.
I first thought that the ->baton never changes (forgot why I did that
in the first place) but if a different MPM is LoadModule'd during
restart it does, and we want to be ungraceful..
Thanks for noticing, I'll revert.


Regards;
Yann.


Re: svn commit: r1892587 - in /httpd/httpd/trunk: os/unix/unixd.c server/listen.c server/mpm/event/event.c server/mpm/prefork/prefork.c server/mpm/worker/worker.c

2021-08-25 Thread Ruediger Pluem



On 8/25/21 12:22 AM, yla...@apache.org wrote:
> Author: ylavic
> Date: Tue Aug 24 22:22:40 2021
> New Revision: 1892587
> 
> URL: http://svn.apache.org/viewvc?rev=1892587=rev
> Log:
> mpm_{event,worker,prefork}: late stop of children processes on restart.
> 
> Change how the main process handles restarts, from:
> 0. 
> 1. stop old generation of children processes (graceful or not)
> 2. reload new configuration
> 3. start new generation of children processes
> to:
> 0. 
> 1. reload new configuration
> 2. stop old generation of children processes (graceful or not)
> 3. start new generation of children processes
> 
> The delay between stop and start is now very short and does not depend on the
> reload time (which can be quite long with many vhosts and/or complex setups
> with regexps or whatever third party components to compile).
> 
> Also, while reloading, the old generation of children processes keeps 
> accepting
> and handling incoming connections until the new generation is up to take over.
> 
> * os/unix/unixd.c (sig_term, sig_restart):
>   Set AP_MPMQ_STOPPING only once.
> 
> * server/listen.c (ap_duplicate_listeners):
>   Use ap_log_error() the main server instead of ap_log_perror().
> 
> * server/mpm/{event,worker,prefork}/{event,worker,prefork}.c
> ({event,worker,prefork}_retained_data):
>   Save the generation pool pointer (gen_pool) and all the buckets here, they
>   won't be cleared before the reload like pconf so they need a persitent
>   storage accross restarts (i.e. retained->gen_pool).
> 
> * server/mpm/{event,worker,prefork}/{event,worker,prefork}.c
> (perform_idle_server_maintenance, child_main, make_child):
>   Change usage of all_buckets (previously with global/static scope) to the new
>   retained->buckets array.
> 
> 
> Modified:
> httpd/httpd/trunk/os/unix/unixd.c
> httpd/httpd/trunk/server/listen.c
> httpd/httpd/trunk/server/mpm/event/event.c
> httpd/httpd/trunk/server/mpm/prefork/prefork.c
> httpd/httpd/trunk/server/mpm/worker/worker.c
> 

> Modified: httpd/httpd/trunk/server/mpm/event/event.c
> URL: 
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/mpm/event/event.c?rev=1892587=1892586=1892587=diff
> ==
> --- httpd/httpd/trunk/server/mpm/event/event.c (original)
> +++ httpd/httpd/trunk/server/mpm/event/event.c Tue Aug 24 22:22:40 2021

> @@ -3753,16 +3761,13 @@ static int event_pre_config(apr_pool_t *
>  if (!retained) {
>  retained = ap_retained_data_create(userdata_key, sizeof(*retained));
>  retained->mpm = ap_unixd_mpm_get_retained_data();
> +retained->mpm->baton = retained;
>  retained->max_daemons_limit = -1;
>  if (retained->mpm->module_loads) {
>  test_atomics = 1;
>  }
>  }
>  retained->mpm->mpm_state = AP_MPMQ_STARTING;
> -if (retained->mpm->baton != retained) {
> -retained->mpm->was_graceful = 0;

Why is this no longer needed?

> -retained->mpm->baton = retained;
> -}
>  ++retained->mpm->module_loads;
>  
>  /* test once for correct operation of fdqueue */
> 

> Modified: httpd/httpd/trunk/server/mpm/worker/worker.c
> URL: 
> http://svn.apache.org/viewvc/httpd/httpd/trunk/server/mpm/worker/worker.c?rev=1892587=1892586=1892587=diff
> ==
> --- httpd/httpd/trunk/server/mpm/worker/worker.c (original)
> +++ httpd/httpd/trunk/server/mpm/worker/worker.c Tue Aug 24 22:22:40 2021

> @@ -2024,13 +2033,10 @@ static int worker_pre_config(apr_pool_t
>  if (!retained) {
>  retained = ap_retained_data_create(userdata_key, sizeof(*retained));
>  retained->mpm = ap_unixd_mpm_get_retained_data();
> +retained->mpm->baton = retained;
>  retained->max_daemons_limit = -1;
>  }
>  retained->mpm->mpm_state = AP_MPMQ_STARTING;
> -if (retained->mpm->baton != retained) {
> -retained->mpm->was_graceful = 0;

Why is this no longer needed?

> -retained->mpm->baton = retained;
> -}
>  ++retained->mpm->module_loads;
>  
>  /* sigh, want this only the second time around */
> 
> 
> 

Regards

Rüdiger