Re: svn commit: r1897458 - in /httpd/httpd/trunk: changes-entries/ab-ssl-sense-fix.txt support/ab.c

2022-01-26 Thread Ruediger Pluem



On 1/25/22 4:54 PM, minf...@apache.org wrote:
> Author: minfrin
> Date: Tue Jan 25 15:54:22 2022
> New Revision: 1897458
> 
> URL: http://svn.apache.org/viewvc?rev=1897458&view=rev
> Log:
> ab: Respond appropriately to SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE.
> Previously the correct event was polled for, but the response to the poll
> would call write instead of read, and read instead of write. PR 55952
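For context, the OpenSSL contract here is that SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE
name the socket event to wait for, but the call to retry once that event fires is the
same call that returned the error. A minimal sketch of that pattern (illustrative only,
not the ab.c code; the helper name write_some is made up):

    #include <poll.h>
    #include <openssl/ssl.h>

    /* Sketch of the WANT_READ/WANT_WRITE contract: the error code says which
     * event to poll for, but the call retried once the socket is ready is the
     * SAME call that returned the error (here SSL_write()). */
    static int write_some(SSL *ssl, const char *buf, int len, short *want_event)
    {
        int n = SSL_write(ssl, buf, len);
        if (n > 0) {
            *want_event = 0;        /* done, nothing to poll for */
            return n;
        }
        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:
            *want_event = POLLIN;   /* poll for readability, then retry SSL_write() */
            return 0;
        case SSL_ERROR_WANT_WRITE:
            *want_event = POLLOUT;  /* poll for writability, then retry SSL_write() */
            return 0;
        default:
            return -1;              /* real error */
        }
    }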
> 
> Added:
> httpd/httpd/trunk/changes-entries/ab-ssl-sense-fix.txt
> Modified:
> httpd/httpd/trunk/support/ab.c
> 

> Modified: httpd/httpd/trunk/support/ab.c
> URL: 
> http://svn.apache.org/viewvc/httpd/httpd/trunk/support/ab.c?rev=1897458&r1=1897457&r2=1897458&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/support/ab.c (original)
> +++ httpd/httpd/trunk/support/ab.c Tue Jan 25 15:54:22 2022

> @@ -810,9 +811,6 @@ static void ssl_proceed_handshake(struct
>  
>  static void write_request(struct connection * c)
>  {
> -if (started >= requests) {
> -return;
> -}

Why is this no longer needed?

>  
>  do {
>  apr_time_t tnow;

> @@ -1461,7 +1465,6 @@ static void start_connect(struct connect
>  }
>  
>  /* connected first time */
> -set_conn_state(c, STATE_CONNECTED);

Why don't we set the state to connected any longer?

>  #ifdef USE_SSL
>  if (c->ssl) {
>  ssl_proceed_handshake(c);

> @@ -1786,7 +1799,7 @@ read_more:
>  c->read = c->bread = 0;
>  /* zero connect time with keep-alive */
>  c->start = c->connect = lasttime = apr_time_now();
> -set_conn_state(c, STATE_CONNECTED);

Why don't we set the state to connected any longer?

> +
>  write_request(c);
>  }
>  }

> @@ -2048,7 +2077,7 @@ static void test(void)
>  continue;
>  }
>  else {
> -set_conn_state(c, STATE_CONNECTED);

Why don't we set the state to connected any longer?

> +
>  #ifdef USE_SSL
>  if (c->ssl)
>  ssl_proceed_handshake(c);


Regards

Rüdiger


Re: mpm event assertion failures

2022-01-26 Thread Yann Ylavic
On Wed, Jan 26, 2022 at 5:07 PM Graham Leggett  wrote:
>
> We need to clarify expectations at this point.
>
> The trunk of httpd has a policy called “commit then review” (CTR), meaning 
> that changes are applied first, and then review is done to see what the 
> ramifications of those changes are. Some changes are at a high level and very 
> well contained, some changes such as this one are at a very low level and 
> affect the whole server. Obviously there is an expectation that one must 
> think it works before committing, but none of our contributors have access to 
> even a fraction of the number of platforms that httpd runs on, and so we must 
> rely on both our CI and the review of others (thus the “then review”) to show 
> us where things have gone wrong. Our CI is a tool to tell us what potentially 
> has gone wrong across a wide set of scenarios, far beyond the capability of 
> what a single person has access to.

I agree with all of the above; there is nothing wrong with the way you did it.
Maybe now that we have a better CI that runs on each branch, a GitHub
PR could be created first to see/test the results before checking in
to trunk (this was not an option a few years ago)?
This still allows for review and/or help if something goes wrong (by
asking on dev@ if needed), while others don't have to wait for trunk
to be working before making their own changes.

>
> The next issue is “Analyze what you broke”.
>
> I have been working on this code morning, day and night for many days now. A 
> lot of time was spent chasing what I thought was an infinite loop complaining 
> about EOF, but actually was a harmless error that should have been a debug 
> triggered every time the client disconnected. Then more time was spent trying 
> to get to the bottom of why the timeouts weren’t working, only to discover 
> that the timeouts weren’t implemented. The accusation that I have been 
> careless is highly offensive.
>
> In open source we don’t bark accusations at each other, particularly when 
> no one has seen just how much time and effort has gone into this. I have been 
> working on this code for 25 years and am not afraid to call this out, but new 
> people to open source or new to this project are going to be intimidated and 
> leave. This must not happen.
>
> Please be mindful of others working on this project, and be helpful where 
> possible.

I'm really sorry if you feel like this from what I could have said, I
very much value your contributions and efforts on httpd/apr or other
oss projects.

I had the impression from your previous message(s) that you implied
that your changes were working correctly and that the issues now needed to
be addressed by those who created (or last touched) the code that is
now exercised differently and does not work as expected (according to
you); that's why my last message(s) may have sounded a bit harsh.
It's likely a misinterpretation on my side, sorry again for that, but
communicating is not always easy (all the more so in a non-native
language) and your responses are sometimes frustrating.


Regards;
Yann.


Re: mpm event assertion failures

2022-01-26 Thread Graham Leggett
On 26 Jan 2022, at 13:49, Stefan Eissing  wrote:

> Guys, we have changes in a central part of the server and our CI fails.
> 
> It is not good enough to give other people unspecific homework to fix it.
> 
> Analyze what you broke and if we can help, we'll happily do that. But
> you need to give some more details here.

We need to clarify expectations at this point.

The trunk of httpd has a policy called “commit then review” (CTR), meaning that 
changes are applied first, and then review is done to see what the 
ramifications of those changes are. Some changes are at a high level and very 
well contained, some changes such as this one are at a very low level and 
affect the whole server. Obviously there is an expectation that one must think 
it works before committing, but none of our contributors have access to even a 
fraction of the number of platforms that httpd runs on, and so we must rely on 
both our CI and the review of others (thus the “then review”) to show us where 
things have gone wrong. Our CI is a tool to tell us what potentially has gone 
wrong across a wide set of scenarios, far beyond the capability of what a 
single person has access to.

The next issue is “Analyze what you broke”.

I have been working on this code morning, day and night for many days now. A 
lot of time was spent chasing what I thought was an infinite loop complaining 
about EOF, but actually was a harmless error that should have been a debug 
triggered every time the client disconnected. Then more time was spent trying 
to get to the bottom of why the timeouts weren’t working, only to discover that 
the timeouts weren’t implemented. The accusation that I have been careless is 
highly offensive.

In open source we don’t bark accusations at each other, particularly when no one 
has seen just how much time and effort has gone into this. I have been working 
on this code for 25 years and am not afraid to call this out, but new people to 
open source or new to this project are going to be intimidated and leave. This 
must not happen.

Please be mindful of others working on this project, and be helpful where 
possible.

Regards,
Graham
—



Re: mpm event assertion failures

2022-01-26 Thread Yann Ylavic
Stefan,

On Tue, Jan 25, 2022 at 1:12 PM Stefan Eissing  wrote:
>
> Also, while running the http2 test suite, I get repeated assert failures in 
> event.c:1230
>
> if (rv != APR_SUCCESS && !APR_STATUS_IS_EEXIST(rv)) {
> ->  AP_DEBUG_ASSERT(0);
>     TO_QUEUE_REMOVE(cs->sc->wc_q, cs);
>     apr_thread_mutex_unlock(timeout_mutex);
>     ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, APLOGNO(03465)
>                  "process_socket: apr_pollset_add failure for "
>                  "write completion");
>     close_connection(cs);
>     signal_threads(ST_GRACEFUL);
> }

This EBADF(?) in mpm_event seems to imply an APR_POLL_FILE type, which
possibly comes from the mplx's pipes?
They should not end up in the MPM pollset AIUI but I'm wondering if it
could be a synchronization / corruption issue.
The mplx pollset is not created with APR_POLLSET_THREADSAFE, can the
calls to apr_pollset_add/remove/poll() be concurrent there?
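
(For illustration, a minimal sketch of a pollset created for concurrent use with the
standard APR flag; whether the mplx pollset should be created this way is exactly the
question above. The function name create_shared_pollset is made up.)

    #include "apr_pools.h"
    #include "apr_poll.h"

    /* Sketch: a pollset that may see apr_pollset_add()/apr_pollset_remove() from
     * one thread while another thread sits in apr_pollset_poll() needs the
     * APR_POLLSET_THREADSAFE flag; the call can fail with APR_ENOTIMPL if the
     * selected poll method does not support it. */
    static apr_status_t create_shared_pollset(apr_pollset_t **ps, apr_pool_t *p)
    {
        return apr_pollset_create(ps, 64 /* max descriptors */, p,
                                  APR_POLLSET_THREADSAFE);
    }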


Cheers;
Yann.


Re: Fwd: FYI: change to travis-ci emailer could cause moderation headaches

2022-01-26 Thread Joe Orton
On Wed, Jan 26, 2022 at 08:58:14AM -0500, Eric Covener wrote:
> I noticed I stopped getting "Travis CI" emails for httpd around 10/21.
> But I see people still discussing CI failures, so I am a little
> confused. Maybe they are only seeing it in the context of PRs.
> 
> Did we lose notifications to dev@ as a result of the below? Or something else?

Yup, it's apparently exactly that.  I filed: 
https://issues.apache.org/jira/browse/INFRA-22619 a while ago.

Does anyone know anybody at Travis so we could escalate this?  It seems 
like having them configure the envelope sender properly would be the 
best fix, or at least the easiest for us.

Regards, Joe



Re: FYI: change to travis-ci emailer could cause moderation headaches

2022-01-26 Thread Yann Ylavic
On Wed, Jan 26, 2022 at 2:58 PM Eric Covener  wrote:
>
> I noticed I stopped getting "Travis CI" emails for httpd around 10/21.
> But I see people still discussing CI failures, so I am a little
> confused. Maybe they are only seeing it in the context of PRs.

I'm not receiving them anymore either (not sent to dev@ or IRC as before).
So I'm looking at https://app.travis-ci.com/github/apache/httpd/builds
manually for the discussions..

>
> Did we lose notifications to dev@ as a result of the below? Or something else?

Dunno, but it would be really nice to have them back..


Fwd: FYI: change to travis-ci emailer could cause moderation headaches

2022-01-26 Thread Eric Covener
I noticed I stopped getting "Travis CI" emails for httpd around 10/21.
But I see people still discussing CI failures, so I am a little
confused. Maybe they are only seeing it in the context of PRs.

Did we lose notifications to dev@ as a result of the below? Or something else?

-- Forwarded message -
From: sebb 
Date: Thu, Oct 28, 2021 at 11:08 AM
Subject: FYI: change to travis-ci emailer could cause moderation headaches
To: Users 


It looks like mails from bui...@travis-ci.com are now using a
different email provider.

This means that the envelope sender has changed.

They used to use mandrillapp.com, but now seem to use amazonses.com

mandrillapp.com used a dynamic envelope sender, but there was a
pattern to it which meant that the underlying sender could be detected
(see INFRA-18843)

It's not clear if the amazonses.com envelopes have a common pattern
that could be leveraged in the same way.

If not, I suspect every email from travis.com will need to be moderated.

Unless someone has a better idea of how to allow such emails without
opening the floodgates.

Sebb


-- 
Eric Covener
cove...@gmail.com


Re: mpm event assertion failures

2022-01-26 Thread Yann Ylavic
On Tue, Jan 25, 2022 at 10:26 PM Graham Leggett  wrote:
>
> On 25 Jan 2022, at 14:11, Stefan Eissing  wrote:
>
> > Also, while running the http2 test suite, I get repeated assert failures in 
> > event.c:1230
> >
> > if (rv != APR_SUCCESS && !APR_STATUS_IS_EEXIST(rv)) {
> > ->  AP_DEBUG_ASSERT(0);
> >     TO_QUEUE_REMOVE(cs->sc->wc_q, cs);
> >     apr_thread_mutex_unlock(timeout_mutex);
> >     ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, APLOGNO(03465)
> >                  "process_socket: apr_pollset_add failure for "
> >                  "write completion");
> >     close_connection(cs);
> >     signal_threads(ST_GRACEFUL);
> > }
>
> One line above is this:
>
> rv = apr_pollset_add(event_pollset, &cs->pfd);
>
> and looking up the log we fail with a bad file descriptor.
>
> This looks like it’s being triggered by this:
>
> https://github.com/apache/apr/blob/trunk/poll/unix/poll.c#L194
>
> which in turn looks like an apr_file_t is being added to the connection. Does 
> that seem familiar?

What system is this, with !APR_FILES_AS_SOCKETS and apr_pollset using
poll() instead of epoll()?
Is OSX like that? I'd expect it to use kqueue (with APR_FILES_AS_SOCKETS)..

Sorry, you will have to debug that; how an apr_file_t other than
the wakeup pipe can end up being apr_pollset_poll()ed and handled like
a connection socket doesn't sound familiar to me.

Also it seems that Travis triggers this too:
https://app.travis-ci.com/github/apache/httpd/jobs/556994472#L2627
But there it's with apr-1.7.0, so it seems unrelated to the latest apr
changes.

>
> To explain what’s changed in mod_ssl, a bug has been fixed.

I wouldn't call it that..

> In the old mod_ssl, directly after connecting we make one single attempt to 
> read zero bytes in a blocking connection. OpenSSL kicks in and performs reads 
> and writes to finish the handshake. And then if that failed for whatever 
> reason - brokenly - we throw the errors away and pretend nothing happened.
>
> https://github.com/apache/httpd/blob/2.4.x/modules/ssl/mod_ssl.c#L695
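
(For readers following along, a rough sketch of the pattern being described; this is a
simplification of what the linked code does, with error handling elided and the function
name poke_handshake made up.)

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Rough sketch of the historical behaviour described above: a blocking,
     * zero-length AP_MODE_INIT read whose only purpose is to drive the input
     * filter chain (and hence the TLS handshake); the result is discarded. */
    static void poke_handshake(conn_rec *c)
    {
        apr_bucket_brigade *temp = apr_brigade_create(c->pool, c->bucket_alloc);
        apr_status_t rv = ap_get_brigade(c->input_filters, temp,
                                         AP_MODE_INIT, APR_BLOCK_READ, 0);
        (void)rv;               /* any handshake error ends up here, ignored */
        apr_brigade_destroy(temp);
    }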

Well, the handshake and process_connection hooks in general are run in
blocking mode there.
The expectation (I suppose) is that the next process_connection hook
will catch the error too.

Btw, the new ssl_hook_process_connection() in trunk seems to do that
for AP_FILTER_ERROR still, but not for the other errors.

>
> We then get into the protocol handler, and in http/1.1 we read the now broken 
> connection a second time here:
>
> https://github.com/apache/httpd/blob/2.4.x/modules/http/http_core.c#L148

Yes but is that an issue? It seems that in 2.4.x (and before your
changes in trunk) this actually catches the former error.

Now it seems that you can't make the handshake nonblocking by just
s/APR_BLOCK_READ/APR_NONBLOCK_READ/ and pretend it should work. If it
does not, please find the cause (the consequences are already shown by
the CI).

The AH03465 does not trigger on my machine when running the pytest
suites (I even tried with apr-1.7.0), but it triggers on Travis, so I
suppose there is something with the epoll pollset too?


Regards;
Yann.


Re: mpm event assertion failures

2022-01-26 Thread Stefan Eissing



> On 25.01.2022 at 22:25, Graham Leggett  wrote:
> 
> On 25 Jan 2022, at 14:11, Stefan Eissing  wrote:
> 
>> Also, while running the http2 test suite, I get repeated assert failures in 
>> event.c:1230
>> 
>> if (rv != APR_SUCCESS && !APR_STATUS_IS_EEXIST(rv)) {
>> ->  AP_DEBUG_ASSERT(0);
>>   TO_QUEUE_REMOVE(cs->sc->wc_q, cs);
>>   apr_thread_mutex_unlock(timeout_mutex);
>>   ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, APLOGNO(03465)
>>"process_socket: apr_pollset_add failure for "
>>"write completion");
>>   close_connection(cs);
>>   signal_threads(ST_GRACEFUL);
>> }
> 
> One line above is this:
> 
>rv = apr_pollset_add(event_pollset, &cs->pfd);
> 
> and looking up the log we fail with a bad file descriptor.
> 
> This looks like it’s being triggered by this:
> 
> https://github.com/apache/apr/blob/trunk/poll/unix/poll.c#L194
> 
> which in turn looks like an apr_file_t is being added to the connection. Does 
> that seem familiar?

I cannot even parse that statement. What does "added to the connection" mean?

> To explain what’s changed in mod_ssl, a bug has been fixed. In the old 
> mod_ssl, directly after connecting we make one single attempt to read zero 
> bytes in a blocking connection. OpenSSL kicks in and performs reads and 
> writes to finish the handshake. And then if that failed for whatever reason - 
> brokenly - we throw the errors away and pretend nothing happened.
> 
> https://github.com/apache/httpd/blob/2.4.x/modules/ssl/mod_ssl.c#L695
> 
> We then get into the protocol handler, and in http/1.1 we read the now broken 
> connection a second time here:
> 
> https://github.com/apache/httpd/blob/2.4.x/modules/http/http_core.c#L148
> 
> Is the http2 code doing anything to work around mod_ssl trying to read, 
> failing, throwing away the error, and then pretending nothing happened? To 
> fix this behaviour, I suspect whatever you needed to do to work around the 
> mod_ssl bug needs to be undone.

Guys, we have changes in a central part of the server and our CI fails. 
It is not good enough to give other people unspecific homework to fix it. 

Analyze what you broke and if we can help, we'll happily do that. But
you need to give some more details here.

- Stefan

Re: trunk test failure, SSL handshake

2022-01-26 Thread Stefan Eissing



> On 25.01.2022 at 21:59, Yann Ylavic  wrote:
> 
> Stefan,
> 
> On Tue, Jan 25, 2022 at 12:58 PM Stefan Eissing  wrote:
>> 
>> Failure in https://app.travis-ci.com/github/apache/httpd/jobs/556778281
>> indicates that the SSL handshake timeout no longer is working.
>> Also logs mpm:error several times.
>> 
>> The test opens a socket and sends one byte, then waits for
>> the connection to close with the configured timeout.
>> 
>> Run locally with:
>>> pytest test/modules/http2/ -vvv -k test_h2_105_02
> 
> Is there a way to have the pytest framework talk to an httpd already
> running in gdb?
> 
> Cheers;
> Yann.

I think the current implementation of handshake handling is broken.
I added some logs in mod_reqtimeout. The test case opens a connection and sends
1 byte. Log:

[Wed Jan 26 11:14:01.496247 2022] [ssl:info] [pid 54078:tid 123145339097088] 
[client 127.0.0.1:62564] AH01964: Connection to child 130 established (server 
cgi.tests.httpd.apache.org:443)
[Wed Jan 26 11:14:01.496381 2022] [ssl:trace3] [pid 54078:tid 123145339097088] 
ssl_engine_kernel.c(2213): [client 127.0.0.1:62564] OpenSSL: Handshake: start
[Wed Jan 26 11:14:01.496420 2022] [ssl:trace3] [pid 54078:tid 123145339097088] 
ssl_engine_kernel.c(2221): [client 127.0.0.1:62564] OpenSSL: Loop: before SSL 
initialization
[Wed Jan 26 11:14:01.496430 2022] [reqtimeout:trace3] [pid 54078:tid 
123145339097088] mod_reqtimeout.c(227): [client 127.0.0.1:62564] non-block read 
start
[Wed Jan 26 11:14:01.496438 2022] [core:trace6] [pid 54078:tid 123145339097088] 
util_filter.c(1052): [client 127.0.0.1:62564] adopt full brigade to empty 
brigade in 'core_in' input filter
[Wed Jan 26 11:14:01.496443 2022] [reqtimeout:trace3] [pid 54078:tid 
123145339097088] mod_reqtimeout.c(232): [client 127.0.0.1:62564] non-block read 
return
[Wed Jan 26 11:14:01.496448 2022] [ssl:trace4] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(2499): [client 127.0.0.1:62564] OpenSSL: read 1/5 bytes from 
BIO#7f83ec704710 [mem: 7f83fd829603] (BIO dump follows)
[Wed Jan 26 11:14:01.496453 2022] [ssl:trace7] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(2429): [client 127.0.0.1:62564] 
+-+
[Wed Jan 26 11:14:01.496458 2022] [ssl:trace7] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(2466): [client 127.0.0.1:62564] | : 30  
 0|
[Wed Jan 26 11:14:01.496462 2022] [ssl:trace7] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(2471): [client 127.0.0.1:62564] 
+-+
[Wed Jan 26 11:14:01.496465 2022] [reqtimeout:trace3] [pid 54078:tid 
123145339097088] mod_reqtimeout.c(227): [client 127.0.0.1:62564] non-block read 
start
[Wed Jan 26 11:14:01.496468 2022] [core:trace6] [pid 54078:tid 123145339097088] 
util_filter.c(1076): [client 127.0.0.1:62564] reinstate full brigade to empty 
brigade in 'core_in' input filter
[Wed Jan 26 11:14:01.496473 2022] [core:trace6] [pid 54078:tid 123145339097088] 
util_filter.c(1052): [client 127.0.0.1:62564] adopt full brigade to empty 
brigade in 'core_in' input filter
[Wed Jan 26 11:14:01.496476 2022] [reqtimeout:trace3] [pid 54078:tid 
123145339097088] mod_reqtimeout.c(232): [client 127.0.0.1:62564] non-block read 
return
[Wed Jan 26 11:14:01.496479 2022] [ssl:trace4] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(2509): [client 127.0.0.1:62564] OpenSSL: I/O error, 4 bytes 
expected to read on BIO#7f83ec704710 [mem: 7f83fd829604]
[Wed Jan 26 11:14:01.496484 2022] [ssl:trace3] [pid 54078:tid 123145339097088] 
ssl_engine_kernel.c(2250): [client 127.0.0.1:62564] OpenSSL: Exit: error in 
before SSL initialization
[Wed Jan 26 11:14:01.496492 2022] [ssl:trace6] [pid 54078:tid 123145339097088] 
ssl_engine_io.c(1523): [client 127.0.0.1:62564] Want read during nonblocking 
accept
[Wed Jan 26 11:14:01.496495 2022] [ssl:debug] [pid 54078:tid 123145339097088] 
mod_ssl.c(735): [client 127.0.0.1:62564] AH10371: SSL handshake in progress, 
continuing
...nothing more...

The last line means ssl_hook_process_connection() returns the new AGAIN to 
the MPM, which means the socket goes into polling with the usual Timeout.

mod_reqtimeout has no chance to impose its configured handshake timeout.
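
(To illustrate, a rough sketch of what that flow amounts to; this is not the trunk code,
the AGAIN return value is assumed from the description above, and the function name
handshake_in_progress is made up.)

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Illustrative only: roughly the flow the log above shows.  The speculative
     * handshake read returns EAGAIN, the hook returns the (assumed) new AGAIN
     * value, and the MPM re-polls the socket with the generic Timeout; nothing
     * in this path applies mod_reqtimeout's configured handshake timeout. */
    static int handshake_in_progress(conn_rec *c)
    {
        apr_bucket_brigade *temp = apr_brigade_create(c->pool, c->bucket_alloc);
        apr_status_t rv = ap_get_brigade(c->input_filters, temp,
                                         AP_MODE_INIT, APR_NONBLOCK_READ, 0);
        apr_brigade_destroy(temp);

        if (APR_STATUS_IS_EAGAIN(rv)) {
            return AGAIN;       /* assumed new return value; MPM polls the socket */
        }
        return (rv == APR_SUCCESS) ? OK : DONE;
    }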

What is the intention behind the change in handshake handling? To not block a 
worker thread during its processing?

Kind Regards,
Stefan