Re: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-14 Thread Don Poitras
Eric,
  The timestamp changed, but the content looks the same. 

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513



Re: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-14 Thread Don Poitras
Eric,
  http://people.apache.org/~covener/trunk-proxy-segv.diff looks the same. Did
you place it somewhere else?

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513



RE: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-13 Thread Don Poitras
Eric,
  It's in the works. This involves several other people: the large server is
owned by another group, getting our backend software ready to be called from
httpd is somewhat involved, and the build needs to be done by someone else as
well. I've asked them to build with --enable-pool-concurrency-check, since
that's what lets me see which two threads are stepping on each other. I looked
at your change and see that you changed the logic so that the worker mutex is
taken for the first call to ap_proxy_initialize_worker(). I don't think that
hurts anything, and it probably makes the code easier to read, but I don't
think it has any effect: in the first call (the one that allocates the pool),
there is no way for any other thread to be using the pool. You also moved the
worker mutex unlock so that it only happens when worker->s->hmax is set, which
leaves the two calls to connection_constructor() unserialized. My notes don't
contain definite proof that those calls must be serialized, so hopefully the
test will show whether this is OK. However, even in the case where
worker->s->hmax is not set, we need to unlock the worker mutex before
returning. If that shows up as a problem in the test, I can make that small
change and run again. We should also put back the unlock call in the error
path of init_conn_pool(), as we shouldn't leave the mutex locked if we error
out of that routine.
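
To make the unlock concern concrete, here's a rough sketch of the pattern I
have in mind. The names (initialize_worker_sketch, init_backend) are
hypothetical stand-ins, not the actual proxy_util.c code; the point is only
that every return path releases the worker mutex:

#include <apr_pools.h>
#include <apr_thread_mutex.h>

/* Hypothetical stand-in for init_conn_pool(): allocate some per-worker
 * state from the shared pool. */
static apr_status_t init_backend(apr_pool_t *pool)
{
    return apr_palloc(pool, 64) ? APR_SUCCESS : APR_ENOMEM;
}

static apr_status_t initialize_worker_sketch(apr_thread_mutex_t *worker_mutex,
                                             apr_pool_t *worker_pool,
                                             int hmax)
{
    apr_status_t rv = apr_thread_mutex_lock(worker_mutex);
    if (rv != APR_SUCCESS)
        return rv;

    rv = init_backend(worker_pool);              /* stands in for init_conn_pool() */
    if (rv != APR_SUCCESS) {
        apr_thread_mutex_unlock(worker_mutex);   /* error path: still unlock */
        return rv;
    }

    if (hmax) {
        /* reslist creation / connection_constructor() work happens here */
    }

    apr_thread_mutex_unlock(worker_mutex);       /* hmax == 0 path: still unlock */
    return APR_SUCCESS;
}
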
  Also, when this is finalized, all the new calls to APLOGNO need to go through 
your utility to assign message numbers.

> -Original Message-
> From: Eric Covener 
> Sent: Monday, August 12, 2019 3:25 PM
> To: Apache HTTP Server Development List 
> Subject: Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
> 
> Hi Don, can you try this very similar patch? I applied yours to my
> sandbox to get more context and made a few minor changes (including
> pre-existing stuff that looked misleading)
> 
> http://people.apache.org/~covener/trunk-proxy-segv.diff

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513


RE: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-12 Thread Don Poitras
Eric,
  I'm not sure what you're asking. The global mutex covers the call to
init_conn_pool(), so there isn't a problem with worker->cp getting overlaid,
if that's what you're thinking. The problem is that ap_proxy_initialize_worker()
gets called repeatedly with the same worker pointer. The first time it's called 
for a particular worker, it creates a connection pool and the worker thread 
mutex. This mutex appears to only be used in ap_proxy_determine_connection() to 
protect setting of worker->cp->addr by apr_sockaddr_info_get() (it also 
protects use of worker->cp->pool, but it's not clear that was recognized as a 
requirement). But _every_ use of worker->cp->pool must be serialized. 
Subsequent calls to ap_proxy_initialize_worker() with the same worker pointer 
will use the pool and mutex created in the first call. It's only these calls 
after the first that are at risk for concurrent use of the pool (hence my use 
of the 'notfirst' flag). In the first call, I don't even bother getting the 
proxy mutex as there isn't any way for other threads to be using the pool 
before it's created.
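
Roughly, the pattern I'm describing looks like this. It's a sketch with
hypothetical names (worker_state, init_worker_sketch), not the submitted
patch itself:

#include <apr_pools.h>
#include <apr_thread_mutex.h>

struct worker_state {                /* hypothetical, not proxy_worker */
    apr_pool_t *cp_pool;             /* shared connection-pool pool */
    apr_thread_mutex_t *tmutex;      /* worker thread mutex */
};

static apr_status_t init_worker_sketch(struct worker_state *w,
                                       apr_pool_t *parent)
{
    apr_status_t rv;
    int notfirst = (w->cp_pool != NULL);

    if (!notfirst) {
        /* First call: no other thread can reference the pool yet, so no lock. */
        if ((rv = apr_pool_create(&w->cp_pool, parent)) != APR_SUCCESS)
            return rv;
        return apr_thread_mutex_create(&w->tmutex, APR_THREAD_MUTEX_DEFAULT,
                                       w->cp_pool);
    }

    /* Later calls: the pool may already be in use elsewhere, so serialize. */
    if ((rv = apr_thread_mutex_lock(w->tmutex)) != APR_SUCCESS)
        return rv;
    /* ... any allocation from w->cp_pool goes here ... */
    apr_thread_mutex_unlock(w->tmutex);
    return APR_SUCCESS;
}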

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513


RE: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-12 Thread Don Poitras
Eric,
  The global mutex only serializes concurrent calls to
ap_proxy_initialize_worker(). The worker pool is also used when proxy_handler()
is called from a thread kicked off by a _previous_ call to
ap_proxy_initialize_worker(). Turning on the pool concurrency check shows this
happening:

pool concurrency check: pool 0xa04348(proxy_worker_cp), thread cur 7f25a21fc700 
in use by 7f2598bed700, state in use -> in use

172256 .  Thawed  libc:gsignal (+0x37)   <--- abort
  1 libc:gsignal (+0x37) 
  2 libc:abort (+0x143)  
  3 libapr-1:\apr_pools\pool_concurrency_abort 768   
  4 libapr-1:\apr_pools\apr_palloc 778 (+0xA)
  5 libapr-1:\thread_cond\apr_thread_cond_create 44  
  6 libaprutil-1:\apr_reslist\apr_reslist_create 299 (+0x9)  
  7 mod_proxy:\proxy_util\ap_proxy_initialize_worker 2013 (+0x2F)
  8 mod_proxy:\mod_proxy\proxy_handler 1176 (+0xE)   
  9 httpd:UNKNOWN at 0x004543A0  

Here's the thread the diagnostic said had the pool 'in use' when apr_palloc()
was called:

172271  Thawed  libc:__GI_strncmp (+0x1680)   <-- in use
  1 libc:__GI_strncmp (+0x1680)
  2 libc:getenv (+0xBD)
  3 libc:__nscd_getai (+0x3D3)
  4 libc:gaih_inet.constprop.8 (+0x15F2)
  5 libc:getaddrinfo (+0x11F) 
  6 libapr-1:\sockaddr\call_resolver 397 (+0x10)
  7 mod_proxy:\proxy_util\ap_proxy_determine_connection 2506 (+0x15)
  8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D)
  9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18)
 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16)
 11 httpd:UNKNOWN at 0x004543A0


Another possible fix would be to use the global mutex everywhere it's needed
and remove the creation and use of the worker thread lock altogether, but that
would mean changing the code that currently uses the worker thread lock, and
it seemed cleaner to change just proxy_util.c rather than two source modules.

The two functions in ap_proxy_acquire_connection() that I serialized were
identified in other runs. Both apr_reslist_acquire() and
connection_constructor() end up using the same worker pool that is being used
by ap_proxy_determine_connection(). For example:

pool concurrency check: pool 0x1d93198(proxy_worker_cp), thread cur 
7f019bfff700 in use by 7f019abfd700, state in use -> in use

aborting thread:
1  Thread 0x7f019bfff700 (LWP 106622) 0x7f01a735c207 in raise () from /lib64/libc.so.6
  3 libapr-1:\apr_pools\pool_concurrency_abort 768
  4 libapr-1:\apr_pools\apr_palloc 778 (+0xA)
  5 libaprutil-1:\apr_reslist\get_container 100 (+0x8)
  6 libaprutil-1:\apr_reslist\apr_reslist_acquire 121 (+0x3)
  7 mod_proxy:\proxy_util\ap_proxy_acquire_connection 2327 (+0x3)
  8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1933 (+0x19)
  9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18)
 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16)
 11 httpd:UNKNOWN at 0x004543A0

And the "in-use" thread:

5  Thread 0x7f019abfd700 (LWP 106624) 0x7f01a7424bed in connect () from /lib64/libc.so.6
  8 mod_proxy:\proxy_util\ap_proxy_determine_connection 2537 (+0x15)
  9 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D)
 10 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18)
 11 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16)

Once the workers are up and running, they use thread-specific storage, but all 
the code that starts them up uses a pool that needs to be serialized.

worker->cp as used in ap_proxy_acquire_connection() is currently not
serialized when running with the worker MPM; it can run at the same time as
ap_proxy_initialize_worker(), with the same pool used in both.
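
The serialization I added around the acquire path looks roughly like this. It's
a sketch with a hypothetical wrapper (acquire_connection_sketch), not the patch
itself; the point is that the same mutex guarding the worker's pool has to be
held across the acquire, because the constructor can allocate from that pool:

#include <apr_reslist.h>
#include <apr_thread_mutex.h>

static apr_status_t acquire_connection_sketch(apr_reslist_t *reslist,
                                              apr_thread_mutex_t *tmutex,
                                              void **resource)
{
    apr_status_t rv = apr_thread_mutex_lock(tmutex);
    if (rv != APR_SUCCESS)
        return rv;

    /* apr_reslist_acquire() may call the constructor, which allocates from
     * the worker pool, so it runs under the same lock that guards the pool. */
    rv = apr_reslist_acquire(reslist, resource);

    apr_thread_mutex_unlock(tmutex);
    return rv;
}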

The problem doesn't show up when running the same test with the prefork or
event MPMs.

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513


RE: [PATCH 63503] - Reverse proxy server - SIGSEGV

2019-08-11 Thread Don Poitras
Hello,
  I see that proxy_util.c was updated a few days ago. Would it be considered a
faux pas to contact the developer directly and request a review of this patch?
It's been almost two months and I'm worried it will just get lost in the noise.

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513


[PATCH 63503] - Reverse proxy server - SIGSEGV

2019-06-21 Thread Don Poitras
Hello. I submitted this patch last week and I'd just like to solicit comments 
on how to proceed. See this for the text of the bug and fix: 
https://bz.apache.org/bugzilla/show_bug.cgi?id=63503

Unfortunately, I don't have a test that exhibits the bug that I can ask others
to run; we were only able to see it when running a LoadRunner script emulating
thousands of concurrent users.

As I see it, there are two potential problems with the fix: 1) adding mutex
calls single-threads code that previously could run on multiple processors,
which imposes a performance hit, and 2) there's a possibility of a "deadly
embrace" (deadlock) if some code path takes the locks in an order not seen in
our tests.

I think #2 would become evident pretty quickly if we could get a test run that
exercises the mod_proxy code more fully than our one server does. #1 I think is
just the overhead required for thread safety. It would be nice if all APR pools
were required to be accessed by only a single thread, but if they aren't, then
they must be serialized externally wherever concurrent access is possible.
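
For #2, the usual guard is a fixed lock order. Here's a rough illustration
(a hypothetical ordering chosen for the example, not something the patch
establishes): if every path that needs both locks takes the global proxy mutex
before the per-worker mutex, two threads can never hold them in opposite
orders.

#include <apr_global_mutex.h>
#include <apr_thread_mutex.h>

static apr_status_t locked_op_sketch(apr_global_mutex_t *proxy_mutex,
                                     apr_thread_mutex_t *worker_mutex)
{
    apr_status_t rv = apr_global_mutex_lock(proxy_mutex);    /* always first */
    if (rv != APR_SUCCESS)
        return rv;

    rv = apr_thread_mutex_lock(worker_mutex);                /* always second */
    if (rv == APR_SUCCESS) {
        /* ... touch the shared worker state here ... */
        apr_thread_mutex_unlock(worker_mutex);
    }

    apr_global_mutex_unlock(proxy_mutex);
    return rv;
}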

The availability of the "--enable-pool-concurrency-check" build option made
fixing this problem a lot easier. I couldn't find a previous report of its use,
but obviously someone thought it would be helpful.

Let me know if there are questions I can answer about the patch or the core 
files that the bug produced. Thanks!

-- 
Don Poitras - Host R&D  SAS Institute Inc.  SAS Campus Drive
mailto:sas...@sas.com  (919)531-5637  Fax:677-  Cary, NC 27513