Assuming this really happens at thread start of an HTTP/2 worker, the following 
change was made in revision 1874909. The stack trace indicates a 64-bit system.

Is someone making assumptions about the content of connection->id here? The 
winnt MPM? Another module that freaks out? Or do I just not see the problem...


--- httpd/httpd/branches/2.4.x/modules/http2/h2_task.c  2020/03/06 16:14:06     1874908
+++ httpd/httpd/branches/2.4.x/modules/http2/h2_task.c  2020/03/06 16:15:17     1874909
@@ -555,37 +555,36 @@ apr_status_t h2_task_do(h2_task *task, a
     task->worker_started = 1;
     
     if (c->master) {
-        /* Each conn_rec->id is supposed to be unique at a point in time. Since
+        /* See the discussion at <https://github.com/icing/mod_h2/issues/195>
+         *
+         * Each conn_rec->id is supposed to be unique at a point in time. Since
          * some modules (and maybe external code) uses this id as an identifier
          * for the request_rec they handle, it needs to be unique for slave 
          * connections also.
-         * The connection id is generated by the MPM and most MPMs use the formula
-         *    id := (child_num * max_threads) + thread_num
-         * which means that there is a maximum id of about
-         *    idmax := max_child_count * max_threads
-         * If we assume 2024 child processes with 2048 threads max, we get
-         *    idmax ~= 2024 * 2048 = 2 ** 22
-         * On 32 bit systems, we have not much space left, but on 64 bit systems
-         * (and higher?) we can use the upper 32 bits without fear of collision.
-         * 32 bits is just what we need, since a connection can only handle so
-         * many streams. 
+         *
+         * The MPM module assigns the connection ids and mod_unique_id is using
+         * that one to generate identifier for requests. While the implementation
+         * works for HTTP/1.x, the parallel execution of several requests per
+         * connection will generate duplicate identifiers on load.
+         * 
+         * The original implementation for slave connection identifiers used 
+         * to shift the master connection id up and assign the stream id to the 
+         * lower bits. This was cramped on 32 bit systems, but on 64bit there was
+         * enough space.
+         * 
+         * As issue 195 showed, mod_unique_id only uses the lower 32 bit of the
+         * connection id, even on 64bit systems. Therefore collisions in request ids.
+         *
+         * The way master connection ids are generated, there is some space "at the
+         * top" of the lower 32 bits on allmost all systems. If you have a setup
+         * with 64k threads per child and 255 child processes, you live on the edge.
+         *
+         * The new implementation shifts 8 bits and XORs in the worker
+         * id. This will experience collisions with > 256 h2 workers and heavy
+         * load still. There seems to be no way to solve this in all possible 
+         * configurations by mod_h2 alone. 
          */
-        int slave_id, free_bits;
-        
-        task->id = apr_psprintf(task->pool, "%ld-%d", c->master->id, 
-                                task->stream_id);
-        if (sizeof(unsigned long) >= 8) {
-            free_bits = 32;
-            slave_id = task->stream_id;
-        }
-        else {
-            /* Assume we have a more limited number of threads/processes
-             * and h2 workers on a 32-bit system. Use the worker instead
-             * of the stream id. */
-            free_bits = 8;
-            slave_id = worker_id; 
-        }
-        task->c->id = (c->master->id << free_bits)^slave_id;
+        task->c->id = (c->master->id << 8)^worker_id;
     }
         
     h2_beam_create(&task->output.beam, c->pool, task->stream_id, "output", 
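
To make the arithmetic in that comment concrete, here is a minimal sketch in C.
Only the two formulas come from the diff. The function names, the use of
uint64_t in place of the real conn_rec->id type, and low32() as a stand-in for
a consumer that keeps just the lower 32 bits of the id are assumptions made
for illustration, not the mod_http2 or mod_unique_id code.

```c
#include <stdint.h>

/* Hypothetical helpers; only the two formulas are taken from the diff. */

/* Old scheme on 64-bit builds: master id shifted up by 32 bits,
 * stream id XORed into the lower 32 bits. */
uint64_t old_slave_id(uint64_t master_id, uint32_t stream_id)
{
    return (master_id << 32) ^ stream_id;
}

/* New scheme from r1874909: master id shifted up by 8 bits,
 * worker id XORed into the low bits. */
uint64_t new_slave_id(uint64_t master_id, uint32_t worker_id)
{
    return (master_id << 8) ^ worker_id;
}

/* Assumption: models a consumer that looks only at the lower 32 bits
 * of the connection id, as the comment says mod_unique_id does. */
uint32_t low32(uint64_t id)
{
    return (uint32_t)id;
}
```

With these, low32(old_slave_id(1, 5)) and low32(old_slave_id(2, 5)) are both
5, so two master connections running the same stream id look identical. Under
the new scheme, low32(new_slave_id(1, 5)) and low32(new_slave_id(2, 5)) differ,
but new_slave_id(1, 261) equals new_slave_id(0, 5), because a worker id above
255 spills into the bits that hold the master id. A master id of 1 << 24 or
more (256 children with 64k threads each, by the MPM formula) no longer fits
below bit 32 after the shift.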


Stefan Eissing

<green/>bytes GmbH
Hafenweg 16
48155 Münster
www.greenbytes.de

> On 14.04.2020 at 14:12, Eric Covener <cove...@gmail.com> wrote:
> 
> On Tue, Apr 14, 2020 at 8:09 AM Ruediger Pluem <rpl...@apache.org> wrote:
>> 
>> 
>> 
>> On 4/14/20 12:22 PM, Steffen wrote:
>>> 
>>> 
>>> This is the post above of backtrace
>> 
>> Thanks.
>> 
>>> 
>>> By accident I've seen that Perl comes with GDB. This might help as well.
>>> I called httpd.exe from GDB with "-X -e debug" and then called a Perl URL 
>>> in the browser.
>>> 
>>> Excerpt below:
>>> 
>> 
>> Somehow the below wasn't visible in the original mail.
>> 
>>> Thread 100 received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 4936.0x23e0]
>>> 0x00007ffbe57515d9 in libhttpd!ap_get_server_built () from X:\Apps\Apache24\bin\libhttpd.dll
>>> (gdb) bt
>>> #0  0x00007ffbe57515d9 in libhttpd!ap_get_server_built () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #1  0x00007ffbe44d14aa in ?? () from X:\Apps\Apache24\modules\mod_cgi.so
>>> #2  0x00007ffbe575ee85 in libhttpd!ap_run_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #3  0x00007ffbe575da7f in libhttpd!ap_invoke_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #4  0x00007ffbe575a62a in libhttpd!ap_internal_redirect_handler () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #5  0x00007ffbe575a6af in libhttpd!ap_process_request () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #6  0x00007ffbe22888ef in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #7  0x00007ffbe5761545 in libhttpd!ap_run_process_connection () from X:\Apps\Apache24\bin\libhttpd.dll
>>> #8  0x00007ffbe22885ba in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #9  0x00007ffbe228c36e in ?? () from X:\Apps\Apache24\modules\mod_http2.so
>>> #10 0x00007ffbe9e30e72 in ucrtbase!_beginthreadex () from C:\Windows\System32\ucrtbase.dll
>>> #11 0x00007ffbea107bd4 in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
>>> #12 0x00007ffbebecced1 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
>>> #13 0x0000000000000000 in ?? ()
>>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>>> (gdb)
>>> 
>> 
>> 
>> Unfortunately this stacktrace does not help. One reason might be that the 
>> debugging symbols are missing.
>> It is very strange that it segfaults in ap_get_server_built, a simple 
>> function just returning a pointer
>> to a static string constant. Furthermore ap_get_server_built is not called 
>> by mod_cgi.
>> Can the crash be repeated against a binary with debugging symbols that are 
>> then used to generate the stacktrace?
>> As I am not a Windows guy, I unfortunately cannot provide any instructions 
>> how to do this.
> 
> My experience on windows is that if the PDB's are not 110% right you
> will get all kinds of misleading stuff above the first ?? in the
> displayed backtrace.
