On Sat, 13 Dec 2025 at 20:43, Branko Čibej <[email protected]> wrote:
> On 13. 12. 25 20:14, Daniel Sahlberg wrote:
>> On Sat, 13 Dec 2025 at 16:39, Branko Čibej <[email protected]> wrote:
>>> On 13. 12. 25 03:35, Branko Čibej wrote:
>>>> On 12. 12. 25 20:49, Branko Čibej wrote:
>>>>> On 12. 12. 25 20:02, Daniel Sahlberg wrote:
>>>>>> On Thu, 11 Dec 2025 at 19:43, Branko Čibej <[email protected]> wrote:
>>>>>>> On 11. 12. 25 19:59, Daniel Sahlberg wrote:
>>>>>>>> * Issue SERF-195, which is a substantial rewrite of the request
>>>>>>>>   queues. The code is in a separate branch SERF-195. The code
>>>>>>>>   looks good to me but I haven't analyzed it in detail. It should
>>>>>>>>   be possible to merge to trunk.
>>>>>>>
>>>>>>> The patch works, in the sense that it causes no regressions, but I
>>>>>>> haven't been able to reproduce the reported issue. It also makes
>>>>>>> the code slightly more readable, but I couldn't find the mentioned
>>>>>>> race condition during review; it looks like just a refactoring of
>>>>>>> the original code.
>>>>>>>
>>>>>>> I think we need another pair of eyes to do a careful review of the
>>>>>>> branch changes.
>>>>>>
>>>>>> I'll try to reproduce the initial issue and see if I can figure out
>>>>>> what goes wrong. Do you have any ideas where I should start? I
>>>>>> assume a multithreaded client (with multiple connections) would be
>>>>>> needed?
>>>>>
>>>>> Multi-threading on the client has no effect here. Request queues are
>>>>> per-context and contexts require single-threaded access.
>>>>>
>>>>> The original issue submission just adds a sleep() in
>>>>> serf__setup_request_basic_auth(), so I guess you don't need a
>>>>> multi-threaded client, you just need a server that always requires
>>>>> authentication. That's why, e.g., linking Subversion with that and
>>>>> doing a read from the Apache repos won't work.
>>>>> You could try to set up a local multi-threaded HTTP server that
>>>>> requires basic auth and serves some data -- easy to do in Python --
>>>>> then write a Serf-based client that sends requests in a loop.
>>>>> serf_get would be a good starting point.
>>>>>
>>>>> I think the important points here are:
>>>>>
>>>>>   * all requests to the server must require authentication;
>>>>>   * the server must be able to process requests in parallel and out
>>>>>     of order: this is the only way I can think of where processed
>>>>>     requests could be removed from the middle of the queue, not the
>>>>>     beginning or the end;
>>>>>   * the server must be "slow" enough that the client request queue
>>>>>     fills up, so it needs both an upper limit to the number of
>>>>>     parallel requests and a fake latency for the responses.
>>>>>
>>>>> I'll give the server script a go.
>>>>
>>>> See r1930478.
>>>>
>>>> Try something like this:
>>>>
>>>>   $ authserver.py 2>/dev/null &
>>>>   $ serf_get -n 400 -c 10 -x 20 -U user -P pass -m HEAD http://127.0.0.1:8087/
>>>>
>>>> However, this still doesn't reproduce the reported issue. One
>>>> possible reason is that serf_get does *not* pipeline the requests in
>>>> each connection but only sends the next request when the previous
>>>> response has arrived. I haven't been able to track this down; I don't
>>>> know if it's a problem with serf_get or authserver.py.
>>>
>>> I was mistaken about the pipelining. After adding some instrumentation
>>> to serf_get (see below), it turns out that requests are pipelined, and
>>> yet I can't reproduce the issue.
>>>
>>> This is the patch for the basic auth handler, the same as in the
>>> description of SERF-195:
>>>
>>> --- serf-trunk/auth/auth_basic.c        (revision 1930477)
>>> +++ serf-trunk/auth/auth_basic.c        (working copy)
>>> @@ -19,6 +19,7 @@
>>>   */
>>>
>>>  /*** Basic authentication ***/
>>> +#include <unistd.h>
>>>
>>>  #include <serf.h>
>>>  #include <serf_private.h>
>>> @@ -153,6 +154,7 @@ serf__setup_request_basic_auth(const serf__authn_schem
>>>      basic_info = authn_info->baton;
>>>
>>>      if (basic_info && basic_info->header && basic_info->value) {
>>> +        usleep(75000); /* 75 milliseconds */
>>>          serf_bucket_headers_setn(hdrs_bkt, basic_info->header,
>>>                                   basic_info->value);
>>>          return APR_SUCCESS;
>>>
>>> I also noticed something interesting: each connection sends up to
>>> max-pending requests at once, but then waits for all the replies
>>> before sending the next batch. I haven't checked if this is a bug in
>>> serf_get or in Serf proper; the connection has more requests ready, so
>>> it should send new ones as soon as any received response makes room in
>>> the pending requests queue. I saw this with the following invocation:
>>>
>>>   serf_get -d -n 500 -c 1 -x 50 -U user -P something -m GET http://127.0.0.1:8087/ >/dev/null
>>>
>>> The instrumentation patch for serf_get:
>>>
>>> --- serf-trunk/test/serf_get.c  (revision 1930478)
>>> +++ serf-trunk/test/serf_get.c  (working copy)
>>> @@ -590,6 +590,7 @@ int main(int argc, const char **argv)
>>>      apr_getopt_t *opt;
>>>      int opt_c;
>>>      const char *opt_arg;
>>> +    unsigned int prev_min = 0, prev_max = UINT_MAX, prev_all = 0;
>>>
>>>      apr_initialize();
>>>      atexit(apr_terminate);
>>> @@ -910,6 +911,24 @@ int main(int argc, const char **argv)
>>>              break;
>>>          }
>>>          /* Debugging purposes only!
>>>           */
>>> +        if (debug) {
>>> +            unsigned int min = UINT_MAX, max = 0, all = 0;
>>> +            for (i = 0; i < conn_count; i++) {
>>> +                serf_connection_t *conn = connections[i];
>>> +                unsigned int pending =
>>> +                    (serf_connection_pending_requests(conn)
>>> +                     - serf_connection_queued_requests(conn));
>>> +                all += pending;
>>> +                if (min > pending) min = pending;
>>> +                if (max < pending) max = pending;
>>> +            }
>>> +            if (prev_min != min || prev_max != max || prev_all != all) {
>>> +                fprintf(stderr, ">pending: %u [avg=%0.1f min=%u max=%u]\n",
>>> +                        all, (double)all/conn_count, min, max);
>>> +                prev_min = min;
>>> +                prev_max = max;
>>> +                prev_all = all;
>>> +            }
>>> +        }
>>>          serf_debug__closed_conn(app_ctx.bkt_alloc);
>>>      }
>>>
>>> -- Brane
>>
>> With a longer delay in authserver.py (10.0), a high enough number of
>> concurrent connections (15 seems enough), and enough outstanding
>> requests (>2), I seem to get an error:
>>
>> [[[
>> dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 5 -U foo -P bar http://127.0.0.1:8087
>> ...
>> Error running context: (70014) End of file found
>> ]]]
>>
>> With 2 outstanding requests:
>>
>> [[[
>> dsg@devi-25-01:~/serf_trunk/test$ ./serf_get -d -n 150 -c 15 -x 2 -U foo -P bar http://127.0.0.1:8087
>> ...
>> 2025-12-13T19:06:06.161609+00 ERROR [l:127.0.0.1:35170 r:127.0.0.1:8087]
>> receiving raw: Error 104 while reading.
>> Error running context: (104) Connection reset by peer
>> ]]]
>>
>> If I add the usleep(), I can't reproduce these errors anymore. It seems
>> to be exactly the opposite of what the original poster says -- as far
>> as I understand, they claim that the error is triggered by adding a
>> usleep?

> SERF-195 is about memory corruption and crashes on the client. This
> appears to be something else.
>
> In any case, it would make sense to retry your test with serf_get from
> the SERF-195 branch (but note that branch needs to be sync-merged from
> trunk).
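The kind of test server Brane describes (every request requires basic auth, connections are served in parallel so responses can complete out of order, and a fake latency lets the client's request queue fill up) might look roughly like the sketch below. This is only an illustration of that recipe, not the actual authserver.py committed in r1930478; the credentials, port, and latency value are made up.

```python
import base64
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Illustrative values -- the real authserver.py may use different ones.
USER, PASSWORD = "user", "pass"
FAKE_LATENCY = 0.1  # seconds; slows responses so the client queue fills up

EXPECTED = "Basic " + base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()

class AuthHandler(BaseHTTPRequestHandler):
    def _respond(self):
        if self.headers.get("Authorization") != EXPECTED:
            # Every request must require authentication.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="test"')
            self.end_headers()
            return
        time.sleep(FAKE_LATENCY)  # simulate a slow backend
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(body)

    do_GET = do_HEAD = _respond

    def log_message(self, *args):
        pass  # keep stderr quiet, like running authserver.py 2>/dev/null

def serve(port=8087):
    # ThreadingHTTPServer runs each connection in its own thread, so
    # requests can be processed in parallel and finish out of order.
    ThreadingHTTPServer(("127.0.0.1", port), AuthHandler).serve_forever()
```

Running serve() and pointing serf_get at http://127.0.0.1:8087/ should then exercise the retry-with-credentials path on every request.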
Tried it and it crashes the same way -- which, as Brane correctly notes,
isn't quite the same as described in this issue.

I've been thinking about the usleep() and it doesn't make sense that it
should introduce any problems for a single-threaded application. From how
I understand the description, it seems like something is removing stuff
from the linked lists during the sleep. But if there is only one thread,
that thread should be the only one working the lists, so nothing should
happen.

Could it be that the original poster is reusing the same context across
multiple threads? If so, this is not a supported use case and we should
close this issue?

Cheers,
Daniel
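Daniel's argument can be illustrated with a toy model (plain Python, nothing here is Serf's actual API; the list and function names are made up). A sleep inside single-threaded queue handling changes nothing, because no other thread can mutate the queue in the meantime; only sharing the "context" across threads lets the queue change under the sleeping code:

```python
import threading
import time

# Toy stand-in for a serf context's pending-request list.
pending = ["req1", "req2", "req3"]

def setup_auth_with_sleep(delay=0.1):
    """Mimics the issue's patched serf__setup_request_basic_auth():
    look at the queue, pause, then check whether what we saw still holds."""
    snapshot = list(pending)
    time.sleep(delay)  # the injected pause (75 ms in the SERF-195 report)
    return snapshot == pending

# Single-threaded: nothing else runs during the sleep, so it is harmless.
print(setup_auth_with_sleep())  # prints True

# Two threads sharing the queue (unsupported for serf contexts): another
# thread removes a request mid-sleep and the snapshot goes stale.
t = threading.Thread(target=lambda: (time.sleep(0.02), pending.pop(1)))
t.start()
print(setup_auth_with_sleep())  # prints False
t.join()
```

This matches the hypothesis above: the reported corruption only makes sense if something other than the sleeping thread is working the lists.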
