[ https://issues.apache.org/jira/browse/SERF-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116499#comment-17116499 ]
Stefan Sperling commented on SERF-195: -------------------------------------- Thanks a lot for tracking this down! There are some abort() calls added by the patch. It would be better to return errors in such cases, even if triggered by internal inconsistencies, and even if that means that new public APIs would need to be added to return the error. Subversion, and by extension, serf, are embedded into larger programs, such as the Windows Explorer via TortoiseSVN. We really cannot tell what application context we are terminating when we call abort(). Just tearing things down also makes it harder for users to report related bugs since it is often unclear what happened. This has bitten us quite badly in the past so we decided against introducing new abort() and assert() type statements and to convert existing instances to regular error handling (at least in Subversion; if such cases still exist in serf elsewhere that means they haven't been converted yet). Even if you cannot trigger any of these aborts right now, your code might later on be changed, potentially years from now, in ways that could trigger them. > [PATCH] time-consuming operations in auth_ module leads to memory corruption > ---------------------------------------------------------------------------- > > Key: SERF-195 > URL: https://issues.apache.org/jira/browse/SERF-195 > Project: serf > Issue Type: Bug > Affects Versions: serf-1.4.0, serf-trunk > Reporter: Alex W > Priority: Major > Attachments: linked-list-fix.patch > > > In serf trunk and 1.4.x, doing time-consuming operations in an auth module's > "setup" function can lead to memory corruption and crashes. I strongly > suspect this can be triggered in other ways as well, if they can lead to > delayed requests, timeouts or errors on the server, etc. You can easily > reproduce this (pretty reliably) by making a change such as: > {noformat} > diff a/auth/auth_basic.c b/auth/auth_basic.c > --- a/auth/auth_basic.c > +++ b/auth/auth_basic.c > @@ -169,6 +169,7 @@ serf__setup_request_basic_auth(const serf__authn_scheme_t > *scheme, > basic_info = authn_info->baton; > > if (basic_info && basic_info->header && basic_info->value) { > + usleep(75000); > serf_bucket_headers_setn(hdrs_bkt, basic_info->header, > basic_info->value); > return APR_SUCCESS; > {noformat} > And then using serf via e.g. {{svn co}} of a large repository. > The crash which occurs results from the {{written_reqs}} and > {{unwritten_reqs}} lists on the {{serf_connection_t}} containing invalid > pointers. Generally, since the memory is pooled, we don't notice the > corruption until teardown time. If you run with pool debugging and valgrind > you can eventually narrow down the cause to the linked lists. > Several places which manipulate the linked lists are suspicious, e.g. this > section in {{outgoing.c}}: > {noformat} > serf__link_requests(&conn->written_reqs, &conn->written_reqs_tail, > request); > conn->nr_of_written_reqs++; > conn->unwritten_reqs = conn->unwritten_reqs->next; > conn->nr_of_unwritten_reqs--; > request->next = NULL; > {noformat} > This assumes (but never checks) that the request in question is at the front > of the {{unwritten_reqs}} list, which can be violated in both error cases and > in cases of aborted requests. The ordering of operations here (link the > request onto the new list, then unlink it from the old one) should also make > a reader pretty suspicious, though in this particular case it shouldn't be a > problem (since {{serf__link_requests}} doesn't actually change the {{next}} > pointer). There are also other sections of linked list code too which are a > little questionable. > I'm attaching a patch which reorganises the {{written_reqs}} and > {{unwritten_reqs}} lists into their own structs and provides higher-level > functions for manipulating them. These also add assertions to make sure that > the correct list is being used. I also check conditions in places such as > this section of {{outgoing.c}} for other error cases. After these changes I > can no longer reproduce the incorrect free() or other crashes with a slow > auth plugin. -- This message was sent by Atlassian Jira (v8.3.4#803005)