I have completed a solution to this problem, which can be a drop-in update
for the existing apr_memcache.c.  It is now checked in for my module as
http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c
.

It differs from the solution in
https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that:

   - It doesn't require an API change; it but it enforces the 50ms timeout
   that already exists for apr_multgetp for all operations.
   - It works under my load test (which I found is not true of the patch in
   51065).

For my own purposes, I will be shipping my module with apr_memcache2 so I
get the behavior I want regardless of what version of Apache is installed.
 But I'd like to propose my patch for apr_memcache.c.  The patch is
attached, and I've also submitted it as an alternative patch to bug 51065.

If you agree with the strategy I used to solve this problem, then please
let me know if I can help with any changes required to get this into the
main distribution,


On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmara...@google.com> wrote:

> I've had some preliminary success with my own variant of apr_memcache.c
> (creatively called apr_memcache2.c).  Rather than setting the socket
> timeout, I've been mimicing the timeout strategy I saw in
> apr_memcache_multgetp, by adding a new helper method:
>
> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool,
>                                                apr_memcache2_conn_t* conn)
> {
>     apr_pollset_t* pollset;
>     apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0);
>     if (rv == APR_SUCCESS) {
>         apr_pollfd_t pollfd;
>         pollfd.desc_type = APR_POLL_SOCKET;
>         pollfd.reqevents = APR_POLLIN;
>         pollfd.p = temp_pool;
>         pollfd.desc.s = conn->sock;
>         pollfd.client_data = NULL;
>         apr_pollset_add(pollset, &pollfd);
>         apr_int32_t queries_recvd;
>         const apr_pollfd_t* activefds;
>         rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd,
>                               &activefds);
>         if (rv == APR_SUCCESS) {
>             assert(queries_recvd == 1);
>             assert(activefds->desc.s == conn->sock);
>             assert(activefds->client_data == NULL);
>         }
>     }
>     return rv;
> }
>
> And calling that before many of the existing calls to get_server_line as:
>
>     rv = wait_for_server_or_timeout_no_pool(conn);
>     if (rv != APR_SUCCESS) {
>         ms_release_conn(ms, conn);
>         return rv;
>     }
>
> This is just an experiment; I think I can streamline this by
> pre-populating the pollfd structure as part of the apr_memcache_conn_t
> (actually now apr_memcache2_conn_t).
>
> I have two questions about this:
>
> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is
> somewhat different from the one that ships with Apache 2.2.  In particular
> the 2.4 version cannot be compiled against the headers that come with a 2.2
> distribution.  Is there any downside to taking my hacked 2.2 apr_memcache.c
> and running it in Apache 2.4?  Or should I maintain two hacks?
>
> 2. This seems wasteful in terms of system calls.  I am making an extra
> call to poll, rather than relying on the socket timeout.  The socket
> timeout didn't work as well as this though.  Does anyone have any theories
> as to why, or what could be done to the patch in
> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work?
>
> -Josh
>
>
> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmara...@google.com>wrote:
>
>> Following up: I tried doing what I suggested above: patching that change
>> into my own copy of apr_memcache.c  It was first of all a bad idea to pull
>> in only part of apr_memcache.c because that file changed slightly between
>> 2.2 and 2.4 and our module works in both.
>>
>> I was successful making my own version of apr_memcache (renaming
>> entry-points apr_memcache2*) that I could hack.  But if I changed the
>> socket timeout from -1 to 10 seconds, then the system behaved very poorly
>> under load test (though it worked fine in our unit-tests and system-tests).
>>  In other words, I think the proposed patch that Jeff pointed to above is
>> not really working (as he predicted).  This test was done without
>> SIGSTOPing the memcached; it would timeout under our load anyway and
>> thereafter behave poorly.
>>
>> I'm going to follow up on that bugzilla entry, but for now I'm going to
>> pursue my own complicated mechanism of timing out the calls from my side.
>>
>> -Josh
>>
>>
>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmara...@google.com>wrote:
>>
>>> Thanks Jeff, that is very helpful.  We are considering a course of
>>> action and before doing any work toward this, I'd like to understand the
>>> pitfalls from people that understand Apache better than us.
>>>
>>> Here's our reality: we believe we need to incorporate memcached for
>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very
>>> large sites & hosting providers.  We are fairly close (we think) to
>>> releasing this functionality as beta.  However, in such large sites, stuff
>>> goes wrong: machines crash, power failure, fiber cut, etc.  When it does we
>>> want to fall back to serving partially unoptimized sites rather than
>>> hanging the servers.
>>>
>>> I understand the realities of backward-compatible APIs.  My expectation
>>> is that this would take years to make it into an APR distribution we could
>>> depend on.  We want to deploy this functionality in weeks.  The workarounds
>>> we have tried backgrounding the apr_memcache calls in a thread and timing
>>> out in mainline are complex and even once they work 100% will be very
>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on
>>> 'apachectl stop') if this happens more than (say) once a month.
>>>
>>> Our plan is to copy the patched implementation of
>>> apr_memcache_server_connect and the static methods it calls into a new .c
>>> file we will link into our module, naming the new entry-point something
>>> else (apr_memcache_server_connect_with_timeout seems good).  From a
>>> CS/SE perspective this is offensive and we admit it, but from a product
>>> quality perspective we believe this beats freezes and complicated/imperfect
>>> workarounds with threads.
>>>
>>> So I have two questions for the Apache community:
>>>
>>>    1. What are the practical problems with this approach?  Note that in
>>>    any case a new APR rev would require editing/ifdefing our code anyway, 
>>> so I
>>>    think immunity from APR updates such as this patch being applied is not a
>>>    distinguishing drawback.
>>>    2. Is there an example of the correct solution to the technical
>>>    problem Jeff highlighted: "it is apparently missing a call to adjust
>>>    the socket timeout and to discard the connection if the timeout is 
>>> reached".
>>>     That sounds like a pattern that might be found elsewhere in the Apache
>>>    HTTPD code base.
>>>
>>> Thanks in advance for your help!
>>> -Josh
>>>
>>>
>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <traw...@gmail.com> wrote:
>>>
>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmara...@google.com>
>>>> wrote:
>>>> > Is there a mechanism to time out individual operations?
>>>>
>>>> No, the socket connect timeout is hard-coded at 1 second and the
>>>> socket I/O timeout is disabled.
>>>>
>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065
>>>> has a patch, though it is apparently missing a call to adjust the
>>>> socket timeout and to discard the connection if the timeout is
>>>> reached.  More importantly, the API can only be changed in future APR
>>>> 2.0; alternate, backwards-compatible API(s) could be added in future
>>>> APR-Util 1.6.
>>>>
>>>> >
>>>> > If memcached freezes, then it appears my calls to 'get' will block
>>>> until
>>>> > memcached wakes up.  Is there any way to set a timeout for that call?
>>>> >
>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached
>>>> before
>>>> > doing a 'get'?
>>>> >
>>>> > Here are my observations:
>>>> >
>>>> > apr_memcache_multgetp seems to time out in bounded time if I SIGSTOP
>>>> the
>>>> > memcached process. Yes!
>>>> >
>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the
>>>> memcached
>>>> > process.
>>>> >
>>>> > That last one really sucks as I am using that to print the status of
>>>> all my
>>>> > cache shards to the log file if I detected a problem :(
>>>> >
>>>> >
>>>> > Why does apr_memcache_multgetp do what I want and not the others?
>>>>  Can I
>>>> > induce the others to have reasonable timeout behavior?
>>>> >
>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache, at
>>>> > least with graceful-stop.
>>>> >
>>>> >
>>>> > On a related note, the apr_memcache
>>>> > documentation<
>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html
>>>> >is
>>>> > very thin.  I'd be happy to augment it with my observations on its
>>>> > usage
>>>> > and the meaning of some of the arguments if that was desired.  How
>>>> would I
>>>> > go about that?
>>>>
>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the
>>>> include files, build (make dox) and inspect the results, submit a
>>>> patch to d...@apr.apache.org.
>>>>
>>>> >
>>>> > -Josh
>>>>
>>>>
>>>>
>>>> --
>>>> Born in Roswell... married an alien...
>>>> http://emptyhammock.com/
>>>>
>>>
>>>
>>
>

Reply via email to