ping! Please don't hesitate to push back and tell me if I can supply the patch or update in some easier-to-digest form. In particular, while I have rigorously stress-tested this change using mod_pagespeed's unit test, system-test, and load-test framework, I don't really understand what the testing flow is for APR. I'd be happy to add unit-tests for that if someone points me to a change-list or patch-file that does it properly.
-Josh On Thu, Nov 1, 2012 at 8:04 AM, Joshua Marantz <jmara...@google.com> wrote: > I have completed a solution to this problem, which can be a drop-in update > for the existing apr_memcache.c. It is now checked in for my module as > http://code.google.com/p/modpagespeed/source/browse/trunk/src/third_party/aprutil/apr_memcache2.c > . > > It differs from the solution in > https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 in that: > > - It doesn't require an API change; it but it enforces the 50ms > timeout that already exists for apr_multgetp for all operations. > - It works under my load test (which I found is not true of the patch > in 51065). > > For my own purposes, I will be shipping my module with apr_memcache2 so I > get the behavior I want regardless of what version of Apache is installed. > But I'd like to propose my patch for apr_memcache.c. The patch is > attached, and I've also submitted it as an alternative patch to bug 51065. > > If you agree with the strategy I used to solve this problem, then please > let me know if I can help with any changes required to get this into the > main distribution, > > > On Mon, Oct 22, 2012 at 5:21 PM, Joshua Marantz <jmara...@google.com>wrote: > >> I've had some preliminary success with my own variant of apr_memcache.c >> (creatively called apr_memcache2.c). Rather than setting the socket >> timeout, I've been mimicing the timeout strategy I saw in >> apr_memcache_multgetp, by adding a new helper method: >> >> static apr_status_t wait_for_server_or_timeout(apr_pool_t* temp_pool, >> apr_memcache2_conn_t* >> conn) { >> apr_pollset_t* pollset; >> apr_status_t rv = apr_pollset_create(&pollset, 1, temp_pool, 0); >> if (rv == APR_SUCCESS) { >> apr_pollfd_t pollfd; >> pollfd.desc_type = APR_POLL_SOCKET; >> pollfd.reqevents = APR_POLLIN; >> pollfd.p = temp_pool; >> pollfd.desc.s = conn->sock; >> pollfd.client_data = NULL; >> apr_pollset_add(pollset, &pollfd); >> apr_int32_t queries_recvd; >> const apr_pollfd_t* activefds; >> rv = apr_pollset_poll(pollset, MULT_GET_TIMEOUT, &queries_recvd, >> &activefds); >> if (rv == APR_SUCCESS) { >> assert(queries_recvd == 1); >> assert(activefds->desc.s == conn->sock); >> assert(activefds->client_data == NULL); >> } >> } >> return rv; >> } >> >> And calling that before many of the existing calls to get_server_line as: >> >> rv = wait_for_server_or_timeout_no_pool(conn); >> if (rv != APR_SUCCESS) { >> ms_release_conn(ms, conn); >> return rv; >> } >> >> This is just an experiment; I think I can streamline this by >> pre-populating the pollfd structure as part of the apr_memcache_conn_t >> (actually now apr_memcache2_conn_t). >> >> I have two questions about this: >> >> 1. I noticed the version of apr_memcache.c that ships with Apache 2.4 is >> somewhat different from the one that ships with Apache 2.2. In particular >> the 2.4 version cannot be compiled against the headers that come with a 2.2 >> distribution. Is there any downside to taking my hacked 2.2 apr_memcache.c >> and running it in Apache 2.4? Or should I maintain two hacks? >> >> 2. This seems wasteful in terms of system calls. I am making an extra >> call to poll, rather than relying on the socket timeout. The socket >> timeout didn't work as well as this though. Does anyone have any theories >> as to why, or what could be done to the patch in >> https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 to work? >> >> -Josh >> >> >> On Fri, Oct 19, 2012 at 9:25 AM, Joshua Marantz <jmara...@google.com>wrote: >> >>> Following up: I tried doing what I suggested above: patching that change >>> into my own copy of apr_memcache.c It was first of all a bad idea to pull >>> in only part of apr_memcache.c because that file changed slightly between >>> 2.2 and 2.4 and our module works in both. >>> >>> I was successful making my own version of apr_memcache (renaming >>> entry-points apr_memcache2*) that I could hack. But if I changed the >>> socket timeout from -1 to 10 seconds, then the system behaved very poorly >>> under load test (though it worked fine in our unit-tests and system-tests). >>> In other words, I think the proposed patch that Jeff pointed to above is >>> not really working (as he predicted). This test was done without >>> SIGSTOPing the memcached; it would timeout under our load anyway and >>> thereafter behave poorly. >>> >>> I'm going to follow up on that bugzilla entry, but for now I'm going to >>> pursue my own complicated mechanism of timing out the calls from my side. >>> >>> -Josh >>> >>> >>> On Thu, Oct 18, 2012 at 10:46 AM, Joshua Marantz <jmara...@google.com>wrote: >>> >>>> Thanks Jeff, that is very helpful. We are considering a course of >>>> action and before doing any work toward this, I'd like to understand the >>>> pitfalls from people that understand Apache better than us. >>>> >>>> Here's our reality: we believe we need to incorporate memcached for >>>> mod_pagespeed <http://modpagespeed.com> to scale effectively for very >>>> large sites & hosting providers. We are fairly close (we think) to >>>> releasing this functionality as beta. However, in such large sites, stuff >>>> goes wrong: machines crash, power failure, fiber cut, etc. When it does we >>>> want to fall back to serving partially unoptimized sites rather than >>>> hanging the servers. >>>> >>>> I understand the realities of backward-compatible APIs. My expectation >>>> is that this would take years to make it into an APR distribution we could >>>> depend on. We want to deploy this functionality in weeks. The workarounds >>>> we have tried backgrounding the apr_memcache calls in a thread and timing >>>> out in mainline are complex and even once they work 100% will be very >>>> unsatisfactory (resource leaks; Apache refusing to exit cleanly on >>>> 'apachectl stop') if this happens more than (say) once a month. >>>> >>>> Our plan is to copy the patched implementation of >>>> apr_memcache_server_connect and the static methods it calls into a new .c >>>> file we will link into our module, naming the new entry-point something >>>> else (apr_memcache_server_connect_with_timeout seems good). From a >>>> CS/SE perspective this is offensive and we admit it, but from a product >>>> quality perspective we believe this beats freezes and complicated/imperfect >>>> workarounds with threads. >>>> >>>> So I have two questions for the Apache community: >>>> >>>> 1. What are the practical problems with this approach? Note that >>>> in any case a new APR rev would require editing/ifdefing our code >>>> anyway, >>>> so I think immunity from APR updates such as this patch being applied is >>>> not a distinguishing drawback. >>>> 2. Is there an example of the correct solution to the technical >>>> problem Jeff highlighted: "it is apparently missing a call to >>>> adjust the socket timeout and to discard the connection if the >>>> timeout is reached". That sounds like a pattern that might be >>>> found elsewhere in the Apache HTTPD code base. >>>> >>>> Thanks in advance for your help! >>>> -Josh >>>> >>>> >>>> On Wed, Oct 17, 2012 at 8:16 PM, Jeff Trawick <traw...@gmail.com>wrote: >>>> >>>>> On Wed, Oct 17, 2012 at 3:36 PM, Joshua Marantz <jmara...@google.com> >>>>> wrote: >>>>> > Is there a mechanism to time out individual operations? >>>>> >>>>> No, the socket connect timeout is hard-coded at 1 second and the >>>>> socket I/O timeout is disabled. >>>>> >>>>> Bugzilla bug https://issues.apache.org/bugzilla/show_bug.cgi?id=51065 >>>>> has a patch, though it is apparently missing a call to adjust the >>>>> socket timeout and to discard the connection if the timeout is >>>>> reached. More importantly, the API can only be changed in future APR >>>>> 2.0; alternate, backwards-compatible API(s) could be added in future >>>>> APR-Util 1.6. >>>>> >>>>> > >>>>> > If memcached freezes, then it appears my calls to 'get' will block >>>>> until >>>>> > memcached wakes up. Is there any way to set a timeout for that call? >>>>> > >>>>> > I can repro this in my unit tests by sending a SIGSTOP to memcached >>>>> before >>>>> > doing a 'get'? >>>>> > >>>>> > Here are my observations: >>>>> > >>>>> > apr_memcache_multgetp seems to time out in bounded time if I SIGSTOP >>>>> the >>>>> > memcached process. Yes! >>>>> > >>>>> > apr_memcache_getp seems to hang indefinitely if I SIGSTOP the >>>>> memcached >>>>> > process. >>>>> > >>>>> > apr_memcache_set seems to hang indefinitely if I SIGSTOP the >>>>> memcached >>>>> > process. >>>>> > >>>>> > apr_memcache_delete seems to hang indefinitely if I SIGSTOP the >>>>> memcached >>>>> > process. >>>>> > >>>>> > apr_memcache_stats seems to hang indefinitely if I SIGSTOP the >>>>> memcached >>>>> > process. >>>>> > >>>>> > That last one really sucks as I am using that to print the status of >>>>> all my >>>>> > cache shards to the log file if I detected a problem :( >>>>> > >>>>> > >>>>> > Why does apr_memcache_multgetp do what I want and not the others? >>>>> Can I >>>>> > induce the others to have reasonable timeout behavior? >>>>> > >>>>> > When I SIGSTOP memcached this makes it hard to even restart Apache, >>>>> at >>>>> > least with graceful-stop. >>>>> > >>>>> > >>>>> > On a related note, the apr_memcache >>>>> > documentation< >>>>> http://apr.apache.org/docs/apr-util/1.4/group___a_p_r___util___m_c.html >>>>> >is >>>>> > very thin. I'd be happy to augment it with my observations on its >>>>> > usage >>>>> > and the meaning of some of the arguments if that was desired. How >>>>> would I >>>>> > go about that? >>>>> >>>>> Check out APR trunk from Subversion, adjust the doxygen docs in the >>>>> include files, build (make dox) and inspect the results, submit a >>>>> patch to dev@apr.apache.org. >>>>> >>>>> > >>>>> > -Josh >>>>> >>>>> >>>>> >>>>> -- >>>>> Born in Roswell... married an alien... >>>>> http://emptyhammock.com/ >>>>> >>>> >>>> >>> >> >