I personally like this solution better (IMHO) since it does not rely on apr_thread_mutex_trylock() being wait-free/userspace (i.e. it natively implements the "compare and swap").
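For reference, the CAS-based guard in the proposal quoted below boils down to this pattern (a minimal sketch; the file-scope "lock" here stands in for the per-element cache_element->lock field in the quoted code):

-------------------------------------------------
#include <apr_atomic.h>

static volatile apr_uint32_t lock = 0;  /* 0 = free, 1 = held */

/* apr_atomic_cas32(mem, with, cmp): if *mem == cmp, set *mem to with;
 * it returns the old value of *mem.  A return of 0 therefore means we
 * took the lock word from 0 to 1, i.e. we acquired it. */
static void cache_access_sketch(void)
{
    if (apr_atomic_cas32(&lock, 1, 0) == 0) {
        /* ... read or update the cache element ... */
        apr_atomic_dec32(&lock);    /* release: 1 -> 0 */
    }
    else {
        /* Contended: skip the cache and compute the value directly;
         * this is what keeps the algorithm wait-free. */
    }
}
-------------------------------------------------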
On the other hand, apr_atomic_cas32() may itself be implemented using apr_thread_mutex_lock() when USE_ATOMICS_GENERIC is defined (explicitly, or with --enable-nonportable-atomics=no, or else forcibly with "gcc -std=c89" or Intel CPUs <= i686). Hence with USE_ATOMICS_GENERIC, apr_thread_mutex_trylock() may be a better solution than apr_thread_mutex_lock()... (a rough sketch of such a fallback appears further down).

On Tue, Dec 3, 2013 at 6:01 PM, Daniel Lescohier <daniel.lescoh...@cbsi.com> wrote:

> If the developers list is OK with using apr_atomic in the server core,
> there would be lots of advantages over trylock:
>
> 1. No need for child init.
> 2. No need for function pointers.
> 3. Could have a lock per cache element (I deemed it too expensive
>    memory-wise to have a large mutex structure per cache element).
> 4. It would avoid the problem of trylock not being implemented on all
>    platforms.
> 5. Fewer parameters to the function macro.
>
> The code would be like this:
>
> #define TIME_CACHE_FUNCTION(VALUE_SIZE, CACHE_T, CACHE_PTR, \
>                             CACHE_SIZE_POWER, CALC_FUNC, AFTER_READ_WORK)\
>     const apr_int64_t seconds = apr_time_sec(t);\
>     apr_status_t status;\
>     CACHE_T * const cache_element = \
>         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>     /* seconds==0 can be confused with uninitialized cache; don't use
>        cache */\
>     if (seconds==0) return CALC_FUNC(value, t);\
>     if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>         if (seconds == cache_element->key) {\
>             memcpy(value, &cache_element->value, VALUE_SIZE);\
>             apr_atomic_dec32(&cache_element->lock);\
>             AFTER_READ_WORK;\
>             return APR_SUCCESS;\
>         }\
>         if (seconds < cache_element->key) {\
>             apr_atomic_dec32(&cache_element->lock);\
>             return CALC_FUNC(value, t);\
>         }\
>         apr_atomic_dec32(&cache_element->lock);\
>     }\
>     status = CALC_FUNC(value, t);\
>     if (status == APR_SUCCESS) {\
>         if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>             if (seconds > cache_element->key) {\
>                 cache_element->key = seconds;\
>                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>             }\
>             apr_atomic_dec32(&cache_element->lock);\
>         }\
>     }\
>     return status;
>
> --------------------------------------------------
>
> typedef struct {
>     apr_int64_t key;
>     apr_uint32_t lock;
>     apr_time_exp_t value;
> } explode_time_cache_t;
>
> TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>            TIME_CACHE_SIZE_POWER)
>
> AP_DECLARE(apr_status_t) ap_explode_recent_localtime(
>     apr_time_exp_t * value, apr_time_t t)
> {
>     TIME_CACHE_FUNCTION(
>         sizeof(apr_time_exp_t), explode_time_cache_t, explode_time_lt_cache,
>         TIME_CACHE_SIZE_POWER, apr_time_exp_lt,
>         value->tm_usec = (apr_int32_t) apr_time_usec(t))
> }
>
>
> On Tue, Dec 3, 2013 at 10:53 AM, Daniel Lescohier <
> daniel.lescoh...@cbsi.com> wrote:
>
>> I infer that you think that in this particular case, the function macro
>> makes the code more readable and maintainable. I don't think one can
>> define an absolute rule; it's a judgment call whether a macro increases
>> or decreases clarity and maintainability. It reminds me of 'goto': most
>> of the time, one shouldn't use it, but there are cases where it's better
>> than the alternative.
>>
>>
>> On Tue, Dec 3, 2013 at 10:01 AM, Jim Jagielski <j...@jagunet.com> wrote:
>>
>>> At first blush, the below looks very workable! On a slightly
>>> off-topic topic, however, I wonder if macros are the way to
>>> go. Long, long ago, function calls were really, really expensive.
>>> Does the decreased clarity (and debugging capability) really
>>> offset the saved cycles? Agreed that it's not fully applicable
>>> below.
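To make the USE_ATOMICS_GENERIC caveat from the top of this thread concrete: with generic atomics, apr_atomic_cas32() behaves roughly like the mutex-backed sketch below. This is an illustration only, not APR's actual source, and the "atomic_mutex" name is hypothetical:

-------------------------------------------------
#include <apr_thread_mutex.h>

static apr_thread_mutex_t *atomic_mutex;  /* hypothetical guard mutex */

/* Illustration: a "CAS" that is really a blocking critical section. */
static apr_uint32_t generic_cas32_sketch(volatile apr_uint32_t *mem,
                                         apr_uint32_t with,
                                         apr_uint32_t cmp)
{
    apr_uint32_t old;
    apr_thread_mutex_lock(atomic_mutex);   /* blocks: no longer wait-free */
    old = *mem;
    if (old == cmp)
        *mem = with;
    apr_thread_mutex_unlock(atomic_mutex);
    return old;
}
-------------------------------------------------

Under contention this parks threads in the kernel, which is exactly the behavior the trylock-based design avoids.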
>>>
>>> On Dec 2, 2013, at 6:13 PM, Daniel Lescohier <
>>> daniel.lescoh...@cbsinteractive.com> wrote:
>>>
>>> > The current time caching implementation in util_time.c and
>>> > mod_log_config.c is not guaranteed to work with all compilers and CPU
>>> > architectures. I have some proposed code that I'd like input on from
>>> > the mailing list before continuing on to compile and test it.
>>> >
>>> > The old implementation tries to do a wait-free algorithm, but
>>> > incorrectly, because the algorithm relies on total memory ordering,
>>> > and the implementation does not guarantee total memory ordering from
>>> > the compiler or CPU.
>>> >
>>> > My proposal is to use the standard and portable
>>> > apr_thread_mutex_trylock for enforcing the memory barriers. For APR's
>>> > Linux x86_64 implementation, this basically turns into a
>>> > 'lock cmpxchgl' instruction, with a 'lock decl' instruction for the
>>> > unlock. Because only trylock is used, not lock, the futex is never
>>> > spun on and arbitrated by the kernel; it's all done in userspace (if
>>> > there's contention, each thread just calculates the value itself
>>> > instead of reading from the cache). So the replacement is also a
>>> > wait-free algorithm, using the standard and portable apr_thread_mutex
>>> > calls. Also, the old algorithm does two memcpy operations when
>>> > reading from the cache, while the new algorithm does just one. With
>>> > APR_HAS_THREADS==0, or with a non-threaded MPM in use, no locking is
>>> > done.
>>> >
>>> > The first part of the code is generic time caching code I've written
>>> > in util_time_caching.h. I use these macros to create five different
>>> > time caches:
>>> >
>>> > excerpt w/o boilerplate:
>>> >
>>> > -----------------------------------
>>> > #define TIME_CACHE(CACHE_T, CACHE_PTR, CACHE_SIZE_POWER)\
>>> >     static CACHE_T CACHE_PTR[1<<CACHE_SIZE_POWER];
>>> >
>>> > #define TIME_CACHE_FUNC_PTR(FUNC_PTR, VALUE_T, CALC_FUNC)\
>>> >     static apr_status_t (*FUNC_PTR)(VALUE_T *, apr_time_t) = &CALC_FUNC;
>>> >
>>> > /* TIME_CACHE_FUNCTION macro:
>>> >  * The cache is implemented as a ring buffer. The key in the
>>> >  * cache element indicates which second the cache value is for.
>>> >  * We use a wait-free algorithm: if we cannot access the cache,
>>> >  * we just calculate the value, doing useful work, instead of
>>> >  * spinning trying to access the cache.
>>> >  * We only update the cache with newer times, because older times
>>> >  * should have a lower cache hit ratio, and we want to keep the
>>> >  * caches small to fit in the CPU L1/L2 caches.
>>> >  */
>>> >
>>> > #define TIME_CACHE_FUNCTION(FUNC_NAME, VALUE_T, VALUE_SIZE, \
>>> >                             CACHE_T, CACHE_PTR, CACHE_SIZE_POWER, \
>>> >                             CALC_FUNC, TRY_LOCK, RELEASE_LOCK, \
>>> >                             AFTER_READ_WORK)\
>>> > static apr_status_t FUNC_NAME(VALUE_T *value, apr_time_t t)\
>>> > {\
>>> >     const apr_int64_t seconds = apr_time_sec(t);\
>>> >     apr_status_t status;\
>>> >     CACHE_T * const cache_element = \
>>> >         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>>> >     /* seconds==0 can be confused with uninitialized cache; don't
>>> >        use cache */\
>>> >     if (seconds==0) return CALC_FUNC(value, t);\
>>> >     if (TRY_LOCK) {\
>>> >         if (seconds == cache_element->key) {\
>>> >             memcpy(value, &cache_element->value, VALUE_SIZE);\
>>> >             RELEASE_LOCK;\
>>> >             AFTER_READ_WORK;\
>>> >             return APR_SUCCESS;\
>>> >         }\
>>> >         if (seconds < cache_element->key) {\
>>> >             RELEASE_LOCK;\
>>> >             return CALC_FUNC(value, t);\
>>> >         }\
>>> >         RELEASE_LOCK;\
>>> >     }\
>>> >     status = CALC_FUNC(value, t);\
>>> >     if (status == APR_SUCCESS) {\
>>> >         if (TRY_LOCK) {\
>>> >             if (seconds > cache_element->key) {\
>>> >                 cache_element->key = seconds;\
>>> >                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>>> >             }\
>>> >             RELEASE_LOCK;\
>>> >         }\
>>> >     }\
>>> >     return status;\
>>> > }
>>> > ----------------------------------------------
>>> >
>>> > Here is an example of implementing one of the caches in util_time.c:
>>> >
>>> > ----------------------------------------------
>>> > /* Have small sized caches, to stay in the CPU's L1/L2 caches.
>>> >  * There should be few requests older than 3 seconds, so set
>>> >  * the cache size power to 2.
>>> >  */
>>> > #define TIME_CACHE_SIZE_POWER 2
>>> >
>>> > typedef struct {
>>> >     apr_int64_t key;
>>> >     apr_time_exp_t value;
>>> > } explode_time_cache_t;
>>> >
>>> > TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>>> >            TIME_CACHE_SIZE_POWER)
>>> > TIME_CACHE_FUNC_PTR(explode_time_lt_ptr, apr_time_exp_t,
>>> >                     apr_time_exp_lt)
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_lt_nolocking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_lt, 1, ,
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #if APR_HAS_THREADS
>>> > static apr_thread_mutex_t *explode_time_lt_lock;
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_lt_locking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_lt,
>>> >     apr_thread_mutex_trylock(explode_time_lt_lock)==APR_SUCCESS,
>>> >     apr_thread_mutex_unlock(explode_time_lt_lock),
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #endif
>>> >
>>> > AP_DECLARE(apr_status_t) ap_explode_recent_localtime(apr_time_exp_t *tm,
>>> >                                                      apr_time_t t)
>>> > {
>>> >     return explode_time_lt_ptr(tm, t);
>>> > }
>>> > -------------------------------------------------
>>> >
>>> > The function pointer initially points to the uncached CALC_FUNC; only
>>> > after child_init is run is the function pointer replaced with the
>>> > function using the cache. Using the function pointer, the API and ABI
>>> > of ap_explode_recent_localtime stay the same.
>>> >
>>> > I've implemented five caches: explode local/gmt, rfc822 date, and
>>> > common log format local/gmt. The rfc822 cache caches the formatted
>>> > string, while the old implementation only cached the exploded time
>>> > structure, so the old implementation formatted the string on every
>>> > request.
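As an aside, a formatted-string cache element for the rfc822 case described above could be shaped as follows. This struct is an assumption for illustration, not the author's posted code; APR_RFC822_DATE_LEN and apr_rfc822_date() are real apr_time.h definitions:

-------------------------------------------------
#include <apr_time.h>   /* APR_RFC822_DATE_LEN, apr_rfc822_date() */

/* Hypothetical element type: caches the fully formatted date string
 * for one second, instead of only the exploded time structure. */
typedef struct {
    apr_int64_t key;                    /* which second this entry is for */
    char value[APR_RFC822_DATE_LEN];    /* "Sat, 07 Dec 2013 16:01:02 GMT" */
} rfc822_date_cache_t;
-------------------------------------------------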
>>> > Here is the child init function in util_time.c that will be referenced
>>> > in server/core.c:
>>> >
>>> > -------------------------------------------------
>>> > void ap_util_time_child_init(apr_pool_t *p, server_rec *s)
>>> > {
>>> > #if APR_HAS_THREADS
>>> >     int threaded_mpm;
>>> >     if (ap_mpm_query(AP_MPMQ_IS_THREADED, &threaded_mpm) == APR_SUCCESS
>>> >         && threaded_mpm)
>>> >     {
>>> > #define init_cache_func(mutex, func_ptr, func_name)\
>>> >         apr_thread_mutex_create(&mutex, APR_THREAD_MUTEX_DEFAULT, p);\
>>> >         func_ptr = &func_name;
>>> >
>>> >         init_cache_func(explode_time_lt_lock, explode_time_lt_ptr,
>>> >                         explode_time_lt_locking)
>>> >         init_cache_func(explode_time_gmt_lock, explode_time_gmt_ptr,
>>> >                         explode_time_gmt_locking)
>>> >         init_cache_func(rfc822_date_lock, rfc822_date_ptr,
>>> >                         rfc822_date_locking)
>>> >         init_cache_func(clf_time_local_lock, clf_time_local_ptr,
>>> >                         clf_time_local_locking)
>>> >         init_cache_func(clf_time_gmt_lock, clf_time_gmt_ptr,
>>> >                         clf_time_gmt_locking)
>>> >     }
>>> >     else
>>> >     {
>>> > #endif
>>> >         explode_time_lt_ptr = &explode_time_lt_nolocking;
>>> >         explode_time_gmt_ptr = &explode_time_gmt_nolocking;
>>> >         rfc822_date_ptr = &rfc822_date_nolocking;
>>> >         clf_time_local_ptr = &clf_time_local_nolocking;
>>> >         clf_time_gmt_ptr = &clf_time_gmt_nolocking;
>>> > #if APR_HAS_THREADS
>>> >     }
>>> > #endif
>>> > }
>>> > -------------------------------------------------
>>> >
>>> > Comments?
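To condense the dispatch mechanism described in the proposal: each cache is reached through a function pointer that starts out aimed at the plain APR call, so the public API and ABI never change. A single-cache sketch using names from the quoted code:

-------------------------------------------------
#include <apr_time.h>
#include "httpd.h"   /* AP_DECLARE */

/* Before ap_util_time_child_init() runs, calls fall through to
 * apr_time_exp_lt(); child init then repoints this at either the
 * locking or the nolocking caching wrapper. */
static apr_status_t (*explode_time_lt_ptr)(apr_time_exp_t *, apr_time_t)
        = &apr_time_exp_lt;

AP_DECLARE(apr_status_t) ap_explode_recent_localtime(apr_time_exp_t *tm,
                                                     apr_time_t t)
{
    return explode_time_lt_ptr(tm, t);  /* signature never changes */
}
-------------------------------------------------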