I personally like this solution better (IMHO) since it does not rely on
apr_thread_mutex_trylock() being wait-free/userspace (e.g., natively
implemented with a "compare and swap").

On the other hand, apr_atomic_cas32() may itself be implemented using
apr_thread_mutex_lock() when USE_ATOMICS_GENERIC is defined (explicitly,
via --enable-nonportable-atomics=no, or else forcibly with "gcc -std=c89"
or on Intel CPUs <= i686).

Hence with USE_ATOMICS_GENERIC, apr_thread_mutex_trylock() may be a better
solution than the apr_thread_mutex_lock() hiding behind apr_atomic_cas32()...



On Tue, Dec 3, 2013 at 6:01 PM, Daniel Lescohier
<daniel.lescoh...@cbsi.com> wrote:

> If the developers list is OK using apr_atomic in the server core, there
> would be lots of advantages over trylock:
>
>    1. No need for child init.
>    2. No need for function pointers.
>    3. Could have a lock per cache element (I deemed it too expensive
>    memory-wise to have a large mutex structure per cache element).
>    4. It would avoid the problem of trylock not being implemented on all
>    platforms.
>    5. Fewer parameters to the function macro.
>
> The code would be like this:
>
> #define TIME_CACHE_FUNCTION(VALUE_SIZE, CACHE_T, CACHE_PTR, \
>     CACHE_SIZE_POWER, CALC_FUNC, AFTER_READ_WORK)\
>     const apr_int64_t seconds = apr_time_sec(t);\
>     apr_status_t status;\
>     CACHE_T * const cache_element = \
>         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>     /* seconds==0 can be confused with uninitialized cache; don't use cache */\
>     if (seconds==0) return CALC_FUNC(value, t);\
>     if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>         if (seconds == cache_element->key) {\
>             memcpy(value, &cache_element->value, VALUE_SIZE);\
>             apr_atomic_dec32(&cache_element->lock);\
>             AFTER_READ_WORK;\
>             return APR_SUCCESS;\
>         }\
>         if (seconds < cache_element->key) {\
>             apr_atomic_dec32(&cache_element->lock);\
>             return CALC_FUNC(value, t);\
>         }\
>         apr_atomic_dec32(&cache_element->lock);\
>     }\
>     status = CALC_FUNC(value, t);\
>     if (status == APR_SUCCESS) {\
>         if (apr_atomic_cas32(&cache_element->lock, 1, 0)==0) {\
>             if (seconds > cache_element->key) {\
>                 cache_element->key = seconds;\
>                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>             }\
>             apr_atomic_dec32(&cache_element->lock);\
>         }\
>     }\
>     return status;
>
> --------------------------------------------------
>
> typedef struct {
>     apr_int64_t key;
>     apr_uint32_t lock;
>     apr_time_exp_t value;
> } explode_time_cache_t;
>
> TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>            TIME_CACHE_SIZE_POWER)
>
> AP_DECLARE(apr_status_t) ap_explode_recent_localtime(
>     apr_time_exp_t * value, apr_time_t t)
> {
>     TIME_CACHE_FUNCTION(
>     sizeof(apr_time_exp_t), explode_time_cache_t, explode_time_lt_cache,
>     TIME_CACHE_SIZE_POWER, apr_time_exp_lt,
>     value->tm_usec = (apr_int32_t) apr_time_usec(t))
> }
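>
> Spelled out long-hand, the lock handling above amounts to the following
> (a sketch with hypothetical helper names; it relies on the cache being a
> static array, so each lock field zero-initializes to 0 == unlocked):
>
> static APR_INLINE int cache_try_lock(volatile apr_uint32_t *lock)
> {
>     /* attempt the 0 -> 1 transition; old value 0 means we got the lock */
>     return apr_atomic_cas32(lock, 1, 0) == 0;
> }
>
> static APR_INLINE void cache_unlock(volatile apr_uint32_t *lock)
> {
>     apr_atomic_dec32(lock);  /* 1 -> 0 */
> }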
>
>
>
> On Tue, Dec 3, 2013 at 10:53 AM, Daniel Lescohier <
> daniel.lescoh...@cbsi.com> wrote:
>
>> I infer that you think that in this particular case, the function macro
>> makes the code more readable and maintainable.  I don't think one can
>> define an absolute rule; it's a judgment call whether a macro increases
>> or decreases clarity and maintainability.  It reminds me of 'goto': most
>> of the time one shouldn't use it, but there are cases where it's better
>> than the alternative.
>>
>>
>> On Tue, Dec 3, 2013 at 10:01 AM, Jim Jagielski <j...@jagunet.com> wrote:
>>
>>> At first blush, the below looks very workable! On a slightly
>>> off-topic topic, however, I wonder if macros are the way to
>>> go.  Long long ago function calls were really really expensive.
>>> Does the decreased clarity (and debugging capability) really
>>> offset the saved cycles? Agreed that it's not fully applicable
>>> below.
>>>
>>> On Dec 2, 2013, at 6:13 PM, Daniel Lescohier <
>>> daniel.lescoh...@cbsinteractive.com> wrote:
>>>
>>> > The current time caching implementation in util_time.c and
>>> > mod_log_config.c is not guaranteed to work with all compilers and CPU
>>> > architectures.  I have some proposed code I've written that I want to
>>> > get the mailing list's input on before going on to compile and test it.
>>> >
>>> > The old implementation tries to do a wait-free algorithm, but
>>> > incorrectly, because the algorithm relies on total memory ordering, and
>>> > the implementation does not guarantee total memory ordering from the
>>> > compiler or CPU.
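>>> >
>>> > To make the hazard concrete, an illustrative interleaving (not taken
>>> > from the patch):
>>> >
>>> > /*   writer (cache update)            reader (cache lookup)
>>> >  *   cache->key = seconds;
>>> >  *                                    seconds == cache->key   -- true
>>> >  *                                    memcpy(value, &cache->value, ..)
>>> >  *   memcpy(&cache->value, ..);       -- too late: reader copied stale bytes
>>> >  * Without barriers the compiler or CPU may order the writer's stores
>>> >  * this way, so a matching key does not prove the value bytes are
>>> >  * current.
>>> >  */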
>>> >
>>> > My proposal is to use the standard and portable
>>> > apr_thread_mutex_trylock for enforcing the memory barriers.  For APR's
>>> > Linux x86_64 implementation, this basically turns into a lock cmpxchgl
>>> > instruction, and a lock decl instruction for the unlock.  Because only
>>> > trylock is used, never lock, the futex is never spun on or arbitrated
>>> > by the kernel; it's all done in userspace (if there's contention, each
>>> > thread just calculates the value itself instead of reading from the
>>> > cache).  So the replacement is also a wait-free algorithm, using the
>>> > standard and portable apr_thread_mutex calls.  Also, the old algorithm
>>> > does two memcpy operations when reading from the cache, while the new
>>> > algorithm does just one.  For APR_HAS_THREADS==0, or when a
>>> > non-threaded MPM is in use, no locking is done.
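>>> >
>>> > For the curious, the glibc/NPTL trylock path is shaped roughly like
>>> > this sketch (pseudocode, not glibc's actual source):
>>> >
>>> > /* pthread_mutex_trylock, roughly:
>>> >  *   if the atomic 0 -> 1 compare-and-swap of the lock word succeeds
>>> >  *       return 0;       -- this is the lock cmpxchgl; lock acquired
>>> >  *   else return EBUSY;  -- no futex syscall on this path
>>> >  * Unlocking an uncontended futex mutex is likewise one atomic
>>> >  * operation (the lock decl) with no kernel involvement.
>>> >  */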
>>> >
>>> > The first part of the code is generic time caching code I've written
>>> in util_time_caching.h. I use these macros to create five different time
>>> caches:
>>> >
>>> > excerpt w/o boilerplate:
>>> >
>>> > -----------------------------------
>>> > #define TIME_CACHE(CACHE_T, CACHE_PTR, CACHE_SIZE_POWER)\
>>> > static CACHE_T CACHE_PTR[1<<CACHE_SIZE_POWER];
>>> >
>>> > #define TIME_CACHE_FUNC_PTR(FUNC_PTR, VALUE_T, CALC_FUNC)\
>>> > static apr_status_t (*FUNC_PTR)(VALUE_T *, apr_time_t) = &CALC_FUNC;
>>> >
>>> > /* TIME_CACHE_FUNCTION macro:
>>> >  * The cache is implemented as a ring buffer. The key in the
>>> >  * cache element indicates which second the cache value is for.
>>> >  * We use a wait-free algorithm: if we cannot access the cache,
>>> >  * we just calculate the value, doing useful work, instead of
>>> >  * spinning trying to access the cache.
>>> >  * We only update the cache with newer times, because older times
>>> >  * should have a lower cache hit ratio, and we want to keep the
>>> >  * caches small to fit in the CPU L1/L2 caches.
>>> >  */
>>> >
>>> > #define TIME_CACHE_FUNCTION(FUNC_NAME, VALUE_T, VALUE_SIZE, \
>>> >     CACHE_T, CACHE_PTR, CACHE_SIZE_POWER, \
>>> >     CALC_FUNC, TRY_LOCK, RELEASE_LOCK, AFTER_READ_WORK)\
>>> > static apr_status_t FUNC_NAME(VALUE_T *value, apr_time_t t)\
>>> > {\
>>> >     const apr_int64_t seconds = apr_time_sec(t);\
>>> >     apr_status_t status;\
>>> >     CACHE_T * const cache_element = \
>>> >         &(CACHE_PTR[seconds & ((1<<CACHE_SIZE_POWER)-1)]);\
>>> >     /* seconds==0 can be confused with uninitialized cache; don't use cache */\
>>> >     if (seconds==0) return CALC_FUNC(value, t);\
>>> >     if (TRY_LOCK) {\
>>> >         if (seconds == cache_element->key) {\
>>> >             memcpy(value, &cache_element->value, VALUE_SIZE);\
>>> >             RELEASE_LOCK;\
>>> >             AFTER_READ_WORK;\
>>> >             return APR_SUCCESS;\
>>> >         }\
>>> >         if (seconds < cache_element->key) {\
>>> >             RELEASE_LOCK;\
>>> >             return CALC_FUNC(value, t);\
>>> >         }\
>>> >         RELEASE_LOCK;\
>>> >     }\
>>> >     status = CALC_FUNC(value, t);\
>>> >     if (status == APR_SUCCESS) {\
>>> >         if (TRY_LOCK) {\
>>> >             if (seconds > cache_element->key) {\
>>> >                 cache_element->key = seconds;\
>>> >                 memcpy(&cache_element->value, value, VALUE_SIZE);\
>>> >             }\
>>> >             RELEASE_LOCK;\
>>> >         }\
>>> >     }\
>>> >     return status;\
>>> > }
>>> > ----------------------------------------------
>>> >
>>> > Here is an example of implementing one of the caches in util_time.c:
>>> >
>>> > ----------------------------------------------
>>> > /* Have small sized caches, to stay in CPU's L1/L2 caches.
>>> >  * There should be few requests older than 3 seconds, so set
>>> >  * cache size power to 2.
>>> >  */
>>> > #define TIME_CACHE_SIZE_POWER 2
>>> >
>>> > typedef struct {
>>> >     apr_int64_t key;
>>> >     apr_time_exp_t value;
>>> > } explode_time_cache_t;
>>> >
>>> > TIME_CACHE(explode_time_cache_t, explode_time_lt_cache,
>>> >            TIME_CACHE_SIZE_POWER)
>>> > TIME_CACHE_FUNC_PTR(explode_time_lt_ptr, apr_time_exp_t,
>>> >                     apr_time_exp_lt)
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_lt_nolocking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_lt, 1,,
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #if APR_HAS_THREADS
>>> > static apr_thread_mutex_t *explode_time_lt_lock;
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_lt_locking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_lt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_lt,
>>> >     apr_thread_mutex_trylock(explode_time_lt_lock)==APR_SUCCESS,
>>> >     apr_thread_mutex_unlock(explode_time_lt_lock),
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #endif
>>> >
>>> > AP_DECLARE(apr_status_t) ap_explode_recent_localtime(apr_time_exp_t *tm,
>>> >                                                      apr_time_t t)
>>> > {
>>> >     return explode_time_lt_ptr(tm, t);
>>> > }
>>> > -------------------------------------------------
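>>> >
>>> > The gmt variant referenced by the child init below is wired up the same
>>> > way; by analogy with the localtime cache above, a sketch:
>>> >
>>> > TIME_CACHE(explode_time_cache_t, explode_time_gmt_cache,
>>> >            TIME_CACHE_SIZE_POWER)
>>> > TIME_CACHE_FUNC_PTR(explode_time_gmt_ptr, apr_time_exp_t,
>>> >                     apr_time_exp_gmt)
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_gmt_nolocking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_gmt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_gmt, 1,,
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #if APR_HAS_THREADS
>>> > static apr_thread_mutex_t *explode_time_gmt_lock;
>>> > TIME_CACHE_FUNCTION(
>>> >     explode_time_gmt_locking, apr_time_exp_t, sizeof(apr_time_exp_t),
>>> >     explode_time_cache_t, explode_time_gmt_cache, TIME_CACHE_SIZE_POWER,
>>> >     apr_time_exp_gmt,
>>> >     apr_thread_mutex_trylock(explode_time_gmt_lock)==APR_SUCCESS,
>>> >     apr_thread_mutex_unlock(explode_time_gmt_lock),
>>> >     value->tm_usec = (apr_int32_t) apr_time_usec(t))
>>> > #endif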
>>> >
>>> > The function pointer initially points to the uncached CALC_FUNC; only
>>> > after child_init has run is it replaced with the function that uses
>>> > the cache.  Thanks to the function pointer, the API and ABI of
>>> > ap_explode_recent_localtime stay the same.
>>> >
>>> > I've implemented five caches: explode local/gmt, rfc822 date, and
>>> > common log format local/gmt.  The rfc822 cache stores the formatted
>>> > string, whereas the old implementation cached only the exploded time
>>> > structure and therefore formatted the string on every request.
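>>> >
>>> > For instance, the rfc822 cache element could be declared like this
>>> > sketch (APR_RFC822_DATE_LEN, from apr_time.h, includes the trailing
>>> > NUL):
>>> >
>>> > typedef struct {
>>> >     apr_int64_t key;
>>> >     char value[APR_RFC822_DATE_LEN];  /* output of apr_rfc822_date() */
>>> > } rfc822_date_cache_t;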
>>> >
>>> > Here is the child init function in util_time.c that will be referenced
>>> in server/core.c:
>>> >
>>> > -------------------------------------------------
>>> > void ap_util_time_child_init(apr_pool_t *p, server_rec *s)
>>> > {
>>> > #if APR_HAS_THREADS
>>> >     int threaded_mpm;
>>> >     if (ap_mpm_query(AP_MPMQ_IS_THREADED, &threaded_mpm) == APR_SUCCESS
>>> >         && threaded_mpm)
>>> >     {
>>> > #define init_cache_func(mutex, func_ptr, func_name)\
>>> >         apr_thread_mutex_create(&mutex, APR_THREAD_MUTEX_DEFAULT, p);\
>>> >         func_ptr = &func_name;
>>> >
>>> >         init_cache_func(explode_time_lt_lock, explode_time_lt_ptr,
>>> >             explode_time_lt_locking)
>>> >         init_cache_func(explode_time_gmt_lock, explode_time_gmt_ptr,
>>> >             explode_time_gmt_locking)
>>> >         init_cache_func(rfc822_date_lock, rfc822_date_ptr,
>>> >             rfc822_date_locking)
>>> >         init_cache_func(clf_time_local_lock, clf_time_local_ptr,
>>> >             clf_time_local_locking)
>>> >         init_cache_func(clf_time_gmt_lock, clf_time_gmt_ptr,
>>> >             clf_time_gmt_locking)
>>> >     }
>>> >     else
>>> >     {
>>> > #endif
>>> >         explode_time_lt_ptr = &explode_time_lt_nolocking;
>>> >         explode_time_gmt_ptr = &explode_time_gmt_nolocking;
>>> >         rfc822_date_ptr = &rfc822_date_nolocking;
>>> >         clf_time_local_ptr = &clf_time_local_nolocking;
>>> >         clf_time_gmt_ptr = &clf_time_gmt_nolocking;
>>> > #if APR_HAS_THREADS
>>> >     }
>>> > #endif
>>> > }
>>> > -------------------------------------------------------------
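>>> >
>>> > A sketch of the wiring in server/core.c (placement hypothetical; the
>>> > function above matches the child_init hook signature):
>>> >
>>> > /* in core's hook registration: */
>>> > ap_hook_child_init(ap_util_time_child_init, NULL, NULL, APR_HOOK_MIDDLE);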
>>> >
>>> > Comments?
>>> >
