Brian Pane wrote:
If we could change the apr_atomic_inc/dec functions to use
apr_uint32_t,
this part of the fdqueue code could become a lot simpler.
>As far as I know, we can make it work with apr_uint32_t on most platforms, as long as we declare any inline assembly blocks as volatile (thanks to Sascha Schumann for fixing this recently in apr_atomic_cas on Linux).
The one platform I'm not sure of is OS/390, due to the special data types and API involved: http://marc.theaimsgroup.com/?l=apr-dev&m=104129861312811
Don't let that stop you. I'll take care of OS/390. Right off the top of my head, I can't see how the cs_t could be anything other than a 32 bit integer, knowing how the cs instruction works.
(sorry for not replying to that msg earlier...I'm still backed up from the holidays.)
I still have one more change I'm hoping to make in the fdqueue
synchronization logic: move the queue manipulation code in
ap_queue_push/pop outside the mutex protected region by maintaining
a linked list of queue nodes with apr_atomic_casptr.
Sounds good, as long as the pops are single threaded somehow, or if
you have
some other way of getting around the A-B-A cas pop window.
I'm planning on spinning around the CAS; i.e., retry if the CAS fails due to a collision with another thread. The queue_info synchronization, which uses that same technique, seems to be working well in practice.
no...we're cool if the CAS fails; the problem is that sometimes it doesn't fail when you wish it would.
thread 1 prepares to pop A off the list and picks up its next pointer to B, then gets interrupted/swapped out/whatever.
other threads really pop A off the list, pop B off the list and free it, then push A back on the list with A's next pointer now pointing at C.
thread 1 wakes back up and uses CAS to compare for A and swap with B. The CAS succeeds on most architectures, because the head contains a pointer to A once again. This corrupts the head to point to B which has been freed. The problem is that A's next pointer had been updated, and a straightforward CAS on a single word can't tell the difference between a pointer to old A and a pointer to new improved A.
I did look at the queue_info pop and decided it must be single threaded for a given pool/qi combo, because it is driven as a pool cleanup and if that wasn't single threaded already, the apr pool destroy code would go nuts. I hope that assessment is correct.
A double word CAS can do it safely by incrementing a list version # concurrently with updating the list head pointer. That's possible on Pentiums with CMPXCHG8B and __cds on S/390. PowerPCs and others (Alpha??) don't need the double word version; they can use Load and Reserve followed by Store Conditional safely. But I'm not aware of such a mechanism on Sparc for example, and I don't know much about other CPU architectures.
Greg