Il 15/11/2012 08:47, liu ping fan ha scritto: >>>> RCU prototype required pointer-sized access, which you cannot make type- >>> But I think that your RCU prototype should rely on atomic of CPU, not >>> gcc‘s atomic. >> >> What's the difference? gcc's atomic produces the same instructions as >> hand-written assembly (or should). >> > Probably I made a mistake here, in vhost, log = > __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in > the case 32bits qemu running on 64bits linux. Right? But how can > we read 32bits twice in atomic?
You can use cmpxchg8b. >>> Otherwise, it could be slow (I guess something like spinlock there). >> >> Not sure what you mean. I'm using two things: 1) write barriers; 2) >> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu >> callbacks. > > Got your main idea. Will go through your prototype. Just one more > question, why here atomic_xchg needed? Could not the sequence "read > old pointer, set new pointer" satisfy your requirement? No, it would be racy. Say you have this: wrong right ----------------------------------------- ref(new) ref(new) old = tail old = new tail = new xchg(&old, &tail) old->next = new old->next = new up(&sem) up(&sem) where sem and tail are global, while old and new are local. There can be any number of producers as in the above scheme, and one consumer. In the consumer, iteration of the list starts from the second element (the first element is dummy): dummy = new Node; tail = dummy; for(;;) { down(&sem); node = dummy->next; unref(dummy); visit node; dummy = node; } Then the invariants are: - the number of elements N in the list (starting from the dummy node and following ->next) is always >1. Because of this, you never have to touch head when adding an element. - the number of excess signals in the semaphore sem is always <= N-1. Because of this, it doesn't matter that there is an instant in time where tail is not reachable from head. The element is really in the list (as far as the consumer goes) only after the semaphore is signaled. In the wrong version, two threads could read the same pointer: x = tail | y = tail | tail = new_x | tail = new_y | x->next = new_x | old_tail->next = new_x up(&sem) | y->next = new_y | old_tail->next = new_x up(&sem) | No node points to new_x, so you have 2 nodes reachable from the head (the dummy node, and new_y) with 2 signals on the semaphore, contradicting the second invariant. This instead works: x = new_x | y = new_y | xchg(&x, &tail) | tail = x, x = old_tail xchg(&y, &tail) | tail = y, y = x x->next = new_x | old_tail->next = new_x up(&sem) | y->next = new_y | x->next = new_y up(&sem) | because you have 3 nodes and 2 signals. Paolo