> > + if (!si) > > + goto bad_nofile; > > + > > + preempt_disable(); > > + if (!(si->flags & SWP_VALID)) > > + goto unlock_out; > > After Hugh alluded to barriers, it seems the read of SWP_VALID could be > reordered with the write in preempt_disable at runtime. Without smp_mb() > between the two, couldn't this happen, however unlikely a race it is? > > CPU0 CPU1 > > __swap_duplicate() > get_swap_device() > // sees SWP_VALID set > swapoff > p->flags &= ~SWP_VALID; > spin_unlock(&p->lock); // pair w/ > smp_mb > ... > stop_machine(...) > p->swap_map = NULL; > preempt_disable() > read NULL p->swap_map
I don't think that that smp_mb() is necessary. I elaborate: An important piece of information, I think, that is missing in the diagram above is the stopper thread which executes the work queued by stop_machine(). We have two cases to consider, that is, 1) the stopper is "executed before" the preempt-disable section CPU0 cpu_stopper_thread() ... preempt_disable() ... preempt_enable() 2) the stopper is "executed after" the preempt-disable section CPU0 preempt_disable() ... preempt_enable() ... cpu_stopper_thread() Notice that the reads from p->flags and p->swap_map in CPU0 cannot cross cpu_stopper_thread(). The claim is that CPU0 sees SWP_VALID unset in (1) and that it sees a non-NULL p->swap_map in (2). I consider the two cases separately: 1) CPU1 unsets SPW_VALID, it locks the stopper's lock, and it queues the stopper work; CPU0 locks the stopper's lock, it dequeues this work, and it reads from p->flags. Diagrammatically, we have the following MP-like pattern: CPU0 CPU1 lock(stopper->lock) p->flags &= ~SPW_VALID get @work lock(stopper->lock) unlock(stopper->lock) add @work reads p->flags unlock(stopper->lock) where CPU0 must see SPW_VALID unset (if CPU0 sees the work added by CPU1). 2) CPU0 reads from p->swap_map, it locks the completion lock, and it signals completion; CPU1 locks the completion lock, it checks for completion, and it writes to p->swap_map. (If CPU0 doesn't signal the completion, or CPU1 doesn't see the completion, then CPU1 will have to iterate the read and to postpone the control-dependent write to p->swap_map.) Diagrammatically, we have the following LB-like pattern: CPU0 CPU1 reads p->swap_map lock(completion) lock(completion) read completion->done completion->done++ unlock(completion) unlock(completion) p->swap_map = NULL where CPU0 must see a non-NULL p->swap_map if CPU1 sees the completion from CPU0. Does this make sense? Andrea