On Tue, Feb 12, 2019 at 04:21:21AM +0100, Andrea Parri wrote: > > > + if (!si) > > > + goto bad_nofile; > > > + > > > + preempt_disable(); > > > + if (!(si->flags & SWP_VALID)) > > > + goto unlock_out; > > > > After Hugh alluded to barriers, it seems the read of SWP_VALID could be > > reordered with the write in preempt_disable at runtime. Without smp_mb() > > between the two, couldn't this happen, however unlikely a race it is? > > > > CPU0 CPU1 > > > > __swap_duplicate() > > get_swap_device() > > // sees SWP_VALID set > > swapoff > > p->flags &= ~SWP_VALID; > > spin_unlock(&p->lock); // pair w/ > > smp_mb > > ... > > stop_machine(...) > > p->swap_map = NULL; > > preempt_disable() > > read NULL p->swap_map > > > I don't think that that smp_mb() is necessary. I elaborate: > > An important piece of information, I think, that is missing in the > diagram above is the stopper thread which executes the work queued > by stop_machine(). We have two cases to consider, that is, > > 1) the stopper is "executed before" the preempt-disable section > > CPU0 > > cpu_stopper_thread() > ... > preempt_disable() > ... > preempt_enable() > > 2) the stopper is "executed after" the preempt-disable section > > CPU0 > > preempt_disable() > ... > preempt_enable() > ... > cpu_stopper_thread() > > Notice that the reads from p->flags and p->swap_map in CPU0 cannot > cross cpu_stopper_thread(). The claim is that CPU0 sees SWP_VALID > unset in (1) and that it sees a non-NULL p->swap_map in (2). > > I consider the two cases separately: > > 1) CPU1 unsets SPW_VALID, it locks the stopper's lock, and it > queues the stopper work; CPU0 locks the stopper's lock, it > dequeues this work, and it reads from p->flags. > > Diagrammatically, we have the following MP-like pattern: > > CPU0 CPU1 > > lock(stopper->lock) p->flags &= ~SPW_VALID > get @work lock(stopper->lock) > unlock(stopper->lock) add @work > reads p->flags unlock(stopper->lock) > > where CPU0 must see SPW_VALID unset (if CPU0 sees the work > added by CPU1). > > 2) CPU0 reads from p->swap_map, it locks the completion lock, > and it signals completion; CPU1 locks the completion lock, > it checks for completion, and it writes to p->swap_map. > > (If CPU0 doesn't signal the completion, or CPU1 doesn't see > the completion, then CPU1 will have to iterate the read and > to postpone the control-dependent write to p->swap_map.) > > Diagrammatically, we have the following LB-like pattern: > > CPU0 CPU1 > > reads p->swap_map lock(completion) > lock(completion) read completion->done > completion->done++ unlock(completion) > unlock(completion) p->swap_map = NULL > > where CPU0 must see a non-NULL p->swap_map if CPU1 sees the > completion from CPU0. > > Does this make sense?
Yes, thanks for this, Andrea! Good that smp_mb isn't needed.