> > I mean have a single function that does both the atomic load/store and
> > the memory barrier. Instead of:
> >
> >   stw_phys(addr, val)
> >   barrier();
> >
> > We do:
> >
> >   stw_phys_barrier(addr, val).
> 
> Well, I think it's a good idea to use Linux APIs instead of
> inventing our own. A lot of people are familiar with them,
> and there is decent documentation written.

Where is this documentation? Where does it say that stw_phys is atomic?

By my reading stw_phys is implemented using memcpy. This means that it is 
almost certainly not atomic.  My guess is that this works entirely by chance, 
because the window for observing that race condition on cached SMP systems is 
exceedingly small.
 
> In the example above, the name does not make it clear
> whether the barrier is before or after the store?

As a first implementation, both. I doubt we're really that sensitive to 
performance. If we *are* that performance sensitive then I'm pretty sure some 
architectures can do ordering of individual loads/stores, which are more 
efficient than a global barrier.

> I think this demonstrates why it's a good idea to stick to Linux
> standard.

qemu has restrictions and requirements that don't exist inside a linux kernel.
 
> > This avoids issues in the future (multithreaded TCG) where atomic memory
> > accesses may be nontrivial.
> 
> Unfortunately I have no real idea how this will work and what the issues
> are. I speculate stw_phys, on host platforms that can not write 2 bytes
> atomically, will need to take some lock? 

Yes. As mentioned above, I don't believe stw_phys is atomic on any platform.

> So possibly this means that we
> could optimize the barrier away, but I don't think this amounts to a
> serious issue, I guess portability/readability is more important.

The more important issue is that regular devices which to not require 
coherency or ordering can omit this lock.

Paul


Reply via email to