Hi there,

On Sat, Apr 05, 2003 at 06:08:51PM +0100, Steve Harris wrote:
> On Sat, Apr 05, 2003 at 06:15:09 +0200, Ingo Oeser wrote:
> > Now make that thread-safe and esp. thread-safe on an architecture
> > with weak memory ordering and all the fun stuff.
>
> Sure, it will only work on architectures where 32bit reads and writes are
> atomic.

That is not even true on all ix86 machines. At least I've seen
special memory ordering barriers used in the kernel for newer
ix86 machines.

> Smaller is not the issue, it's branches that are important in inner loops.

Right, but consider a compiler that unrolls your loop, uses indexes
to load the data in an arbitrary order and increments your pointer
by a bigger chunk in one go. E.g.

   a1 = buffer[read_ptr++ & (size - 1)];
   a2 = buffer[read_ptr++ & (size - 1)];
   a3 = buffer[read_ptr++ & (size - 1)];
   a4 = buffer[read_ptr++ & (size - 1)];
   a5 = buffer[read_ptr++ & (size - 1)];
   a_ = (a1 + a2 + a3 + a4 + a5) / 5;

can become:

   read_ptr += 5;
   a5 = buffer[(read_ptr - 1) & (size - 1)];
   a3 = buffer[(read_ptr - 3) & (size - 1)];
   a4 = buffer[(read_ptr - 2) & (size - 1)];
   a2 = buffer[(read_ptr - 4) & (size - 1)];
   a1 = buffer[(read_ptr - 5) & (size - 1)];
   a_ = (a1 + a2 + a3 + a4 + a5) / 5;

without any problem (the compiler tries to parallelize loads/stores,
because the architecture has two load/store units).

These are mathematically equivalent transformations permitted by the
C89 and C99 standards, and now your write_ptr can overwrite 5 entries
of your buffer before they have been read.

This may not be important for sound output, because it will only
sound wrong, but for recording this is really bad, since it will
record wrong data.

> I don't think the compiler will optimise away the thread safeness, unless I've
> missed something. A vectorising compiler might unroll the loops but it
> will still keep the ordering, and the aligned vector operations are still
> atomic AFAIK.

The ordering will only be kept if the data access path requires
it, or you call (non-inlined) functions in between, or you use
volatile.

The kernel people have seen lots of thread safeness being
optimized away, so I never assume anything about atomicity in C
constructs that is not backed by the standard.

But our solutions combined will give maximum effectiveness. Your
masked addressing in indexes removes most of the branches left in
my lock_free_fifo scheme.

It might be worth coding that all up to have a GPLed high
performance fifo scheme.

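For illustration, the combined scheme could look roughly like the
sketch below. It is only a sketch under stated assumptions: exactly
one producer thread and one consumer thread, a power-of-two buffer
size, and C11 <stdatomic.h> standing in for whatever atomic
primitives the platform offers. The names LFF_SIZE, struct lff,
lff_init(), lff_put() and lff_get() are hypothetical and not taken
from any existing code.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>  /* C11 atomics (assumption: a C11 compiler) */

#define LFF_SIZE 8u                  /* hypothetical capacity, power of two */
#define LFF_MASK (LFF_SIZE - 1u)

static_assert((LFF_SIZE & LFF_MASK) == 0, "LFF_SIZE must be a power of two");

struct lff {
    float       buffer[LFF_SIZE];
    unsigned    read_ptr;   /* touched by the consumer thread only */
    unsigned    write_ptr;  /* touched by the producer thread only */
    atomic_uint filled;     /* number of unread entries, shared */
};

static void lff_init(struct lff *f)
{
    f->read_ptr = 0;
    f->write_ptr = 0;
    atomic_init(&f->filled, 0);
}

/* Producer side: returns 1 on success, 0 if the fifo is full. */
static int lff_put(struct lff *f, float v)
{
    if (atomic_load_explicit(&f->filled, memory_order_acquire) == LFF_SIZE)
        return 0;
    /* masked index: no wrap-around branch needed */
    f->buffer[f->write_ptr++ & LFF_MASK] = v;
    /* release: the store to buffer[] becomes visible before the count rises */
    atomic_fetch_add_explicit(&f->filled, 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 and stores the value in *out, 0 if empty. */
static int lff_get(struct lff *f, float *out)
{
    if (atomic_load_explicit(&f->filled, memory_order_acquire) == 0)
        return 0;
    *out = f->buffer[f->read_ptr++ & LFF_MASK];
    /* release: the load from buffer[] completes before the slot is freed */
    atomic_fetch_sub_explicit(&f->filled, 1, memory_order_release);
    return 1;
}
```

The point of the release/acquire pair on the counter is that neither
the compiler nor the CPU may move the buffer access past the counter
update, which is exactly the reordering shown in the unrolled example
above.
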
glibc has internal support for atomic operations (at least
atomic_add() is there and allows a negative argument, so
atomic_sub() is effectively there too, and
exchange_and_add(&lff->filled, 0) will provide an atomic read), so
portability is no problem on Linux systems (and even other
systems).

Regards

Ingo Oeser