Re: [LAD] a *simple* ring buffer, comments pls?

2011-08-27 Thread Emanuel Rumpf
2011/8/20 Emanuel Rumpf :
> Technologies as Mono and LLVM appear to make that (using different progr. 
> languages) possible.
>
... and Java.


"Microsoft submitted a portion of its .Net framework to a standards
body, but Heffner estimates it amounts to only 10%, ..."
( source:  http://www.computerworld.com/s/article/71221/.Net_vs._Java )

"Because of the possible threat of Microsoft patents, the FSF
recommends that people avoid creating software that depends on Mono or
C#"
( source: http://en.wikipedia.org/wiki/Comparison_of_the_Java_and_.NET_platforms
)

So, maybe rather not  .net / Mono  ;)

-- 
E.R.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-08-20 Thread Gabriel Beddingfield

On 08/20/2011 01:24 PM, Emanuel Rumpf wrote:

2011/7/20 Kai Vehmanen:


PS My plan B is to wait and see for another 10 years (and many more
   long threads about this topic on this list), and then
   I can at least just start using C++0x already... ;)


It's called C++11 now and accepted by ISO already ;)


Nope.  First line in TFA: "Barring unforeseen delays the official 
standard will be approved in the fall."


(translation: not quite yet.)

-gabriel
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-08-20 Thread Emanuel Rumpf
2011/7/20 Kai Vehmanen :
>
> PS My plan B is to wait and see for another 10 years (and many more
>   long threads about this topic on this list), and then
>   I can at least just start using C++0x already... ;)

It's called C++11 now and accepted by ISO already ;)

2011 April 11 ---  C++11, passed review by the technical standards committee:
  http://www.physorg.com/news/2011-04-c11-iteration-language.html

The biggest changes:
  
http://www.softwarequalityconnection.com/2011/06/the-biggest-changes-in-c11-and-why-you-should-care/

I've never particularly liked C++ (syntax and complexity) though and would
prefer a different language to be spread.
Technologies as Mono and LLVM appear to make that possible.

-- 
E.R.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-20 Thread Kai Vehmanen

Hi,

thanks folks for the annual ringbuffer thread, it's always a pleasure to 
read (and you learn something new at every iteration). ;)


On Tue, 12 Jul 2011, Dan Muresan wrote:


I wonder if

{
pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&dummy);
pthread_mutex_unlock(&dummy);
}

doesn't provide a portable full memory barrier. The dummy is different


This is exactly what I've been thinking about using in my code. It does 
have bit of a "i'm giving up" feel to it ;), but it would seem to be the 
only really easy, portable way to ensure a full barrier, even on exotic 
hardware. I'd love to use the new C++0x atomic+barrier ops, but I'm afraid 
that's still too bleeding edge as a build-dependency, and I'm not 
motivated enough to add (and test) a spagetti blob of autoconf magic to 
test for various available options.



each time, so no contention -- but still inefficient since  this would
be a 2-step full barrier. Nevertheless, it could be a portable
fallback.


True, but I wonder if the performance hit is really big enough to warrant 
the complexity (in terms of additional testing and maintaining more 
optimal barrier implementations). Predictability, reliability (data 
coherency) and code readability might be worth more than the performance 
hit. But yeah, some actual numbers would be needed (and thus I'm still 
just contemplating using this approach).


PS My plan B is to wait and see for another 10 years (and many more
   long threads about this topic on this list), and then
   I can at least just start using C++0x already... ;)
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Paul Davis
On Wed, Jul 13, 2011 at 10:28 AM, Gabriel Beddingfield
 wrote:

>> i don't see why this is needed. its trivial to demonstrate on a piece
>> of paper that in a system with weak memory ordering constraints,
>> absence of a memory barrier is incorrect for any code with coupled
>> data values (e.g. a read index and read data). it doesn't matter if
>> this doesn't happen very often. you don't need a simulator of any
>> given CPU+memory model - its just demonstrably incorrect.
>
> "Beware of bugs in the above code -- I've only proved it correct not
> actually run it." - Knuth

funny, but just so that we're clear - I'm not suggesting that a paper
proof of correctness is enough, but I am suggesting that a paper proof
of incorrectness *is* enough.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Gabriel Beddingfield
On Wed, Jul 13, 2011 at 8:51 AM, Paul Davis  wrote:
> On Wed, Jul 13, 2011 at 9:46 AM, Olivier Guilyardi  wrote:
>> On 07/13/2011 11:09 AM, Tim Blechmann wrote:
>>
>>> in fact, testing is not the best approach for verifying lock-free data
>>> structures: an implementation may work for years, if a certain condition is
>>> never triggered. the only reasonable way to ensure the correctness is a 
>>> formal
>>> proof. unfortunately, most publications assume a sequencially consistent 
>>> memory
>>> model and sometimes even avoid dealing with the ABA problem.
>>
>> To me, testing on real devices is needed, it's a pragmatic approach. But I 
>> agree
>> it doesn't solve the entire problem.
[snip]
>
> i don't see why this is needed. its trivial to demonstrate on a piece
> of paper that in a system with weak memory ordering constraints,
> absence of a memory barrier is incorrect for any code with coupled
> data values (e.g. a read index and read data). it doesn't matter if
> this doesn't happen very often. you don't need a simulator of any
> given CPU+memory model - its just demonstrably incorrect.

"Beware of bugs in the above code -- I've only proved it correct not
actually run it." - Knuth
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Olivier Guilyardi
On 07/13/2011 03:51 PM, Paul Davis wrote:
> On Wed, Jul 13, 2011 at 9:46 AM, Olivier Guilyardi  wrote:
>> On 07/13/2011 11:09 AM, Tim Blechmann wrote:
>>
>>> in fact, testing is not the best approach for verifying lock-free data
>>> structures: an implementation may work for years, if a certain condition is
>>> never triggered. the only reasonable way to ensure the correctness is a 
>>> formal
>>> proof. unfortunately, most publications assume a sequencially consistent 
>>> memory
>>> model and sometimes even avoid dealing with the ABA problem.
>> To me, testing on real devices is needed, it's a pragmatic approach. But I 
>> agree
>> it doesn't solve the entire problem.
>>
>> That said, what about simulation? One could for example implement a minimal
>> emulator for the abstract CPU and memory model described in
>> linux/Documentation/memory-barriers.txt, and test lock-free algorithms on 
>> this.
> 
> i don't see why this is needed. its trivial to demonstrate on a piece
> of paper that in a system with weak memory ordering constraints,
> absence of a memory barrier is incorrect for any code with coupled
> data values (e.g. a read index and read data). it doesn't matter if
> this doesn't happen very often. you don't need a simulator of any
> given CPU+memory model - its just demonstrably incorrect.

I see your point, although sometimes it's unclear what kind of barriers to use
and where to place them.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Olivier Guilyardi
On 07/13/2011 11:09 AM, Tim Blechmann wrote:

> in fact, testing is not the best approach for verifying lock-free data 
> structures: an implementation may work for years, if a certain condition is 
> never triggered. the only reasonable way to ensure the correctness is a 
> formal 
> proof. unfortunately, most publications assume a sequencially consistent 
> memory 
> model and sometimes even avoid dealing with the ABA problem.

To me, testing on real devices is needed, it's a pragmatic approach. But I agree
it doesn't solve the entire problem.

That said, what about simulation? One could for example implement a minimal
emulator for the abstract CPU and memory model described in
linux/Documentation/memory-barriers.txt, and test lock-free algorithms on this.

I personally think that's a good approach, in the spirit of unit testing. I'm
not talking about creating a full-fleged CPU emulator, just a minimal one, with
a simple custom language suitable for writing test suites.

Not only ringbuffers could be tested this way, but also other algorithms. But
all in all, it could be quite a lot of work I think. Maybe that would be a good
thesis subject ;)

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Paul Davis
On Wed, Jul 13, 2011 at 9:46 AM, Olivier Guilyardi  wrote:
> On 07/13/2011 11:09 AM, Tim Blechmann wrote:
>
>> in fact, testing is not the best approach for verifying lock-free data
>> structures: an implementation may work for years, if a certain condition is
>> never triggered. the only reasonable way to ensure the correctness is a 
>> formal
>> proof. unfortunately, most publications assume a sequencially consistent 
>> memory
>> model and sometimes even avoid dealing with the ABA problem.
>
> To me, testing on real devices is needed, it's a pragmatic approach. But I 
> agree
> it doesn't solve the entire problem.
>
> That said, what about simulation? One could for example implement a minimal
> emulator for the abstract CPU and memory model described in
> linux/Documentation/memory-barriers.txt, and test lock-free algorithms on 
> this.

i don't see why this is needed. its trivial to demonstrate on a piece
of paper that in a system with weak memory ordering constraints,
absence of a memory barrier is incorrect for any code with coupled
data values (e.g. a read index and read data). it doesn't matter if
this doesn't happen very often. you don't need a simulator of any
given CPU+memory model - its just demonstrably incorrect.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-13 Thread Tim Blechmann
>> no one can write a test case which fails when
>> memory barriers are missing in a ringbuffer implementation.
> 
> That's an interesting assertion.  It's kind of tempting to write some
> buggy circular buffers and test that assumption on common hardware.

in fact, testing is not the best approach for verifying lock-free data 
structures: an implementation may work for years, if a certain condition is 
never triggered. the only reasonable way to ensure the correctness is a formal 
proof. unfortunately, most publications assume a sequencially consistent memory 
model and sometimes even avoid dealing with the ABA problem.

fortunately the atomics of c++0x/c1x will make it much clearer (and more robust 
as the memory order argument to the atomic functions defaults to sequencial 
consistency)

cheers, tim


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi

Le 13/07/11 01:56, Paul Davis a écrit :

On Tue, Jul 12, 2011 at 7:31 PM, Arnold Krille  wrote:


You mean there should be a barrier to discussions about memory barriers?


No. He means that there needs to be a barrier inserted into the
discussion before its possible to move onto the next stage.


Yes, I think that might well be it :)

But it's getting late now. Good night

--
  Olivier


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi

Le 13/07/11 02:08, Fred Gleason a écrit :

On Jul 12, 2011, at 19:50 05, Olivier Guilyardi wrote:


Problem is I don't have a such device at the moment.


Is your testing code online somewhere?  I do have such a setup (iPad 2 
provisioned as a development device in XCode), and may take a crack at this.


That is nice! Yes I do.

On a standard shell you just need to:
$ svn co http://svn.samalyse.com/misc/rbtest
$ cd rbtest
$ make test

But you may need to adapt things a bit for the iPad.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Fred Gleason
On Jul 12, 2011, at 19:50 05, Olivier Guilyardi wrote:

> Problem is I don't have a such device at the moment.

Is your testing code online somewhere?  I do have such a setup (iPad 2 
provisioned as a development device in XCode), and may take a crack at this.

Cheers!


|-|
| Frederick F. Gleason, Jr. |   Chief Developer   |
|   |   Paravel Systems   |
|-|
|  Research is what I'm doing when I don't know what I'm doing.   |
|  -- Werner von Braun|
|-|

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Paul Davis
On Tue, Jul 12, 2011 at 7:31 PM, Arnold Krille  wrote:

> You mean there should be a barrier to discussions about memory barriers?

No. He means that there needs to be a barrier inserted into the
discussion before its possible to move onto the next stage.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi

Le 13/07/11 00:23, Dan Kegel a écrit :

On Tue, Jul 12, 2011 at 2:32 PM, Olivier Guilyardi  wrote:

no one can write a test case which fails when
memory barriers are missing in a ringbuffer implementation.


That's an interesting assertion.  It's kind of tempting to write some
buggy circular buffers and test that assumption on common hardware.


Not sure what you mean by buggy circular buffer, but we already did quite a lot 
of testing in the past.


That said, this article about the iPad2 is quite frightening. I've read that 
again and the guy seems to know what he's talking about. His little FIFO and his 
testing methodology both seem correct to me. That's the first potential proof I 
ever hear about:

http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

This guy is quietly saying that a lot of code out there is about to break, for 
real. As I mentioned previously, multi-core ARM devices are in the wild now. In 
addition to the iPad2, possibly vulnerable Android devices are being sold 
massively right now.


Problem is I don't have a such device at the moment.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Arnold Krille
On Tuesday 12 July 2011 22:20:48 Olivier Guilyardi wrote:
> On 07/12/2011 09:45 PM, Chris Cannam wrote:
> > Thinking it over and going back over some references and earlier
> > threads here (e.g. much earlier ones from Olivier et al) it does seem
> > that this should be enough.  This particular situation isn't so
> > complicated after all.  I think the more I read earlier during this
> > thread (and reading around) generally about memory ordering, the more
> > I was beginning to feel as if the entire subject was a source of only
> > trouble.
> 
> Quite interestingly, I have noticed that discussions about memory barriers
> are often somehow endless.

You mean there should be a barrier to discussions about memory barriers?

Sorry, couldn't resist...


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Dan Kegel
On Tue, Jul 12, 2011 at 2:32 PM, Olivier Guilyardi  wrote:
> no one can write a test case which fails when
> memory barriers are missing in a ringbuffer implementation.

That's an interesting assertion.  It's kind of tempting to write some
buggy circular buffers and test that assumption on common hardware.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 11:37 PM, Chris Cannam wrote:
> On 12 July 2011 22:32, Olivier Guilyardi  wrote:
>> Thing is, of every single thing that has been said on this thread about 
>> memory
>> barriers and ringbuffers, no one can prove anything. On this thread, on 
>> others,
>> on LAD and elsewhere. For example, no one can write a test case which fails 
>> when
>> memory barriers are missing in a ringbuffer implementation.
> 
> There is one in the iPad 2 example Sean posted a link to earlier in the 
> thread:
> 
> http://wanderingcoder.net/2011/04/01/arm-memory-ordering/
> 
> I haven't tried it, lacking an iPad 2 or any other multicore ARM computer.

Ah right, I read that too quickly... Thing is, I'm always suspicious with
quickly crafted ringbuffers as the one on this blog post. It's never like a
mature implementation. I will try and run my little test suite on a such device.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Chris Cannam
On 12 July 2011 22:32, Olivier Guilyardi  wrote:
> Thing is, of every single thing that has been said on this thread about memory
> barriers and ringbuffers, no one can prove anything. On this thread, on 
> others,
> on LAD and elsewhere. For example, no one can write a test case which fails 
> when
> memory barriers are missing in a ringbuffer implementation.

There is one in the iPad 2 example Sean posted a link to earlier in the thread:

http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

I haven't tried it, lacking an iPad 2 or any other multicore ARM computer.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 10:36 PM, Paul Davis wrote:
> On Tue, Jul 12, 2011 at 4:20 PM, Olivier Guilyardi  wrote:
> 
>> Quite interestingly, I have noticed that discussions about memory barriers 
>> are
>> often somehow endless. What happened in the past is that I saw countless
>> discussions about whether they are needed, whether they are not, and people
>> would argue a lot and passionately.
> 
> I think the problem is that memory barriers are almost never required
> when writing "normal" code, and so people (including myself) are not
> exposed to their implementation or their use very much. and indeed,
> there are precious few library implementations of memory barriers, nor
> are they widely documented in a way that suggests that their use is
> "normal".
> 
> by contrast, they get used all over the place in the kernel
> (relatively speaking), so if you tend to have a lot of exposure to
> kernel code, then calls to mb() or whatever they use these days will
> be quite familiar.
> 
> there's the additional problem that this discussion normally ends up
> confusing two separate topics that many people seem to think are the
> same (they are not):
> 
>1) do you need to actively ensure correct thread-level synchronization
>   between the reader and writer of a single-reader/single-writer FIFO?
> Put differently, do you need to use synchronization mechanisms
>   semantically equivalent to a mutex to ensure that any
> arbitrary execution
>   order of the 2 threads does not cause incorrect behaviour?
> 
>2) do you need memory barriers to ensure correct synchronization
>  for this kind of data structure in the face of possible hardware 
> level
>  instruction reordering?
> 
> My feeling is that the answer to (1) is "no" and the answer to (2) is "yes".

Thing is, of every single thing that has been said on this thread about memory
barriers and ringbuffers, no one can prove anything. On this thread, on others,
on LAD and elsewhere. For example, no one can write a test case which fails when
memory barriers are missing in a ringbuffer implementation.

That's a pretty rare and intriguing situation.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 11:07 PM, Chris Cannam wrote:
> On 12 July 2011 21:20, Olivier Guilyardi  wrote:
>> Quite interestingly, I have noticed that discussions about memory barriers 
>> are
>> often somehow endless. [...] So I thought, maybe there's a hidden topic
>> behind that. A "memory barrier"...
> 
> Indeed -- perhaps these discussions need some sort of memory write
> barrier, ensuring that everything discussed before the barrier will be
> recalled from store subsequently, instead of being discussed anew.
> But this list would be awfully quiet if we had such a thing.

I didn't say the discussions were useless or uninteresting. I just meant that I
think there's a strong symbolic aspect in this topic, which makes it even more
interesting ;)

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Chris Cannam
On 12 July 2011 21:36, Paul Davis  wrote:
>   2) do you need memory barriers to ensure correct synchronization
>         for this kind of data structure in the face of possible hardware level
>         instruction reordering?

The transactional metaphor for this kind of thing seems useful -- the
idea that "we've written everything, now we commit for our readers"
feels like a helpful way to picture the points where barriers might be
necessary.

Since transactional integrity is not provided for us, the commit needs
to be either

 * protected with a memory barrier, if it doesn't matter that the data
is available before it has been announced but does matter if the data
is announced before being available
 * an atomic swap, if the new data must not be available before it has
been announced and also there is only a single point of reference to
the new data
 * mutex protected, if the references to any changes may be
significant or we are not confident of either of the previous cases

...?


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Chris Cannam
On 12 July 2011 21:20, Olivier Guilyardi  wrote:
> Quite interestingly, I have noticed that discussions about memory barriers are
> often somehow endless. [...] So I thought, maybe there's a hidden topic
> behind that. A "memory barrier"...

Indeed -- perhaps these discussions need some sort of memory write
barrier, ensuring that everything discussed before the barrier will be
recalled from store subsequently, instead of being discussed anew.
But this list would be awfully quiet if we had such a thing.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Paul Davis
On Tue, Jul 12, 2011 at 4:20 PM, Olivier Guilyardi  wrote:

> Quite interestingly, I have noticed that discussions about memory barriers are
> often somehow endless. What happened in the past is that I saw countless
> discussions about whether they are needed, whether they are not, and people
> would argue a lot and passionately.

I think the problem is that memory barriers are almost never required
when writing "normal" code, and so people (including myself) are not
exposed to their implementation or their use very much. and indeed,
there are precious few library implementations of memory barriers, nor
are they widely documented in a way that suggests that their use is
"normal".

by contrast, they get used all over the place in the kernel
(relatively speaking), so if you tend to have a lot of exposure to
kernel code, then calls to mb() or whatever they use these days will
be quite familiar.

there's the additional problem that this discussion normally ends up
confusing two separate topics that many people seem to think are the
same (they are not):

   1) do you need to actively ensure correct thread-level synchronization
  between the reader and writer of a single-reader/single-writer FIFO?
  Put differently, do you need to use synchronization mechanisms
  semantically equivalent to a mutex to ensure that any
arbitrary execution
  order of the 2 threads does not cause incorrect behaviour?

   2) do you need memory barriers to ensure correct synchronization
 for this kind of data structure in the face of possible hardware level
 instruction reordering?

My feeling is that the answer to (1) is "no" and the answer to (2) is "yes".
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 09:45 PM, Chris Cannam wrote:

> Thinking it over and going back over some references and earlier
> threads here (e.g. much earlier ones from Olivier et al) it does seem
> that this should be enough.  This particular situation isn't so
> complicated after all.  I think the more I read earlier during this
> thread (and reading around) generally about memory ordering, the more
> I was beginning to feel as if the entire subject was a source of only
> trouble.

Quite interestingly, I have noticed that discussions about memory barriers are
often somehow endless. What happened in the past is that I saw countless
discussions about whether they are needed, whether they are not, and people
would argue a lot and passionately. So I thought, maybe there's a hidden topic
behind that. A "memory barrier"... Well, that very much reminds me of this
memory loss which happens to all of us in the childhood. It turns early years
into fuzzy memories. That is a barrier, too.

Whether such psychological barriers are needed or not, that's an interesting
question :)

--
  Olivier






___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Chris Cannam
On 12 July 2011 13:44, Dan Muresan  wrote:
> I wonder if
>
> {
> pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;
> pthread_mutex_lock(&dummy);
> pthread_mutex_unlock(&dummy);
> }
>
> doesn't provide a portable full memory barrier.

According to 
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11
the answer must surely be yes -- pthread_mutex_lock provides a memory
barrier and the only (explicit) exception is for recursive mutexes
already held by the caller.

> Oh, and you probably need a barrier in the consummer too

Yes, before updating the read index (I said read index earlier when I
meant write index).

Thinking it over and going back over some references and earlier
threads here (e.g. much earlier ones from Olivier et al) it does seem
that this should be enough.  This particular situation isn't so
complicated after all.  I think the more I read earlier during this
thread (and reading around) generally about memory ordering, the more
I was beginning to feel as if the entire subject was a source of only
trouble.


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Emanuel Rumpf
Will this solve any of the problems you are worrying about ?
Anyway, I'm mentioning it:

http://gcc.gnu.org/projects/gomp/

-- 
E.R.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Dan Muresan
> updating the read index, yes?  how is that expressed as portably as
> possible?  Does __sync_synchronized reliably do that?  (The
> documentation is surprisingly short...)

I wonder if

{
pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&dummy);
pthread_mutex_unlock(&dummy);
}

doesn't provide a portable full memory barrier. The dummy is different
each time, so no contention -- but still inefficient since  this would
be a 2-step full barrier. Nevertheless, it could be a portable
fallback.

Oh, and you probably need a barrier in the consummer too, just
reasoning from symmetry.

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 11:15 AM, Tim Blechmann wrote:
>>> using preempt_rt, the scheduling latency can be very low (like 10
>>> microseconds), if cpu frequency scaling is applied or smt/hyperthreading is
>>> enabled it can be as high as 250 microseconds (which is already quite
>>> significant, if one is using small signal vector sizes).
>> That's interesting. We're actually benchmarking scheduling latency at the
>> moment.
> 
> btw, i have discussed this briefly in my master thesis, section 4.1 [1]. the 
> effect of the scheduling latency can also be observed in the benchmarks given 
> in 
> section 5.

Wow, that's pretty exhaustive..

> and it would be great, if you can share your results, i'd be curious to see 
> them!

I'll setup something when we're done. The tests can actually be found in:
http://code.google.com/p/andraudio/source/browse/experiments/scheduling

Beware though this is about non-RT. We're trying to see if running multiple
inter-connected audio clients is feasible on Android without adding latency.

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Tim Blechmann
>> using preempt_rt, the scheduling latency can be very low (like 10
>> microseconds), if cpu frequency scaling is applied or smt/hyperthreading is
>> enabled it can be as high as 250 microseconds (which is already quite
>> significant, if one is using small signal vector sizes).
> 
> That's interesting. We're actually benchmarking scheduling latency at the
> moment.

btw, i have discussed this briefly in my master thesis, section 4.1 [1]. the 
effect of the scheduling latency can also be observed in the benchmarks given 
in 
section 5.

and it would be great, if you can share your results, i'd be curious to see 
them!

cheers, tim

[1] http://tim.klingt.org/publications/tim_blechmann_supernova.pdf

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Olivier Guilyardi
On 07/12/2011 10:27 AM, Tim Blechmann wrote:
>>> OTOH, if you have a number of threads at the same priority
>>> as Jack's and doing audio work (e.g. to use all the CPUs of
>>> an SMP machine) then using locks between them (but no other
>>> threads) should be OK - depending a bit on how they are used.
>> So, you can use locks as long as that's only meant to synchronize realtime
>> threads with each other? Should some master thread (could be the JACK process
>> thread) have a realtime priority slightly higher than the other (worker)
>> realtime threads? What are the caveats in general?
> 
> yes and no: it is perfectly fine to use locks to use multiple real-time 
> threads, 
> but the thread A fails to acquire a lock, it will be suspended and woken up 
> once 
> thread B releases the lock. the time between `B releases the lock' and `A 
> resumes its execution' is the scheduling latency, which is both os and 
> hardware 
> related.

I understand.

> using preempt_rt, the scheduling latency can be very low (like 10 
> microseconds), 
> if cpu frequency scaling is applied or smt/hyperthreading is enabled it can 
> be 
> as high as 250 microseconds (which is already quite significant, if one is 
> using 
> small signal vector sizes).

That's interesting. We're actually benchmarking scheduling latency at the 
moment.

> however one can avoid the scheduling latency by using spinlocks if one can 
> ensure that none of the synchronized threads can be preempted. personally i 
> achieve this by (a) using real-time scheduling, (b) using not more real-time 
> threads than physical cores and (c) pinning the rt threads to separate cores.

Ah yes, you ensure that threads run on separate cores. That really makes sense.
Thanks for the tip.

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Tim Blechmann
>> OTOH, if you have a number of threads at the same priority
>> as Jack's and doing audio work (e.g. to use all the CPUs of
>> an SMP machine) then using locks between them (but no other
>> threads) should be OK - depending a bit on how they are used.
> 
> So, you can use locks as long as that's only meant to synchronize realtime
> threads with each other? Should some master thread (could be the JACK process
> thread) have a realtime priority slightly higher than the other (worker)
> realtime threads? What are the caveats in general?

yes and no: it is perfectly fine to use locks to use multiple real-time 
threads, 
but the thread A fails to acquire a lock, it will be suspended and woken up 
once 
thread B releases the lock. the time between `B releases the lock' and `A 
resumes its execution' is the scheduling latency, which is both os and hardware 
related.
using preempt_rt, the scheduling latency can be very low (like 10 
microseconds), 
if cpu frequency scaling is applied or smt/hyperthreading is enabled it can be 
as high as 250 microseconds (which is already quite significant, if one is 
using 
small signal vector sizes).

however one can avoid the scheduling latency by using spinlocks if one can 
ensure that none of the synchronized threads can be preempted. personally i 
achieve this by (a) using real-time scheduling, (b) using not more real-time 
threads than physical cores and (c) pinning the rt threads to separate cores.

tim


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-12 Thread Chris Cannam
On 11 July 2011 21:58, Paul Davis  wrote:
> On Mon, Jul 11, 2011 at 4:50 PM, Chris Cannam
>  wrote:
>> Perhaps the pragmatic solution is to _lock_ the shared buffer?
>
> no, the pragmatic solution is to use memory barriers liberally applied.

Well OK, the vital library ringbuffer implementation that everyone
relies on is probably a bad place to start arguing for doing things
the wrong but easy way rather than the right way.

But I know my limitations (increasingly as I get older!) and reasoning
accurately about the behaviour of lock-free shared structures in a
system with relaxed memory ordering is probably among them.  My
existing code contains plenty of consequences of "oh, we don't need a
lock here because..." fuzz that won't work correctly in such an
environment.  I do wonder whether there isn't going to be increasingly
a case for doing it wrong in principle (through locking) but at least
getting the right answers.

Reading around for lock-free FIFO implementations I see several that
consist of "sequences of objects" (where the objects are allocated
outside the scope of the example) that rely on a single atomic
operation to update the read index -- that wouldn't be an adequate
barrier for the data itself in our buffer-of-floats, though, right?
We need a general write barrier after storing the data and before
updating the read index, yes?  how is that expressed as portably as
possible?  Does __sync_synchronized reliably do that?  (The
documentation is surprisingly short...)


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Muresan
> For a ringbuffer you need two anyway if the buffer is longer than
> one item, and you want to signal both of 'not empty' and 'not full'
> in the appropriate direction.

Well, if one of the threads is periodic (like a Jack process thread)
it can use sem_getvalue() on the same semaphore to figure things out
-- it doesn't need to be signalled on another semaphore. There are
some small complications that can be solved by double-signalling from
the other thread, or by wasting one slot...


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Muresan
> from my understanding, sem_post() implies a write barrier (stores have been
> executed before sem_post()) and sem_wait() a read barrier (loads have to be
> executed after sem_wait()).

Given that a mutex can be replaced with a semaphore initialized with 1
(then lock == sem_wait, unlock == sem_post), your statement should be
true as long as the semaphore value is 0 / 1.

However, I don't know if your statement must still be true when the
semaphore is used as a count tied to a number of items, i.e. when it
always hovers above 1. POSIX talks about a "thread of control", but in
this case, there is no exclusion, thus no "thread of control".

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

(same link as a few posts ago)


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Olivier Guilyardi
On 07/12/2011 12:06 AM, Fons Adriaensen wrote:

> OTOH, if you have a number of threads at the same priority
> as Jack's and doing audio work (e.g. to use all the CPUs of
> an SMP machine) then using locks between them (but no other
> threads) should be OK - depending a bit on how they are used.

So, you can use locks as long as that's only meant to synchronize realtime
threads with each other? Should some master thread (could be the JACK process
thread) have a realtime priority slightly higher than the other (worker)
realtime threads? What are the caveats in general?

--
  Olivier

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Paul Davis
On Mon, Jul 11, 2011 at 6:27 PM, Robin Gareus  wrote:
> On 07/12/2011 12:12 AM, Arnold Krille wrote:

>> Real-time means "as fast as possible".
>
> No, "real-time" means - _guaranteed_ to be done before time X.
> AKA. guaranteed response time.

and in an audio context, for developers of applications, real time means:

   time to process N audio frames = (A * number_of_frames) + B

where A and B are constants. that is, you have a fixed amount of
computation per frame, plus a constant overhead.

this is an *ideal* - its not often actually accomplished in real life.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Robin Gareus
On 07/12/2011 12:12 AM, Arnold Krille wrote:
> On Monday 11 July 2011 23:15:26 Tim E. Real wrote:
>> On July 11, 2011 04:50:06 pm Chris Cannam wrote:
>>> I know taking locks in a RT process is deeply frowned upon
>>
>> Likely been answered before, but good time for me to ask:
>> What is the reason it is not recommended?
>> Is it simply because of a long time involved in acquiring the lock, or is
>> it because the lock might block some other Jack thread from running?
>> The same reason why other things like 'malloc' or 'new' are not
>>  recommended either?
> 
> Real-time means "as fast as possible".

No, "real-time" means - _guaranteed_ to be done before time X.
AKA. guaranteed response time.

real-time (tasks|kernels) are often slower on average than non-RT
(tasks|kernels).
As for the linux-kernel: This is due to the overhead introduced by
real-time scheduling (take care of process priority)

While non-real-time (tasks|kernels) may be faster on average; they may
also occasionally take more time than they should. For RT-tasks this is
not acceptable.

e.g.

non-RT task 5 runs: 10 sec, 11 sec, 9 sec,*20sec*,  9 sec (avg 11.8)
RT task 5 runs: 12 sec, 12 sec, 11sec, 12sec,  13 sec (avg 12.0)

I hope that explains it well for you.
Cheers!
robin
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Robin Gareus
[oops forgot to CC the list at first]

On 07/11/2011 11:15 PM, Tim E. Real wrote:
> On July 11, 2011 04:50:06 pm Chris Cannam wrote:
>> I know taking locks in a RT process is deeply frowned upon
> 
> Likely been answered before, but good time for me to ask:
> What is the reason it is not recommended?
> Is it simply because of a long time involved in acquiring the lock, or is it 
>  because the lock might block some other Jack thread from running?

The latter. The real-time thread would block until the other thread
(that currently holding the lock) releases the lock.

How long it takes until the other thread releases the lock is undefined
(context switches, process priority dependent, etc) and thus the concept
is not suitable for real-time.

> The same reason why other things like 'malloc' or 'new' are not 
>  recommended either?

Yes. malloc, printf, etc can not guarantee to return in time.

Allocation of new memory may involve various system-calls, flush
cache-lines, result in swapping, context-switches, etc... On most
systems the worst-case execution time it takes to allocate new memory
can not be guaranteed to be below a certain limit.

There are various attempts to provide constant time - O(1) - memory
allocation and de-allocation code for real-time systems. None of them
have reached widespread adoption. Often the drawbacks (e.g. decreased
available memory due to fragmentation) outweigh the benefits, and
pre-allocation on application level is easier to implement.

HTH,
robin


> Thanks. Tim.
> ___
> Linux-audio-dev mailing list
> Linux-audio-dev@lists.linuxaudio.org
> http://lists.linuxaudio.org/listinfo/linux-audio-dev
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Mills
On Tue, 2011-07-12 at 00:12 +0200, Arnold Krille wrote:

> 
> Real-time means "as fast as possible". 

Err not really. 

Real-time means "Has a bounded response time", locks generally put your
completion time at the mercy of another process that is often not
designed to have a bounded response time. 

It is probably more correct to say that locks should be used with
extreme caution in RT processes as their use makes the relevant part of
another thread part of the time critical response timing.

Trylock is an interesting case can can actually be used to solve some of
this, have two copies of the working data structure, one protected by a
lock, then in the RT thread call trylock and if it succeeds copy the
data protected by the lock over the data set used by the RT process and
release the lock.

If it fails then you still have a coherent set of data, and it might
succeed the next time, this is useful for UI data sets (in both
directions). 

Regards, Dan.


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Muresan
> The problem with e.g. a Jack callback trying to take a lock
> is that the lock could be held by a non-RT thread, and you
> have no idea how long it could take before that thread
> releases it. Even if the Jack thread blocks waiting for
> the lock, that doesn't mean the one holding the lock will
> continue, and even if it does that provides no guarantee.

Priority inversion. A low-priority "disk thread" could hold a lock
needed by the high-priority Jack process thread. But the disk thread
is pre-empted by some other non-related medium-priority process. The
medium-priority process could keep running for a long time.

So the inversion is that, indirectly, a high-priority thread is
waiting for some medium-priority process.

Some RTOS's deal with this kind of stuff (like "hard" real-time OSs)

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Arnold Krille
On Monday 11 July 2011 23:15:26 Tim E. Real wrote:
> On July 11, 2011 04:50:06 pm Chris Cannam wrote:
> > I know taking locks in a RT process is deeply frowned upon
> 
> Likely been answered before, but good time for me to ask:
> What is the reason it is not recommended?
> Is it simply because of a long time involved in acquiring the lock, or is
> it because the lock might block some other Jack thread from running?
> The same reason why other things like 'malloc' or 'new' are not
>  recommended either?

Real-time means "as fast as possible". Waiting to acquire a lock isn't exactly 
what you would call "fast"...
Waiting for a lock is the same as waiting for the kernels/OSs memory managment 
to allocate some space.

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen
On Mon, Jul 11, 2011 at 05:15:26PM -0400, Tim E. Real wrote:
 
> Likely been answered before, but good time for me to ask:
> What is the reason it is not recommended?
> Is it simply because of a long time involved in acquiring the lock, or is it 
>  because the lock might block some other Jack thread from running?

The time required to take a lock if it is free is probably
not significant in most cases (this should not involve a
system call).

The problem with e.g. a Jack callback trying to take a lock
is that the lock could be held by a non-RT thread, and you
have no idea how long it could take before that thread
releases it. Even if the Jack thread blocks waiting for
the lock, that doesn't mean the one holding the lock will
continue, and even if it does that provides no guarantee.

OTOH, if you have a number of threads at the same priority
as Jack's and doing audio work (e.g. to use all the CPUs of
an SMP machine) then using locks between them (but no other
threads) should be OK - depending a bit on how they are used.

> The same reason why other things like 'malloc' or 'new' are not 
>  recommended either?

These could take a long time in some cases (if your process
needs more memory from the system for example), so they can't
be used safely.

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Tim Blechmann
>> and the NPTL code in glibc *seems* to perform a memory barrier only on
>> sem_post().
> 
> Wouldn't that seem quite natural ? Semas provide one-way communication.
> It's the sem_post() that means there is an event to be seen by some
> other thread. So it has to make sure that any data related to it is
> visible to the receiver. The receiver taking the event (returning
> from sem_wait()), or checking for it (calling sem_wait()), is not
> meant to be an event for the other side. You need a second sema if
> events have to go both ways.

from my understanding, sem_post() implies a write barrier (stores have been 
executed before sem_post()) and sem_wait() a read barrier (loads have to be 
executed after sem_wait()).

tim

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Tim E. Real
On July 11, 2011 04:50:06 pm Chris Cannam wrote:
> I know taking locks in a RT process is deeply frowned upon

Likely been answered before, but good time for me to ask:
What is the reason it is not recommended?
Is it simply because of a long time involved in acquiring the lock, or is it 
 because the lock might block some other Jack thread from running?

The same reason why other things like 'malloc' or 'new' are not 
 recommended either?

Thanks. Tim.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Paul Davis
On Mon, Jul 11, 2011 at 4:50 PM, Chris Cannam
 wrote:
> On 11 July 2011 21:32, James Morris  wrote:
>> I've ended up going back to Fons's pragmatism. If
>> non-blocking/lock-free programming is so impossibly difficult,
>> requiring intimate hardware knowledge of numerous different
>> architectures then there's only one solution available to people like
>> me, and that's to code for AMD64/Intel and use the existing ringbuffer
>> implementations.
>
> Perhaps the pragmatic solution is to _lock_ the shared buffer?

no, the pragmatic solution is to use memory barriers liberally applied.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen
On Mon, Jul 11, 2011 at 09:59:25PM +0300, Dan Muresan wrote:
 
> and the NPTL code in glibc *seems* to perform a memory barrier only on
> sem_post().

Wouldn't that seem quite natural ? Semas provide one-way communication.
It's the sem_post() that means there is an event to be seen by some
other thread. So it has to make sure that any data related to it is
visible to the receiver. The receiver taking the event (returning
from sem_wait()), or checking for it (calling sem_wait()), is not
meant to be an event for the other side. You need a second sema if
events have to go both ways.

For a ringbuffer you need two anyway if the buffer is longer than
one item, and you want to signal both of 'not empty' and 'not full' 
in the appropriate direction.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Chris Cannam
On 11 July 2011 21:50, Chris Cannam  wrote:
> [...] a critical section juts across the code

"just" across


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Chris Cannam
On 11 July 2011 21:32, James Morris  wrote:
> I've ended up going back to Fons's pragmatism. If
> non-blocking/lock-free programming is so impossibly difficult,
> requiring intimate hardware knowledge of numerous different
> architectures then there's only one solution available to people like
> me, and that's to code for AMD64/Intel and use the existing ringbuffer
> implementations.

Perhaps the pragmatic solution is to _lock_ the shared buffer?

I'm not at all sound on this subject, but I wouldn't be surprised if
Dan was right that memory ordering guarantees on the read/write index
variables might still leave the data that is just "copied into the
buffer" exposed to cache coherency problems.

I know taking locks in a RT process is deeply frowned upon around
here, but a critical section juts across the code used to update the
buffer and read back from it -- with an accompanying guarantee of
actually getting the right data out again -- might not be such a bad
tradeoff these days as we'd be inclined to think?


Chris
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread David Olofson
On Monday 11 July 2011, at 22.32.08, James Morris  
wrote:
> On 11 July 2011 20:19, Olivier Guilyardi  wrote:
> > Good catch... Multi-core ARM devices are actually arriving massively.
> > With Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..
> 
> What about my toaster? :-P
> 
> I've ended up going back to Fons's pragmatism. If
> non-blocking/lock-free programming is so impossibly difficult,
> requiring intimate hardware knowledge of numerous different
> architectures then there's only one solution available to people like
> me, and that's to code for AMD64/Intel and use the existing ringbuffer
> implementations.

Also, if/when it's time to port, find the code and/or information you need at 
the time, and test thoroughly on the actual hardware. These things can usually 
be done on "anything", one way or another, more or less efficiently.

The only thing that's worse than missing support for some new platform is 
"support" that doesn't actually work properly. Lots of debugging fun to be had 
that way... :-)


-- 
//David Olofson - Consultant, Developer, Artist, Open Source Advocate

.--- Games, examples, libraries, scripting, sound, music, graphics ---.
|   http://consulting.olofson.net  http://olofsonarcade.com   |
'-'
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Fons Adriaensen
On Mon, Jul 11, 2011 at 03:02:57PM +0300, Dan Muresan wrote:

> Well, if you are content with any old possible value of xval (i.e. you
> have no synchronization on it anywhere), you are right; but then you
> might as well make xval a constant -- just use the first value it ever
> had. You have no guarantees that the thread calling f() will EVER have
> its hardware cache synchronized to whoever is producing xval. Cache
> coherency again...

The typical use case would be xval being some parameter used
by e.g. a Jack callback and updated by e.g. MIDI events, or 
a measurement; and f() being the code that runs periodically
to update the GUI element for that parameter or measurement.
There is no synchronisation and occasionally using an out-
dated value has no fatal consequences - it will very probably
be put right on the next update which is 1/10 second or so
in the future. The display is informative only, correct audio
processing does in no way depend on it being up to date.

> Otherwise, I bet you *do* have some synchronization, somewhere, for
> xval in the thread that calls f. Wherever that place is, save the
> cached value, i.e.
> 
> mutex_lock()
> x = xval
> mutex_unlock()
> f (x)

That would mean the other side has to use the mutex as well
when updating xval, and in a Jack callback that is 'not done',
at least not if the mutex could be held by a non-realtime
thread such as one updating the GUI.

> In the signal case, you should of course make sure the signal is
> caught by the thread who reads x, by using pthread_sigmask()
> appropriately. Or make the program non-threaded. Otherwise, you now
> have two problems...

Not if there is no synchronisation involved as in the example
above. If all the signal handler has to do is update xval, it
doesn't matter in which thread it runs.

> Also, strictly speaking if you're using a sighandler plus volatile,
> you should also declare xval as sig_atomic_t.

Strictly true, but R/W of ints is atomic on the platforms that 
matter to me.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread James Morris
On 11 July 2011 20:19, Olivier Guilyardi  wrote:

> Good catch... Multi-core ARM devices are actually arriving massively. With
> Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..
>

What about my toaster? :-P

I've ended up going back to Fons's pragmatism. If
non-blocking/lock-free programming is so impossibly difficult,
requiring intimate hardware knowledge of numerous different
architectures then there's only one solution available to people like
me, and that's to code for AMD64/Intel and use the existing ringbuffer
implementations.

I'd be interested though if my usage of  __sync_bool_compare_and_swap
and  __sync_fetch_and_and improves the ring buffer at all? I like the
fact the implementation I'm using is so simple. Trouble is, using the
GCC builtins instead of volatile slows it down. Guess that means it's
duff.

James.

8<-

#include "rng_buf.h"

#include 
#include 


struct _RingBuffer
{
void** buf;
void** bufend;

void** w;
void** r;
};


RngBuf* rng_buf_new(size_t count)
{
size_t sz = 1;
RngBuf* mb = malloc(sizeof(RngBuf));

if (!mb)
return 0;

for (sz = 1; sz < count; sz <<= 1)
;

mb->buf = calloc(sz, sizeof(void*));

if (!mb->buf)
{
free(mb);
return 0;
}

mb->bufend = mb->buf + sz - 1;
mb->w = mb->buf;
mb->r = mb->buf;

return mb;
}


void rng_buf_free(RngBuf* mb)
{
free(mb->buf);
free(mb);
}


size_t rng_buf_write(RngBuf* mb, const void* data)
{
if (__sync_bool_compare_and_swap(mb->w, 0, data))
{
mb->w = (mb->w == mb->bufend) ? mb->buf : mb->w + 1;
return (size_t)1;
}

return (size_t)0;
}


void* rng_buf_read(RngBuf* mb)
{
void* data;

if ((data = __sync_fetch_and_and(mb->r, 0)))
{
mb->r = (mb->r == mb->bufend) ? mb->buf : mb->r + 1;
return data;
}

return NULL;
}


8<-
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Olivier Guilyardi
On 07/11/2011 03:45 AM, Sean Bolton wrote:
> On Jul 10, 2011, at 6:34 PM, Sean Bolton wrote:
>> On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
>>> do we have SMP systems these days that do not guarantee cache coherency?
>>
>> Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
>> have modes where they do not (and linux on a SPARC v9 runs in that
>> mode.)
> 
> And how about the iPad 2?:
> http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

Good catch... Multi-core ARM devices are actually arriving massively. With
Android, there's the Motorola Atrix, the Samsung Galaxy S II, etc..

I think it wouldn't hurt to run the ringbuffer tests again..

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Muresan
> This reordering cannot be prevented without proper synchronization. So
> my advice to anyone considering this would be to drop volatile and do
> proper synchronization at the application level (i.e. semaphores,
> since it has to work in real time).

After this whole round of discussions, it's starting to dawn on me
that not even semaphores might be enough for synchronizing the actual
data in the ringbuffer (never mind the ringbuffer pointers). I assumed
all sem_*() operations performed memory barriers, but POSIX is quite
vague:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11

(what is a "thread of control" when using counting semaphores? A
thread has exclusive access only when the semaphore is 0, but a
counting semaphore could always hover above 3.)

and the NPTL code in glibc *seems* to perform a memory barrier only on
sem_post().

Does anyone know what kind of memory-barrier guarantees counting
semaphores are supposed to offer?

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-11 Thread Dan Muresan
> We have a variable 'int xval' that is being modified

I notice I have mixed up (switched) x and xval in my previous reply...

> extern int xval;
>
> void f(void)
> {
>    int a, b, c, ... x, y, z;
>
>    x = xval;
>
>    // lots of code using a ... z;
> }

Well, if you are content with any old possible value of xval (i.e. you
have no synchronization on it anywhere), you are right; but then you
might as well make xval a constant -- just use the first value it ever
had. You have no guarantees that the thread calling f() will EVER have
its hardware cache synchronized to whoever is producing xval. Cache
coherency again...

Otherwise, I bet you *do* have some synchronization, somewhere, for
xval in the thread that calls f. Wherever that place is, save the
cached value, i.e.

mutex_lock()
x = xval
mutex_unlock()
f (x)

... and now there is no need for volatile, data flow becomes clear etc.

So you are right, the way you have structured your short snippet of
code you might be preventing an optimization -- but there is still too
little context, and I bet structuring the code differently will
obviate the need for volatile, plus provide predictable data flow.

> A. If xval is being modified by an interrupt handler
> then clearly you can't use the mutex - you can't risk
> to block the interrupt handler.

Let's say a signal handler, not an interrupt handler. Let's stick with
userspace (C89 + Posix) -- kernels have different rules, different
primitives etc.

> *there is no difference* between xval being modified by
> an interrupt handler, or by another thread. The difference

There is a difference. A signal will modify x immediately. A different
thread running on a different cache might never propagate the update
the to hardware cache used by f()'s thread.

A few more things, for the sake of completeness:

In the signal case, you should of course make sure the signal is
caught by the thread who reads x, by using pthread_sigmask()
appropriately. Or make the program non-threaded. Otherwise, you now
have two problems...

Also, strictly speaking if you're using a sighandler plus volatile,
you should also declare xval as sig_atomic_t.


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Sean Bolton

On Jul 10, 2011, at 6:34 PM, Sean Bolton wrote:

On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
do we have SMP systems these days that do not guarantee cache  
coherency?


Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
have modes where they do not (and linux on a SPARC v9 runs in that
mode.)


And how about the iPad 2?:
http://wanderingcoder.net/2011/04/01/arm-memory-ordering/

-Sean

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Sean Bolton

On Jul 10, 2011, at 2:41 PM, Paul Davis wrote:
On Sun, Jul 10, 2011 at 5:14 PM, Fons Adriaensen  
 wrote:

On that I absolutely agree - cache coherency is the real
problem, not pipelining. The latter should in fact be
transparent from a language such as C/C++.


i may be way out of the loop, but having worked with some of the early
"massively" parallel "conventional processor" systems of the late 80's
and early 90's (such as the sequent symmetry and the kendall square
research machines), my impression was that everyone gave up on
"clever" cache coherency because it turned out to be too hard (as an
engineering problem, not as a CS problem). i've gotten stuck with the
idea that the industry went for "simple" cache coherency that simply
does what any moderately skilled designer would do when faced with the
problem and no concerns about elegance or speed: locks, signalling,
and all the usual stuff that is the h/w equiivalent of the pthreads
mutex API.

do we have SMP systems these days that do not guarantee cache  
coherency?


Yes. PowerPC and Alpha do not. UltraSPARC v9 and ARMv6/ARM11 and later
have modes where they do not (and linux on a SPARC v9 runs in that
mode.)

Here's some good reading that may help:

Memory Ordering in Modern Microprocessors, Part I
http://www.linuxjournal.com/article/8211

Memory Ordering in Modern Microprocessors, Part II
http://www.linuxjournal.com/article/8212

Linux Kernel Memory Barriers
http://www.kernel.org/doc/Documentation/memory-barriers.txt

ARM11 Processors: Ordering requirements for memory accesses
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0211i/Babcijbf.html


btw: i do understand that whether they do or do not doesn't affect the
basic point about cache coherency possibly leading to incorrect data
being read.


Good--without memory barriers, JACK's ringbuffer is at risk
for this corruption on these platforms.

Cheers,

-Sean

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Dan Kegel
Guys,
if you're writing code for your own use, and don't care about portability
or security, go ahead and use volatile as a synchronization primitive.

But if the code is going to be accepted into an open source project
that gets wide use, volatile is a bad idea.  But don't take my word
for it; here's what kernel.org, cert.org, and an intel researcher say about it:

"Volatile considered harmful",
http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
"The use of volatile is likely to be seen as a bug and will bring
additional scrutiny to the code."

"CERT Secure Coding Standards, recommendation POS03: Do not use
volatile as a synchronization primitive"
https://www.securecoding.cert.org/confluence/display/seccode/POS03-C.+Do+not+use+volatile+as+a+synchronization+primitive
"A variable declared as volatile is not cached in a register, leading
to this confusion that it can be used safely as a synchronization
primitive. When declared volatile, the compiler does not re-order the
sequence of reads and writes to that memory location. However, the
compiler might re-order these reads and writes with those to other
memory locations. This might result in non-atomic operations on the
synchronization variable resulting in errors."

"Ad Hoc Synchronization Considered Harmful",
www.xiongww.com/papers/osdi10-xiong.pdf
"Ad hoc synchronizations are error-prone. Significant percentages
(22-67%) of these ad hoc synchronizations introduced bugs or severe
performance issues."

Volatile is a 1980's solution to a 2000's problem, and it just doesn't
cut the mustard anymore.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Paul Davis
On Sun, Jul 10, 2011 at 5:14 PM, Fons Adriaensen  wrote:

> On that I absolutely agree - cache coherency is the real
> problem, not pipelining. The latter should in fact be
> transparent from a language such as C/C++.

i may be way out of the loop, but having worked with some of the early
"massively" parallel "conventional processor" systems of the late 80's
and early 90's (such as the sequent symmetry and the kendall square
research machines), my impression was that everyone gave up on
"clever" cache coherency because it turned out to be too hard (as an
engineering problem, not as a CS problem). i've gotten stuck with the
idea that the industry went for "simple" cache coherency that simply
does what any moderately skilled designer would do when faced with the
problem and no concerns about elegance or speed: locks, signalling,
and all the usual stuff that is the h/w equiivalent of the pthreads
mutex API.

do we have SMP systems these days that do not guarantee cache coherency?

btw: i do understand that whether they do or do not doesn't affect the
basic point about cache coherency possibly leading to incorrect data
being read.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Fons Adriaensen
On Sun, Jul 10, 2011 at 06:05:45PM +0300, Dan Muresan wrote:

> Ah. pthread_mutex_lock() / unlock(), as EXTERNAL functions, will never
> be optimized away or inlined. Now, being all sequence points, if you
> simply do
> 
> pthread_mutex_lock();
> xval = x;
> pthread_mutex_unlock();
> 
> the compiler is not allowed to move statements out the locked section
> or reorder them in any way (without need for any volatile qualifiers).

OK. This is becoming an interesting discussion, so please
allow me to restate clearly the context.

We have a variable 'int xval' that is being modified
by forces unknown to the code we are discussing. This
code is a function f() which uses the value of xval,
but the algorithm implemented by f() requires that
the same value is used at all points where f() uses
xval during a single invocation of that function.

So we have:

extern int xval;

void f(void)
{
int a, b, c, ... x, y, z;

x = xval;

// lots of code using a ... z;
}

Since f() has much more local variables than the CPU
has registers, the compiler could be tempted to reuse
the register used to store x for some other purpose
at some point in the code. It could do that in two ways:

1. Store x on the stack, and read that location when
xval is required again, or

2. Just reuse the register without saving it, and read
the memory location 'xval' again when required.

(2) could make the algorithm fail, because xval could
have changed. So we want to prevent that happening.

The solution I propose is to declare xval volatile.
This forces the compiler to read it just once, as 
expressed by the source code. So it either will do (1),
or maybe decide not to trash the register holding x at
all but use another one.


The solution you propose is to protect xval by a mutex. 
I invite you to consider the following:

A. If xval is being modified by an interrupt handler
then clearly you can't use the mutex - you can't risk
to block the interrupt handler.

B. From the point of view of the code we are discussing
*there is no difference* between xval being modified by
an interrupt handler, or by another thread. The difference
is completely irrelevant to f(). The only thing that
matters is that xval can change while f() executes. 

You would probably accept the 'volatile' if xval is
being written by an interrupt handler. Given (B), 
there is no good reason to reject it in either case.


> I see. But as I said, in general the cache coherency problem is worse
> than the pipeline reordering problem -- i.e. when there are multiple
> CPUs/cores using different caches, they may see actions out-of-order.

On that I absolutely agree - cache coherency is the real
problem, not pipelining. The latter should in fact be
transparent from a language such as C/C++.

Ciao,

-- 
FA
 


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann
> >> > the main problem is the lack of a memory model for multi-threaded
> >> > applications at the level of the language (c or c++). fortunately this
> >> > is about to change with c++0x and probably c1x.
> >> 
> >> So in 10 years we will be able to rely on a conformant compiler being
> >> available on all relevant platforms :)
> > 
> > http://www.chaoticmind.net/~hcb/projects/boost.atomic/
> 
> if it all works ... very nice.
> 
> but note that its only been tested on a couple of versions of gcc on a
> couple of *nix-ish platforms.

the number of supported compilers in the documentation is outdated. it supports 
the most commonly used compilers and multiple architectures.
if a compiler is not supported natively, it uses a fallback implementation 
based 
on a pool of spinlocks, which of course is not lock-free, but c++0x doesn't 
guarantee lock-freedom either ...

i've been using it for quite some time in boost.lockfree (in a slightly 
modified 
version).

tim


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Paul Davis
On Sun, Jul 10, 2011 at 1:28 PM, Tim Blechmann  wrote:
>> > the main problem is the lack of a memory model for multi-threaded
>> > applications at the level of the language (c or c++). fortunately this
>> > is about to change with c++0x and probably c1x.
>>
>> So in 10 years we will be able to rely on a conformant compiler being
>> available on all relevant platforms :)
>
> http://www.chaoticmind.net/~hcb/projects/boost.atomic/

if it all works ... very nice.

but note that its only been tested on a couple of versions of gcc on a
couple of *nix-ish platforms.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann
> > the main problem is the lack of a memory model for multi-threaded
> > applications at the level of the language (c or c++). fortunately this
> > is about to change with c++0x and probably c1x.
> 
> So in 10 years we will be able to rely on a conformant compiler being
> available on all relevant platforms :)

http://www.chaoticmind.net/~hcb/projects/boost.atomic/


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Dan Muresan
> the hardware would be allowed to reorder them ... this is the reason why mutex
> implementations involve memory barriers ...

Yes, the hardware would be allowed to reorder them, so
pthread_mutex_lock() has memory barriers. I think everyone knew that
much :)

However my point to Fons was that, because pthread_mutex_lock() is an
external function, the compiler is not allowed to make any assumptions
about the global variable x (both functions might modify it): it
cannot re-read x after unlock(), or read x before lock(). This was the
missing ingredient.

So it's a cooperation of sequence point-rules and memory barriers that
achieves the effects of a critical section in C.

> the main problem is the lack of a memory model for multi-threaded applications
> at the level of the language (c or c++). fortunately this is about to change
> with c++0x and probably c1x.

So in 10 years we will be able to rely on a conformant compiler being
available on all relevant platforms :)


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Tim Blechmann
> Ah. pthread_mutex_lock() / unlock(), as EXTERNAL functions, will never
> be optimized away or inlined. Now, being all sequence points, if you
> simply do
> 
> pthread_mutex_lock();
> xval = x;
> pthread_mutex_unlock();
> 
> the compiler is not allowed to move statements out the locked section
> or reorder them in any way (without need for any volatile qualifiers).

the hardware would be allowed to reorder them ... this is the reason why mutex 
implementations involve memory barriers ...

the main problem is the lack of a memory model for multi-threaded applications 
at the level of the language (c or c++). fortunately this is about to change 
with c++0x and probably c1x.

tim

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-10 Thread Dan Muresan
> In the example I provided the essential point is that there
> is *one* *correct* access pattern which is to read it once
> for each call to f(), to ensure that the same value is used
> everywhere in that function. Declaring this value volatile
> and taking a local copy does exactly the right thing.
> The alternative would be protect it by a mutex for as long
> as f() runs. For no good reason, as I don't mind it being
> overwritten while f() runs. Would that be more 'optimal' ?

Ah. pthread_mutex_lock() / unlock(), as EXTERNAL functions, will never
be optimized away or inlined. Now, being all sequence points, if you
simply do

pthread_mutex_lock();
xval = x;
pthread_mutex_unlock();

the compiler is not allowed to move statements out the locked section
or reorder them in any way (without need for any volatile qualifiers).

http://en.wikipedia.org/wiki/Sequence_point

(or the C89 standard). This is another point that people forget --
many reorderings that they wish to prohibit using volatile are already
prohibited by the "sequence point theory".

In fact, I think the optimization you fear might not be allowed by
sequence points, even without any mutex calls (or volatiles) at all --
depending on your actual code.

>> OK, even if your disk thread is periodic for some reason, how does
>> that argue for library-level synchronization, *instead of* app-level
>> synchronization? In this case the cost would be the same -- no loss.
>
> I don't see the point.

You initially replied to my advice to not do any sync at *lib-level*
(jack/ringbuffer.h), and leave sync up to the app (whoever includes
jack/ringbuffer.h). But we're splitting hairs now.

> No, I'm talking about SMP systems. Writing the data and updating
> the write pointer is done by the same thread and hence CPU, these
> actions won't be re-ordered.

I see. But as I said, in general the cache coherency problem is worse
than the pipeline reordering problem -- i.e. when there are multiple
CPUs/cores using different caches, they may see actions out-of-order.

>> "Intel and AMD" only?
>
> There is no legal obligation for code to be portable. Nor is there
> any moral obligation. If I choose to support only Intel and AMD PCs
> and not embedded systems or mobile devices (and for the kind of SW
> I write that does make sense) then that is my choice, period.

Your call. Still, don't forget that x86 != AMD + Intel -- there are
also VIAs, and others. I don't really know how widespread they are in
"PCs", or what guarantees they offer. I also don't know if your
non-reordering assumption (for which I haven't really seen any hard
documentation -- though I've heard rumours too) applies to Intel
Atom's, AMD Geode's etc (some of which may be in netbooks, OLPC etc).

> waving their finger about programming style etc. There is room for
> some pragmatism in everything.

OK... This thread's subject is about programming advice though :)

Plus, when there is no big cost to doing things "by the book", I think
developers should be encouraged to keep their code portable, at least
in order to save the world future pains a-la Y2K etc.

-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-09 Thread Fons Adriaensen
On Sat, Jul 09, 2011 at 04:25:22PM +0300, Dan Muresan wrote:

> >> The apps already need to do some type of synchronization internally.
> >> For example a player's disk thread, when its ringbuffer is full, needs
> >> to wait for the process thread to consume some data and thus free up
> >
> > Depends. If both ends are periodic processes no other synchronisation
> > is required. And e.g. Jack callback is such a process, and likely to
> > be one end.
> 
> How about the other "end" (i.e. the "disk thread"?) Would that
> normally be periodic?

It could be, and that would be perfectly ok in some cases.
But I'm not arguing agains synchronisation, and my own implementation
of this ringbuffer optionally provides in either direction (buffer
becoming non-empyt/non full).
 
> OK, even if your disk thread is periodic for some reason, how does
> that argue for library-level synchronization, *instead of* app-level
> synchronization? In this case the cost would be the same -- no loss.

I don't see the point.
 
> > You may be right about the (HW as opposed to compiler) re-ordering of
> > data w.r.t. pointers on some architectures. But AFAIK, at least on Intel
> > and AMD writes are not re-ordered w.r.t. other writes from the same CPU,
> 
> "From the same CPU"? Are we regressing to non-SMP-only schemes? And

No, I'm talking about SMP systems. Writing the data and updating
the write pointer is done by the same thread and hence CPU, these
actions won't be re-ordered.

> "Intel and AMD" only?

There is no legal obligation for code to be portable. Nor is there
any moral obligation. If I choose to support only Intel and AMD PCs
and not embedded systems or mobile devices (and for the kind of SW
I write that does make sense) then that is my choice, period.

I get usually sick when computer scientist or language buffs start
waving their finger about programming style etc. There is room for
some pragmatism in everything.

> > Regarding the volatile declarations, at least on my version (which
> > is slightly different from Jack's) there is no performance penalty.
> 
> Under which access patterns, with what compiler / optimization flags
> etc? I would not make such generalizations... Volatile frustrates the
> optimizer's ability to chose the optimal access patterns.

In the example I provided the essential point is that there
is *one* *correct* access pattern which is to read it once
for each call to f(), to ensure that the same value is used
everywhere in that function. Declaring this value volatile
and taking a local copy does exactly the right thing.
The alternative would be protect it by a mutex for as long
as f() runs. For no good reason, as I don't mind it being
overwritten while f() runs. Would that be more 'optimal' ? 

Ciao,

-- 
FA


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-09 Thread Dan Muresan
>> The apps already need to do some type of synchronization internally.
>> For example a player's disk thread, when its ringbuffer is full, needs
>> to wait for the process thread to consume some data and thus free up
>
> Depends. If both ends are periodic processes no other synchronisation
> is required. And e.g. Jack callback is such a process, and likely to
> be one end.

How about the other "end" (i.e. the "disk thread"?) Would that
normally be periodic?

OK, even if your disk thread is periodic for some reason, how does
that argue for library-level synchronization, *instead of* app-level
synchronization? In this case the cost would be the same -- no loss.

> You may be right about the (HW as opposed to compiler) re-ordering of
> data w.r.t. pointers on some architectures. But AFAIK, at least on Intel
> and AMD writes are not re-ordered w.r.t. other writes from the same CPU,

"From the same CPU"? Are we regressing to non-SMP-only schemes? And
"Intel and AMD" only?

How about multiple cores / CPUs / caches? Pipeline reordering is not
the main concern (though it can happen) -- cache coherence is.

> Regarding the volatile declarations, at least on my version (which
> is slightly different from Jack's) there is no performance penalty.

Under which access patterns, with what compiler / optimization flags
etc? I would not make such generalizations... Volatile frustrates the
optimizer's ability to chose the optimal access patterns.

> So I keep them just as reminders that these data are shared and may
> change in unexpected ways.

Hijacking volatile for *manual* type checking, at the cost of
frustrating the optimizer? Andrei Alexandrescu once advocated that
approach for *automatic* type checking in a famous article
(http://drdobbs.com/cpp/184403766). I believe the shortcomings have
been thoroughly discussed in  comp.lang.c++.

If  you want to remind yourself, you could group the variable(s) and
the mutex / semaphore in a structure, or name them similarly etc.

> You are wrong in saying that 'volatile' has no place in multi-threading.
> It is the correct way to go if you want to ensure that a value is e.g.
> read/written just once even if it is used many times:

It has no place in properly synchronized threaded programs. And it
cannot guarantee the correctness of un-synchronized threaded programs
(unless you assume non-SMP, non-hyper-threaded, Intely-type hardware
-- *maybe*)

> extern volatile int xval;  // Written by other thread(s)
>
> void f (void)
> {
>    int x;
>
>    x = xval;
>
>    // use x many times, it won't change.
> }
>
> Without the 'volatile', the compiler is free to read
> the memory value xval as many times as it wants, even
> if it has a local copy, and it probably will do so if
> you have many local variables.

What does that accomplish? You're merely frustrating the compiler's
ability to optimize. You're not achieving complete thread safety by
*adding* volatile -- not on arbitrary hardware. If your code is
completely thread-safe with volatile, it is also completely
thread-safe (and faster) without volatile. Volatile does not offer any
guarantees that cannot be later undone by the pipeline or CPU cache.


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Sat, Jul 09, 2011 at 02:03:34AM +0300, Dan Muresan wrote:
> > Better to just follow the recommendations of the respective ABIs,
> > and put in the memory barriers for those platforms that need them,
> > like PortAudio, the linux kernel, and most other implementations
> 
> The apps already need to do some type of synchronization internally.
> For example a player's disk thread, when its ringbuffer is full, needs
> to wait for the process thread to consume some data and thus free up
> some space.

Depends. If both ends are periodic processes no other synchronisation
is required. And e.g. Jack callback is such a process, and likely to
be one end.
 
> So I think it would be better to drop the volatile's, and leave safety
> to the app level. There's no point in duplicating synchronization at
> the library and at the app level.

You may be right about the (HW as opposed to compiler) re-ordering of
data w.r.t. pointers on some architectures. But AFAIK, at least on Intel
and AMD writes are not re-ordered w.r.t. other writes from the same CPU,
same for reads.

Regarding the volatile declarations, at least on my version (which
is slightly different from Jack's) there is no performance penalty.
So I keep them just as reminders that these data are shared and may
change in unexpected ways.

You are wrong in saying that 'volatile' has no place in multi-threading.
It is the correct way to go if you want to ensure that a value is e.g.
read/written just once even if it is used many times:

extern volatile int xval;  // Written by other thread(s)

void f (void)
{
int x;

x = xval;
 
// use x many times, it won't change.
}

Without the 'volatile', the compiler is free to read
the memory value xval as many times as it wants, even
if it has a local copy, and it probably will do so if
you have many local variables.

Ciao,

-- 
FA


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Dan Muresan
> Better to just follow the recommendations of the respective ABIs,
> and put in the memory barriers for those platforms that need them,
> like PortAudio, the linux kernel, and most other implementations

The apps already need to do some type of synchronization internally.
For example a player's disk thread, when its ringbuffer is full, needs
to wait for the process thread to consume some data and thus free up
some space.

So I think it would be better to drop the volatile's, and leave safety
to the app level. There's no point in duplicating synchronization at
the library and at the app level.


--  Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Sean Bolton

Hi Paul,

On Jul 8, 2011, at 1:49 PM, Paul Davis wrote:

this is why we don't care about the types of stuff that Dan Muresan
mentioned, except to the extent that it could actually lead to the
computation of data/space available being wrong in a deeper way.


You're missing the point of what Dan was pointing out (and that I
pointed out the last time we hashed through this.)  It's not the
space available computation that is at risk of failure--you're quite
right about that. The potential for failure he's referring to is that
on systems with weak memory ordering, a single-writer single-reader
lock-free ring buffer without memory barriers (like JACK's) is at
risk for the data read by the reader process becoming corrupted,
when a data pointer update becomes visible to the reading processor
before the data itself becomes visible.

On Jul 8, 2011, at 6:21 AM, Paul Davis wrote:

...but as oliver mentioned,
we've done some fairly exhaustive testing...


Really? I don't remember anyone reporting test results for an SMP
Alpha, or a Sparc v9 running in RMO mode (i.e. under linux), or a
multiprocessor (not just multicore) G5. Not to mention any of the
newer processors released in the last 2.5 years. More to-the-point,
exhaustive testing to prove the non-existence of a behavior that is
expected to be fairly uncommon to begin with, on every processor
where the code is ever likely to be run? That's a fool's errand.

Better to just follow the recommendations of the respective ABIs,
and put in the memory barriers for those platforms that need them,
like PortAudio, the linux kernel, and most other implementations
do.

Hope that's useful, it would be nice to put this subject to bed.

-Sean






___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Dan Kegel
On Fri, Jul 8, 2011 at 8:49 PM, Paul Davis  wrote:
> could be true enough, but i will be happy to fake ignorance and say i
> don't know for sure. but ...

I sure as heck don't know for sure :-)

> the whole point single reader/single writer lock-free FIFOs is that
> *synchronization doesn't matter*.

At least on x86, where Intel was very careful to preserve the illusion
of coherence, it works... usually?

http://en.wikipedia.org/wiki/Non-blocking_algorithm seems to agree with you,
but http://www.rossbencina.com/code/lockfree seems to agree with you
only on a uniprocessor system.

http://www.codeproject.com/KB/threads/LockFree.aspx has lots of interesting
links.In particular, it links to a couple articles by Herb Sutter,

http://drdobbs.com/cpp/210600279
http://drdobbs.com/high-performance-computing/211601363

He makes it sound like it's only possible using compare-and-swap
atomic primitives.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Fri, Jul 08, 2011 at 10:59:14PM +0200, Arnold Krille wrote:

> Empty for reading. Lots of space for writing.
> Its much more interesting to see what happens when the indezes cross the 2^32 
> or 2^64 mark, then the write-index will be smaller then the read-index untill 
> the read-index catches up. But as far as I see currently, this wouldn't be a 
> problem either. The writer only has to stop writing when its at read_ptr - 1.

It is assumed that the buffer is used correctly, so no one does e.g
a wr_commit(n) with n > wr_avail(), same for read. There is no
problem with wraparound at 2^32 or 2^64 if size is a power of 2.
 
> Maybe I don't understand it all, but with fons approach I think it only works 
> when the buffer-sizes are 2^n. When you have a buffer of say 5 elements, 
> doing 
> the modulo at the element-access and not at the read/write-head-movement, 
> this 
> will jump every now and then, right?

It works only if the size is a divider of the int size, so
it must be a power of 2. But that is the case for Jack's
implementation as well (or anything that uses  'n & (size-1)'
instead of 'n % size'.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Fri, Jul 08, 2011 at 04:41:54PM -0400, Paul Davis wrote:
> On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen  wrote:
> 
> > - You can use all 2^n elements, there is no ambiguity between
> >  full and empty.
> 
> so what does it mean when wr_index == rd_index?

That the buffer is empty, rd_avail = size, 
wr_avail = 0


-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Arnold Krille
On Friday 08 July 2011 22:41:54 Paul Davis wrote:
> On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen  wrote:
> > - You can use all 2^n elements, there is no ambiguity between
> >  full and empty.
> 
> so what does it mean when wr_index == rd_index?

Empty for reading. Lots of space for writing.
Its much more interesting to see what happens when the indezes cross the 2^32 
or 2^64 mark, then the write-index will be smaller then the read-index untill 
the read-index catches up. But as far as I see currently, this wouldn't be a 
problem either. The writer only has to stop writing when its at read_ptr - 1.

Maybe I don't understand it all, but with fons approach I think it only works 
when the buffer-sizes are 2^n. When you have a buffer of say 5 elements, doing 
the modulo at the element-access and not at the read/write-head-movement, this 
will jump every now and then, right?

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Tim Blechmann
>> > I thought a "lock-free" ring buffer was supposed to be
>> > the easy solution!
>> 
>> It is... when you re-use one that's already been written and
>> debugged.  ;-)
>> 
>> Why not copy/paste the JACK ringbuffer (C) or even Ardours
>> (C++ Container)?
> 
> Here's another C++ one, for the record:
> 
> http://svn.drobilla.net/lad/trunk/raul/raul/RingBuffer.hpp

the linux kernel and supercollider have their own implementations of this 
algorithm as well ... iirc it is also described in the book `the art of 
multiprocessor programming' by maurice herlihy and nir shavit.

and i would like to do some self-advertising: i have submitted some of my lock-
free data structures to boost, and they will be reviewed in the end of july 
(18th-27th). source and docs are online [1, 2] and i'd like to invite everybody 
with an interest in lock-free programming to take part in the review. all the 
provided data structures (wait-free spsc ringbuffer, lock-free mpmc fifo and 
lock-free mpmc lifo) are potentially interesting for audio applications and i 
heavily use them for my multiprocessor-aware supercollider server ...

the review itself will take place on the boost-dev mailinglist [3]

cheers, tim

[1] http://tim.klingt.org/boost_lockfree
[2] http://tim.klingt.org/git?p=boost_lockfree.git;a=summary
[3] http://www.boost.org/community/groups.html#main

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Paul Davis
On Fri, Jul 8, 2011 at 3:29 PM, Dan Kegel  wrote:
> On Fri, Jul 8, 2011 at 7:23 PM, Dan Muresan  wrote:
>> (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):
>>...
>> The only thing that volatile accomplishes is to slow down
>> properly-synchronized programs. volatile is for signal handlers and
>> interrupt handlers, not for threading.
>>...
>> This reordering cannot be prevented without proper synchronization. So
>> my advice to anyone considering this would be to drop volatile and do
>> proper synchronization at the application level (i.e. semaphores,
>> since it has to work in real time).
>
> +1
>
> volatile is not the droid you're looking for.

could be true enough, but i will be happy to fake ignorance and say i
don't know for sure. but ...

the whole point single reader/single writer lock-free FIFOs is that
*synchronization doesn't matter*. to state it again: the whole essence
of the design is that the reader and writer can execute with any
timing behaviour whatsoever and the "worst" that can happen is that
they each think there is less data/space available than there actually
is. the design is NOT attempting to implement a synchronized
interaction at all, because it is not necessary.

this is why we don't care about the types of stuff that Dan Muresan
mentioned, except to the extent that it could actually lead to the
computation of data/space available being wrong in a deeper way. we do
NOT care about synchronization for these data types unless it could
actually cause the value of one of the indexes to be accessed at a
time that is "intermediate" in the sense of being "between two C
expressions". it appears that at this time, this is not possible on
any architecture that you might be sensibly using lock free FIFOs
with.

IF we had a processor pipeline that was really big enough and really
did enough runtime optimization across the entire pipeline, then i can
imagine how this could turn into a problem. but AFAIK (and again this
subject has been hashed to death many times here and elsewhere),
current processors do not have this architecture and there's no sign
of them being imminent or even that desirable.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Paul Davis
On Fri, Jul 8, 2011 at 3:53 PM, Fons Adriaensen  wrote:

> - You can use all 2^n elements, there is no ambiguity between
>  full and empty.

so what does it mean when wr_index == rd_index?
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Fri, Jul 08, 2011 at 09:05:42PM +0200, Arnold Krille wrote:
 
> What would happen? The ringbuffer in jack is explicitely only for one-reader-
> one-writer. So in this optimization, the only participant using the read_ptr 
> to do something possibly bad, is the reading participant which is currently 
> executing this code.
> The writing participant can access that read_ptr for example to check the 
> available space. But as the docs state (afair), the available sizes for 
> read/write are not strict functions, the only thing that counts is if you 
> have 
> space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

Actually things are a little better: you can be sure that _at least_
the space indicated is available for reading or writing, not just that
you can read or write one item (byte in this case).

Looking at the way things are implemented, the right value would
be returned even if the unmasked 'other side' count is used.

I'm using a slightly different (C++) implementation: the stored
values equivalent to jack's rd_ptr and wr_ptr (wrong names, they
are not pointers but indices) are never masked. The 'size-1' mask
(with size = 2^N) is applied only when they are used to index the
buffer array.

This has two consequences:

- You can use all 2^n elements, there is no ambiguity between
  full and empty.

- The code is simplified a bit:

  n_avail_read  = wr_index - rd_index;
  n_avail_write = rd_index - wr_index + size;

  and no conditions, masking, or -1  are necessary.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Dan Kegel
On Fri, Jul 8, 2011 at 7:23 PM, Dan Muresan  wrote:
> (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):
>...
> The only thing that volatile accomplishes is to slow down
> properly-synchronized programs. volatile is for signal handlers and
> interrupt handlers, not for threading.
>...
> This reordering cannot be prevented without proper synchronization. So
> my advice to anyone considering this would be to drop volatile and do
> proper synchronization at the application level (i.e. semaphores,
> since it has to work in real time).

+1

volatile is not the droid you're looking for.
- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Dan Muresan
>> > > the one wrinkle in this is that in theory a compiler
>> > > could so completely reorder instructions that even the
>> > > basic assumptions that make the single
>> > > reader/single-writer ringbuffer safe would break down.

Forget about the compiler, the hardware instruction pipeline could
reorder ANYTHING that is unsynchronized. Plus two different threads
might read/write to different cache memories. Here's a paragraph from
the Double Chekced Locking is Broken Declaration
(http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html):

"... processors have their own locally cached copies of memory. On
some processors, unless the processor performs a cache coherence
instruction (e.g., a memory barrier), reads can be performed out of
stale locally cached copies, even if other processors used memory
barriers to force their writes into global memory."

There's in link in that article to how that can happen on an Alpha.

>> > AFAIK nothing fatal can happen if the variables involved
>> > are declared volatile. A compiler is not allowed to

The only thing that volatile accomplishes is to slow down
properly-synchronized programs. volatile is for signal handlers and
interrupt handlers, not for threading.

> read/write are not strict functions, the only thing that counts is if you have
> space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

This is not the real problem. Here's where reordering can hurt:

"Without full sync, a non-owned pointer can "appear" to
  change before the pointed data is ready (due to cache non-coherence
  and access reordering by compiler or hardware). E.g. the consumer
  may "see" the write pointer move before its view of the pointed data
  is up-to-date, leading to bogus data. Alternatively, the producer
  may "see" the read pointer move before the consummer finishes
  loading the pointed data; the producer will thus see the
  still-in-use data as empty space and may overwrite it, leading to
  lost frames."

(From the write-up at the end of my own program, sintvert.c, at
https://github.com/danmbox/sintvert/blob/master/sintvert.c)

This reordering cannot be prevented without proper synchronization. So
my advice to anyone considering this would be to drop volatile and do
proper synchronization at the application level (i.e. semaphores,
since it has to work in real time).


-- Dan
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Arnold Krille
On Friday 08 July 2011 20:12:08 Gabriel M. Beddingfield wrote:
> On Friday, July 08, 2011 12:17:34 pm Fons Adriaensen wrote:
> > On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis
> 
> wrote:
> > > the one wrinkle in this is that in theory a compiler
> > > could so completely reorder instructions that even the
> > > basic assumptions that make the single
> > > reader/single-writer ringbuffer safe would break down.
> > 
> > AFAIK nothing fatal can happen if the variables involved
> > are declared volatile. A compiler is not allowed to
> > omit, repeat, or re-order instructions involving them.
> 
> Take for instance jack_ringbuffer_read(), which has this
> line:
> 
>   rb->read_ptr = (rb->read_ptr + n) & rb->size_mask;
> 
> There's a remote possibility that the compiler could
> optimize this as:
> 
>   rb->read_ptr += n;
>   rb->read_ptr &= rb->size_mask;
> 
> ...and this would break the ringbuffer.  I don't know if the
> `volatile` keyword prevents this or not.

What would happen? The ringbuffer in jack is explicitely only for one-reader-
one-writer. So in this optimization, the only participant using the read_ptr 
to do something possibly bad, is the reading participant which is currently 
executing this code.
The writing participant can access that read_ptr for example to check the 
available space. But as the docs state (afair), the available sizes for 
read/write are not strict functions, the only thing that counts is if you have 
space for reading/writing. And that is fulfilled if read_ptr!=write_ptr...

Have fun,

Arnold


signature.asc
Description: This is a digitally signed message part.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Fri, Jul 08, 2011 at 01:12:08PM -0500, Gabriel M. Beddingfield wrote:

> There's a remote possibility that the compiler could 
> optimize this as:
> 
>   rb->read_ptr += n;
>   rb->read_ptr &= rb->size_mask;
> 
> ...and this would break the ringbuffer.  I don't know if the 
> `volatile` keyword prevents this or not.

It certainly does.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Gabriel M. Beddingfield
On Friday, July 08, 2011 12:17:34 pm Fons Adriaensen wrote:
> On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis 
wrote:
> > the one wrinkle in this is that in theory a compiler
> > could so completely reorder instructions that even the
> > basic assumptions that make the single
> > reader/single-writer ringbuffer safe would break down.
> 
> AFAIK nothing fatal can happen if the variables involved
> are declared volatile. A compiler is not allowed to
> omit, repeat, or re-order instructions involving them.

Take for instance jack_ringbuffer_read(), which has this 
line:

  rb->read_ptr = (rb->read_ptr + n) & rb->size_mask;

There's a remote possibility that the compiler could 
optimize this as:

  rb->read_ptr += n;
  rb->read_ptr &= rb->size_mask;

...and this would break the ringbuffer.  I don't know if the 
`volatile` keyword prevents this or not.

-gabriel
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Fons Adriaensen
On Fri, Jul 08, 2011 at 09:21:55AM -0400, Paul Davis wrote:

> the one wrinkle in this is that in theory a compiler could so
> completely reorder instructions that even the basic assumptions that
> make the single reader/single-writer ringbuffer safe would break down.

AFAIK nothing fatal can happen if the variables involved are
declared volatile. A compiler is not allowed to omit, repeat,
or re-order instructions involving them.

Ciao,

-- 
FA

___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread David Robillard
On Thu, 2011-07-07 at 20:04 -0500, Gabriel M. Beddingfield wrote:
> On Thursday, July 07, 2011 07:50:57 pm James Morris wrote:
> > I thought a "lock-free" ring buffer was supposed to be
> > the easy solution!
> 
> It is... when you re-use one that's already been written and 
> debugged.  ;-)
> 
> Why not copy/paste the JACK ringbuffer (C) or even Ardours 
> (C++ Container)?

Here's another C++ one, for the record:

http://svn.drobilla.net/lad/trunk/raul/raul/RingBuffer.hpp

-dr


___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Paul Davis
On Fri, Jul 8, 2011 at 7:24 AM, James Morris  wrote:

> While I thought my rbuf was broken I read about how lock-free code is
> not really lock-free, it just uses very fine grained locking.
> http://stackoverflow.com/questions/2528969/lock-free-multi-threading-is-for-real-threading-experts/2530082#2530082

we've been through this several times on LAD-related mailing lists.

the usability of a lock free ringbuffer does NOT rely on the stuff
discussed in that article.

the reason it works is because even if there is a sync error between
threads, the error has no negative consequences. put differently, a
lock free ringbuffer does not rely on two threads interacting in a
particular way, it relies on the fact that if they interact in the
"wrong" way, all that happens is that the reader sees less data to
read than actually exists, and/or the writer sees less space to write
than is actually available. if your application views this as a
"negative", then it will have to use some kind of locking primitives
to avoid it, but most audio applications can deal with this without
blinking.

this is entirely different from more generic lock free programming
techniques (e.g. lock free lists) where it is definitely true that you
need to understand memory barriers and so forth to ensure that they
are safe). its also important to recognize that the "locks" used by
memory barriers are processor-level, and not really equivalent in most
senses to the locks used by the kernel or user space. they do have the
capability to block a thread, but the delay is measured in
instructions (mostly) and does not involve process-level information.

the one wrinkle in this is that in theory a compiler could so
completely reorder instructions that even the basic assumptions that
make the single reader/single-writer ringbuffer safe would break down.
this is theoretically possible (and actually, not that hard to create
in practice if you were really trying to) but as oliver mentioned,
we've done some fairly exhaustive testing that when combined with a
moderately common sense approach to compiler design (e.g. the
existence of function scope and related stuff) makes it fundamentally
not worth worrying about this theoretical possibility.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread Olivier Guilyardi
On 07/08/2011 02:21 PM, James Morris wrote:

> JACK's ringbuf, as most will have undoubtedly known all along, is
> faster, and the test code required 5 iterations less than when
> using my ring buf. maybe somewhere, the atomics are required?
> james.

I cannot comment on atomics op, but we have done rather exhaustive testing of
existing ringbuffer implementations on this list in the past.

The small test suite doesn't include any benchmarks, but it does perform
integrity checks which, at the time, allowed to fix a bug in the JACK 
ringbuffer.

The code tests and includes 4 implementations: JACK ringbuffer, PortAudio
ringbuffer, FFMpeg FIFO, and Fons' LFQ. You can check it out and run it with:

$ svn co http://svn.samalyse.com/misc/rbtest
$ cd rbtest
$ make test

There is no dependency. Maybe that you could use these tests (especially
test-int-array) to check data integrity with your implementation.

Hope that helps

--
  Olivier
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread James Morris
On 8 July 2011 12:24, James Morris  wrote:

> So I'd thought I'd compare the execution time of 7000 items passing
> through the two 32-item ring buffers, with the execution time 7000
> items passing though 32-item jack-ringbuffers. details to come,
> probably to throw all this out the window.
>

MSG_BUF_SIZE =  32,
MSG_COUNT = 70

JACK RING BUF:
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.136s
user0m0.260s
sys 0m0.003s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.136s
user0m0.263s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 164 main(): read
back loop count:700032

real0m0.142s
user0m0.273s
sys 0m0.003s



MY RING BUF WITH ATOMIC POINTERS

[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:702657

real0m0.169s
user0m0.330s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:738329

real0m0.173s
user0m0.337s
sys 0m0.000s
[sirrom@scrapyard msg_test]$ time ./msg_test
  msg_test.c: 286 main(): read
back loop count:753976

real0m0.171s
user0m0.333s
sys 0m0.000s


A very simple data integrity check was used, (note: pointers to data
were stored in the rbufs rather than data), the main thread set the
items data to point to the item itself. the integrity check was that
it should come back intact, still pointing to itself. success in both
instances.

JACK's ringbuf, as most will have undoubtedly known all along, is
faster, and the test code required 5 iterations less than when
using my ring buf. maybe somewhere, the atomics are required?
james.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread James Morris
On 8 July 2011 01:50, James Morris  wrote:
> On 7 July 2011 13:10, James Morris  wrote:
>> just wondered if any more-experienced-than-i devs might comment on
>> this. written in c for c (obviously). i realize it's not portable
>> outside of GNU GCC (regarding the GCC atomic builtin funcs
>> __sync_***). meant for a single reader thread and a single writer
>> thread. comments regarding thread safety very much welcome. thanks in
>> advance.
>>
>> james.
>>
>
> I didn't have time to test it before going to work, when posting
> earlier, but was hoping someone who might see something immediately
> wrong with it might comment. I'm mainly posting back to say it doesn't
> work.
>
> It does mostly work, but not always, which is as good as not working
> at all. It looses the occasional write. For example my test program
> has two threads and two buffers. The first buffer is used for thread 1
> to send data to thread 2, and the second is used to send it back
> again. With 70 items originating in thread 1, and both buffers large
> enough to hold 32 items, 1 item is lost perhaps 1 in 7 program
> executions.
>
> I thought a "lock-free" ring buffer was supposed to be the easy solution!

I've just realized the problem was an error in the testing code. The
2nd thread would read an item from the 1st buffer, and then try to
write it to the 2nd buffer. When the test code failed to handle a
write failure (when both r/w ptrs of the 2nd buffer pointed to same
location *and* both were attempting to simultaneously access that same
location) it's loop back to read again, consequently discarding the
item that failed to write.

So I changed my test code thread loop and the problem goes away. The
loop has to stop reading when a write fails.

While I thought my rbuf was broken I read about how lock-free code is
not really lock-free, it just uses very fine grained locking.
http://stackoverflow.com/questions/2528969/lock-free-multi-threading-is-for-real-threading-experts/2530082#2530082

I'm still uncertain about the reliability of pointer operations being
atomic (otherwise why would glib provide atomic pointer ops?). I
currently assume it can't be guaranteed, which is why I wanted to use
the GCC atomic operations for reading/writing to the read/write
pointers.

So back to the fine grained (b)locking, I decided to count the noof
times it failed to write... I was going to provide counts of failures,
averages etc, but... Well, the counts seemed to increase the more
effort the code put into gathering statistics about failures.

So I'd thought I'd compare the execution time of 7000 items passing
through the two 32-item ring buffers, with the execution time 7000
items passing though 32-item jack-ringbuffers. details to come,
probably to throw all this out the window.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread James Morris
On 8 July 2011 02:04, Gabriel M. Beddingfield  wrote:

> Why not copy/paste the JACK ringbuffer (C) or even Ardours
> (C++ Container)?

Sorry, with my warbling I forgot all about asking when the lock
function of jack ringbuffer would be used?

Cheers,
James.
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-08 Thread James Morris
On 8 July 2011 02:04, Gabriel M. Beddingfield  wrote:
> On Thursday, July 07, 2011 07:50:57 pm James Morris wrote:
>> I thought a "lock-free" ring buffer was supposed to be
>> the easy solution!
>
> It is... when you re-use one that's already been written and
> debugged.  ;-)
>
> Why not copy/paste the JACK ringbuffer (C) or even Ardours
> (C++ Container)?

I think that when I was coding BoxySeq, I did look at the JACK
ringbuffer code and decided to simplify it for my purposes.  I "fixed
it" so there was no need for the byte count parameters for read/write,
and removed some of the functions I decided I didn't need (ie peek,
but then re-introduced my own versions). I found problems with my
implementation but it basically worked 99% of the time so I came back
to it the other day with the mistaken belief that atomic read/write
pointer operations along with a reduction of variables used for each
read/write operation would fix it. I was rather pleased actually with
how much this strategy made the code *look* cleaner, so surely it
would work!

James.

>
> -gabriel
> ___
> Linux-audio-dev mailing list
> Linux-audio-dev@lists.linuxaudio.org
> http://lists.linuxaudio.org/listinfo/linux-audio-dev
>
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-07 Thread Gabriel M. Beddingfield
On Thursday, July 07, 2011 07:50:57 pm James Morris wrote:
> I thought a "lock-free" ring buffer was supposed to be
> the easy solution!

It is... when you re-use one that's already been written and 
debugged.  ;-)

Why not copy/paste the JACK ringbuffer (C) or even Ardours 
(C++ Container)?

-gabriel
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


Re: [LAD] a *simple* ring buffer, comments pls?

2011-07-07 Thread James Morris
On 7 July 2011 13:10, James Morris  wrote:
> just wondered if any more-experienced-than-i devs might comment on
> this. written in c for c (obviously). i realize it's not portable
> outside of GNU GCC (regarding the GCC atomic builtin funcs
> __sync_***). meant for a single reader thread and a single writer
> thread. comments regarding thread safety very much welcome. thanks in
> advance.
>
> james.
>

I didn't have time to test it before going to work, when posting
earlier, but was hoping someone who might see something immediately
wrong with it might comment. I'm mainly posting back to say it doesn't
work.

It does mostly work, but not always, which is as good as not working
at all. It looses the occasional write. For example my test program
has two threads and two buffers. The first buffer is used for thread 1
to send data to thread 2, and the second is used to send it back
again. With 70 items originating in thread 1, and both buffers large
enough to hold 32 items, 1 item is lost perhaps 1 in 7 program
executions.

I thought a "lock-free" ring buffer was supposed to be the easy solution!

James.

>
> #include "rng_buf.h" /* only prototypes the public functions and
> typedefs the struct */
>
> #include 
> #include 
> #include "debug.h"
>
>
> struct _RingBuffer
> {
>    size_t count;
>
>    void** buf;
>    void** bufend;
>
>    void** volatile w;
>    void** volatile r;
> };
>
>
> RngBuf* rng_buf_new(size_t count)
> {
>    size_t sz = 1;
>    RngBuf* mb = malloc(sizeof(RngBuf));
>
>    if (!mb)
>    {
>        return 0;
>    }
>
>    for (sz = 1; sz < count; sz <<= 1)
>        ;
>
>    mb->buf = calloc(sz, sizeof(void*));
>
>    if (!mb->buf)
>    {
>        free(mb);
>        return 0;
>    }
>
>    mb->count = sz;
>    mb->bufend = mb->buf + mb->count - 1;
>    mb->w = mb->buf;
>    mb->r = mb->buf;
>
>    return mb;
> }
>
>
> void rng_buf_free(RngBuf* mb)
> {
>    free(mb->buf);
>    free(mb);
> }
>
>
> size_t rng_buf_write(RngBuf* mb, const void* data)
> {
>    if (__sync_bool_compare_and_swap(mb->w, 0, data))
>    {
>        if (!(__sync_bool_compare_and_swap(&mb->w, mb->bufend, mb->buf)))
>            __sync_add_and_fetch(&mb->w, sizeof(void*));
>
>        return (size_t)1;
>    }
>
>    return (size_t)0;
> }
>
>
> void* rng_buf_read(RngBuf* mb)
> {
>    void* data;
>
>    if ((data = __sync_fetch_and_and(mb->r, 0)))
>    {
>        if (!__sync_bool_compare_and_swap(&mb->r, mb->bufend, mb->buf))
>            __sync_add_and_fetch(&mb->r, sizeof(void*));
>
>        return data;
>    }
>
>    return NULL;
> }
>
>
> void rng_buf_reset(RngBuf* mb)
> { /* needs work */
>    mb->r = mb->w = mb->buf;
> }
>
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev


[LAD] a *simple* ring buffer, comments pls?

2011-07-07 Thread James Morris
just wondered if any more-experienced-than-i devs might comment on
this. written in c for c (obviously). i realize it's not portable
outside of GNU GCC (regarding the GCC atomic builtin funcs
__sync_***). meant for a single reader thread and a single writer
thread. comments regarding thread safety very much welcome. thanks in
advance.

james.



#include "rng_buf.h" /* only prototypes the public functions and
typedefs the struct */

#include 
#include 
#include "debug.h"


struct _RingBuffer
{
size_t count;

void** buf;
void** bufend;

void** volatile w;
void** volatile r;
};


RngBuf* rng_buf_new(size_t count)
{
size_t sz = 1;
RngBuf* mb = malloc(sizeof(RngBuf));

if (!mb)
{
return 0;
}

for (sz = 1; sz < count; sz <<= 1)
;

mb->buf = calloc(sz, sizeof(void*));

if (!mb->buf)
{
free(mb);
return 0;
}

mb->count = sz;
mb->bufend = mb->buf + mb->count - 1;
mb->w = mb->buf;
mb->r = mb->buf;

return mb;
}


void rng_buf_free(RngBuf* mb)
{
free(mb->buf);
free(mb);
}


size_t rng_buf_write(RngBuf* mb, const void* data)
{
if (__sync_bool_compare_and_swap(mb->w, 0, data))
{
if (!(__sync_bool_compare_and_swap(&mb->w, mb->bufend, mb->buf)))
__sync_add_and_fetch(&mb->w, sizeof(void*));

return (size_t)1;
}

return (size_t)0;
}


void* rng_buf_read(RngBuf* mb)
{
void* data;

if ((data = __sync_fetch_and_and(mb->r, 0)))
{
if (!__sync_bool_compare_and_swap(&mb->r, mb->bufend, mb->buf))
__sync_add_and_fetch(&mb->r, sizeof(void*));

return data;
}

return NULL;
}


void rng_buf_reset(RngBuf* mb)
{ /* needs work */
mb->r = mb->w = mb->buf;
}
___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev