Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Paul Davis
On Sat, 2008-10-18 at 23:24 -0500, Jack O'Quin wrote:

 If the amount read or written exactly fills the buffer, then a read or write
 pointer equal to the last entry plus one will be stored momentarily, before
 the correct (masked) value wraps it back to the beginning of the buffer.  If
 the other thread looks at it in that state, I believe data will be copied
 outside the bounds of the buffer, which is bad.

this is indeed the case. for read space, if the write ptr is ahead of
the read ptr, we return write_ptr - read_ptr, which will return a value
that is too large if write_ptr gets snapshotted after the increment
and before the mask. 

the same is true for writing. if the read ptr is ahead of the write ptr,
we return the difference between them minus one (the extra -1 stops the
write ptr catching up with the read ptr, a condition that indicates that
the buffer is empty). this value will also be too large if the read_ptr
is used in its intermediate state.
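
The two space computations described above can be sketched roughly as follows. This is a minimal sketch, not the actual JACK source: the function names, the explicit `size` parameter and the branches not spelled out in the text (the wrapped cases and the equal-pointers case) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes available to the reader.
 * w and r are snapshots of the write and read pointers; size is a power of two. */
static size_t read_space(size_t w, size_t r, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    if (w > r)
        return w - r;                      /* write ptr ahead of read ptr */
    return (w - r + size) & (size - 1);    /* assumed: wrapped or empty case */
}

/* Hypothetical helper: bytes available to the writer.
 * The extra -1 keeps the write ptr from catching up with the read ptr. */
static size_t write_space(size_t w, size_t r, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    if (w > r)
        return ((r - w + size) & (size - 1)) - 1;  /* assumed: wrapped case */
    if (w < r)
        return (r - w) - 1;                /* read ptr ahead of write ptr */
    return size - 1;                       /* assumed: equal pointers mean empty */
}
```

Calling either function with an unmasked value for the other thread's pointer (for example w equal to size) reproduces by hand what a thread would compute if it sampled that pointer between the increment and the mask.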

i am deeply embarrassed but also puzzled. i remember worrying about
precisely this issue on and off over a period of several years, and
yet satisfying myself that there actually wasn't a problem. i think that
my mistake was as follows: if you look at the read space case, but focus
on whether the *read* ptr is too large at that time, then the safety of
the ringbuffer is still guaranteed. ditto, in the write space case, the
write ptr being too large doesn't violate the safety guarantee. but this
is backwards - the thread that computes write space is the writing
thread, which is the only one that modifies write_ptr. ergo, it never
sees write_ptr as volatile. ditto, the reading thread never sees
read_ptr as volatile. the problems arise because of the *other* pointer,
where the temporary intermediate state *does* cause the computations to
be in error.

i don't know whether to shoot myself or eat another couple of the
oft-promised hats.

so the next question is how best to prevent it. as far as i can see we
have a couple of proposals:

   1) fons' design, which never actually wraps readptr or writeptr, but
  masks the address used to access the data buffer

   2) removing the intermediate state's visibility

i admit to preferring (2) even though i know that with a 64 bit index,
not wrapping the ptrs is not really a problem.
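
For comparison, proposal (1) might look something like the sketch below. It is only an illustration of the idea of free-running indices that are masked when the data buffer is accessed, not fons' actual code: the names frb_t, frb_put and frb_get are made up, it moves one byte at a time, and it deliberately leaves out any memory barriers or atomic accesses.

```c
#include <assert.h>
#include <stddef.h>

#define FRB_SIZE 8u                 /* assumed size; must be a power of two */
#define FRB_MASK (FRB_SIZE - 1u)

typedef struct {
    unsigned char buf[FRB_SIZE];
    size_t wr;   /* free-running write index, modified only by the writer */
    size_t rd;   /* free-running read index, modified only by the reader */
} frb_t;

/* Unsigned subtraction stays correct even after the indices overflow,
 * because FRB_SIZE divides the range of size_t. */
static size_t frb_read_space(const frb_t *rb)  { return rb->wr - rb->rd; }
static size_t frb_write_space(const frb_t *rb) { return FRB_SIZE - (rb->wr - rb->rd); }

/* Store one byte; returns 1 on success, 0 if the buffer is full. */
static int frb_put(frb_t *rb, unsigned char c)
{
    if (frb_write_space(rb) == 0)
        return 0;
    rb->buf[rb->wr & FRB_MASK] = c;  /* mask only the address, never the index */
    rb->wr += 1;
    return 1;
}

/* Fetch one byte; returns 1 on success, 0 if the buffer is empty. */
static int frb_get(frb_t *rb, unsigned char *c)
{
    if (frb_read_space(rb) == 0)
        return 0;
    *c = rb->buf[rb->rd & FRB_MASK];
    rb->rd += 1;
    return 1;
}
```

Since the stored indices are never wrapped, there is no unmasked intermediate value to observe, and all FRB_SIZE slots can be used because full and empty are distinguished by the index difference.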

however, it is not totally clear to me how to prevent an optimizing
compiler from doing the wrong thing here. unlike the claims made by
someone involved with portaudio, i believe that it is correct to declare
the read_ptr and write_ptr volatile, so that the compiler knows that it
cannot try to be clever about its accesses of the other ptr (i.e.
read_ptr for the writing thread, and vice versa). maybe the comment on
the use of volatile was based on some idea that i thought it made the
variables thread safe, which is not and never was the case.

i suspect that the safest code looks like:

size_t tmp = (ptr + incr) & mask;
barrier(); 
ptr = tmp;

but i am not sure whether barrier needs to be read, write or both.

i think that the simpler code:

   ptr = (ptr + incr) & mask;

is subject to potential compiler and/or processor optimization that
might reduce it back to the problem case of two ops without an
intermediate load/store location. the volatile declaration ought to
prevent the compiler from doing this, and i don't see why a processor
would do this, ever, but clearly i've already been deeply wrong about
this. does anybody know for certain?
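
One way to express the "single store of the masked value" idea in present-day C is with the C11 <stdatomic.h> interface, which postdates this thread. The following is a sketch under that assumption, not a statement of what the JACK code does; rb_advance and rb_sample are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: advance an index that only the calling thread writes.
 * mask must be (power of two) - 1. */
static void rb_advance(_Atomic size_t *ptr, size_t incr, size_t mask)
{
    assert((mask & (mask + 1)) == 0);
    /* We are the only writer of *ptr, so a relaxed load of our own value suffices. */
    size_t tmp = (atomic_load_explicit(ptr, memory_order_relaxed) + incr) & mask;
    /* One atomic store of the already-masked value. Release ordering makes
     * earlier writes to the data buffer visible before the new index. */
    atomic_store_explicit(ptr, tmp, memory_order_release);
}

/* Hypothetical helper: how the other thread samples the index. */
static size_t rb_sample(_Atomic size_t *ptr)
{
    return atomic_load_explicit(ptr, memory_order_acquire);
}
```

With this shape the other thread can only ever load the old masked value or the new masked value, since the unmasked sum lives in a local variable and is never stored to the shared location.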

--p



___
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev


Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Paul Davis
On Sun, 2008-10-19 at 09:55 +0200, Paul Davis wrote:

 i am deeply embarrassed but also puzzled.

maybe i don't need to eat any hats except for the paying insufficient
attention one. the jack ringbuffer code was ported from my original C++
implementation, which specifically does not have the problem of visible
intermediate state (and, btw, does force the use of atomic loads &
stores).

--p




Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Jack O'Quin
On Sun, Oct 19, 2008 at 2:55 AM, Paul Davis [EMAIL PROTECTED] wrote:

 i don't know whether to shoot myself or eat another couple of the
 oft-promised hats.

Don't beat yourself up too badly.  Multiple threads accessing shared memory
is *very* tricky.  Even smart people (like you) get it wrong sometimes.

 so the next question is how best to prevent it. as far as i can see we
 have a couple of proposals:

   1) fons' design, which never actually wraps readptr or writeptr, but
  masks the address used to access the data buffer

   2) removing the intermediate state's visibility

 i admit to preferring (2) even though i know that with a 64 bit index,
 not wrapping the ptrs is not really a problem.

I also prefer (2).

 however, it is not totally clear to me how to prevent an optimizing
 compiler from doing the wrong thing here. unlike the claims made by
 someone involved with portaudio, i believe that it is correct to declare
 the read_ptr and write_ptr volatile, so that the compiler knows that it
 cannot try to be clever about its accesses of the other ptr (i.e.
 read_ptr for the writing thread, and vice versa). maybe the comment on
 the use of volatile was based on some idea that i thought it made the
 variables thread safe, which is not and never was the case.

 i suspect that the safest code looks like:

size_t tmp = (ptr + incr) & mask;
barrier();
ptr = tmp;

 but i am not sure whether barrier needs to be read, write or both.

 i think that the simpler code:

   ptr = (ptr + incr) & mask;

 is subject to potential compiler and/or processor optimization that
 might reduce it back to the problem case of two ops without an
 intermediate load/store location. the volatile declaration ought to
 prevent the compiler from doing this, and i don't see why a processor
 would do this, ever, but clearly i've already been deeply wrong about
 this. does anybody know for certain?

It is best to avoid assumptions about what some future compiler may
consider an optimization.  If the register pressure is high at some
point in the program, it may decide to store some value just to free up
register space for other variables.

The volatile declaration should remove any need for the compiler
barrier() statement, AFAICT.  Note that barrier() is a compiler
directive, and has no effect on the CPU's ability to reorder cache
operations in an SMP memory hierarchy.

Here is a fairly clear and complete description of the memory barrier
issues:  http://lxr.linux.no/linux/Documentation/memory-barriers.txt

As best I can tell, both ring buffer threads require general memory
barriers, because they both read and write (different) shared data.  In
Linux kernel terms, they'd need to use smp_mb(), since the problems
they address are multiprocessor-only.  But, since JACK is not compiled
differently for UP and SMP, the full mb() seems more appropriate.
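
In user space, a rough equivalent of that arrangement could look like the sketch below. It assumes GCC (or a compatible compiler) and uses the __sync_synchronize() builtin, which issues a full memory barrier, as a stand-in for the kernel's mb(). The rb_t type and function names are made up for illustration, and the barrier placement reflects my reading of the requirements rather than tested JACK code.

```c
#include <assert.h>
#include <stddef.h>

#define RB_SIZE 16u                 /* assumed size; must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    char buf[RB_SIZE];
    volatile size_t write_ptr;      /* stored only by the writer thread */
    volatile size_t read_ptr;       /* stored only by the reader thread */
} rb_t;

/* Writer thread only. Returns the number of bytes actually written. */
static size_t rb_write(rb_t *rb, const char *src, size_t n)
{
    size_t w = rb->write_ptr;
    size_t r = rb->read_ptr;
    size_t space = (r - w - 1) & RB_MASK;   /* one slot is always left unused */
    if (n > space)
        n = space;
    __sync_synchronize();           /* sample read_ptr before touching the data */
    for (size_t i = 0; i < n; i++)
        rb->buf[(w + i) & RB_MASK] = src[i];
    __sync_synchronize();           /* data stores complete before the index store */
    rb->write_ptr = (w + n) & RB_MASK;      /* single store of the masked value */
    return n;
}

/* Reader thread only. Returns the number of bytes actually read. */
static size_t rb_read(rb_t *rb, char *dst, size_t n)
{
    size_t r = rb->read_ptr;
    size_t w = rb->write_ptr;
    size_t avail = (w - r) & RB_MASK;
    if (n > avail)
        n = avail;
    __sync_synchronize();           /* sample write_ptr before loading the data */
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->buf[(r + i) & RB_MASK];
    __sync_synchronize();           /* data loads complete before the index store */
    rb->read_ptr = (r + n) & RB_MASK;       /* single store of the masked value */
    return n;
}
```

Each thread stores only its own pointer, and it does so once, after a full barrier, so the other side sees a new index value only after the corresponding data accesses have completed.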

That stuff makes my head hurt.
-- 
 joq


Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Paul Coccoli
On Sun, Oct 19, 2008 at 12:24 AM, Jack O'Quin [EMAIL PROTECTED] wrote:
 On Sat, Oct 18, 2008 at 10:45 PM, Paul Coccoli [EMAIL PROTECTED] wrote:
 On Sat, Oct 18, 2008 at 11:29 PM, Jack O'Quin [EMAIL PROTECTED] wrote:
 This is wrong.  For the single reader, single writer case, atomic operations
 are *not* necessary.  The bug, as was already pointed out, is due to storing

 Let's agree to disagree, then.  Single-reader, single-writer does not
 automatically make something SMP safe.  There is a large body of
 literature on lock-free data structures that agrees with me; someone
 posted a link to a collection of those earlier in the thread.

 Let's not.  This is not just a matter of opinion.  If you read that
 literature, you will find that the ring buffer *is* safe for the single
 reader, single writer case.  In many other SMP situations, atomic
 operations *are* required, but not for ring buffers.

The only time you can get away without atomic ops is on uni-processor.
 Please cite a reference that says otherwise.

Notice that the fixes proposed all involve removing the += and
using only assignment.


Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Fons Adriaensen
On Sun, Oct 19, 2008 at 12:36:55PM -0400, Paul Coccoli wrote:

 The only time you can get away without atomic ops is on uni-processor.
  Please cite a reference that says otherwise.

Please show us how using a non-atomic addition could go wrong.

 Notice that the fixes proposed all involve removing the += and
 using only assignment.

First, that is not the same as requiring atomic addition.
Second, it is not true. The one I proposed just does the '+='.
If it's wrong, show us how.

Ciao,

-- 
FA

Laboratorio di Acustica ed Elettroacustica
Parma, Italia

Lascia la spina, cogli la rosa.



Re: [LAD] [LAU] Simple, easy multithreaded circular buffer library for Linux?

2008-10-19 Thread Stephen Sinclair
On Sun, Oct 19, 2008 at 1:08 PM, Fons Adriaensen [EMAIL PROTECTED] wrote:
 On Sun, Oct 19, 2008 at 12:36:55PM -0400, Paul Coccoli wrote:

 The only time you can get away without atomic ops is on uni-processor.
  Please cite a reference that says otherwise.

 Please show us how using a non-atomic addition could go wrong.

On a side note, does anyone know what the performance penalty is (if
any) for using atomic ops?
And does it scale according to number of CPU cores?  What other
factors are there?  I assume the caching architecture makes a big
difference.

I've been using the atomic-ops library from HP for doing various
things lately, and I find it quite nice.  But I have found myself
wondering whether I am paying some kind of penalty.  Of course, I'm
sure it's less than the penalty for using locks.


Steve


[LAD] Denemo 0.8 Release -

2008-10-19 Thread Nils Gey
Dear linux audio users and developers,

Short version:
==
Denemo 0.8 is fresh, hot and available now! Grab your tarball @
http://download.savannah.gnu.org/releases/denemo/ (Windows binaries
will join in later) and have a look at our newest features including
full scripting support!

If you are a programmer please help us to give Denemo
JACKmidi/JACKtransport support so that it's capable of blending with the
rest of Linux-pro-audio. For contact, feature requests, bug reports and
further information please visit http://www.denemo.org

Interesting version:
==
Denemo is a music notation program for Linux and Windows (and MacOS
some time ago) that lets you rapidly enter notation for typesetting via
the LilyPond music engraver (because Lilypond is the reference and
there is no sense in coding your own WYSIWYG notation apps). It's mainly
controlled via your PC keyboard with several edit-modes and shortcuts.

Please note that we need help! Denemo already has many notation
features built in, and if anything is not available you can enter
LilyPond commands and save them with your Denemo file so that you can
use the whole range of LilyPond features. This means that Denemo is
already capable of writing full, professional scores.

But it lacks sequencer-features like advanced playback and routing via
JACKmidi and support for JACKtransport. To really become the first
useful Linux notation-editor and notation-sequencer, this is the last
piece of the puzzle.

Version 0.8 changelog:

 1. A scripting interface to the Denemo commands has been created.
 2. Example script-based commands are provided with the Denemo
installation.
 3. New scripts can be hand-written or recorded from a sequence of
menu item clicks or by editing another script or a mixture of
these.
 4. New commands (scripts) can be installed in the menu system,
given keyboard shortcuts, and generally used as other commands
are.
 5. The example scripts provided include a script showing the
potential of Denemo for use in music education. In this example,
random notes are generated and the user has to name the note.
 6. Other examples include scripts for commands useful when
generating scores with percussion, guitar fingerings, orchestral
markings etc.
 7. Various bugfixes and improvements to midi import have been made.

greetings,

Nils Gey
www.denemo.org