Re: [music-dsp] A theory of optimal splicing of audio in the time domain.

2011-07-14 Thread robert bristow-johnson


On Jul 13, 2011, at 9:29 AM, Olli Niemitalo wrote:


On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson
r...@audioimagination.com wrote:

On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:


[I] chose that the ratio a(t)/a(-t) [...] should be preserved


by preserved, do you mean constant over all t?


Constant over all r.



i think i figgered that out after hitting the Send button.


what is the fundamental reason for preserving a(t)/a(-t) ?


I'm thinking outside your application of automatically finding splice
points. Think of crossfades between clips in a multi-track sample
editor. For a cross-fade in which one signal is faded in with a volume
envelope that is the time-reverse of the envelope used to fade the
other signal out, a(t)/a(-t) describes in what proportions the two
signals are mixed at each t. The fundamental reason, then, is that I
think it is a rather good description of the shape of the fade to a
user, as it describes how the second signal swallows the first over
time.


okay, i get it.

so instead of expressing the crossfade envelope as

a(t)  =   e(t)   +   o(t)

i think we could describe it as a constant-voltage crossfade (those  
used for splicing perfectly correlated snippets) bumped up a little by  
an overall loudness function.  an envelope acting on the envelope.   
and, as you correctly observed, for constant-voltage crossfades, the  
even component is always


e(t)  =   1/2

so, pulling another couple of letters outa the alfabet, we can  
represent the crossfade function as


a(t)  =  e(t)  +  o(t)  =  g(t)*( 1/2 + p(t) )

where

g(-t)  =   g(t)  is even
and
p(-t)  =  -p(t)  is odd


g(t) = 1 for constant-voltage crossfades, when r=1.
for constant-power crossfades, r=0, we know that g(0) = sqrt(2) > 1

the shape p(t) is preserved for different values of r and we want to  
solve for g(t) given a specified correlation value r and a given  
shape family p(t).  indeed


   a(t)/a(-t)  =  (1/2 + p(t))/(1/2 - p(t))

and remains preserved over r if p(t) remains unchanged.
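
A small numeric check of the claim above, as a sketch only: it assumes the linear shape p(t) = t/2 on -1 <= t <= 1, and the two even gain functions g_flat and g_bump are made up purely for illustration. Any even g(t) cancels out of a(t)/a(-t), so the ratio depends on p(t) alone.

```python
import math

def p_linear(t):
    """Odd shape function of a linear crossfade on -1 <= t <= 1 (assumed)."""
    return t / 2.0

def ratio(t, g):
    """a(t)/a(-t) for a(t) = g(t)*(1/2 + p(t)), where g is any even function."""
    a_pos = g(t) * (0.5 + p_linear(t))
    a_neg = g(-t) * (0.5 + p_linear(-t))
    return a_pos / a_neg

# Two arbitrary even gain functions (hypothetical, for illustration only).
def g_flat(t):
    return 1.0

def g_bump(t):
    return 1.0 + 0.4 * math.cos(math.pi * t / 2.0)

# The ratio matches (1/2 + p(t))/(1/2 - p(t)) whichever even g is used.
for t in (-0.5, 0.0, 0.25, 0.75):
    expected = (0.5 + p_linear(t)) / (0.5 - p_linear(t))
    assert math.isclose(ratio(t, g_flat), expected)
    assert math.isclose(ratio(t, g_bump), expected)
```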

p(t) can be spec'd initially exactly like o(t) (linear crossfade,  
Hann, Flattened Hann, or whatever odd function your heart desires).  i  
think it should be easy to solve for g(t).  we know that



  e(t)  =  1/2 * g(t)

  o(t)  =  g(t) * p(t)

and recall the result

  e(t)  =  sqrt( (1/2)/(1+r) - (1-r)/(1+r)*(o(t))^2 )

which comes from

  (1+r)*( e(t) )^2  +  (1-r)*( o(t) )^2  =  1/2

so
  (1+r)*( 1/2*g(t) )^2  +  (1-r)*( g(t)*p(t) )^2  =  1/2


  ( g(t) )^2 * ( (1+r)/4 + (1-r)*(p(t))^2 )  =  1/2

and picking the positive square root for g(t) yields

  g(t)  =  1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

might this result match what you have?  (assemble a(t) from g(t) and  
p(t) just as we had previously from e(t) and o(t).)


remember that p(t) is odd so p(0)=0  so when

  r=1  --->  g(t) = 1  (constant-voltage crossfade)
and

  r=0  --->  g(0) = sqrt(2)  (constant-power crossfade)
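
The result for g(t) can be checked numerically. The sketch below assumes the linear shape p(t) = t/2 on -1 <= t <= 1 and two unit-power signals with normalized correlation r; the function names are illustrative, not from the thread. It assembles a(t) = g(t)*(1/2 + p(t)) and evaluates the expected power of the mix, a(t)^2 + a(-t)^2 + 2*r*a(t)*a(-t), which is the same condition as (1+r)*e(t)^2 + (1-r)*o(t)^2 = 1/2 written in terms of a(t) and a(-t).

```python
import math

def p_linear(t):
    """Odd shape function of a linear crossfade on -1 <= t <= 1 (assumed)."""
    return t / 2.0

def g(t, r, p=p_linear):
    """g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*p(t)^2 )."""
    return 1.0 / math.sqrt((1.0 + r) / 2.0 + 2.0 * (1.0 - r) * p(t) ** 2)

def a(t, r, p=p_linear):
    """Crossfade envelope a(t) = g(t)*(1/2 + p(t))."""
    return g(t, r, p) * (0.5 + p(t))

def mixed_power(t, r, p=p_linear):
    """Expected power of a(t)*x + a(-t)*y for unit-power x, y with correlation r."""
    fade_in, fade_out = a(t, r, p), a(-t, r, p)
    return fade_in ** 2 + fade_out ** 2 + 2.0 * r * fade_in * fade_out

# r = -1 is left out on purpose: g(0) is unbounded there.
for r in (-0.5, 0.0, 0.5, 1.0):
    for t in (-1.0, -0.5, 0.0, 0.3, 1.0):
        assert math.isclose(mixed_power(t, r), 1.0)
```

With r = 1 the gain g(t) is 1 everywhere, and with r = 0 it peaks at sqrt(2) at t = 0, matching the two special cases above.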



The user might choose one shape
for a particular crossfade. Then, depending on the correlation between
the superimposed signals, an appropriate symmetrical volume envelope
could be applied to the mixed signal to ensure that there is no peak
or dip in the contour of the mixed signal. Because the envelope is
symmetrical, applying it preserves a(t)/a(-t). It can also be
incorporated directly into a(t).

All that is not so far off from the application you describe.

but i don't think it is necessary to deal with lags where Rxx(tau) < 0.
why splice a waveform to another part of the same waveform that has
opposite polarity?  that would create an even bigger glitch.


Splicing at quiet regions with negative correlation can give a smaller
glitch than splicing at louder regions with positive correlation.


okay.  i would still like to hunt for a splice displacement around  
that quiet region that would have correlation better than zero.  and,  
if both x(t) and y(t) have no DC, it should be possible to find  
something.
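
One way such a hunt could look, as a rough sketch rather than anyone's actual method: compute a normalized correlation between a reference segment and candidate segments at lags around a nominal displacement, and keep the lag with the largest value. The function names, the segment length, and the search range are all assumptions made for illustration; no mean removal is done, on the assumption that the signal has no DC.

```python
import math

def normalized_correlation(x, y):
    """Correlation coefficient of two equal-length segments (no mean removal)."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    return num / den if den > 0.0 else 0.0

def best_displacement(signal, start, length, nominal_lag, search):
    """Return (lag, r) with the largest r for lags within +/- search of nominal_lag."""
    ref = signal[start:start + length]
    best_lag, best_r = None, -2.0
    for lag in range(nominal_lag - search, nominal_lag + search + 1):
        lo = start + lag
        if lo < 0 or lo + length > len(signal):
            continue  # candidate segment would run off the signal
        r = normalized_correlation(ref, signal[lo:lo + length])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Example: a sine with a period of 50 samples; the best lag near 110 is 100.
sine = [math.sin(2.0 * math.pi * n / 50.0) for n in range(400)]
lag, r = best_displacement(sine, start=0, length=100, nominal_lag=110, search=15)
```

The returned r could then be plugged into g(t) to pick the matching crossfade.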



This
applies particularly to rhythmic material like drum loops, where the
time lag between the splice points is constrained, and it may make
most sense to look for quiet spots. However, if it's already so quiet
in there, I don't know how much it matters what you use for a
cross-fade.

Apart from "it's so quiet it doesn't matter", I can think of one other
objection against using cross-fades tailored for r < 0: For example,
let's imagine that our signal is white noise generated from a Gaussian
distribution, and we are dealing with given splice points for which
Rxx(tau) < 0 (slightly).


but you should also be able to find a tau where Rxx(tau) is slightly  
greater than zero because Rxx(tau) should be DC free (if x(t) is DC  
free).  if it were true noise, it should not be far from zero so you  
would likely use the r=0 crossfade function.



Now, while the samples of the signal were
generated independently, there is by accident a bit of 

Re: [music-dsp] A theory of optimal splicing of audio in the time domain.

2011-07-14 Thread Olli Niemitalo
On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson
r...@audioimagination.com wrote:

      g(t)  =  1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

 might this result match what you have?

Yes! I only derived the formula for the linear ramp, p(t) = t/2,
because one can get the other shapes by warping time and I didn't want
to bloat the cumbersome equations. With the linear ramp our results
match exactly.
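
A sketch of the time-warping idea, under the assumption that the warp is an odd, monotonic map of [-1, 1] onto itself. The sine warp below is one illustrative choice: for r = 1 it reproduces the familiar Hann crossfade 1/2 + 1/2*sin(pi*t/2). Because g(t) depends on t only through p(t), warping the time argument of the linear-ramp envelope gives the same result as inserting the warped shape into the general formula.

```python
import math

def p_linear(t):
    """Linear ramp p(t) = t/2 on -1 <= t <= 1."""
    return t / 2.0

def warp_hann(t):
    """Odd, monotonic map of [-1, 1] onto itself (an assumed choice of warp)."""
    return math.sin(math.pi * t / 2.0)

def p_hann(t):
    """Hann-like odd shape obtained by warping the time axis of the linear ramp."""
    return p_linear(warp_hann(t))

def a_linear(t, r):
    """Envelope for the linear ramp: g(t)*(1/2 + p(t)) with p(t) = t/2."""
    p = p_linear(t)
    gain = 1.0 / math.sqrt((1.0 + r) / 2.0 + 2.0 * (1.0 - r) * p * p)
    return gain * (0.5 + p)

def a_hann(t, r):
    """Envelope for the warped shape, computed by warping time only."""
    return a_linear(warp_hann(t), r)

# Warping time agrees with putting p_hann directly into the general formula.
for r in (0.0, 0.5, 1.0):
    for t in (-1.0, -0.4, 0.0, 0.5, 1.0):
        p = p_hann(t)
        direct = (0.5 + p) / math.sqrt((1.0 + r) / 2.0 + 2.0 * (1.0 - r) * p * p)
        assert math.isclose(a_hann(t, r), direct, abs_tol=1e-12)
```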

 okay.  i would still like to hunt for a splice displacement around that
 quiet region that would have correlation better than zero

Sometimes you are stuck with a certain displacement. Think drum loops;
changing tau would change tempo.

 i think it's better to define p(t) (with the same restrictions as o(t)) and
 find g(t) as a function of r than it is to do it with o(t) and e(t).

I agree, even though the theory was quite elegant with o(t) and e(t)...

-olli
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp


Re: [music-dsp] A theory of optimal splicing of audio in the time domain.

2011-07-14 Thread robert bristow-johnson


On Jul 14, 2011, at 5:36 PM, Olli Niemitalo wrote:


On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson
r...@audioimagination.com wrote:


 g(t)  =  1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

might this result match what you have?


Yes! I only derived the formula for the linear ramp, p(t) = t/2,
because one can get the other shapes by warping time and I didn't want
to bloat the cumbersome equations. With the linear ramp our results
match exactly.

okay.  i would still like to hunt for a splice displacement around that
quiet region that would have correlation better than zero


Sometimes you are stuck with a certain displacement. Think drum loops;
changing tau would change tempo.

i think it's better to define p(t) (with the same restrictions as o(t)) and
find g(t) as a function of r than it is to do it with o(t) and e(t).


I agree, even though the theory was quite elegant with o(t) and e(t)...




do you have any of this in a document?  i wonder if one of us should  
put this down in a pdf and put it in the music-dsp code archive.



--

r b-j  r...@audioimagination.com

Imagination is more important than knowledge.



