Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 15, 2011, at 12:46 AM, Sampo Syreeni wrote:

> What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

the precise criterion is how well the two signals being spliced correlate to one another. i tried to set that up with the inner product notation:

            +inf
 <x, y>  =  integral{ x(t)*y(t) * w(t) dt}
            -inf

where w(t) is a window function centered at t=0. the normalized correlation measure is:

 r  =  <x, y> / sqrt( <x, x> * <y, y> )

if r=1, they are perfectly correlated and a constant-voltage splice should be used. if r=0, they are completely uncorrelated and a constant-power splice should be used. if 0 < r < 1, then some kinda splice in between a constant-voltage and a constant-power splice should be used. if r < 0, then there has to be a boost of even *more* than 3 dB (that sqrt(2) factor at g(0)) to keep the expected loudness envelope constant. Olli and i see the need for such slightly differently.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."

--
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
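[Editor's note: the criterion above is easy to try numerically. Below is a minimal discrete sketch of the windowed inner product and the normalized correlation r; the function name, the Hann window choice, and the test signals are illustrative assumptions, not from the post.]

```python
import numpy as np

def splice_correlation(x, y):
    """Windowed, normalized correlation between two equal-length snippets
    around a candidate splice point:
        <x, y> = sum( x * y * w ),   r = <x, y> / sqrt( <x, x> * <y, y> )
    where w is a window centered on the splice (Hann here, an assumption)."""
    w = np.hanning(len(x))
    inner = lambda a, b: np.sum(a * b * w)
    return inner(x, y) / np.sqrt(inner(x, x) * inner(y, y))

t = np.linspace(0, 1, 1000, endpoint=False)
s = np.sin(2 * np.pi * 5 * t)
print(splice_correlation(s, s))    # 1.0: perfectly correlated -> constant-voltage splice
print(splice_correlation(s, -s))   # -1.0: opposite polarity -> the r < 0 case
```

In a splice hunter, this would be evaluated over candidate lags tau and the lag with the largest r chosen.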
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Fri, Jul 15, 2011 at 7:46 AM, Sampo Syreeni wrote:

> On 2011-07-15, Olli Niemitalo wrote:
>
> What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

My objective has not been to find a method for automatic splicing, but to do nice cross-fades at given splice points. There were multiple objectives:

* Intuitive definition of the cross-fade shape. Mixing ratio as a function of time is a good definition.

* For stationary signals, there should be no clicks or transients produced. This is taken care of by the smoothness of the cross-fade envelopes.

* For stationary signals, the resulting measurable transition from the volume level of signal 1 to the volume level of signal 2 should follow the chosen cross-fade shape. This can be accomplished knowing the volume levels of the two signals and the correlation coefficient between the two signals.

-olli
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 2011-07-15, Olli Niemitalo wrote:

What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 14, 2011, at 5:36 PM, Olli Niemitalo wrote:

> On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson wrote:
>>
>> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>>
>> might this result match what you have?
>
> Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.
>
>> okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero
>
> Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.
>
>> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).
>
> I agree, even though the theory was quite elegant with o(t) and e(t)...

do you have any of this in a document? i wonder if one of us should put this down in a pdf and put it in the music-dsp "code" archive.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson wrote:
>
> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>
> might this result match what you have?

Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.

> okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero

Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.

> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).

I agree, even though the theory was quite elegant with o(t) and e(t)...

-olli
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 13, 2011, at 9:29 AM, Olli Niemitalo wrote:

> On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson wrote:
>> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>>
>> by "preserved", do you mean constant over all t?
>
> Constant over all r.

i think i figgered that out after hitting the Send button.

>> what is the fundamental reason for preserving a(t)/a(-t) ?
>
> I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is a time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes by what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first by time.

okay, i get it. so instead of expressing the crossfade envelope as

 a(t) = e(t) + o(t)

i think we could describe it as a constant-voltage crossfade (those used for splicing perfectly correlated snippets) bumped up a little by an overall loudness function. an envelope acting on the envelope. and, as you correctly observed, for constant-voltage crossfades, the even component is always

 e(t) = 1/2

so, pulling another couple of letters outa the alfabet, we can represent the crossfade function as

 a(t)  =  e(t) + o(t)  =  g(t)*( 1/2 + p(t) )

where g(-t) = g(t) is even and p(-t) = -p(t) is odd.

 g(t) = 1   for constant-voltage crossfades, when r=1.

for constant-power crossfades, r=0, we know that

 g(0) = sqrt(2) > 1

the shape p(t) is preserved for different values of r, and we want to solve for g(t) given a specified correlation value r and a given "shape" family p(t). indeed

 a(t)/a(-t)  =  (1/2 + p(t))/(1/2 - p(t))

and remains preserved over r if p(t) remains unchanged.
p(t) can be spec'd initially exactly like o(t) (linear crossfade, Hann, flattened Hann, or whatever odd function your heart desires). i think it should be easy to solve for g(t). we know that

 e(t) = 1/2 * g(t)
 o(t) = g(t) * p(t)

and recall the result

 e(t) = sqrt( (1/2)/(1+r) - (1-r)/(1+r)*(o(t))^2 )

which comes from

 (1+r)*( e(t) )^2 + (1-r)*( o(t) )^2  =  1/2

so

 (1+r)*( 1/2*g(t) )^2 + (1-r)*( g(t)*p(t) )^2  =  1/2

 ( g(t) )^2 * ( (1+r)/4 + (1-r)*(p(t))^2 )  =  1/2

and picking the positive square root for g(t) yields

 g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

might this result match what you have? (assemble a(t) from g(t) and p(t) just as we had previously from e(t) and o(t).) remember that p(t) is odd, so p(0)=0. so

 r=1  --->  g(t) = 1        (constant-voltage crossfade)
 r=0  --->  g(0) = sqrt(2)  (constant-power crossfade)

> The user might choose one "shape" for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it "preserves" a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.
>
>> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.
>
> Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation.

okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero. and, if both x(t) and y(t) have no DC, it should be possible to find something.
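[Editor's note: the g(t) derivation above can be checked numerically. The sketch below assumes the linear ramp p(t) = t/2 on t in [-1, 1] (the normalization of the time axis is an editorial assumption) and verifies (1+r)*e(t)^2 + (1-r)*o(t)^2 = 1/2, which is equivalent to a(t)^2 + a(-t)^2 + 2*r*a(t)*a(-t) = 1, i.e. constant expected power of the mix.]

```python
import numpy as np

def crossfade_envelope(t, r, p=lambda t: t / 2):
    """a(t) = g(t)*(1/2 + p(t)) with g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*p(t)^2 ).
    p must be odd; the default linear ramp p(t) = t/2 is the case Olli derived."""
    g = 1.0 / np.sqrt((1 + r) / 2 + 2 * (1 - r) * p(t) ** 2)
    return g * (0.5 + p(t))

t = np.linspace(-1, 1, 201)
for r in (1.0, 0.5, 0.0):
    a = crossfade_envelope(t, r)      # fade-in envelope
    b = crossfade_envelope(-t, r)     # time-reversed fade-out envelope
    # expected power of the mix stays constant when the correlation is r:
    assert np.allclose(a**2 + b**2 + 2 * r * a * b, 1.0)

# the limiting cases from the post:
assert np.allclose(crossfade_envelope(t, 1.0) + crossfade_envelope(-t, 1.0), 1.0)  # constant-voltage
assert abs(crossfade_envelope(0.0, 0.0) - np.sqrt(2) / 2) < 1e-12                  # g(0) = sqrt(2)
```

The same check passes for any other odd p(t), e.g. a Hann-shaped ramp, since the derivation never used the specific shape.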
> This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade. Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly).

but you should also be able to find a tau where Rxx(tau) is slightly greater than zero, because Rxx(tau) should be DC free (if x(t) is DC free). if it were true noise, it should not be far from zero, so you would likely use the r=0 crossfade function.

> Now, while the samples of the signal were generated independently, there is "by accident" a bit of negative correlation in the instantiation of the noise, between those splice points.
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson wrote:

> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>
> by "preserved", do you mean constant over all t?

Constant over all r.

> what is the fundamental reason for preserving a(t)/a(-t) ?

I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is a time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes by what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first by time.

The user might choose one "shape" for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it "preserves" a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.

> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.

Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation. This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade.
Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly). Now, while the samples of the signal were generated independently, there is "by accident" a bit of negative correlation in the instantiation of the noise, between those splice points. Knowing all this, shouldn't we simply use a constant-power fade, rather than a fade tailored for r < 0? Random deviations in noise power are to be expected, and only a constant-power fade will produce noise that is statistically identical to the original. I would imagine that noise with long-time non-zero autocorrelation (all the way across the splice points) is a very rare occurrence. Then again, do we really know all this, or even that we are dealing with noise?

I should note that Rxx(tau) < 0 does not imply opposite polarity, in the fullest sense of the adjective. Two equal sinusoids that have phases 91 degrees apart have a correlation coefficient of about -0.017.

RBJ, I'd like to return the favor and let you know that I have great respect for you in these matters (and absolutely no disrespect in any others :-) ). Hey, I wonder if you missed also my other post in the parent thread? You can search for AANLkTim=eM_kgPeibOqFGEr2FdKyL5uCCB_wJhz1Vne

-olli
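[Editor's note: Olli's statistical argument is easy to demonstrate: cross-fading two independent unit-variance noises with any constant-power envelope pair leaves the variance, and hence the expected loudness, untouched. The sine/cosine pair below is one standard constant-power choice; the length and seed are arbitrary assumptions.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)           # fade-out material
y = rng.standard_normal(n)           # fade-in material, independent of x

u = np.linspace(0, np.pi / 2, n)
a_out, a_in = np.cos(u), np.sin(u)   # a_out^2 + a_in^2 == 1 everywhere
mixed = a_out * x + a_in * y

# constant-power mixing of independent noise preserves the variance, so the
# spliced noise is statistically indistinguishable from the originals
print(np.var(mixed))                 # close to 1.0
```

A constant-voltage fade applied to the same material would dip to a variance of 1/2 at the midpoint, which is the audible "hole" the whole theory is designed to avoid.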
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
hi Olli (and others)...

i was reviewing this thread because i wanted to read what Stefan Stenzel had said and realized that you had posted this response, and i don't think i or anyone had responded to it. i don't remember reading it (it must be the cannabis). i hope you're listening, Olli - i have a lot of respect for what i have read from you (the pink elephant paper). since this comes from last December, i reposted (with more corrections) the original "theory" at the bottom.

On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:

> RBJ, I had a look at your theory, and compared it to my approach (dare not call it a theory, as it was not as rigorously derived). The following is how I imagine we thought things out. Both of us wanted to preserve some aspect(s) of the known-to-be-good constant-voltage crossfade envelopes, and to generalize from those the envelope functions for arbitrary values of the correlation coefficient. You saw that the odd component o(t) determined the shape of the constant-voltage envelopes. For those, the even component had to be e(t) = 1/2 to satisfy the symmetry a(t) + a(-t) = 1 required in constant-voltage crossfades.

it need not be the case that e(t) = 1/2 in the non-constant-voltage crossfades.

> So apparently o(t) was capturing the essential aspects of the crossfade envelope. You showed how to recalculate e(t) for different values of the correlation coefficient in such a way that o(t) was preserved.

i wasn't trying to preserve o(t). it's just that it was easier to get a handle on a(t) (and a(-t)) if i split it into e(t) and o(t). and then in the final solution, a square root was involved in solving for either o(t) or e(t). since o(t) *has* to be bipolar, solving for o(t) in terms of e(t) is a little more problematic than vice versa, because you *know* that o(t) is necessarily bipolar and you have to deal with the +/- sqrt() issue. but if you specify o(t) and solve for e(t), there is no problem with defining e(t) to be always non-negative.
> I, on the other hand, chose that the ratio a(t)/a(-t) (using your notation) should be preserved for each value of t.

now, i do not understand why you would do that. by "preserved", do you mean constant over all t? even for simple, linear crossfades, you cannot satisfy that.

> To accomplish this, one could first do the crossfade using constant-voltage envelopes and then apply to the resulting signal a volume envelope to adjust for any deviation from perfect positive correlation. Or equivalently, the compensation could be incorporated into a(t), which I showed how to do in the case of a linear constant-voltage crossfade. Other constant-voltage crossfade envelopes than linear could be handled by a time deformation function u(t) which gives the time at which the linear constant-voltage envelope function reaches the value of the desired constant-voltage envelope function at time t. u(t) would then be used instead of t in the formula for a(t) derived for generalization of the linear crossfade for arbitrary r.

so if a(t)/a(-t) is not "preserved" over different values of t but is preserved over different values of r, i am not sure you want to do that. what is the fundamental reason for preserving a(t)/a(-t) ?

> I believe your requirement for r >= 0 could be relaxed. For example, if one is creating a drum loop, then it would probably make most sense to put the loop points in the more quiet areas between the transients. And there you might only have noise that is independent between the two loop points, thus giving values of the correlation coefficient slightly positive or slightly negative. Because the length of a drum loop is fixed, there might not be so much choice in placement of the loop points, and a spot giving a slightly negative r might actually be the most natural choice. I do not think your formulas will fall apart just as long as -1 < r <= 1.

but i don't think it is necessary to deal with lags where Rxx(tau) < 0.
why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch. you want to find a value of the lag, tau, so that Rxx(tau) is maximum (not including tau around 0), and then your splice is as seamless as it can be. then, if the splice is real good (r=1), you use a constant-voltage crossfade. when your splice is poor (r=0, and it need not be poorer than that), you use a constant-power crossfade. but i agree that the crossfade theory i presented does not require r >= 0. i just wanted to show that it degenerates to a constant-voltage crossfade when r=1 and a constant-power crossfade when r=0.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."

This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio

As far as I know, it is not published anywhere. A few years ago, I was thinking
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
OK, so explain a bit more.

On 21 Jan 2011, at 22:55, Sampo Syreeni wrote:

> My best bet? Go into the cepstral domain to find the most likely loop duration
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 2010-12-06, robert bristow-johnson wrote:

> i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops.

My best bet? Go into the cepstral domain to find the most likely loop duration. Then translate back through the spectral domain down to the temporal domain. Pick the right starting point (by hit/amplitude if you can't translate the cepstral domain outright/well), and apply a short-term psychoacoustical, hill-climbing algorithm to pick the exact, sub-sample looping point.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
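[Editor's note: a rough sketch of the cepstral step Sampo describes; every name here is editorial, only the idea of peak-picking the real cepstrum is his. For a quasi-periodic signal, the real cepstrum peaks at lags where the spectrum has a periodic harmonic ripple, and such a lag is a candidate loop duration.]

```python
import numpy as np

def candidate_loop_lag(x, min_lag):
    """Return the quefrency (in samples) of the strongest real-cepstrum
    peak above min_lag -- a candidate loop duration for quasi-periodic x."""
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-9)   # log magnitude spectrum
    cepstrum = np.fft.irfft(log_mag)                  # real cepstrum
    search = cepstrum[min_lag:len(x) // 2]
    return min_lag + int(np.argmax(search))

# a pulse train with period 80 samples stands in for a pitched tone;
# the cepstral peaks land on multiples of the period
x = np.zeros(8000)
x[::80] = 1.0
lag = candidate_loop_lag(x, min_lag=40)
```

With real material one would window the block first and sanity-check the resulting lag against an autocorrelation pitch estimate before handing it to the hill-climbing refinement Sampo mentions.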
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Moin Robert & others,

On 14.12.2010 06:15, robert bristow-johnson wrote:

> this isn't a problem with piano, but what if the sample is of some acoustic instrument with vibrato in the recording of a single note. then there isn't an exact pitch for the whole sample of the note, because it varies in time.

Right, but if you consider 1/(loop length) the fundamental frequency, vibrato becomes simple FM. This might sound stoopid, as we certainly perceive it in our own time domain, but that does not mean we cannot take advantage of frequency domain processing. The problem here lies not so much in the frequency alignment itself but in the pitch detection, which ideally finds a multiple of both the fundamental and the modulation frequency. In reality, if you choose your loop to be long enough, you can almost get away with any length, even if it is completely unrelated to the original pitch. Consider a 4 sec loop: all frequencies are multiples of 0.25 Hz. At 440 Hz, this difference is just 1 cent and hardly audible. Works for major as well as for minor chords, as for some 10cc not-in-love vocal cluster.

> well, for sure you want the splice to be seamless for all harmonics, or better yet "partials", of any appreciable magnitude. being that there are non-harmonic partials in a lot of acoustic instruments, most certainly piano, i know why you would want to adjust them a little so that the phases of all partials are aligned and the jump in the loop is seamless.

Yes, very seamless, I think this is what a loop should be. I cannot see how any frequency *not* being a multiple of the loop frequency could be represented in that loop.

[...]

> i suppose i could illustrate what i mean here with a bogus example, if i haven't made it sufficiently clear. i just think that wavetable synthesis has application that is broader than just playing single-cycle loops.

To be honest I didn't quite get that.
It could help if the unnamed manufacturer could be named; I cannot yet see why it should remain anonymous.

Regards,
Stefan
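[Editor's note: Stefan's 4-second figure checks out. A loop of length L seconds can only hold frequencies that are exact multiples of 1/L, so a partial gets nudged by at most one step of 1/L Hz; the function below is editorial, not Stefan's.]

```python
import math

def cents(f_actual, f_reference):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200 * math.log2(f_actual / f_reference)

LOOP_SECONDS = 4.0
step = 1.0 / LOOP_SECONDS            # 0.25 Hz: frequency grid of a 4 s loop
worst_full_step = cents(440 + step, 440)
print(round(worst_full_step, 2))     # about 0.98 cents: "just 1 cent", as stated
```

Nudging each partial to the *nearest* grid frequency halves that bound to roughly half a cent at 440 Hz, and the relative error shrinks further for higher partials.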
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
thanks, Stefan, for getting back on this.

On Dec 13, 2010, at 5:57 AM, Stefan Stenzel wrote:

> I construct seamless loops in frequency domain in a non-realtime application, and I am quite happy with the results. If you ask for a recipe, this is what I am doing:
>
> - detect pitch of (whole) sample via AC (via FFT)
> - decide on block to be looped (behind attack segment, usually more than 1 s long)
> - detect frequency peaks in that block (frequency domain)
> - shift to integer fractions of loop length but preserve amplitude and initial phase
> - back to time domain
> - fade to loop in original sample (only played once as no fade is inside loop)

from just reading this, it appears to be about the same thing that a certain unnamed keyboard synth manufacturer does. they detuned (very slightly) some of the partials so that each partial or overtone had an integer number of cycles over the length of the loop; even if they were slightly inharmonic, they were nudged slightly to be some other inharmonic ratio to the fundamental. i doubt that either the original peak or the moved peak sat exactly on integer bin indices in the FFT. then interpolation in the frequency domain is necessary (besides having to delimit each peak from the adjacent peaks) to move those peaks slightly. this isn't a problem with piano, but what if the sample is of some acoustic instrument with vibrato in the recording of a single note? then there isn't an exact pitch for the whole sample of the note, because it varies in time.

>> dunno if there is any PPG "secrets" or wisdom to confer, but i would like to hear or read it.
>
> None of this in any PPG or Waldorf.

i can see that. it's about sample loops, not the sequential single-cycle loops one would construct for wavetable synthesis.

> Currently I use it for automatically looping huge piano sample sets, not for the memory but in order to fight noise.

well, for sure you want the splice to be seamless for all harmonics, or better yet "partials", of any appreciable magnitude.
being that there are non-harmonic partials in a lot of acoustic instruments, most certainly piano, i know why you would want to adjust them a little so that the phases of all partials are aligned and the jump in the loop is seamless.

> Tried it with other material like chords with surprisingly good results though.

sure, if the loop is long enough and if you can adjust the frequencies slightly. and, of course, it will work better on simple major chords than it would with a fully diminished chord or something with dissonant intervals. this certain unnamed keyboard synth manufacturer didn't think so either (specifically certain non-experts in their engineering management), but, for this piano or some other pitched instrument, a wavetable analysis would do as well or even better for the cases where there is vibrato to track and deal with.

1. first, pitch detection (using AC or AMDF or whatever) is performed very often (say once or twice per millisecond) throughout the note, from beginning to end. octave errors are dealt with and tight pitch tracking is done.

2. then single-cycle wavetables are computed for each of those milliseconds with each new period estimate. (but the changing pitch is recorded and used for resynthesis.)

3. an FFT of each wavetable is performed. X[0] is DC, X[1] and X[N-1] are the first harmonic, etc. the harmonics that are actually a little detuned and non-harmonic will have phase slipping a little in each adjacent wavetable. for the length of the loop, you would want the phase of each harmonic to be the same at the end as it was at the beginning of the loop.

4. the loop length is chosen to accomplish that for the lower harmonics (there would be an integer number of cycles for each of these lower harmonics in the loop length). then, for the higher harmonics that do not quite get back to the same phase at the loop end that they were at the loop start, that phase difference is split evenly across all wavetables in between.
this would cause an integer number of cycles for every harmonic, but they wouldn't necessarily be integer multiples of the fundamental. it is true that if one were to consider the loop length as a "period", then all partials would be integer harmonics after this adjustment, but what was previously considered the fundamental would not be the fundamental if the loop length is called the period. i suppose i could illustrate what i mean here with a bogus example, if i haven't made it sufficiently clear. i just think that wavetable synthesis has application that is broader than just playing single-cycle loops.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
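[Editor's note: step 4 above, splitting the residual phase difference evenly across the wavetables, can be sketched as follows; the data layout (a list of equal-length single-cycle wavetables) and all names are editorial assumptions.]

```python
import numpy as np

def align_loop_phases(tables):
    """For each harmonic, measure the phase slipped between the first and
    last wavetable, then spread a compensating shift evenly across the
    wavetables so every harmonic ends the loop at its starting phase.
    (Assumes each harmonic slips by less than pi over the whole loop,
    so the measured angle is not ambiguous modulo 2*pi.)"""
    spectra = [np.fft.rfft(tbl) for tbl in tables]
    m = len(spectra)
    drift = np.angle(spectra[-1] * np.conj(spectra[0]))  # per-harmonic slip
    return [np.fft.irfft(s * np.exp(-1j * drift * i / (m - 1)))
            for i, s in enumerate(spectra)]

# three wavetables of a "harmonic" whose phase slips by 0.1 rad per table
n = np.arange(64)
tables = [np.sin(2 * np.pi * n / 64 + 0.1 * i) for i in range(3)]
fixed = align_loop_phases(tables)
# after alignment the last table matches the first: the loop closes seamlessly
```

Resynthesis would then scan through `fixed` cyclically, which is exactly the "integer number of cycles for every harmonic" condition described above.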
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Moin Robert & others,

On 06.12.2010 19:49, robert bristow-johnson wrote:

> On Dec 6, 2010, at 1:23 PM, Stefan Stenzel wrote:
>
>> On 06.12.2010 08:59, robert bristow-johnson wrote:
>>>
>>> This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio
>>
>> I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.
>
> i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops. regarding that, i would like to know the context (e.g. looping in a non-realtime sample editor for a sampling synth vs. a realtime pitch shifter) and the kinds of signals (quasi-periodic vs. aperiodic vs. periodic but with detuned higher harmonics). processing in frequency domain or time domain (or some in both)?

I construct seamless loops in frequency domain in a non-realtime application, and I am quite happy with the results. If you ask for a recipe, this is what I am doing:

- detect pitch of (whole) sample via AC (via FFT)
- decide on block to be looped (behind attack segment, usually more than 1 s long)
- detect frequency peaks in that block (frequency domain)
- shift to integer fractions of loop length but preserve amplitude and initial phase
- back to time domain
- fade to loop in original sample (only played once as no fade is inside loop)

> dunno if there is any PPG "secrets" or wisdom to confer, but i would like to hear or read it.

None of this in any PPG or Waldorf. Currently I use it for automatically looping huge piano sample sets, not for the memory but in order to fight noise. Tried it with other material like chords with surprisingly good results though.
Stefan
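[Editor's note: the "shift to integer fractions of loop length" step of the recipe can be sketched like this: pick spectral peaks, refine each peak frequency by parabolic interpolation, and resynthesize it at the nearest exact multiple of the loop rate fs/N, keeping magnitude and measured phase. The peak picking and amplitude estimate are deliberately crude illustrations; none of this is Stefan's actual code.]

```python
import numpy as np

def nudge_partials_to_loop(block, fs, max_partials=40):
    """Resynthesize the looped block from its strongest spectral peaks,
    each moved to the nearest frequency with an integer number of cycles
    per loop (a multiple of fs/len(block)). A real implementation would
    delimit peaks from their neighbors much more carefully."""
    n = len(block)
    w = np.hanning(n)
    spec = np.fft.rfft(block * w)
    mag = np.abs(spec)
    out = np.zeros(n)
    t = np.arange(n) / fs
    peaks = [k for k in range(1, len(spec) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
             and mag[k] > 0.05 * mag.max()]
    for k in sorted(peaks, key=lambda q: -mag[q])[:max_partials]:
        # parabolic interpolation on the log magnitude -> refined frequency
        a, b, c = np.log(mag[k - 1] + 1e-12), np.log(mag[k] + 1e-12), np.log(mag[k + 1] + 1e-12)
        f = (k + 0.5 * (a - c) / (a - 2 * b + c)) * fs / n
        f_loop = np.round(f * n / fs) * fs / n   # integer cycles per loop
        amp = 2 * mag[k] / np.sum(w)             # Hann coherent-gain estimate
        out += amp * np.cos(2 * np.pi * f_loop * t + np.angle(spec[k]))
    return out

fs, n = 44100, 4096
sig = 0.8 * np.sin(2 * np.pi * 440.0 * np.arange(n) / fs)
looped = nudge_partials_to_loop(sig, fs)
# all energy now sits on exact FFT bins, so the block repeats seamlessly
```

Since every resynthesized partial completes an integer number of cycles in the block, concatenating the block with itself produces no discontinuity, which is the whole point of the recipe.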
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
"...I'm "publishing" the main ideas here on music-dsp..."

Wish more people would do such things. I couldn't resist thinking out loud about some of the main issues with this subject, summed up as:

1. The looping idea will involve approximations, at the least of the kind which gives a short loop (as I understand the subject can be about, for instrument samples) an issue with pitch. Meaning: a loop is only as pitch-accurate as 1/#samples, so at 44.1 kHz a loop at the final point in the signal path will, at e.g. 1 kHz, for sure be no more accurate than about 2 percent divided by the number of waves in the loop (more fundamental waves per loop in an original sample will probably make for a "noisy" loop).

2. Even making use of FFTs for detection of loop points will, I'd think, have the disadvantage of the quite big transient and edge-discontinuity errors of that transform, unless measures like detuning to fundamental-equals-FFT-length are taken.

3. Transforming the signal to FFT form to interpolate loop ends or to capture frequencies for repetition will probably cause quite some signal degradation, so people expecting only sample detuning in a software package might not be too happy with the idea. A very short transform-based edit at the beginning or end of a loop might be interesting.

I recall getting wavelet transform code snippets with my Analog Devices DSP board years ago; THAT would probably be interesting to play with in this context, but hey, what should a top EE earn to do that?! :)

Theo.
http://www.theover.org/Linuxconf
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
RBJ, I had a look at your theory, and compared it to my approach (dare not call it a theory, as it was not as rigorously derived). The following is how I imagine we thought things out. Both of us wanted to preserve some aspect(s) of the known-to-be-good constant-voltage crossfade envelopes, and to generalize from those the envelope functions for arbitrary values of the correlation coefficient.

You saw that the odd component o(t) determined the shape of the constant-voltage envelopes. For those, the even component had to be e(t) = 1/2 to satisfy the symmetry a(t) + a(-t) = 1 required in constant-voltage crossfades. So apparently o(t) was capturing the essential aspects of the crossfade envelope. You showed how to recalculate e(t) for different values of the correlation coefficient in such a way that o(t) was preserved.

I, on the other hand, chose that the ratio a(t)/a(-t) (using your notation) should be preserved for each value of t. To accomplish this, one could first do the crossfade using constant-voltage envelopes and then apply to the resulting signal a volume envelope to adjust for any deviation from perfect positive correlation. Or equivalently, the compensation could be incorporated into a(t), which I showed how to do in the case of a linear constant-voltage crossfade. Other constant-voltage crossfade envelopes than linear could be handled by a time deformation function u(t) which gives the time at which the linear constant-voltage envelope function reaches the value of the desired constant-voltage envelope function at time t. u(t) would then be used instead of t in the formula for a(t) derived for generalization of the linear crossfade for arbitrary r.

I believe your requirement for r >= 0 could be relaxed. For example, if one is creating a drum loop, then it would probably make most sense to put the loop points in the more quiet areas between the transients.
And there you might only have noise that is independent between the two loop points, giving values of the correlation coefficient that are slightly positive or slightly negative. Because the length of a drum loop is fixed, there might not be much choice in the placement of the loop points, and a spot giving a slightly negative r might actually be the most natural choice. I do not think your formulas will fall apart, as long as -1 < r <= 1.

-olli
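[A concrete sketch of the two-step approach described above; this is my own illustration, assuming unit-power signals and a linear constant-voltage shape, not code from either post. Compute the windowed correlation coefficient r at the splice, do the constant-voltage crossfade, then apply the compensating volume envelope, which stays finite for -1 < r <= 1:]

```python
import numpy as np

def corr_coeff(x, y, w):
    """Normalized correlation r = <x,y> / sqrt(<x,x><y,y>), with the
    inner product weighted by a window w centered on the splice point."""
    ip = lambda p, q: np.sum(p * q * w)
    return ip(x, y) / np.sqrt(ip(x, x) * ip(y, y))

def compensated_crossfade(x, y, r):
    """Constant-voltage crossfade, then a volume envelope compensating
    for deviation from perfect correlation (assumes equal-power x, y)."""
    a = np.linspace(0.0, 1.0, len(x))     # linear fade-in for y
    v = (1.0 - a) * x + a * y             # constant-voltage crossfade
    # expected power of v for unit-power inputs with correlation r;
    # strictly positive for 0 <= a <= 1 whenever r > -1
    p = (1.0 - a)**2 + a**2 + 2.0 * r * a * (1.0 - a)
    return v / np.sqrt(p)                 # hold the expected power constant

# splicing a signal onto itself: r = 1, so no compensation is applied
t = np.linspace(0.0, 1.0, 256)
x = np.sin(2 * np.pi * 5 * t)
w = np.hanning(256)
r = corr_coeff(x, x, w)
v = compensated_crossfade(x, x, r)
```

[For r = 0 the midpoint gain works out to sqrt(2), the familiar 3 dB constant-power boost; for the slightly negative r of the quiet-spot drum-loop case the boost is larger but still finite.]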
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Dec 6, 2010, at 1:23 PM, Stefan Stenzel wrote:
> On 06.12.2010 08:59, robert bristow-johnson wrote:
>> This is a continuation of the thread started by Element Green titled:
>> Algorithms for finding seamless loops in audio
>
> I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.

i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops. regarding that, i would like to know the context (e.g. looping in a non-realtime sample editor for a sampling synth vs. a realtime pitch shifter) and the kinds of signals (quasi-periodic vs. aperiodic vs. periodic but with detuned higher harmonics). processing in the frequency domain or the time domain (or some in both)? dunno if there are any PPG "secrets" or wisdom to confer, but i would like to hear or read it.

bestest,

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 06.12.2010 08:59, robert bristow-johnson wrote:
> This is a continuation of the thread started by Element Green titled:
> Algorithms for finding seamless loops in audio

I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.

Stefan
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Thanks for sharing these thoughts, Robert.

On Mon, 6 Dec 2010 03:38:28 -0500 robert bristow-johnson wrote:
> This is a continuation of the thread started by Element Green titled:
> Algorithms for finding seamless loops in audio

-- Andy Farnell
[music-dsp] A theory of optimal splicing of audio in the time domain.
< a few mistakes are spotted and corrected before i forget >

This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio

As far as I know, it is not published anywhere. A few years ago, I was thinking of writing this up and publishing it (or submitting it for publication, probably to JAES), and had let it fall by the wayside. I'm "publishing" the main ideas here on music-dsp because of some possible interest here (and the hope it might be helpful to somebody), and so that "prior art" is established in case anyone like IVL is thinking of claiming it as their own. I really do not know how useful it will be in practice. It might not make any difference. It's just a theory.

__

Section 0:

This is about the generalization of the different ways we can splice and crossfade audio, which has these two extremes:

(1) Splicing perfectly coherent and correlated signals
(2) Splicing completely uncorrelated signals

I sometimes call the first case the "constant-voltage crossfade" because the crossfade envelopes of the two signals being spliced add up to one. The two envelopes meet where both have a value of 1/2. In the second case, we use a "constant-power crossfade": the squares of the two envelopes add to one, and they meet where both have a value of sqrt(1/2) = 0.707.

The questions I wanted to answer are: What does one do for cases in between, and how does one know, from the audio, which crossfade function to use? How does one quantify the answers to these questions? How much can we generalize the answer?

__

Section 1: Set up the problem.

We have two continuous-time audio signals, x(t) and y(t), and we want to splice from one to the other at time t=0. In pitch-shifting or time-scaling or any other looping, y(t) can be some delayed or advanced version of x(t), e.g. y(t) = x(t-P), where P is a period length or some other "good" splice displacement. We get that value from an algorithm we call a "pitch detector".
Also, it doesn't matter whether x(t) is getting spliced to y(t) or the other way around; it should work just as well for the audio played in reverse. And it should be no loss of generality that the splice happens at t=0; we define our coordinate system any damn way we damn well please.

The signal resulting from the splice is

   v(t) = a(t)*x(t) + a(-t)*y(t)

By restricting our result to be equivalent if run either forward or backward in time, we can conclude that the "fade-out" function (say that's a(t)) is the time-reversed copy of the "fade-in" function, a(-t).

For the correlated case (1):      a(t) + a(-t) = 1           for all t
For the uncorrelated case (2):    (a(t))^2 + (a(-t))^2 = 1   for all t

This crossfade function, a(t), has well-defined even and odd symmetry components:

   a(t) = e(t) + o(t)

where

   even part:  e(t) =  e(-t) = ( a(t) + a(-t) )/2
   odd part:   o(t) = -o(-t) = ( a(t) - a(-t) )/2

And it's clear that a(-t) = e(t) - o(t).

For example, if it's a simple linear crossfade (equivalent to splicing analog tape with a diagonally-oriented razor blade):

          { 0            for t <= -1
   a(t) = { 1/2 + t/2    for -1 < t < 1
          { 1            for t >= 1

This is represented simply, in the even and odd components, as:

   e(t) = 1/2

          { t/2          for |t| < 1
   o(t) = { sgn(t)/2     for |t| >= 1

where sgn(t) is the "sign function": sgn(t) = t/|t|.

This is a constant-voltage crossfade, appropriate for perfectly correlated signals x(t) and y(t). There is no loss of generality in defining the crossfade to take place around t=0 and to be two time units in length; both are simply a matter of offset and scaling of time.

Another constant-voltage crossfade would be what I might call a "Hann crossfade" (after the Hann window):

   e(t) = 1/2

          { (1/2)*sin(pi/2 * t)   for |t| < 1
   o(t) = { sgn(t)/2              for |t| >= 1

Some might like that better because the derivative is continuous everywhere.
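[A small numerical sketch of the decomposition above, my own illustration: build a(t) from the even part e(t) = 1/2 and the odd part o(t) of the linear crossfade, and check the constant-voltage property a(t) + a(-t) = 1.]

```python
import numpy as np

def o_linear(t):
    # odd part of the linear crossfade: t/2 inside |t| < 1, sgn(t)/2 outside
    return np.where(np.abs(t) < 1.0, t / 2.0, np.sign(t) / 2.0)

def a_linear(t):
    # a(t) = e(t) + o(t), with even part e(t) = 1/2
    return 0.5 + o_linear(t)

t = np.linspace(-2.0, 2.0, 401)
# constant-voltage property: fade-in plus time-reversed fade-out sums to one
assert np.allclose(a_linear(t) + a_linear(-t), 1.0)
# even and odd parts recovered from a(t) itself match the definitions
assert np.allclose((a_linear(t) + a_linear(-t)) / 2.0, 0.5)
assert np.allclose((a_linear(t) - a_linear(-t)) / 2.0, o_linear(t))
```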
Extending this idea, one more constant-voltage crossfade is what I might call a "Flattened Hann crossfade":

   e(t) = 1/2

          { (9/16)*sin(pi/2 * t) + (1/16)*sin(3*pi/2 * t)   for |t| < 1
   o(t) = { sgn(t)/2                                        for |t| >= 1

This splice is everywhere continuous in the zeroth, first, and second derivative. A very smooth crossfade.

As another example, a constant-power crossfade would be the same as any of the above, but where the above a(t) is square-rooted:

          { 0                  for t <= -1
   a(t) = { sqrt(1/2 + t/2)    for -1 < t < 1
          { 1                  for t >= 1
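[A sketch of the square-rooting step just described, again my own illustration: taking the square root of any constant-voltage a(t) turns the sum constraint a(t) + a(-t) = 1 into the constant-power constraint a(t)^2 + a(-t)^2 = 1.]

```python
import numpy as np

def a_voltage(t):
    # linear constant-voltage fade-in: a(t) + a(-t) = 1
    return np.clip(0.5 + 0.5 * t, 0.0, 1.0)

def a_power(t):
    # square-rooted version: a(t)^2 + a(-t)^2 = 1
    return np.sqrt(a_voltage(t))

t = np.linspace(-1.0, 1.0, 201)
assert np.allclose(a_voltage(t) + a_voltage(-t), 1.0)
assert np.allclose(a_power(t)**2 + a_power(-t)**2, 1.0)
# the envelopes meet at t = 0 with values 1/2 and sqrt(1/2) respectively
assert abs(a_voltage(0.0) - 0.5) < 1e-12
assert abs(a_power(0.0)**2 - 0.5) < 1e-12
```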