Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 15, 2011, at 12:46 AM, Sampo Syreeni wrote:

> What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

the precise criterion is how well the two signals being spliced correlate to one another. i tried to set that up with the inner product notation:

            +inf
 <x, y>  =  integral{ x(t)*y(t) * w(t) dt}
            -inf

where w(t) is a window function centered at t=0. the normalized correlation measure is:

 r  =  <x, y> / sqrt( <x, x> * <y, y> )

if r=1, they are perfectly correlated and a constant-voltage splice should be used. if r=0, they are completely uncorrelated and a constant-power splice should be used. if 0 < r < 1, then some kinda splice in between a constant-voltage and a constant-power splice should be used. if r < 0, then there has to be a boost of even *more* than 3 dB (that sqrt(2) factor at g(0)) to keep the expected loudness envelope constant. Olli and i see the need for such slightly differently.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."

--
dupswapdrop -- the music-dsp mailing list and website: subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
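[Editor's note: the criterion above is easy to try numerically. Below is a minimal discrete sketch of the windowed inner product and the normalized correlation r; the function name, the Hann window choice, and the test signals are illustrative assumptions, not from the post.]

```python
import numpy as np

def splice_correlation(x, y):
    """Windowed, normalized correlation between two equal-length snippets
    around a candidate splice point:
        <x, y> = sum( x * y * w ),   r = <x, y> / sqrt( <x, x> * <y, y> )
    where w is a window centered on the splice (Hann here, an assumption)."""
    w = np.hanning(len(x))
    inner = lambda a, b: np.sum(a * b * w)
    return inner(x, y) / np.sqrt(inner(x, x) * inner(y, y))

t = np.linspace(0, 1, 1000, endpoint=False)
s = np.sin(2 * np.pi * 5 * t)
print(splice_correlation(s, s))    # 1.0: perfectly correlated -> constant-voltage splice
print(splice_correlation(s, -s))   # -1.0: opposite polarity -> the r < 0 case
```

In a splice hunter, this would be evaluated over candidate lags tau and the lag with the largest r chosen.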
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Fri, Jul 15, 2011 at 7:46 AM, Sampo Syreeni wrote:

> On 2011-07-15, Olli Niemitalo wrote:
>
> What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

My objective has not been to find a method for automatic splicing, but to do nice cross-fades at given splice points. There were multiple objectives:

* Intuitive definition of the cross-fade shape. Mixing ratio as a function of time is a good definition.

* For stationary signals, there should be no clicks or transients produced. This is taken care of by the smoothness of the cross-fade envelopes.

* For stationary signals, the resulting measurable transition from the volume level of signal 1 to the volume level of signal 2 should follow the chosen cross-fade shape. This can be accomplished knowing the volume levels of the two signals and the correlation coefficient between the two signals.

-olli
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 2011-07-15, Olli Niemitalo wrote:

What are you trying to accomplish here, really? Optimum splicing, sure, but against which precise criterion?

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 14, 2011, at 5:36 PM, Olli Niemitalo wrote:

> On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson wrote:
>>
>> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>>
>> might this result match what you have?
>
> Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.
>
>> okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero
>
> Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.
>
>> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).
>
> I agree, even though the theory was quite elegant with o(t) and e(t)...

do you have any of this in a document? i wonder if one of us should put this down in a pdf and put it in the music-dsp "code" archive.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Thu, Jul 14, 2011 at 9:22 PM, robert bristow-johnson wrote:
>
> g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )
>
> might this result match what you have?

Yes! I only derived the formula for the linear ramp, p(t) = t/2, because one can get the other shapes by warping time and I didn't want to bloat the cumbersome equations. With the linear ramp our results match exactly.

> okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero

Sometimes you are stuck with a certain displacement. Think drum loops; changing tau would change tempo.

> i think it's better to define p(t) (with the same restrictions as o(t)) and find g(t) as a function of r than it is to do it with o(t) and e(t).

I agree, even though the theory was quite elegant with o(t) and e(t)...

-olli
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Jul 13, 2011, at 9:29 AM, Olli Niemitalo wrote:

> On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson wrote:
>> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>>
>> by "preserved", do you mean constant over all t?
>
> Constant over all r.

i think i figgered that out after hitting the Send button.

>> what is the fundamental reason for preserving a(t)/a(-t) ?
>
> I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is a time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes by what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first by time.

okay, i get it. so instead of expressing the crossfade envelope as

 a(t) = e(t) + o(t)

i think we could describe it as a constant-voltage crossfade (those used for splicing perfectly correlated snippets) bumped up a little by an overall loudness function. an envelope acting on the envelope. and, as you correctly observed, for constant-voltage crossfades, the even component is always

 e(t) = 1/2

so, pulling another couple of letters outa the alfabet, we can represent the crossfade function as

 a(t)  =  e(t) + o(t)  =  g(t)*( 1/2 + p(t) )

where g(-t) = g(t) is even and p(-t) = -p(t) is odd.

 g(t) = 1   for constant-voltage crossfades, when r=1.

for constant-power crossfades, r=0, we know that

 g(0) = sqrt(2) > 1

the shape p(t) is preserved for different values of r, and we want to solve for g(t) given a specified correlation value r and a given "shape" family p(t). indeed

 a(t)/a(-t)  =  (1/2 + p(t))/(1/2 - p(t))

and remains preserved over r if p(t) remains unchanged.
p(t) can be spec'd initially exactly like o(t) (linear crossfade, Hann, flattened Hann, or whatever odd function your heart desires). i think it should be easy to solve for g(t). we know that

 e(t) = 1/2 * g(t)
 o(t) = g(t) * p(t)

and recall the result

 e(t) = sqrt( (1/2)/(1+r) - (1-r)/(1+r)*(o(t))^2 )

which comes from

 (1+r)*( e(t) )^2 + (1-r)*( o(t) )^2  =  1/2

so

 (1+r)*( 1/2*g(t) )^2 + (1-r)*( g(t)*p(t) )^2  =  1/2

 ( g(t) )^2 * ( (1+r)/4 + (1-r)*(p(t))^2 )  =  1/2

and picking the positive square root for g(t) yields

 g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*(p(t))^2 )

might this result match what you have? (assemble a(t) from g(t) and p(t) just as we had previously from e(t) and o(t).) remember that p(t) is odd, so p(0)=0. so

 r=1  --->  g(t) = 1        (constant-voltage crossfade)
 r=0  --->  g(0) = sqrt(2)  (constant-power crossfade)

> The user might choose one "shape" for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it "preserves" a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.
>
>> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.
>
> Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation.

okay. i would still like to "hunt" for a splice displacement around that quiet region that would have correlation better than zero. and, if both x(t) and y(t) have no DC, it should be possible to find something.
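[Editor's note: the g(t) derivation above can be checked numerically. The sketch below assumes the linear ramp p(t) = t/2 on t in [-1, 1] (the normalization of the time axis is an editorial assumption) and verifies (1+r)*e(t)^2 + (1-r)*o(t)^2 = 1/2, which is equivalent to a(t)^2 + a(-t)^2 + 2*r*a(t)*a(-t) = 1, i.e. constant expected power of the mix.]

```python
import numpy as np

def crossfade_envelope(t, r, p=lambda t: t / 2):
    """a(t) = g(t)*(1/2 + p(t)) with g(t) = 1/sqrt( (1+r)/2 + 2*(1-r)*p(t)^2 ).
    p must be odd; the default linear ramp p(t) = t/2 is the case Olli derived."""
    g = 1.0 / np.sqrt((1 + r) / 2 + 2 * (1 - r) * p(t) ** 2)
    return g * (0.5 + p(t))

t = np.linspace(-1, 1, 201)
for r in (1.0, 0.5, 0.0):
    a = crossfade_envelope(t, r)      # fade-in envelope
    b = crossfade_envelope(-t, r)     # time-reversed fade-out envelope
    # expected power of the mix stays constant when the correlation is r:
    assert np.allclose(a**2 + b**2 + 2 * r * a * b, 1.0)

# the limiting cases from the post:
assert np.allclose(crossfade_envelope(t, 1.0) + crossfade_envelope(-t, 1.0), 1.0)  # constant-voltage
assert abs(crossfade_envelope(0.0, 0.0) - np.sqrt(2) / 2) < 1e-12                  # g(0) = sqrt(2)
```

The same check passes for any other odd p(t), e.g. a Hann-shaped ramp, since the derivation never used the specific shape.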
> This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade. Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly).

but you should also be able to find a tau where Rxx(tau) is slightly greater than zero, because Rxx(tau) should be DC free (if x(t) is DC free). if it were true noise, it should not be far from zero, so you would likely use the r=0 crossfade function.

> Now, while the samples of the signal were generated independently, there is "by accident" a bit of negative correlation in the instantiation of the noise, between those splice points.
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Sat, Jul 9, 2011 at 10:53 PM, robert bristow-johnson wrote:

> On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:
>> [I] chose that the ratio a(t)/a(-t) [...] should be preserved
>
> by "preserved", do you mean constant over all t?

Constant over all r.

> what is the fundamental reason for preserving a(t)/a(-t) ?

I'm thinking outside your application of automatic finding of splice points. Think of crossfades between clips in a multi-track sample editor. For a cross-fade in which one signal is faded in using a volume envelope that is a time-reverse of the volume envelope with which the other signal is faded out, a(t)/a(-t) describes by what proportions the two signals are mixed at each t. The fundamental reason then is that I think it is a rather good description of the shape of the fade, to a user, as it will describe how the second signal swallows the first by time.

The user might choose one "shape" for a particular crossfade. Then, depending on the correlation between the superimposed signals, an appropriate symmetrical volume envelope could be applied to the mixed signal to ensure that there is no peak or dip in the contour of the mixed signal. Because the envelope is symmetrical, applying it "preserves" a(t)/a(-t). It can also be incorporated directly into a(t). All that is not so far off from the application you describe.

> but i don't think it is necessary to deal with lags where Rxx(tau) < 0. why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch.

Splicing at quiet regions with negative correlation can give a smaller glitch than splicing at louder regions with positive correlation. This applies particularly to rhythmic material like drum loops, where the time lag between the splice points is constrained, and it may make most sense to look for quiet spots. However, if it's already so quiet in there, I don't know how much it matters what you use for a cross-fade.
Apart from "it's so quiet it doesn't matter", I can think of one other objection against using cross-fades tailored for r < 0: For example, let's imagine that our signal is white noise generated from a Gaussian distribution, and we are dealing with given splice points for which Rxx(tau) < 0 (slightly). Now, while the samples of the signal were generated independently, there is "by accident" a bit of negative correlation in the instantiation of the noise, between those splice points. Knowing all this, shouldn't we simply use a constant-power fade, rather than a fade tailored for r < 0? Random deviations in noise power are to be expected, and only a constant-power fade will produce noise that is statistically identical to the original. I would imagine that noise with long-time non-zero autocorrelation (all the way across the splice points) is a very rare occurrence. Then again, do we really know all this, or even that we are dealing with noise?

I should note that Rxx(tau) < 0 does not imply opposite polarity, in the fullest sense of the adjective. Two equal sinusoids that have phases 91 degrees apart have a correlation coefficient of about -0.017.

RBJ, I'd like to return the favor and let you know that I have great respect for you in these matters (and absolutely no disrespect in any others :-) ). Hey, I wonder if you missed also my other post in the parent thread? You can search for AANLkTim=eM_kgPeibOqFGEr2FdKyL5uCCB_wJhz1Vne

-olli
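[Editor's note: Olli's statistical argument is easy to demonstrate: cross-fading two independent unit-variance noises with any constant-power envelope pair leaves the variance, and hence the expected loudness, untouched. The sine/cosine pair below is one standard constant-power choice; the length and seed are arbitrary assumptions.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)           # fade-out material
y = rng.standard_normal(n)           # fade-in material, independent of x

u = np.linspace(0, np.pi / 2, n)
a_out, a_in = np.cos(u), np.sin(u)   # a_out^2 + a_in^2 == 1 everywhere
mixed = a_out * x + a_in * y

# constant-power mixing of independent noise preserves the variance, so the
# spliced noise is statistically indistinguishable from the originals
print(np.var(mixed))                 # close to 1.0
```

A constant-voltage fade applied to the same material would dip to a variance of 1/2 at the midpoint, which is the audible "hole" the whole theory is designed to avoid.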
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
hi Olli (and others)...

i was reviewing this thread because i wanted to read what Stefan Stenzel had said and realized that you had posted this response, and i don't think i or anyone had responded to it. i don't remember reading it (it must be the cannabis). i hope you're listening, Olli - i have a lot of respect for what i have read from you (the pink elephant paper). since this comes from last December, i reposted (with more corrections) the original "theory" at the bottom.

On Dec 7, 2010, at 5:27 AM, Olli Niemitalo wrote:

> RBJ, I had a look at your theory, and compared it to my approach (dare not call it a theory, as it was not as rigorously derived). The following is how I imagine we thought things out. Both of us wanted to preserve some aspect(s) of the known-to-be-good constant-voltage crossfade envelopes, and to generalize from those the envelope functions for arbitrary values of the correlation coefficient. You saw that the odd component o(t) determined the shape of the constant-voltage envelopes. For those, the even component had to be e(t) = 1/2 to satisfy the symmetry a(t) + a(-t) = 1 required in constant-voltage crossfades.

it need not be the case that e(t) = 1/2 in the non-constant-voltage crossfades.

> So apparently o(t) was capturing the essential aspects of the crossfade envelope. You showed how to recalculate e(t) for different values of the correlation coefficient in such a way that o(t) was preserved.

i wasn't trying to preserve o(t). it's just that it was easier to get a handle on a(t) (and a(-t)) if i split it into e(t) and o(t). and then in the final solution, a square root was involved in solving for either o(t) or e(t). since o(t) *has* to be bipolar, solving for o(t) in terms of e(t) is a little more problematic than vice versa, because you *know* that o(t) is necessarily bipolar and you have to deal with the +/- sqrt() issue. but if you specify o(t) and solve for e(t), there is no problem with defining e(t) to be always non-negative.
> I, on the other hand, chose that the ratio a(t)/a(-t) (using your notation) should be preserved for each value of t.

now, i do not understand why you would do that. by "preserved", do you mean constant over all t? even for simple, linear crossfades, you cannot satisfy that.

> To accomplish this, one could first do the crossfade using constant-voltage envelopes and then apply to the resulting signal a volume envelope to adjust for any deviation from perfect positive correlation. Or equivalently, the compensation could be incorporated into a(t), which I showed how to do in the case of a linear constant-voltage crossfade. Other constant-voltage crossfade envelopes than linear could be handled by a time deformation function u(t) which gives the time at which the linear constant-voltage envelope function reaches the value of the desired constant-voltage envelope function at time t. u(t) would then be used instead of t in the formula for a(t) derived for generalization of the linear crossfade for arbitrary r.

so if a(t)/a(-t) is not "preserved" over different values of t but is preserved over different values of r, i am not sure you want to do that. what is the fundamental reason for preserving a(t)/a(-t) ?

> I believe your requirement for r >= 0 could be relaxed. For example, if one is creating a drum loop, then it would probably make most sense to put the loop points in the more quiet areas between the transients. And there you might only have noise that is independent between the two loop points, thus giving values of the correlation coefficient slightly positive or slightly negative. Because the length of a drum loop is fixed, there might not be so much choice in placement of the loop points, and a spot giving a slightly negative r might actually be the most natural choice. I do not think your formulas will fall apart just as long as -1 < r <= 1.

but i don't think it is necessary to deal with lags where Rxx(tau) < 0.
why splice a waveform to another part of the same waveform that has opposite polarity? that would create an even bigger glitch. you want to find a value of the lag, tau, so that Rxx(tau) is maximum (not including tau around 0), and then your splice is as seamless as it can be. then, if the splice is real good (r=1), you use a constant-voltage crossfade. when your splice is poor (r=0, and it need not be poorer than that), you use a constant-power crossfade. but i agree that the crossfade theory i presented does not require r >= 0. i just wanted to show that it degenerates to a constant-voltage crossfade when r=1 and a constant-power crossfade when r=0.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."

This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio

As far as I know, it is not published anywhere. A few years ago, I was thinking
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
OK, so explain a bit more.

On 21 Jan 2011, at 22:55, Sampo Syreeni wrote:

> My best bet? Go into the cepstral domain to find the most likely loop duration
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 2010-12-06, robert bristow-johnson wrote:

> i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops.

My best bet? Go into the cepstral domain to find the most likely loop duration. Then translate back through the spectral domain down to the temporal domain. Pick the right starting point (by hit/amplitude if you can't translate the cepstral domain outright/well), and apply a short-term psychoacoustical, hill-climbing algorithm to pick the exact, sub-sample looping point.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
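[Editor's note: a rough sketch of the cepstral step Sampo describes; every name here is editorial, only the idea of peak-picking the real cepstrum is his. For a quasi-periodic signal, the real cepstrum peaks at lags where the spectrum has a periodic harmonic ripple, and such a lag is a candidate loop duration.]

```python
import numpy as np

def candidate_loop_lag(x, min_lag):
    """Return the quefrency (in samples) of the strongest real-cepstrum
    peak above min_lag -- a candidate loop duration for quasi-periodic x."""
    log_mag = np.log(np.abs(np.fft.rfft(x)) + 1e-9)   # log magnitude spectrum
    cepstrum = np.fft.irfft(log_mag)                  # real cepstrum
    search = cepstrum[min_lag:len(x) // 2]
    return min_lag + int(np.argmax(search))

# a pulse train with period 80 samples stands in for a pitched tone;
# the cepstral peaks land on multiples of the period
x = np.zeros(8000)
x[::80] = 1.0
lag = candidate_loop_lag(x, min_lag=40)
```

With real material one would window the block first and sanity-check the resulting lag against an autocorrelation pitch estimate before handing it to the hill-climbing refinement Sampo mentions.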
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Moin Robert & others,

On 14.12.2010 06:15, robert bristow-johnson wrote:

> this isn't a problem with piano, but what if the sample is of some acoustic instrument with vibrato in the recording of a single note. then there isn't an exact pitch for the whole sample of the note, because it varies in time.

Right, but if you consider 1/(loop length) the fundamental frequency, vibrato becomes simple FM. This might sound stoopid, as we certainly perceive it in our own time domain, but that does not mean we cannot take advantage of frequency domain processing. The problem here lies not so much in the frequency alignment itself but in the pitch detection, which ideally finds a multiple of both the fundamental and the modulation frequency. In reality, if you choose your loop to be long enough, you can almost get away with any length, even if it is completely unrelated to the original pitch. Consider a 4 sec loop: all frequencies are multiples of 0.25 Hz. At 440 Hz, this difference is just 1 cent and hardly audible. Works for major as well as for minor chords, as for some 10cc not-in-love vocal cluster.

> well, for sure you want the splice to be seamless for all harmonics, or better yet "partials", of any appreciable magnitude. being that there are non-harmonic partials in a lot of acoustic instruments, most certainly piano, i know why you would want to adjust them a little so that the phases of all partials are aligned and the jump in the loop is seamless.

Yes, very seamless, I think this is what a loop should be. I cannot see how any frequency *not* being a multiple of the loop frequency could be represented in that loop.

[...]

> i suppose i could illustrate what i mean here with a bogus example, if i haven't made it sufficiently clear. i just think that wavetable synthesis has application that is broader than just playing single-cycle loops.

To be honest I didn't quite get that.
It could help if the unnamed manufacturer could be named; I cannot yet see why it should remain anonymous.

Regards,
Stefan
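[Editor's note: Stefan's 4-second figure checks out. A loop of length L seconds can only hold frequencies that are exact multiples of 1/L, so a partial gets nudged by at most one step of 1/L Hz; the function below is editorial, not Stefan's.]

```python
import math

def cents(f_actual, f_reference):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200 * math.log2(f_actual / f_reference)

LOOP_SECONDS = 4.0
step = 1.0 / LOOP_SECONDS            # 0.25 Hz: frequency grid of a 4 s loop
worst_full_step = cents(440 + step, 440)
print(round(worst_full_step, 2))     # about 0.98 cents: "just 1 cent", as stated
```

Nudging each partial to the *nearest* grid frequency halves that bound to roughly half a cent at 440 Hz, and the relative error shrinks further for higher partials.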
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
thanks, Stefan, for getting back on this.

On Dec 13, 2010, at 5:57 AM, Stefan Stenzel wrote:

> I construct seamless loops in frequency domain in a non-realtime application, and I am quite happy with the results. If you ask for a recipe, this is what I am doing:
>
> - detect pitch of (whole) sample via AC (via FFT)
> - decide on block to be looped (behind attack segment, usually more than 1 s long)
> - detect frequency peaks in that block (frequency domain)
> - shift to integer fractions of loop length but preserve amplitude and initial phase
> - back to time domain
> - fade to loop in original sample (only played once as no fade is inside loop)

from just reading this, it appears to be about the same thing that a certain unnamed keyboard synth manufacturer does. they detuned (very slightly) some of the partials so that each partial or overtone had an integer number of cycles over the length of the loop; even if they were slightly inharmonic, they were nudged slightly to be some other inharmonic ratio to the fundamental. i doubt that either the original peak or the moved peak sat exactly on integer bin indices in the FFT. then interpolation in the frequency domain is necessary (besides having to delimit each peak from the adjacent peaks) to move those peaks slightly. this isn't a problem with piano, but what if the sample is of some acoustic instrument with vibrato in the recording of a single note? then there isn't an exact pitch for the whole sample of the note, because it varies in time.

>> dunno if there is any PPG "secrets" or wisdom to confer, but i would like to hear or read it.
>
> None of this in any PPG or Waldorf.

i can see that. it's about sample loops, not the sequential single-cycle loops one would construct for wavetable synthesis.

> Currently I use it for automatically looping huge piano sample sets, not for the memory but in order to fight noise.

well, for sure you want the splice to be seamless for all harmonics, or better yet "partials", of any appreciable magnitude.
being that there are non-harmonic partials in a lot of acoustic instruments, most certainly piano, i know why you would want to adjust them a little so that the phases of all partials are aligned and the jump in the loop is seamless.

> Tried it with other material like chords with surprisingly good results though.

sure, if the loop is long enough and if you can adjust the frequencies slightly. and, of course, it will work better on simple major chords than it would with a fully diminished chord or something with dissonant intervals. this certain unnamed keyboard synth manufacturer didn't think so either (specifically certain non-experts in their engineering management), but, for this piano or some other pitched instrument, a wavetable analysis would do as well or even better for the cases where there is vibrato to track and deal with.

1. first, pitch detection (using AC or AMDF or whatever) is performed very often (say once or twice per millisecond) throughout the note, from beginning to end. octave errors are dealt with and tight pitch tracking is done.

2. then single-cycle wavetables are computed for each of those milliseconds with each new period estimate. (but the changing pitch is recorded and used for resynthesis.)

3. an FFT of each wavetable is performed. X[0] is DC, X[1] and X[N-1] are the first harmonic, etc. the harmonics that are actually a little detuned and non-harmonic will have phase slipping a little in each adjacent wavetable. for the length of the loop, you would want the phase of each harmonic to be the same at the end as it was at the beginning of the loop.

4. the loop length is chosen to accomplish that for the lower harmonics (there would be an integer number of cycles for each of these lower harmonics in the loop length). then, for the higher harmonics that do not quite get back to the same phase at the loop end that they were at the loop start, that phase difference is split evenly across all wavetables in between.
this would cause an integer number of cycles for every harmonic, but they wouldn't necessarily be integer multiples of the fundamental. it is true that if one were to consider the loop length as a "period", then all partials would be integer harmonics after this adjustment, but what was previously considered the fundamental would not be the fundamental if the loop length is called the period. i suppose i could illustrate what i mean here with a bogus example, if i haven't made it sufficiently clear. i just think that wavetable synthesis has application that is broader than just playing single-cycle loops.

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
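[Editor's note: step 4 above, splitting the residual phase difference evenly across the wavetables, can be sketched as follows; the data layout (a list of equal-length single-cycle wavetables) and all names are editorial assumptions.]

```python
import numpy as np

def align_loop_phases(tables):
    """For each harmonic, measure the phase slipped between the first and
    last wavetable, then spread a compensating shift evenly across the
    wavetables so every harmonic ends the loop at its starting phase.
    (Assumes each harmonic slips by less than pi over the whole loop,
    so the measured angle is not ambiguous modulo 2*pi.)"""
    spectra = [np.fft.rfft(tbl) for tbl in tables]
    m = len(spectra)
    drift = np.angle(spectra[-1] * np.conj(spectra[0]))  # per-harmonic slip
    return [np.fft.irfft(s * np.exp(-1j * drift * i / (m - 1)))
            for i, s in enumerate(spectra)]

# three wavetables of a "harmonic" whose phase slips by 0.1 rad per table
n = np.arange(64)
tables = [np.sin(2 * np.pi * n / 64 + 0.1 * i) for i in range(3)]
fixed = align_loop_phases(tables)
# after alignment the last table matches the first: the loop closes seamlessly
```

Resynthesis would then scan through `fixed` cyclically, which is exactly the "integer number of cycles for every harmonic" condition described above.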
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Moin Robert & others,

On 06.12.2010 19:49, robert bristow-johnson wrote:

> On Dec 6, 2010, at 1:23 PM, Stefan Stenzel wrote:
>
>> On 06.12.2010 08:59, robert bristow-johnson wrote:
>>>
>>> This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio
>>
>> I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.
>
> i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops. regarding that, i would like to know the context (e.g. looping in a non-realtime sample editor for a sampling synth vs. a realtime pitch shifter) and the kinds of signals (quasi-periodic vs. aperiodic vs. periodic but with detuned higher harmonics). processing in frequency domain or time domain (or some in both)?

I construct seamless loops in frequency domain in a non-realtime application, and I am quite happy with the results. If you ask for a recipe, this is what I am doing:

- detect pitch of (whole) sample via AC (via FFT)
- decide on block to be looped (behind attack segment, usually more than 1 s long)
- detect frequency peaks in that block (frequency domain)
- shift to integer fractions of loop length but preserve amplitude and initial phase
- back to time domain
- fade to loop in original sample (only played once as no fade is inside loop)

> dunno if there is any PPG "secrets" or wisdom to confer, but i would like to hear or read it.

None of this in any PPG or Waldorf. Currently I use it for automatically looping huge piano sample sets, not for the memory but in order to fight noise. Tried it with other material like chords with surprisingly good results though.
Stefan
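[Editor's note: the "shift to integer fractions of loop length" step of the recipe can be sketched like this: pick spectral peaks, refine each peak frequency by parabolic interpolation, and resynthesize it at the nearest exact multiple of the loop rate fs/N, keeping magnitude and measured phase. The peak picking and amplitude estimate are deliberately crude illustrations; none of this is Stefan's actual code.]

```python
import numpy as np

def nudge_partials_to_loop(block, fs, max_partials=40):
    """Resynthesize the looped block from its strongest spectral peaks,
    each moved to the nearest frequency with an integer number of cycles
    per loop (a multiple of fs/len(block)). A real implementation would
    delimit peaks from their neighbors much more carefully."""
    n = len(block)
    w = np.hanning(n)
    spec = np.fft.rfft(block * w)
    mag = np.abs(spec)
    out = np.zeros(n)
    t = np.arange(n) / fs
    peaks = [k for k in range(1, len(spec) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
             and mag[k] > 0.05 * mag.max()]
    for k in sorted(peaks, key=lambda q: -mag[q])[:max_partials]:
        # parabolic interpolation on the log magnitude -> refined frequency
        a, b, c = np.log(mag[k - 1] + 1e-12), np.log(mag[k] + 1e-12), np.log(mag[k + 1] + 1e-12)
        f = (k + 0.5 * (a - c) / (a - 2 * b + c)) * fs / n
        f_loop = np.round(f * n / fs) * fs / n   # integer cycles per loop
        amp = 2 * mag[k] / np.sum(w)             # Hann coherent-gain estimate
        out += amp * np.cos(2 * np.pi * f_loop * t + np.angle(spec[k]))
    return out

fs, n = 44100, 4096
sig = 0.8 * np.sin(2 * np.pi * 440.0 * np.arange(n) / fs)
looped = nudge_partials_to_loop(sig, fs)
# all energy now sits on exact FFT bins, so the block repeats seamlessly
```

Since every resynthesized partial completes an integer number of cycles in the block, concatenating the block with itself produces no discontinuity, which is the whole point of the recipe.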
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
"...I'm "publishing" the main ideas here on music-dsp..."

Wish more people would do such things. I couldn't resist thinking out loud about some of the main issues with this subject, summed up as:

1. The looping idea will involve approximations, at the least of the kind which gives a short loop (as I understand the subject can be about, for instrument samples) an issue with pitch. Meaning: a loop is only as pitch-accurate as 1/#samples, so at 44.1 kHz a loop at the final point in the signal path will, at e.g. 1 kHz, for sure be no more accurate than about 2 percent divided by the number of waves in the loop (more fundamental waves per loop in an original sample will probably make for a "noisy" loop).

2. Even making use of FFTs for detection of loop points will, I'd think, have the disadvantage of the quite big transient and edge-discontinuity errors of that transform, unless measures like detuning to fundamental-equals-FFT-length are taken.

3. Transforming the signal to FFT form to interpolate loop ends or to capture frequencies for repetition will probably cause quite some signal degradation, so people expecting only sample detuning in a software package might not be too happy with the idea. A very short transform-based edit at the beginning or end of a loop might be interesting.

I recall getting wavelet transform code snippets with my Analog Devices DSP board years ago; THAT would probably be interesting to play with in this context, but hey, what should a top EE earn to do that?! :)

Theo.
http://www.theover.org/Linuxconf
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
RBJ, I had a look at your theory, and compared it to my approach (dare not call it a theory, as it was not as rigorously derived). The following is how I imagine we thought things out. Both of us wanted to preserve some aspect(s) of the known-to-be-good constant-voltage crossfade envelopes, and to generalize from those the envelope functions for arbitrary values of the correlation coefficient.

You saw that the odd component o(t) determined the shape of the constant-voltage envelopes. For those, the even component had to be e(t) = 1/2 to satisfy the symmetry a(t) + a(-t) = 1 required in constant-voltage crossfades. So apparently o(t) was capturing the essential aspects of the crossfade envelope. You showed how to recalculate e(t) for different values of the correlation coefficient in such a way that o(t) was preserved.

I, on the other hand, chose that the ratio a(t)/a(-t) (using your notation) should be preserved for each value of t. To accomplish this, one could first do the crossfade using constant-voltage envelopes and then apply to the resulting signal a volume envelope to adjust for any deviation from perfect positive correlation. Or equivalently, the compensation could be incorporated into a(t), which I showed how to do in the case of a linear constant-voltage crossfade. Other constant-voltage crossfade envelopes than linear could be handled by a time deformation function u(t) which gives the time at which the linear constant-voltage envelope function reaches the value of the desired constant-voltage envelope function at time t. u(t) would then be used instead of t in the formula for a(t) derived for generalization of the linear crossfade for arbitrary r.

I believe your requirement for r >= 0 could be relaxed. For example, if one is creating a drum loop, then it would probably make most sense to put the loop points in the more quiet areas between the transients.
And there you might only have noise that is independent between the two loop points, giving values of the correlation coefficient that are slightly positive or slightly negative. Because the length of a drum loop is fixed, there might not be much choice in the placement of the loop points, and a spot giving a slightly negative r might actually be the most natural choice. I do not think your formulas will fall apart, as long as -1 < r <= 1.

-olli
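[A concrete sketch of the two-step approach described above; this is my own illustration, assuming unit-power signals and a linear constant-voltage shape, not code from either post. Compute the windowed correlation coefficient r at the splice, do the constant-voltage crossfade, then apply the compensating volume envelope, which stays finite for -1 < r <= 1:]

```python
import numpy as np

def corr_coeff(x, y, w):
    """Normalized correlation r = <x,y> / sqrt(<x,x><y,y>), with the
    inner product weighted by a window w centered on the splice point."""
    ip = lambda p, q: np.sum(p * q * w)
    return ip(x, y) / np.sqrt(ip(x, x) * ip(y, y))

def compensated_crossfade(x, y, r):
    """Constant-voltage crossfade, then a volume envelope compensating
    for deviation from perfect correlation (assumes equal-power x, y)."""
    a = np.linspace(0.0, 1.0, len(x))     # linear fade-in for y
    v = (1.0 - a) * x + a * y             # constant-voltage crossfade
    # expected power of v for unit-power inputs with correlation r;
    # strictly positive for 0 <= a <= 1 whenever r > -1
    p = (1.0 - a)**2 + a**2 + 2.0 * r * a * (1.0 - a)
    return v / np.sqrt(p)                 # hold the expected power constant

# splicing a signal onto itself: r = 1, so no compensation is applied
t = np.linspace(0.0, 1.0, 256)
x = np.sin(2 * np.pi * 5 * t)
w = np.hanning(256)
r = corr_coeff(x, x, w)
v = compensated_crossfade(x, x, r)
```

[For r = 0 the midpoint gain works out to sqrt(2), the familiar 3 dB constant-power boost; for the slightly negative r of the quiet-spot drum-loop case the boost is larger but still finite.]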
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On Dec 6, 2010, at 1:23 PM, Stefan Stenzel wrote:
> On 06.12.2010 08:59, robert bristow-johnson wrote:
>> This is a continuation of the thread started by Element Green titled:
>> Algorithms for finding seamless loops in audio
>
> I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.

i can't speak for Greenie or any others, but i myself would be very interested in what you might have to say about constructing seamless loops. regarding that, i would like to know the context (e.g. looping in a non-realtime sample editor for a sampling synth vs. a realtime pitch shifter) and the kinds of signals (quasi-periodic vs. aperiodic vs. periodic but with detuned higher harmonics). processing in the frequency domain or the time domain (or some in both)? dunno if there are any PPG "secrets" or wisdom to confer, but i would like to hear or read it.

bestest,

--

r b-j r...@audioimagination.com

"Imagination is more important than knowledge."
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
On 06.12.2010 08:59, robert bristow-johnson wrote:
> This is a continuation of the thread started by Element Green titled:
> Algorithms for finding seamless loops in audio

I suspect it works better to *construct* a seamless loop instead of trying to find one where there is none.

Stefan
Re: [music-dsp] A theory of optimal splicing of audio in the time domain.
Thanks for sharing these thoughts, Robert.

On Mon, 6 Dec 2010 03:38:28 -0500 robert bristow-johnson wrote:
> This is a continuation of the thread started by Element Green titled:
> Algorithms for finding seamless loops in audio

-- Andy Farnell
[music-dsp] A theory of optimal splicing of audio in the time domain.
< a few mistakes are spotted and corrected before i forget >

This is a continuation of the thread started by Element Green titled: Algorithms for finding seamless loops in audio

As far as I know, it is not published anywhere. A few years ago, I was thinking of writing this up and publishing it (or submitting it for publication, probably to JAES), and had let it fall by the wayside. I'm "publishing" the main ideas here on music-dsp because of some possible interest here (and the hope it might be helpful to somebody), and so that "prior art" is established in case anyone like IVL is thinking of claiming it as their own. I really do not know how useful it will be in practice. It might not make any difference. It's just a theory.

__

Section 0:

This is about the generalization of the different ways we can splice and crossfade audio, which has these two extremes:

(1) Splicing perfectly coherent and correlated signals
(2) Splicing completely uncorrelated signals

I sometimes call the first case the "constant-voltage crossfade" because the crossfade envelopes of the two signals being spliced add up to one. The two envelopes meet where both have a value of 1/2. In the second case, we use a "constant-power crossfade": the squares of the two envelopes add to one, and they meet where both have a value of sqrt(1/2) = 0.707.

The questions I wanted to answer are: What does one do for cases in between, and how does one know, from the audio, which crossfade function to use? How does one quantify the answers to these questions? How much can we generalize the answer?

__

Section 1: Set up the problem.

We have two continuous-time audio signals, x(t) and y(t), and we want to splice from one to the other at time t=0. In pitch-shifting or time-scaling or any other looping, y(t) can be some delayed or advanced version of x(t), e.g. y(t) = x(t-P), where P is a period length or some other "good" splice displacement. We get that value from an algorithm we call a "pitch detector".
Also, it doesn't matter whether x(t) is getting spliced to y(t) or the other way around; it should work just as well for the audio played in reverse. And it should be no loss of generality that the splice happens at t=0; we define our coordinate system any damn way we damn well please.

The signal resulting from the splice is

   v(t) = a(t)*x(t) + a(-t)*y(t)

By restricting our result to be equivalent if run either forward or backward in time, we can conclude that the "fade-out" function (say that's a(t)) is the time-reversed copy of the "fade-in" function, a(-t).

For the correlated case (1):      a(t) + a(-t) = 1           for all t
For the uncorrelated case (2):    (a(t))^2 + (a(-t))^2 = 1   for all t

This crossfade function, a(t), has well-defined even and odd symmetry components:

   a(t) = e(t) + o(t)

where

   even part:  e(t) =  e(-t) = ( a(t) + a(-t) )/2
   odd part:   o(t) = -o(-t) = ( a(t) - a(-t) )/2

And it's clear that a(-t) = e(t) - o(t).

For example, if it's a simple linear crossfade (equivalent to splicing analog tape with a diagonally-oriented razor blade):

          { 0            for t <= -1
   a(t) = { 1/2 + t/2    for -1 < t < 1
          { 1            for t >= 1

This is represented simply, in the even and odd components, as:

   e(t) = 1/2

          { t/2          for |t| < 1
   o(t) = { sgn(t)/2     for |t| >= 1

where sgn(t) is the "sign function": sgn(t) = t/|t|.

This is a constant-voltage crossfade, appropriate for perfectly correlated signals x(t) and y(t). There is no loss of generality in defining the crossfade to take place around t=0 and to be two time units in length; both are simply a matter of offset and scaling of time.

Another constant-voltage crossfade would be what I might call a "Hann crossfade" (after the Hann window):

   e(t) = 1/2

          { (1/2)*sin(pi/2 * t)   for |t| < 1
   o(t) = { sgn(t)/2              for |t| >= 1

Some might like that better because the derivative is continuous everywhere.
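[A small numerical sketch of the decomposition above, my own illustration: build a(t) from the even part e(t) = 1/2 and the odd part o(t) of the linear crossfade, and check the constant-voltage property a(t) + a(-t) = 1.]

```python
import numpy as np

def o_linear(t):
    # odd part of the linear crossfade: t/2 inside |t| < 1, sgn(t)/2 outside
    return np.where(np.abs(t) < 1.0, t / 2.0, np.sign(t) / 2.0)

def a_linear(t):
    # a(t) = e(t) + o(t), with even part e(t) = 1/2
    return 0.5 + o_linear(t)

t = np.linspace(-2.0, 2.0, 401)
# constant-voltage property: fade-in plus time-reversed fade-out sums to one
assert np.allclose(a_linear(t) + a_linear(-t), 1.0)
# even and odd parts recovered from a(t) itself match the definitions
assert np.allclose((a_linear(t) + a_linear(-t)) / 2.0, 0.5)
assert np.allclose((a_linear(t) - a_linear(-t)) / 2.0, o_linear(t))
```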
Extending this idea, one more constant-voltage crossfade is what I might call a "Flattened Hann crossfade":

   e(t) = 1/2

          { (9/16)*sin(pi/2 * t) + (1/16)*sin(3*pi/2 * t)   for |t| < 1
   o(t) = { sgn(t)/2                                        for |t| >= 1

This splice is everywhere continuous in the zeroth, first, and second derivative. A very smooth crossfade.

As another example, a constant-power crossfade would be the same as any of the above, but where the above a(t) is square-rooted:

          { 0                  for t <= -1
   a(t) = { sqrt(1/2 + t/2)    for -1 < t < 1
          { 1                  for t >= 1
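[A sketch of the square-rooting step just described, again my own illustration: taking the square root of any constant-voltage a(t) turns the sum constraint a(t) + a(-t) = 1 into the constant-power constraint a(t)^2 + a(-t)^2 = 1.]

```python
import numpy as np

def a_voltage(t):
    # linear constant-voltage fade-in: a(t) + a(-t) = 1
    return np.clip(0.5 + 0.5 * t, 0.0, 1.0)

def a_power(t):
    # square-rooted version: a(t)^2 + a(-t)^2 = 1
    return np.sqrt(a_voltage(t))

t = np.linspace(-1.0, 1.0, 201)
assert np.allclose(a_voltage(t) + a_voltage(-t), 1.0)
assert np.allclose(a_power(t)**2 + a_power(-t)**2, 1.0)
# the envelopes meet at t = 0 with values 1/2 and sqrt(1/2) respectively
assert abs(a_voltage(0.0) - 0.5) < 1e-12
assert abs(a_power(0.0)**2 - 0.5) < 1e-12
```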