okay, i don't seem to get any time to deal with this except late at night.

so this is continuing that thread that was named "A theory of optimal splicing of audio in the time domain."

On Dec 15, 2010, at 11:20 AM, Stefan Stenzel wrote:

On 14.12.2010 06:15, robert bristow-johnson wrote:
this isn't a problem with piano, but what if the sample is of some acoustic instrument with vibrato in the recording of a single note. then there isn't an exact pitch for the whole sample of the note, because it varies in time.

Right, but if you consider 1/loop length the fundamental frequency, vibrato becomes simple FM.

well, you will have a sparse line spectrum for your "single cycle". the "real" first harmonic becomes something like the 50th harmonic of your 1/(loop_length) fundamental if the loop had 50 cycles of the tone between endpoints. then you will have a spike at around the 100th harmonic, 150th, 200th and so on. you can DFT the entire loop length (with no windowing), and the DFT will have the Fourier coefficients of your big, long "single cycle" (which looks like 50 cycles). if there was no vibrato, the energy would be nearly all in the X[50], X[100], X[150], X[200] ... bins. because this is a DFT of an integer number of cycles, the adjacent bins would be nearly zero, relatively (if there is no vibrato).
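
a minimal sketch of that claim, in plain python with a naive single-bin DFT. the loop length of 4000 samples and the two-harmonic test tone are assumptions made up for illustration, not anything from the thread:

```python
import cmath
import math

def dft_bin(x, k):
    """k-th DFT coefficient of the sequence x (rectangular window, no scaling)."""
    n_total = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
               for n in range(n_total))

N = 4000      # loop length in samples (assumed)
CYCLES = 50   # whole cycles of the tone between the loop endpoints

# fundamental plus a weaker 2nd harmonic, no vibrato
loop = [math.sin(2 * math.pi * CYCLES * n / N)
        + 0.5 * math.sin(2 * math.pi * 2 * CYCLES * n / N)
        for n in range(N)]

# energy shows up at bins 50 and 100, the neighbors stay near zero
for k in (49, 50, 51, 99, 100, 101):
    print(k, round(abs(dft_bin(loop, k)), 6))
```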

like with a piano, the higher harmonics would start to get a little sharp and, say, the "real" 12th harmonic would lie perhaps at X[601] instead of X[600] if that harmonic was 2.88 cents sharp. but the 11th harmonic and the 13th harmonic would also be sharp and not by exactly one (or some other small integer) bin. then there *will* be neighboring bins with significant energy, because it would be like a sinc() function sampled off of the integer values. you would have to interpolate around these adjacent bins to get the "true" peak location (at a fractional in-between bin location) and peak height so you would know that there is not a precise integer number of cycles of that harmonic in your loop.
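
one possible way to do that interpolation, sketched under assumptions: a single steady partial, no window, and the partial far enough from DC and Nyquist that its negative-frequency image can be ignored. in that case the magnitudes of two adjacent bins are roughly proportional to 1/delta and 1/(1-delta), so the fractional offset is r/(1+r) with r the magnitude ratio. the function name `fractional_peak` and the test partial are hypothetical, and this is not claimed to be the method any particular sample editor uses:

```python
import cmath
import math

def dft_bin(x, k):
    """k-th DFT coefficient of x (rectangular window, no scaling)."""
    n_total = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
               for n in range(n_total))

def fractional_peak(x, k0):
    """Estimate the fractional bin location of a partial near bin k0.

    Assumes bin k0 has nonzero magnitude and only one partial is nearby.
    """
    mag0 = abs(dft_bin(x, k0))
    mag_up = abs(dft_bin(x, k0 + 1))
    mag_dn = abs(dft_bin(x, k0 - 1))
    if mag_up >= mag_dn:            # true peak lies above k0
        r = mag_up / mag0
        return k0 + r / (1 + r)
    r = mag_dn / mag0               # true peak lies below k0
    return k0 - r / (1 + r)

N = 4096
# a partial that completes 600.4 cycles in the loop (assumed test signal)
partial = [math.cos(2 * math.pi * 600.4 * n / N) for n in range(N)]
print(fractional_peak(partial, 600))
```

with vibrato smearing the peak, as discussed next, a two-bin estimate like this would likely be less trustworthy.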

now i presume that you would want to move those slightly detuned harmonics to squarely an integer bin location and you would compute the distance from the interpolated peak to the nearest integer bin. higher or lower, i'm not entirely sure - if they're, say, 0.4 bin width sharp, you might want to bump it up to the next integer bin rather than down to the nearest bin where it would not be slightly sharp anymore. i dunno, we want to preserve these outa-tune harmonics to keep the sample "live" sounding.

now one problem, i might guess, would be *if* there is also vibrato, those harmonic peaks will get spread out among the adjacent bins, and i am not sure that it will be symmetrical about the "true" peak, and if it is not, i am not sure how you determine exactly where the peak is before moving it. not necessarily a big issue. so then you move each peak (and adjacent bins) to an integer bin location, inverse DFT, and all of the partials should have an integer number of cycles between the loop endpoints.

This might sound stoopid, as we certainly perceive it in our own time domain, but that does not mean we cannot take advantage of frequency domain processing. The problem here lies not so much in the frequency alignment itself but in the pitch detection, which ideally finds a multiple of both the fundamental and the modulation frequency.

i know, selecting the correct number of cycles for the loop so that there is an integer number of vibrato cycles would be the main criterion of choosing a loop length and endpoints. you would do that with little regard of what those sharpened harmonics are doing and fix them later with this frequency-domain method. (and there is a wavetable way to do it, that tracks the varying fundamental.)

In reality, if you choose your loop to be long enough, you can almost get away with any length, even if this is completely unrelated to the original pitch. Consider a 4 sec loop, all frequencies are multiples of 0.25 Hz. At 440 Hz, this difference is just 1 cent and hardly audible.

well if you get 1760.5 cycles in the loop (because it's not exactly 440 Hz or not exactly 4 sec) then instead of 1760, you *could* get a glitch in the splice, no matter how slow the crossfade is, because when the crossfade is at 50-50 (%), then you will get destructive interference for all odd harmonics. but, i know you would adjust it a little to get an exact integer number of cycles. but, in my opinion, you would have to track the cycle phase tightly to do it, which would be equivalent to cross-correlating (or AMDF) the two loop endpoints together to get the best loop length.
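
the cancellation at the 50-50 point can be checked with a few lines. this sketch assumes a linear (constant-voltage) crossfade and steady harmonics; `crossfade_midpoint_gain` is a name made up for illustration:

```python
import cmath
import math

def crossfade_midpoint_gain(cycle_offset, harmonic):
    """Relative amplitude of one harmonic at the halfway point of a linear
    crossfade when the two sides are misaligned by `cycle_offset` cycles
    of the fundamental."""
    phase = 2 * math.pi * harmonic * cycle_offset
    # half of one side plus half of the phase-shifted other side
    return abs(0.5 + 0.5 * cmath.exp(1j * phase))

# 1760.5 cycles instead of 1760 leaves a half-cycle misalignment
for h in range(1, 7):
    print(h, round(crossfade_midpoint_gain(0.5, h), 6))
```

odd harmonics come out at zero and even harmonics at full level, which is the glitch described above.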

Works for major as well as for minor chords, as for some 10CC not-in-love vocal cluster.

works for dissonance? if you were looping that, i might expect a constant-power crossfade (that hits both envelopes at 70.7% when halfway through) would be better than a constant-voltage crossfade. there are sample editors that had options to do this and this optimal splicing theory was meant to generalize the idea.
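
for reference, a sketch of the two crossfade laws being compared. the sine/cosine shape used for the constant-power case is one common choice and an assumption here; it is not the generalized result of the optimal splicing theory itself:

```python
import math

def constant_voltage_gains(t):
    """Linear crossfade, t from 0 to 1: the two gains always sum to 1."""
    return 1.0 - t, t

def constant_power_gains(t):
    """Sine/cosine crossfade, t from 0 to 1: the squared gains sum to 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, constant_voltage_gains(t), constant_power_gains(t))
```

halfway through, the constant-voltage gains are both 0.5 while the constant-power gains are both about 0.707, which suits material where the two sides are uncorrelated.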

well, for sure you want the splice to be seamless for all harmonics, or better yet "partials", of any appreciable magnitude. being that there are non-harmonic partials in a lot of acoustic instruments, most certainly piano, i know why you would want to adjust them a little so that phases of all partials are aligned and the jump in the loop is seamless.

Yes, very seamless, I think this is what a loop should be. I cannot see how any frequency *not* being a multiple of the loop frequency could be represented in that loop.

[...]
i suppose i could illustrate what i mean here with a bogus example, if i haven't made it sufficiently clear. i just think that wavetable synthesis has application that is broader than just playing single-cycle loops.
To be honest I didn't quite get that. It could help if the unnamed manufacturer could be named, I cannot yet see why it should remain anonymous.


well, i told you separately, but i'm not saying it out loud. it's such a litigious society we Americans have (even, =ahem=, the non-Americans). this company is known to have been involved in litigation in its history.

but i'll try to explain how you would employ wavetable analysis, modification, and resynthesis, to create the same loop with some slightly detuned harmonics.

so let's say it's equivalent to above, you have a vibrato going and, in very close to one or two vibrato cycles, you get 50 of the tone cycles. but the 12th harmonic is 2.88 cents sharp (a frequency ratio of 601/600). that's not so bad with the loop, because the sharpened 12th harmonic will still have precisely 601 cycles in the loop. but the 11th harmonic will not quite have 551 cycles in the loop, but it has more than 550 cycles. let's say that the 11th harmonic is about 1.89 cents sharp and has 550.6 cycles in the loop and you want to bump it up to 551 cycles. you want to sharpen that harmonic a further 1.26 cents to bump it to exactly 551 cycles in the loop so the splice is nice.
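
the cents figures here follow from the usual definition, 1200 times the base-2 log of the frequency ratio. a small sketch to reproduce them (the helper name `cents` is just for illustration):

```python
import math

def cents(ratio):
    """Interval in cents for a given frequency ratio."""
    return 1200.0 * math.log2(ratio)

print(cents(601 / 600))     # 12th harmonic: 601 cycles where 600 would be exact
print(cents(550.6 / 550))   # 11th harmonic as recorded: 550.6 cycles
print(cents(551 / 550.6))   # extra sharpening needed to reach 551 cycles
```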

so here is the wavetable way to do it: let's say you derive some number (let's say 16 wavetables, for a nice number) of wavetables, equally spaced throughout the about-to-be-looped segment of tone (which has 50 cycles in it). now, without considering the vibrato for the moment, the number of cycles between neighboring wavetables would be 50/16 (or 25/8). or 3 1/8 cycles between centers of the frames you plop down and derive a wavetable from each. this means, if no wavetable alignment is done, the phase of the fundamental would advance by 1/8 cycle or 45 degrees between one wavetable and the next. so, to align the wavetables, you rotate or spin the second wavetable back by 1/8 cycle (say, by 128 samples if we're allocating 1024 samples per wavetable) to line them up. but we do our bookkeeping and retain the fact that this wavetable was rotated 1/8 cycle when resynthesizing.
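
a sketch of that bookkeeping, assuming integer counts of cycles and wavetables and ignoring the vibrato as above. `rotation_samples` and `rotate_back` are hypothetical helper names, and the two sine wavetables are stand-ins for real extracted data:

```python
import math
from fractions import Fraction

def rotation_samples(cycles_in_loop, num_tables, table_len, index):
    """Samples to spin wavetable `index` back so its fundamental lines up
    with wavetable 0. Only the fractional part of the cycle count matters."""
    cycles_advanced = Fraction(cycles_in_loop, num_tables) * index
    frac_cycle = cycles_advanced % 1
    return round(frac_cycle * table_len) % table_len

def rotate_back(table, shift):
    """Circularly delay a wavetable by `shift` samples."""
    return table[-shift:] + table[:-shift]

TABLE_LEN = 1024
first = [math.sin(2 * math.pi * n / TABLE_LEN) for n in range(TABLE_LEN)]
# second wavetable: same waveform, phase advanced by 1/8 cycle
second = [math.sin(2 * math.pi * (n / TABLE_LEN + 1 / 8)) for n in range(TABLE_LEN)]

shift = rotation_samples(50, 16, TABLE_LEN, 1)   # retained for resynthesis
aligned = rotate_back(second, shift)
print(shift, max(abs(a - b) for a, b in zip(aligned, first)))
```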

now that will do nicely for the fundamental and lower harmonics that are very harmonic. after doing this rotating, you can perform nice DFTs on the wavetables (if there are 1024 samples per wavetable, then N=1024 in the DFT). X[0] is the DC component and let's set it to zero just so we don't have to think about it. X[1] and X[N-1] make the Fourier series coefficient for the 1st harmonic, exactly. X[2] and X[N-2] make the Fourier coefficient for the 2nd harmonic. now, because of spinning the second wavetable (lining it up with the first), the phase of the 1st and 2nd (and other lower harmonics) in the second wavetable will be nearly identical to the corresponding phases of first wavetable.

but the 11th harmonic is not exactly the 11th harmonic. if it *were* exactly harmonic, its phase in the second wavetable would line up with the phase of the first. it's really the 11.012th harmonic (11.012 = 11*550.6/550). so when the fundamental advanced precisely 25/8 cycle to go from the first wavetable to the second, the 11th harmonic did not advance by 11*25/8 cycles but that harmonic advanced 11.012*25/8. when the wavetable is aligned (to make the lower harmonics line up) by spinning it 1/8 cycle, the 11th harmonic gets off by 0.012*(25/8) cycle or 13.5 degrees. now, for each successive wavetable, the 11th harmonic will advance in phase by 13.5 degrees and in the time of 16 wavetables, the 11th harmonic will advance 16*13.5 = 216 degrees (or 0.6 cycle).
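
the same arithmetic as a sketch, so the numbers can be re-run for other harmonics or frame counts. the function name and argument layout are made up for illustration:

```python
def drift_per_table_deg(nominal_harmonic, actual_cycles, nominal_cycles,
                        fundamental_cycles, num_tables):
    """Phase drift (degrees) of a slightly detuned harmonic from one aligned
    wavetable to the next."""
    actual_harmonic = nominal_harmonic * actual_cycles / nominal_cycles
    hop = fundamental_cycles / num_tables      # fundamental cycles per hop
    return 360.0 * (actual_harmonic - nominal_harmonic) * hop

per_table = drift_per_table_deg(11, 550.6, 550, 50, 16)
print(per_table, 16 * per_table, 16 * per_table / 360.0)
```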

even though the 11th harmonic isn't at exactly 11 times the fundamental, wavetable synthesis treats it as exactly the 11th harmonic but with the phase advancing a little with each and every successive wavetable that is created from the data.

now we *want* the phase of the 11th harmonic to be off a little bit, because it *is* supposed to be sharp a little. but we want that harmonic to complete an entire extra cycle in the time of the whole loop, so we have to help that 11th harmonic on by adding 0.4 cycle (or 144 degrees) in the time of 16 wavetables. this means we have to advance the phase artificially, by an additional phi = 144/16 = 9 degrees for each successive wavetable: in the k-th wavetable, multiply X[11] by exp(j*k*phi) and X[N-11] by exp(-j*k*phi).
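
a sketch of that phase bump on one wavetable's DFT data, assuming the spectrum is held as a plain list of complex bins and that the advance accumulates from one wavetable to the next (wavetable k gets k times the per-wavetable step). `advance_harmonic` is a hypothetical helper:

```python
import cmath
import math

def advance_harmonic(spectrum, harmonic, phi_deg):
    """Copy of `spectrum` with one harmonic's phase advanced by phi_deg.
    The mirror bin gets the conjugate rotation so the inverse DFT stays real."""
    n = len(spectrum)
    rot = cmath.exp(1j * math.radians(phi_deg))
    out = list(spectrum)
    out[harmonic] = spectrum[harmonic] * rot
    out[n - harmonic] = spectrum[n - harmonic] * rot.conjugate()
    return out

K = 16
PHI_STEP = 144.0 / K                 # extra degrees per wavetable

# toy spectrum with only the 11th harmonic present (N = 32 for brevity)
toy = [0j] * 32
toy[11] = 1 + 0j
toy[32 - 11] = 1 + 0j

for k in range(K):
    bumped = advance_harmonic(toy, 11, k * PHI_STEP)
    print(k, round(math.degrees(cmath.phase(bumped[11])), 3))
```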

so here is the procedure:

1. decide on loop endpoints based on getting very nearly an integer number of vibrato cycles in there and getting exactly an integer number of tone cycles in the loop length.

2. divide that loop length into a decently large integer number of equally-spaced frames. call that number of frames, K. (my example above was K=16 frames.)

3. extract the period (as a possible non-integer number of samples) for each frame and derive a representative wavetable for that frame.

4. knowing what the period length is and knowing the time spacing from one frame to the next, you know exactly how much to spin each successive wavetable to best align it with the previous. (problem is that the harmonics that are a little non-harmonic will not align as well.)

5. FFT or DFT each wavetable. this is now the Fourier series data for that waveform "snapshot" (using Andrew Horner's language) of each frame.

6. for each harmonic observe how far out of phase the last wavetable is from the first. the last wavetable is K-1 frame displacements away from the first and the phase in the last frame should be off by M*360*(K-1)/K degrees from the phase of the first where M is some integer (M=0 for a "well-tuned" harmonic, M would be the number of complete cycle "slips" for that harmonic in the whole loop length). if that is the case, that means in K frame displacements, that this harmonic advances by M cycles or M*360 degrees.

7. if the phase differential (from first to last wavetable) is off from that M*360*(K-1)/K degrees, then that harmonic does *not* advance by exactly M cycles in the loop. in that case, add (with the correct sign) to that harmonic's phase in frame k the amount k/(K-1) times the shortfall, that is, the difference between M*360*(K-1)/K degrees and the measured first-to-last differential (where 0 <= k < K is the sequential index of each of the K equally-spaced frames). what you did was hurry up the phase (or slow it down) so that this harmonic completes an entire extra cycle (or two or some bigger integer) in the time of the loop length.

8. inverse DFT each Fourier series snapshot data back to the time-domain wavetable.

9. recreate the time-varying tone using wavetable synthesis (and interpolating between adjacent wavetables). every harmonic will line up at the loop endpoints.
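
steps 6 and 7 can be sketched for a single harmonic as below. the assumptions are loud ones: the measured per-frame phases drift by less than 180 degrees per frame (so they can be unwrapped), the drift is close to uniform, and M is taken as the nearest integer, which is only one of the choices discussed earlier. `unwrap_deg` and `phase_corrections` are hypothetical names:

```python
def unwrap_deg(phases):
    """Unwrap phases (degrees) so each step lies in [-180, 180)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = (p - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out

def phase_corrections(phases_deg):
    """phases_deg[k] is one harmonic's measured phase in frame k, k = 0..K-1.

    Returns (corrections, M): the degrees to add to that harmonic in each
    frame, and the whole number of cycle slips it will make per loop."""
    K = len(phases_deg)
    unwrapped = unwrap_deg(phases_deg)
    per_hop = (unwrapped[-1] - unwrapped[0]) / (K - 1)  # first-to-last is K-1 hops
    total = K * per_hop                                 # the loop spans K hops
    M = round(total / 360.0)                            # nearest integer (assumed)
    shortfall = 360.0 * M - total
    return [k * shortfall / K for k in range(K)], M

# the 11th harmonic of the example: 13.5 degrees of drift per wavetable
measured = [(13.5 * k) % 360.0 for k in range(16)]
corrections, slips = phase_corrections(measured)
print(slips, corrections)
```

for the example this gives one cycle slip and a correction that grows by 9 degrees per frame, so the harmonic arrives back at the loop start exactly one cycle ahead.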

does this make sense? i know this is long and wordy, but without drawings i don't know how to better put it. lemme know if there are questions i might be able to answer or to better explain.

--

r b-j                  r...@audioimagination.com

"Imagination is more important than knowledge."




--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
