okay, i don't seem to get any time to deal with this except late at
night.
so this is continuing that thread that was named "A theory of optimal
splicing of audio in the time domain."
On Dec 15, 2010, at 11:20 AM, Stefan Stenzel wrote:
On 14.12.2010 06:15, robert bristow-johnson wrote:
this isn't a problem with piano, but what if the sample is of some
acoustic instrument with vibrato in the recording of a single
note. then there isn't an exact pitch for the whole sample of the
note, because it varies in time.
Right, but if you consider 1/loop length the fundamental frequency,
vibrato becomes simple FM.
well, you will have a sparse line spectrum for your "single cycle".
the "real" first harmonic becomes something like the 50th harmonic of
your 1/(loop_length) fundamental if the loop had 50 cycles of the tone
between endpoints. then you will have a spike at around the 100th
harmonic, 150th, 200th and so on. you can DFT the entire loop length
(with no windowing), and the DFT will have the Fourier coefficients of
your big, long "single cycle" (which looks like 50 cycles). if there
was no vibrato, the energy would be nearly all in the X[50], X[100],
X[150], X[200] ... bins. because this is a DFT of an integer number
of cycles, the adjacent bins would be nearly zero, relatively (if
there is no vibrato).
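
here is a minimal sketch of that sparse line spectrum, in python with only the standard library. the loop length N = 2048 and the three harmonic amplitudes are made-up values for illustration, not numbers from this thread.

```python
import cmath
import math

def dft_bin(x, k):
    """One bin of an unwindowed DFT: X[k] = sum of x[n]*exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

N = 2048                  # loop length in samples (assumed)
CYCLES = 50               # tone cycles between the loop endpoints
AMPS = [1.0, 0.5, 0.25]   # assumed amplitudes of harmonics 1, 2, 3

# A vibrato-free tone with exactly 50 cycles in the loop.
loop = [
    sum(a * math.sin(2 * math.pi * (h + 1) * CYCLES * n / N)
        for h, a in enumerate(AMPS))
    for n in range(N)
]

# Energy sits in bins 50, 100, 150; the adjacent bins are essentially zero.
for k in (49, 50, 51, 99, 100, 101, 149, 150, 151):
    print(k, round(abs(dft_bin(loop, k)) / (N / 2), 6))
```

dividing by N/2 turns each bin magnitude back into the amplitude of the corresponding sinusoid.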
like with a piano, the higher harmonics would start to get a little
sharp and, say, the "real" 12th harmonic would lie perhaps at X[601]
instead of X[600] if that harmonic was 2.88 cents sharp. but the 11th
harmonic and the 13th harmonic would also be sharp and not by exactly
one (or some other small integer) bin. then there *will* be neighboring
bins with significant energy, because it would be like a sinc()
function sampled off of the integer values. you would have to
interpolate around these adjacent bins to get the "true" peak location
(at a fractional in-between bin location) and peak height so you would
know that there is not a precise integer number of cycles of that
harmonic in your loop.
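
one way to do that interpolation, sketched under assumptions: with no windowing and a single dominant partial near the peak, the ratio of the two largest adjacent bin magnitudes gives the fractional offset, and dividing by the sinc() droop recovers the peak height. this two-bin estimator is just one common choice, not necessarily what anyone in this thread uses, and the test tone (50.4 cycles in 1024 samples) is made up.

```python
import cmath
import math

def dft_bin(x, k):
    """One bin of an unwindowed DFT."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def fractional_peak(x, k0):
    """Estimate (peak location in bins, peak amplitude) near bin k0.

    Assumes k0 is the largest bin of an isolated partial and that the
    transform is unwindowed, so the leakage follows a sinc() shape.
    """
    m_lo, m_mid, m_hi = (abs(dft_bin(x, k)) for k in (k0 - 1, k0, k0 + 1))
    if m_hi >= m_lo:
        delta = m_hi / (m_mid + m_hi)      # true peak lies above k0
    else:
        delta = -m_lo / (m_mid + m_lo)     # true peak lies below k0
    arg = math.pi * delta
    sinc = math.sin(arg) / arg if arg else 1.0
    amplitude = (m_mid / sinc) * 2.0 / len(x)   # undo the sinc() droop
    return k0 + delta, amplitude

N = 1024
tone = [math.sin(2 * math.pi * 50.4 * n / N) for n in range(N)]
location, height = fractional_peak(tone, 50)
print(location, height)
```

the estimate is slightly biased by leakage from the negative-frequency image, and vibrato sidebands would bias it further.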
now i presume that you would want to move those slightly detuned
harmonics to squarely an integer bin location and you would compute
the distance from the interpolated peak to the nearest integer bin.
higher or lower, i'm not entirely sure - if one is, say, 0.4 bin
width sharp, you might want to bump it up to the next integer bin
rather than down to the nearest bin, where it would not be slightly sharp
anymore. i dunno, we want to preserve these outa-tune harmonics to
keep the sample "live" sounding.
now one problem, i might guess, would be *if* there is also vibrato,
those harmonic peaks will get spread out among the adjacent bins, and
i am not sure that it will be symmetrical about the "true" peak, and
if it is not, i am not sure how you determine exactly where the peak
is before moving it. not necessarily a big issue. so then you move
each peak (and adjacent bins) to an integer bin location, inverse DFT,
and all of the partials should have an integer number of cycles
between the loop endpoints.
This might sound stoopid, as we certainly perceive it in our own time
domain, but that does not mean we cannot take advantage of frequency
domain processing. The problem here lies not so much in the frequency
alignment itself but the pitch detection, which ideally finds a
multiple of both the fundamental and the modulation frequency.
i know, selecting the correct number of cycles for the loop so that
there is an integer number of vibrato cycles would be the main
criterion of choosing a loop length and endpoints. you would do that
with little regard of what those sharpened harmonics are doing and fix
them later with this frequency-domain method. (and there is a
wavetable way to do it, that tracks the varying fundamental.)
In reality, if you choose your loop to be long enough, you can almost
get away with any length, even if this is completely unrelated to the
original pitch. Consider a 4 sec loop, all frequencies are multiples
of 0.25 Hz. At 440 Hz, this difference is just 1 cent and hardly
audible.
well if you get 1760.5 cycles in the loop (because it's not exactly
440 Hz or not exactly 4 sec) then instead of 1760, you *could* get a
glitch in the splice, no matter how slow the crossfade is, because
when the crossfade is at 50-50 (%), then you will get destructive
interference for all odd harmonics. but, i know you would adjust it a
little to get an exact integer number of cycles. but, in my opinion,
you would have to track the cycle phase tightly to do it, which would
be equivalent to cross-correlating (or AMDF) the two loop endpoints
together to get the best loop length.
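
a minimal AMDF sketch of that endpoint matching, under assumptions: the test signal, its 100-sample period, the search range and the comparison window are all invented for illustration. it compares a short window at the loop start with the same-sized window one candidate loop length later, and keeps the length with the smallest average absolute difference.

```python
import math

def amdf(x, start, lag, window):
    """Average magnitude difference between x[start:] and x[start+lag:]."""
    return sum(abs(x[start + n] - x[start + lag + n])
               for n in range(window)) / window

def best_loop_length(x, start, nominal, search, window):
    """Loop length within nominal +/- search that minimizes the AMDF."""
    candidates = range(nominal - search, nominal + search + 1)
    return min(candidates, key=lambda lag: amdf(x, start, lag, window))

PERIOD = 100.0   # assumed period of the test tone, in samples
x = [math.sin(2 * math.pi * n / PERIOD)
     + 0.5 * math.sin(2 * math.pi * 3 * n / PERIOD + 0.7)
     for n in range(3000)]

# A nominal guess of 1960 samples would be 19.6 cycles; the search should
# settle on a whole number of cycles instead.
length = best_loop_length(x, start=200, nominal=1960, search=45, window=256)
print(length, length / PERIOD)
```

with vibrato or detuned partials the minimum will not be zero, which is where the rest of this discussion comes in.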
Works for major as well as for minor chords, as for some 10CC not-in-
love vocal cluster.
works for dissonance? if you were looping that, i might expect a
constant-power crossfade (that hits both envelopes at 70.7% when
halfway through) would be better than a constant-voltage crossfade.
there are sample editors that had options to do this and this optimal
splicing theory was meant to generalize the idea.
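
the two crossfade laws, as a small sketch. the linear and sine/cosine shapes are the textbook versions and are assumed here; the optimal-splicing idea referred to above generalizes between them and is not reproduced.

```python
import math

def constant_voltage(t):
    """Linear crossfade gains (fade_out, fade_in); the gains sum to 1."""
    return 1.0 - t, t

def constant_power(t):
    """Sine/cosine crossfade gains; the squared gains sum to 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    v_out, v_in = constant_voltage(t)
    p_out, p_in = constant_power(t)
    print(t, round(v_out, 4), round(v_in, 4), round(p_out, 4), round(p_in, 4))
```

halfway through (t = 0.5) the constant-power gains are both about 0.707 and the constant-voltage gains are both 0.5. constant voltage suits material that is well correlated across the splice, constant power suits material that is not.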
well, for sure you want the splice to be seamless for all
harmonics, or better yet "partials", of any appreciable magnitude.
being that there are non-harmonic partials in a lot of acoustic
instruments, most certainly piano, i know why you would want to
adjust them a little so that phases of all partials are aligned and the
jump in the loop is seamless.
Yes, very seamless, I think this is what a loop should be. I cannot
see how any frequency *not* being a multiple of the loop frequency
could be represented in that loop.
[...]
i suppose i could illustrate what i mean here with a bogus example,
if i haven't made it sufficiently clear. i just think that
wavetable synthesis has application that is broader than just
playing single-cycle loops.
To be honest I didn't quite get that. It could help if the unnamed
manufacturer could be named, I cannot yet see why it should remain
anonymous.
well, i told you separately, but i'm not saying it out loud. it's
such a litigious society we Americans have (even, =ahem=, the non-
Americans). this company is known to have been involved in litigation
in its history.
but i'll try to explain how you would employ wavetable analysis,
modification, and resynthesis, to create the same loop with some
slightly detuned harmonics.
so let's say it's equivalent to above, you have a vibrato going and,
in very close to one or two vibrato cycles, you get 50 of the tone
cycles. but the 12th harmonic is 2.88 cents sharp (a frequency ratio
of 601/600). that's not so bad with the loop, because the sharpened
12th harmonic will still have precisely 601 cycles in the loop. but
the 11th harmonic will not quite have 551 cycles in the loop, but it
has more than 550 cycles. let's say that the 11th harmonic is 1.88
cents sharp and has 550.6 cycles in the loop and you want to bump it
up to 551 cycles. you want to sharpen that harmonic a further 1.26
cents to bump it to exactly 551 cycles in the loop so the splice is
nice.
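
the cents figures above can be checked with a few lines. nothing is assumed here beyond the usual definition of a cent as 1/1200 of an octave.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

# 12th harmonic: 601 cycles in the loop instead of 600.
print("12th harmonic sharp by", cents(601 / 600))
# 11th harmonic: 550.6 cycles in the loop instead of 550.
print("11th harmonic sharp by", cents(550.6 / 550))
# Extra sharpening needed to reach exactly 551 cycles.
print("further bump to 551 cycles", cents(551 / 550.6))
```

the last two ratios multiply to 551/550, so their cents add up to the total detuning of the 11th harmonic after the bump.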
so here is the wavetable way to do it: let's say you derive some
number (let's say 16 wavetables, for a nice number) of wavetables,
equally spaced throughout the about-to-be-looped segment of tone
(which has 50 cycles in it). now, without considering the vibrato for
the moment, the number of cycles between neighboring wavetables would
be 50/16 (or 25/8). or 3 1/8 cycles between centers of the frames you
plop down and derive a wavetable from each. this means, if no
wavetable alignment is done, the phase of the fundamental would
advance by 1/8 cycle or 45 degrees between one wavetable and the
next. so, to align the wavetables, you rotate or spin the second
wavetable back by 1/8 cycle (say, by 128 samples if we're allocating
1024 samples per wavetable) to line them up. but we do our
bookkeeping and retain the fact that this wavetable was rotated 1/8
cycle when resynthesizing.
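
a sketch of that bookkeeping, using the numbers above (50 cycles, 16 wavetables, 1024 samples per wavetable). the function names are made up, and the vibrato is ignored just as in the paragraph above.

```python
import math
from fractions import Fraction

TABLE_LEN = 1024      # samples per wavetable
CYCLES_IN_LOOP = 50   # tone cycles in the loop
NUM_TABLES = 16       # wavetables spread over the loop

def alignment_rotation(k):
    """Leftover phase of wavetable k as (fraction of a cycle, samples)."""
    cycles = Fraction(CYCLES_IN_LOOP, NUM_TABLES) * k   # 25/8 cycles per step
    frac = cycles % 1                                   # drop whole cycles
    return frac, int(round(frac * TABLE_LEN))

def spin_back(table, shift):
    """Circularly delay a wavetable by `shift` samples."""
    shift %= len(table)
    return table[-shift:] + table[:-shift]

# Second wavetable: the fundamental is 1/8 cycle ahead of the first.
first = [math.sin(2 * math.pi * n / TABLE_LEN) for n in range(TABLE_LEN)]
second = [math.sin(2 * math.pi * (n / TABLE_LEN + 1 / 8))
          for n in range(TABLE_LEN)]

frac, shift = alignment_rotation(1)
aligned = spin_back(second, shift)
print(frac, shift, max(abs(a - b) for a, b in zip(aligned, first)))
```

the (frac, shift) pair is what gets remembered per wavetable so that the rotation can be undone at resynthesis.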
now that will do nicely for the fundamental and lower harmonics that
are very harmonic. after doing this rotating, you can perform nice
DFTs on the wavetables (if there are 1024 samples per wavetable, then
N=1024 in the DFT). X[0] is the DC component and let's set it to zero
just so we don't have to think about it. X[1] and X[N-1] make the
Fourier series coefficient for the 1st harmonic, exactly. X[2] and
X[N-2] make the Fourier coefficient for the 2nd harmonic. now,
because of spinning the second wavetable (lining it up with the
first), the phase of the 1st and 2nd (and other lower harmonics) in
the second wavetable will be nearly identical to the corresponding
phases of first wavetable.
but the 11th harmonic is not exactly the 11th harmonic. if it *were*
exactly harmonic, its phase in the second wavetable would line up with
the phase of the first. it's really the 11.012th harmonic (11.012 =
11*550.6/550). so when the fundamental advanced precisely 25/8 cycle
to go from the first wavetable to the second, the 11th harmonic did not
advance by 11*25/8 cycles but that harmonic advanced 11.012*25/8.
when the wavetable is aligned (to make the lower harmonics line up) by
spinning it 1/8 cycle, the 11th harmonic gets off by 0.012*(25/8)
cycle or 13.5 degrees. now, for each successive wavetable, the 11th
harmonic will advance in phase by 13.5 degrees and in the time of 16
wavetables, the 11th harmonic will advance 16*13.5 = 216 degrees (or
0.6 cycle).
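
the same drift arithmetic as a sketch, so the numbers can be tried with other detunings. the function name and its parameters are made up for illustration.

```python
def drift_degrees_per_table(nominal_harmonic, cycles_in_loop,
                            fundamental_cycles=50, num_tables=16):
    """Phase a slightly detuned harmonic gains per wavetable step, after
    the wavetables are aligned on the fundamental."""
    true_harmonic = cycles_in_loop / fundamental_cycles   # 550.6/50 = 11.012
    excess = true_harmonic - nominal_harmonic             # 0.012
    cycles_per_step = fundamental_cycles / num_tables     # 25/8
    return excess * cycles_per_step * 360.0

per_table = drift_degrees_per_table(11, 550.6)
whole_loop = per_table * 16
print(per_table, whole_loop, whole_loop / 360.0)
```

for the example above this comes out at very nearly 13.5 degrees per wavetable and 216 degrees (0.6 cycle) over the loop, up to floating-point rounding.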
even though the 11th harmonic isn't at exactly 11 times the
fundamental, wavetable synthesis treats it as exactly the 11th
harmonic but with the phase advancing a little with each and every
successive wavetable that is created from the data.
now we *want* the phase of the 11th harmonic to be off a little bit,
because it *is* supposed to be sharp a little. but we want that
harmonic to complete an entire extra cycle in the time of the whole
loop, so we have to help that 11th harmonic on by adding 0.4 cycle (or
144 degrees) in the time of 16 wavetables. this means we have to
advance the phase artificially, by souping-up the phase for X[11] and
X[N-11]: multiply X[11] with exp(j*k*phi) and X[N-11] with
exp(-j*k*phi) in wavetable number k (0 <= k < 16), where phi = 144/16
degrees = 9 degrees per wavetable.
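
a sketch of that phase bump on one wavetable, with assumptions: a naive O(N^2) DFT from the standard library stands in for an FFT, the wavetable is only 64 samples long to keep it quick, and the amount added to wavetable k is taken to be k*144/16 degrees.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def advance_harmonic(table, harmonic, phi_degrees):
    """Advance one harmonic's phase by phi, leaving the others alone."""
    N = len(table)
    X = dft(table)
    rot = cmath.exp(1j * math.radians(phi_degrees))
    X[harmonic] *= rot                  # positive-frequency bin
    X[N - harmonic] *= rot.conjugate()  # mirror bin, keeps the output real
    return [v.real for v in idft(X)]

N = 64   # short wavetable, for illustration only
table = [math.sin(2 * math.pi * n / N)
         + 0.3 * math.sin(2 * math.pi * 11 * n / N) for n in range(N)]

k = 3                                   # wavetable index within the loop
bumped = advance_harmonic(table, 11, k * 144 / 16)
print(bumped[:4])
```

rotating the two bins by opposite angles keeps them a conjugate pair, which is why the inverse DFT stays real.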
so here is the procedure:
1. decide on loop endpoints based on getting very nearly an integer
number of vibrato cycles in there and getting exactly an integer
number of tone cycles in the loop length.
2. divide that loop length into a decently large integer number of
equally-spaced frames. call that number of frames, K. (my example
above was K=16 frames.)
3. extract the period (as a possible non-integer number of samples)
for each frame and derive a representative wavetable for that frame.
4. knowing what the period length is and knowing the time spacing
from one frame to the next, you know exactly how much to spin each
successive wavetable to best align it with the previous. (problem is
that the harmonics that are a little non-harmonic will not align as
well.)
5. FFT or DFT each wavetable. this is now the Fourier series data
for that waveform "snapshot" (using Andrew Horner's language) of each
frame.
6. for each harmonic observe how far out of phase the last wavetable
is from the first. the last wavetable is K-1 frame displacements
away from the first and the phase in the last frame should be off by
M*360*(K-1)/K degrees from the phase of the first where M is some
integer (M=0 for a "well-tuned" harmonic, M would be the number of
complete cycle "slips" for that harmonic in the whole loop length).
if that is the case, that means in K frame displacements, that this
harmonic advances by M cycles or M*360 degrees.
7. if the phase differential (from first to last wavetable) is off
from that M*360*(K-1)/K degrees, then that harmonic does *not* advance
by exactly M cycles. in that case, add (with the correct sign) to that
harmonic's phase k/K times the amount by which it falls short of (or
overshoots) M cycles over the whole loop (where 0 <= k < K
is the sequential index of each of the K equally-spaced frames). what
you did was hurry up the phase (or slow it down) so that this harmonic
completes an entire extra cycle (or two or some bigger integer) in the
time of the loop length.
8. inverse DFT each Fourier series snapshot data back to the time-
domain wavetable.
9. recreate the time-varying tone using wavetable synthesis (and
interpolating between adjacent wavetables). every harmonic will line
up at the loop endpoints.
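
steps 6 and 7 for a single harmonic can be sketched as below. the assumptions are loud ones: the per-frame phases are taken as already measured (in degrees, after alignment), the phase changes by less than 180 degrees between neighboring frames so it can be unwrapped, the first-to-last difference is extrapolated from K-1 to K frame displacements, and M is picked as the nearest integer number of cycles. whether to round to nearest or always bump up is left open earlier in this post.

```python
def unwrap_degrees(phases):
    """Remove 360-degree jumps, assuming steps smaller than 180 degrees."""
    out = [phases[0]]
    for p in phases[1:]:
        step = (p - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out

def phase_corrections(phases_deg):
    """Degrees to add to one harmonic's phase in each of the K frames so
    that it completes a whole number M of extra cycles over the loop."""
    K = len(phases_deg)
    unwrapped = unwrap_degrees(phases_deg)
    per_frame = (unwrapped[-1] - unwrapped[0]) / (K - 1)  # measured drift
    whole_loop = per_frame * K                # extrapolated to K displacements
    M = round(whole_loop / 360.0)             # nearest whole number of slips
    shortfall = M * 360.0 - whole_loop        # signed: + hurries, - slows
    return M, [k * shortfall / K for k in range(K)]

# The 11th harmonic from the example: 13.5 degrees of drift per frame.
K = 16
measured = [(13.5 * k) % 360.0 for k in range(K)]
M, corrections = phase_corrections(measured)
corrected = [m + c for m, c in zip(unwrap_degrees(measured), corrections)]
print(M, corrections[1], corrected[-1])
```

for the example this adds 9 degrees per frame, so the last frame ends up at 337.5 degrees, which is M*360*(K-1)/K with M = 1 and K = 16.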
does this make sense? i know this is long and wordy, but without
drawings i don't know how to better put it. lemme know if there are
questions i might be able to answer or to better explain.
--
r b-j r...@audioimagination.com
"Imagination is more important than knowledge."