On 8/25/15 7:08 PM, Ethan Duni wrote:
>if you can, with optimal coefficients designed with the tool of your
choice, so i am ignoring any images between B and Nyquist-B, upsample
by 512x and then do linear interpolation between adjacent samples for
continuous-time interpolation, you can show that it's something like
12 dB S/N per octave of oversampling plus another 12 dB. that's 120
dB. that's how i got to 512x.
Wait, where does the extra 12dB come from? Seems like it should just
be 12dB per octave of oversampling. What am I missing?
okay, this is painful. in our 2-decade old paper, Duane and i did this
theoretical approximation analysis for drop-sample interpolation, and i
did it myself for linear, but we did not put in the math for linear
interpolation in the paper.
so, to satisfy Nyquist (or Shannon or Whittaker or the Russian guy) the
sample rate Fs must exceed 2B which is twice the bandwidth. the
oversampling ratio is defined to be Fs/(2B). and in octaves it is
log2(Fs/(2B)). all frequencies in your baseband satisfy |f|<B and if
it's highly oversampled, 2B << Fs.
now, i'm gonna assume that Fs is so much (like 512x) greater than 2B
that i will assume the attenuation due to the sinc^2 for |f|<B is
negligible. i will assume that the spectrum between -B and +B is
uniformly flat (that's not quite worst case, but it's worser case than
what music, in the bottom 5 or 6 octaves, is). so given a unit height
on that uniform power spectrum, the energy will be 2B.
so, the k-th image (where k is not 0) will have a zero of the sinc^2
function going right through the heart of it. that's what's gonna kill
the son-of-a-bitch. the energy of that image is:
       k*Fs+B
    integral{ (sinc(f/Fs))^4 df }
       k*Fs-B
since it's a power spectrum, it's sinc^4 for linear and sinc^2 for
drop-sample interpolation.
changing the variable of integration
       +B
    integral{ (sinc((k*Fs+f)/Fs))^4 df }
       -B

       +B
 =  integral{ (sinc(k+f/Fs))^4 df }
       -B
    sinc(k+f/Fs)  =  sin(pi*(k+f/Fs))/(pi*(k+f/Fs))

                  =  (-1)^k * sin(pi*f/Fs)/(pi*(k+f/Fs))

                  =approx  (-1)^k * (pi*f/Fs)/(pi*k)

                  =  (-1)^k * f/(k*Fs)

since |f| < B << Fs
raising to the 4th power gets rid of the toggling polarity. so now it's
                    +B
    1/(k*Fs)^4 * integral{ f^4 df }  =  (2/5)/(k*Fs)^4 * B^5
                    -B
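
a quick numerical sanity check of that approximation, sketched in Python. this is not from the paper; the function names and the B = 1, Fs = 1024 values (i.e. 512x oversampling) are made up for illustration. it integrates sinc^4 across the k-th image by brute force and compares the result against (2/5)*B^5/(k*Fs)^4:

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def image_energy_numeric(k, B, Fs, steps=20000):
    """Midpoint-rule integral of sinc(f/Fs)^4 over k*Fs-B .. k*Fs+B."""
    df = 2.0 * B / steps
    total = 0.0
    for i in range(steps):
        f = k * Fs - B + (i + 0.5) * df
        total += sinc(f / Fs) ** 4
    return total * df

def image_energy_approx(k, B, Fs):
    """Closed-form approximation (2/5) * B^5 / (k*Fs)^4, valid for B << Fs."""
    return 0.4 * B ** 5 / (k * Fs) ** 4

if __name__ == "__main__":
    B, Fs = 1.0, 1024.0          # assumed values: Fs/(2B) = 512
    for k in (1, 2, 3):
        ratio = image_energy_numeric(k, B, Fs) / image_energy_approx(k, B, Fs)
        print(k, ratio)          # ratio should sit very close to 1
```

the two should agree closely as long as B << Fs; the agreement gets worse as the oversampling ratio comes down.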
now you have to sum up the energies of all of the bad images (we are
assuming that *all* of those images, *after* they are beaten down, will
somehow fall into the baseband during resampling and their energies will
team up). there are both negative and positive frequency images to add
up. (but we don't add up the energy from the image at the baseband,
that's the "good" image.)
        +inf                                               +inf
    2 * SUM{ (2/5)/(k*Fs)^4 * B^5 }  =  B*(4/5)*(B/Fs)^4 * SUM{ 1/k^4 }
        k=1                                                k=1
the summation on the right is (pi^4)/90
so the energy of all of the nasty images (after being beaten down due to
the application of the sinc^2 that comes from linear interpolation
between the "subsamples") is
B*(4/5)*(B/Fs)^4 * (pi^4)/90
and the S/N ratio is 2B divided by that.
( (2/450) * (2B/Fs)^4 * (pi/2)^4 )^-1
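
the summation and the resulting energy ratio can be checked with a few lines of Python. again a sketch with hypothetical function names: it sums 1/k^4 directly, compares with (pi^4)/90, and confirms that 2B divided by the total image energy matches the closed form above:

```python
import math

def zeta4_partial(terms=10000):
    """Partial sum of 1/k^4 for k = 1..terms (converges to pi^4/90)."""
    return sum(1.0 / k ** 4 for k in range(1, terms + 1))

def total_image_energy(B, Fs):
    """Energy of all positive and negative images: B*(4/5)*(B/Fs)^4 * pi^4/90."""
    return B * (4.0 / 5.0) * (B / Fs) ** 4 * math.pi ** 4 / 90.0

def snr_energy_ratio(B, Fs):
    """Baseband energy 2B over total image energy (a plain ratio, not dB)."""
    return 2.0 * B / total_image_energy(B, Fs)

def snr_closed_form(B, Fs):
    """The same ratio written as ((2/450) * (2B/Fs)^4 * (pi/2)^4)^-1."""
    return 1.0 / ((2.0 / 450.0) * (2.0 * B / Fs) ** 4 * (math.pi / 2.0) ** 4)

if __name__ == "__main__":
    print(zeta4_partial(), math.pi ** 4 / 90.0)
    print(snr_energy_ratio(1.0, 1024.0), snr_closed_form(1.0, 1024.0))
```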
in dB we use 3.01*log2() because this is an *energy* ratio, not a
voltage ratio.
    -3.01*log2( (2/450) * (2B/Fs)^4 * (pi/2)^4 )

      =  3.01*log2(225) + 12.04*log2(2/pi) + 12.04*log2( Fs/(2B) )

      =  15.7 dB  +  (12.04 dB) * log2( Fs/(2B) )
so, the constant seems to come out a little more than 12 dB. i think
Duane did a better empirical analysis and he got it slightly less.
but, using linear interpolation between subsamples, you should get about
12 dB of S/N for every octave of oversampling plus 15 dB more.
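
here is that rule of thumb as a small Python function, under the same assumptions as the analysis above (flat baseband spectrum, in-between images already removed, all remaining image energy folding back in). the function name is hypothetical. evaluated this way the constant is roughly 15.7 dB and each doubling of Fs/(2B) adds about 12.04 dB:

```python
import math

def snr_db_linear_interp(oversampling_ratio):
    """Approximate S/N in dB for linear interpolation between subsamples.

    oversampling_ratio is Fs/(2B). The energy ratio is
    225 * (2/pi)^4 * (Fs/(2B))^4, converted with 10*log10().
    """
    offset_db = 10.0 * math.log10(225.0) + 40.0 * math.log10(2.0 / math.pi)
    return offset_db + 40.0 * math.log10(oversampling_ratio)

if __name__ == "__main__":
    for ratio in (1, 32, 64, 512):
        octaves = math.log2(ratio)
        print(ratio, octaves, round(snr_db_linear_interp(ratio), 2))
```

at 512x (9 octaves) this lands near 124 dB, so the 120 dB target is met with a few dB to spare.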
>but the difference in price in memory only, *not* in computational burden.
Well, you don't get the full cost in computational burden since you
can skip computing most of the upsamples.
exactly, and it's the same whether you upsample by 32x or 512x. but
upsampling by 512x will cost 16 times the memory to store coefficients.
But the complexity still goes up with increasing oversampling factor
since the interpolation filter needs to get longer and longer, no?
no. that deals with a different issue, in my opinion. the oversampling
ratio determines the number of discrete (and uniformly spaced)
fractional delays. there is one FIR filter for each fractional delay.
the number of coefs in the FIR filter is a performance issue regarding
how well you're gonna beat down them images in between baseband and the
next *oversampled* image. in the analysis above, i am assuming all of
those in-between images are beaten down to zero. it's a crude analysis
and i just wanted to see what the linear interpolation (on the upsampled
signal) does for us.
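
to make the structure concrete, here is a minimal Python sketch of a polyphase table with linear interpolation between adjacent phases. the windowed-sinc design is only a placeholder i am assuming for illustration (the discussion above assumes optimally designed coefficients), and all names are hypothetical. the point is that the table size grows with the number of phases, while the work per output sample is two FIR dot products no matter how many phases are stored:

```python
import math

def design_polyphase_table(num_phases, taps):
    """One FIR per fractional delay, Hann-windowed sinc (placeholder design).

    Row p approximates a delay of (taps//2 - 1) + p/num_phases samples.
    One extra row (p = num_phases) is stored so row p+1 always exists.
    taps is assumed to be even.
    """
    center = taps // 2 - 1
    table = []
    for p in range(num_phases + 1):
        frac = p / num_phases
        row = []
        for n in range(taps):
            t = n - center - frac                  # offset from the delayed peak
            s = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
            w = 0.5 + 0.5 * math.cos(2.0 * math.pi * t / taps)   # Hann window
            row.append(s * w)
        table.append(row)
    return table

def interpolate(x, index, delay_frac, table):
    """Sample of x delayed by (taps//2 - 1) + delay_frac, delay_frac in [0, 1).

    Runs the two nearest stored phases and blends their outputs linearly.
    Cost: 2 * taps multiply-adds, independent of the number of phases.
    """
    num_phases = len(table) - 1
    taps = len(table[0])
    pos = delay_frac * num_phases
    p = int(pos)
    a = pos - p                                    # weight toward phase p+1
    y0 = sum(table[p][n] * x[index - n] for n in range(taps))
    y1 = sum(table[p + 1][n] * x[index - n] for n in range(taps))
    return (1.0 - a) * y0 + a * y1

if __name__ == "__main__":
    table = design_polyphase_table(num_phases=32, taps=16)
    x = [math.sin(0.1 * math.pi * n) for n in range(64)]
    print(interpolate(x, 40, 0.3, table), math.sin(0.1 * math.pi * (40 - 7 - 0.3)))
```

going from 32 to 512 phases changes only `len(table)`; the number of taps per phase is a separate knob that sets how well the in-between images are suppressed.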
So there is some balancing of computational burden involved. I can see
how frequent coefficient calculations could swamp that for high order
and/or exotic interpolators, along with the increased upsampler
complexity since you need to compute more of the polyphase components
to drive it. But it's not obvious to me on its face exactly where the
minimum lies...
Of course that all goes out the window if you already need
oversampling for other system concerns anyway, or are using some very
cheap hardware resampler or whatever. Or if you're happy to throw an
IIR upsampler at it. And in many cases you'll already have access to
some nice optimized resampling software, whereas polynomial
interpolators would need to be invented from scratch, so there's a
practical man-hours concern as well. Likewise, it depends on how
frequently the fractional delay is going to change. Obviously there
are good reasons why analyses that include both the interpolator and
the resampler are somewhat rare, there are a lot of moving parts and
potential trade-offs.
>some apps where you might care less about inharmonic energy from
images folding over (a.k.a. "aliasing"), you might not need to go that
high of whatever-x.
I think this is the point where we need to fork into whether we are
doing just a fractional delay, or if there is also difference between
the output and input sampling rates.
those were the two classes of apps that i mentioned. in one class (the
resampling class, of which SRC and pitch shifting are apps), if you're
doing linear interpolation between the fractional delays, i think it is
sufficient to multiply the input spectrum by (sinc(f/Fs))^2 (where |f| < B
<< Fs and Fs is the oversampled rate, so the sinc^2 might not do much to
your baseband image, but it will beat down the other images real good).
If there is a sampling rate change, then we are worried about alias
suppression and need to squash the images as you describe. But if it's
just fractional delay, where we end up at the same sampling rate as
the input, then the images all land back where they started and there
is no signal aliasing.
agreed! so then when you are delayed by 1/2 sample, the filter is
H(z)=(1/2)(1 + z^-1). and that filter goes to -inf dB at Fs/2 (the
oversampled Fs). when it's a slowly changing or unchanging fractional
delay, there is no issue about energy from aliasing or the foldover of
images. it's just an LTI system and the issue is what LTI filter is it?
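
for the half-sample case the LTI filter is easy to inspect. a small Python sketch (hypothetical function names) that evaluates H(z) = (1/2)(1 + z^-1) on the unit circle and checks that it factors into an exact half-sample delay times a gain of cos(pi*f/Fs), with the null at Fs/2:

```python
import cmath
import math

def half_sample_response(f_over_fs):
    """H(e^{jw}) = (1/2)(1 + e^{-jw}) evaluated at w = 2*pi*f/Fs."""
    w = 2.0 * math.pi * f_over_fs
    return 0.5 * (1.0 + cmath.exp(-1j * w))

def magnitude_and_delay(f_over_fs):
    """Return (|H|, phase delay in samples) for 0 < f/Fs < 0.5."""
    w = 2.0 * math.pi * f_over_fs
    h = half_sample_response(f_over_fs)
    return abs(h), -cmath.phase(h) / w

if __name__ == "__main__":
    for f in (0.001, 0.1, 0.25, 0.49):
        mag, delay = magnitude_and_delay(f)
        print(f, mag, math.cos(math.pi * f), delay)
    print(abs(half_sample_response(0.5)))   # essentially zero at Fs/2
```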
Instead, we only get aliasing of the polynomial interpolator's
spectrum. I.e., we just end up with a linear filter that has an
imperfect fractional-delay response (with the imperfection depending
on fractional delay - worst at 1/2 sample - and also on frequency -
worst at Nyquist).
It's not obvious to me how to create a spec on the fractional delay
filter response that is a fair comparison to the 120dB (or is it 108dB
as I mentioned above???)
or maybe 123 dB.
spec on aliasing suppression for the rate-change case. It's kind of
apples and oranges. The analysis of how much error you get in the
final response as a function of oversampling and polynomial order
requires more complicated math/numerics (which I'll try to later do if
I get some spare time), but for reference I would note that a
half-sample delay achieved with (perfect) 512x oversampling and linear
interpolation ends up with a worst-case (in-band) ripple of around
0.00005dB. That's a pretty tight filter spec. But note that if we
consider that difference to be an "error signal," it turns out to be
at around -106dB, and not -120dB (or -108dB if that is the correct
number). This is because those signal images added up coherently, so
suppressing them by XdB doesn't guarantee an XdB "noise floor" in the
final result. On the other hand, the response at lower frequencies is
much tighter and the "noise floor" is actually much *lower* than the
margin that the worst images were suppressed by (since in that region
the coherent addition is working in our favor).
Again, that's not really an apples to apples comparison, but the point
is that the coherent imaging in the case of fractional delay violates
the assumptions of the straightforward aliasing-suppression analysis.
yes. i have been saying that (or something consistent with that) all along.
but, with the same fractional-delay interpolator, you can accomplish
either task, but the performance of the interpolator is evaluated
differently.
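
the half-sample numbers quoted above can be reproduced with a short Python sketch. the assumptions here are mine: the delay is half of an *oversampled* sample period, made by averaging two adjacent subsamples, and the worst in-band frequency is the band edge f = B. the function name is hypothetical. the filter is then an exact half-sample delay with gain cos(pi*B/Fs), so the ripple is the dB value of that gain and the "error signal" relative to an ideal delay has magnitude 1 - cos(pi*B/Fs):

```python
import math

def half_sample_band_edge(oversampling_ratio):
    """Ripple and error level (both in dB) at f = B for a half-sample delay.

    oversampling_ratio is Fs/(2B), so pi*B/Fs = pi / (2 * oversampling_ratio).
    """
    half_w = math.pi / (2.0 * oversampling_ratio)
    gain = math.cos(half_w)                        # magnitude response at f = B
    ripple_db = -20.0 * math.log10(gain)           # droop below 0 dB
    err = 2.0 * math.sin(half_w / 2.0) ** 2        # 1 - cos(half_w), computed stably
    error_db = 20.0 * math.log10(err)
    return ripple_db, error_db

if __name__ == "__main__":
    ripple_db, error_db = half_sample_band_edge(512)
    print(ripple_db, error_db)
```

at 512x this gives a ripple of a few hundred-thousandths of a dB and an error level between -106 and -107 dB, in line with the figures in the quoted text. that is a different yardstick from the 12 dB per octave image-energy figure, which is the point being made.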
It ends up being a question of how much oversampling is required to
operate in a region of the interpolator response that is sufficiently
close to the ideal filter, rather than a question of alias
suppression. But I'm not sure how to systematically compare the two
cases, again because it's not clear how to compare signal-to-alias
ratio against an alias-free signal with an imperfect fractional-delay
response. All I would add is that the general rate-change case has to
contend with both aliasing suppression and imperfect fractional delay
response, so I would expect a fractional-delay-only system to have
looser requirements since the signal aliasing issue has been removed.
i think we're on the same page. ain't we?
--
r b-j r...@audioimagination.com
"Imagination is more important than knowledge."