Re: [music-dsp] Cheap spectral centroid recipe

Ethan Duni Thu, 18 Feb 2016 13:55:47 -0800

I was kind of hoping someone would chime in with a reference to a
publication of some tests comparing different spectral centroid methods,
showing how well they match some subjective ratings of "brightness" or
whatever, for various signal classes. This doesn't seem particularly
difficult, although it requires pinning down exactly what we want these
things to do. And, yes, subjective testing, statistics, etc. I've noticed
in my (cursory) searches that some people use amplitude spectra and others
use power spectra, but the only thing I've found in the way of comparison
tests was to do with whether it gets normalized by fundamental frequency or
not.


I'm not a partisan for any particular definition, just want to understand
how the various statistics stack up.

>We're mathematicians, not neuroscientists, and that discipline comes
>with a powerful confirmation bias for simple, "elegant" solutions.

Who is "we"? My experience is that audio DSP is a rather multi-disciplinary
field. And music DSP, even more so.

At any rate, I'm certainly happy to turn the crank on the straightforward
definitions as well. Here's my understanding of the situation so far, for
spectral centroid defined in terms of the normalized power spectrum:

Let's start in continuous time, with some real signal x(t) with FT X(w).
Recall the differentiation property, d/dt x(t) <=> jwX(w). Next, let's use
Parseval's theorem (ignoring the normalization constants because they'll
cancel out later):

integral( |x(t)|^2 dt) = integral( |X(w)|^2 dw),

and likewise, since |jw X(w)|^2 = |w|^2 |X(w)|^2,

integral( |d/dt x(t)|^2 dt) = integral( |w|^2 |X(w)|^2 dw).

Thus, the ratio of the time-domain integrals gives:

integral( |d/dt x(t)|^2 dt) / integral( |x(t)|^2 dt)
    = integral( |w|^2 |X(w)|^2 dw) / integral( |X(w)|^2 dw)

I.e., if we run a differentiator, then compute the ratio of the power in
that to the power in the original signal, the result is the second moment
of the (normalized) power spectrum. This corresponds to the system Evan
proposed in the OP, without the later square root modification. So that's
something, but presumably we want to get the *first* moment of the
normalized power spectrum.
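
For anyone who wants to check the identity numerically, here is a minimal
Python sketch. It is only an illustration under stated assumptions, not
Evan's code: the function name, the sample rate, and the two test tones are
all made up, and a first difference scaled by the sample rate stands in for
the continuous-time differentiator (reasonable only because the tones sit far
below the sample rate).

```python
import math

def second_moment_time_domain(x, fs):
    """Power of the (approximate) derivative over power of the signal, in (rad/s)^2."""
    # First difference times fs approximates d/dt when heavily oversampled.
    dx = [(x[n] - x[n - 1]) * fs for n in range(1, len(x))]
    num = sum(v * v for v in dx)
    den = sum(v * v for v in x[1:])
    return num / den

# Hypothetical test signal: two sines, a whole number of cycles of each.
fs = 1_000_000.0          # assumed sample rate in Hz
f1, f2 = 100.0, 300.0     # assumed tone frequencies in Hz
a1, a2 = 1.0, 0.5         # assumed tone amplitudes
n_samples = 100_000       # 0.1 s of signal
x = [a1 * math.sin(2 * math.pi * f1 * n / fs)
     + a2 * math.sin(2 * math.pi * f2 * n / fs)
     for n in range(n_samples)]

# Second moment of the normalized power spectrum, computed directly
# from the known line spectrum (power of each tone is a^2 / 2).
w1, w2 = 2 * math.pi * f1, 2 * math.pi * f2
expected = (a1**2 * w1**2 + a2**2 * w2**2) / (a1**2 + a2**2)

estimate = second_moment_time_domain(x, fs)
print(estimate, expected)
```

The two printed numbers should agree closely; the small residual comes from
the first-difference approximation and the one missing boundary sample.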

One option is to replace the differentiator with an inverse pinking filter,
as rbj suggested. Are there any good references on design of inverse
pinking filters?

Another option is to stick some square roots on these quantities, as Evan
suggested in a subsequent post. But the square root of the second moment is
the RMS frequency, and by Jensen's inequality it is never smaller than the
first moment, so we get an over-estimate of the first moment of the
normalized power spectrum (with equality only when all of the power sits at
a single frequency). How big the overestimation is depends on the shape of
the spectrum, but this may well be quite usable regardless and should be
substantially cheaper than the inverse pinking filter approach.
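
To get a feel for the size of that overestimate, here is a small Python
sketch on a made-up line spectrum. The helper name and the three
frequency/power pairs are hypothetical, chosen only to show the gap between
the first moment and the square root of the second moment.

```python
import math

def power_moments(freqs, powers):
    """First and second moments of a power spectrum normalized to sum to one."""
    total = sum(powers)
    first = sum(f * p for f, p in zip(freqs, powers)) / total
    second = sum(f * f * p for f, p in zip(freqs, powers)) / total
    return first, second

# Hypothetical line spectrum: frequencies in Hz, powers in arbitrary units.
freqs = [100.0, 200.0, 1000.0]
powers = [1.0, 0.5, 0.1]

centroid, second = power_moments(freqs, powers)
rms_freq = math.sqrt(second)      # what the sqrt-of-power-ratio estimator tracks
print(centroid, rms_freq, rms_freq / centroid)

# A single tone has no spread, so the two statistics coincide.
single_first, single_second = power_moments([440.0], [1.0])
```

For this particular spectrum the RMS frequency lands well above the
centroid, because the weak 1000 Hz component is weighted by its squared
frequency. A narrower spectrum would show a smaller gap.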

Next let's consider how this would work in discrete time. Naively, we might
simply replace the differentiator with a first difference. Recall the
relevant DTFT property: x[n] - x[n-1] <=> (1-e^(-jw))X(w). This gets us the
graph and explanation that Evan provided in the OP: the magnitude response
is |1-e^(-jw)| = 2 sin(w/2), which is approximately w for sufficiently small
values of w, so we can simulate the continuous-time case via oversampling.
We could also add a high frequency compensation filter, or again, just
replace the difference/sqrt() approach with an inverse pinking filter
designed according to whatever criteria.
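
A quick way to see how far the first difference strays from an ideal
differentiator is to tabulate its gain against w. The sketch below is just
an illustration; the function names and the handful of test frequencies are
my own choices.

```python
import cmath
import math

def first_difference_gain(w):
    """Magnitude response of y[n] = x[n] - x[n-1] at w radians/sample."""
    return abs(1 - cmath.exp(-1j * w))

def relative_shortfall(w):
    """How far the gain falls below the ideal differentiator gain w, as a fraction of w."""
    return (w - first_difference_gain(w)) / w

# Assumed test points, as fractions of the Nyquist frequency (w = pi).
for frac in (0.01, 0.1, 0.5, 1.0):
    w = frac * math.pi
    print(frac, first_difference_gain(w), w, relative_shortfall(w))
```

Near DC the shortfall is negligible, while at Nyquist the gain is 2 rather
than pi, which is the motivation for oversampling or for a high frequency
compensation filter.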

Are we all on the same page with this analysis so far?

I notice that various sources define spectral centroid in terms of
amplitude spectrum, rather than power spectrum. This makes the analysis
more difficult, since we can't rely on Parseval's theorem directly. But
this is part of why I asked what the consensus is on definitions - is it
worth analyzing, or is it just something people do when using FFT based
methods, without much further thought on the alternatives?
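
To make the difference between the two definitions concrete, here is a
small Python sketch that computes both centroids for the same made-up line
spectrum. The helper name and the frequency/amplitude values are
hypothetical; nothing here says which definition is the "right" one.

```python
def weighted_centroid(freqs, weights):
    """Mean frequency under non-negative weights normalized to sum to one."""
    total = sum(weights)
    return sum(f * w for f, w in zip(freqs, weights)) / total

# Hypothetical line spectrum: frequencies in Hz, linear amplitudes.
freqs = [100.0, 200.0, 1000.0]
amps = [1.0, 0.5, 0.25]

amp_centroid = weighted_centroid(freqs, amps)                     # amplitude-weighted
pow_centroid = weighted_centroid(freqs, [a * a for a in amps])    # power-weighted
print(amp_centroid, pow_centroid)
```

For this example the amplitude-weighted centroid comes out noticeably
higher, since squaring the amplitudes shrinks the contribution of the weak
1000 Hz component.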

E



On Thu, Feb 18, 2016 at 10:55 AM, Evan Balster <e...@imitone.com> wrote:

> *To use log magnitude you'd first have to normalize it to look like a
> probability density (non-negative, sums to one). Meaning you add an offset
> so that the lowest value is zero, and then normalize.  Obviously that puts
> restrictions on the class of signals it can handle - there can't be any
> zeros on the unit circle (in practice we'd just apply a minimum threshold
> at, say, -60dB or whatever) - and involves other complications (I'm not
> sure there's a sensible time-domain interpretation).*
>
> I've solved a lot of problems in the past by "massaging" numbers into
> ranges or formats where they suit the problem I'm trying to solve.  That
> approach -- adding mathematical complexity according to convenience and
> intuition rather than specific theoretical justification -- is
> unscientific, and time and time again it has led me to false leads and
> discouraging results when solving low-level problems.  (That said, the
> failures that result from "fumbling in the dark" can sometimes lead to
> groundbreaking discoveries.)
>
> Research into perception tells us that most phenomena are perceived
> proportional to the logarithm of their intensity.  It tells us further that
> auditory stimuli are received in a form *resembling* the frequency
> domain.  We're mathematicians, not neuroscientists, and that discipline
> comes with a powerful confirmation bias for simple, "elegant" solutions.
> But the cochlea is not cleanly modeled
> <http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/pitch/pitch.html>
> by a Fourier transform, and as to what happens beyond, Minsky said it best:
> the simplest explanation is that there is no simple explanation.  In the
> absence of hard research, we can't reasonably expect to add logarithm
> flavoring to such a simple formula and expect it to converge with the
> result of billions of years of evolution.
>
> Anyway, that's why -- in spite of my extensive research in pitch tracking
> -- I don't touch perception modeling with a ten-foot pole.  It's a soft
> science and it's all too easy to develop the misconception that you know
> what you're doing.  Because it will be a long time before the perceptual
> properties of any brightness metric can be clearly understood, I'll stick
> to formulas whose mathematical properties are transparent -- these lend
> themselves infinitely better to being small pieces of larger systems.
>
> – Evan Balster
> creator of imitone <http://imitone.com>
>
> On Thu, Feb 18, 2016 at 11:24 AM, Ethan Duni <ethan.d...@gmail.com> wrote:
>
>> >Weighting a mean with log-magnitude can quickly lead to nonsense.
>>
>> To use log magnitude you'd first have to normalize it to look like a
>> probability density (non-negative, sums to one). Meaning you add an offset
>> so that the lowest value is zero, and then normalize. Obviously that
>> puts restrictions on the class of signals it can handle - there can't be
>> any zeros on the unit circle (in practice we'd just apply a minimum
>> threshold at, say, -60dB or whatever) - and involves other complications
>> (I'm not sure there's a sensible time-domain interpretation).
>>
>> >I apply Occam's razor when making decisions about what metrics
>> >correspond most closely to nature
>>
>> What is the natural phenomenon that we're trying to model here?
>>
>> > log-magnitude is rarely sensible outside of perception modeling
>>
>> But isn't the goal here to estimate the "brightness" of a signal?
>> Perceptual modelling is exactly why I bring log spectra up.
>>
>> E
>>
>>
>>
>> On Thu, Feb 18, 2016 at 7:42 AM, Evan Balster <e...@imitone.com> wrote:
>>
>>> Weighting a mean with log-magnitude can quickly lead to nonsense.
>>> Trivial examples:
>>>
>>>    - 0dB sine at 100Hz, 6dB sine at 200Hz --> log centroid is 200Hz
>>>    - -6dB sine at 100Hz, 12dB sine at 200Hz --> log centroid is 300Hz
>>>    (!)
>>>
>>> Sanfillipo's adaptive median finding technique is still applicable, but
>>> will produce the same result as a power or magnitude version.
>>>
>>> I apply Occam's razor when making decisions about what metrics
>>> correspond most closely to nature.  I choose the formula which is
>>> mathematically simplest while utilizing operations that make sense for the
>>> dimensionality of the operands and do not induce undue discontinuities.
>>> Power is simpler to compute than magnitude, log-magnitude is rarely
>>> sensible outside of perception modeling, and (unlike zero-crossing
>>> techniques) a small change in the signal will always produce a
>>> proportionally small change in the metrics.
>>>
>>> At next opportunity I should post up some code describing how to compute
>>> higher moments with the differential brightness estimator.
>>>
>>> – Evan Balster
>>> creator of imitone <http://imitone.com>
>>>
>>> On Thu, Feb 18, 2016 at 1:00 AM, Ethan Duni <ethan.d...@gmail.com>
>>> wrote:
>>>
>>>> >normalized to fundamental frequency or not
>>>> >normalized (so that no pitch detector is needed)?
>>>>
>>>> Yeah tonal signals open up a whole other can of worms. I'd like to
>>>> understand the broadband case first, with relatively simple spectral
>>>> statistics that correspond to the clever time-domain estimators discussed
>>>> so far in the thread.
>>>>
>>>> The ideas for time-domain approaches got me thinking about what the
>>>> optimal time-domain approach would look like. But of course it depends on
>>>> what definition of spectral centroid you use. For the mean of the power
>>>> spectrum it seems relatively straightforward to get some tractable
>>>> expressions - I guess this is the inspiration for the one based on an
>>>> approximate differentiator. But I suspect that mean of the log power
>>>> spectrum is more perceptually meaningful.
>>>>
>>>> E
>>>>
>>>> On Wed, Feb 17, 2016 at 8:34 PM, robert bristow-johnson <
>>>> r...@audioimagination.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> ---------------------------- Original Message ----------------------------
>>>>> Subject: Re: [music-dsp] Cheap spectral centroid recipe
>>>>> From: "Ethan Duni" <ethan.d...@gmail.com>
>>>>> Date: Wed, February 17, 2016 11:21 pm
>>>>> To: "A discussion list for music-related DSP" <music-dsp@music.columbia.edu>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> >>It's essentially computing a frequency median,
>>>>> >>rather than a frequency mean as is the case
>>>>> >>with the derivative-power technique described
>>>>> >> in my original approach.
>>>>> >
>>>>> > So I'm wondering, is there any consensus on what is the best measure of
>>>>> > central tendency for a music signal spectrum? There's the median vs the
>>>>> > mean (vs trimmed means, mode, etc). But what is the right domain in the
>>>>> > first place: magnitude spectrum, power spectrum, log power spectrum or ???
>>>>>
>>>>> normalized to fundamental frequency or not normalized (so that no
>>>>> pitch detector is needed)?  should identical waveforms at higher pitches
>>>>> have the same centroid parameter or a higher centroid?
>>>>>
>>>>> spectral "brightness" is a multi-dimensional perceptual parameter.
>>>>> you can have two tones with the same spectral centroid (however
>>>>> consistently you measure it) that sound very different if the "second
>>>>> moment" or "variance" is much different.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> r b-j                   r...@audioimagination.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> "Imagination is more important than knowledge."
>>>>>
>>>>> _______________________________________________
>>>>> dupswapdrop: music-dsp mailing list
>>>>> music-dsp@music.columbia.edu
>>>>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>>>>