On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
> What do the 4096 resulting complex numbers represent?
Bin 0 is the energy at 0 Hz (DC).
Bins 1 to 2047 are the energy at (bin * samplingRate / 4096) Hz.
Bin 2048 is the energy at the Nyquist frequency (samplingRate / 2).
Bins 2049 to 4095 are the energy for negative frequencies; for a
real input they are the complex conjugates of bins 2047 down to 1.
float fftBinToFrequency(float fftBin, int fftSize, float samplingRate)
{
    return (samplingRate * fftBin) / fftSize;
}
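Going the other way is handy for knowing where to look for a given
tone. A minimal sketch, where `frequencyToFftBin` is a name I made up
for illustration and the 48000 Hz rate in the note below is only an
assumption (use whatever your audio really has):

```d
// Hypothetical inverse of fftBinToFrequency: the (fractional) bin
// index at which a given frequency lands.
float frequencyToFftBin(float frequency, int fftSize, float samplingRate)
{
    return (frequency * fftSize) / samplingRate;
}
```

With a 4096-point FFT at an assumed 48000 Hz, bins are about 11.7 Hz
apart, so 1336 Hz lands almost exactly on bin 114.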
An FFT that operates only on real numbers (aka "RealFFT") will
thus give you 2048 + 1 bins instead of 4096, and is roughly twice as
fast.
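To make the "N/2 + 1 bins" point concrete, here is a sketch using
Phobos' general complex `std.numeric.fft` and simply dropping the
redundant half. `halfSpectrum` is a hypothetical helper; it does not
give you the speed benefit of a true real FFT, it only shows which
bins carry information.

```d
import std.complex : Complex;
import std.numeric : fft;

// Keep only the non-redundant bins (0 .. N/2 inclusive) of a real signal.
// The input length must be a power of two, as std.numeric.fft requires.
Complex!double[] halfSpectrum(double[] realSamples)
{
    auto full = fft(realSamples);                 // N complex bins
    return full[0 .. realSamples.length / 2 + 1]; // N/2 + 1 bins
}
```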
Each of these bins carries both amplitude (how much energy the
signal has near that frequency) and phase information, relative to
the START of your time-domain window. If you need energy and phase
at the center of the window, you will need "zero-phase" windowing,
which is basically an offset (see Dplug's FFTAnalyzer).
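Reading amplitude and phase out of a bin could look like the sketch
below. `BinInfo` and `describeBin` are hypothetical names, and the
2/N scaling assumes a rectangular window and a real sinusoid sitting
exactly on the bin; other windows need their own gain correction.

```d
import std.complex : Complex, abs, arg;

struct BinInfo
{
    double amplitude; // estimated peak amplitude of the sinusoid
    double phase;     // radians, relative to the start of the window
}

// Assumes a rectangular window, where a real sinusoid of peak
// amplitude A centered on bin k gives |X[k]| = A * N / 2.
BinInfo describeBin(Complex!double bin, size_t fftSize)
{
    return BinInfo(2.0 * abs(bin) / fftSize, arg(bin));
}
```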
Each bin of a time-frequency transform basically answers the
question: "How much does this signal look like a sinusoid that
would make k rotations inside the time window, and with how much
offset?"
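That question can be written down directly as a single-bin DFT: a
correlation of the signal with a complex sinusoid that makes k
rotations over the window. This is only an illustrative sketch
(`dftBin` is a made-up name); an FFT computes all bins at once far
more efficiently.

```d
import std.complex : Complex;
import std.math : PI, cos, sin;

// Correlate the signal with a complex sinusoid making k rotations
// over the window. The magnitude of the result says "how much",
// its angle says "with how much offset".
Complex!double dftBin(const(double)[] signal, size_t k)
{
    auto sum = Complex!double(0.0, 0.0);
    immutable n = signal.length;
    foreach (i, sample; signal)
    {
        immutable double angle = -2.0 * PI * k * i / n;
        sum += sample * Complex!double(cos(angle), sin(angle));
    }
    return sum;
}
```

Feeding it a sine that makes exactly 3 cycles in the window gives a
large value for k = 3 and (numerically) nothing for the other bins.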
Now, it seems you want to do peak picking in a spectrum.
The best method I know is called the "2nd Quinn estimator" from
the paper:
"Estimation of frequency, amplitude and phase from the DFT of a
time series" (1997)
Since your sound isn't complicated, it's going to be _way_ simpler
to find and classify those peaks than to do real pitch detection
that would work on voice (no need for the best estimator either).
You should resist the temptation to interpolate in the bin
domain. It's better to oversample the FFT results with generous
"zero-padding" instead, and then use an estimation method like the
one above.
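Zero-padding just means running a longer FFT on the same samples
followed by zeros. A rough sketch, assuming the frame has already
been windowed and that `paddedSize` is a power of two (which
`std.numeric.fft` requires); `zeroPaddedMagnitudes` is a hypothetical
helper name:

```d
import std.complex : abs;
import std.numeric : fft;

// Pad `frame` with zeros up to paddedSize and return the magnitudes
// of bins 0 .. paddedSize/2. Bin k then corresponds to
// k * samplingRate / paddedSize Hz, a finer grid than without padding.
double[] zeroPaddedMagnitudes(const(double)[] frame, size_t paddedSize)
{
    assert(paddedSize >= frame.length);
    auto padded = new double[paddedSize];
    padded[] = 0.0;                      // new double[] starts as NaN in D
    padded[0 .. frame.length] = frame[];

    auto spectrum = fft(padded);
    auto mags = new double[paddedSize / 2 + 1];
    foreach (k, ref m; mags)
        m = abs(spectrum[k]);
    return mags;
}
```

Padding adds no new information, it only samples the same underlying
spectrum more densely, which is what makes the peaks easier to locate.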
> How should I use the result to check whether the 1209Hz,
> 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the
> sound?
1. Detect all peaks larger in magnitude than their 2 or 4
neighbours,
2. keep those sufficiently above the mean energy of the spectrum,
3. then optionally refine their amplitude and frequency with an
estimator like the one above,
4. from which it will be very clear whether 1336 Hz or something
nearby was present.
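Steps 1, 2 and 4 could be sketched roughly as below (step 3, the
refinement, is left out). `findPeaks` and `tonePresent` are
hypothetical names, and `ratio` and `toleranceHz` are tuning
parameters I introduced; you would have to pick values that suit
your recordings.

```d
// Steps 1 and 2: bins that beat `neighbours` bins on each side and
// exceed `ratio` times the mean magnitude of the spectrum.
size_t[] findPeaks(const(double)[] mags, size_t neighbours, double ratio)
{
    double mean = 0.0;
    foreach (m; mags)
        mean += m;
    mean /= mags.length;

    size_t[] peaks;
    for (size_t k = neighbours; k + neighbours < mags.length; ++k)
    {
        bool isPeak = mags[k] > ratio * mean;
        foreach (d; 1 .. neighbours + 1)
            isPeak = isPeak && mags[k] > mags[k - d] && mags[k] > mags[k + d];
        if (isPeak)
            peaks ~= k;
    }
    return peaks;
}

// Step 4: is there a peak within toleranceHz of the wanted tone?
bool tonePresent(const(size_t)[] peaks, double targetHz, double toleranceHz,
                 size_t fftSize, double samplingRate)
{
    foreach (k; peaks)
    {
        immutable hz = k * samplingRate / fftSize;
        if (hz > targetHz - toleranceHz && hz < targetHz + toleranceHz)
            return true;
    }
    return false;
}
```

With `neighbours` set to 1 each bin is compared to its 2 neighbours,
with 2 it is compared to 4.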
In general, your FFT doesn't need to operate on much more than
20 ms of audio.
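To turn that duration into an FFT size, something like this would do.
It is a sketch with a made-up name (`fftSizeForDuration`), and it
assumes you want a power-of-two size:

```d
// Smallest power-of-two FFT size that holds `milliseconds` of audio.
size_t fftSizeForDuration(double milliseconds, double samplingRate)
{
    immutable needed = cast(size_t)(samplingRate * milliseconds / 1000.0 + 0.5);
    size_t size = 1;
    while (size < needed)
        size *= 2;
    return size;
}
```

For instance, 20 ms at an assumed 48000 Hz is 960 samples, which
rounds up to a 1024-point FFT.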