On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
What do the 4096 resulting complex numbers represent

Bin 0 is the energy at 0 Hz (DC).
Bins 1 to 2047 are the energy at (bin * samplingRate / 4096) Hz.
Bin 2048 is the energy at the Nyquist frequency (samplingRate / 2).
Bins 2049 to 4095 are the energy at negative frequencies; for a real input signal they are the complex conjugates of bins 2047 down to 1.

float fftBinToFrequency(float fftBin, int fftSize, float samplingRate)
{
    return (samplingRate * fftBin) / fftSize;
}
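
Going the other way is just as useful when you want to look up a known tone. Here is a minimal sketch of the inverse mapping; frequencyToFftBin and nearestFftBin are names I made up for illustration, and the 44100 Hz sampling rate in the note below is an assumption since the question doesn't state one.

```c
/* Hypothetical helper: fractional bin position of a frequency given in Hz. */
float frequencyToFftBin(float frequencyHz, int fftSize, float samplingRate)
{
    return (frequencyHz * fftSize) / samplingRate;
}

/* Hypothetical helper: index of the closest bin.
   Rounds to nearest; only valid for non-negative frequencies. */
int nearestFftBin(float frequencyHz, int fftSize, float samplingRate)
{
    return (int)(frequencyToFftBin(frequencyHz, fftSize, samplingRate) + 0.5f);
}
```

With a 4096-point FFT and an assumed 44100 Hz sampling rate, 1336 Hz lands at about bin 124.09, so the nearest bin is 124.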


An FFT that operates only on real input (aka "RealFFT") will thus give you 2048 + 1 bins (bins 0 to 2048) instead of 4096, and is roughly twice as fast.

Each of these bins carries both amplitude (how much energy the signal has near that frequency) and phase information. The phase is relative to the START of your time-domain window; if you need energy and phase at the center of the window you will need "zero-phase" windowing, which is basically an offset (see Dplug's FFTAnalyzer).

Each bin of a time-frequency transform basically answers the question: "How much does this signal look like a sinusoid that makes k rotations inside the time window, and with how much offset?".


Now, it seems you want to do peak picking in a spectrum.
The best method I know is called the "2nd Quinn estimator", from the paper "Estimation of frequency, amplitude and phase from the DFT of a time series" (1997).

Since your sound isn't complicated, it's going to be _way_ simpler to find and classify those peaks (no need for the best estimator either) than it would be to do real pitch detection that works on voice.

You should resist the temptation to interpolate in the bin domain; it's better to oversample the FFT result with generous "zero-padding" instead, and then use an estimation method like the one above.
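
Zero-padding itself is trivial. A minimal sketch, with zeroPad being a hypothetical name: copy your (already windowed) samples into a longer buffer, fill the rest with zeros, and run the FFT on the longer buffer.

```c
/* Copy n samples to the start of a longer buffer and zero the remainder.
   If n exceeds paddedSize, the extra samples are dropped. */
void zeroPad(const float *samples, int n, float *padded, int paddedSize)
{
    if (n > paddedSize)
        n = paddedSize;
    for (int i = 0; i < n; ++i)
        padded[i] = samples[i];
    for (int i = n; i < paddedSize; ++i)
        padded[i] = 0.0f;
}
```

Padding 1024 samples to 4096, for example, gives bins spaced samplingRate / 4096 apart instead of samplingRate / 1024. It doesn't add information, it only samples the same spectrum more finely, which is what makes the peak easier to locate.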


How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

1. Detect all peaks that are larger in magnitude than their 2 or 4 neighbours,
2. keep those that are sufficiently above the mean energy of the spectrum,
3. then optionally refine their amplitude and frequency with an estimator like the one above,
4. from which it will be very clear whether 1336 Hz or something nearby was present.
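
Steps 1 and 2 can be sketched like this. findPeaks is a hypothetical name, thresholdRatio is a tuning knob I made up (how many times the mean a peak must exceed), and I am assuming you already have one energy value per bin, e.g. re*re + im*im.

```c
/* Writes the indices of detected peaks into peakBins (at most maxPeaks)
   and returns how many were found. A peak must be strictly larger than its
   4 neighbours (2 on each side) and above thresholdRatio times the mean. */
int findPeaks(const float *energy, int numBins, float thresholdRatio,
              int *peakBins, int maxPeaks)
{
    if (numBins < 5)
        return 0;

    double sum = 0.0;
    for (int i = 0; i < numBins; ++i)
        sum += energy[i];
    float threshold = thresholdRatio * (float)(sum / numBins);

    int count = 0;
    for (int i = 2; i < numBins - 2 && count < maxPeaks; ++i)
    {
        float e = energy[i];
        if (e > threshold
            && e > energy[i - 1] && e > energy[i + 1]
            && e > energy[i - 2] && e > energy[i + 2])
        {
            peakBins[count++] = i;
        }
    }
    return count;
}
```

Each returned index can then be converted to Hz with fftBinToFrequency and compared against the tones you are looking for, with some tolerance.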

Your FFT doesn't need to operate on much more than 20 ms of audio in general.
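
As a rough sketch of what that means in samples: fftSizeForDuration is a hypothetical helper, and rounding up to a power of two is my assumption (many FFT implementations want that, not all).

```c
/* Smallest power of two that holds durationMs of audio at samplingRate. */
int fftSizeForDuration(float durationMs, float samplingRate)
{
    int samples = (int)(durationMs * 0.001f * samplingRate + 0.5f);
    int size = 1;
    while (size < samples)
        size *= 2;
    return size;
}
```

At an assumed 44100 Hz, 20 ms is 882 samples, so a 1024-point frame is enough before any zero-padding.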

