On Tuesday, 8 July 2025 at 18:11:27 UTC, Matthew wrote:
> I can't figure out how the 4096 results of the FFT relate to the frequencies in the input.

> I tried taking the magnitude of each element,

That's correct!

> What do the 4096 resulting complex numbers represent?

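The magnitude of each element (computed with `std.complex.abs`) is proportional to the amplitude of that frequency component, and the angle in the complex plane is its phase (computed in radians with `std.complex.arg`). With an unnormalized FFT of a real signal, a sine wave that lands exactly on a bin produces a magnitude of amplitude × 4096 / 2 in that bin.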
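As a rough illustration, here is a minimal D sketch. It assumes the transform is `std.numeric.fft`, which the question doesn't actually say. It builds a cosine that completes exactly 8 cycles in a 4096-sample window and reads the amplitude and phase back from `res[8]`. `cosineWindow`, `binAmplitude` and `binPhase` are hypothetical helpers written for this example.

```d
import std.complex : Complex, abs, arg;
import std.math : PI, cos;
import std.numeric : fft;
import std.stdio : writefln;

/// Hypothetical test-signal helper: `n` samples of a cosine that completes
/// exactly `cycles` periods inside the window.
double[] cosineWindow(size_t n, double cycles, double amplitude, double phase)
{
    auto samples = new double[n];
    foreach (i, ref s; samples)
        s = amplitude * cos(2 * PI * cycles * i / n + phase);
    return samples;
}

/// Amplitude of a real sinusoid that lies exactly on bin `k` (0 < k < n / 2).
/// The unnormalized FFT puts amplitude * n / 2 into that bin, so divide it out.
double binAmplitude(Complex!double[] res, size_t k)
{
    return abs(res[k]) / (res.length / 2.0);
}

/// Phase of bin `k` in radians, between -PI and PI.
double binPhase(Complex!double[] res, size_t k)
{
    return arg(res[k]);
}

void main()
{
    // Amplitude 0.5, 8 cycles per 4096-sample window, phase offset PI / 4.
    auto res = fft(cosineWindow(4096, 8, 0.5, PI / 4));
    writefln("bin 8: amplitude %.3f, phase %.3f rad",
             binAmplitude(res, 8), binPhase(res, 8));
}
```

Dividing by n / 2 only recovers the true amplitude when the tone sits exactly on a bin. Off-bin tones spread over several bins, as described further down.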

The frequencies are all relative to the FFT window. `res[0]` is 0 Hz, `res[1]` corresponds to a sine wave that fits exactly 1 cycle inside your window, `res[2]` to 2 cycles, and so on. The frequency in Hz depends on your sample rate. At 44100 Hz, 44100 / 4096 ≈ 10.77, so your window fits about 10.77 times in 1 second and each bin is about 10.77 Hz wide. That means `res[1]` is around 10.8 Hz, `res[2]` around 21.5 Hz, and so on up to `res[4095]` at about 44089 Hz. For real-valued input, though, everything above `res[2048]` (22050 Hz) is just a mirrored copy: `res[4096 - k]` is the complex conjugate of `res[k]`, since 44100 samples/s can only capture frequencies up to 22050 Hz (for more info search 'Nyquist frequency' and 'aliasing').
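The bin-to-Hz mapping and the mirror property can be checked with a small sketch. Again this assumes `std.numeric.fft` and a 44100 Hz sample rate. `binFrequency` and `isMirrored` are hypothetical helpers for illustration only.

```d
import std.complex : Complex, abs, conj;
import std.math : PI, sin;
import std.numeric : fft;
import std.stdio : writefln;

/// Frequency in Hz represented by bin `k` of an `n`-point FFT.
double binFrequency(size_t k, double sampleRate, size_t n)
{
    return k * sampleRate / n;
}

/// For real-valued input, bin n - k should be the complex conjugate of
/// bin k (up to rounding), so bins above n / 2 add no new information.
bool isMirrored(Complex!double[] res, size_t k, double tolerance = 1e-6)
{
    return abs(res[$ - k] - conj(res[k])) < tolerance;
}

void main()
{
    enum double sampleRate = 44_100;
    enum size_t n = 4096;

    writefln("bin 1:    %.2f Hz (bin spacing)", binFrequency(1, sampleRate, n));
    writefln("bin 2048: %.2f Hz (Nyquist)", binFrequency(n / 2, sampleRate, n));
    writefln("bin 4095: %.2f Hz", binFrequency(n - 1, sampleRate, n));

    // Any real-valued window will do for the mirror check; this one is off-bin.
    auto samples = new double[n];
    foreach (i, ref s; samples)
        s = sin(2 * PI * 100.5 * i / n);
    writefln("bin %s mirrors bin 300: %s", n - 300, isMirrored(fft(samples), 300));
}
```

In practice this means you only need to look at `res[0 .. 2049]`.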

> How should I use the result to check whether the 1209Hz, 1336Hz, 1477Hz, or 1633Hz tones are present in that part of the sound?

1209 Hz falls at bin position 1209 * (4096 / 44100) ≈ 112.3 (likewise 1336 Hz ≈ 124.1, 1477 Hz ≈ 137.2 and 1633 Hz ≈ 151.7). That is not an exact match, so the tone will leak into all other bins. Most of its energy will still land in bins 112 and 113, though, so it's probably good enough to just check those. If you need more accuracy, you can try applying a 'window function' to reduce spectral leakage. Another option is to increase the window size, either by including more samples (which reduces the time resolution) or by padding the window with 0's (which essentially adds interpolated bins).
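Checking the bins around each tone might look roughly like this. The sketch assumes 44100 Hz, a 4096-sample window, `std.numeric.fft`, and a synthetic 1209 Hz tone in place of real audio. `binPosition`, `applyHann`, `zeroPad` and `toneMagnitude` are hypothetical helpers. The Hann window is just one common choice of window function, and any detection threshold on the printed magnitudes would need tuning against real recordings.

```d
import std.algorithm.comparison : max;
import std.complex : Complex, abs;
import std.math : PI, cos, floor, sin;
import std.numeric : fft;
import std.stdio : writefln;

/// Fractional bin position of `freq` Hz in an `n`-point FFT.
double binPosition(double freq, double sampleRate, size_t n)
{
    return freq * n / sampleRate;
}

/// Multiplies the samples in place by a (periodic) Hann window,
/// which tapers both ends of the window to reduce spectral leakage.
void applyHann(double[] samples)
{
    immutable n = samples.length;
    foreach (i, ref s; samples)
        s *= 0.5 - 0.5 * cos(2 * PI * i / n);
}

/// Copies `samples` into a zero-filled array of `newLength` samples.
/// New double elements in D default to NaN, so the zeros must be explicit.
double[] zeroPad(const(double)[] samples, size_t newLength)
{
    auto padded = new double[newLength];
    padded[] = 0.0;
    padded[0 .. samples.length] = samples[];
    return padded;
}

/// Largest magnitude of the two bins that bracket `freq`
/// (bins 112 and 113 for 1209 Hz with 4096 samples at 44100 Hz).
double toneMagnitude(Complex!double[] res, double freq, double sampleRate)
{
    immutable lo = cast(size_t) floor(binPosition(freq, sampleRate, res.length));
    return max(abs(res[lo]), abs(res[lo + 1]));
}

void main()
{
    enum double sampleRate = 44_100;
    enum size_t n = 4096;

    // Synthetic stand-in for one window of real audio: a pure 1209 Hz tone.
    auto samples = new double[n];
    foreach (i, ref s; samples)
        s = sin(2 * PI * 1209 * i / sampleRate);
    applyHann(samples);

    auto res = fft(samples);
    foreach (freq; [1209.0, 1336.0, 1477.0, 1633.0])
        writefln("%4.0f Hz: bin %.1f, magnitude %.1f", freq,
                 binPosition(freq, sampleRate, n),
                 toneMagnitude(res, freq, sampleRate));

    // Optional: zero-padding to 8192 samples halves the bin spacing
    // (interpolated bins), so 1209 Hz now sits near bin 224.6.
    auto padded = fft(zeroPad(samples, 2 * n));
    writefln("padded: 1209 Hz magnitude %.1f",
             toneMagnitude(padded, 1209, sampleRate));
}
```

The window is applied before padding, so the appended zeros aren't tapered. Padding to 8192 keeps the length a power of two, which `std.numeric`'s FFT expects.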

I haven't programmed pitch detection myself yet (it's still on my to-do list!), so I don't know how much of this your use case needs, but you can start by checking the closest bin and seeing how far that gets you. Good luck!
