Consider an mp3 file, mono (single channel), 44.1 kHz, encoded at 128 kb/s constant bitrate (to keep things simple) with your encoder of choice using average settings (let's say whatever ffmpeg uses as defaults for this case).

Think of the full 3D representation of the spectrum of the whole file, with time being one dimension, frequency another dimension, and relative amplitude the 3rd dimension. Or the waterfall diagram - again time is one dimension, frequency the other, and the relative amplitude is color-coded.

For that particular file, the resolution of the time dimension is pretty clear: it's 44100 samples per second. What's less clear to me are the resolutions of the other two dimensions. If I were to build the full 3D representation, what resolutions should I choose for the other two dimensions so that, overall, it holds a similar amount of information to the original mp3 file?

For the frequency dimension, what are the limits? Is it 20 Hz and 20 kHz? And how many frequency "buckets" do I need to keep things comparable to the original mp3 file?
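As a rough starting point for the frequency axis, here is a small sketch of the arithmetic. It assumes MPEG-1 Layer III with long blocks only, where each granule carries 576 frequency lines spread from 0 Hz up to the Nyquist frequency (half the sample rate). The variable names are made up for illustration, and the encoder's actual low-pass cutoff (which depends on the encoder and its settings) is not modeled here.

```python
# Sketch only: frequency-axis arithmetic for a 44.1 kHz MPEG-1 Layer III stream.
# Assumption: long blocks only, i.e. 576 MDCT frequency lines per granule.
SAMPLE_RATE_HZ = 44_100
LINES_PER_GRANULE = 576

# Upper frequency limit representable at this sample rate.
nyquist_hz = SAMPLE_RATE_HZ / 2

# Width of one frequency "bucket" if the lines are spread evenly over 0..Nyquist.
bin_width_hz = nyquist_hz / LINES_PER_GRANULE

# How many spectral snapshots (granules) there are per second of audio.
granules_per_second = SAMPLE_RATE_HZ / LINES_PER_GRANULE

print(f"Nyquist limit:       {nyquist_hz:.0f} Hz")
print(f"Bucket width:        {bin_width_hz:.2f} Hz")
print(f"Granules per second: {granules_per_second:.2f}")
```

Under these assumptions the time axis of the spectral representation is much coarser than 44100 steps per second: each granule covers 576 samples, so there are fewer than 80 spectral snapshots per second, each with 576 buckets.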

For the relative amplitude, how many bits do I need to capture more or less the same amount of info as the original mp3 file? 8 bit? 16 bit? Keep in mind this is the completely rolled out waterfall representation, not the encoded mp3 stream.

I think all these questions are ultimately tied into the total amount of information contained in the mp3 file. And I'm only looking for reasonable estimates for these parameters.
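The total information budget can be estimated directly from the bitrate. The sketch below assumes MPEG-1 Layer III frames of 1152 samples and treats the whole 128 kb/s as available for spectral data, which overstates it slightly because frame headers and side information also come out of that budget. The names are illustrative only.

```python
# Sketch only: average bit budget of a 128 kb/s CBR, 44.1 kHz mono MP3.
# Assumption: MPEG-1 Layer III, 1152 samples per frame; header and side-info
# overhead is ignored, so the real budget for spectral data is a bit smaller.
BITRATE_BPS = 128_000
SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1152
PCM_BITS_PER_SAMPLE = 16  # reference: uncompressed 16-bit mono PCM

# Average bits spent per input sample. A critically sampled transform yields
# one frequency coefficient per input sample, so this is also the average
# number of bits per time-frequency cell.
bits_per_sample = BITRATE_BPS / SAMPLE_RATE_HZ

# Average size of one frame.
bits_per_frame = BITRATE_BPS * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
bytes_per_frame = bits_per_frame / 8

# Compression ratio relative to 16-bit mono PCM at the same sample rate.
pcm_ratio = PCM_BITS_PER_SAMPLE / bits_per_sample

print(f"Bits per sample / cell: {bits_per_sample:.2f}")
print(f"Bytes per frame:        {bytes_per_frame:.1f}")
print(f"Ratio vs 16-bit PCM:    {pcm_ratio:.2f}")
```

Read this way, the average budget is only about 3 bits per time-frequency cell, well below 8 or 16 bits. The encoder does not spend those bits evenly, so individual cells can get far more or far less than the average.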

--
Florin Andrei
http://florin.myip.org/