Re: [MP3 ENCODER] Looking for educated guess or explanation

Mark Taylor Tue, 23 May 2000 11:12:21 -0700
> 
> Hello,
> 
> After a few dozen of those frequency analysis graphs, I noticed
> something that made me curious: The 16+kHz region of a VBR encoded
> file vs the 256S cbr.
> 
> http://users.belgacom.net/gc247244/extra/why_oh_why.png [8KB]
> 
> is a _very_ striking illustration, and I started thinking about this.
> 
> As you can see, the shape of the curve is still there, but there is a
> constant dB drop in the 16->22kHz region.
> 
> Why is this?
> 
> the best I could come up with:
> - VBR: psy-model does noise calculations, decides at many instances the 16-22kHz
> (background)noise is below hearing threshold, decides upon lower bitrate, and the
> MDCT coefficients from high freqs, which are the least important, are
> scaled: 0 bits available. =>  Average fft sum of decoded mp3 gives
> drop in dB's.  At points where needed (>threshold), the high band is represented
> correctly.
> - bug (aka "feature")
> 
> I'm trying to get some understanding in this matter, so please, I'd
> like some educated material or guesses...
> 
> Q: Why so outspoken in the 16kHz band (32nd band? [I keep hearing
> "22", but probably just 16->22 meant?])
> Why no gradual decrease in db's progressive with freq, throughout
> different bands?
> 

Actually, there really are 22 "critical bands" or "scale factor bands"
used by MP3. I guess we should stick to the C convention and call the
last band the 21st band.  Here are the frequency ranges,
along with the ATH:


sfb= 0 freq(khz):  0.00 .. 0.15  ATH=-93.46 
sfb= 1 freq(khz):  0.15 .. 0.31  ATH=-103.59
sfb= 2 freq(khz):  0.31 .. 0.46  ATH=-106.77
sfb= 3 freq(khz):  0.46 .. 0.61  ATH=-108.40
sfb= 4 freq(khz):  0.61 .. 0.77  ATH=-109.43
sfb= 5 freq(khz):  0.77 .. 0.92  ATH=-110.16
sfb= 6 freq(khz):  0.92 .. 1.15  ATH=-111.02
sfb= 7 freq(khz):  1.15 .. 1.38  ATH=-111.76
sfb= 8 freq(khz):  1.38 .. 1.68  ATH=-112.81
sfb= 9 freq(khz):  1.68 .. 1.99  ATH=-114.04
sfb=10 freq(khz):  1.99 .. 2.37  ATH=-115.84
sfb=11 freq(khz):  2.37 .. 2.83  ATH=-117.92
sfb=12 freq(khz):  2.83 .. 3.45  ATH=-118.98
sfb=13 freq(khz):  3.45 .. 4.21  ATH=-118.92
sfb=14 freq(khz):  4.21 .. 5.13  ATH=-116.48
sfb=15 freq(khz):  5.13 .. 6.20  ATH=-113.20
sfb=16 freq(khz):  6.20 .. 7.50  ATH=-111.72
sfb=17 freq(khz):  7.50 .. 9.11  ATH=-110.10
sfb=18 freq(khz):  9.11 ..11.03  ATH=-106.49
sfb=19 freq(khz): 11.03 ..13.09  ATH=-98.69 
sfb=20 freq(khz): 13.09 ..16.00  ATH=-84.16 
sfb=21 freq(khz): 16.00 ..22.05  ATH=-48.04 
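
To make the table easier to use, here is a minimal Python sketch (not LAME
code) that looks up which band a given frequency falls into. The band edges
are copied from the table above; the helper name sfb_for_frequency is
hypothetical, and treating a frequency that sits exactly on an edge as
belonging to the upper band is an assumption.

```python
import bisect

# Upper edge (kHz) of each scalefactor band, copied from the table above.
SFB_UPPER_KHZ = [0.15, 0.31, 0.46, 0.61, 0.77, 0.92, 1.15, 1.38, 1.68, 1.99,
                 2.37, 2.83, 3.45, 4.21, 5.13, 6.20, 7.50, 9.11, 11.03, 13.09,
                 16.00, 22.05]


def sfb_for_frequency(freq_khz):
    """Return the 0-based band index containing freq_khz (hypothetical helper)."""
    if not 0.0 <= freq_khz <= SFB_UPPER_KHZ[-1]:
        raise ValueError("frequency outside 0..22.05 kHz")
    # bisect_right puts a frequency lying exactly on an edge into the upper
    # band; the min() keeps 22.05 kHz itself inside the last band.
    index = bisect.bisect_right(SFB_UPPER_KHZ, freq_khz)
    return min(index, len(SFB_UPPER_KHZ) - 1)


for f in (1.0, 3.0, 16.0, 18.0):
    print(f"{f:5.2f} kHz -> sfb {sfb_for_frequency(f)}")
```

Everything from 16 kHz upward lands in band 21, the band in question.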

MP3 has a "global_gain" variable, which controls the total
number of bits allocated for the frame.  You allocate bits
among these 22 bands by adjusting the scalefactors, but
there are only scalefactors for bands 0..20.

The ATH is given in db, normalized so that the loudest possible 3khz
sine wave is about 1db.  (The normalization is a little different from the
db used in your plot.)  The value of the ATH in band 21 means that
humans cannot hear a 16khz signal which is weaker than -48db.

Perceptual audio coding takes this one step further and makes the
claim that the encoding will be perceptually identical to the original
if the strength of the noise/error component of the signal is less than -48db.

The total allowed noise is actually given as the maximum of the 
maskings computed by psymodel.c and the ATH.  But for band 21,
psymodel.c does not compute any maskings.
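
The rule in the previous paragraph can be sketched in a few lines of Python.
This is only an illustration of "maximum of masking and ATH", not the code in
psymodel.c; the function name allowed_noise_db and the masking values used
below are made up for the example, while the ATH values come from the table.

```python
def allowed_noise_db(ath_db, masking_db=None):
    """Allowed noise for one band: the larger of the masking and the ATH.

    masking_db=None models a band for which no masking is computed
    (band 21 in the description above), so only the ATH applies.
    """
    if masking_db is None:
        return ath_db
    return max(masking_db, ath_db)


# ATH values from the table; the masking values are hypothetical.
print(allowed_noise_db(-84.16, masking_db=-60.0))  # masking dominates
print(allowed_noise_db(-84.16, masking_db=-95.0))  # ATH dominates
print(allowed_noise_db(-48.04))                    # band 21: ATH only
```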


In your case, take the signal at 16khz:

original = -40db   energy=10.0e-5   =      .01 * 16khz_sine_wave
quantized= -48db   energy= 1.6e-5   =      .004 * 16khz_sine_wave

Now define:  error = original - quantized.

LAME spends most of its time trying to get the volume of
the error signal less than some tolerance.

In this case:
error = .006 * 16khz_sine_wave       energy = 3.6e-5 = -44db
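
The arithmetic behind these numbers can be checked with a short Python
sketch. It assumes that "energy" here simply means the amplitude squared and
that the level in db is 10*log10(energy), which is the convention that
reproduces the figures above; the function name is hypothetical.

```python
import math


def energy_and_db(amplitude):
    """Return (energy, level in dB) for a sine-wave scale factor.

    Assumption: energy is amplitude squared, the convention that
    reproduces the numbers quoted in the example.
    """
    energy = amplitude ** 2
    return energy, 10.0 * math.log10(energy)


original = 0.01    # scale factor of the original 16 kHz sine wave
quantized = 0.004  # scale factor after quantization
# Only the magnitude of the difference matters for the energy.
error = abs(original - quantized)

for name, amp in (("original", original),
                  ("quantized", quantized),
                  ("error", error)):
    energy, db = energy_and_db(amp)
    print(f"{name:9s} amplitude={amp:.3f} energy={energy:.1e} level={db:.1f} db")
```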

The signal that is sent to your speaker when playing
the MP3 file is the quantized signal, which is equal to:

quantized = original  - error

So LAME VBR spends its time trying to make sure that the volume of
the error signal is less than the ATH.  If the volume of the error
was -50db, then the error (as a separate signal) would be
inaudible.  So the claim is that original + inaudible_error
sounds the same as the original.

In this example, the error has a volume of
-44db, which is larger than the ATH and thus LAME VBR should
have allocated more bits.  But these numbers are all approximate
and we would have to look at them at the frame level.
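
As a rough illustration of that comparison, here is a small Python sketch.
It is not how LAME decides bit allocation (that happens per frame, with more
inputs); the function name noise_excess_db is hypothetical, the -48.04 value
is the band 21 ATH from the table, and -44 and -50 are the two error levels
discussed above.

```python
def noise_excess_db(error_db, allowed_db):
    """How far the error sits above the allowed noise, in dB.

    A positive result means the error exceeds the threshold under this
    simple model; zero or negative means it stays at or below it.
    """
    return error_db - allowed_db


ATH_BAND21_DB = -48.04  # ATH for band 21, from the table

for error_db in (-44.0, -50.0):
    excess = noise_excess_db(error_db, ATH_BAND21_DB)
    verdict = "needs more bits" if excess > 0 else "inaudible by this criterion"
    print(f"error {error_db:.0f} db: {excess:+.2f} db relative to ATH, {verdict}")
```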

Mark



--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )