Re: [MP3 ENCODER] the "-mx" mode - different philosophy

Mark Taylor Tue, 22 Aug 2000 12:14:56 -0700

> it would be a JS mode, but unlike the "-mj" mode it would not try to predict
> anything, but just achieve optimal quality in an empirical way.
> 
> -----------
> for cbr: encode each set of samples to both a M/S and a S frame and
>          take the one with least amount of introduced distortion.
>          (can you use the calculation that now is used in vbr?)
> 
> for vbr: see how low you can go in M/S, and then check if at this
>          bitrate if S gives equal or better results.
>          If so see how low you can go in S...
> -----------
> 
Hi Roel,

Problem is, this is a lot of work and it is not clear that it would
really improve things.  The hard part is how do you tell if M/S gives
better results than S?  The only way is by some measure of distortion
- allowed_distortion.  But as we know, all measures of distortion are
just very rough guidelines - if they were really good, VBR would be perfect!
Thus my fear is that if we use some function of the distortion, we
will not be able to tell in a reliable way if mid/side sounds better
than stereo.

Here is what I would suggest:  
(I've done this many times: it is tedious and a lot of work!)

Can you isolate the problems in velvet.wav to a few frames, and then
look at them with the frame analyzer, both with m/s and s?  If you
find an example where LAME uses m/s when it sounds better with s, then
you can compare the distortion numbers and see if they could be used
to indicate that s would sound better than m/s.

The best way I've found to isolate problems is to play
back the file with mpg123, using the -k and -n options
to play a shifting window of about 50 frames, to 
narrow the problem down to a range of just a few frames.
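
A minimal sketch of the shifting-window playback, assuming mpg123's
-k (skip frames) and -n (number of frames to play) options.  The file
name velvet.mp3, the helper name window_commands and the 25-frame step
are assumptions made for illustration only.  It just builds the command
lines; run each one and listen for the artifact:

```python
def window_commands(mp3_path, first_frame, last_frame, window=50, step=25):
    """Build mpg123 argument lists for overlapping playback windows.

    Each command skips `start` frames (-k) and plays at most `window`
    frames (-n), so the problem can be narrowed down by ear.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    commands = []
    start = first_frame
    while start < last_frame:
        # Do not play past the end of the range of interest.
        count = min(window, last_frame - start)
        commands.append(["mpg123", "-k", str(start), "-n", str(count), mp3_path])
        start += step
    return commands


if __name__ == "__main__":
    # Hypothetical file name and frame range.
    for cmd in window_commands("velvet.mp3", 0, 200):
        print(" ".join(cmd))
```

Once a window contains the artifact, the same helper can be called
again on that range with a smaller window and step.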

Then take a look at those frames with the frame analyzer, but
on the original .wav file, not the mp3 file.  (If you run it
on just the mp3 file, the psychoacoustic information used
to produce the mp3 file will not be available.)

You can see the number of bits allocated to each granule of each
channel, as well as the amount of distortion.  Some of the
information in the outdated pull down help menu might be
useful, but look instead for the cryptic line that says something like:

FFT0  pe=0.77K/1.7  n=5/5.9/16.9/-96.6

The first two numbers are the PE (perceptual entropy) and the short
block energy variation, which are used to determine CBR/ABR bit
allocations and whether short blocks should be used.

The next 4 numbers are the distortion.  In this case,

over = 5             (5 bands have quantization noise > masking)

max_noise = 5.9      maximum (over all bands) of 
                     quantization_noise - masking = 5.9 dB

over_noise = 16.9    Divide this number by 5 (over) to get the
                     average value of
                     quantization_noise(dB) - masking(dB)
                     in each band where quantization_noise > masking

tot_noise = -96.6    Divide by the number of bands to get the
                     average value of 
                     quantization_noise(dB) - masking(dB)
                     in all bands.
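
The arithmetic above can be sketched as follows.  This is only an
illustration: the regular expression is a guess based on the single
example line quoted in this mail (the real frame analyzer output may
differ), and the names parse_distortion and average_over_noise are
hypothetical.

```python
import re

# Pattern guessed from the one example line in this mail; the "K" after
# the PE value is kept as a literal and not interpreted.
LINE_RE = re.compile(
    r"pe=(?P<pe>[-\d.]+)K/(?P<var>[-\d.]+)\s+"
    r"n=(?P<over>\d+)/(?P<max>[-\d.]+)/(?P<on>[-\d.]+)/(?P<tn>[-\d.]+)"
)


def parse_distortion(line):
    """Extract the six numbers from a 'pe=.../...  n=.../.../.../...' line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("line does not look like a distortion summary")
    return {
        "pe": float(m.group("pe")),
        "energy_variation": float(m.group("var")),
        "over": int(m.group("over")),
        "max_noise": float(m.group("max")),
        "over_noise": float(m.group("on")),
        "tot_noise": float(m.group("tn")),
    }


def average_over_noise(stats):
    """Average (noise - masking) in dB over the bands that exceed masking."""
    if stats["over"] == 0:
        return 0.0
    return stats["over_noise"] / stats["over"]


if __name__ == "__main__":
    stats = parse_distortion("FFT0  pe=0.77K/1.7  n=5/5.9/16.9/-96.6")
    print(stats)
    print("average over_noise per band (dB):", average_over_noise(stats))
```

For the example line this gives 16.9 / 5, or about 3.4 dB of noise
above masking on average in the 5 offending bands.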


tot_noise is not that useful since there will always be a couple
of bands which have very little quantization noise and they
skew the average downward.
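
To make the skew concrete, here is a small sketch that computes the
four numbers from per-band values, following the definitions given
above.  The band values are made up for illustration, and
distortion_stats is a hypothetical helper, not a LAME function.

```python
def distortion_stats(noise_db, masking_db):
    """Compute over, max_noise, over_noise and tot_noise from per-band dB values."""
    if not noise_db or len(noise_db) != len(masking_db):
        raise ValueError("need two non-empty lists of equal length")
    # Per-band quantization_noise(dB) - masking(dB).
    diffs = [n - m for n, m in zip(noise_db, masking_db)]
    # Bands where the quantization noise exceeds the masking threshold.
    over_diffs = [d for d in diffs if d > 0]
    return {
        "over": len(over_diffs),
        "max_noise": max(diffs),
        "over_noise": sum(over_diffs),
        "tot_noise": sum(diffs),
    }


if __name__ == "__main__":
    # Made-up values for 5 bands; the last two have very little noise.
    noise = [10, 12, 8, -30, -50]
    masking = [7, 10, 9, 10, 10]
    stats = distortion_stats(noise, masking)
    print(stats)
    print("average over all bands (dB):", stats["tot_noise"] / len(noise))
    print("average over offending bands (dB):", stats["over_noise"] / stats["over"])
```

With these made-up values, two bands are above masking by 3 and 2 dB,
yet the two nearly noise-free bands pull the all-band average far
below zero, so tot_noise hides the audible problem.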


Mark

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )