Re: [MP3 ENCODER] Some suggestions for LAME - please review

Frank Klemm Sun, 24 Sep 2000 06:19:59 -0700
::  
::  >  
::  > > 1. go to transform sizes 1024 and 128
::  > >
::  > MP3 uses 576 and 192. When 576 is too low for tonal music and 192 too long for
::  > percussions, then this is right. But a 1:8 ratio can create other problems.
::  > Note that MD uses 128, 256, 512 and 1024 sample blocks.
::  > Useful are block sizes from 1 ms ... 35 ms.
::  > 
::  
::  I guess it is a trade-off between simplicity and flexibility.  The
::  1024/128 windows come from AAC, but I have no idea if they are
::  optimal.  Adding more window sizes increases complexity, since every
::  different size window requires a different window function and a
::  different set of huffman tables and partitioning schemes.
::
increases complexity:
    Often complexity is decreased by such trades. You replace
    hard decisions with much softer ones.

different window:
    If the window size ratio is only 1:2, I would use composed cos^2
    windows. You need FFTs of 128*(2,3,4,5,6,8,9,10,12,16) samples.
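
    To make the list of FFT lengths concrete, here is a small sketch that
    expands 128*(2,3,4,5,6,8,9,10,12,16) into sample counts and durations.
    The 44.1 kHz sampling rate and all function names are assumptions for
    illustration; they are not from the mail.

```python
# Assumption: 44.1 kHz sampling rate (not stated in the mail).
SAMPLE_RATE_HZ = 44100
BASE = 128
MULTIPLIERS = (2, 3, 4, 5, 6, 8, 9, 10, 12, 16)


def fft_sizes(base=BASE, multipliers=MULTIPLIERS):
    """Return the FFT lengths base*m for every multiplier m."""
    return [base * m for m in multipliers]


def duration_ms(samples, rate=SAMPLE_RATE_HZ):
    """Duration of a block of `samples` samples in milliseconds."""
    return 1000.0 * samples / rate


def largest_prime_factor(n):
    """Largest prime factor of n, found by trial division."""
    factor, p = 1, 2
    while n > 1:
        if n % p == 0:
            factor = p
            n //= p
        else:
            p += 1
    return factor


if __name__ == "__main__":
    for size in fft_sizes():
        print(f"{size:5d} samples  {duration_ms(size):6.2f} ms  "
              f"largest prime factor {largest_prime_factor(size)}")
```

    Every length in the list factors into 2, 3 and 5 only, so a
    mixed-radix FFT can handle all of them.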

huffman tables:
    Yes, difficult

partitioning schemes:
    A problem of MP3-like systems with uniform time slices.


::  Also, every transition from two different size windows is lossy.  The
::  MDCT is only lossless for overlapping windows of the same size. 
::
Is this a problem of badly designed (asymmetric) window functions, or a
problem of the MDCT itself (as opposed to the DCT)?
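
For the same-size case mentioned in the quoted text, the lossless property
can be checked numerically. The following is a minimal sketch, assuming a
sine window (which satisfies w[n]^2 + w[n+N]^2 = 1), a 2/N scaling in the
inverse transform, and hypothetical function names. It only covers
overlapping windows of one size and says nothing about transitions between
different sizes.

```python
import math


def sine_window(two_n):
    """Symmetric sine window of length 2N; w[n]^2 + w[n+N]^2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(frame, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]


def roundtrip(signal, n_half):
    """MDCT -> IMDCT -> overlap-add with hop size N.

    len(signal) must be a non-zero multiple of n_half. The signal is padded
    with N zeros on each side so every sample is covered by two frames.
    """
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    window = sine_window(2 * n_half)
    for start in range(0, len(padded) - n_half, n_half):
        frame = padded[start:start + 2 * n_half]
        for i, value in enumerate(imdct(mdct(frame, window), window)):
            out[start + i] += value
    return out[n_half:-n_half]


def max_roundtrip_error(signal, n_half):
    """Largest absolute difference between signal and its reconstruction."""
    rebuilt = roundtrip(signal, n_half)
    return max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each single frame comes back aliased; the aliasing terms of neighbouring
frames cancel in the overlap-add, so the error is at rounding level.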


::  So it is good to minimize transitions.  Another thing to keep in mind is
::  that short windows are not bad for tonal music - they just are not as
::  efficient. 
::
You need more bits or, in CBR modes, quality decreases. But it is true:
short blocks need only a small number of additional bits when used in
place of long blocks. The other way round, long blocks used in place of
short blocks need a lot more bits.


::  > 5.
::  > Spectral prefiltering to get nearly constant ATH in every CB.
::  > 
::  If I understood your original posts on this topic, the point of this is
::  to keep large amplitude signals in the lower CB from affecting lower
::  amplitude signals in the higher CBs (the so called filter leakage). But
::  I don't think this is a problem since the current filter banks have
::  pretty good frequency resolution.  The prefilter, unless you go to a
::  much larger (and more expensive) window, will have just as much leakage
::  as the current filterbanks.
::  
This is true for all CBs except the first and the last.

1. DC in the signal increases the quantization steps of the first two or
   three bands (not CBs), but is not a masker at all. The same is true for
   AC in the range 16...70 Hz. Also note that there are HQ loudspeakers out
   there that cut all frequencies below their power frequency response.
   They have a nice flat frequency response down to x Hz, and below this
   frequency the response falls at 48 dB/oct. x can be anything in the
   range from 80 Hz (some of the actively controlled B&O, a very tiny box
   with 2x4" bass speakers, sounds good, but is unable to produce any low
   frequencies) down to 25 Hz (Canton Digital 1, digitally controlled and
   equalized). These loudspeakers also don't create any distortion if
   there are high level low frequency signals (a vented tube box creates a
   lot of it, and it is also possible to kill a 150 W loudspeaker with
   1...5 Watt because the diaphragm gets pulled out).

   So signals below 80 Hz should not affect masking, right?
   This is not an MP3/MP4/AAC related topic, it's a psycho related topic.

2. Quantization steps of all bands within a CB are identical. This doesn't
   matter for sfb 2...19, but it does for sfb 0, 1, 20 and 21, especially
   for 0 (0...120 Hz) and 21 (16...22/24 kHz).

   The ATH within these bands differs by >20 dB. So sfb21 especially is
   still difficult to code, even if there were an additional sfb21 scaler.
   The only possibility you have is to cut high frequencies (for instance
   everything >18 kHz).

   To change this, the psycho model and the coding must be changed
   (=> no standard MP3).
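
   The ">20 dB" spread can be illustrated with a small sketch. It assumes
   Terhardt's widely used analytic approximation of the threshold in
   quiet, which is not necessarily the ATH curve LAME uses. The function
   names and the 20 Hz lower edge (the formula diverges at 0 Hz) are also
   assumptions for illustration.

```python
import math


def ath_db(freq_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt's formula).

    Assumption: this analytic approximation stands in for whatever ATH
    curve a real encoder uses.
    """
    f_khz = freq_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def ath_spread_db(f_lo_hz, f_hi_hz, points=50):
    """Difference between highest and lowest ATH value inside a band."""
    step = (f_hi_hz - f_lo_hz) / (points - 1)
    values = [ath_db(f_lo_hz + i * step) for i in range(points)]
    return max(values) - min(values)


if __name__ == "__main__":
    # Band edges taken from the text: sfb0 up to 120 Hz, sfb21 16...22 kHz.
    print("sfb0  (20 Hz...120 Hz):", round(ath_spread_db(20.0, 120.0), 1), "dB")
    print("sfb21 (16...22 kHz):   ", round(ath_spread_db(16000.0, 22000.0), 1), "dB")
```

   With this approximation both edge bands come out far above 20 dB,
   while one quantization step size has to serve the whole band.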

3. Prefiltering eases the programming of psycho. You separate the
   static ATH from the dynamic masking and can handle them separately.
   The human ear does that too: the ATH is a property of sound conduction
   from the free field to the cochlea, while masking is an effect within
   the cochlea.

4. The prefilter has an extremely short length of 4 or 5 taps, which is
   far below 128, 192, 576, or 1024.
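
   As a rough sketch of how small such a filter is, here is a direct-form
   5-tap FIR. The coefficients are placeholders chosen only so that the
   DC gain is zero and the gain at half the sampling rate is one. They
   are NOT a designed ATH prefilter, and the names are hypothetical.

```python
# Placeholder 5-tap coefficients (assumption, not a designed ATH filter):
# they sum to zero, so DC is removed completely.
TAPS = (-0.125, -0.25, 0.75, -0.25, -0.125)


def fir_filter(signal, taps=TAPS):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n<0] = 0."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def group_delay_samples(taps=TAPS):
    """Delay of a symmetric (linear-phase) FIR in samples."""
    return (len(taps) - 1) / 2
```

   A symmetric 5-tap filter delays the signal by 2 samples and smears it
   over 5, which is tiny compared with a 576 or 1024 sample transform.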

   

::  noise shaping is the act of allocating bits among the critical bands. 
::  You have to decide which bands are important and deserve lots of
::  bits/resolution, and which bands can be quantized with very few bits.
::
Aha. Bit-balancing between the CBs.
Not bit-balancing within one CB.

::  These decisions are based on continuously computing the quantization noise
::  and comparing it to the psycho acoustic maskings in each CB.  (The
::  effects you were describing are attempted to be modeled by the psycho
::  acoustics).
::
I don't understand how psycho acoustics can model the effect I described.
IMHO it's a quantization problem.

The human ear can't distinguish one (white) noise from another (white) noise
with the same autocorrelation function but a completely different temporal
MDCT spectrum. So it is a waste of bandwidth to try to store exactly this
(white) noise signal.

So it should be possible to increase the quantization steps and to use a
1D error diffusion in the frequency domain to synthesize a different but
(nearly) identically colored noise after decoding.
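
The mechanism of 1D error diffusion along the frequency axis can be
sketched as follows. This is a minimal illustration with hypothetical
function names, assuming a uniform quantizer with a single step size. It
carries the rounding error of each coefficient into the next one, which
preserves the running sum of the coefficients. It does not model energy or
the nonuniform MP3 quantizer.

```python
def quantize_plain(coeffs, step):
    """Round every coefficient independently to a multiple of `step`."""
    return [step * round(c / step) for c in coeffs]


def quantize_diffused(coeffs, step):
    """Uniform quantizer with 1D error diffusion across frequency.

    The rounding error of each coefficient is added to the next one, so
    the accumulated error never exceeds half a step.
    """
    out = []
    err = 0.0
    for c in coeffs:
        target = c + err          # coefficient plus carried-over error
        q = step * round(target / step)
        err = target - q          # error handed to the next coefficient
        out.append(q)
    return out


if __name__ == "__main__":
    low_level = [0.3] * 10        # content well below half a step
    print(quantize_plain(low_level, 1.0))
    print(quantize_diffused(low_level, 1.0))
```

With a coarse step, independent rounding sets all of these small
coefficients to zero, while the diffused version leaves a few non-zero
values whose sum stays within half a step of the original sum.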

-- 
With kind regards
Frank Klemm
 
eMail | [EMAIL PROTECTED]       home: [EMAIL PROTECTED]
phone | +49 (3641) 64-2721    home: +49 (3641) 390545
sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )