::
:: >
:: > > 1. go to transform sizes 1024 and 128
:: > >
:: > MP3 uses 576 and 192. When 576 is too low for tonal music and 192 too long for
:: > percussions, then this is right. But a 1:8 ratio can create other problems.
:: > Note that MD uses 128, 256, 512 and 1024 sample blocks.
:: > Useful are block sizes from 1 ms ... 35 ms.
:: >
::
:: I guess it is a trade-off between simplicity and flexibility. The
:: 1024/128 windows come from AAC, but I have no idea if they are
:: optimal. Adding more window sizes increases complexity, since every
:: different size window requires a different window function and a
:: different set of Huffman tables and partitioning schemes.
::
increases complexity:
Often such a trade decreases complexity. You replace one
hard decision with several much softer decisions.
different window:
If the window size ratio is only 1:2, I would use composed cos^2
windows. You need FFTs of 128*(2,3,4,5,6,8,9,10,12,16) samples.
Huffman tables:
Yes, difficult.
partitioning schemes:
A problem of MP3-like systems with uniform time slices.
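
To put numbers on the block and FFT sizes mentioned above, here is a
small sketch. The 44.1 kHz sample rate and the helper names are my
assumptions for illustration, not something fixed by the discussion. It
converts the sizes to milliseconds and checks that every FFT length
factors into 2, 3 and 5 only, so a mixed-radix FFT would be enough.

```python
SAMPLE_RATE_HZ = 44100  # assumption: CD-rate audio


def block_ms(samples, rate_hz=SAMPLE_RATE_HZ):
    """Duration of a block of `samples` samples in milliseconds."""
    return 1000.0 * samples / rate_hz


def only_small_primes(n):
    """True if n has no prime factors other than 2, 3 and 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


# Multipliers from the list above, applied to the 128-sample base block.
MULTIPLIERS = (2, 3, 4, 5, 6, 8, 9, 10, 12, 16)
fft_sizes = [128 * m for m in MULTIPLIERS]

for n in (128, 192, 576, 1024):
    print(f"block {n:5d} samples = {block_ms(n):6.2f} ms")
for n in fft_sizes:
    print(f"FFT   {n:5d} samples = {block_ms(n):6.2f} ms, "
          f"2/3/5-smooth: {only_small_primes(n)}")
```

At this assumed rate a 128-sample block is about 2.9 ms and a
1024-sample block about 23.2 ms, so both lie inside the 1 ms ... 35 ms
range quoted above.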
:: Also, every transition from two different size windows is lossy. The
:: MDCT is only lossless for overlapping windows of the same size.
::
Is this a problem of badly designed (asymmetric) window functions, or a
problem of the MDCT itself (as opposed to the DCT)?
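
For reference, the equal-size case is easy to check numerically. The
following is a minimal sketch, not code from any encoder: a direct
O(N^2) MDCT/IMDCT pair with a sine window (my choice, because it
satisfies w[n]^2 + w[n+N]^2 = 1), a 2/N scaling on the inverse, and a
made-up test signal. It only shows that overlap-add of same-size
windows cancels the time-domain aliasing; it says nothing yet about
transitions between different sizes.

```python
import math


def sine_window(length):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    half = len(frame) // 2
    n0 = 0.5 + half / 2.0
    return [sum(x * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                for n, x in enumerate(frame))
            for k in range(half)]


def imdct(coeffs):
    """Direct IMDCT: N coefficients -> 2N time-aliased samples."""
    half = len(coeffs)
    n0 = 0.5 + half / 2.0
    return [2.0 / half * sum(c * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                             for k, c in enumerate(coeffs))
            for n in range(2 * half)]


def roundtrip(signal, half):
    """Window, MDCT, IMDCT, window again, overlap-add with hop N."""
    win = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = [signal[start + n] * win[n] for n in range(2 * half)]
        rec = imdct(mdct(frame))
        for n in range(2 * half):
            out[start + n] += rec[n] * win[n]
    return out


HALF = 8  # N: hop size and number of coefficients per block
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(8 * HALF)]
rebuilt = roundtrip(signal, HALF)

# Only samples covered by two overlapping frames can be reconstructed,
# so the first and last N samples are left out of the comparison.
inner_error = max(abs(a - b)
                  for a, b in zip(signal[HALF:-HALF], rebuilt[HALF:-HALF]))
print("max reconstruction error:", inner_error)
```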
:: So it is good to minimize transitions. Another thing to keep in mind is
:: that short windows are not bad for tonal music - they just are not as
:: efficient.
::
You need more bits or, in CBR modes, quality decreases. But it is true:
short blocks used in place of long blocks need only a small number of
additional bits. The other way round, you need a lot more bits.
:: > 5.
:: > Spectral prefiltering to get nearly constant ATH in every CB.
:: >
:: If I understood your original posts on this topic, the point of this is
:: to keep large amplitude signals in the lower CB from affecting lower
:: amplitude signals in the higher CBs (the so-called filter leakage). But
:: I don't think this is a problem since the current filter banks have
:: pretty good frequency resolution. The prefilter, unless you go to a
:: much larger (and more expensive) window, will have just as much leakage
:: as the current filterbanks.
::
This is true for all CBs except the first and the last.
1. DC in the signal increases the quantization steps of the first two or
three bands (not CBs), but is not a masker at all. The same is true for AC
in the range 16...70 Hz. Also note that there are HQ loudspeakers out
there that cut all frequencies below the loudspeaker's power frequency
response. They have a nice flat frequency response down to x Hz, and
below this frequency the response falls off at 48 dB/oct. x can
be anything in the range from 80 Hz (some of the actively controlled B&O
models, a very tiny box with 2x4" bass speakers that sounds good but is
unable to produce any low frequencies) down to 25 Hz (Canton Digital 1,
digitally controlled and equalized). These loudspeakers also don't create
any distortion when there are high-level low-frequency signals (a vented
tube box creates a lot of it; it is also possible to kill a 150 W
loudspeaker with 1...5 W by pulling the diaphragm out).
So signals below 80 Hz should not affect masking, right?
This is not an MP3/MP4/AAC-related topic, it's a psycho-related topic.
2. The quantization steps of all bands within a CB are identical. This
doesn't matter for sfb 2...19, but it does for sfb 0, 1, 20 and 21,
especially for 0 (0...120 Hz) and 21 (16...22/24 kHz), because the ATH
within these bands differs by >20 dB. So sfb21 in particular is
still difficult to code even with an additional sfb21 scaler. The only
possibility you have is to cut high frequencies (for instance >18 kHz).
To change this, the psycho model and the coding must be changed
(=> no standard MP3).
3. Prefiltering eases the programming of psycho. You separate the
static ATH from the dynamic masking and can handle them separately.
The human ear does that too. ATH is a property of sound conduction
from the free field to the cochlea; masking is an effect within
the cochlea.
4. The prefilter has an extremely short length of 4 or 5 taps, which is
far below 128, 192, 576, or 1024.
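
To illustrate point 2, here is a rough sketch that estimates how much
the ATH varies inside a band. It uses Terhardt's well-known
approximation of the threshold in quiet, which is my choice and only an
approximation (it extrapolates very steeply above 16 kHz). The band
edges for sfb 0 and sfb 21 are the ones given above, with 20 Hz taken as
the lower edge because the formula diverges at 0 Hz; the 1000...1200 Hz
band is a hypothetical mid band added purely for comparison.

```python
import math


def ath_db(freq_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt's formula)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def ath_spread_db(lo_hz, hi_hz, steps=200):
    """Difference between the highest and lowest ATH in [lo_hz, hi_hz]."""
    values = [ath_db(lo_hz + (hi_hz - lo_hz) * i / steps)
              for i in range(steps + 1)]
    return max(values) - min(values)


spread_low = ath_spread_db(20.0, 120.0)        # sfb 0, lower edge assumed
spread_high = ath_spread_db(16000.0, 22000.0)  # sfb 21
spread_mid = ath_spread_db(1000.0, 1200.0)     # hypothetical mid band

print(f"sfb 0 : {spread_low:6.1f} dB")
print(f"sfb 21: {spread_high:6.1f} dB")
print(f"mid   : {spread_mid:6.1f} dB")
```

With one quantization step per band, the whole band has to be coded for
its most sensitive frequency, which is why a large spread is expensive.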
:: noise shaping is the act of allocating bits among the critical bands.
:: You have to decide which bands are important and deserve lots of
:: bits/resolution, and which bands can be quantized with very few bits.
::
Aha. Bit-balancing between the CBs.
Not bit-balancing within one CB.
:: These decisions are based on continuously computing the quantization noise
:: and comparing it to the psycho acoustic maskings in each CB. (The
:: effects you were describing are attempted to be modeled by the psycho
:: acoustics).
::
I don't understand how psycho acoustics can model the effect I described.
IMHO it's a quantization problem.
The human ear can't distinguish one (white) noise from another (white)
noise with the same autocorrelation function but a completely different
temporal MDCT spectrum. So it is a waste of bandwidth to try to store
exactly this (white) noise signal.
So it should be possible to increase the quantization steps and to use a
1D error diffusion in the frequency domain to synthesize, after decoding,
a different noise that is colored in (nearly) the same way.
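
One possible reading of this idea, as a toy sketch only: quantize the
coefficients of a noise-like band with a deliberately coarse step, and
carry the energy error of each coefficient over to the next one, so
that the band keeps roughly its energy instead of being rounded to
zero. The function names, the step size and the test data are
hypothetical; this works on the encoder side and is not taken from any
existing codec.

```python
import math
import random


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def quantize_plain(coeffs, step):
    """Round every coefficient to the nearest multiple of `step`."""
    return [step * round(c / step) for c in coeffs]


def quantize_energy_diffused(coeffs, step):
    """Coarse quantization with 1D diffusion of the energy error."""
    out, carry = [], 0.0
    for c in coeffs:
        target = c * c + carry                    # energy owed at this bin
        level = round(math.sqrt(max(target, 0.0)) / step)
        q = step * level
        carry = target - q * q                    # pass the error on
        out.append(math.copysign(q, c))           # keep the original sign
    return out


random.seed(1)
STEP = 1.0  # deliberately coarse compared with the signal
noise = [random.uniform(-0.4, 0.4) for _ in range(64)]  # noise-like band

plain = quantize_plain(noise, STEP)
diffused = quantize_energy_diffused(noise, STEP)

print("input energy   :", energy(noise))
print("plain rounding :", energy(plain))
print("error diffusion:", energy(diffused))
```

Since all input values here are below half a step, plain rounding
removes the band completely, while the diffused version keeps the total
energy to within a fraction of one squared step, at different
positions than in the original.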
--
Best regards
Frank Klemm
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )