I finally got a chance to thoroughly read Davis Pan's articles. Some
questions for the experts out there:
For Layer 3, why bother with the "polyphase" filterbank? Why can't
you process the whole frame with a single MDCT, then remove
and quantize the MDCT spectral coefficients based on the psy model?
Pan talks about several components of the psychoacoustic model:
1. frequency & quiet masking: As presented in the article, this seems pretty
straightforward, and easy to implement independently?
2. temporal masking: Seems rather complicated - requires detailed hearing tests?
3. tonal/atonal components: This is only alluded to. The different
components have different masking properties. Anyone know what
the difference is? I'm guessing the tone-masking-noise is stronger than the
noise-masking-tone?
4. joint stereo: the side channel can be compressed further. Since my
computer speakers came with only one subwoofer, I see that the low
frequencies in the side channel can be eliminated. But what other
tricks are there for the side channel?
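
To make #4 concrete, here is a rough sketch of how I understand the
mid/side idea. The function names are my own, and I'm assuming the simple
(L+R)/2, (L-R)/2 convention; the scaling the standard actually uses may differ.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared samples, a crude proxy for how many bits a channel needs."""
    return sum(v * v for v in samples)


if __name__ == "__main__":
    # Hypothetical, strongly correlated stereo samples.
    left = [1.0, 0.5, -0.25]
    right = [0.75, 0.5, -0.5]
    mid, side = ms_encode(left, right)
    print("side energy:", energy(side), "left energy:", energy(left))
    print("round trip:", ms_decode(mid, side))
```

When the two channels are similar, the side channel carries very little
energy, which I assume is why it can be coded with fewer bits.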
I'm guessing that #1 is responsible for most of the compression.
Any thoughts as to how important the other techniques are?
Maybe they are only important for really low bandwidths?
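
For the "quiet" part of #1, the only closed form I know of is Terhardt's
approximation of the threshold in quiet. This is not taken from Pan's
articles, and I'm assuming it is close to what the standard's tables
encode. A minimal sketch (the function name is mine):

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute hearing threshold, in dB SPL.

    Assumption: this formula stands in for the tabulated thresholds a real
    encoder would use.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB SPL")
```

Any spectral component below this curve should be inaudible even with no
other signal present, so presumably it can be dropped outright.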
Mark