Re: [MP3 ENCODER] Re: MS Stereo
Good idea, but I'm afraid that on its own it would still miss:

* very short tones (less than 3 granules long)
* tones whose frequency changes rapidly (sweeps)

But yes, doing forward and backward prediction is a good idea.

Well, there are several alternative methods: check out the method implemented in PEAQ (ITU-R BS.1387) and Frank Baumgarte's 'non linear' model. Instead of computing tonality, these models perform exponential addition of the individual maskers, so the final effect is very similar to tonality estimation. Why? Because tones are built on individual peaks, whereas noise contains many similar spectral lines.

It is very important that whoever implements this approach tunes the alpha factor of the smearing, so that pure noise and a pure tone give masking powers in line with Zwicker's data. I found that the alpha factor depends on the window size and on the median Bark value of the partition band.

I have tried this approach in the AAC encoder, but the problem with this model is its speed: it requires lots of 'pow()' calculations in the spreading-function convolution, so it is not really usable in real-time conditions. According to Baumgarte it gives much better masking estimation; FhG encoders, however, do not use it.

Also, check out Anibal Ferreira's PhD thesis, where he describes intra-frame tonality estimation based on the MDCT spectrum. I haven't tried this because it is full of hard-core math, but one of these days I might try it :)

Best Regards,
Ivan Dimkovic

___
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder
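To illustrate why the exponential addition of maskers described in the message above behaves like a tonality estimate, here is a minimal Python sketch. It is not Baumgarte's or PEAQ's actual code: the function name combine_maskers, the test signals and the value alpha = 0.5 are assumptions chosen for illustration, and the spreading-function convolution that a real model applies before the addition is left out.

```python
import math

def combine_maskers(intensities, alpha):
    """Power-law addition of maskers: (sum_i I_i**alpha) ** (1/alpha).

    With alpha == 1 this is plain intensity addition. With alpha < 1,
    many small maskers add up to more than one big masker of equal energy.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return sum(i ** alpha for i in intensities) ** (1.0 / alpha)

def to_db(x):
    """Convert an intensity ratio to decibels."""
    return 10.0 * math.log10(x)

# Two hypothetical partitions carrying the same total energy (1.0):
tone = [1.0]               # one dominant spectral line
noise = [1.0 / 16] * 16    # sixteen equal spectral lines

alpha = 0.5  # illustrative value only; the message says it must be tuned

tone_mask = combine_maskers(tone, alpha)
noise_mask = combine_maskers(noise, alpha)
print("noise masks %.1f dB more than the tone"
      % (to_db(noise_mask) - to_db(tone_mask)))
```

With these toy inputs the noise-like partition ends up with roughly 12 dB more masking than the tone, even though no tonality index was ever computed. The two pow() operations per masker are also visible here, which is where the speed problem mentioned in the message comes from.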
[MP3 ENCODER] Re: MS Stereo
E. Zwicker and H. Fastl, "Psychoacoustics: Facts and Models".

Let me elaborate just a little on this tonality estimation.

First of all, why do we need tonality estimation? We need it because a non-tonal sound generates more masking than a tonal one, so we need this estimate to compute the amount of masking.

Gpsycho: based on the ISO model 2 demonstration. It uses predictability. If the amplitude and position of a sound can be accurately predicted from the data of the 2 previous granules, then the sound is considered tonal. It is a good idea, but the problem is that it can't detect the tonality of a sound before the 3rd granule in which the sound is present, so the first 2 granules are wrong. It's a little like the ISO short block estimation, where the ISO model needed data from the previous granule and therefore switched 1 granule too late. Perhaps this could be fixed by also running the tonality estimation on the next 2 granules and, when a sound is detected as tonal, marking it as tonal in the 2 previous granules as well (obviously it has already been tonal for 2 granules). The second problem is that in the case of a tone with a rapid change in frequency, like a sine sweep, we miss it every time.

Nspsytune: based on the same kind of ideas as the ISO model 1 demonstration. (In the case of nspsytune I'm not really sure; I hope that Naoki will correct me if I'm wrong.) It uses peak detection. If the amplitude of a frequency line is higher than that of its neighbours by a threshold, then it's considered tonal. There is no delay like in gpsycho, but if several tones are close enough together, it will miss them (could this be the case with Fatboy?).

So the 2 methods are different, and right now neither of them works perfectly. Perhaps a corrected version of the first method (as suggested above), or a combination of the 2 methods, would be accurate enough...

Btw, I'd suggest you have a look at the references on the LAME website; I added references to papers about this tonality estimation.
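The two estimators described above can be sketched in a few lines of Python. This is a simplified illustration, not the gpsycho or nspsytune code: the function names, the 7 dB threshold, the neighbour reach of 2 lines and the test signals are all assumptions. The first function follows the general shape of the ISO model 2 unpredictability measure (linear prediction of magnitude and phase from the 2 previous granules); the second is a bare peak picker in the spirit of ISO model 1.

```python
import cmath

def unpredictability(prev2, prev1, cur):
    """Predict one complex FFT line from the 2 previous granules.

    Returns a value near 0 for a predictable (tonal) line and
    near 1 for an unpredictable (noise-like) one.
    """
    r_pred = 2.0 * abs(prev1) - abs(prev2)
    phi_pred = 2.0 * cmath.phase(prev1) - cmath.phase(prev2)
    predicted = cmath.rect(r_pred, phi_pred)
    denom = abs(cur) + abs(r_pred)
    if denom == 0.0:
        return 0.0
    return abs(cur - predicted) / denom

def track_line(values):
    """Run the predictor over successive granules, starting from silence."""
    history = [0j, 0j]
    out = []
    for cur in values:
        out.append(unpredictability(history[0], history[1], cur))
        history = [history[1], cur]
    return out

def tonal_peaks(power_db, threshold_db=7.0, reach=2):
    """Indices of lines that exceed every neighbour within `reach` by the threshold."""
    peaks = []
    for k in range(reach, len(power_db) - reach):
        neighbours = [power_db[k + d] for d in range(-reach, reach + 1) if d != 0]
        if all(power_db[k] - n >= threshold_db for n in neighbours):
            peaks.append(k)
    return peaks

# A steady tone that starts abruptly: constant magnitude, constant phase advance.
steady_tone = [cmath.rect(1.0, 0.3 + 0.7 * t) for t in range(4)]
onset = track_line(steady_tone)

# Hypothetical spectra in dB: one isolated tone, then two tones on adjacent lines.
single = [0.0] * 10
single[5] = 30.0
pair = [0.0] * 10
pair[5] = pair[6] = 30.0

print(onset)
print(tonal_peaks(single), tonal_peaks(pair))
```

In this toy run the predictor only drops to about zero from the 3rd granule of the tone onwards, which is the delay discussed above, while the peak picker has no delay but returns nothing for the two adjacent tones because neither exceeds the other by the threshold.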
Regards,
Gabriel Bouvigne
www.mp3-tech.org

----- Original Message -----
From: reinhard
To: [EMAIL PROTECTED]
Sent: Monday, January 28, 2002 10:58 AM
Subject: Re: [MP3 ENCODER] MS Stereo

> One of the biggest differences between l3psycho_anal_ns and l3psycho_anal is exactly what you are asking about: how they estimate the tonality index. One is a tweaked and cleaned-up version of the MPEG1/2 recommendation: the predictability of the energy in each band over several granules. I believe it comes from the thesis work of one of the creators of MP3. The other is based on how peaked the spectrum is, and uses data from just a single granule. Naoki wrote it based on data in Zwicker's book.

Zwicker's book?? Would you tell me the name of the book, or give more information about l3psycho_anal_ns?

> Keep in mind that all the models are very crude estimates, and the output should be considered a rough guide for the noise shaping algorithms rather than absolute truth.
Re: [MP3 ENCODER] Re: MS Stereo
On 28 Jan, Ivan Dimkovic wrote:

> Well, there are several alternative methods: check out the method implemented in PEAQ (ITU-R BS.1387) and Frank Baumgarte's 'non linear' model. Instead of computing tonality, these models perform exponential addition of the individual maskers, so the final effect is very similar to tonality estimation. Why? Because tones are built on individual peaks, whereas noise contains many similar spectral lines.
>
> It is very important that whoever implements this approach tunes the alpha factor of the smearing, so that pure noise and a pure tone give masking powers in line with Zwicker's data. I found that the alpha factor depends on the window size and on the median Bark value of the partition band.
>
> I have tried this approach in the AAC encoder, but the problem with this model is its speed: it requires lots of 'pow()' calculations in the spreading-function convolution, so it is not really usable in real-time conditions. According to Baumgarte it gives much better masking estimation; FhG encoders, however, do not use it.

If someone wants to experiment a little bit with it: I recently committed a part of Frank Baumgarte's non linear model to LAME's psymodel.c. To use it, set the environment variable CONFIG_DEFS to '-DNON_LINEAR_PSYMODEL' and run configure.

Bye,
Alexander.

--
To boldly go where I surely don't belong.
http://www.Leidinger.net    Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7