Re: [MP3 ENCODER] Re: MS Stereo

2002-02-04 Thread Ivan Dimkovic

 Good idea, but I'm afraid that doing only this would still miss:

 *very short tones (less than 3 granules long)
 *tones that change frequency rapidly (sweeps)

 But yes, doing forward and backward prediction is a good idea.


Well, there are several alternative methods. Check out the method
implemented in PEAQ (ITU-R BS.1387) and Frank Baumgarte's 'non linear' model.
Instead of computing tonality, these models perform exponential additions of
individual maskers, so the final effect is very similar to tonality
estimation. (Why? Because tones are built on individual peaks, while noise
contains many similar spectral lines.)

It is very important that whoever implements this approach tunes the
alpha factor of the smearing, so that pure noise and a pure tone give masking
powers that agree with Zwicker's data. I found that the alpha factor
depends on the window size and on the median Bark value of the partition band.
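
A minimal sketch of what such an addition can look like, assuming the
"exponential addition" is a power-law sum of masker intensities with exponent
alpha. The function name, the value of ALPHA and the toy intensities are
illustrative assumptions, not values taken from PEAQ or from Baumgarte's model:

```python
def add_maskers(intensities, alpha):
    """Power-law (non-linear) addition of masker intensities.

    alpha = 1.0 is plain intensity addition. With alpha < 1.0, many
    comparable maskers add up to more than their plain sum.
    """
    return sum(i ** alpha for i in intensities) ** (1.0 / alpha)


ALPHA = 0.5  # illustrative only, NOT a tuned value

tone = [1.0]        # one spectral line carrying all the energy
noise = [0.1] * 10  # the same total energy spread over 10 equal lines

tone_mask = add_maskers(tone, ALPHA)
noise_mask = add_maskers(noise, ALPHA)
linear_noise_mask = add_maskers(noise, 1.0)  # plain addition, for comparison
```

With these toy numbers the noise-like input yields ten times the masking power
of the tone-like input, although both carry the same energy. With alpha = 1.0
the two would be equal. This is why alpha has to be tuned against measured
tone and noise masking data.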

I have tried this approach in the AAC encoder, but the problem with this model
is its speed: it requires lots of 'pow()' calculations in the
spreading-function convolution, and therefore it is not really
useful in real-time conditions. According to Baumgarte, though, it gives
much better masking estimation. FhG encoders do not use it.

Also, check out Anibal Ferreira's PhD thesis, where he describes intra-frame
tonality estimation based on the MDCT spectrum. I haven't tried this because it
was full of hard-core math, but one of these days I might try it :)

Best Regards,
Ivan Dimkovic


___
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder



[MP3 ENCODER] Re: MS Stereo

2002-01-29 Thread Gabriel Bouvigne

E. Zwicker: Psychoacoustics, Facts and Models.


Let me elaborate just a little on this tonality estimation.

First of all, why do we need tonality estimation? We need it because a
non-tonal sound generates more masking than a tonal one, and thus we need
this estimation to compute the amount of masking.

Gpsycho: based on the ISO model2 demonstration. It uses predictability. If the
amplitude and position of a sound can be accurately predicted from the data of
the 2 previous granules, then the sound is considered tonal. It is a good
idea, but the problem is that it can't detect the tonality of a sound
before the 3rd granule in which the sound is present. So the first 2 granules
are wrong.
It's a little like the ISO short block estimation, where the ISO model needed
data from the previous granule, and so was switching 1 granule too late.
Perhaps this could be fixed by doing the tonality estimation 2 granules
ahead, and when a sound is detected as tonal, marking it as tonal in
the 2 previous granules as well (as it has obviously been tonal for 2
granules already).
The second problem is that in the case of a tone with a rapid change in
frequency, like a sine sweep, we miss it every time.
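
The onset problem and the suggested fix can be sketched as follows. This is
only an illustration, not the gpsycho code: it assumes that "position" means
the phase of a spectral line, that the prediction is a linear extrapolation
from the 2 previous granules, and that a line counts as tonal when its
unpredictability is below an arbitrary 0.1. The names unpredictability and
backfill_tonal are hypothetical.

```python
import cmath


def unpredictability(cur, prev1, prev2):
    """Unpredictability of one spectral line, between 0 and 1.

    cur, prev1 and prev2 are the complex values of the same line in the
    current granule and in the 2 previous ones. Magnitude and phase are
    extrapolated linearly from the 2 previous granules.
    """
    r_pred = 2.0 * abs(prev1) - abs(prev2)
    phi_pred = 2.0 * cmath.phase(prev1) - cmath.phase(prev2)
    predicted = cmath.rect(r_pred, phi_pred)
    denom = abs(cur) + abs(r_pred)
    if denom == 0.0:
        return 0.0  # silence everywhere: nothing to predict
    return abs(cur - predicted) / denom


def backfill_tonal(flags, lookback=2):
    """Also mark the `lookback` granules before each tonal granule as tonal."""
    out = list(flags)
    for g, flag in enumerate(flags):
        if flag:
            for k in range(max(0, g - lookback), g):
                out[k] = True
    return out


# One spectral line over 6 granules: 2 granules of silence, then a steady
# tone whose phase advances by 0.3 rad per granule (illustrative values).
line = [0j, 0j] + [cmath.rect(1.0, 0.3 * n) for n in range(4)]
c = [unpredictability(line[g], line[g - 1], line[g - 2]) for g in range(2, 6)]
tonal = [v < 0.1 for v in c]   # 0.1 is an arbitrary threshold
fixed = backfill_tonal(tonal)  # the correction suggested above
```

In this toy case c is 1.0 in the first granule of the tone, still well above
zero in the second, and essentially zero from the third one on, so only the
back-filled flags mark all 4 granules as tonal. The back-fill needs the
estimate for 2 granules ahead, so it would add that much look-ahead.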

Nspsytune: based on the same kind of ideas as the ISO model1 demonstration.
(In the case of nspsytune I'm not really sure; I hope that Naoki will
correct me if I'm wrong.)
It uses peak detection. If the amplitude of a frequency line is higher than
that of its neighbours by a threshold, then it's considered tonal. There is no
delay like in gpsycho, but if several tones are close enough, it will miss
them (could it be the case with Fatboy?).
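
A minimal sketch of this kind of peak test, to show why close tones get
missed. It is not the nspsytune code: the function name tonal_peaks, the 7 dB
threshold, the neighbourhood of 3 lines and the toy spectra are all
assumptions made for the illustration.

```python
def tonal_peaks(power_db, threshold_db=7.0, reach=3):
    """Return the indices of spectral lines that look tonal.

    A line is kept if it is a local maximum and is at least threshold_db
    above every line at a distance of 2..reach on either side.
    """
    peaks = []
    for k in range(reach, len(power_db) - reach):
        p = power_db[k]
        if not (p > power_db[k - 1] and p >= power_db[k + 1]):
            continue  # not a local maximum
        far = [power_db[k + d] for d in range(-reach, reach + 1) if abs(d) >= 2]
        if all(p - v >= threshold_db for v in far):
            peaks.append(k)
    return peaks


# Toy spectra in dB: one isolated tone, then two tones only 2 lines apart.
isolated = [0, 0, 0, 0, 40, 0, 0, 0, 0]
close_pair = [0, 0, 0, 0, 40, 0, 40, 0, 0, 0, 0]

found_isolated = tonal_peaks(isolated)
found_pair = tonal_peaks(close_pair)
```

The isolated tone is found at line 4. In the second spectrum each tone sits
inside the other's neighbourhood, so neither is 7 dB above all of its
neighbours and both are missed.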

So the 2 methods are different, and right now neither of them works perfectly.
Perhaps a corrected version of method one (as suggested above), or a
combination of the 2 methods, would be accurate enough...

Btw I'd suggest you have a look at the references on the Lame website; I
added references to papers about this tonality estimation.


Regards,


Gabriel Bouvigne
www.mp3-tech.org




- Original Message -
From: reinhard
To: [EMAIL PROTECTED]
Sent: Monday, January 28, 2002 10:58 AM
Subject: Re: [MP3 ENCODER] MS Stereo


One of the biggest differences between l3psycho_anal_ns and
l3psycho_anal is exactly what you are asking about - how they estimate
the tonality index.  One is a tweaked and cleaned up version of the
MPEG1/2 recommendation:  the predictability of the energy in each
band over several granules.  I believe it comes from thesis work
of one of the creators of MP3.  The other is based on how peaked the
spectrum is, and uses data just from a single granule.  Naoki wrote
it based on data in Zwicker's book.
   Zwicker's book??  Would you tell me the name of the book,
   or give more information about l3psycho_anal_ns?

Keep in mind that all the models are very crude estimates,
and the output should be considered as a rough guide to the noise
shaping algorithms rather than absolute truth.





Re: [MP3 ENCODER] Re: MS Stereo

2002-01-29 Thread Alexander Leidinger

On 28 Jan, Ivan Dimkovic wrote:

 Well, there are several alternative methods. Check out the method
 implemented in PEAQ (ITU-R BS.1387) and Frank Baumgarte's 'non linear' model.
 Instead of computing tonality, these models perform exponential additions of
 individual maskers, so the final effect is very similar to tonality
 estimation. (Why? Because tones are built on individual peaks, while noise
 contains many similar spectral lines.)
 
 It is very important that whoever implements this approach tunes the
 alpha factor of the smearing, so that pure noise and a pure tone give masking
 powers that agree with Zwicker's data. I found that the alpha factor
 depends on the window size and on the median Bark value of the partition band.
 
 I have tried this approach in the AAC encoder, but the problem with this model
 is its speed: it requires lots of 'pow()' calculations in the
 spreading-function convolution, and therefore it is not really
 useful in real-time conditions. According to Baumgarte, though, it gives
 much better masking estimation. FhG encoders do not use it.

If someone wants to experiment a little bit with it: I've recently committed
a part of Frank Baumgarte's non linear model to LAME's psymodel.c.
To use it you have to set the environment variable CONFIG_DEFS
to '-DNON_LINEAR_PSYMODEL' and run configure.

Bye,
Alexander.

-- 
  To boldly go where I surely don't belong.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7
