Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Gabriel Bouvigne

  MP3 uses 576 and 192. When 576 is too low for tonal music and 192 too
  long for percussions, then this is right. But a 1:8 ratio can create
  other problems.
  Note that MD uses 128, 256, 512 and 1024 sample blocks.
  Useful are block sizes from 1 ms ... 35 ms.

Minidisc also uses mixed windows. Perhaps mixed windows would help in our
case.
I've got another question about window sizes: are the short ones really
essential in VBR? Would it be possible to use only long ones, and then
allocate a lot more bits in the case of transients? After all, Xing uses
only long ones, and does a not-so-bad job on transients for an encoder
using only long ones. (Note: I'm not saying that Xing is a reference in
terms of quality.)


  5.
  Spectral prefiltering to get nearly constant ATH in every CB.

Why do we read in the literature that humans have 25 CBs but MP3 uses only
22?


 I believe noise shaping is the main difference between different MP3
 encoders.  I'm sure MPEG did not document any good noise shaping
 algorithms on purpose :-)  There are a few simple things in the
 literature, but I've never found any documentation of a noise shaping
 algorithm used in an actual commercial encoder.

Have you tried digging into audio patents? It might yield some ideas.


Regards,

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
mobile phone: [EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org


--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )



Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Frank Klemm

On Sun, Sep 24, 2000 at 10:57:39AM +0200, Gabriel Bouvigne wrote:

 I've got another question about window sizes: are the short ones really
 essential in VBR? Would it be possible to use only long ones, and then
 allocate a lot more bits in the case of transients? After all, Xing uses
 only long ones, and does a not-so-bad job on transients for an encoder
 using only long ones. (Note: I'm not saying that Xing is a reference in
 terms of quality.)
 
Tested with a synthesized signal:

--noshort -b128:                awful
--noshort -b320:                bad
--noshort -b550 --freeformat:   decoder SIGSEGV
-b320:                          good, but distinguishable from the original
                                without any effort (20/20)
-b550 --freeformat:             okay

Note: 
All-purpose lossless compression utilities gave a better compression ratio:

        input uses          input uses round-to-nearest-
        HQ quantization     integer quantization

gzip    190 kbps            74 kbps
bzip    154 kbps            68 kbps

Very short attacks seem to be a nightmare for MP3.

Signal is:

  * white noise
  * attack time: 0.5 ms
  * release time: 25 ms
  * pause time to fill the bit pool: 474.5 ms
  * both channels are uncorrelated
  * all attacks are different and also sound slightly different
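
A minimal sketch of how one period of such a test signal could be generated. The envelope shape (linear attack, linear release), the 44.1 kHz sample rate and the names `burst_envelope` and `make_period` are assumptions for illustration; the original generator is not shown in this thread.

```python
import random


def burst_envelope(n, attack, release):
    """Gain for sample n of one period: linear rise, linear decay, then silence."""
    if n < attack:
        return (n + 1) / attack
    if n < attack + release:
        return 1.0 - (n - attack + 1) / release
    return 0.0


def make_period(fs=44100, attack_ms=0.5, release_ms=25.0, pause_ms=474.5, rng=None):
    """Return one period as a list of (left, right) samples in [-1, 1]."""
    rng = rng or random.Random()
    attack = round(fs * attack_ms / 1000)
    release = round(fs * release_ms / 1000)
    total = round(fs * (attack_ms + release_ms + pause_ms) / 1000)
    frames = []
    for n in range(total):
        gain = burst_envelope(n, attack, release)
        # Independent draws per channel keep left and right uncorrelated.
        frames.append((gain * rng.uniform(-1.0, 1.0), gain * rng.uniform(-1.0, 1.0)))
    return frames
```

Calling `make_period` repeatedly with the same `rng` gives a different noise burst each time, which matches the "all attacks are different" property above.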

Note: The percussion attacks in "Money for nothing" are a little bit similar
  to these attacks:

  * white noise from 1...18 kHz (+/- 3 dB)
  * attack time: ca. 0.5 ms
  * release time: ca. 20...30 ms
  * but: no silence between the attacks

How to capture Win95 Screen Shots? What utility would be the best?

-- 
Frank Klemm




Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Frank Klemm

On Sun, Sep 24, 2000 at 10:57:39AM +0200, Gabriel Bouvigne wrote:
 
 Minidisc also uses mixed windows. Perhaps mixed windows would help in our
 case.
 I've got another question about window sizes: are the short ones really
 essential in VBR? Would it be possible to use only long ones, and then
 allocate a lot more bits in the case of transients?

A long window has a duration of up to 36 ms (32 kHz).
So the worst-case pre-echoes are:

  * - 5 dB for dt = 12 ms (32 kHz) or dt =  9 ms (44.1 kHz)
  * -12 dB for dt = 18 ms (32 kHz) or dt = 13 ms (44.1 kHz)
  * -24 dB for dt = 24 ms (32 kHz) or dt = 17 ms (44.1 kHz)

Because this is much flatter than human pre-masking, you really need a
huge number of additional bits. Often even 320 kbps sounds worse.
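
The timing figures above can be checked with a little arithmetic. This is only a sketch: the 1152-sample long-window span is an assumption chosen because it reproduces the 36 ms figure at 32 kHz, and `window_ms` and `scale_dt` are illustrative names.

```python
LONG_WINDOW_SAMPLES = 1152  # assumed span: 36 samples in each of 32 subbands


def window_ms(fs_hz, samples=LONG_WINDOW_SAMPLES):
    """Duration of a long window in milliseconds at sample rate fs_hz."""
    return 1000.0 * samples / fs_hz


def scale_dt(dt_ms_at_32k, fs_hz):
    """Rescale a pre-echo lead time quoted at 32 kHz to another sample rate."""
    return dt_ms_at_32k * 32000.0 / fs_hz


for fs in (32000, 44100, 48000):
    print(fs, "Hz:", round(window_ms(fs), 1), "ms")
print([round(scale_dt(dt, 44100)) for dt in (12, 18, 24)])
```

The second line of output reproduces the 9/13/17 ms values quoted for 44.1 kHz.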

For post-masking I found values around 1 dB/6 ms for 1...5 kHz.
What's the value for pre-masking?


   5.
   Spectral prefiltering to get nearly constant ATH in every CB.
 
 Why do we read in the literature that humans have 25 CBs but MP3 uses only
 22?
 
I think it is because the low-frequency CBs are larger than those in the
literature. There are several issues:

  * MP3 only uses CB widths which are a multiple of 4, perhaps to make
    use of the Intel SIMD instructions ;-)
  * So all CBs have sizes of multiples of 111 Hz/153 Hz/167 Hz, which can't
    be mapped to the CBs often found in the literature.
  * The exact width of a CB is a little bit arbitrary; you can find
    values from 40 Hz...120 Hz for low frequencies. It depends on
    the exact definition of the term "CB". A lot of people say that
    100 Hz for low frequencies is much too large.
  * Note that Zwicker also splits the 3 low-frequency CBs into several
    subbands to compensate for the ATH frequency dependencies (see Table 1
    in DIN 45631).
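
The 111/153/167 Hz figures in the list above can be reproduced as follows. The sketch assumes that "a multiple of 4" means four spectral lines and that a long block has 576 lines covering 0 Hz to fs/2; `band_step_hz` is an illustrative name.

```python
LINES_PER_LONG_BLOCK = 576  # spectral lines from 0 Hz up to Nyquist (assumed)


def band_step_hz(fs_hz, lines=4):
    """Width in Hz of `lines` adjacent spectral lines at sample rate fs_hz."""
    line_spacing = (fs_hz / 2.0) / LINES_PER_LONG_BLOCK
    return lines * line_spacing


for fs in (32000, 44100, 48000):
    print(fs, "Hz:", round(band_step_hz(fs)), "Hz per 4 lines")
```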

See also ISO 532: "Méthode de calcul du niveau d'isosonie" (method for
calculating loudness level).

Another question: I have some C++ programs that generate test signals.
The programs are around 1...2 KByte in size and generate WAV files in the
range of 1...10 MByte.
Some of them are really nasty for MP3. Should we collect such programs?


-- 
Frank Klemm




Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Robert Hegemann

Hi Gaby

 Why do we read in the literature that humans have 25 CBs but MP3 uses only
 22?

let us try to get it in order:

bark scale is used by the spreading function
Bark 0 : 0-100 Hz,  Bark 24: 15.5 - 20.4 kHz

masking is calculated for convolution bands
Lame uses 64 equidistant convolution bands from 0 Hz up to Nyquist

each of the 22 scalefactor bands is responsible for a group of
subbands (the convolution bands), but we have only 21 scalefactors
(12 scalefactors w/ short blocks)


Ciao Robert






Re[2]: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Dmitry

Hello Frank,

Sunday, September 24, 2000, 7:43:06 PM, you wrote:

FK How to capture Win95 Screen Shots? What utility would be the best?

press 'print screen' button (copy)
and paste into paintbrush...

8)

Best regards,
 Dmitry                          mail to: [EMAIL PROTECTED]

   http://www.chat.ru/~dkutsanov/~index.htm





Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Gabriel Bouvigne

  Why do we read in the literature that humans have 25 CBs but MP3 uses
  only 22?

 let us try to get it in order:

 bark scale is used by the spreading function
 Bark 0 : 0-100 Hz,  Bark 24: 15.5 - 20.4 kHz

 masking is calculated for convolution bands
 Lame uses 64 equidistant convolution bands from 0 Hz up to Nyquist

 each of the 22 scalefactor bands is responsible for a group of
 subbands (the convolution bands), but we have only 21 scalefactors
 (12 scalefactors w/ short blocks)


So the highest subbands don't have any scalefactor? I know that Brandenburg
said that there is no proof that 16 kHz really contributes to the hearing of
the music, so it could be intentional, but could it be a "bug" or
mistake in the MP3 specs?
After all, I think that in 48 kHz encoding some frequencies higher than
16 kHz get a scalefactor, so it should theoretically be possible to assign
a scalefactor.
Is there a scalefactor for 16 kHz in AAC? (Menno, are you listening?)


Also an off-topic question for Robert: as you're German, is there specific
knowledge about audio compression floating around in your university? (like
specialists, research or theses)


Regards,


--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
mobile phone: [EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread menno

"Gabriel Bouvigne" [EMAIL PROTECTED] wrote:
 Is there a scalefactor for 16kHz in AAC? (Meno, are you listening?)

AAC has scalefactorbands that fill the whole frequency range, scalefactors
are calculated for all scalefactorbands.

Bye, Menno




Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Robert Hegemann

Gabriel Bouvigne wrote on Sun, 24 Sep 2000:
 So the highest subbands don't have any scalefactor? I know that Brandenburg
 said that there is no proof that 16 kHz really contributes to the hearing of
 the music, so it could be intentional, but could it be a "bug" or
 mistake in the MP3 specs?

The only thing you can do for the highest bands is adjust
the global quantizer step size and then try to color the bands
for which you do have scalefactors. This would require computing
masking for sfb21 (sfb12 for short blocks).
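
A rough sketch of why only the global step size reaches the top band. It follows the long-block requantization gain as I recall it from ISO/IEC 11172-3, so treat the exact constants as an assumption to be checked against the standard; `requant_gain` is an illustrative name, not a LAME function.

```python
def requant_gain(global_gain, scalefac=0, scalefac_scale=0, pretab=0):
    """Linear gain applied to dequantized values in one scalefactor band.

    A band without a transmitted scalefactor (sfb21 for long blocks)
    behaves as if scalefac were 0, so only global_gain moves it.
    """
    multiplier = 1.0 if scalefac_scale else 0.5
    global_part = 2.0 ** ((global_gain - 210) / 4.0)
    band_part = 2.0 ** (-multiplier * (scalefac + pretab))
    return global_part * band_part


# Raising global_gain by 4 doubles the step everywhere, including sfb21.
print(requant_gain(214) / requant_gain(210))
# A scalefactor of 2 halves the step only in a band that has a scalefactor.
print(requant_gain(210, scalefac=2))
```

So, to quantize sfb21 more finely, the encoder lowers the global step size and then uses the scalefactors of the other bands to adjust them relative to it.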

 After all, I think that in 48 kHz encoding some frequencies higher than
 16 kHz get a scalefactor, so it should theoretically be possible to assign
 a scalefactor.
 Is there a scalefactor for 16 kHz in AAC? (Menno, are you listening?)
 
 
 Also an off-topic question for Robert: as you're German, is there specific
 knowledge about audio compression floating around in your university? (like
 specialists, research or theses)

Well, to my shame I must say that I don't know if there are any
audio experts at the University of Dortmund.
All I Learned About Mpeg Encoding I gathered in the last year
after joining the LAME project in my spare time.

A book I have owned for a few days and can recommend is:
"Psychoacoustics, Facts and Models", E. Zwicker, H. Fastl; Second
Updated Edition, Springer Series in Information Sciences 22; 1999
ISBN 3-540-65063-6, ISSN 0720-678X



Ciao Robert






Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Frank Klemm

::  
::  So the highest subbands don't have any scalefactor? I know that
::  Brandenburg said that there is no proof that 16 kHz really contributes to
::  the hearing of the music, so it could be intentional, but could
::  it be a "bug" or mistake in the MP3 specs?
::
40 Hz...16 kHz (+0.2 dB, -0.3 dB)°) seems not to be enough to pass AB tests.
25 Hz...18 kHz seems to be sufficient, and 20 Hz...20 kHz is recommended.

These are values for a monotonically decreasing frequency response. Using a
slight emphasis from fu-1.5 kHz to fu significantly reduces the bandwidth
needed.

The easiest way is to do this with a static frequency response like:

12.5 kHz     0.0 dB
13.0 kHz    -0.2 dB
13.5 kHz     0.0 dB
14.0 kHz    +0.2 dB
14.5 kHz    +0.4 dB
15.0 kHz    +0.6 dB
15.5 kHz    +0.8 dB
16.0 kHz    +1.0 dB
16.5 kHz    -oo  dB
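
One way to read a static response like this table is as a piecewise-linear gain curve. The sketch below makes several assumptions that the table itself does not state: linear interpolation between the listed points, 0 dB below 12.5 kHz, and holding +1.0 dB between 16 kHz and the 16.5 kHz cutoff. `gain_db` is an illustrative name.

```python
import math

# (frequency in Hz, gain in dB), taken from the table above
RESPONSE = [
    (12500, 0.0), (13000, -0.2), (13500, 0.0), (14000, 0.2),
    (14500, 0.4), (15000, 0.6), (15500, 0.8), (16000, 1.0),
]
CUTOFF_HZ = 16500  # "-oo dB" from here upward


def gain_db(freq_hz):
    """Gain in dB at freq_hz, linearly interpolated between table points."""
    if freq_hz >= CUTOFF_HZ:
        return -math.inf
    if freq_hz <= RESPONSE[0][0]:
        return 0.0  # assumed flat below the first table entry
    if freq_hz >= RESPONSE[-1][0]:
        return RESPONSE[-1][1]  # assumed held until the cutoff
    for (f0, g0), (f1, g1) in zip(RESPONSE, RESPONSE[1:]):
        if f0 <= freq_hz <= f1:
            return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


def gain_linear(freq_hz):
    """The same gain as a linear amplitude factor (0.0 above the cutoff)."""
    return 10.0 ** (gain_db(freq_hz) / 20.0)
```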

The frequency response should be calculated so that the cochlear
excitation produced by white noise is not changed. This should be possible
for fu = 14 kHz.

Better methods are calculating this preemphasis dynamically from the actual
signal.

I've tested the first method with fs = 29.4 kHz and got a nearly
indistinguishable signal, whereas classical low-pass filtering with
fu = 0.45*fs = 13.2 kHz resulted in "poor" quality.

To my mind 16 kHz is enough for music. Using some emphasis tricks makes
this statement safer.

Does someone have a piece of music with a triangle? For my experiments I
still need some very tonal high-frequency samples.


::  After all, I think that in 48 kHz encoding some frequencies higher than
::  16 kHz get a scalefactor, so it should theoretically be possible to
::  assign a scalefactor.
::
No. The scalefactor band assignments are different for 32/44.1/48 kHz, so
you get about 16 kHz for all fs.

            Long Blocks     Short Blocks
32 kHz:     ...15.25 kHz    ...14.92 kHz
44.1 kHz:   ...15.96 kHz    ...15.50 kHz
48 kHz:     ...15.96 kHz    ...15.62 kHz

-- 
Kind regards
Frank Klemm

PS: °) minimum requirements of studio equipment frequency response.


eMail | [EMAIL PROTECTED]   home: [EMAIL PROTECTED]
phone | +49 (3641) 64-2721home: +49 (3641) 390545
sMail | R.-Breitscheid-Str. 43, 07747 Jena, Germany




Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)

2000-09-24 Thread Mark Taylor

 
 Hi Gaby
 
  Why do we read in the literature that humans have 25 CBs but MP3 uses only
  22?
 
   let us try to get it in order:
 
   bark scale is used by the spreading function
   Bark 0 : 0-100 Hz,  Bark 24: 15.5 - 20.4 kHz
 
   masking is calculated for convolution bands
   Lame uses 64 equidistant convolution bands from 0 Hz up to Nyquist
 
   each of the 22 scalefactor bands is responsible for a group of
   subbands (the convolution bands), but we have only 21 scalefactors
   (12 scalefactors w/ short blocks)
 
 
   Ciao Robert
 

Some more info:

Barks, as used in MP3, are just a different way to measure
frequency.  The conversion is given in freq2bark() in util.c

There is nothing magic about 22 or 25.  The important thing
is that the bands have about the same width when measured
in barks.  In MP3, each band is about .9 barks, and in AAC,
each band is about .5 barks.  (AAC has 49 bands, IIRC).

MP3 and AAC psychoacoustics actually compute everything
in bands of about .33 barks wide (the 64 convolution bands
Robert mentioned), and then this information
is mapped to the 22 scalefactor bands.
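
For readers without the source at hand, here is a sketch of a Hz-to-Bark conversion using the widely quoted Zwicker/Terhardt arctangent approximation. That freq2bark() in util.c uses exactly this formula is an assumption on my part; check the source before relying on it. `freq_to_bark` and `band_width_bark` are illustrative names.

```python
import math


def freq_to_bark(freq_hz):
    """Approximate critical-band rate (in barks) for a frequency in Hz."""
    khz = max(freq_hz, 0.0) / 1000.0
    return 13.0 * math.atan(0.76 * khz) + 3.5 * math.atan((khz / 7.5) ** 2)


def band_width_bark(low_hz, high_hz):
    """Width of a frequency band measured in barks."""
    return freq_to_bark(high_hz) - freq_to_bark(low_hz)


# Two bands of equal width in Hz have very different widths in barks.
print(round(band_width_bark(100, 200), 2), round(band_width_bark(10000, 10100), 2))
```

This is why band edges that are evenly spaced in Hz end up unevenly spaced in barks, and the other way round.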


Mark



[MP3 ENCODER] Various

2000-04-27 Thread Shawn Riley

I have a few questions & ideas - potentially stupid, but they've been
bugging me. I'd try all the ideas myself except I can't get Lame to compile
& I don't have a clue how to implement them anyway.

1- Is it possible to change the sample rate by encoding frames using other
than 1152 samples? As an example, if we used a 44.1kHz WAV, making Lame
encode 2304 samples for each frame, purging all frequencies over 11025Hz or
the specified LPF, could we write it as a valid 22050Hz MP3 without
actually doing any resampling? (or something along those lines...) There'd
probably have to be a slight time/pitch shift for non-integer resampling
ratios, which would be worse for upsampling, but I think it might *sound*
better than resampling per se. Normal resampling for upsampling, this
routine for downsampling?

2- Are some people saying Layer2 is actually better than Layer3 at the same
bitrates for some types of music? I wonder if quality could be improved by
switching layers midstream... Do MPEG standards support that?

3- Bit reservoir & Joint Stereo. Maybe this is already done, but just in
case it isn't... If switching between M/S & L/R modes lowers the quality,
then why not make the switch only when the new mode (not using the bit
reservoir) will be of better quality than the previous mode (using the
reservoir)?

4- In M/S encoding, approximately how much bandwidth is offered to the mid
channel & the side channel when M & S are of similar amplitude?

5- I think Lame would benefit if it could be forced to use short blocks
more readily when there are sharp attacks mixed with analog silence,
especially for lower sample rates. I have a sample where a lot of pre-echo
is introduced. I'm using 320kbit/sec for 44.1kHz, & 160kBit/sec for
22.05kHz, & the pre-echo is noticeable on both, especially the 22.05kHz
version... I think it's because the encoder isn't switching to short
blocks. And I'm sure it's *not* because of resampling - I LPFed both the
samples at 10kHz & they both sounded (almost) the same as WAVs.

6- What's the difference between normal stereo & dual channel apart from
normal stereo allowing a more "free" allocation of bandwidth between the
channels? In which circumstances would it be preferred over normal stereo?
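
As background to questions 3 and 4, here is a sketch of the mid/side transform itself, using the usual 1/sqrt(2) normalization. It says nothing about how an encoder divides bits between M and S, which is the open question; the function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def to_mid_side(left, right):
    """Convert one L/R sample pair to mid/side."""
    return (left + right) / SQRT2, (left - right) / SQRT2


def to_left_right(mid, side):
    """Inverse transform: recover L/R from mid/side."""
    return (mid + side) / SQRT2, (mid - side) / SQRT2


# Identical channels put all energy into the mid channel.
print(to_mid_side(0.5, 0.5))
# Very different channels give M and S of similar amplitude.
print(to_mid_side(0.5, 0.0))
```

With this normalization the total energy of the pair is unchanged, so M and S of similar amplitude correspond to one channel being much louder than the other (or to uncorrelated channels).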

Shawn



RE: [MP3 ENCODER] Various

2000-04-27 Thread Mathew Hendry

 From: Shawn Riley [mailto:[EMAIL PROTECTED]]

 6- What's the difference between normal stereo & dual channel

In terms of bitstream format, nothing, apart from the frame header. Dual
channel is simply a hint to the decoder that the two channels are intended
to be played separately, rather than together as a stereo track.

 normal stereo allowing a more "free" allocation of bandwidth between the
channels?

AFAIK it doesn't. I'm not sure where that idea originated.

-- Mat.



Re: [MP3 ENCODER] Various

2000-04-27 Thread Gabriel Bouvigne

Mathew Hendry wrote:
 
  From: Shawn Riley [mailto:[EMAIL PROTECTED]]
 
  6- What's the difference between normal stereo & dual channel
 
 In terms of bitstream format, nothing, apart from the frame header. Dual
 channel is simply a hint to the decoder that the two channels are intended
 to be played separately, rather than together as a stereo track.
 
  normal stereo allowing a more "free" allocation of bandwidth between the
 channels?
 
 AFAIK it doesn't. I'm not sure where that idea originated.
 
 -- Mat.


Dual channel was made for use in dual-channel transmissions and the like.
In dual channel, each channel has to get exactly half of the bits.

In stereo, you're not constrained to 50% for each channel.

Regards,
-- 

Gabriel Bouvigne - France

www.mp3-tech.org



Re: [MP3 ENCODER] Various

2000-04-27 Thread Gabriel Bouvigne

Shawn Riley wrote:
 2- Are some people saying Layer2 is actually better than Layer3 at the same
 bitrates for some types of music? I wonder if quality could be improved by
 switching layers midstream... Do MPEG standards support that?

I think that it's forbidden by ISO.

Regards,
-- 

Gabriel Bouvigne - France

www.mp3-tech.org



RE: [MP3 ENCODER] Various

2000-04-27 Thread Mathew Hendry

 From: Gabriel Bouvigne [mailto:[EMAIL PROTECTED]]
 
 In dual channel, each channel has to get exactly half of the bits.

Do you have a reference for that in the ISO/IEC docs? Throughout 11172-3
stereo and dual_channel seem to be treated as entirely equivalent.

-- Mat.



Re: [MP3 ENCODER] Various

2000-04-27 Thread Ross Levis

Mathew Hendry wrote:

  normal stereo allowing a more "free" allocation of bandwidth between the
 channels?

 AFAIK it doesn't. I'm not sure where that idea originated.

I have been under the impression for several years that Stereo (mode 0) shares
bits between the channels. If one channel was more complex than the other then
it would allocate more bits to the channel that required them.  I presume this
is what LAME is doing, is it not?

Ross.




RE: [MP3 ENCODER] Various

2000-04-27 Thread Mathew Hendry

 From: Ross Levis [mailto:[EMAIL PROTECTED]]
 
 I have been under the impression for several years that Stereo (mode 0)
 shares bits between the channels. If one channel was more complex than
 the other then it would allocate more bits to the channel that required
 them.  I presume this is what LAME is doing, is it not?

Yes it is. The question is whether dual_channel is more restricted than
that.

-- Mat.



Re: [MP3 ENCODER] Various

2000-04-27 Thread Ross Levis

 Yes it is. The question is whether dual_channel is more restricted than
 that.

Dual-channel is just what the name suggests.  Each channel is completely
independent.  I don't see any advantage in using dual-channel.

Ross.
