Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
> MP3 uses 576 and 192. When 576 is too low for tonal music and 192 too long for percussion, then this is right. But a 1:8 ratio can create other problems. Note that MD uses 128, 256, 512 and 1024 sample blocks. Useful are block sizes from 1 ms ... 35 ms.

Minidisc also uses mixed windows. Perhaps mixed windows would help in our case.

I've got another question about window sizes: are the short ones really essential in VBR? Would it be possible to use only long ones, and then allocate a lot more bits in the case of transients? After all, Xing uses only long ones, and does a not-so-bad job on transients for an encoder using only long ones. (Note: I'm not saying that Xing is a reference in terms of quality.)

> 5. Spectral prefiltering to get nearly constant ATH in every CB.

Why can we read in the literature that humans have 25 CBs but MP3 uses only 22?

> I believe noise shaping is the main difference between different MP3 encoders. I'm sure MPEG did not document any good noise shaping algorithms on purpose :-) There are a few simple things in the literature, but I've never found any documentation of a noise shaping algorithm used in an actual commercial encoder.

Have you tried digging into audio patents? It might bring an idea.

Regards,

--
Gabriel Bouvigne - France
MP3' Tech: www.mp3-tech.org

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
On Sun, Sep 24, 2000 at 10:57:39AM +0200, Gabriel Bouvigne wrote:

> I've got another question about window sizes: are the short ones really essential in VBR? Would it be possible to use only long ones, and then allocate a lot more bits in the case of transients? After all, Xing uses only long ones, and does a not-so-bad job on transients for an encoder using only long ones. (Note: I'm not saying that Xing is a reference in terms of quality.)

Tested with a synthesized signal:

    --noshort -b128:               awful
    --noshort -b320:               bad
    --noshort -b550 --freeformat:  decoder SIGSEGV
    -b320:                         good, but distinguishable from the original without any effort (20/20)
    -b550 --freeformat:            okay

Note: All-purpose lossless compression utilities gave a better compression ratio:

            input uses round-to-nearest-   input uses
            integer quantization           HQ quantization
    gzip    190 kbps                       74 kbps
    bzip    154 kbps                       68 kbps

Very short attacks seem to be a nightmare for MP3. The signal is:

* white noise
* attack time: 0.5 ms
* release time: 25 ms
* pause time to fill the bit pool: 474.5 ms
* both channels are uncorrelated
* all attacks are different and also sound a little bit different

Note: The percussion attacks in "Money for nothing" are somewhat similar to these attacks:

* white noise from 1...18 kHz (+/- 3 dB)
* attack time: ca. 0.5 ms
* release time: ca. 20...30 ms
* but: no silence between the attacks

How do I capture Win95 screen shots? Which utility would be best?

--
Frank Klemm
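The signal described above can be sketched in a few lines of Python. This is only an illustration, not Frank's C++ generator: the function names, the linear fade shapes, the uniform noise distribution and the default sample rate are all assumptions, since the message gives only the three timing figures and says the channels are uncorrelated.

```python
import random

def noise_burst(fs=44100, attack_ms=0.5, release_ms=25.0, pause_ms=474.5, rng=None):
    """Return one period (attack, release, pause) of a mono noise burst.

    Samples are floats in [-1, 1]. The linear envelope and uniform noise
    are assumed shapes; only the three durations come from the message.
    """
    rng = rng or random.Random()
    n_attack = round(fs * attack_ms / 1000)
    n_release = round(fs * release_ms / 1000)
    n_pause = round(fs * pause_ms / 1000)
    samples = []
    for i in range(n_attack):      # fade in from near silence to full scale
        samples.append(rng.uniform(-1.0, 1.0) * (i + 1) / n_attack)
    for i in range(n_release):     # fade out from full scale to silence
        samples.append(rng.uniform(-1.0, 1.0) * (1 - (i + 1) / n_release))
    samples.extend([0.0] * n_pause)  # silence so the bit reservoir can refill
    return samples

def stereo_bursts(periods, fs=44100, seed=0):
    """Build left/right channels from independent draws, so every attack differs."""
    rng = random.Random(seed)
    left, right = [], []
    for _ in range(periods):
        left.extend(noise_burst(fs, rng=rng))
        right.extend(noise_burst(fs, rng=rng))
    return left, right
```

Calling `stereo_bursts(20)` gives ten seconds of material at one attack every 500 ms; the sample lists could then be scaled to 16-bit integers and written out with the standard `wave` module.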
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
On Sun, Sep 24, 2000 at 10:57:39AM +0200, Gabriel Bouvigne wrote:

> Minidisc also uses mixed windows. Perhaps mixed windows would help in our case. I've got another question about window sizes: are the short ones really essential in VBR? Would it be possible to use only long ones, and then allocate a lot more bits in the case of transients?

A long window has a duration of up to 36 ms (32 kHz). So the worst-case pre-echoes are:

* - 5 dB for dt = 12 ms (32 kHz) or dt = 9 ms (44.1 kHz)
* -12 dB for dt = 18 ms (32 kHz) or dt = 13 ms (44.1 kHz)
* -24 dB for dt = 24 ms (32 kHz) or dt = 17 ms (44.1 kHz)

Because this is much flatter than human pre-masking, you really need a huge number of additional bits. Often even 320 kbps sounds worse.

For post-masking I found values around 1 dB/6 ms for 1...5 kHz. What's the value for pre-masking?

> 5. Spectral prefiltering to get nearly constant ATH in every CB. Why can we read in the literature that humans have 25 CBs but MP3 uses only 22?

I think it is because the low-frequency CBs are larger than those in the literature. You have two problems:

* MP3 only uses CB widths that are a multiple of 4, perhaps to make use of the Intel SIMD instructions ;-)
* So all CBs have sizes that are multiples of 111 Hz/153 Hz/167 Hz, which can't be mapped to the CBs often found in the literature.
* The exact width of a CB is a little bit arbitrary; you can find values from 40 Hz...120 Hz for low frequencies. It depends on the exact definition of the term "CB". A lot of people say that 100 Hz for low frequencies is much too large.
* Note that Zwicker also splits the 3 low-frequency CBs into several subbands to compensate for the frequency dependence of the ATH (see Table 1 in DIN 45631). See also ISO 532: "Methode de calcul du niveau d'isononie"

Another question: I have some C++ programs generating test signals. The programs are around 1...2 KByte in size and generate WAV files in the range of 1...10 MByte. Some of them are really nasty for MP3.

Should we collect such programs?

--
Frank Klemm
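The figures in this message follow from simple arithmetic, sketched here in Python. The sketch assumes that a long block has 576 frequency lines per granule, that the long window spans two granules (1152 samples), and that "a multiple of 4" refers to 4 such lines; the function names are made up for illustration.

```python
GRANULE_LINES = 576               # frequency lines in one long-block granule
LONG_WINDOW = 2 * GRANULE_LINES   # assumed window span: two granules, 1152 samples

def long_window_ms(fs):
    """Duration of one long window in milliseconds at sample rate fs (Hz)."""
    return 1000.0 * LONG_WINDOW / fs

def line_width_hz(fs):
    """Width of one frequency line: the range 0..fs/2 split into 576 lines."""
    return (fs / 2) / GRANULE_LINES

def band_step_hz(fs, lines=4):
    """Smallest band-width step if widths come in multiples of `lines` lines."""
    return lines * line_width_hz(fs)

for fs in (32000, 44100, 48000):
    print(f"{fs} Hz: window {long_window_ms(fs):.1f} ms, "
          f"step {band_step_hz(fs):.0f} Hz")
```

Under these assumptions the loop reproduces the 36 ms window at 32 kHz and the 111 Hz/153 Hz/167 Hz steps quoted above.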
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
Hi Gaby

> Why can we read in the literature that humans have 25 CBs but MP3 uses only 22?

Let us try to get it in order:

The Bark scale is used by the spreading function.
Bark 0: 0-100 Hz, Bark 24: 15.5 - 20.4 kHz

Masking is calculated for convolution bands.
LAME uses 64 equidistant convolution bands from 0 Hz up to Nyquist.

Each of the 22 scalefactor bands is responsible for a group of subbands (the convolution bands), but we have only 21 scalefactors (12 scalefactors with short blocks).

Ciao Robert
Re[2]: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
Hello Frank,

Sunday, September 24, 2000, 7:43:06 PM, you wrote:

FK> How to capture Win95 Screen Shots? What utility would be the best?

Press the 'Print Screen' button (copy) and paste into Paintbrush... 8)

Best regards,
Dmitry
http://www.chat.ru/~dkutsanov/~index.htm
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
> > Why can we read in the literature that humans have 25 CBs but MP3 uses only 22?
>
> Let us try to get it in order: the Bark scale is used by the spreading function. Bark 0: 0-100 Hz, Bark 24: 15.5 - 20.4 kHz. Masking is calculated for convolution bands. LAME uses 64 equidistant convolution bands from 0 Hz up to Nyquist. Each of the 22 scalefactor bands is responsible for a group of subbands (the convolution bands), but we have only 21 scalefactors (12 scalefactors with short blocks).

So the highest subbands don't have any scalefactor? I know that Brandenburg said that there is no proof that frequencies above 16 kHz really contribute to the hearing of music, so it could be intentional, but could it be a "bug" or a mistake in the MP3 specs?

After all, I think that in 48 kHz encoding some frequencies higher than 16 kHz get a scalefactor, so it should theoretically be possible to assign a scalefactor.

Is there a scalefactor for 16 kHz in AAC? (Menno, are you listening?)

Also an off-topic question for Robert: as you're German, is there any specific knowledge about audio compression floating around in your university? (like specialists, research or theses)

Regards,

--
Gabriel Bouvigne - France
MP3' Tech: www.mp3-tech.org
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
"Gabriel Bouvigne" wrote:

> Is there a scalefactor for 16 kHz in AAC? (Menno, are you listening?)

AAC has scalefactor bands that fill the whole frequency range; scalefactors are calculated for all scalefactor bands.

Bye,

Menno
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
Gabriel Bouvigne wrote on Sun, 24 Sep 2000:

> So the highest subbands don't have any scalefactor? I know that Brandenburg said that there is no proof that frequencies above 16 kHz really contribute to the hearing of music, so it could be intentional, but could it be a "bug" or a mistake in the MP3 specs?

The only thing you can do for the highest bands is to adjust the global quantizer step size and then try to color the bands for which you do have scalefactors. This would require computing masking for sfb21 (sfb12 for short blocks).

> After all, I think that in 48 kHz encoding some frequencies higher than 16 kHz get a scalefactor, so it should theoretically be possible to assign a scalefactor. Is there a scalefactor for 16 kHz in AAC? (Menno, are you listening?)
>
> Also an off-topic question for Robert: as you're German, is there any specific knowledge about audio compression floating around in your university? (like specialists, research or theses)

Well, to my shame I must say that I don't know if there are any audio experts at the University of Dortmund. All I have learned about MPEG encoding I gathered in the last year, joining the LAME project in my spare time.

A book I have owned for a few days and can recommend is:

"Psychoacoustics, Facts and Models", E. Zwicker, H. Fastl; Second Updated Edition, Springer Series in Information Sciences 22; 1999
ISBN 3-540-65063-6, ISSN 0720-678X

Ciao Robert
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
::
:: So the highest subbands don't have any scalefactor? I know that
:: Brandenburg said that there is no proof that 16 kHz really contributes to
:: the hearing of music, so it could be intentional, but could
:: it be a "bug" or a mistake in the MP3 specs?
::

40 Hz...16 kHz (+0.2 dB, -0.3 dB)°) seems not to be enough to pass AB tests. 25 Hz...18 kHz seems to be sufficient, and 20 Hz...20 kHz is recommended.

These are values for a monotonically decreasing frequency response. Using a slight emphasis from fu - 1.5 kHz to fu significantly reduces the bandwidth needed. The easiest way to do this is with a static frequency response like:

    12.5 kHz    0.0 dB
    13   kHz   -0.2 dB
    13.5 kHz    0.0 dB
    14   kHz   +0.2 dB
    14.5 kHz   +0.4 dB
    15   kHz   +0.6 dB
    15.5 kHz   +0.8 dB
    16   kHz   +1.0 dB
    16.5 kHz   -oo  dB

The frequency response should be calculated so that the cochlear excitation caused by white noise is not changed. This should be possible for fu = 14 kHz. Better methods calculate this pre-emphasis dynamically from the actual signal.

I've tested the first method with fs = 29.4 kHz and got a nearly indistinguishable signal, compared to classical low-pass filtering with fu = 0.45*fs = 13.2 kHz, which results in "poor" quality.

To my mind 16 kHz is enough for music. Using some emphasis tricks makes this statement more secure.

Does someone have a piece of music with a triangle? For my experiments I still need some very tonal high-frequency samples.

:: After all, I think that in 48 kHz encoding some frequencies higher than 16 kHz get a
:: scalefactor, so it should theoretically be possible to assign a scalefactor.
::

No. The scalefactor band assignments are different for 32/44.1/48 kHz, so you get about 16 kHz for all fs.

                Long Blocks     Short Blocks
    32   kHz:   ...15.25 kHz    ...14.92 kHz
    44.1 kHz:   ...15.96 kHz    ...15.50 kHz
    48   kHz:   ...15.96 kHz    ...15.62 kHz

--
Best regards
Frank Klemm

PS:
°) minimum requirements for the frequency response of studio equipment.
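For anyone who wants to experiment with the static emphasis curve listed in this message, here is a small Python sketch that turns the dB figures into linear amplitude gains. The table values are taken from the message; the names `EMPHASIS_DB` and `db_to_gain` are invented for the example, and how the gains would be applied (FIR design, per-line scaling of the spectrum, ...) is not covered by the thread and is left open here.

```python
import math

# (frequency in kHz, gain in dB) as listed in the message; -inf is the brick wall.
EMPHASIS_DB = [
    (12.5, 0.0), (13.0, -0.2), (13.5, 0.0), (14.0, 0.2), (14.5, 0.4),
    (15.0, 0.6), (15.5, 0.8), (16.0, 1.0), (16.5, -math.inf),
]

def db_to_gain(db):
    """Convert an amplitude level in dB to a linear gain factor."""
    if db == -math.inf:
        return 0.0          # -oo dB means the component is removed entirely
    return 10.0 ** (db / 20.0)

EMPHASIS_GAIN = [(khz, db_to_gain(db)) for khz, db in EMPHASIS_DB]

for khz, gain in EMPHASIS_GAIN:
    print(f"{khz:5.1f} kHz  x{gain:.4f}")
```

The largest boost, +1.0 dB at 16 kHz, corresponds to an amplitude factor of about 1.12, so the emphasis is indeed slight.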
Re: [MP3 ENCODER] various questions (was: Some suggestions for LAME - please review)
> Hi Gaby
>
> > Why can we read in the literature that humans have 25 CBs but MP3 uses only 22?
>
> Let us try to get it in order: the Bark scale is used by the spreading function. Bark 0: 0-100 Hz, Bark 24: 15.5 - 20.4 kHz. Masking is calculated for convolution bands. LAME uses 64 equidistant convolution bands from 0 Hz up to Nyquist. Each of the 22 scalefactor bands is responsible for a group of subbands (the convolution bands), but we have only 21 scalefactors (12 scalefactors with short blocks).
>
> Ciao Robert

Some more info:

Barks, as used in MP3, are just a different way to measure frequency. The conversion is given in freq2bark() in util.c.

There is nothing magic about 22 or 25. The important thing is that the bands have about the same width when measured in Barks. In MP3, each band is about .9 Barks, and in AAC, each band is about .5 Barks. (AAC has 49 bands, IIRC.)

MP3 and AAC psychoacoustics actually compute everything in bands about .33 Barks wide (the 64 convolution bands Robert mentioned), and then this information is mapped to the 22 scalefactor bands.

Mark
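To make the "different way to measure frequency" point concrete, here is a minimal Python sketch of a Hz-to-Bark conversion. It uses a widely cited Zwicker-style arctangent approximation; whether freq2bark() in util.c uses exactly this form is not stated in the thread, so the function below (and its name) should be read as an assumption, not as LAME's code.

```python
import math

def freq2bark_approx(freq_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Zwicker-style approximation: 13*atan(0.76 f) + 3.5*atan((f/7.5)^2),
    with f in kHz. Assumed here for illustration only.
    """
    khz = freq_hz / 1000.0
    return 13.0 * math.atan(0.76 * khz) + 3.5 * math.atan((khz / 7.5) ** 2)

def band_width_bark(lo_hz, hi_hz):
    """Width of the band [lo_hz, hi_hz] measured in Bark."""
    return freq2bark_approx(hi_hz) - freq2bark_approx(lo_hz)

for f in (100, 1000, 4000, 16000):
    print(f"{f:6d} Hz -> {freq2bark_approx(f):5.2f} Bark")
```

With this approximation 100 Hz lands close to 1 Bark, consistent with Robert's "Bark 0: 0-100 Hz", and `band_width_bark` can be used to check how evenly a given set of band edges is spaced on the Bark scale.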
[MP3 ENCODER] Various
I have a few questions and ideas - potentially stupid, but they've been bugging me. I'd try all the ideas myself, except I can't get LAME to compile and I don't have a clue how to implement them anyway.

1- Is it possible to change the sample rate by encoding frames using other than 1152 samples? As an example, if we used a 44.1kHz WAV, making LAME encode 2304 samples for each frame, purging all frequencies over 11025Hz or the specified LPF, could we write it as a valid 22050Hz MP3 without actually doing any resampling? (or something along those lines...) There'd probably have to be a slight time/pitch shift for non-integer resampling ratios, which would be worse for upsampling, but I think it might *sound* better than resampling per se. Normal resampling for upsampling, this routine for downsampling?

2- Are some people saying Layer2 is actually better than Layer3 at the same bitrates for some types of music? I wonder if quality could be improved by switching layers midstream... Do MPEG standards support that?

3- Bit reservoir and Joint Stereo. Maybe this is already done, but just in case it isn't... If switching between M/S and L/R modes lowers the quality, then why not make the switch only when the new mode (not using the bit reservoir) will be of better quality than the previous mode (using the reservoir)?

4- In M/S encoding, approximately how much bandwidth is offered to the mid channel and the side channel when M and S are of similar amplitude?

5- I think LAME would benefit if it could be forced to use short blocks more readily when there are sharp attacks mixed with analog silence, especially for lower sample rates. I have a sample where a lot of pre-echo is introduced. I'm using 320kbit/sec for 44.1kHz and 160kbit/sec for 22.05kHz, and the pre-echo is noticeable on both, especially the 22.05kHz version... I think it's because the encoder isn't switching to short blocks. And I'm sure it's *not* because of resampling - I LPFed both the samples at 10kHz and they both sounded (almost) the same as WAVs.

6- What's the difference between normal stereo and dual channel, apart from normal stereo allowing a more "free" allocation of bandwidth between the channels? In which circumstances would it be preferred over normal stereo?

Shawn
RE: [MP3 ENCODER] Various
From: Shawn Riley

> 6- What's the difference between normal stereo and dual channel

In terms of bitstream format, nothing, apart from the frame header. Dual channel is simply a hint to the decoder that the two channels are intended to be played separately, rather than together as a stereo track.

> apart from normal stereo allowing a more "free" allocation of bandwidth between the channels?

AFAIK it doesn't. I'm not sure where that idea originated.

-- Mat.
Re: [MP3 ENCODER] Various
Mathew Hendry wrote:

> From: Shawn Riley
>
> > 6- What's the difference between normal stereo and dual channel
>
> In terms of bitstream format, nothing, apart from the frame header. Dual channel is simply a hint to the decoder that the two channels are intended to be played separately, rather than together as a stereo track.
>
> > apart from normal stereo allowing a more "free" allocation of bandwidth between the channels?
>
> AFAIK it doesn't. I'm not sure where that idea originated.

Dual channel was made for use in dual-channel transmissions and similar applications. In dual channel, each channel has to get exactly half of the bits. In stereo, you're not constrained to 50% for each channel.

Regards,

--
Gabriel Bouvigne - France
www.mp3-tech.org
Re: [MP3 ENCODER] Various
Shawn Riley wrote:

> 2- Are some people saying Layer2 is actually better than Layer3 at the same bitrates for some types of music? I wonder if quality could be improved by switching layers midstream... Do MPEG standards support that?

I think that it's forbidden by ISO.

Regards,

--
Gabriel Bouvigne - France
www.mp3-tech.org
RE: [MP3 ENCODER] Various
From: Gabriel Bouvigne

> In dual channel, each channel has to get exactly half of the bits.

Do you have a reference for that in the ISO/IEC docs? Throughout 11172-3, stereo and dual_channel seem to be treated as entirely equivalent.

-- Mat.
Re: [MP3 ENCODER] Various
Mathew Hendry wrote:

> > normal stereo allowing a more "free" allocation of bandwidth between the channels?
>
> AFAIK it doesn't. I'm not sure where that idea originated.

I have been under the impression for several years that Stereo (mode 0) shares bits between the channels. If one channel is more complex than the other, then the encoder allocates more bits to the channel that requires them. I presume this is what LAME is doing, is it not?

Ross.
RE: [MP3 ENCODER] Various
From: Ross Levis

> I have been under the impression for several years that Stereo (mode 0) shares bits between the channels. If one channel is more complex than the other, then the encoder allocates more bits to the channel that requires them. I presume this is what LAME is doing, is it not?

Yes it is. The question is whether dual_channel is more restricted than that.

-- Mat.
Re: [MP3 ENCODER] Various
> Yes it is. The question is whether dual_channel is more restricted than that.

Dual channel is just what the name suggests. Each channel is completely independent. I don't see any advantage in using dual channel.

Ross.