[MP3 ENCODER] Bug with -q1 (win32)
I'm using the new Win32 compiles from Dmitry. There appears to be a bug in LAME 3.84 CVS where it won't start encoding some WAV files (LAME -q1 file.wav). It just sits there doing nothing, probably stuck in an infinite loop. -q2 works fine. I managed to encode an entire album, but on another album there are 4 tracks which will not encode. All files were ripped from CD. I've cut the first half second of one of these tracks and put it here for debugging purposes: http://www.enternet.co.nz/users/ross/test.zip (57 KB).

Ross.

-- MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
[MP3 ENCODER] Looking for educated guess or explanation
Hello,

After a few dozen of those frequency analysis graphs, I noticed something that made me curious: the 16+ kHz region of a VBR-encoded file vs. the 256S CBR. http://users.belgacom.net/gc247244/extra/why_oh_why.png [8 KB] is a _very_ striking illustration, and I started thinking about this. As you can see, the shape of the curve is still there, but there is a constant dB drop in the 16-22 kHz region. Why is this? The best I could come up with:

- VBR: the psy-model does noise calculations and decides at many instances that the 16-22 kHz (background) noise is below the hearing threshold, so it decides upon a lower bitrate, and the MDCT coefficients from the high frequencies, which are the least important, are scaled: 0 bits available. Hence the averaged FFT sum of the decoded mp3 shows a drop in dB. At points where it is needed (above threshold), the high band is represented correctly.
- a bug (aka "feature")

I'm trying to get some understanding of this matter, so please, I'd like some educated material or guesses...

Q: Why is the drop so pronounced in the 16 kHz band (the 32nd band? [I keep hearing "22", but probably 16-22 is meant?])? Why is there no gradual decrease in dB progressing with frequency, throughout the different bands?

thanks :)

-- 
Best regards,
Roel mailto:[EMAIL PROTECTED]
[MP3 ENCODER] Analysers (Seen on freshmeat.net)
Hello,

*Perhaps* you'll find it useful... http://www.cmis.csiro.au/dmis/maaate/ (and for people who are very new to MP3: http://www.cmis.csiro.au/dmis/maaate/layer3.txt Now I can understand more posts than before! ;-) )

Also, there's a program named mp3_check that you can find with a search on freshmeat.

Sorry if you find this post stupid or useless...

Pierre Darbon
Re: [MP3 ENCODER] Silence Detection Problems
Okay, I isolated a 3-second clip from that track:

http://home.hiwaay.net/~stephena/near-silence.wav
http://home.hiwaay.net/~stephena/near-silence.mp3

I was wrong: earlier LAME versions encode it the same way as the newer alphas. I am still surprised, however, at the bitrate selected for this piece. VBR 2 chooses 112 kbit/s for 83% of the frames in this 3-second clip. With my volume maxed out I can barely hear the "sound". I guess I'm just a little surprised that LAME doesn't choose a more aggressive bitrate, considering the output level is so low.

I haven't started to dig into the source yet, but I wonder if a more aggressive VBR selection could be used, based on the output level of a frame relative to the peak (or average) output of the entire piece. The 3-second clip is quiet throughout, so maybe 112 could be justified (in case the volume is maxed). But in terms of the original track, the three cropped seconds could be considered silence in comparison to the peak or average output level of the whole track. Is this a ridiculous idea?

Steve

On 22 May 2000, at 19:18, Stephen Anderson wrote:
> The files that I was talking about are located at:
> http://home.hiwaay.net/~stephena/test1.wav
> http://home.hiwaay.net/~stephena/test1.mp3
> Steve
>
> On 22 May 2000, at 18:09, Stephen Anderson wrote:
> > I believe I have detected a problem with the silence detection routines after the modification for them to detect 16 kHz. I have encoded Pink Floyd - A Momentary Lapse of Reason - A New Machine (part 2). Approximately half of the 38-second track is what I would consider (and what looks in a wave analyzer like) analog silence. However, using dk's lame384a1_b and the settings:
> >
> > -m j -h -v -V 2 -b 96
> >
> > I get the following statistics from musicutter:
> >
> > Bitrate  Frames  Percentage
> >       0       0        0.0%
> >      32       1        0.1%
> >      40       0        0.0%
> >      48       0        0.0%
> >      56       0        0.0%
> >      64       1        0.1%
> >      80       0        0.0%
> >      96      26        1.7%
> >     112     362       24.3%
> >     128     593       39.9%
> >     160     457       30.7%
> >     192      39        2.6%
> >     224       5        0.3%
> >     256       2        0.1%
> >     320       2        0.1%
> >
> > Average bitrate: 135.7
> > Length: 00:38.87
> > Total frames: 1488
> >
> > Previously, on tracks with lots of silence I got lots of 32 kbit/s frames. In this piece I have 1. I think this is a mistake. I will make the wav and mp3 accessible on a web server for those interested in taking a look. With my 56k modem it will take a while, though. I will reply to this message with the address once the upload to my web server is complete.
> >
> > Thanks.
> > Stephen Anderson [EMAIL PROTECTED]
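As a quick cross-check of the musicutter statistics above: every MPEG-1 Layer III frame covers the same 1152 samples, so the average bitrate is just the frame-weighted mean of the per-frame bitrates. The sketch below assumes a 44.1 kHz sample rate (CD audio) and recomputes the summary lines from the histogram; it is not musicutter's own code.

```python
# Frame counts per bitrate (kbit/s), copied from the musicutter output above.
histogram = {
    0: 0, 32: 1, 40: 0, 48: 0, 56: 0, 64: 1, 80: 0, 96: 26,
    112: 362, 128: 593, 160: 457, 192: 39, 224: 5, 256: 2, 320: 2,
}

SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
SAMPLE_RATE = 44100       # assumption: CD audio


def summarize(hist):
    """Return (total frames, average bitrate in kbit/s, length in seconds)."""
    total = sum(hist.values())
    # All frames have equal duration, so weight each bitrate by its frame count.
    average = sum(rate * count for rate, count in hist.items()) / total
    seconds = total * SAMPLES_PER_FRAME / SAMPLE_RATE
    return total, average, seconds


total, average, seconds = summarize(histogram)
print(f"Total frames: {total}")
print(f"Average bitrate: {average:.1f}")
print(f"Length: {seconds:.2f} s")
for rate, count in histogram.items():
    print(f"{rate:4d} {count:5d} {100 * count / total:5.1f}%")
```

It reproduces the reported totals (1488 frames, 135.7 kbit/s, 38.87 s), which suggests the histogram was transcribed correctly.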
Re: [MP3 ENCODER] Looking for educated guess or explanation
> Hello,
>
> After a few dozen of those frequency analysis graphs, I noticed something that made me curious: the 16+ kHz region of a VBR-encoded file vs. the 256S CBR. http://users.belgacom.net/gc247244/extra/why_oh_why.png [8 KB] is a _very_ striking illustration, and I started thinking about this. As you can see, the shape of the curve is still there, but there is a constant dB drop in the 16-22 kHz region. Why is this? The best I could come up with:
> - VBR: the psy-model does noise calculations and decides at many instances that the 16-22 kHz (background) noise is below the hearing threshold, so it decides upon a lower bitrate, and the MDCT coefficients from the high frequencies, which are the least important, are scaled: 0 bits available. Hence the averaged FFT sum of the decoded mp3 shows a drop in dB. At points where it is needed (above threshold), the high band is represented correctly.
> - a bug (aka "feature")
> I'm trying to get some understanding of this matter, so please, I'd like some educated material or guesses...
> Q: Why is the drop so pronounced in the 16 kHz band (the 32nd band? [I keep hearing "22", but probably 16-22 is meant?])? Why is there no gradual decrease in dB progressing with frequency, throughout the different bands?

Actually, there really are 22 "critical bands" or "scale factor bands" used by MP3. I guess we should stick to the C convention and call the last band the 21st band. Here are the frequency ranges, along with the ATH:

sfb= 0  freq(kHz):  0.00 ..  0.15  ATH= -93.46
sfb= 1  freq(kHz):  0.15 ..  0.31  ATH=-103.59
sfb= 2  freq(kHz):  0.31 ..  0.46  ATH=-106.77
sfb= 3  freq(kHz):  0.46 ..  0.61  ATH=-108.40
sfb= 4  freq(kHz):  0.61 ..  0.77  ATH=-109.43
sfb= 5  freq(kHz):  0.77 ..  0.92  ATH=-110.16
sfb= 6  freq(kHz):  0.92 ..  1.15  ATH=-111.02
sfb= 7  freq(kHz):  1.15 ..  1.38  ATH=-111.76
sfb= 8  freq(kHz):  1.38 ..  1.68  ATH=-112.81
sfb= 9  freq(kHz):  1.68 ..  1.99  ATH=-114.04
sfb=10  freq(kHz):  1.99 ..  2.37  ATH=-115.84
sfb=11  freq(kHz):  2.37 ..  2.83  ATH=-117.92
sfb=12  freq(kHz):  2.83 ..  3.45  ATH=-118.98
sfb=13  freq(kHz):  3.45 ..  4.21  ATH=-118.92
sfb=14  freq(kHz):  4.21 ..  5.13  ATH=-116.48
sfb=15  freq(kHz):  5.13 ..  6.20  ATH=-113.20
sfb=16  freq(kHz):  6.20 ..  7.50  ATH=-111.72
sfb=17  freq(kHz):  7.50 ..  9.11  ATH=-110.10
sfb=18  freq(kHz):  9.11 .. 11.03  ATH=-106.49
sfb=19  freq(kHz): 11.03 .. 13.09  ATH= -98.69
sfb=20  freq(kHz): 13.09 .. 16.00  ATH= -84.16
sfb=21  freq(kHz): 16.00 .. 22.05  ATH= -48.04

MP3 has a "global_gain" variable, which controls the total number of bits allocated for the frame. You allocate bits among these 22 bands by adjusting the scalefactors, but there are only scalefactors for bands 0..20.

The ATH is given in dB, normalized so that the loudest possible 3 kHz sine wave is about 1 dB. (The normalization is a little different from the dB used in your plot.) The value of the ATH in band 21 means that humans cannot hear a 16 kHz signal which is weaker than -48 dB. Perceptual audio coding takes this one step further and claims that the encoding will be perceptually identical to the original if the strength of the noise/error component of the signal is less than -48 dB. The total allowed noise is actually given as the maximum of the maskings computed by psymodel.c and the ATH. But for band 21, psymodel.c does not compute any maskings.

In your case, take the signal at 16 kHz:

original  = .01  * 16khz_sine_wave   energy = 1.0e-4 = -40 dB
quantized = .004 * 16khz_sine_wave   energy = 1.6e-5 = -48 dB

Now define: error = quantized - original. LAME spends most of its time trying to get the volume of the error signal below some tolerance. In this case:

error = -.006 * 16khz_sine_wave   energy = 3.6e-5 = -44 dB

The signal that is sent to your speaker when playing the MP3 file is the quantized signal, which is equal to:

quantized = original + error

So LAME VBR spends its time trying to make sure that the volume of the error signal is less than the ATH. If the volume of the error were -50 dB, then the error (as a separate signal) would be inaudible. So the claim is that original + inaudible_error sounds the same as the original.
In this example, the error has a volume of -44 dB, which is larger than the ATH, and thus LAME VBR should have allocated more bits. But these numbers are all approximate, and we would have to look at them at the frame level.

Mark
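Mark's 16 kHz example can be checked numerically. The sketch below follows his simplification of treating the original, the quantized signal and the error as scaled copies of the same reference 16 kHz sine, so a scale factor `a` corresponds to an energy of `a**2`. The helper name `scale_to_db` is made up for illustration; the ATH value is the sfb 21 entry from his table.

```python
import math

ATH_SFB21_DB = -48.04  # sfb 21 (16.00 .. 22.05 kHz) from the table above


def scale_to_db(a):
    """Level of a * sine relative to the reference sine: 10*log10(a**2)."""
    return 10 * math.log10(a * a)


original = 0.01                 # about -40 dB
quantized = 0.004               # about -48 dB
error = quantized - original    # -0.006, so quantized == original + error

for name, a in [("original", original), ("quantized", quantized), ("error", error)]:
    print(f"{name:9s} {a:+.3f} * sine  energy {a * a:.1e}  {scale_to_db(a):6.1f} dB")

audible = scale_to_db(error) > ATH_SFB21_DB
print("error above ATH, more bits needed" if audible else "error below ATH")
```

The sign of the error does not change its energy, which is why -0.006 and +0.006 both give about -44.4 dB.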
Re: [MP3 ENCODER] Silence Detection Problems
> I was wrong: earlier LAME versions encode it the same way as the newer alphas. I am still surprised, however, at the bitrate selected for this piece. VBR 2 chooses 112 kbit/s for 83% of the frames in this 3-second clip. With my volume maxed out I can barely hear the "sound". I guess I'm just a little surprised that LAME doesn't choose a more aggressive bitrate, considering the output level is so low.

Right now LAME uses the ATH as a threshold: below it, analog silence is detected, but above it, the frame is encoded at full quality. Since LAME doesn't know how loud you will play your mp3s, the volume of the input is just a scaling that is removed during encoding through the "global_gain" variable.

> I haven't started to dig into the source yet, but I wonder if a more aggressive VBR selection could be used, based on the output level of a frame relative to the peak (or average) output of the entire piece. The 3-second clip is quiet throughout, so maybe 112 could be justified (in case the volume is maxed). But in terms of the original track, the three cropped seconds could be considered silence in comparison to the peak or average output level of the whole track. Is this a ridiculous idea?

It's not a ridiculous idea, but I would be afraid to get involved, since it sounds like yet more settings and thresholds that have to be tuned!

Mark
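To illustrate Mark's point that input volume is just a scaling absorbed by global_gain, here is a simplified Layer III quantizer sketch. It assumes long blocks and ignores scalefactors, subblock gain and the Huffman stage, keeping only the step size 2^((global_gain - 210)/4) and the 3/4-power rounding; `quantize` and the coefficient values are hypothetical, not LAME's code. Each global_gain step is a factor of 2^(1/4) (about 1.5 dB), so doubling the input (+6 dB) yields the same integers once global_gain is raised by 4.

```python
def quantize(xr, global_gain):
    """Simplified MP3 Layer III quantization of MDCT coefficients xr.

    Ignores scalefactors, subblock gain and bit counting.
    """
    step = 2.0 ** ((global_gain - 210) / 4.0)
    ix = []
    for x in xr:
        # 3/4-power law; adding 0.4054 before truncating equals nint(y - 0.0946)
        q = int((abs(x) / step) ** 0.75 + 0.4054)
        ix.append(q if x >= 0 else -q)
    return ix


coeffs = [0.5, -0.25, 0.1, 0.02, -0.001]   # hypothetical MDCT values
louder = [2 * x for x in coeffs]           # same signal, 6.02 dB louder

print(quantize(coeffs, 150))
print(quantize(louder, 154))               # identical integers
```

In this simplified model only the global_gain value differs between the two encodings, so a quiet passage costs about as many bits as the same passage played louder.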
Re: [MP3 ENCODER] Bug with -q1 (win32)
> I'm using the new Win32 compiles from Dmitry. There appears to be a bug in LAME 3.84 CVS where it won't start encoding some WAV files (LAME -q1 file.wav). It just sits there doing nothing, probably stuck in an infinite loop. -q2 works fine. I managed to encode an entire album, but on another album there are 4 tracks which will not encode. All files were ripped from CD.

The -q option is for internal testing only :-) -q1 enables some of the more thorough scalefactor searching code, which hasn't been worked on in a while. I was hoping Iwasa would put in some of his code, but it isn't done yet. Try -Z for something similar.

Mark
Re[2]: [MP3 ENCODER] Looking for educated guess or explanation
Hello Mark,

Tuesday, May 23, 2000, 8:14:18 PM, you wrote:

MT Actually, there really are 22 "critical bands" or "scale factor bands"
MT used by MP3. I guess we should stick to the C convention, and call the
MT last band the 21st band. Here are the frequency ranges,
MT along with the ATH:

MT sfb= 0  freq(kHz):  0.00 ..  0.15  ATH= -93.46
MT sfb= 1  freq(kHz):  0.15 ..  0.31  ATH=-103.59
MT sfb= 2  freq(kHz):  0.31 ..  0.46  ATH=-106.77
MT sfb= 3  freq(kHz):  0.46 ..  0.61  ATH=-108.40
MT sfb= 4  freq(kHz):  0.61 ..  0.77  ATH=-109.43
MT sfb= 5  freq(kHz):  0.77 ..  0.92  ATH=-110.16
MT sfb= 6  freq(kHz):  0.92 ..  1.15  ATH=-111.02
MT sfb= 7  freq(kHz):  1.15 ..  1.38  ATH=-111.76
MT sfb= 8  freq(kHz):  1.38 ..  1.68  ATH=-112.81
MT sfb= 9  freq(kHz):  1.68 ..  1.99  ATH=-114.04
MT sfb=10  freq(kHz):  1.99 ..  2.37  ATH=-115.84
MT sfb=11  freq(kHz):  2.37 ..  2.83  ATH=-117.92
MT sfb=12  freq(kHz):  2.83 ..  3.45  ATH=-118.98
MT sfb=13  freq(kHz):  3.45 ..  4.21  ATH=-118.92
MT sfb=14  freq(kHz):  4.21 ..  5.13  ATH=-116.48
MT sfb=15  freq(kHz):  5.13 ..  6.20  ATH=-113.20
MT sfb=16  freq(kHz):  6.20 ..  7.50  ATH=-111.72
MT sfb=17  freq(kHz):  7.50 ..  9.11  ATH=-110.10
MT sfb=18  freq(kHz):  9.11 .. 11.03  ATH=-106.49
MT sfb=19  freq(kHz): 11.03 .. 13.09  ATH= -98.69
MT sfb=20  freq(kHz): 13.09 .. 16.00  ATH= -84.16
MT sfb=21  freq(kHz): 16.00 .. 22.05  ATH= -48.04

aha... I thought the downside of mp3 was that, as a heritage of Layer 1 and Layer 2, it had a linear distribution of frequency ranges, but this doesn't look linear to me... So instead of 32 linear bands, just 22 logarithmic ones. Thanks, I'll do some reading up...

MT The ATH is given in dB, normalized so that the loudest possible 3 kHz
MT sine wave is about 1 dB. (The normalization is a little different from the
MT dB used in your plot.) The value of the ATH in band 21 means that
MT humans cannot hear a 16 kHz signal which is weaker than -48 dB.

That sounds plausible, because I did some highpass-filter listening tests on my music ("can I hear artifacts, or is it just imagination?"), and roughly perceived that I could hear -40 dB...

MT ...
MT inaudible. So the claim is that original + inaudible_error
MT sounds the same as the original.
MT ...

I'll have to think about what you said concerning the error etc... thanks for the lengthy explanation. My idea was just: is it necessarily bad that the _average_ volume of the error is -44 dB? If I display the frequency analysis in real time, the graph ends up much higher most of the time, and also lower at other times. I thought that at the moments where the original fell below that -48 dB threshold, the encoder made a quantized version that is even lower, but inaudible (so a bigger error, but no audible parts). Then later, when you sum over the whole song, the average error comes out bigger, at -44 dB, but not in the important parts?

I'll try to gain some more insight... thanks

-- 
Best regards,
Roel mailto:[EMAIL PROTECTED]
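Roel's intuition about averaging can be illustrated: if the analyser averages power (energy) over time, a few frames with a large error dominate the result, so a song-wide average of -44 dB does not mean every frame sits at -44 dB. The per-frame levels below are hypothetical, not measured from his file, and whether his analyser averages power or dB values is an assumption.

```python
import math


def energy_mean_db(levels_db):
    """Average in the energy domain, then convert back to dB."""
    mean_energy = sum(10 ** (d / 10) for d in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


# Hypothetical error levels for four frames: three quiet, one loud.
frames_db = [-60.0, -60.0, -60.0, -38.0]

print(f"energy average: {energy_mean_db(frames_db):.2f} dB")
print(f"mean of dB values: {sum(frames_db) / len(frames_db):.2f} dB")
```

With these numbers the energy average is about -43.9 dB, even though three of the four frames have an error about 16 dB below that; the plain mean of the dB values would be -54.5 dB.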
[MP3 ENCODER] Lame: Use for 320 JS frames in vbr mode?
Hello,

Is there any use for 320 kbit/s frames in JS mode? Lame -V1 -mj -h -b128 gives them regularly, and I remember someone telling me that the maximum for each channel is 160 kbit/s anyway, so there would be no quality improvement? Or:

- are the saved bits used for the bit reservoir?
- is it to avoid switching too much between S and JS?
- some other reason?
- a possible oversight, because there is no reason to use JS?

Sorry for the question, but Rubén Castañón got me doubting...

-- 
Best regards,
Roel mailto:[EMAIL PROTECTED]
Re: [MP3 ENCODER] Looking for educated guess or explanation
On Tue, May 23, 2000 at 10:03:01PM +0200, Roel VdB ([EMAIL PROTECTED]) wrote:
> Hello Mark,
> Tuesday, May 23, 2000, 8:14:18 PM, you wrote:
> MT Actually, there really are 22 "critical bands" or "scale factor bands"
> MT used by MP3. I guess we should stick to the C convention, and call the
> MT last band the 21st band. Here are the frequency ranges,
> MT along with the ATH:
> aha... I thought the downside of mp3 was that, as a heritage of Layer 1 and Layer 2, it had a linear distribution of frequency ranges, but this doesn't look linear to me... So instead of 32 linear bands, just 22 logarithmic ones. Thanks, I'll do some reading up...

My understanding is: for encoding purposes, the audio is indeed divided into 32 bands by the polyphase filter, just like MP1 and MP2. However, MP3 improves on this by superimposing the scalefactor bands on top of these. This is an advantage because the SFBs correspond more closely to the critical bands. These 22 SFBs may cover less than 1 (for the lower ranges) or more than 1 (for the upper ranges) of the 32 linear bands created by the polyphase filter.

John
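John's point can be made concrete. Each long-block granule has 576 MDCT lines, and each of the 32 polyphase subbands feeds 18 of them, so counting lines per scalefactor band shows how many subbands each band spans. The boundary list below is, to my knowledge, the ISO 11172-3 long-block scalefactor band table for 44.1 kHz; treat it as an assumption to check against the standard, although it reproduces the kHz edges in Mark's table.

```python
SAMPLE_RATE = 44100
LINES_PER_GRANULE = 576
LINES_PER_SUBBAND = LINES_PER_GRANULE // 32   # 18 MDCT lines per polyphase subband

# Assumed long-block sfb boundaries (MDCT line indices) for 44.1 kHz.
SFB_LONG_44K = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90,
                110, 134, 162, 196, 238, 288, 342, 418, 576]


def sfb_table(sample_rate=SAMPLE_RATE):
    """Return (sfb, low kHz, high kHz, polyphase subbands spanned) per band."""
    line_khz = sample_rate / 2 / LINES_PER_GRANULE / 1000
    rows = []
    for sfb in range(len(SFB_LONG_44K) - 1):
        lo, hi = SFB_LONG_44K[sfb], SFB_LONG_44K[sfb + 1]
        rows.append((sfb, lo * line_khz, hi * line_khz, (hi - lo) / LINES_PER_SUBBAND))
    return rows


for sfb, f_lo, f_hi, n_sub in sfb_table():
    print(f"sfb={sfb:2d} {f_lo:6.2f} .. {f_hi:6.2f} kHz  spans {n_sub:5.2f} subbands")
```

The lowest bands each cover about a fifth of a polyphase subband, while sfb 21 spans almost 9 of them.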
Re: [MP3 ENCODER] Lame: Use for 320 JS frames in vbr mode?
> Hello,
> Is there any use for 320 kbit/s frames in JS mode? Lame -V1 -mj -h -b128 gives them regularly, and I remember someone telling me that the maximum for each channel is 160 kbit/s anyway, so there would be no quality improvement?

No such maximum. You can, for example, encode a mono file at 320 kbit/s.

Mark
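Mark's answer can be checked with frame-size arithmetic. Assuming MPEG-1 Layer III without CRC, a frame is floor(144 * bitrate / sample_rate) bytes plus an optional padding byte; after the 4-byte header and the side info (32 bytes for stereo, 17 for mono), the remaining main-data bits form one pool that the encoder divides between the channels as it sees fit (and the bit reservoir can shift bits between frames). The function names are illustrative only.

```python
HEADER_BYTES = 4


def frame_bytes(bitrate_kbps, sample_rate, padding=0):
    """MPEG-1 Layer III frame length: 1152 samples -> 144 * bitrate / sample_rate bytes."""
    return 144 * bitrate_kbps * 1000 // sample_rate + padding


def main_data_bits(bitrate_kbps, sample_rate, channels, padding=0):
    """Bits left for scalefactors and Huffman data after header and side info (no CRC)."""
    side_info = 32 if channels == 2 else 17
    return (frame_bytes(bitrate_kbps, sample_rate, padding) - HEADER_BYTES - side_info) * 8


for channels in (2, 1):
    bits = main_data_bits(320, 44100, channels)
    print(f"320 kbit/s, {channels} ch: {frame_bytes(320, 44100)} bytes/frame, "
          f"{bits} main-data bits, {bits // 2} per granule")
```

A 320 kbit/s frame leaves 8064 main-data bits in stereo or 8184 in mono, and the frame layout does not reserve a fixed half of them for each channel.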
[MP3 ENCODER] Fixed silly mp3input bug with libsndfile...
Right, I just committed a fix for a silly little bug which prevented mp3input from working together with libsndfile...

I also noticed that the --decode feature doesn't care about endianness, thus ending up with a big-endian WAV (read: garbage ;) ) on big-endian machines. I couldn't think up a quick fix for this due to the way this feature works, though it would be nice if it used libsndfile (and thus could write almost any type of file) when available... ;)

- CISC
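One portable way around the --decode endianness problem is to serialize every header field and sample with an explicit byte order instead of dumping native memory. The sketch below uses Python's `struct` with the `<` (little-endian) prefix, so the output is identical on big- and little-endian hosts; `write_wav_le` is a hypothetical helper for 16-bit PCM, not LAME's code or libsndfile's API.

```python
import io
import struct


def write_wav_le(fp, samples, sample_rate=44100, channels=2):
    """Write interleaved 16-bit PCM samples as a little-endian RIFF/WAVE file."""
    data = struct.pack("<%dh" % len(samples), *samples)  # '<' forces little-endian
    block_align = channels * 2                            # bytes per sample frame
    byte_rate = sample_rate * block_align
    fp.write(b"RIFF")
    fp.write(struct.pack("<I", 36 + len(data)))           # size of everything after this field
    fp.write(b"WAVE")
    fp.write(b"fmt ")
    fp.write(struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                         byte_rate, block_align, 16))     # PCM format, 16 bits per sample
    fp.write(b"data")
    fp.write(struct.pack("<I", len(data)))
    fp.write(data)


buf = io.BytesIO()
write_wav_le(buf, [1, -2, 256, 0])   # two stereo sample frames
print(buf.getvalue()[44:].hex())     # prints 0100feff00010000 (low byte first)
```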