[MP3 ENCODER] Bug with -q1 (win32)

2000-05-23 Thread Ross Levis

I'm using new Win32 compiles from Dmitry.  There appears to be a bug in LAME
3.84 CVS where it won't start encoding some WAV files (LAME -q1 file.wav).  It
just sits there doing nothing, probably in an infinite loop.  -q2 works fine.  I
managed to encode an entire album, but on another album there are 4 tracks which
will not encode.  All files were ripped from CD.

I've stripped a half second from the beginning of one track and put it here for
debugging purposes http://www.enternet.co.nz/users/ross/test.zip (57kb).

Ross.

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )



[MP3 ENCODER] Looking for educated guess or explanation

2000-05-23 Thread Roel VdB

Hello,

After a few dozen of those frequency analysis graphs, I noticed
something that made me curious: The 16+kHz region of a VBR encoded
file vs the 256S cbr.

http://users.belgacom.net/gc247244/extra/why_oh_why.png [8KB]

is a _very_ striking illustration, and I started thinking about this.

As you can see, the shape of the curve is still there, but there is a
constant dB drop in the 16-22kHz region.

Why is this?

The best I could come up with:
- VBR: the psy-model does its noise calculations, decides in many instances that
the 16-22kHz (background) noise is below the hearing threshold, and picks a lower
bitrate. The MDCT coefficients for the high freqs, which are the least
important, are then scaled with 0 bits available. => The averaged FFT sum of
the decoded mp3 shows a drop in dB.  At points where it is needed (above the
threshold), the high band is represented correctly.
- bug (aka "feature")

I'm trying to get some understanding in this matter, so please, I'd
like some educated material or guesses...

Q: Why is it so pronounced in the 16kHz band (32nd band? [I keep hearing
"22", but probably just 16-22 is meant?])
Why is there no gradual decrease in dB, progressing with freq, across the
different bands?

thanks :)

-- 
Best regards,
 Roel  mailto:[EMAIL PROTECTED]





[MP3 ENCODER] Analysers (Seen on freshmeat.net)

2000-05-23 Thread Pierre Darbon et Hurgon J.Sebastien

Hello,

*Perhaps* you'll find it useful...

http://www.cmis.csiro.au/dmis/maaate/

(and for people who are very new to MP3 :
http://www.cmis.csiro.au/dmis/maaate/layer3.txt
Now, I can understand more posts than before ! ;-) )

Also, there's a program named mp3_check that you can download after a
search via freshmeat.

Sorry if you find this post stupid or useless...


Pierre Darbon






Re: [MP3 ENCODER] Silence Detection Problems

2000-05-23 Thread Stephen Anderson

Okay, I isolated a 3 second clip from that track:

http://home.hiwaay.net/~stephena/near-silence.wav
http://home.hiwaay.net/~stephena/near-silence.mp3

I was wrong: earlier LAME versions encode the same way as the newer
alphas.  I am still surprised, however, at the bitrate selected for this
piece.  VBR 2 chooses 112 for 83% of the frames in this 3 second
clip.  With my volume maxed out I can barely hear the "sound".  I
guess I'm just a little surprised that LAME doesn't choose a more
aggressive bitrate considering the output level is so low.

I haven't started to dig into the source yet, but I wonder if a more
aggressive VBR selection could be used, based on the output level
of a frame relative to the peak (or average) output of the entire
piece.  The 3 second clip is all quiet, so maybe 112 could be
justified (in case the volume is maxed).  But, in terms of the original
track, the three seconds cropped could be considered silence in
comparison to the peak or average output level of the whole track.

Is this a ridiculous idea?

Steve

On 22 May 2000, at 19:18, Stephen Anderson wrote:

 The files that I was talking about are located at:
 
 http://home.hiwaay.net/~stephena/test1.wav
 http://home.hiwaay.net/~stephena/test1.mp3
 
 Steve
 
 
 On 22 May 2000, at 18:09, Stephen Anderson wrote:
 
  I believe I have detected a problem with the silence detection routines 
  after the modification for them to detect  16 kHz.
  
  I have encoded Pink Floyd - A Momentary Lapse of Reason - A New 
  Machine (part 2)
  
  Approximately half of the 38 second track is what I would consider 
  (and in a wave analyzer looks like) analog silence.  However, using 
  dk's lame384a1_b and the settings: -m j -h -v -V 2 -b 96
  I get the following statistics from musicutter:
  
  Bitrate - Frames - Percentage
  0 - 0 - 0.0%
  32 - 1 - 0.1%
  40 - 0 - 0.0%
  48 - 0 - 0.0%
  56 - 0 - 0.0%
  64 - 1 - 0.1%
  80 - 0 - 0.0%
  96 - 26 - 1.7%
  112 - 362 - 24.3%
  128 - 593 - 39.9%
  160 - 457 - 30.7%
  192 - 39 - 2.6%
  224 - 5 - 0.3%
  256 - 2 - 0.1%
  320 - 2 - 0.1%
  Average bitrate: 135.7
  Length: 00:38.87
  Total frames: 1488
  
  
  Previously on tracks with lots of silence I got lots of bitrate 32 frames. 
   In this piece I have 1.  I think this is a mistake.
  
  I will be making the wav and mp3 accessible on a webserver for 
  those interested to take a look at.  With my 56k modem it will take a 
  while though.  I will reply to this message once the upload to my 
  webserver is complete with the address that you can get the files 
  from.  Thanks.
  
  Stephen Anderson
  [EMAIL PROTECTED]
  
 
 
 





Re: [MP3 ENCODER] Looking for educated guess or explanation

2000-05-23 Thread Mark Taylor

 
 Hello,
 
 After a few dozen of those frequency analysis graphs, I noticed
 something that made me curious: The 16+kHz region of a VBR encoded
 file vs the 256S cbr.
 
 http://users.belgacom.net/gc247244/extra/why_oh_why.png [8KB]
 
 is a _very_ striking illustration, and I started thinking about this.
 
 As you can see, the shape of the curve is still there, but there is a
 constant dB drop in the 16-22kHz region.
 
 Why is this?
 
 The best I could come up with:
 - VBR: the psy-model does its noise calculations, decides in many instances that
 the 16-22kHz (background) noise is below the hearing threshold, and picks a lower
 bitrate. The MDCT coefficients for the high freqs, which are the least
 important, are then scaled with 0 bits available. => The averaged FFT sum of
 the decoded mp3 shows a drop in dB.  At points where it is needed (above the
 threshold), the high band is represented correctly.
 - bug (aka "feature")
 
 I'm trying to get some understanding in this matter, so please, I'd
 like some educated material or guesses...
 
 Q: Why is it so pronounced in the 16kHz band (32nd band? [I keep hearing
 "22", but probably just 16-22 is meant?])
 Why is there no gradual decrease in dB, progressing with freq, across the
 different bands?
 

Actually, there really are 22 "critical bands" or "scale factor bands"
used by MP3. I guess we should stick to the C convention, and call the
last band the 21st band.  Here are the frequency ranges,
along with the ATH:


sfb= 0 freq(khz):  0.00 .. 0.15  ATH=-93.46 
sfb= 1 freq(khz):  0.15 .. 0.31  ATH=-103.59
sfb= 2 freq(khz):  0.31 .. 0.46  ATH=-106.77
sfb= 3 freq(khz):  0.46 .. 0.61  ATH=-108.40
sfb= 4 freq(khz):  0.61 .. 0.77  ATH=-109.43
sfb= 5 freq(khz):  0.77 .. 0.92  ATH=-110.16
sfb= 6 freq(khz):  0.92 .. 1.15  ATH=-111.02
sfb= 7 freq(khz):  1.15 .. 1.38  ATH=-111.76
sfb= 8 freq(khz):  1.38 .. 1.68  ATH=-112.81
sfb= 9 freq(khz):  1.68 .. 1.99  ATH=-114.04
sfb=10 freq(khz):  1.99 .. 2.37  ATH=-115.84
sfb=11 freq(khz):  2.37 .. 2.83  ATH=-117.92
sfb=12 freq(khz):  2.83 .. 3.45  ATH=-118.98
sfb=13 freq(khz):  3.45 .. 4.21  ATH=-118.92
sfb=14 freq(khz):  4.21 .. 5.13  ATH=-116.48
sfb=15 freq(khz):  5.13 .. 6.20  ATH=-113.20
sfb=16 freq(khz):  6.20 .. 7.50  ATH=-111.72
sfb=17 freq(khz):  7.50 .. 9.11  ATH=-110.10
sfb=18 freq(khz):  9.11 ..11.03  ATH=-106.49
sfb=19 freq(khz): 11.03 ..13.09  ATH=-98.69 
sfb=20 freq(khz): 13.09 ..16.00  ATH=-84.16 
sfb=21 freq(khz): 16.00 ..22.05  ATH=-48.04 
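
For anyone who wants to play with the table above, here is a minimal Python sketch that looks up the scalefactor band and ATH for a given frequency. The band edges and ATH values are copied from the table; the names SFB_UPPER_KHZ, ATH_DB and sfb_for_freq are made up for this illustration and are not LAME identifiers.

```python
from bisect import bisect_right

# Upper edge (kHz) of each scalefactor band, copied from the table above.
SFB_UPPER_KHZ = [0.15, 0.31, 0.46, 0.61, 0.77, 0.92, 1.15, 1.38, 1.68, 1.99,
                 2.37, 2.83, 3.45, 4.21, 5.13, 6.20, 7.50, 9.11, 11.03, 13.09,
                 16.00, 22.05]

# ATH per band (dB), same normalization as the table above.
ATH_DB = [-93.46, -103.59, -106.77, -108.40, -109.43, -110.16, -111.02,
          -111.76, -112.81, -114.04, -115.84, -117.92, -118.98, -118.92,
          -116.48, -113.20, -111.72, -110.10, -106.49, -98.69, -84.16, -48.04]


def sfb_for_freq(freq_khz):
    """Return the 0-based scalefactor band index containing freq_khz."""
    if not 0.0 <= freq_khz <= SFB_UPPER_KHZ[-1]:
        raise ValueError("frequency outside 0..22.05 kHz")
    # A frequency equal to a band's upper edge is assigned to the next band,
    # except at the very top, where it stays in band 21.
    return min(bisect_right(SFB_UPPER_KHZ, freq_khz), len(SFB_UPPER_KHZ) - 1)


band = sfb_for_freq(16.0)
print(band, ATH_DB[band])
```

With the C-style numbering, 16 kHz lands in band 21, the only band with an ATH as high as -48.04.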

MP3 has a "global_gain" variable, which controls the total
number of bits allocated for the frame.  You allocate bits
among these 22 bands by adjusting the scalefactors, but
there are only scalefactors for bands 0..20.

The ATH is given in db, normalized so that the loudest possible 3khz
sine wave is about 1db. (normalization is a little different than the
db used in your plot) The value of the ATH in band 21 means that
humans cannot hear a 16khz signal which is weaker than -48db.

Perceptual audio coding takes this one step further and makes the
claim that the encoding will be perceptually identical to the original
if the strength of the noise/error component of the signal is less than -48db.

The total allowed noise is actually given as the maximum of the 
maskings computed by psymodel.c and the ATH.  But for band 21,
psymodel.c does not compute any maskings.
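
As a rough sketch of that rule (not the actual code in LAME, and allowed_noise_db is a hypothetical name), the allowed noise for one band could be written like this in Python, with masking_db=None standing in for "no masking computed", as in band 21:

```python
def allowed_noise_db(ath_db, masking_db=None):
    """Allowed noise for one band: the larger of the masking and the ATH.

    masking_db is None when no masking was computed for the band,
    in which case only the ATH applies.
    """
    if masking_db is None:
        return ath_db
    return max(masking_db, ath_db)


# Band 21: no masking, so the ATH alone sets the limit.
print(allowed_noise_db(-48.04))
# Band 20 with a hypothetical masking value of -60 dB: masking wins.
print(allowed_noise_db(-84.16, -60.0))
```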


In your case, take the signal at 16khz:

original = -40db   energy=10.0e-5   =  .01 * 16khz_sine_wave
quantized= -48db   energy= 1.6e-5   =  .004 * 16khz_sine_wave

Now define:  error = quantized - original.

LAME spends most of its time trying to get the volume of
the error signal below some tolerance.

In this case:
error = -.006 * 16khz_sine_wave   energy = 3.6e-5 = -44db

The signal that is sent to your speaker when playing
the MP3 file is the quantized signal, which is equal to:

quantized = original + error

So LAME VBR spends its time trying to make sure that the volume of
the error signal is less than the ATH.  If the volume of the error
was -50db, then the error (as a separate signal) would be
inaudible.  So the claim is that original + inaudible_error
sounds the same as the original.  

In this example, the error has a volume of
-44db, which is larger than the ATH and thus LAME VBR should
have allocated more bits.  But these numbers are all approximate
and we would have to look at them at the frame level.
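
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the numbers quoted above, using the same loose convention that energy is the squared amplitude factor; energy_db and the variable names are invented for the illustration.

```python
import math


def energy_db(amplitude):
    """Energy in dB of a sine wave scaled by `amplitude` (energy = amplitude**2)."""
    return 10.0 * math.log10(amplitude ** 2)


original = 0.01                 # amplitude factor of the original 16 kHz component
quantized = 0.004               # amplitude factor after quantization
error = quantized - original    # error signal, as defined above

ATH_BAND21_DB = -48.04          # ATH for sfb 21, from the table

print("original  %.1f dB" % energy_db(original))    # about -40 dB
print("quantized %.1f dB" % energy_db(quantized))   # about -48 dB
print("error     %.1f dB" % energy_db(error))       # about -44 dB

# The error is louder than the ATH, so more bits would have been needed.
needs_more_bits = energy_db(error) > ATH_BAND21_DB
print("error above ATH:", needs_more_bits)
```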

Mark






Re: [MP3 ENCODER] Silence Detection Problems

2000-05-23 Thread Mark Taylor


 
 I was wrong: earlier LAME versions encode the same way as the newer
 alphas.  I am still surprised, however, at the bitrate selected for this
 piece.  VBR 2 chooses 112 for 83% of the frames in this 3 second
 clip.  With my volume maxed out I can barely hear the "sound".  I
 guess I'm just a little surprised that LAME doesn't choose a more
 aggressive bitrate considering the output level is so low.
 
Right now LAME uses the ATH as a threshold: below that, analog
silence is detected, but above that, the frame is encoded at full
quality.  Since LAME doesn't know how loud you will play your mp3's,
the volume of the input is just a scaling that is removed during the
encoding through the "global_gain" variable.
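
To make that concrete, here is a small Python sketch of the kind of test described, assuming per-band energies in the same dB normalization as the ATH table. It is an illustration only, not the routine LAME actually uses; is_analog_silence and the sample frame values are hypothetical.

```python
def is_analog_silence(band_energy_db, ath_db):
    """True when the energy in every band is below that band's ATH."""
    if len(band_energy_db) != len(ath_db):
        raise ValueError("need one energy value per ATH entry")
    return all(e < a for e, a in zip(band_energy_db, ath_db))


# Two bands only, to keep the example short: ATH values for sfb 12 and sfb 21.
ath = [-118.98, -48.04]

quiet_frame = [-70.0, -60.0]     # very quiet, but sfb 12 is still above its ATH
silent_frame = [-125.0, -60.0]   # every band is below its ATH

print(is_analog_silence(quiet_frame, ath))
print(is_analog_silence(silent_frame, ath))
```

The quiet frame is far below full scale, yet it is still above the ATH in one band, so it is not treated as silence and gets encoded at full quality.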


 I haven't started to dig into the source yet, but I wonder if a more
 aggressive VBR selection could be used, based on the output level
 of a frame relative to the peak (or average) output of the entire
 piece.  The 3 second clip is all quiet, so maybe 112 could be
 justified (in case the volume is maxed).  But, in terms of the original
 track, the three seconds cropped could be considered silence in
 comparison to the peak or average output level of the whole track.
 
 Is this a ridiculous idea?
 

It's not a ridiculous idea, but I would be afraid to get involved,
since it sounds like even more settings and thresholds
that have to be tuned!

Mark



Re: [MP3 ENCODER] Bug with -q1 (win32)

2000-05-23 Thread Mark Taylor

 
 I'm using new Win32 compiles from Dmitry.  There appears to be a bug in LAME
 3.84 CVS where it won't start encoding some WAV files (LAME -q1 file.wav).  It
 just sits there doing nothing, probably in an infinite loop.  -q2 works fine.  I
 managed to encode an entire album, but on another album there are 4 tracks which
 will not encode.  All files were ripped from CD.
 

The -q option is for internal testing only :-)

-q1 enables some of the more thorough scalefactor searching
code that hasn't been worked on in a while. I was hoping
Iwasa would put in some of his code, but it isn't done yet.

Try -Z for something similar.

Mark



Re[2]: [MP3 ENCODER] Looking for educated guess or explanation

2000-05-23 Thread Roel VdB

Hello Mark,

Tuesday, May 23, 2000, 8:14:18 PM, you wrote:

MT Actually, there really are 22 "critical bands" or "scale factor bands"
MT used by MP3. I guess we should stick to the C convention, and call the
MT last band the 21st band.  Here are the frequency ranges,
MT along with the ATH:


MT sfb= 0 freq(khz):  0.00 .. 0.15  ATH=-93.46 
MT sfb= 1 freq(khz):  0.15 .. 0.31  ATH=-103.59
MT sfb= 2 freq(khz):  0.31 .. 0.46  ATH=-106.77
MT sfb= 3 freq(khz):  0.46 .. 0.61  ATH=-108.40
MT sfb= 4 freq(khz):  0.61 .. 0.77  ATH=-109.43
MT sfb= 5 freq(khz):  0.77 .. 0.92  ATH=-110.16
MT sfb= 6 freq(khz):  0.92 .. 1.15  ATH=-111.02
MT sfb= 7 freq(khz):  1.15 .. 1.38  ATH=-111.76
MT sfb= 8 freq(khz):  1.38 .. 1.68  ATH=-112.81
MT sfb= 9 freq(khz):  1.68 .. 1.99  ATH=-114.04
MT sfb=10 freq(khz):  1.99 .. 2.37  ATH=-115.84
MT sfb=11 freq(khz):  2.37 .. 2.83  ATH=-117.92
MT sfb=12 freq(khz):  2.83 .. 3.45  ATH=-118.98
MT sfb=13 freq(khz):  3.45 .. 4.21  ATH=-118.92
MT sfb=14 freq(khz):  4.21 .. 5.13  ATH=-116.48
MT sfb=15 freq(khz):  5.13 .. 6.20  ATH=-113.20
MT sfb=16 freq(khz):  6.20 .. 7.50  ATH=-111.72
MT sfb=17 freq(khz):  7.50 .. 9.11  ATH=-110.10
MT sfb=18 freq(khz):  9.11 ..11.03  ATH=-106.49
MT sfb=19 freq(khz): 11.03 ..13.09  ATH=-98.69 
MT sfb=20 freq(khz): 13.09 ..16.00  ATH=-84.16 
MT sfb=21 freq(khz): 16.00 ..22.05  ATH=-48.04 

aha... I thought the downside of mp3 was that, as a heritage of L1 and L2,
there was a linear distribution of freq ranges, but this
doesn't look linear to me ... So instead of 32 linear bands, just 22 log ones.
thanks, I'll do some reading-up...

MT The ATH is given in db, normalized so that the loudest possible 3khz
MT sine wave is about 1db. (normalization is a little different than the
MT db used in your plot) The value of the ATH in band 21 means that
MT humans cannot hear a 16khz signal which is weaker than -48db.

sounds plausible, because I did some highpass filter hearing tests on
my music ("can I hear artifacts, or is it just imagination"), and
roughly perceived I could hear -40dB...

MT ...
MT inaudible.  So the claim is that original + inaudible_error
MT sounds the same as the original.  
MT ...

I'll have to think about what you said concerning the error etc...
thanks for the lengthy explanation.

My idea was just: does it have to be bad that the _average_ volume of
the error is -44dB? If I display the freq analysis real-time the graph
ends up much higher most of the time, and also lower at other times.
I thought that at the times where (in time) the original ended up
below that -48dB threshold, the encoder made a quantized version that
is even lower, but inaudible (so bigger error, but no audible parts).
Then later when you sum the whole song, the average error gets bigger
than the -44dB, but not on the important parts?

I'll try to gain some more insight...

thanks

-- 
Best regards,
 Roel  mailto:[EMAIL PROTECTED]





[MP3 ENCODER] Lame: Use for 320 JS frames in vbr mode?

2000-05-23 Thread Roel VdB

Hello,

Is there any use for 320kbit/s frames in JS mode? Lame -V1 -mj -h -b128
gives them regularly, and I remember someone told me that the max for
each channel is 160kbit/s anyway, so there would be no quality
improvement?

or

- are the saved bits used for bit reservoir?
- to avoid switching too much between S and JS?
- other reason?
- possible oversight, because no reason to use JS?

Sorry for the question, but Rubén Castañón got me doubting ...

-- 
Best regards,
 Roel  mailto:[EMAIL PROTECTED]





Re: [MP3 ENCODER] Looking for educated guess or explanation

2000-05-23 Thread John T. Larkin

On Tue, May 23, 2000 at 10:03:01PM +0200, Roel VdB ([EMAIL PROTECTED]) wrote
 Hello Mark,
 
 Tuesday, May 23, 2000, 8:14:18 PM, you wrote:
 
 MT Actually, there really are 22 "critical bands" or "scale factor bands"
 MT used by MP3. I guess we should stick to the C convention, and call the
 MT last band the 21st band.  Here are the frequency ranges,
 MT along with the ATH:
 
 aha... I thought the downside of mp3 was that, as a heritage of L1 and L2,
 there was a linear distribution of freq ranges, but this
 doesn't look linear to me ... So instead of 32 linear bands, just 22 log ones.
 thanks, I'll do some reading-up...

My understanding is: For encoding purposes, the audio is indeed
divided up into 32 bands by the polyphase filter -- just like MP1 and
MP2.  However, MP3 improves on this by superimposing the
scalefactor bands on top of these.  This is an advantage because the
SFBs more closely correspond to the critical bands.  These 22 SFBs may
apply to less than 1 (for the lower ranges), or more than 1 (for the
upper ranges) of the 32 linear bands created by the polyphase filter.
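
A quick way to see this is to compute how many polyphase subbands each SFB spans. The Python sketch below assumes 44.1 kHz sampling (which is what the 22.05 kHz upper edge in Mark's table implies) and takes the SFB edges from that table; the names are made up for the illustration.

```python
SAMPLE_RATE_HZ = 44100     # assumption: 44.1 kHz input
NUM_SUBBANDS = 32          # equal-width polyphase filterbank subbands

# Width of one polyphase subband in kHz (22.05 kHz / 32, about 0.689 kHz).
SUBBAND_WIDTH_KHZ = SAMPLE_RATE_HZ / 2 / NUM_SUBBANDS / 1000

# Edges (kHz) of the 22 scalefactor bands, from Mark's table in this thread.
SFB_EDGES_KHZ = [0.00, 0.15, 0.31, 0.46, 0.61, 0.77, 0.92, 1.15, 1.38, 1.68,
                 1.99, 2.37, 2.83, 3.45, 4.21, 5.13, 6.20, 7.50, 9.11, 11.03,
                 13.09, 16.00, 22.05]


def subbands_spanned(sfb):
    """Width of scalefactor band `sfb`, measured in polyphase subbands."""
    lo, hi = SFB_EDGES_KHZ[sfb], SFB_EDGES_KHZ[sfb + 1]
    return (hi - lo) / SUBBAND_WIDTH_KHZ


for sfb in (0, 10, 21):
    print("sfb %2d spans %.2f subbands" % (sfb, subbands_spanned(sfb)))
```

The lowest SFB covers only a fraction of one subband, while sfb 21 covers several of them.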



John




Re: [MP3 ENCODER] Lame: Use for 320 JS frames in vbr mode?

2000-05-23 Thread Mark Taylor


 
 Hello,
 
 Is there any use for 320kbit/s frames in JS mode? Lame -V1 -mj -h -b128
 gives them regularly, and I remember someone told me that the max for
 each channel is 160kbit/s anyway, so there would be no quality
 improvement?
 

No such maximum.  You can, for example, encode a mono file
at 320kbs.  

Mark




[MP3 ENCODER] Fixed silly mp3input bug with libsndfile...

2000-05-23 Thread Sigbjørn Skjæret

Right, just committed a fix for a silly little bug which prevented mp3input
from working together with libsndfile...

I also noted that the --decode feature doesn't care about endianness, thus
ending up with a big-endian wav (read: garbage. ;) ) on big-endian machines.

Couldn't think up a quick fix for this due to the way this feature works,
though it could be nice if it used libsndfile (and thus being able to write
almost any type of file) when available... ;)
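
For illustration only (this is not how --decode is implemented, and pcm16_to_wav_bytes is a made-up name), the general idea of an endian-safe write is to serialize each 16-bit sample explicitly as little-endian, which is the byte order a WAV file uses for PCM data, instead of dumping the host's in-memory representation. A Python sketch:

```python
import struct


def pcm16_to_wav_bytes(samples):
    """Pack signed 16-bit samples as little-endian bytes.

    The "<" prefix forces little-endian output regardless of the byte
    order of the machine running the code.
    """
    return struct.pack("<%dh" % len(samples), *samples)


data = pcm16_to_wav_bytes([1, -2, 32767])
print(data.hex())
```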


- CISC
