Re: [MP3 ENCODER] Voice encoding questions
| 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
| is it slow, though.) Again, without the forced MPEG-1 sampling rate, the
| mp3enc31 will attempt to use 22050.
...
| So my question(s) are: Is the solution to my problem to filter/downsample
| (and use joint, when I get around to coding it up)? That seems to be what
| is making the difference in the case of LAME; I assume that FhG is using
| some filtering as well, though there's no way to disable it and see for

Use the option -bw 22050 to set the bandwidth in Hz.

Jaroslav Lukesh
-- note: (Bill) Gates to Hell!
-- MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )
Re: [MP3 ENCODER] Voice encoding questions
> Another question:
> Is there any tool to analyze the number of SI, MS and LR frames in a MP3?
>

Frank, you just need a GTK-enabled version of lame :-)

Run lame -g on the mp3 file, scroll to the end, and then click 'show' under the 'stats' pull-down menu. It shows the info you want, and any additional statistics would be easy to add. You can also use it to examine the mid/side bit allocation frame by frame.

You could test your ideas about near-mono files as follows: modify the reduce_side() function in quantize-pvt.c to be more aggressive. Right now it allocates at most a 33/66 split between side channel and mid channel, based on the side_channel_energy/total_energy ratio. As Robert mentioned, a more aggressive split can create artifacts. I think the problem is that allocating just a few bits to the side channel can produce audible glitches which will sound worse than if 0 bits were used. But no one has done a detailed study of this.

> -mm Use Mono
> -mi Use Intensity Stereo, MS-Stereo and LR-Stereo
> -mj Use MS-Stereo and LR-Stereo
> -ms Use LR-Stereo
> -ma Analyze File before any converting, select -mm, -mj or -ms
>

I think -ma would be beyond the scope of LAME. A separate analysis program should be written, and then a GUI front end should run the analysis and make the selection. This is similar to automatic level adjustment. A couple of people have expressed interest in adding a volume adjustment to LAME, which is fine, but the additional step of running some analysis on the file to determine the adjustment should be left to a separate program.

Mark
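To make the 33/66 cap concrete, here is a minimal Python sketch of an energy-ratio bit split. It is not LAME's reduce_side(); the function name split_ms_bits, its parameters, and the min_side_bits cutoff are all hypothetical. The cutoff only models the guess above that a handful of side bits may sound worse than none.

```python
def split_ms_bits(total_bits, mid_energy, side_energy,
                  max_side_fraction=1.0 / 3.0, min_side_bits=0):
    """Divide one frame's bit budget between the mid and side channels.

    Hypothetical illustration only, not the code in quantize-pvt.c.
    Returns a (mid_bits, side_bits) tuple that sums to total_bits.
    """
    total_energy = mid_energy + side_energy
    if total_energy <= 0.0:
        # Digital silence: nothing to encode in the side channel.
        return total_bits, 0

    # Side share follows side_energy / total_energy, capped at one third.
    side_fraction = min(side_energy / total_energy, max_side_fraction)
    side_bits = int(total_bits * side_fraction)

    # Assumed cutoff: if the side channel would get only a few bits,
    # give it none and hand everything to the mid channel.
    if side_bits < min_side_bits:
        side_bits = 0

    return total_bits - side_bits, side_bits


if __name__ == "__main__":
    # Near-mono frame: side carries 1% of the energy.
    print(split_ms_bits(1000, mid_energy=99.0, side_energy=1.0))
    # Same frame, with a cutoff that drops tiny side allocations.
    print(split_ms_bits(1000, mid_energy=99.0, side_energy=1.0,
                        min_side_bits=50))
```

Making the split "more aggressive" in this sketch would mean lowering max_side_fraction or raising min_side_bits. Whether either sounds better is exactly the open question raised above.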
Re: [MP3 ENCODER] Voice encoding questions
> F> We should support an option (-ma for Mode Auto) which switches
> F> between -a -mm for highly correlated channels (r > 0.98 =>
> F> mono), -mj for a normal correlated signals (r = -1.00...-0.20,
> F> 0.20...0.91 => stereo) and -ms for nearly not
>
> I am afraid most of decoders can't treat an mp3 file correctly
> whose mode(stereo <-> mono) is changing during one file.

Switching between any of the stereo modes (stereo, m/s, is, m/s and is) is allowed, but switching between stereo, mono and dual channel is forbidden by the standard.

Regards,

-- Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3-tech.org
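Since mono and stereo cannot be mixed within one file, an automatic mode choice would have to be made once per file. Below is a rough Python sketch of what such an analysis pass might compute. The function names are made up for illustration. The 0.98 and 0.20 thresholds come from the quoted proposal; mapping |r| < 0.20 to plain stereo and everything else to joint stereo is an assumption, because the quote is cut off.

```python
import math


def channel_correlation(left, right):
    """Pearson correlation coefficient r between two sample sequences."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0.0 or var_r == 0.0:
        # A constant channel has no defined correlation; report 0.
        return 0.0
    return cov / math.sqrt(var_l * var_r)


def pick_mode(r, mono_threshold=0.98, uncorrelated_band=0.20):
    """Choose one mode for the whole file from the correlation r.

    Assumed mapping (the quoted proposal is truncated):
      r > 0.98    -> mono
      |r| < 0.20  -> plain L/R stereo
      otherwise   -> joint stereo
    """
    if r > mono_threshold:
        return "mono"
    if abs(r) < uncorrelated_band:
        return "stereo"
    return "joint"


if __name__ == "__main__":
    left = [math.sin(0.05 * n) for n in range(2000)]
    right = [0.9 * v for v in left]  # same signal, slightly quieter
    r = channel_correlation(left, right)
    print(r, pick_mode(r))
```

This runs over the whole input before any encoding starts, which fits the suggestion to keep it in a separate program or front end.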
Re: [MP3 ENCODER] Voice encoding questions
> "F" == Frank Klemm <[EMAIL PROTECTED]> writes:

F> We should support an option (-ma for Mode Auto) which switches
F> between -a -mm for highly correlated channels (r > 0.98 =>
F> mono), -mj for a normal correlated signals (r = -1.00...-0.20,
F> 0.20...0.91 => stereo) and -ms for nearly not

I am afraid most decoders can't correctly handle an mp3 file whose mode (stereo <-> mono) changes within one file.

--- Takehiro TOMINAGA // may the source be with you!
Re: [MP3 ENCODER] Voice encoding questions
> I like to think that I have fixed at least a few. Now that I've finished a
> first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
> algorithmic (as opposed to purely implementational) problems, starting with
> the main loop, and probably ending with the #&^@% psych model. Of course,
> if advanced features are going to make a bigger difference, though, they may
> gain a higher priority.
>

I'd suggest you look at the archives of this list, and at Lame 3.00. Its code was probably a lot easier to follow, and it was mainly the bug-fixed ISO code with joint stereo added.

Regards,

-- Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3-tech.org
RE: [MP3 ENCODER] Voice encoding questions
Howdy All,

Thanks for the quick replies!

Gabriel Bouvigne wrote:
> If you want to encode voice signals, I'd suggest you to use --voice
> or --preset voice

Actually, I want to encode general signals (mostly TV and movies), many of which have significant voice components, and, unfortunately, many of which do not. My codec is doing OK on music, and sucking at voice, so what I'm really trying to do is figure out why _voice_ signals are a problem for _general purpose_ encoders. Otherwise I would just bandpass 300-3000 Hz.

> > 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
> > is it slow, though.) Again, without the forced MPEG-1 sampling rate, the
> > mp3enc31 will attempt to use 22050.
>
> You're disabling intensity stereo, but not joint stereo. With those
> settings, mp3enc is using m/s stereo. This is an advantage over Lame that
> you forced to use plain stereo.

Yeah, I noticed that. As I'm sure you have already discovered, there is no way to disable M/S in mp3enc, so the comparison is bad.

> I forgot something: the sample you're using is very close to mono, so joint
> stereo helps a lot.

A very good point. I would hate to give FhG more credit than they deserve.

> > 4) Layer-II (64 kbps stereo CRC) sounds good.
>
> The layer II encoder is probably using joint stereo. In Layer II, joint
> stereo is quite similar to the intensity stereo of layer III

Actually, there is no joint stereo code in our Layer-II encoder, so I'm sure it's not using it. I should probably qualify my rating of 'good' to say that there are no obvious and distracting high frequency artifacts. Of course, the whole thing sounds like AM radio, but, in my experience, that is the difference between Layer-II and Layer-III degradation. Layer-II has an initial series of 'non-linear' (to pervert a term) distortions at a relatively low compression ratio, after which it just starts evenly raising the noise floor ('linear' distortion).
Distortions in Layer-III are almost always 'non-linear' (wateriness, blips, missing frequencies, lowpass), though the noise floor stays consistently low. At low bitrates, I find 'linear' distortion infinitely preferable to the 'non-linear', though this is, of course, purely a matter of taste.

> For your problem, there are mainly 2 solutions:
> a: downsampling
> b: using joint stereo. For voice signal, the best joint mode would probably
> be intensity stereo. But it's not implemented in Lame.

This was my suspicion; I was really just looking for confirmation. Thanks.

> You mentioned that you use crc. Are you aware that the ISO crc code is
> broken?

It may well have been broken (though I seem to remember that it was simply not present for Layer-III) - I wouldn't know, since I removed it and wrote my own, which is not. (For realtime multicast, it was a feature we had to have.)

Greg Maxwell wrote:
> The dist10 encoder has a bug in the short block code which makes it stink
> on fricatives in speech.

Does anyone have any more info on this? The frame analyzer doesn't indicate that I'm using short blocks on the fricatives in question - or is that the bug?

Mark Taylor wrote:
> Why do you disable the 22050 downsampling? This is done based on the
> idea that encoding at 22 kHz is better than encoding at 44 kHz and
> removing half the spectrum with filters.

Because I was trying to compare apples to apples (MPEG-1 to MPEG-1) and my encoder doesn't use LSF yet.

> FhG is probably using joint stereo? This will increase the
> bandwidth by 10-20%.

Yes, as discussed above, this is definitely cooking the books.

> The main difference between LAME and ISO is that the ISO
> code has serious flaws in several major components. jstereo,
> filtering and other advanced features help, but you gotta fix
> the bugs first!

I like to think that I have fixed at least a few.
Now that I've finished a first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at algorithmic (as opposed to purely implementational) problems, starting with the main loop, and probably ending with the #&^@% psych model. Of course, if advanced features are going to make a bigger difference, they may gain a higher priority.

> You rate FhG as 'very good', and Layer II as 'good'. So I'm assuming
> layer III beats layer II. The thing layer III adds to layer II is: 1)
> MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
> 3) bitreservoir (prevents wasting of bits) and 4) the ability to do
> more advanced noise shaping. #1,2 and 3 can only improve the
> quality. The only way I can see layer II out-perform layer III is if
> #4 is not tuned properly for the desired compression.

Your assumption is correct. And, based on my observations about distortion above, I would concur with your analysis; the noise shaping seems to be breaking down pretty badly at this (ridiculously high, I am aware) compression ratio.

- I'd just like to say that I really appreci
Re: [MP3 ENCODER] Voice encoding questions
> 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
> listen to, as the pink noise bursts end up being narrow band filtered (due
> to lack of bits - only the MDCT coeffs closest to the pole are making it
> into the bitstream), and there are occasional weird high frequency blips and
> arpeggiation which are very annoying.
>
> 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
> LSF yet) sounds pretty good. There are occasional minor glitches, but
> that's to be expected at this bitrate. However, LAME (as above plus -k to
> turn off the filters) sounds pretty similar to what I'm getting. I note
> that without the forced resampling, LAME will attempt to downsample to
> 22050.
>
> 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
> is it slow, though.) Again, without the forced MPEG-1 sampling rate, the
> mp3enc31 will attempt to use 22050.
>

The main difference between FhG and LAME is probably the lowpass filters. Try different values of --lowpass. The compression ratio you are using (about 22x) is not commonly used, and LAME's default guess at a lowpass setting won't be very good.

Why do you disable the 22050 downsampling? This is done based on the idea that encoding at 22 kHz is better than encoding at 44 kHz and removing half the spectrum with filters.

FhG is probably using joint stereo? This will increase the bandwidth by 10-20%.

The main difference between LAME and ISO is that the ISO code has serious flaws in several major components. jstereo, filtering and other advanced features help, but you gotta fix the bugs first!

> some filtering as well, though there's no way to disable it and see for
> sure. Are there really just not enough bits for this type of signal at this
> bitrate? Why does Layer-II do so much better a job with this type of
> signal? Do other codecs (AAC/MPEG-4) handle this kind of signal better as

You rate FhG as 'very good', and Layer II as 'good'.
So I'm assuming layer III beats layer II. The things layer III adds to layer II are: 1) the MDCT transform (lossless to roundoff), 2) entropy coding (lossless), 3) the bit reservoir (prevents wasting of bits) and 4) the ability to do more advanced noise shaping. #1, 2 and 3 can only improve the quality. The only way I can see layer II out-performing layer III is if #4 is not tuned properly for the desired compression.

> well? And what is the capital of Assyria?
>

during which century?

Mark
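The "about 22x" figure can be checked with simple arithmetic. The sketch below assumes 16-bit PCM input, which the thread does not state; the function names are only illustrative. At 44100 Hz stereo the raw rate is 44100 * 2 * 16 = 1,411,200 bit/s, and 1,411,200 / 64,000 = 22.05.

```python
def pcm_bitrate(sample_rate_hz, channels, bits_per_sample=16):
    """Raw PCM bitrate in bits per second (16-bit samples assumed)."""
    return sample_rate_hz * channels * bits_per_sample


def compression_ratio(sample_rate_hz, channels, target_bps,
                      bits_per_sample=16):
    """How many times smaller the encoded stream is than raw PCM."""
    return pcm_bitrate(sample_rate_hz, channels, bits_per_sample) / target_bps


def bits_per_input_sample(sample_rate_hz, channels, target_bps):
    """Average encoded bits available per input sample, per channel."""
    return target_bps / (sample_rate_hz * channels)


if __name__ == "__main__":
    for rate in (44100, 22050):
        print(rate,
              compression_ratio(rate, 2, 64000),
              bits_per_input_sample(rate, 2, 64000))
```

Halving the sampling rate to 22050 Hz halves the ratio to roughly 11x and doubles the bits available per sample, which is the reasoning behind the automatic downsampling.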
Re: [MP3 ENCODER] Voice encoding questions
> So my question(s) are: Is the solution to my problem to filter/downsample
> (and use joint, when I get around to coding it up)? That seems to be what
> is making the difference in the case of LAME; I assume that FhG is using
> some filtering as well, though there's no way to disable it and see for
> sure. Are there really just not enough bits for this type of signal at this
> bitrate? Why does Layer-II do so much better a job with this type of
> signal? Do other codecs (AAC/MPEG-4) handle this kind of signal better as
> well?

I forgot something: the sample you're using is very close to mono, so joint stereo helps a lot.

For your problem, there are mainly 2 solutions:
a: downsampling
b: using joint stereo. For voice signals, the best joint mode would probably be intensity stereo. But it's not implemented in Lame.

You mentioned that you use crc. Are you aware that the ISO crc code is broken?

Regards,

-- Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3-tech.org
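To see why a near-mono sample favours joint stereo, here is a small Python sketch of the mid/side transform, M = (L + R) / sqrt(2) and S = (L - R) / sqrt(2). The test signal and the function names are invented for illustration and are not taken from any encoder.

```python
import math


def mid_side(left, right):
    """Orthonormal mid/side transform of two equal-length channels."""
    scale = 1.0 / math.sqrt(2.0)
    mid = [(l + r) * scale for l, r in zip(left, right)]
    side = [(l - r) * scale for l, r in zip(left, right)]
    return mid, side


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


def side_energy_fraction(left, right):
    """Share of the total energy that ends up in the side channel."""
    mid, side = mid_side(left, right)
    total = energy(mid) + energy(side)
    return energy(side) / total if total > 0.0 else 0.0


if __name__ == "__main__":
    # Made-up near-mono signal: right is the left channel, 5% quieter.
    left = [math.sin(0.1 * n) for n in range(1000)]
    right = [0.95 * v for v in left]
    print(side_energy_fraction(left, right))
```

The transform preserves total energy, so when almost none of it lands in the side channel, nearly the whole bit budget can go to the mid channel. Plain L/R stereo has to spend bits on two channels carrying almost the same signal.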
Re: [MP3 ENCODER] Voice encoding questions
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 04, 2000 4:14 PM
Subject: [MP3 ENCODER] Voice encoding questions

> Howdy All,
>
> In testing my (comparatively naive) hack of the dist10 encoder, I have
> discovered that, while it does OK for music, it has real problems with
> speech signals. (Caveat: at our lowest overall bitrate of 300kbps for
> combined video/audio, we run the audio at 32kbit mono - though we go way up
> to 64kbps mono for higher overall bitrate signals, and are aiming to default
> at 64kbps stereo [not joint].) In particular, the broadband noise bursts
> associated with fricatives really wreak havoc.
>
> My test signal here is spfe49_1 from the AAC SQAM test suite, which is a
> female English speaker going on about giving pills to animals. I ran it
> through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4)
> our current Layer-II encoder.
>
> 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
> listen to, as the pink noise bursts end up being narrow band filtered (due
> to lack of bits - only the MDCT coeffs closest to the pole are making it
> into the bitstream), and there are occasional weird high frequency blips and
> arpeggiation which are very annoying.
>
> 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
> LSF yet) sounds pretty good. There are occasional minor glitches, but
> that's to be expected at this bitrate. However, LAME (as above plus -k to
> turn off the filters) sounds pretty similar to what I'm getting. I note
> that without the forced resampling, LAME will attempt to downsample to
> 22050.

If you want to encode voice signals, I'd suggest you use --voice or --preset voice

> 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good. (Man,
> is it slow, though.) Again, without the forced MPEG-1 sampling rate, the
> mp3enc31 will attempt to use 22050.
You're disabling intensity stereo, but not joint stereo. With those settings, mp3enc is using m/s stereo. This gives it an advantage over Lame, which you forced to use plain stereo.

> 4) Layer-II (64 kbps stereo CRC) sounds good.

The layer II encoder is probably using joint stereo. In Layer II, joint stereo is quite similar to the intensity stereo of layer III.

> And what is the capital of Assyria?

The first Assyrian capital was Assur, and it was later replaced by Kalah.

-- Gabriel Bouvigne - France [EMAIL PROTECTED] icq: 12138873 MP3' Tech: www.mp3-tech.org