RE: [MP3 ENCODER] Voice encoding questions

alex . broadhead Fri, 04 Aug 2000 09:56:50 -0700
Howdy All,

Thanks for the quick replies!

Gabriel Bouvigne wrote:

> If you want to encode voice signals, I'd suggest you to use --voice
> or --preset voice

Actually, I want to encode general signals (mostly TV and movies), many of
which have significant voice components, and, unfortunately, many of which
do not.  My codec is doing OK on music, and sucking at voice, so what I'm
really trying to do is figure out why _voice_ signals are a problem for
_general purpose_ encoders.  Otherwise I would just bandpass 300-3000 Hz.
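
For reference, the 300-3000 Hz bandpass mentioned above could be sketched
roughly as follows.  This is only an illustration, not code from any encoder
discussed here: it assumes a single second-order (biquad) section with the
widely published "audio EQ cookbook" bandpass coefficients, and the function
names (bandpass_biquad, tone, rms) are made up for the example.  A real
telephone-band filter would use steeper slopes than one biquad gives.

```python
import math

def bandpass_biquad(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Filter samples with one biquad bandpass (0 dB gain at band centre)."""
    f0 = math.sqrt(low_hz * high_hz)      # geometric centre of the band
    q = f0 / (high_hz - low_hz)           # Q from centre / bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised coefficients; b1 is zero for this filter type.
    b0 = alpha / a0
    b2 = -alpha / a0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def tone(freq_hz, sample_rate, count):
    """Unit-amplitude sine test tone."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

if __name__ == "__main__":
    rate = 44100
    for freq in (50.0, 1000.0, 10000.0):
        x = tone(freq, rate, rate)
        y = bandpass_biquad(x, rate)
        # Skip the first half second so the filter transient has died away.
        gain = rms(y[rate // 2:]) / rms(x[rate // 2:])
        print(freq, "Hz gain:", round(gain, 3))
```

A filter like this throws away most of what makes music and effects sound
right, which is why it is no answer for general TV and movie material.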

> > 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.
> > (Man, is it slow, though.)  Again, without the forced MPEG-1 sampling
> > rate, the mp3enc31 will attempt to use 22050.
>
> You're disabling intensity stereo, but not joint stereo. With those
> settings, mp3enc is using m/s stereo. This is an advantage over Lame,
> which you forced to use plain stereo.

Yeah, I noticed that.  As I'm sure you have already discovered, there is no
way to disable M/S in mp3enc, so the comparison is bad.

> I forgot something: the sample you're using is very close to mono, so
> joint stereo helps a lot.

A very good point.  I would hate to give FhG more credit than they deserve.
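
To make the near-mono point concrete, here is a minimal sketch of the mid/side
transform and of how little energy ends up in the side channel when left and
right are almost identical.  It assumes the usual orthogonal form
M = (L+R)/sqrt(2), S = (L-R)/sqrt(2); the function names are invented for the
example and are not taken from LAME, mp3enc or dist10.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)

def ms_encode(left, right):
    """Return (mid, side) for two equal-length channel sample lists."""
    mid = [(l + r) * _SCALE for l, r in zip(left, right)]
    side = [(l - r) * _SCALE for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse transform; with this scaling it has the same form."""
    left = [(m + s) * _SCALE for m, s in zip(mid, side)]
    right = [(m - s) * _SCALE for m, s in zip(mid, side)]
    return left, right

def energy(samples):
    return sum(v * v for v in samples)

def side_fraction(left, right):
    """Fraction of total signal energy carried by the side channel."""
    mid, side = ms_encode(left, right)
    total = energy(mid) + energy(side)
    return energy(side) / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical near-mono test signal: right is a slightly quieter left.
    left = [math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(4410)]
    right = [0.95 * v for v in left]
    print("side energy fraction:", side_fraction(left, right))
```

When the side fraction is this small, an encoder using M/S can spend nearly
all of its bits on the mid channel, whereas plain stereo has to code almost
the same signal twice.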

> > 4) Layer-II (64 kbps stereo CRC) sounds good.
>
> The layer II encoder is probably using joint stereo. In Layer II, joint
> stereo is quite similar to the intensity stereo of layer III

Actually, there is no joint stereo code in our Layer-II encoder, so I'm sure
it's not using it.

I should probably qualify my rating of 'good' to say that there are no
obvious and distracting high frequency artifacts.  Of course, the whole
thing sounds like AM radio, but, in my experience, that is the difference
between Layer-II and Layer-III degradation.  Layer-II has an initial series
of 'non-linear' (to pervert a term) distortions at a relatively low
compression ratio, after which it just starts evenly raising the noise floor
('linear' distortion).  Distortions in Layer-III are almost always
'non-linear' (wateriness, blips, missing frequencies, lowpass), though the
noise floor stays consistently low.  At low bitrates, I find 'linear'
distortion infinitely preferable to the 'non-linear', though this is, of
course, purely a matter of taste.

> For your problem, there are mainly 2 solutions:
> a: downsampling
> b: using joint stereo. For voice signals, the best joint mode would
> probably be intensity stereo. But it's not implemented in Lame.

This was my suspicion; I was really just looking for confirmation.  Thanks.

> You mentioned that you use crc. Are you aware that the ISO crc code is
> broken?

It may well have been broken (though I seem to remember that it was simply
not present for Layer-III) - I wouldn't know, since I removed it and wrote
my own, which is not.  (For realtime multicast, it was a feature we had to
have.)
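
For readers who have not met the CRC in question, a rough sketch follows.  It
is not the code referred to above.  It assumes the CRC-16 generator
polynomial x^16 + x^15 + x^2 + 1 (0x8005) with an initial register value of
0xFFFF, which is what the MPEG audio standard specifies as far as I know.
Which header and side-information bits are covered differs per layer and is
not shown; the sketch also works on whole bytes, which is a simplification,
and the function name is made up.

```python
def crc16_mpeg(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, polynomial 0x8005, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8              # bring the next 8 bits into the top
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def protect(payload: bytes) -> bytes:
    """Append the 16-bit check word, most significant byte first."""
    return payload + crc16_mpeg(payload).to_bytes(2, "big")

def is_intact(protected: bytes) -> bool:
    """A block with its own CRC appended leaves a zero remainder."""
    return crc16_mpeg(protected) == 0

if __name__ == "__main__":
    block = protect(b"hypothetical protected bits")
    print("intact:", is_intact(block))
    damaged = bytes([block[0] ^ 0x01]) + block[1:]
    print("after one bit error:", is_intact(damaged))
```

For multicast, the point of the check is that a receiver can detect a damaged
frame and conceal it rather than decode garbage.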

Greg Maxwell wrote:

> The dist10 encoder has a bug in the short block code which makes it stink
> on fricatives in speech.

Does anyone have any more info on this?  The frame analyzer doesn't indicate
that I'm using short blocks on the fricatives in question - or is that the
bug?

Mark Taylor wrote:

> Why do you disable the 22050 downsampling?  This is done based on the
> idea that encoding at 22 kHz is better than encoding at 44 kHz and
> removing half the spectrum with filters.

Because I was trying to compare apples to apples (MPEG-1 to MPEG-1) and my
encoder doesn't use LSF yet.

> FhG is probably using joint stereo?  This will increase the bandwidth by
> 10-20%.

Yes, as discussed above, this is definitely cooking the books.

> The main difference between LAME and ISO is that the ISO
> code has serious flaws in several major components.  jstereo,
> filtering and other advanced features help, but you gotta fix
> the bugs first!

I like to think that I have fixed at least a few.  Now that I've finished a
first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
algorithmic (as opposed to purely implementational) problems, starting with
the main loop, and probably ending with the #&^@% psych model.  Of course,
if advanced features are going to make a bigger difference, they may gain a
higher priority.

> You rate FhG as 'very good', and Layer II as 'good'.  So I'm assuming
> layer III beats layer II.  The thing layer III adds to layer II is: 1)
> MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
> 3) bitreservoir (prevents wasting of bits) and 4) the ability to do
> more advanced noise shaping.  #1,2 and 3 can only improve the
> quality. The only way I can see layer II out-perform layer III is if
> #4 is not tuned properly for the desired compression.

Your assumption is correct.  And, based on my observations about distortion
above, I would concur with your analysis; the noise shaping seems to be
breaking down pretty badly at this (ridiculously high, I am aware)
compression ratio.
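
To put a number on "ridiculously high", the arithmetic can be sketched as
below.  It assumes a 16-bit stereo PCM source (an assumption; the source
format is not stated above) and the 1152 samples per frame of MPEG-1 Layer II
and Layer III.  The function names are invented for the example.

```python
def pcm_bitrate(sample_rate, bits_per_sample=16, channels=2):
    """Bit rate of the uncompressed source in bits per second."""
    return sample_rate * bits_per_sample * channels

def compression_ratio(coded_bitrate, sample_rate,
                      bits_per_sample=16, channels=2):
    """Uncompressed bit rate divided by coded bit rate."""
    return pcm_bitrate(sample_rate, bits_per_sample, channels) / coded_bitrate

def bits_per_frame(coded_bitrate, sample_rate, samples_per_frame=1152):
    """Average coded bits available per frame (all channels together)."""
    return coded_bitrate * samples_per_frame / sample_rate

def bits_per_sample_coded(coded_bitrate, sample_rate, channels=2):
    """Average coded bits per input sample per channel."""
    return coded_bitrate / (sample_rate * channels)

if __name__ == "__main__":
    for rate in (44100, 22050):
        print(rate, "Hz:",
              "ratio", compression_ratio(64000, rate),
              "bits/sample", bits_per_sample_coded(64000, rate))
```

At 64 kbps and 44100 Hz stereo this works out to about 22:1, or under one bit
per sample per channel.  Halving the sampling rate halves the ratio, which is
the argument for letting the encoder downsample to 22050.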

-----

I'd just like to say that I really appreciate the feedback that this list
provides - I don't know of any other knowledgeable source of information on
MPEG audio that deigns to interact with the public.  (Try getting any info
from the committee, for instance - it took months for me just to get a bug
report to them.)  And there is a real dearth of printed material on the
subject as well.

I feel guilty using a list mainly devoted to an open source codec (LAME) to
further the development of ClearBand's 'proprietary' codec.  (Is a standards
based codec implementation proprietary?  We don't sell the codec - we sell a
multicast system, mostly to ISPs and corporations, and the proprietary part
is the multicast part.  My superiors just didn't want to license FhG's
source, I guess...)  I am disgusted by 'programmers' who grab LAME wholesale
and use it in their own commercial software - though I must confess to
confusion about the legal status of almost everything associated with audio
compression, including LAME.  Anyway, I am always on the lookout for ways
that I can assist in LAME's development, and I try to report bugs I find in
our common root (dist10), and to answer questions to the list about which I
have expertise when they arise.

Thanks again,
Alex

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )