Re: [MP3 ENCODER] Voice encoding questions

2000-08-07 Thread Jaroslav Lukesh

| 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
| is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
| mp3enc31 will attempt to use 22050.
 
...

| So my question(s) are:  Is the solution to my problem to filter/downsample
| (and use joint, when I get around to coding it up)?  That seems to be what
| is making the difference in the case of LAME; I assume that FhG is using
| some filtering as well, though there's no way to disable it and see for

Use the option -bw 22050 to set the bandwidth in Hz.



 Jaroslav Lukesh
--
 note: (Bill) Gates to Hell!

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )



Re: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Mark Taylor



> Another question:
>   Is there any tool to analyze the number of SI, MS and LR frames in a MP3?
> 
Frank, you just need a GTK enabled version of lame :-)
run lame -g on the mp3 file, scroll to the end, and then
click 'show' under the 'stats' pull down menu.
It shows the info you want, and any additional statistics
would be easy to add.  You can also use it to examine
the mid/side bit allocation frame by frame. 

You could test your ideas about near mono files 
via the following:  

Modify reduce_side() function in quantize-pvt.c to
be more aggressive.  Right now it allocates at most
a 33/66 split between side channel and mid channel,
based on the side_channel_energy/total_energy ratio.

As Robert mentioned, a more aggressive split can
create artifacts.  I think the problem is that 
allocating just a few bits to the side channel
can produce audible glitches which will sound worse
than if 0 bits were used.  But no one has done a
detailed study of this.  
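
To make the idea concrete, here is a minimal Python sketch of an
energy-based mid/side bit split.  It is not the code from
quantize-pvt.c: the name split_bits, the max_side_share parameter and
the linear mapping from energy ratio to bit share are all assumptions
made for illustration.  Only the "at most about one third to the side
channel" cap comes from the description above.

```python
def split_bits(total_bits, mid_energy, side_energy, max_side_share=1.0 / 3.0):
    """Split a frame's bit budget between the mid and side channels.

    Hypothetical illustration: the side channel's share follows the
    side_energy / total_energy ratio, capped at max_side_share.
    Returns (mid_bits, side_bits).
    """
    total_energy = mid_energy + side_energy
    if total_energy <= 0.0:
        # Silent frame: nothing to spend on the side channel.
        return total_bits, 0
    ratio = side_energy / total_energy
    side_share = min(ratio, max_side_share)
    side_bits = int(total_bits * side_share)
    return total_bits - side_bits, side_bits


# Near-mono frame: the side channel carries a quarter of the energy.
print(split_bits(1000, 3.0, 1.0))
```

Lowering max_side_share is one way to experiment with a more
aggressive split.  The sketch does not model the case raised above,
where a handful of side bits may sound worse than none at all.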



> -mm   Use Mono
> -mi   Use Intensity Stereo, MS-Stereo and LR-Stereo
> -mj   Use MS-Stereo and LR-Stereo
> -ms   Use LR-Stereo
> -ma   Analyze FIle before any converting, select -mm, -mj or -ms
> 
> 

I think -ma would be beyond the scope of LAME. A 
separate analysis program should be written, and then a 
GUI front end should run the analysis and make the selection.

This is similar to automatic level adjustment.  A couple of people
have expressed interest in adding a volume adjustment to
LAME, which is fine, but the additional step of running
some analysis on the file to determine the adjustment
should be left to a separate program.

Mark









Re: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Gabriel Bouvigne

> F> We should support an option (-ma for Mode Auto) which switches
> F> between -a -mm for highly correlated channels (r > 0.98 =>
> F> mono), -mj for a normal correlated signals (r = -1.00...-0.20,
> F> 0.20...0.91 => stereo) and -ms for nearly not
>
> I am afraid most decoders can't correctly handle an mp3 file
> whose mode (stereo <-> mono) changes within the file.

Switching between any of the stereo modes (stereo, m/s, intensity, m/s plus
intensity) is allowed, but switching between stereo, mono and dual channel
is forbidden by the standard.
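
Since the mode has to stay fixed for the whole file, the proposed
analysis would run once over the complete input before encoding.  A
rough Python sketch of that step follows.  The function names are
hypothetical, and the thresholds are only the ones visible in the
quoted proposal; values of r that the quote does not cover return
None rather than a guess.

```python
import math


def channel_correlation(left, right):
    """Pearson correlation coefficient r between two equal-length channels."""
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("channels must be non-empty and of equal length")
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0.0 or var_r == 0.0:
        # Assumption: treat a constant (e.g. silent) channel as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_l * var_r)


def suggest_mode(r):
    """Map r to the switches named in the quoted proposal, or None."""
    if r > 0.98:
        return "-a -mm"   # highly correlated: downmix to mono
    if -1.00 <= r <= -0.20 or 0.20 <= r <= 0.91:
        return "-mj"      # normally correlated: joint stereo
    return None           # range not spelled out in the quoted text


left = [0.0, 1.0, 0.0, -1.0]
print(suggest_mode(channel_correlation(left, left)))
```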


Regards,
--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] Voice encoding questions

2000-08-05 Thread Takehiro Tominaga

> "F" == Frank Klemm <[EMAIL PROTECTED]> writes:

F> We should support an option (-ma for Mode Auto) which switches
F> between -a -mm for highly correlated channels (r > 0.98 =>
F> mono), -mj for a normal correlated signals (r = -1.00...-0.20,
F> 0.20...0.91 => stereo) and -ms for nearly not

I am afraid most decoders can't correctly handle an mp3 file
whose mode (stereo <-> mono) changes within the file.
--- 
Takehiro TOMINAGA // may the source be with you!



Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


> I like to think that I have fixed at least a few.  Now that I've finished a
> first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
> algorithmic (as opposed to purely implementational) problems, starting with
> the main loop, and probably ending with the #&^@% psych model.  Of course,
> if advanced features are going to make a bigger difference, though, they may
> gain a higher priority.
>

I'd suggest you look at the archives of this list, and at LAME 3.00.
Its code was probably a lot easier to follow, and it was mainly the ISO
code with bug fixes and joint stereo added.

Regards,

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





RE: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread alex . broadhead

Howdy All,

Thanks for the quick replies!

Gabriel Bouvigne wrote:

> If you want to encode voice signals, I'd suggest you use --voice
> or --preset voice

Actually, I want to encode general signals (mostly TV and movies), many of
which have significant voice components, and, unfortunately, many of which
do not.  My codec is doing OK on music, and sucking at voice, so what I'm
really trying to do is figure out why _voice_ signals are a problem for
_general purpose_ encoders.  Otherwise I would just bandpass 300-3000 Hz.

> > 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
> > is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
> > mp3enc31 will attempt to use 22050.
>
> You're disabling intensity stereo, but not joint stereo. With those
> settings, mp3enc is using m/s stereo. This is an advantage over LAME,
> which you forced to use plain stereo.

Yeah, I noticed that.  As I'm sure you have already discovered, there is no
way to disable M/S in mp3enc, so the comparison is bad.

> I forgot something: the sample you're using is very close to mono, so
> joint stereo helps a lot.

A very good point.  I would hate to give FhG more credit than they deserve.

> > 4) Layer-II (64 kbps stereo CRC) sounds good.
>
> The layer II encoder is probably using joint stereo. In Layer II, joint
> stereo is quite similar to the intensity stereo of layer III.

Actually, there is no joint stereo code in our Layer-II encoder, so I'm sure
it's not using it.

I should probably qualify my rating of 'good' to say that there are no
obvious and distracting high frequency artifacts.  Of course, the whole
thing sounds like AM radio, but, in my experience, that is the difference
between Layer-II and Layer-III degradation.  Layer-II has an initial series
of 'non-linear' (to pervert a term) distortions at a relatively low
compression ratio, after which it just starts evenly raising the noise floor
('linear' distortion).  Distortions in Layer-III are almost always
'non-linear' (wateriness, blips, missing frequencies, lowpass), though the
noise floor stays consistently low.  At low bitrates, I find 'linear'
distortion infinitely preferable to the 'non-linear', though this is, of
course, purely a matter of taste.

> For your problem, there are mainly 2 solutions:
> a: downsampling
> b: using joint stereo. For voice signals, the best joint mode would
> probably be intensity stereo. But it's not implemented in LAME.

This was my suspicion, I was really just looking for confirmation.  Thanks.

> You mentioned that you use CRC. Are you aware that the ISO CRC code is
> broken?

It may well have been broken (though I seem to remember that it was simply
not present for Layer-III) - I wouldn't know, since I removed it and wrote
my own, which is not.  (For realtime multicast, it was a feature we had to
have.)

Greg Maxwell wrote:

> The dist10 encoder has a bug in the short block code which makes it stink
> on fricatives in speech.

Does anyone have any more info on this?  The frame analyzer doesn't indicate
that I'm using short blocks on the fricatives in question - or is that the
bug?

Mark Taylor wrote:

> Why do you disable the 22050 downsampling?  This is done based on the
> idea that encoding at 22 kHz is better than encoding at 44 kHz and
> removing half the spectrum with filters.

Because I was trying to compare apples to apples (MPEG-1 to MPEG-1) and my
encoder doesn't use LSF yet.

> FhG is probably using joint stereo?  This will increase the
> bandwidth by 10-20%.

Yes, as discussed above, this is definitely cooking the books.

> The main difference between LAME and ISO is that the ISO
> code has serious flaws in several major components.  jstereo,
> filtering and other advanced features help, but you gotta fix
> the bugs first!

I like to think that I have fixed at least a few.  Now that I've finished a
first pass clean, rewrite, overhaul, and verify, I'm taking a closer look at
algorithmic (as opposed to purely implementational) problems, starting with
the main loop, and probably ending with the #&^@% psych model.  Of course,
if advanced features are going to make a bigger difference, though, they may
gain a higher priority.

> You rate FhG as 'very good', and Layer II as 'good'.  So I'm assuming
> layer III beats layer II.  The thing layer III adds to layer II is: 1)
> MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
> 3) bitreservoir (prevents wasting of bits) and 4) the ability to do
> more advanced noise shaping.  #1,2 and 3 can only improve the
> quality. The only way I can see layer II out-perform layer III is if
> #4 is not tuned properly for the desired compression.

Your assumption is correct.  And, based on my observations about distortion
above, I would concur with your analysis; the noise shaping seems to be
breaking down pretty badly at this (ridiculously high, I am aware)
compression ratio.

-

I'd just like to say that I really appreci

Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Mark Taylor


> 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
> listen to, as the pink noise bursts end up being narrow band filtered (due
> to lack of bits - only the MDCT coeffs closest to the pole are making it
> into the bitstream), and there are occasional weird high frequency blips and
> arpeggiation which are very annoying.
> 
> 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
> LSF yet) sounds pretty good.  There are occasional minor glitches, but
> that's to be expected at this bitrate.  However, LAME (as above plus -k to
> turn off the filters) sounds pretty similar to what I'm getting.  I note
> that without the forced resampling, LAME will attempt to downsample to
> 22050.
> 
> 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
> is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
> mp3enc31 will attempt to use 22050.
> 

The main difference between FhG and LAME is probably the lowpass
filters.  Try different values of --lowpass.  The compression ratio
you are using (about 22x) is not commonly used, and LAME's
default guess at a lowpass setting won't be very good.
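
For reference, the "about 22x" figure can be reproduced with a few
lines of Python.  This assumes the source is 16-bit stereo PCM at
44.1 kHz; the bit depth is an assumption, since the thread only gives
the sample rate, channel count and target bitrate.

```python
def compression_ratio(sample_rate_hz, bits_per_sample, channels, bitrate_bps):
    """Ratio of the uncompressed PCM bitrate to the encoded bitrate."""
    pcm_bps = sample_rate_hz * bits_per_sample * channels
    return pcm_bps / bitrate_bps


# 44.1 kHz, 16-bit (assumed), stereo, encoded at 64 kbps
ratio = compression_ratio(44100, 16, 2, 64000)
print(f"{ratio:.2f}x")  # prints "22.05x"
```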

Why do you disable the 22050 downsampling?  This is done based on the
idea that encoding at 22 kHz is better than encoding at 44 kHz and
removing half the spectrum with filters.

FhG is probably using joint stereo?  This will increase the
bandwidth by 10-20%.  

The main difference between LAME and ISO is that the ISO
code has serious flaws in several major components.  jstereo, 
filtering and other advanced features help, but you gotta fix
the bugs first!


> some filtering as well, though there's no way to disable it and see for
> sure.  Are there really just not enough bits for this type of signal at this
> bitrate?  Why does Layer-II do so much better a job with this type of
> signal?  Do other codecs (AAC/MPEG-4) handle this kind of signal better as

You rate FhG as 'very good', and Layer II as 'good'.  So I'm assuming
layer III beats layer II.  The thing layer III adds to layer II is: 1)
MDCT transform (lossless to roundoff), 2) entropy coding (lossless),
3) bitreservoir (prevents wasting of bits) and 4) the ability to do
more advanced noise shaping.  #1,2 and 3 can only improve the
quality. The only way I can see layer II out-perform layer III is if
#4 is not tuned properly for the desired compression.


> well?  And what is the capital of Assyria?
> 
during which century?

Mark



Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


> So my question(s) are:  Is the solution to my problem to filter/downsample
> (and use joint, when I get around to coding it up)?  That seems to be what
> is making the difference in the case of LAME; I assume that FhG is using
> some filtering as well, though there's no way to disable it and see for
> sure.  Are there really just not enough bits for this type of signal at this
> bitrate?  Why does Layer-II do so much better a job with this type of
> signal?  Do other codecs (AAC/MPEG-4) handle this kind of signal better as
> well?

I forgot something: the sample you're using is very close to mono, so joint
stereo helps a lot.
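
A small Python sketch shows why.  It applies the usual mid/side
transform, M = (L + R) / sqrt(2) and S = (L - R) / sqrt(2), to a
made-up near-mono signal; the sample values are invented for
illustration and are not taken from the test file.

```python
import math


def mid_side(left, right):
    """Orthonormal mid/side transform of two equal-length channels."""
    scale = 1.0 / math.sqrt(2.0)
    mid = [(l + r) * scale for l, r in zip(left, right)]
    side = [(l - r) * scale for l, r in zip(left, right)]
    return mid, side


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


# Invented near-mono input: the channels differ only slightly.
left = [1.0, -0.5, 0.25]
right = [0.98, -0.5, 0.26]
mid, side = mid_side(left, right)
print(energy(mid), energy(side))
```

Almost all of the energy ends up in the mid channel, so the encoder
can spend nearly the whole bit budget there instead of coding two
almost identical channels.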

For your problem, there are mainly 2 solutions:
a: downsampling
b: using joint stereo. For voice signals, the best joint mode would probably
be intensity stereo. But it's not implemented in LAME.

You mentioned that you use CRC. Are you aware that the ISO CRC code is
broken?

Regards,


--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org





Re: [MP3 ENCODER] Voice encoding questions

2000-08-04 Thread Gabriel Bouvigne


- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 04, 2000 4:14 PM
Subject: [MP3 ENCODER] Voice encoding questions


> Howdy All,
>
> In testing my (comparatively naive) hack of the dist10 encoder, I have
> discovered that, while it does OK for music, it has real problems with
> speech signals.  (Caveat:  at our lowest overall bitrate of 300kbps for
> combined video/audio, we run the audio at 32kbit mono - though we go way up
> to 64kbps mono for higher overall bitrate signals, and are aiming to default
> at 64kbps stereo [not joint].)  In particular, the broadband noise bursts
> associated with fricatives really wreak havoc.
>
> My test signal here is spfe49_1 from the AAC SQAM test suite, which is a
> female English speaker going on about giving pills to animals.  I ran it
> through 1) my encoder, 2) LAME (3.85 w/ frame analyzer), 3) mp3enc31, and 4)
> our current Layer-II encoder.
>
> 1) With my encoder (64kbps stereo CRC), every fricative is almost painful to
> listen to, as the pink noise bursts end up being narrow band filtered (due
> to lack of bits - only the MDCT coeffs closest to the pole are making it
> into the bitstream), and there are occasional weird high frequency blips and
> arpeggiation which are very annoying.
>
> 2) LAME (-m s -h -b 64 -p --resample 44.1) (we use CRC and I haven't enabled
> LSF yet) sounds pretty good.  There are occasional minor glitches, but
> that's to be expected at this bitrate.  However, LAME (as above plus -k to
> turn off the filters) sounds pretty similar to what I'm getting.  I note
> that without the forced resampling, LAME will attempt to downsample to
> 22050.

If you want to encode voice signals, I'd suggest you use --voice
or --preset voice


> 3) FhG (-br 64000 -qual 9 -crc -no-is -esr 44100) sounds very good.  (Man,
> is it slow, though.)  Again, without the forced MPEG-1 sampling rate, the
> mp3enc31 will attempt to use 22050.

You're disabling intensity stereo, but not joint stereo. With those
settings, mp3enc is using m/s stereo. This is an advantage over LAME,
which you forced to use plain stereo.


> 4) Layer-II (64 kbps stereo CRC) sounds good.

The layer II encoder is probably using joint stereo. In Layer II, joint
stereo is quite similar to the intensity stereo of layer III.



> And what is the capital of Assyria?
The first Assyrian capital was Assur, and it was later replaced by Kalah.

--

Gabriel Bouvigne - France
[EMAIL PROTECTED]
icq: 12138873

MP3' Tech: www.mp3-tech.org

