Re: [MP3 ENCODER] Voice mode

Greg Maxwell Thu, 14 Oct 1999 20:47:53 -0700
On Thu, 14 Oct 1999, Gabriel Bouvigne wrote:

> 
> > Panned stereo mode!
> >
> > For each frame you examine a spectrally weighted (ignore low freqs) energy
> > ratio of left/right to pick the stereo panning at two points in time
> > (middle, and right) then interpolate from old_left to middle then right,
> > enforcing a maximum rate of change.
> >
> > Then mix to mono, encode as mid, and the position as side. You only have
> > to encode the lower scalefactors because it should consist of low freqs only
> > (because of your slow interpolation). I suppose you could shape your side
> > wav to MDCT well too..
> >
> > Am I missing something?
> >
> > I'd think that this would allow you to get mono quality at about the same
> > bitrate, but still preserve panning, which would help at differentiating
> > between people speaking.
> 
> 
> As I see it, this can't be done using m/s stereo, because in the case of
> someone speaking on one side, the side channel would be as high as the middle
> channel.
> What you're describing here looks a lot like the intensity stereo mode, where
> the signal is encoded as mono on the left channel, and location is encoded
> on the right one. This would help voice encoding a lot, but also music at
> low bitrates. To my mind, it's something LAME is missing in order to be able
> to compete with FhG at low bitrates.

Duh.. I see, M/S is defined with addition/subtraction, not multiplication.
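
A minimal sketch of that point, assuming the usual sum/difference
definition M = (L+R)/sqrt(2), S = (L-R)/sqrt(2). The function names and
test signals are made up for illustration and are not taken from LAME:

```python
import math

def ms_encode(left, right):
    """Sum/difference M/S transform: M = (L+R)/sqrt(2), S = (L-R)/sqrt(2)."""
    k = 1.0 / math.sqrt(2.0)
    mid = [(l + r) * k for l, r in zip(left, right)]
    side = [(l - r) * k for l, r in zip(left, right)]
    return mid, side

def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)

def tone(n=1152, freq=440.0, rate=44100.0):
    """Hypothetical stand-in for a voice signal: a plain sine tone."""
    return [math.sin(2.0 * math.pi * freq * i / rate) for i in range(n)]

# Speaker hard-panned to the left: the right channel is silent.
voice = tone()
mid, side = ms_encode(voice, [0.0] * len(voice))
print("hard left: mid energy", energy(mid), "side energy", energy(side))

# Speaker in the center: both channels are identical.
mid_c, side_c = ms_encode(voice, voice)
print("center:    mid energy", energy(mid_c), "side energy", energy(side_c))
```

With the speaker on one side, the side channel carries as much energy as
the mid channel, so it cannot be treated as a cheap, slowly varying
"position" signal. Only a centered source leaves the side channel empty.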

So, is there a downloadable doc that describes intensity mode, or are
decoders a good source of info?

 
> Unfortunately, it's unlikely that I'll have the time to work on this. As it
> does not seem to be very difficult to implement (at least it's easier than m/s
> stereo), perhaps Patrick could allow some of his students to work on this. I
> can't do it during my own student project, as mine must be about image
> processing or image synthesis.

Are you aware you can use lame for image compression? :)

First create a greyscale image ((576*x)*n, or (1152*x)*n for best
results) and save it as ascii ppm.
Cut off the header.
Use sox to go from 8 bit unsigned to a 16 bit mono wav file set at 44100 Hz.
Encode, then reverse the steps to get the image back..
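
The sample conversion step can be sketched in Python rather than sox.
This is only an illustration under stated assumptions: the pixels are
already available as a flat list of 8 bit values (0..255), the helper
names are made up, and the MP3 encode/decode step in the middle is left
out. The 576 and 1152 multiples above match the MP3 granule and frame
sizes in samples.

```python
import io
import struct
import wave

def pixels_to_wav_bytes(pixels, sample_rate=44100):
    """Map 8 bit unsigned pixels (0..255) to 16 bit signed mono PCM in a WAV."""
    # Center around zero, then scale up to the 16 bit range.
    samples = [(p - 128) << 8 for p in pixels]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16 bit samples
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

def wav_bytes_to_pixels(data):
    """Reverse step: 16 bit signed mono PCM back to 8 bit unsigned pixels."""
    with wave.open(io.BytesIO(data), "rb") as w:
        raw = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Round to the nearest 8 bit level and clamp, since a lossy codec
    # will not return exact multiples of 256.
    return [min(255, max(0, ((s + 128) >> 8) + 128)) for s in samples]

# One hypothetical image row, 1152 pixels wide, as a simple gradient.
row = [(i * 255) // 1151 for i in range(1152)]
wav_data = pixels_to_wav_bytes(row)
restored = wav_bytes_to_pixels(wav_data)
print("lossless round trip:", restored == row)
```

Without the encoder in the loop the round trip is exact; any difference
seen after a real encode and decode comes from the codec itself.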

Looks pretty good, perhaps if you do some kind of transform on the input
first to make it less 'transient' (some kind of reversible convolution)
you might get better compression than jpg (it's not too far off as is).

Kinda gives you respect for how tough audio compression is VS image
compression.. Check out http://linuxpower.cx/~greg/mp3crap/ for some
examples of this kind of perversion..

(Why did I do this? I was hoping nasty artifacts might be more easily
found in a picture than by listening to a sample)

--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )