Re: [Freetel-codec2] Clarifications on quantization

2023-09-23 Thread david
> misleading to call this "spectral magnitudes" under Bit Allocation on
> the
> website. (Or are LPC coefficients actually also called spectral
> magnitudes?)

The LPC coefficients can be interpreted as conveying the spectral
magnitude information; try plotting the magnitude spectrum of the LPC
synthesis filter.
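To make that concrete, here is a rough stdlib-only Python sketch (not
Codec 2 source, and the predictor coefficients are made up for
illustration): evaluate the synthesis filter power response
Pw = 1/|A(e^jw)|^2 on a dense frequency grid, which is what a
zero-padded DFT of the filter would give you. The peaks of Pw trace
the spectral envelope, which is why the LPC coefficients carry the
spectral magnitude information.

```python
import math, cmath

def lpc_magnitude_spectrum(a, n_points=256):
    """Evaluate Pw(w) = 1/|A(e^jw)|^2 for the LPC synthesis filter
    H(z) = 1/A(z), A(z) = 1 - sum_k a_k z^-k, on a grid of n_points
    frequencies from 0 up to (but not including) pi."""
    spectrum = []
    for i in range(n_points):
        w = math.pi * i / n_points
        # A(e^jw) = 1 - sum_k a_k e^{-jw(k+1)}
        A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1))
                      for k, ak in enumerate(a))
        spectrum.append(1.0 / abs(A) ** 2)
    return spectrum

# Toy 2nd-order predictor (illustrative values, not real Codec 2 output);
# its complex pole pair puts a spectral peak near w ~ 0.84 rad.
a = [1.2, -0.8]
Pw = lpc_magnitude_spectrum(a)
```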

> In the decoder, a signal is synthesized using the LPC synth filter.

No - we use freq domain techniques, see sine.c: synthesise().

> That should
> already be audible speech. But since the audio quality of pure LPC
> systems is
> low (you write of a "mechanical quality" in your thesis), you do the
> following
> trick: You get the DFT of the LPC-produced signal and extract the
> harmonic
> amplitudes - but from the RMS for reasons not well understood. You
> then apply
> the sinusoidal model (ie. "Reverse FFT"?) including phase information
> derived
> from the voicing bits. Effectively, you are enhancing the LPC
> synthesized speech
> by correcting the harmonic phases, resulting in increased quality.
> 
> Did I get this right?

Not really. The harmonic magnitudes are extracted from the LPC
synthesis filter spectrum, and we also sample the LPC synthesis filter
spectrum to obtain the dispersive part of the phase spectrum.
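A sketch of both samplings in stdlib-only Python (an illustration of
the idea, not the Codec 2 implementation; the predictor coefficients,
pitch, and sample rate below are made-up example values): evaluate
H(e^jw) = 1/A(e^jw) at each harmonic m*Wo, take |H| for the magnitude
Am and arg(H) for the dispersive phase term.

```python
import math, cmath

def sample_lpc_at_harmonics(a, Wo, L):
    """Sample the LPC synthesis filter H(e^jw) = 1/A(e^jw) at the
    harmonic frequencies w = m*Wo, m = 1..L, returning a list of
    (magnitude, phase) pairs: the magnitudes give the Am, the
    phases the dispersive part of the phase spectrum."""
    out = []
    for m in range(1, L + 1):
        w = m * Wo
        A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1))
                      for k, ak in enumerate(a))
        H = 1.0 / A
        out.append((abs(H), cmath.phase(H)))
    return out

Wo = 2 * math.pi * 100 / 8000   # 100 Hz pitch at 8 kHz sampling
L = 40                          # floor(pi/Wo): harmonics up to 4 kHz
harm = sample_lpc_at_harmonics([1.2, -0.8], Wo, L)
```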

> 
> PS: Off-topic, but what system did you write your thesis in? Is this
> TeX or Troff?
> 

I made a poor decision to use Word, various versions through the '90s.
I did use Word Basic for some automation.

- David




___
Freetel-codec2 mailing list
Freetel-codec2@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freetel-codec2


Re: [Freetel-codec2] Clarifications on quantization

2023-09-21 Thread Robin Haberkorn via Freetel-codec2
Hello David,

thank you for your explanations. Everything makes a little bit more sense to me
now. I think most of the confusion comes from the fact - correct me if I am
wrong - that Codec 2 is actually a combination of Sinusoidal and LPC coding, but
in most materials you present it as a Sinusoidal coder, which is a bit 
misleading.

Let me try to sum up:

So you are basically using LPC with LSP quantization on the source signal. On
the channel you are transmitting a fixed number of LSP-coded LPC filter
coefficients instead of the variable harmonic amplitudes. It might be a bit
misleading to call this "spectral magnitudes" under Bit Allocation on the
website. (Or are LPC coefficients actually also called spectral magnitudes?)
Voicing is determined via MBE, but reduced to a single bit per 10ms.
Pitch is determined using the NLP algorithm (cf. chapter 4) and some refinement
via MBE (for which you naturally need a DFT as well).

In the decoder, a signal is synthesized using the LPC synth filter. That should
already be audible speech. But since the audio quality of pure LPC systems is
low (you write of a "mechanical quality" in your thesis), you do the following
trick: You get the DFT of the LPC-produced signal and extract the harmonic
amplitudes - but from the RMS for reasons not well understood. You then apply
the sinusoidal model (ie. "Reverse FFT"?) including phase information derived
from the voicing bits. Effectively, you are enhancing the LPC synthesized speech
by correcting the harmonic phases, resulting in increased quality.

Did I get this right?
How exactly the harmonic amplitudes are extracted, I still don't fully
understand. It's described in chapter 5.2.1, so that's my problem. Also, I do
not understand what the quantized energy information is used for.

This is not so bad for my small report, as I can outline first a pure sinusoidal
model with pros and cons, then LPC with pros and cons as an alternative and
ultimately Codec 2 as a clever synthesis of both.

I must say that this is the most complex DSP algorithm I have seen so far, but
I presume there is equally complex stuff going on in video coding.

Best regards,
Robin Haberkorn

PS: Off-topic, but what system did you write your thesis in? Is this TeX or 
Troff?

On 9/20/23 22:14, david wrote:
> Hi Robin,
> 
>> Regarding the quantization of sinusoidal magnitudes/amplitudes, you
>> write in a
>> blog post (https://www.rowetel.com/?p=130) that the "red line" Am is
>> quantized.
>> This is not the plain frequency curve (the green one Sw). How exactly
>> do you
>> derive Am from Sw?
> 
> By sampling the LPC synthesis filter Pw=1/|A(e^jw)|^2 at each harmonic.
> 
>> But in the Harmonic Sinusoidal Model, you need to have all L
>> amplitudes
>> available to synthesize the speech signal. How is that achieved? Are
>> you simply
>> synthesizing 10 harmonics with an appropriately scaled Wo no matter
>> what?
>>
> 
> The LSPs are converted back to LPC coefficients {ak}, which are used to
> create an LPC synthesis filter, which we sample.  Well, actually we take
> the RMS value of the spectrum in each band rather than sampling at the
> harmonic centre.  The blog post you linked to explains that a little
> further down, and I think it's in the thesis too.
> 
>> The fundamental frequency is determined by trying a number of
>> frequencies
>> between 50-500 Hz, determining the sinusoidal amplitudes, decoding
>> that data and
>> comparing it with the original signal? The fundamental frequency will
>> be the one
>> where that comparison yields the smallest error. This is the
>> algorithm described
>> in chapter 3.4 of your PhD thesis.
>>
> We use the non-linear pitch estimation algorithm (in the thesis);
> the MBE pitch estimator (which you outlined above) is used for
> refinement of the pitch estimate.
> 
>> What's the algorithm you are using to estimate voicing?
> 
> The MBE algorithm, but the voicing of all bands is averaged to get a
> single metric which we compare to a threshold.
> 
>> Furthermore, LPC analysis is performed directly on the speech samples
>> (time
>> domain) according to the block diagram. How does that fit together
>> with using Am
>> which is obviously a feature in the frequency domain?
> 
> The Am are extracted using freq domain techniques for the purpose of
> estimating voicing.  In the LPC quantised modes, the Am are then
> discarded and the time domain LPC coefficients are transformed to
> LSPs and sent to the decoder, where the Am are extracted.
>  
>> I do have a little bit of experience in signal/audio processing, but
>> still find
>> it hard to understand all of it. Okay I admit, I get terribly
>> confused.
> 
> Yes, we realise there is a gap here.  We plan to write a complete
> algorithm description to provide a reference in one place.
> 
> Cheers,
> David R
> 
> 
> 

Re: [Freetel-codec2] Clarifications on quantization

2023-09-21 Thread Sebastien F4GRX

Hello,

On 20/09/2023 at 21:14, david wrote:

Yes, we realise there is a gap here.  We plan to write a complete
algorithm description to provide a reference in one place.


It has been years, time flies :D

I started the attached document a long time ago by browsing the code, but
got lost in more code, more projects, life, the pandemic, kids, etc.


It may contain ridiculous mistakes, is grossly incomplete, and reflects
just my surface understanding of what I read.


I did not have a deep look at the 700 bps modes, which are much more complex.

But it's a start.

Sebastien


codec2_spec.odt
Description: application/vnd.oasis.opendocument.text


Re: [Freetel-codec2] Clarifications on quantization

2023-09-20 Thread david
Hi Robin,

> Regarding the quantization of sinusoidal magnitudes/amplitudes, you
> write in a
> blog post (https://www.rowetel.com/?p=130) that the "red line" Am is
> quantized.
> This is not the plain frequency curve (the green one Sw). How exactly
> do you
> derive Am from Sw?

By sampling the LPC synthesis filter Pw=1/|A(e^jw)|^2 at each harmonic.

> But in the Harmonic Sinusoidal Model, you need to have all L
> amplitudes
> available to synthesize the speech signal. How is that achieved? Are
> you simply
> synthesizing 10 harmonics with an appropriately scaled Wo no matter
> what?
> 

The LSPs are converted back to LPC coefficients {ak}, which are used to
create an LPC synthesis filter, which we sample.  Well, actually we take
the RMS value of the spectrum in each band rather than sampling at the
harmonic centre.  The blog post you linked to explains that a little
further down, and I think it's in the thesis too.
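A stdlib-only Python sketch of that RMS-per-band idea (an illustration
under assumed parameter values, not the actual Codec 2 code; the
predictor coefficients and pitch are made up): instead of sampling
Pw = 1/|A(e^jw)|^2 at the harmonic centre m*Wo, average Pw over the
band ((m-0.5)Wo, (m+0.5)Wo) and take the square root.

```python
import math, cmath

def lpc_power(a, w):
    """Pw(w) = 1/|A(e^jw)|^2 for A(z) = 1 - sum_k a_k z^-k."""
    A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1))
                  for k, ak in enumerate(a))
    return 1.0 / abs(A) ** 2

def rms_band_amplitudes(a, Wo, L, n_sub=16):
    """For each harmonic m = 1..L, average the power spectrum over
    the band ((m-0.5)Wo, (m+0.5)Wo) on an n_sub-point grid and take
    the square root: an RMS measure of the whole band rather than
    a single sample at the harmonic centre."""
    Am = []
    for m in range(1, L + 1):
        lo = (m - 0.5) * Wo
        hi = (m + 0.5) * Wo
        pw = [lpc_power(a, lo + (hi - lo) * (i + 0.5) / n_sub)
              for i in range(n_sub)]
        Am.append(math.sqrt(sum(pw) / n_sub))
    return Am

Wo = 2 * math.pi * 100 / 8000   # 100 Hz pitch at 8 kHz sampling
Am = rms_band_amplitudes([1.2, -0.8], Wo, 40)
```

Averaging over the band makes the extracted amplitude less sensitive
to small pitch estimation errors than a single centre-frequency sample.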

> The fundamental frequency is determined by trying a number of
> frequencies
> between 50-500 Hz, determining the sinusoidal amplitudes, decoding
> that data and
> comparing it with the original signal? The fundamental frequency will
> be the one
> where that comparison yields the smallest error. This is the
> algorithm described
> in chapter 3.4 of your PhD thesis.
> 
We use the non-linear pitch estimation algorithm (in the thesis);
the MBE pitch estimator (which you outlined above) is used for
refinement of the pitch estimate.

> What's the algorithm you are using to estimate voicing?

The MBE algorithm, but the voicing of all bands is averaged to get a
single metric which we compare to a threshold.
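Sketched in stdlib-only Python (the per-band SNR framing and the 6 dB
threshold are illustrative assumptions, not the actual Codec 2 values):
average the per-band voicing measures from an MBE-style analysis into
one number and compare it against a threshold to get the single
voiced/unvoiced decision per frame.

```python
def voicing_decision(band_snrs_db, threshold_db=6.0):
    """Collapse per-band voicing measures (here: MBE-style SNRs in
    dB) into one average and threshold it, giving the single
    voiced/unvoiced bit per frame.  The threshold value is
    illustrative, not the actual Codec 2 constant."""
    avg = sum(band_snrs_db) / len(band_snrs_db)
    return avg > threshold_db

# Mostly-periodic frame: high SNR in most bands -> voiced
voiced = voicing_decision([12.0, 9.5, 8.0, 7.2, 3.1])
# Noise-like frame: low SNR everywhere -> unvoiced
unvoiced = voicing_decision([1.0, 0.5, 2.0, 1.5, 0.8])
```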

> Furthermore, LPC analysis is performed directly on the speech samples
> (time
> domain) according to the block diagram. How does that fit together
> with using Am
> which is obviously a feature in the frequency domain?

The Am are extracted using freq domain techniques for the purpose of
estimating voicing.  In the LPC quantised modes, the Am are then
discarded and the time domain LPC coefficients are transformed to
LSPs and sent to the decoder, where the Am are extracted.
 
> I do have a little bit of experience in signal/audio processing, but
> still find
> it hard to understand all of it. Okay I admit, I get terribly
> confused.

Yes, we realise there is a gap here.  We plan to write a complete
algorithm description to provide a reference in one place.

Cheers,
David R


