Re: [FFmpeg-soc] [PATCH] AMR-WB Decoder

Vitor Sessak Mon, 30 Aug 2010 05:31:29 -0700

On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:

On 26 August 2010 18:55, Vitor Sessak<[email protected]>  wrote:

Marcelo Galvão Póvoa wrote:

Fixed excitation array incorrect length.


Should now be applicable cleanly to the latest ffmpeg revision
(fd151a5f8bd152c456a).

First look:


[...]

+/**
+ * Convert an ISF vector into an ISP vector
+ *
+ * @param[in] isf                  Isf vector
+ * @param[out] isp                 Output isp vector
+ * @param[in] size                 Isf/isp size
+ */
+static void isf2isp(const float *isf, double *isp, int size)
+{
+    int i;
+
+    for (i = 0; i<  size - 1; i++)
+        isp[i] = cos(2.0 * M_PI * isf[i]);
+
+    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
+}

Almost duplication of amrnbdec.c:lsf2lsp().

Yes, but this last element doubling diverges between AMR-NB and AMR-WB
reference sources.

Can't you just do "isp[size - 1] *= 2" or it is used elsewhere?+f[i] = val * f[i - 1] + 2 * f[i - 2];


[...]

+        for(j = i - 1; j>  1; j--)
+            f[j] += f[j - 1] * val + f[j - 2];
+        f[1] += 0.25 * val;
+    }
+}

Hmm, can't this function be replaced by ff_lsp2polyf() with some rescaling?

Actually I realize now that the difference between these functions is
just the entire output scaled down by 0.25. That was silly because I
was scaling by 4.0 after this function. It is weird why the reference
decoder does this process. Maybe it should worth someone checking it
out to be sure about this.

I suppose you checked that after this change the output of your decoderdidn't change, no?

+/**
+ * Convert a ISP vector to LP coefficient domain {a_k}
+ * Equations from TS 26.190 section 5.2.4
+ *
+ * @param[in] isp                  ISP vector for a subframe
+ * @param[out] lp                  LP coefficients
+ * @param[in] lp_half_order        Half the number of LPs to construct
+ */
+static void isp2lp(const double *isp, float *lp, int lp_half_order) {
+    double pa[10 + 1], qa[10 + 1];
+    double last_isp = isp[2 * lp_half_order - 1];
+    double qa_old = 0.0;
+    float *lp2 =&lp[2 * lp_half_order];
+    int i;
+
+    if (lp_half_order>  8) { // high-band specific
+        lsp2polyf_16k(isp,     pa, lp_half_order);
+        lsp2polyf_16k(isp + 1, qa, lp_half_order - 1);
+
+        for (i = 0; i<= lp_half_order; i++)
+            pa[i] *= 4.0;
+        for (i = 0; i<  lp_half_order; i++)
+            qa[i] *= 4.0;
+    } else {
+        ff_lsp2polyf(isp,     pa, lp_half_order);
+        ff_lsp2polyf(isp + 1, qa, lp_half_order - 1);
+    }
+
+    for (i = 1; i<  lp_half_order; i++) {
+        double paf = (1 + last_isp) * pa[i];
+        double qaf = (1 - last_isp) * (qa[i] - qa_old);
+
+        qa_old = qa[i - 1];
+
+        lp[i]   = 0.5 * (paf + qaf);
+        lp2[-i] = 0.5 * (paf - qaf);
+    }
+
+    lp[0] = 1.0;
+    lp[lp_half_order] = 0.5 * (1 + last_isp) * pa[lp_half_order];
+    lp2[0] = last_isp;
+}

Please double-check if this is not a duplication of sipr.c:lsp2lpc_sipr().

Interesting, I believe they are the same function except for the fact
that in AMR-WB the first LP coefficient is always 1.0 but this should
be easily adapted by passing lp + 1 to the function and setting lp[0]
= 1.0. Also, the LP order is different between these codecs. To reuse
this existing function it should be moved to another file (lsp.c I
guess) and modified. Should I do it and include in the next patch?


You can send directly a patch to ffmpeg-devel to move this code to lsp.c.

+static void decode_pitch_vector(AMRWBContext *ctx,
+                                const AMRWBSubFrame *amr_subframe,
+                                const int subframe)
+{
+    int pitch_lag_int, pitch_lag_frac;
+    int i;
+    float *exc     = ctx->excitation;
+    enum Mode mode = ctx->fr_cur_mode;
+
+    if (mode<= MODE_8k85) {
+        decode_pitch_lag_low(&pitch_lag_int,&pitch_lag_frac,
amr_subframe->ada
p,
+&ctx->base_pitch_lag, subframe, mode);
+    } else
+        decode_pitch_lag_high(&pitch_lag_int,&pitch_lag_frac,
amr_subframe->ad
ap,
+&ctx->base_pitch_lag, subframe);
+
+    ctx->pitch_lag_int = pitch_lag_int;
+    pitch_lag_int     += (pitch_lag_frac<  0 ? -1 : 0) + (pitch_lag_frac
? 1 : 0);
+
+
+    /* Calculate the pitch vector by interpolating the past excitation at
the
+       pitch lag using a hamming windowed sinc function */
+    ff_acelp_interpolatef(exc, exc + 1 - pitch_lag_int,
+                          ac_inter, 4,
+                          pitch_lag_frac + (pitch_lag_frac>  0 ? 0 : 4),
+                          LP_ORDER, AMRWB_SFR_SIZE + 1);

ac_inter is yet another hamming function. Can you check if you cannot reuse
acelp_vectors.c:ff_b60_sinc or sipr16kdata.h:sinc_win?

Well, I cannot tell precisely if it cannot be reused but I guess so.
The number of coefficients in the table is related to the fractional
resolution of the interpolation: in AMR-NB and SIPR it is 4 but in
AMR-WB it is 6. Also, I don't see similarities between the table I
extracted from the reference and these tables.


Ok, so leave it like this by now.

+/**
+ * Reduce fixed vector sparseness by smoothing with one of three IR
filters
+ * Also known as "adaptive phase dispersion"
+ *
+ * @param[in] ctx                  The context
+ * @param[in,out] fixed_vector     Unfiltered fixed vector
+ * @param[out] buf                 Space for modified vector if necessary
+ *
+ * @return The potentially overwritten filtered fixed vector address
+ */
+static float *anti_sparseness(AMRWBContext *ctx,
+                              float *fixed_vector, float *buf)

amrnbedec.c has a function with the same name. Any code can be reused?

The function skeletons are similar but there are differences
everywhere. I can think about easily reusing only about 6 lines, which
wouldn't worth much


Ok, leave it as-is then.

+/**
+ * Apply to synthesis a 2nd order high-pass filter
+ *
+ * @param[out] out                 Buffer for filtered output
+ * @param[in] hpf_coef             Filter coefficients as used below
+ * @param[in,out] mem              State from last filtering (updated)
+ * @param[in] in                   Input speech data
+ *
+ * @remark It is safe to pass the same array in in and out parameters
+ */
+static void high_pass_filter(float *out, const float hpf_coef[2][3],
+                             float mem[4], const float *in)
+{
+    int i;
+    float *x = mem - 1, *y = mem + 2; // previous inputs and outputs
+
+    for (i = 0; i<  AMRWB_SFR_SIZE; i++) {
+        float x0 = in[i];
+
+        out[i] = hpf_coef[0][0] * x0   + hpf_coef[1][0] * y[0] +
+                 hpf_coef[0][1] * x[1] + hpf_coef[1][1] * y[1] +
+                 hpf_coef[0][2] * x[2];
+
+        y[1] = y[0];
+        y[0] = out[i];
+
+        x[2] = x[1];
+        x[1] = x0;
+    }
+}

acelp_filter.c:ff_acelp_apply_order_2_transfer_function()

Are you sure? I can't see them as equivalent, could you explain?


I'll give a better look at it soon...

+/**
+ * Upsample a signal by 5/4 ratio (from 12.8kHz to 16kHz) using
+ * a FIR interpolation filter. Uses past data from before *in address
+ *
+ * @param[out] out                 Buffer for interpolated signal
+ * @param[in] in                   Current signal data (length
0.8*o_size)
+ * @param[in] o_size               Output signal length
+ */
+static void upsample_5_4(float *out, const float *in, int o_size)
+{
+    const float *in0 = in - UPS_FIR_SIZE + 1;
+    int i;
+
+    for (i = 0; i<  o_size; i++) {
+        int int_part  = (4 * i) / 5;
+        int frac_part = (4 * i) - 5 * int_part;

You can break this loop in two to avoid the division:

i = 0;
for (j = 0; j<  o_size/5; j++)
    for (k = 0; k<  5; k++) {
        ....
        i++;
    }

I've done this in not so neat way, adding 4 and calculating a simple
remainder by 5. Tell me if can be done in a better way.


Does something like the following work?

    int int_part = 0, frac_part = 0;

    i = 0;
    for (j = 0; j<  o_size / 5; j++) {
        out[i] = in[int_part];
        frac_part = 4;
        i++;
        for (k = 1; k<  5; k++) {
            out[i] = ff_dot_productf(in0 + int_part, upsample_fir[4 - 
frac_part],
                                         UPS_MEM_SIZE);
            int_part++;
            frac_part--;
            i++;
       }
    }

+/**
+ * Generate the high-band excitation with the same energy from the lower
+ * one and scaled by the given gain
+ *
+ * @param[in] ctx                  The context
+ * @param[out] hb_exc              Buffer for the excitation
+ * @param[in] synth_exc            Low-band excitation used for synthesis
+ * @param[in] hb_gain              Wanted excitation gain
+ */
+static void scaled_hb_excitation(AMRWBContext *ctx, float *hb_exc,
+                                 const float *synth_exc, float hb_gain)
+{
+    int i;
+    float energy = ff_dot_productf(synth_exc, synth_exc, AMRWB_SFR_SIZE);
+
+    /* Generate a white-noise excitation */
+    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
+        hb_exc[i] = 32768.0 - (uint16_t) av_lfg_get(&ctx->prng);
+
+    ff_scale_vector_to_given_sum_of_squares(hb_exc, hb_exc, energy,
+                                            AMRWB_SFR_SIZE_16k);
+
+    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
+        hb_exc[i] *= hb_gain;
+}

Why are you scaling it twice?

The first scaling is to match the energy with the lower band part. The
second is the high_band gain itself. This can be done in only one loop
using ff_scale_vector_to_given_sum_of_squares() ?

I think you will be obliged to duplicate a little offf_scale_vector_to_given_sum_of_squares(), but it will make your codefaster...

+/**
+ * Apply to high-band samples a 15th order filter
+ * The filter characteristic depends on the given coefficients
+ *
+ * @param[out] out                 Buffer for filtered output
+ * @param[in] fir_coef             Filter coefficients
+ * @param[in,out] mem              State from last filtering (updated)
+ * @param[in] cp_gain              Compensation gain (usually the filter
gain)
+ * @param[in] in                   Input speech data (high-band)
+ *
+ * @remark It is safe to pass the same array in in and out parameters
+ */
+static void hb_fir_filter(float *out, const float fir_coef[HB_FIR_SIZE +
1],
+                          float mem[HB_FIR_SIZE], float cp_gain, const
float *in)
+{
+    int i, j;
+    float data[AMRWB_SFR_SIZE_16k + HB_FIR_SIZE]; // past and current
samples
+
+    memcpy(data, mem, HB_FIR_SIZE * sizeof(float));
+
+    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++)
+        data[i + HB_FIR_SIZE] = in[i] / cp_gain;
+
+    for (i = 0; i<  AMRWB_SFR_SIZE_16k; i++) {
+        out[i] = 0.0;
+        for (j = 0; j<= HB_FIR_SIZE; j++)
+            out[i] += data[i + j] * fir_coef[j];
+    }
+
+    memcpy(mem, data + AMRWB_SFR_SIZE_16k, HB_FIR_SIZE * sizeof(float));
+}

I think it is cleaner (and more consistent) to do like the synthesis filter
and get one single buffer for output and memory...

The problem is that this filter memory consists of past inputs and not
outputs. If I would do with a single buffer I would have to copy the
input to it in order to avoid overwriting the last filter output.


ok

Also

+/**
+ * Parses a speech frame, storing data in the Context
+ *
+ * @param[in,out] ctx              The context
+ * @param[in] buf                  Pointer to the input buffer
+ * @param[in] buf_size             Size of the input buffer
+ *
+ * @return The frame mode
+ */
+static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t *buf,
+                                  int buf_size)
+{
+    GetBitContext gb;
+    uint16_t *data;
+
+    init_get_bits(&gb, buf, buf_size * 8);
+
+    /* decode frame header (1st octet) */
+    skip_bits(&gb, 1);  // padding bit
+    ctx->fr_cur_mode  = get_bits(&gb, 4);
+    ctx->fr_quality   = get_bits1(&gb);
+    skip_bits(&gb, 2);  // padding bits
+
+    // XXX: We are using only the "MIME/storage" format
+    // used by libopencore. This format is simpler and
+    // does not carry the auxiliary information of the frame
+
+    data = (uint16_t *)&ctx->frame;
+    memset(data, 0, sizeof(AMRWBFrame));
+    buf++;
+
+    if (ctx->fr_cur_mode<  MODE_SID) { /* Normal speech frame */
+        const uint16_t *perm = amr_bit_orderings_by_mode[ctx->fr_cur_mode];
+        int field_size;
+
+        while ((field_size = *perm++)) {
+            int field = 0;
+            int field_offset = *perm++;
+            while (field_size--) {
+               uint16_t bit_idx = *perm++;
+               field<<= 1;
+               /* The bit index inside the byte is reversed (MSB->LSB) */
+               field |= BIT_POS(buf[bit_idx>>  3], 7 - (bit_idx&  7));
+            }
+            data[field_offset] = field;
+        }
+    }
+
+    return ctx->fr_cur_mode;
+}

Can't you reorder the bit indexes in a way it is in the LSB->MSB so youcan use just the usual bitstream functions?


-Vitor

_______________________________________________
FFmpeg-soc mailing list
[email protected]
https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-soc

Re: [FFmpeg-soc] [PATCH] AMR-WB Decoder

Reply via email to