Hello.

[Do not blindly apply patches from this e-mail! They mutually exclusive, and I don't have a firm opinion which one of them is correct.]

Today I tried to improve the existing module-virtual-surround-sink (but the same issue also affects the IIR-based rewrite that is still sitting on my laptop). The problem is: the current normalization code does not do what it is designed to do. The module clips on some testcases. Let me copy-paste the problematic code for easy discussion.

    /* normalize hrir to avoid clipping */
    hrir_max = 0;
    for (i = 0; i < u->hrir_samples; i++) {
        hrir_sum = 0;
        for (j = 0; j < u->hrir_channels; j++)
            hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);

        if (hrir_sum > hrir_max)
            hrir_max = hrir_sum;
    }
    if (hrir_max > 1) {
        for (i = 0; i < u->hrir_samples; i++) {
            for (j = 0; j < u->hrir_channels; j++)
                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
        }
    }

And here is where clipping (i.e. sum_left or sum_right becoming > 1.0) can happen:

        sum_right = 0;
        sum_left = 0;

        /* fold the input buffer with the impulse response */
        for (j = 0; j < u->hrir_samples; j++) {
            for (k = 0; k < u->channels; k++) {
current_sample = u->input_buffer[((u->input_buffer_offset + j) % u->hrir_samples) * u->channels + k];

sum_left += current_sample * u->hrir_data[j * u->hrir_channels + u->mapping_left[k]]; sum_right += current_sample * u->hrir_data[j * u->hrir_channels + u->mapping_right[k]];
            }
        }

So, basically, here is the "normalization" logic: for each HRIR sample, take the sum of all absolute values of HRIR channels at that time. Find the maximum of these sums. Divide that factor out, with an extra "fudge factor" of 1.2. Without the fudge factor, the corresponding explanation in terms of the second code fragment is: if a single-sample click is played simultaneously through all 5.1 channels, the output of the filter (sum_left or sum_right) should just reach 1.0 at its maximum. With hrir-kemar.wav from http://stuff.salscheider-online.de/hrir_kemar.tar.gz (mentioned in the original submission), this logic (including the fudge factor) divides the HRIR by 1.862.

What I don't understand is why "simultaneous click in all input channels" is considered at all.

If the task is to avoid clipping even in the worst possible case, then let's construct this worst possible case. Here are the instructions. Take hrir-kemar.wav. Time-reverse it. Saturate all positive samples to 1.0 and all negative samples to -1.0. With this specially-constructed sound, sum_left would at one point in time receive the sum of all absolute values of all samples (in all channels) in the HRIR, so, that is the value to divide out, and not what is written in the first code fragment. I.e., this would be the patch (sorry if mangled, it is not meant to be applied anyway):

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -533,7 +533,7 @@ int pa__init(pa_module*m) {

     const char *hrir_file;
     unsigned i, j, found_channel_left, found_channel_right;
-    float hrir_sum, hrir_max;
+    float hrir_sum;
     float *hrir_data;

     pa_sample_spec hrir_ss;
@@ -759,20 +759,21 @@ int pa__init(pa_module*m) {
     }

     /* normalize hrir to avoid clipping */
-    hrir_max = 0;
+    hrir_sum = 0.0;
     for (i = 0; i < u->hrir_samples; i++) {
-        hrir_sum = 0;
         for (j = 0; j < u->hrir_channels; j++)
             hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);
+    }
+

-        if (hrir_sum > hrir_max)
-            hrir_max = hrir_sum;
+    if (hrir_sum < 0.01) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
     }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_sum;
     }

     /* create mapping between hrir and input */

(Here I also removed the "don't normalize quiet HRIRs" logic, as I don't understand its purpose and it had an obvious bug of not including the fudge factor.)

Result: after this patch, the HRIR is divided by 24.339 and the module indeed survives the worst possible testcase just without clipping. But then, it becomes too quiet for normal use. That's why I don't want you to apply the above patch.

The original code, even though it clips heavily in the worst possible testcase, indeed produces no audible clipping on most (but not all!) music DVDs that I tried, and I think that the correct problem to solve here is indeed "produce no audible clipping on typical content". So there is some grain of truth in the current code.

Still, I want to change the fudge factor, because, as of now, the module does not survive the following testcases (i.e. clips, with this clipping being very easy to detect by ear):

* A full-amplitude sine sweep from 100 Hz to 10 kHz, same samples in all channels. * The 5.1 soundtrack from the "Lichtmond 2: Universe of Light" music BluRay.

The fudge factor required to play the whole Lichtmond 2 soundtrack without clipping is 2.4, corresponding to dividing the HRIR by 3.724. So:

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -768,11 +768,15 @@ int pa__init(pa_module*m) {
         if (hrir_sum > hrir_max)
             hrir_max = hrir_sum;
     }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    if (hrir_max < 0.01) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
+    }
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 2.4;
     }

     /* create mapping between hrir and input */

I don't like the above (working and almost minimal) patch, as it just replaces one magic number with another, and I still don't understand why the calculated hrir_max value is relevant. I.e. the same "correct" scaling factor, 3.724, can be also approximately produced this way, from the energy contained in the impulse response:

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -533,7 +533,7 @@ int pa__init(pa_module*m) {

     const char *hrir_file;
     unsigned i, j, found_channel_left, found_channel_right;
-    float hrir_sum, hrir_max;
+    float hrir_sample, hrir_energy;
     float *hrir_data;

     pa_sample_spec hrir_ss;
@@ -758,21 +758,25 @@ int pa__init(pa_module*m) {
         goto fail;
     }

-    /* normalize hrir to avoid clipping */
-    hrir_max = 0;
+    /* normalizg hrir to avoid clipping on typical music */
+    hrir_energy = 0.0;
     for (i = 0; i < u->hrir_samples; i++) {
-        hrir_sum = 0;
-        for (j = 0; j < u->hrir_channels; j++)
-            hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);
+        for (j = 0; j < u->hrir_channels; j++) {
+            hrir_sample = u->hrir_data[i * u->hrir_channels + j];
+            hrir_energy += hrir_sample * hrir_sample;
+       }
+    }

-        if (hrir_sum > hrir_max)
-            hrir_max = hrir_sum;
+    if (hrir_energy < 0.0001) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
     }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    hrir_sample = sqrt(hrir_energy) * 1.67;
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_sample;
     }

     /* create mapping between hrir and input */

Of course, the same criticism applies as for the second patch - why would the energy be relevant here?

Some final remarks.

I don't insist on the 3.724 scaling factor and can accept occasional clipping if it is not detectable by ear.

Due to the fact that we are dealing with "typical content", I understand that a fudge factor is needed anyway, to deal with its typicalness.

The code has to deal with HRIRs of arbitrary origin, including those from http://stuff.salscheider-online.de/hrir_listen.tar.gz , that's why a static factor to divide out won't cut it. But I haven't tested any other HRIRs yet.

Wrong equalization of the HRIRs (i.e. expensive headphones sounding like cheap plugs with this module) is a separate bug that will be addressed in a separate e-mail when I become able to discuss it.

Thoughts?

--
Alexander E. Patrakov

_______________________________________________
pulseaudio-discuss mailing list
pulseaudio-discuss@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss

Reply via email to