08.02.2015 13:21, Alexander E. Patrakov wrote:
08.02.2015 02:14, Georg Chini wrote:
Sorry, but I do not think the smoother is the problem here. I do get
quite reliable latency results.
The problem is really (if there is a problem at all) the execution time
of the code. These are not
asynchronously called functions, they wait until they are finished. And
that exactly is the problem,
sometimes the queue is quite full and so it takes a lot of time until
the pa_asyncmsgq_send() returns.
The smoother was one of the first things I suspected to be responsible
for false reports, but I could
not verify it. Since I measure the time for the second call to
pa_asyncmsgq_send() the numbers
reported look ok.
I think there really is some "latency jitter" that cannot be avoided -
interrupts that cannot be handled
immediately, USB bus in use when the sound card wants to deliver data
and you can probably come
up with a lot more situations where it is possible that data cannot be
delivered on time. You'll never get
perfect results when you measure something and so you cannot apply a
perfect regulator even
if it would be nice in theory.

OK, but I cannot reproduce anything of this kind at home, so cannot really argue. Anyway, we could implement some kind of median filter in order to suppress this jitter - but this would be an incremental improvement, let's first review what we have :)


Unfortunately, this is easy to recommend, but I can't really see how
this can be done.

The smoother is updated _after_ a successful write to the alsa device
(via traditional UNIX write or via mmap), while the pop callbacks are
executed just before that. So, they are called at the moment when the
influence from a bad rate estimation via the smoother is the greatest.

Now the suggestions.

I think that, ideally, for such use cases, it is important to have
timestamped latency snapshots for both sinks and sources in PulseAudio
core. This would mean introducing a new message that gets the latest
reliable latency snapshot (i.e., timestamp according to the wall
clock, send/receive counter, input/output buffer size, and the latency
itself), without any interpolation. If the sink does not implement
this, just fallback (in the generic sink code) to getting the current
latency. Also, because such snapshots for the sink and the source will
not happen at the same moment, you have to deal with it.

You can actually try and get both snapshots at the same time. I did this
and was quite astonished to
find that the results were less reliable this way. I could not figure
out why. (You can call
get_latency_in_thread() for source and sink from both snapshot functions
without crashing pulse,
at least when you make sure they are ONLY called from the timer
function. Something else seems to
call one of the snapshots for whatever reason).

I have read this, but have no ideas. Anyway, latency reports obtained this way also rely on the smoother.



Also, I have a very heretic thought. Namely, that the smoother in the
alsa sink and source may actually be a bad idea and is better removed.
I have not tested this. But it is used only in two places: for
reporting latency (where it confuses your module) and for calculating
the amount of time to sleep

As I said, I think the latency deviations I see are real and not
artifacts, so there is no confusion.

in the case of timer-based scheduling (where even module-alsa-sink
does not trust the result, i.e. discards it if it is greater than the
non-transformed time interval). And, if I recollect correctly, there
were complaints about it being fooled by batch cards, and they were
cited as one of the reasons not to enable timer-based scheduling on
batch cards. So - maybe, for the purposes of timer based-scheduling we
should just assume the worst case, i.e. the card that is, say, 0.75%
faster than nominal, and use the nominal rate together with the latest
snapshot time in {source,sink}_get_latency()? Basically, the fear is
that the smoother makes a greater mistake in the estimated rate than
just assuming the nominal one. Maybe you can try this suggestion?


For timer based scheduling the regulator works perfect, you would not
even need a stop criterion,
so why bother?

I think there is some misunderstanding. Let me repeat in a different way.

The smoother works perfectly (both for timer-based scheduling and for the needs of your module) on non-batch cards.

But, even for batch cards, where timer-based scheduling is disabled, the smoother is active and is actually used for reporting the latency to your module. An attempt to use the smoother for timer-based scheduling on batch cards has failed. That's why I suspect that it, on batch cards, also tells lies to your module.


For Tanu's patch status page: please leave the status of this patch as
unreviewed. The general idea of the patch does not look brilliant, but
it's the best known-working idea that we currently have on the topic,
and I have not reviewed all the fine details.


Well from a practical point of view it does a pretty good job although
the idea may not be brilliant.
I'm willing to implement your better idea when you come up with it.
  Did you ever test it? And compare it to what the current
module-loopback does?

I did not test it, will do it now and add some logging in order to
verify what you said above. And hopefully will try to implement an
alternative latency-snapshotting implementation, just to compare.


I can confirm (based on a reimplementation attempt) that the code after patching deals with the capture and playback timestamp difference 100% correctly - so it cannot be the problem. Just a minor nitpick: I moved saving of the timestamp to the message handlers. For me, this makes no difference, though. The patch (to be applied on top of yours) is attached. Could you please confirm or disprove that it makes no difference in your setup, either?

So, the current status of the patch, from my viewpoint, is:

1. The patch adds a perfectly correct (assuming no xruns) way to account for latency snapshots being made not simultaneously for playback and capture. I think that this is the main improvement, and it needs to be merged even if we disagree on the rest.

2. The result has an optimal coefficient that relates the observed latency difference and the resulting rate correction, assuming the currently-implemented way to snapshot the latency and assuming no interference from the smoother - which still has to be verified independently, possibly after merging.

3. The patch adds buffer_latency_msec, which seems to be an unrelated improvement, and I think it should be split out. I have no opinion on whether this change should be merged.

4. The patch has a criterion when to stop adjusting rates, and it is a source of disagreements. But I could not suggest anything constructive. So I think that a good approach would be to split it out and let others comment. Also, it would be a good idea to add a debugging message so that we can see when it happens.

If you want, I can do the splitting for you.

--
Alexander E. Patrakov
>From 8ab49257877163a609cb1c26ead2a48be77947e5 Mon Sep 17 00:00:00 2001
From: "Alexander E. Patrakov" <patra...@gmail.com>
Date: Sun, 8 Feb 2015 14:35:13 +0500
Subject: [PATCH] Moved timestamp saving closer to latency saving

---
 src/modules/module-loopback.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
index 8def03e..1b76ef5 100644
--- a/src/modules/module-loopback.c
+++ b/src/modules/module-loopback.c
@@ -108,10 +108,12 @@ struct userdata {
         int64_t send_counter;
         size_t source_output_buffer;
         pa_usec_t source_latency;
+        pa_usec_t source_timestamp;
 
         int64_t recv_counter;
         size_t sink_input_buffer;
         pa_usec_t sink_latency;
+        pa_usec_t sink_timestamp;
     } latency_snapshot;
 };
 
@@ -182,12 +184,13 @@ static void teardown(struct userdata *u) {
 }
 
 /* Called from main context */
-static void adjust_rates(struct userdata *u, pa_usec_t snapshot_delay) {
+static void adjust_rates(struct userdata *u) {
     size_t buffer;
     uint32_t old_rate, base_rate, new_rate, hours, cut_off_frequency;
     pa_usec_t final_latency, source_sink_latency, current_buffer_latency, current_latency, corrected_latency;
     double min_cycles;
     int32_t latency_difference;
+    pa_usec_t snapshot_delay;
 
     pa_assert(u);
     pa_assert_ctl_context();
@@ -227,6 +230,7 @@ static void adjust_rates(struct userdata *u, pa_usec_t snapshot_delay) {
         buffer = PA_CLIP_SUB(buffer, (size_t) (u->latency_snapshot.recv_counter - u->latency_snapshot.send_counter));
     current_buffer_latency = pa_bytes_to_usec(buffer, &u->sink_input->sample_spec);
 
+    snapshot_delay = u->latency_snapshot.sink_timestamp - u->latency_snapshot.source_timestamp;
     current_latency = u->latency_snapshot.sink_latency + current_buffer_latency + u->latency_snapshot.source_latency - snapshot_delay;
 
     source_sink_latency = u->sink_latency_sum / u->sink_adjust_counter +
@@ -282,7 +286,6 @@ static void adjust_rates(struct userdata *u, pa_usec_t snapshot_delay) {
 /* Called from main context */
 static void time_callback(pa_mainloop_api *a, pa_time_event *e, const struct timeval *t, void *userdata) {
     struct userdata *u = userdata;
-    pa_usec_t timestamp;
 
     pa_assert(u);
     pa_assert(a);
@@ -294,10 +297,9 @@ static void time_callback(pa_mainloop_api *a, pa_time_event *e, const struct tim
     /* Get sink and source latency snapshot. The resulting latency must be corrected by the delay between sink and
      * source snapshot which can somtimes be quite large (>>10 ms) */
     pa_asyncmsgq_send(u->sink_input->sink->asyncmsgq, PA_MSGOBJECT(u->sink_input), SINK_INPUT_MESSAGE_LATENCY_SNAPSHOT, NULL, 0, NULL);
-    timestamp = pa_rtclock_now();
     pa_asyncmsgq_send(u->source_output->source->asyncmsgq, PA_MSGOBJECT(u->source_output), SOURCE_OUTPUT_MESSAGE_LATENCY_SNAPSHOT, NULL, 0, NULL);
 
-    adjust_rates(u, pa_rtclock_now() - timestamp);
+    adjust_rates(u);
 }
 
 /* Called from main context
@@ -413,6 +415,7 @@ static int source_output_process_msg_cb(pa_msgobject *obj, int code, void *data,
             u->latency_snapshot.send_counter = u->send_counter;
             u->latency_snapshot.source_output_buffer = u->source_output->thread_info.resampler ? pa_resampler_result(u->source_output->thread_info.resampler, length) : length;
             u->latency_snapshot.source_latency = pa_source_get_latency_within_thread(u->source_output->source);
+            u->latency_snapshot.source_timestamp = pa_rtclock_now();
 
             return 0;
         }
@@ -713,6 +716,7 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in
             u->latency_snapshot.sink_input_buffer = pa_memblockq_get_length(u->memblockq) +
                                                     (u->sink_input->thread_info.resampler ? pa_resampler_request(u->sink_input->thread_info.resampler, length) : length);
             u->latency_snapshot.sink_latency = pa_sink_get_latency_within_thread(u->sink_input->sink);
+            u->latency_snapshot.sink_timestamp = pa_rtclock_now();
 
             return 0;
         }
-- 
2.2.1

_______________________________________________
pulseaudio-discuss mailing list
pulseaudio-discuss@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pulseaudio-discuss

Reply via email to