Self-review / heads-up before this gets reviewer time: I have found a
demuxer bug in v2 that affects real-world DS2 QP recordings, and I would
like to fix it in a v3 rather than have it reviewed as-is.

The DS2 demuxer treats the block stream as a plain concatenation of the
506-byte payloads. That is correct for continuous recordings, but
voice-activated dictation (the common real-world case) inserts "empty"
blocks (per-block frame_count == 0) at every pause. An empty block's
payload carries only the few continuation bytes that finish the frame
straddling the block boundary, followed by padding; the rest must be
discarded. v2 feeds the whole payload into the frame stream, which
desyncs every frame after the first pause -- output is bit-exact up to
the first pause and decorrelated noise afterwards.

This is the same issue the dss-codec spec attributes to the existing
libavformat/dss.c ("does NOT handle empty blocks, producing corrupt
output"), and is likely the root cause of the original trac #6091 symptom
("distorted, duration doubled"). It surfaced only after deployment,
because our pre-merge validation -- and the current FATE sample -- used
gap-free read-throughs that contain no empty blocks.

The fix is small and local to the demuxer. For a frame_count == 0 block,
emit only the continuation bytes and drop the rest:

    cont_size = 2 * header[1] + 2 * swap - 6     /* swap == 0 in QP */

Blocks with a non-zero frame count are untouched, so files without empty
blocks decode byte-for-byte identically (no regression). I have verified
the fixed output bit-exact against the licensed Olympus decoder across an
18-minute paused recording.
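
For reviewers who want the rule in concrete form, here is a minimal
sketch of the per-block decision. It is not the v3 patch itself. The
helper name, the constant names, and the position of the frame count at
header[2] are assumptions made for illustration; the clamp is a defensive
addition of mine and is not taken from any reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: the "- 6" term in the rule is the block header size. */
#define DS2_BLOCK_HEADER_SIZE  6
/* Payload bytes per block, as stated in the description above. */
#define DS2_BLOCK_PAYLOAD_SIZE 506

/*
 * Hypothetical helper (not an existing FFmpeg function): returns how many
 * payload bytes of one block should be appended to the frame stream.
 *
 * ASSUMPTION: the per-block frame count is stored in header[2].
 */
static int ds2_payload_bytes_to_keep(const uint8_t *header, int swap)
{
    int frame_count, cont_size;

    assert(header != NULL);
    frame_count = header[2];

    /* Blocks that carry frames are passed through unchanged. */
    if (frame_count != 0)
        return DS2_BLOCK_PAYLOAD_SIZE;

    /* Empty block: keep only the bytes that finish the straddling frame. */
    cont_size = 2 * header[1] + 2 * swap - DS2_BLOCK_HEADER_SIZE;

    /* Defensive clamp so a malformed header cannot yield a bad length. */
    if (cont_size < 0)
        cont_size = 0;
    if (cont_size > DS2_BLOCK_PAYLOAD_SIZE)
        cont_size = DS2_BLOCK_PAYLOAD_SIZE;

    return cont_size;
}
```

With header[1] == 5 and swap == 0, an empty block contributes 4 bytes and
the remaining 502 payload bytes are dropped as padding.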

v3 will contain:
  - the demuxer empty-block fix, and
  - a new FATE sample that actually exercises a pause (the current
    sample-qp.ds2 is continuous and would pass with or without the fix,
    so it cannot guard this path), with its regenerated framecrc.

One known gap I will flag rather than hide: a rare "over-count" block
(frame_count larger than a block can physically hold) seen once at a
28-block group boundary is not covered by the empty-block rule and is
not described in any reference I have. It appears to be a distinct, much
rarer segment-boundary event; v3 will detect such blocks and skip
cleanly rather than emit subtly wrong audio, pending proper
reverse-engineering.

Apologies for the extra round-trip -- better a correct decoder than a
fast merge. v3 to follow once I have recorded a permissively-licensed
paused sample for FATE.

Guillain