On 2021-11-20, Andy Farnell wrote:

Now maybe your problem looks like this (unless I have gravely misunderstood it): you want to find a set of known codes with the best statistical properties for signal dithering, which can be expressed generatively (perhaps as a set of keys for generators of keys for generators... chained thusly) and can be compactly sent or pre-known by the receiver. Searching (resyncing) is now measured in milliseconds.

This is not too far from what I'm doing at the moment, actually. But there are those pesky little things from audio and statistics which get in the way, and which have led to my current (unworkable; why I'm asking) architecture.

Basically, if we had reliable metadata telling us the system was or was not used, I could probably just use a linear feedback shift register of long enough period, using a random seed conveyed in the metadata, and be done with it. That's a problem solved on the crypto side, for some decades now. However, there are at least four problems with that.
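To make the baseline concrete, here's a minimal sketch of that solved-for-decades piece, assuming a Galois-form LFSR; the 32-bit tap mask 0x80200003 (polynomial x^32 + x^22 + x^2 + x + 1) is one known maximal-length choice, and the function names are illustrative:

```python
def lfsr32_step(state):
    """One step of a 32-bit Galois LFSR.  With a maximal-length tap
    mask, any nonzero seed yields a period of 2**32 - 1."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0x80200003  # taps 32, 22, 2, 1
    return state

def dither_bits(seed, n):
    """Draw n pseudorandom bits from the LFSR, seeded per stream
    (e.g. from a seed conveyed in metadata)."""
    state = seed
    out = []
    for _ in range(n):
        state = lfsr32_step(state)
        out.append(state & 1)
    return out
```

With a seed passed in metadata, the decoder regenerates the identical bit sequence and subtracts it coherently; everything below is about what breaks when you can't rely on that.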

First, we don't often have a side channel available on which to reliably convey whether the scheme was applied. So, if you subtract out something which wasn't there to begin with, you add another RPDF/TPDF worth of noise instead.

Second, the synchronization: only coherent subtraction works, so if there's a timing glitch of even a single sample period, even with an encoded bitstream you'll immediately go from coherent to incoherent subtraction for the rest of the stream, adding noise instead of subtracting it.
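A toy model shows how abruptly that fails: below, the encoder adds dither value d[i] and the decoder subtracts d[i + offset], so the residual power goes from exactly zero at offset 0 to twice the dither power at any other offset (the function and its parameters are illustrative, not the codec itself):

```python
import random

def residual_power(offset, n=10000, seed=7):
    """Mean-square residual after the decoder subtracts a dither
    stream misaligned by `offset` samples.  Dither is RPDF, uniform
    in (-0.5, 0.5)."""
    rng = random.Random(seed)
    d = [rng.random() - 0.5 for _ in range(n + 1)]
    return sum((d[i] - d[i + offset]) ** 2 for i in range(n)) / n
```

residual_power(0) is exactly 0; residual_power(1) comes out near 1/6, i.e. double the ~1/12 variance of the dither itself, which is the "adding instead of subtracting" case.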

Third, if you can't reliably pass a random seed to the decoder every time, you have to start from a known initial condition for your PRNG. That's not much of a problem with one-shot audio signals, but you're definitely going to run into trouble with ensembles of signals, because the dither signal is then always the same and piles up in ensemble averages. That might seem like a small deal, but consider what it would do to, e.g., the training step of deep-learning-based speech recognition: the learner could quite easily latch onto the constant, recurring dither signal. Applied over a corpus of millions of training signals, the dither might easily amplify into a feature stronger than any of the features you actually want extracted.

And fourth, LFSRs *actually* don't survive too many tests of indistinguishability from true randomness. That's usually more of an issue in cryptographic work and in things like Monte Carlo methods in computational physics. But it *does* show up in audio work as well, from time to time: e.g., fast shift-and-add algorithms for impulse response measurement infamously react *really* badly to any nonlinearity when the probe noise is derived from an LFSR. That raises a spectre of further concerns, which I'd like to engineer through once and for all, instead of leaving them hanging. I'd like to make my scheme the no-nonsense go-to for any digitized signal, after all; the silver bullet, built into anything and everything.

So, let me recount my current solution in a bit more historical detail, wrt how it tries to address the above concerns.

My first iteration was just subtractive dither using an LFSR. Synchrony and metadata were assumed. Of course it can't work out in the wild, but that's what you build as a proof of concept. It used the orthodox 1-bit peak-to-peak RPDF as the dither. That makes even 4-bit audio surprisingly pleasant to listen to, because you can suddenly hear well below the noise floor; compared to additive TPDF, the drop in noise floor is highly noticeable.
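The subtractive round trip itself is only a few lines. A sketch, assuming a floor quantizer and a shared generator standing in for the LFSR (names and the use of Python's PRNG are illustrative):

```python
import math
import random

def encode(x, bits, rng):
    """Encoder: add RPDF dither of 1 LSB peak-to-peak to x (in [-1, 1)),
    then quantize to `bits` bits with a floor quantizer."""
    q = 2.0 ** -(bits - 1)        # quantization step (LSB)
    d = (rng.random() - 0.5) * q  # RPDF dither, 1 LSB peak-to-peak
    return math.floor((x + d) / q) * q, d

def decode(y, d):
    """Decoder: coherently subtract the very same dither value."""
    return y - d
```

The point of the subtraction is that the total error after decode is bounded by a single quantization step and is independent of the signal, instead of the dither riding on top of the quantization noise as in the additive case.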

Of course, next I added a second LFSR with a period coprime to the first one, in order to arrive at TPDF. In subtractive mode there is no difference, but in additive mode TPDF works better. Next I also implemented the oldest psychoacoustic dither trick of them all: differencing two subsequent dither values, which amounts to a first-order highpass filter. Even better quality, but not something I'm aiming at in the end.
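The two tricks compose like this; a sketch, with Python's PRNG standing in for the two coprime-period LFSRs and the seeds purely illustrative:

```python
import random

def hp_tpdf_dither(n, seed1=11, seed2=23):
    """Sum two independent uniform generators for TPDF dither, then
    first-difference against the previous value to tilt the spectrum
    high-pass (a first-order 6 dB/octave shaping)."""
    g1, g2 = random.Random(seed1), random.Random(seed2)
    prev, out = 0.0, []
    for _ in range(n):
        t = (g1.random() - 0.5) + (g2.random() - 0.5)  # TPDF in (-1, 1)
        out.append(t - prev)                           # highpass shaping
        prev = t
    return out
```

The differencing gives the sequence a lag-1 autocorrelation of -0.5, which is exactly the high-frequency tilt that pushes the noise up where the ear is less sensitive.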

Now, the blind synchronization problem. If I want the technique to be broadly applicable, it needs to detect -- even without metadata -- whether it is being used. For the reasons above, it also needs to be able to key itself randomly, to avoid leaving repetitive traces of itself on running data. And it needs to do all of this without relying on fully correlative machinery, because that sort of thing is too expensive to implement on every PCM I/O line out there. Compatibility between different word widths is an issue too, because as we know from audio practice, 16-bit signals are often passed over 24-bit interfaces, simply keeping the lower-order (maybe even some of the higher-order) bits zero.

So, I do that parity trick over every word received. It's not sensitive to the relative alignment of the utility data within a sample word. At the same time it's an optimal collector of entropy over the bits: any entropy in any bit of the word is reflected into the parity, so that even if just one of them holds evidence of true physical noise, the parity will be at least as noisy. This is a trick borrowed from entropy-pool work on the crypto side. Additionally, the trick is cheap in both hardware and software; on the CPU side it often comes as a side effect of any arithmetic operation, and sometimes even a load. It's inelegant to implement in plain C, but still quite doable and reasonably fast.
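The standard XOR-fold version of that trick, for words up to 32 bits (the function name is mine):

```python
def parity(word):
    """XOR-fold a sample word down to its parity bit.  Any bit with
    real entropy anywhere in the word flips the result, so the parity
    is at least as unpredictable as the noisiest single bit; and since
    shifting a word doesn't change its bit count, the fold is blind to
    where in a wider slot the payload sits."""
    word ^= word >> 16
    word ^= word >> 8
    word ^= word >> 4
    word ^= word >> 2
    word ^= word >> 1
    return word & 1
```

Note the word-width compatibility falls out for free: a 16-bit sample left-justified into a 24-bit slot, i.e. parity(x << 8), has the same parity as x itself.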

Next, I maintain a small deterministic automaton which decodes a comma-free code from the bitstream assembled above. The idea here is that *if* the bitstream gathers enough entropy from the noise inherent in digitized analogue signals, it'll be statistically more or less white and independently distributed.

If you did synchronization the normal way, you'd correlate some number of successive bits against a known sync word and go from there. However, correlation is expensive, so we do *stochastic*, *pseudorandom* correlation instead. This is a trick I learnt partly from rsync, which first does a fast, cheap checksum and only then goes into further validation, and partly from rejection-sampling methods: if you want circularly uniform noise on the 2D plane, often the best method is not to derive it directly, but to draw from a rectangular distribution, test whether the point falls within the circle, and drop it and repeat if it doesn't.
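That square-to-circle draw is the textbook rejection sampler, for reference:

```python
import random

def disc_sample(rng):
    """Rejection sampling for a uniform point on the unit disc: draw
    from the enclosing square, keep the draw only if it lands inside
    the circle, otherwise drop it and try again.  Accepted points are
    exactly uniform over the disc."""
    while True:
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        if x * x + y * y <= 1.0:
            return x, y
```

The analogy to the synch problem is the accept/reject structure: cheap draws, a cheap test, and you only pay for the expensive step on a hit.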

Thus what I do is gather a low-rate bitstream from the comma-free decoding, latching that compressed parity bit into a 32-bit shift register every so often. The idea is that we slowly gain entropy from the incoming stream, sampling it stochastically, and that the sampling interval -- i.e. when we advance the long shift register -- does something like a stochastic exponential backoff into the past.

That's because I'm aiming for very little hardware and memory, so that this kind of thing could become universal. In a cheap device you don't have the luxury of much memory, even if you process tons of data per microsecond. So if you want to rekey your dither generator reliably, without adding perceptible patterns to the signal while rekeying, you have to be capable of scrambling your PRNG using noise from rather a long way back in the signal. I mean, it's not uncommon for entire seconds of digital zero to come in, with no entropy in them, and so on. If you relied on straight correlative machinery, you'd have to remember everything that has gone before, and nobody can afford that kind of machinery for what is essentially a finishing touch, not the real meat.

So instead we sample every now and then. The sampling algorithm proceeds by comparing a fixed number of bits in the long shift register against a constant. If the bits in the register match, a new bit is shifted in. Under the theory that the XOR trick is a good entropy extractor, what is being tested is a fully white sequence, so it does not matter which value we test against. Then, since the testing and latching interval itself depends on a hit, the intervals relevant to resynch/latching will themselves spread out (stochastically) exponentially into the past -- with very little circuitry or computational load.
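A sketch of one step of that rule as I've described it; the particular k, pattern, and names are illustrative, and on white input a hit fires with probability 2**-k regardless of which pattern you pick:

```python
def latch_step(reg, parity_bit, k=3, pattern=0b000):
    """Stochastic latching: test the k low-order bits of the 32-bit
    shift register against a fixed pattern; only on a hit is the new
    parity bit shifted in.  No hit, no state change -- so the register
    advances only sporadically, at essentially zero cost per sample."""
    if reg & ((1 << k) - 1) == pattern:
        reg = ((reg << 1) | (parity_bit & 1)) & 0xFFFFFFFF
    return reg
```

Since whether the register advances depends on its own (pseudorandomly filled) contents, the admission times thin out stochastically rather than at a fixed clock.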

Penultimately, the problem itself, and then two things I still want to implement. The problem is that since self-synchronizing gizmos like mine need to read their own output on the encoder side, so that the decoder can stay on board, my current implementation breaks down. It often produces very short-period resynch cycles, so that the codec, as it stands, does more damage to the signal than good. I'm very certain the basic *approach* is sound, but since the implementation is plenty complex, I'm having trouble pinpointing what precisely leads to this higher-than-expected reset rate. And I certainly don't know how to analyze the total dynamics of the feedback loop so as to guarantee the nice statistical bounds I'd like, to begin with.

In the implementation, two things are also lacking as of now. First, positive, correlative latch-on. Even now, my code does latch on; but for high-fidelity open-ended work, I'd like it to confirm before it latches. In my architecture that's easy if you know which S/N ratio or bit width you're working at, because after a latch you know your generated dither sequence and can correlate it against your samples. It would be efficient, because you don't need a sliding *correlation*, just a single vector product. The trouble is that doing that optimally still depends on relative levels and the like.
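The single-vector-product confirmation would look something like this; the normalization and the threshold are illustrative stand-ins for exactly the level- and word-width-dependent decision I don't yet know how to make optimal:

```python
def confirm_latch(samples, dither, threshold=0.25):
    """Confirm a tentative latch with one normalized inner product
    between the received samples and the locally regenerated dither
    sequence -- no sliding correlation, just a single dot product at
    the hypothesized alignment."""
    num = sum(s * d for s, d in zip(samples, dither))
    den = (sum(s * s for s in samples) * sum(d * d for d in dither)) ** 0.5
    return den > 0.0 and num / den > threshold
```

Because the alignment hypothesis comes from the latch itself, the cost is O(n) in the confirmation window length rather than O(n^2) for a search.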

Finally, we'd also like to solve the dual problem of unlatching from a stream after an error. While the machinery used to solve it is pretty much the same as for positive synch, really disengaging this kind of active processing from the signal chain might have to happen much more rapidly and reliably, in order not to degrade valuable data from, say, a multi-billion-dollar space probe. And in those kinds of measurement/telemetry applications, it *would* have to be possible to later unprocess from the signal even coding faults which would be of no significance in usual audio work.

Seriously, I've thought about this stuff for quite a while now, and would like ideas. (Some of which I've already gotten; thank y'all!)
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
