Re: MQA, generalized sampling and the lot (was: about subtractive dither, for audio and other use (also scientific))

Zhiguang Zhang Sun, 30 Jan 2022 10:47:10 -0800

A bit of an aside, because I'm 1) not as technical as Sampo and 2) not really
working in pro audio, but this is also a point that someone wanted to make,
and I didn't want to reply with a canned response about MQA:


https://gearspace.com/board/showpost.php?p=15835460&postcount=460

On Sat, Jan 29, 2022 at 11:05 PM Sampo Syreeni <de...@iki.fi> wrote:

> On 2022-01-08, vicki melchior wrote:
>
> (I'm somewhat late to the fray, but hey, it's interesting stuff with
> which I'm well acquainted. Let the flamewars rage on.)
>
> > MQA is based on finite rate of innovation sampling (and
> > reconstruction) theory.  Primary authors are M. Unser, M. Vetterli,
> > P.L. Dragotti, and others.
>
> I've been aware of the work since long before MQA came onto the scene.
> What these guys did was to start with nonuniform sampling, and show that
> you can propagate the sharp information-theoretic rate bound through a
> rather generic sampling and reconstruction setup, so that instead of
> constant entropy per sample in time as with the Shannonian setup, you
> can talk extremely generally about innovation (possibly averaged time
> varying entropy production of a source) using a sampling-like setup.
> Then, pretty much anything goes as long as your sampling process
> recovers more entropy than the source process innovates.
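
For the curious, here is the textbook finite-rate-of-innovation example (Vetterli, Marziliano and Blu's periodic stream of Diracs) as a minimal, noiseless sketch. K Diracs carry 2K degrees of freedom (locations and amplitudes), and 2K+1 Fourier coefficients are enough to recover them with an annihilating filter, even though the signal is nowhere near bandlimited. The function names and test values are my own, purely illustrative.

```python
import numpy as np

def fourier_coeffs(t, a, K):
    """Fourier coefficients X[m], m = -K..K, of x(t) = sum_k a_k delta(t - t_k) on [0, 1)."""
    m = np.arange(-K, K + 1)
    u = np.exp(-2j * np.pi * np.asarray(t, dtype=float))   # u_k = exp(-2j*pi*t_k)
    return (u[None, :] ** m[:, None]) @ np.asarray(a, dtype=complex)

def fri_recover(X, K):
    """Recover K Dirac locations and amplitudes from X[-K..K], stored as X[0..2K]."""
    # Annihilating filter h with h[0] = 1: sum_{l=0..K} h[l] X[m-l] = 0 for m = 1..K
    A = np.array([[X[m - l + K] for l in range(1, K + 1)] for m in range(1, K + 1)])
    b = -np.array([X[m + K] for m in range(1, K + 1)])
    h = np.concatenate(([1.0], np.linalg.solve(A, b)))
    u = np.roots(h)                                  # the filter's zeros are the u_k
    t = np.mod(-np.angle(u) / (2 * np.pi), 1.0)
    # Amplitudes from the Vandermonde system X[m] = sum_k a_k u_k**m
    V = u[None, :] ** np.arange(-K, K + 1)[:, None]
    a = np.linalg.lstsq(V, X, rcond=None)[0].real
    order = np.argsort(t)
    return t[order], a[order]

t_true, a_true = [0.12, 0.47, 0.81], [1.0, -0.5, 2.0]
X = fourier_coeffs(t_true, a_true, K=3)
t_hat, a_hat = fri_recover(X, K=3)
print(t_hat, a_hat)
```
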
>
> This should not be a huge surprise. The basics come down to the age-old
> idea of degrees of freedom in the solution of systems of equations.
> Barring issues of noise, stability, singularity and internal symmetry,
> you'd expect a general system of N equations in N unknowns to be
> uniquely solvable. The classical sampling problem is like this in the short term,
> and easily handled since we have linearity, an equispaced grid, and an
> L^2 space, which lets us go to an exact limit in the long term. What
> these guys showed is then "just" a (vast) generalization to nonuniform
> sampling, and all kinds of possible basis waveforms which can be vastly
> different from sinc pulses.
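
The "N equations in N unknowns" picture is easy to try numerically. This is a minimal sketch, assuming the signal is a trigonometric polynomial of degree M (so 2M+1 unknown coefficients) and that the 2M+1 nonuniform sample instants form a jittered grid. The jitter keeps the system well conditioned, which is exactly the "noise, stability, singularity" caveat. Names and parameters are illustrative.

```python
import numpy as np

def trig_basis(t, M):
    """Matrix with entries exp(2j*pi*k*t_i), k = -M..M: a (2M+1)-dimensional signal space."""
    k = np.arange(-M, M + 1)
    return np.exp(2j * np.pi * np.outer(t, k))

def reconstruct_nonuniform(t_samples, y, M):
    """Solve the square system B c = y for the 2M+1 expansion coefficients."""
    return np.linalg.solve(trig_basis(t_samples, M), y)

rng = np.random.default_rng(0)
M = 4
N = 2 * M + 1                                              # unknowns == samples
c_true = rng.normal(size=N) + 1j * rng.normal(size=N)
t_samp = (np.arange(N) + rng.uniform(-0.3, 0.3, N)) / N    # jittered, nonuniform instants
y = trig_basis(t_samp, M) @ c_true
c_hat = reconstruct_nonuniform(t_samp, y, M)
print(np.max(np.abs(c_hat - c_true)))
```
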
>
> Some of the applications aren't new at all, either. E.g. bandlimited
> sampling was well known before their work. There you do exactly the
> same analysis as with the sampling theorem, but instead of assuming
> the spectrum is identically zero from some given bandlimit upwards,
> you assume the spectrum is compact, i.e. its support is a finite
> union of disjoint line segments. Now the antialiasing and anti-imaging
> filters assume a form where their frequency response is a linear-phase
> rectangular pulse train. Their impulse response looks wonky, but
> otherwise the sampling theorem goes through as-is. What the machinery
> amounts to is a projection on an equidistant pulse train, just not of
> sinc(x-t)'s, but something else. The reconstruction is exactly the same
> as with the Shannon-Nyquist construction.
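
For the simplest compact-spectrum case, a single band [f_lo, f_hi], the textbook bandpass-sampling conditions say which uniform rates avoid aliasing: 2*f_hi/n <= fs <= 2*f_lo/(n-1), for integer n from 1 up to floor(f_hi/(f_hi - f_lo)). A small sketch (the function name is mine); the multi-band case needs the fancier filters described in the paragraph above and isn't covered.

```python
import math

def valid_bandpass_rates(f_lo, f_hi):
    """Alias-free uniform sampling-rate intervals (n, fs_min, fs_max) for a band [f_lo, f_hi]."""
    bandwidth = f_hi - f_lo
    intervals = []
    for n in range(1, math.floor(f_hi / bandwidth) + 1):
        fs_min = 2.0 * f_hi / n
        fs_max = math.inf if n == 1 else 2.0 * f_lo / (n - 1)
        if fs_min <= fs_max:            # always true in exact arithmetic; guards rounding
            intervals.append((n, fs_min, fs_max))
    return intervals

# A 5 kHz wide band at 20-25 kHz: n = 1 is ordinary Nyquist sampling (fs >= 50 kHz),
# while n = 5 allows fs = 10 kHz, i.e. twice the bandwidth rather than twice f_hi.
for n, lo, hi in valid_bandpass_rates(20e3, 25e3):
    print(n, lo, hi)
```
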
>
> In nonuniform sampling the reconstruction and analysis filters are no
> longer identical or time-invariant, but just biorthogonal. Is this idea
> new? Nope: these ideas have been well known since wavelets came about,
> and especially since the short-lived craze for using overcomplete
> filterbanks/bases in signal analysis and compression. (Overcomplete
> bases require extra optimization machinery, since analysis is then an
> underdetermined problem.) So what is it that we know about the
> connections of MQA to this sort of modern sampling theory?
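
In finite dimensions the biorthogonal idea is just linear algebra: synthesize with a non-orthogonal basis B, and analyze with its dual D, chosen so that D^T B = I. For an overcomplete frame the coefficients are no longer unique, and the Moore-Penrose pseudoinverse picks the minimum-norm set, one simple instance of the "extra optimization machinery". A sketch with random real matrices standing in for filterbanks (all sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Critically sampled case: a generic, non-orthogonal basis of R^6 (columns of B)
B = rng.normal(size=(6, 6))
D = B @ np.linalg.inv(B.T @ B)      # dual basis via the inverse Gram matrix (= inv(B).T here)
x = rng.normal(size=6)
c = D.T @ x                         # analysis with the dual vectors
x_rec = B @ c                       # synthesis with the primal vectors

# Overcomplete case: 10 vectors in R^6, so analysis is underdetermined
F = rng.normal(size=(6, 10))
c_min = np.linalg.pinv(F) @ x       # minimum-norm coefficients among infinitely many
x_rec2 = F @ c_min

print(np.allclose(D.T @ B, np.eye(6)), np.allclose(x_rec, x), np.allclose(x_rec2, x))
```
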
>
> The gist of it has to do with basis choice. In the case of MQA, the
> chosen basis is an equidistantly spaced series of polynomial splines, of
> compact support. The rationale is Craven's idea that time resolution is
> hurt by the conventional sampling paradigm, which necessarily -- even in
> the bandlimited sampling framework -- leads to continuous time basis
> functions which are nonlocal. You just cannot have compact support in
> the time and frequency bases at the same time; this is in fact the most
> basic form of the uncertainty principle (a mathematical theorem about
> Fourier transforms, and not an empirical physical thingy, as some seem
> to think).
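
The uncertainty principle in its usual quantitative form says sigma_t * sigma_f >= 1/(4*pi), with the RMS widths taken over the energy densities |x(t)|^2 and |X(f)|^2, and with equality only for Gaussian pulses. A rough numerical check (the grid sizes are arbitrary choices of mine): the Gaussian sits on the bound, while a pulse with compact support in time, here a rectangle, lands far above it.

```python
import numpy as np

def rms_widths(x, dt):
    """RMS duration (s) and RMS bandwidth (Hz) from |x|^2 and |X|^2 on a uniform grid."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    f = np.fft.fftshift(np.fft.fftfreq(n, dt))
    X = np.fft.fftshift(np.fft.fft(x))

    def spread(axis, density):
        p = density / density.sum()
        mean = np.sum(axis * p)
        return np.sqrt(np.sum((axis - mean) ** 2 * p))

    return spread(t, np.abs(x) ** 2), spread(f, np.abs(X) ** 2)

n, dt = 4096, 1e-3
t = (np.arange(n) - n // 2) * dt
gauss = np.exp(-t ** 2 / (2 * 0.05 ** 2))        # Gaussian pulse, std 50 ms
rect = (np.abs(t) < 0.25).astype(float)          # 0.5 s rectangle: compact in time

for name, x in [("gaussian", gauss), ("rect", rect)]:
    st, sf = rms_widths(x, dt)
    print(name, st * sf * 4 * np.pi)             # 1.0 would mean "on the bound"
```
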
>
> Furthermore, do note that such bases can only be discretely
> shift-invariant, not continuously so. That's because of one of the basic
> classification theorems in functional and harmonic analysis: every
> shift-invariant subspace of an L^2 (by slight modification, an L^1 one)
> space is spanned by some combination of quadrature sines and cosines. So
> essentially everything truly linear and time-invariant comes out of the
> bandlimited sampling framework. This implies that MQA, while linear in
> its sampling step, is not time-invariant, and it cannot be; it'll
> necessarily lead to intermodulation products related to the sample rate,
> just as if you amplitude modulated the signal before sampling with some
> harmonic combination of sinewaves having to do with the rate.
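
A toy illustration of the modulation argument (not MQA's actual processing, which isn't public in this detail): a linear but periodically time-varying system, here just a gain pattern repeating every 16 samples, turns a pure tone into the tone plus sidebands offset by the repetition rate. That is simply amplitude modulation by the gain pattern. All parameters are arbitrary.

```python
import numpy as np

N = 1024                     # analysis length in samples
tone_bin = 100               # test tone sits exactly on FFT bin 100
period = 16                  # hypothetical period of the time variation (samples)
n = np.arange(N)

x = np.sin(2 * np.pi * tone_bin * n / N)
gain = 1.0 + 0.25 * np.cos(2 * np.pi * n / period)   # linear, but not shift-invariant
y = gain * x

spectrum = np.abs(np.fft.rfft(y)) / (N / 2)          # 1.0 == full-amplitude sinusoid
mod_bin = N // period                                # 64: the repetition rate, in bins
for b in (tone_bin - mod_bin, tone_bin, tone_bin + mod_bin):
    print(b, round(spectrum[b], 3))                  # sidebands at 36 and 164, 0.125 each
```
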
>
> Craven's approach actually grew out of an earlier paper in which he
> dealt with the problem (?) of preringing in DAC anti-imaging filters in
> the fully time-invariant way: instead of using anti-aliasing and
> anti-imaging filters with linear phase, he went with minimum phase. And
> indeed, those don't pre-ring, even while being brickwall in amplitude.
> They post-ring as fuck, but then they don't compound any more badly than
> linear phase sinc filters do (since a product of minimum phase filters
> is minimum phase, and the phase response of a minimum phase filter is
> wholly determined by amplitude response, multiplying two all-pass
> minimum phase filters has to be idempotent). So why worry about
> pre-ringing so much, as to forgo conventional sampling theory?
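
For FIR filters, the "product of minimum phase filters is minimum phase" step is easy to check: a filter is minimum phase when all its zeros lie inside the unit circle, and cascading filters (multiplying transfer functions, i.e. convolving impulse responses) just pools their zeros. A small sketch with made-up coefficients:

```python
import numpy as np

def is_minimum_phase(b, tol=1e-12):
    """True if every zero of the FIR filter b[0] + b[1] z^-1 + ... lies inside |z| < 1."""
    zeros = np.roots(b)
    return bool(np.all(np.abs(zeros) < 1.0 - tol))

h1 = np.array([1.0, -0.5])           # zero at z = 0.5
h2 = np.array([1.0, 0.9, 0.2])       # zeros at z = -0.4 and z = -0.5
h3 = np.array([1.0, -2.0])           # zero at z = 2.0: not minimum phase

cascade = np.convolve(h1, h2)        # series connection = product of transfer functions
print(is_minimum_phase(h1), is_minimum_phase(h2), is_minimum_phase(cascade))   # True True True
print(is_minimum_phase(np.convolve(h1, h3)))                                     # False
```
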
>
> Well, there's actually some little-known evidence in the deep
> psychoacoustical literature for that. One of the most worrisome ones I
> know about comes from the sixties, when researchers still used analog
> machinery. There's at least one paper somewhere which systematically
> tested how binaural (then called dichotic) localisation accuracy
> depended on the bandwidth (and so inverse time localisation) of brief
> pulses. In it, it was shown that localisation accuracy kept improving
> until the bandwidth of the signal was something like a megahertz. That of
> course is on its face insane, since even abnormally acute children can't
> hear a sustained tone past the 24-26kHz mark, and most adolescents can't
> go past 18kHz. However, the results *are* there in the literature, so
> how do you account for them?
>
> The only obvious way to understand them theoretically is that the LTI
> framework we use to understand sound is incomplete as applied to
> psychoacoustics. What we Fourier analyse as being too high in frequency
> to hear, and measure as being unhearable as sustained tones, could still
> affect how we process and perceive sound if the ears and the following
> neural processes are in some ways nonlinear. It could, for example, be
> that while we can't hear a sustained tone above 20kHz, in impulsive
> transients the exact first rise time might be measured via a mechanism
> separate from cochlear resonance, and that this could suffer if your
> reconstruction filter 1) is too narrowband, and so smears the onset,
> or 2) prerings.
>
> Is there any newer evidence that something like that might happen? Yes,
> a little bit. First, auditory neurons are *definitely* not linear, as we
> know from the missing fundamental phenomenon, and its explanation via
> the rectifying behavior of cochlear cilia. And of course not linear
> overall, because they communicate via pulse-like action potentials in
> the first place, so that they're essentially "continuous time
> shift-invariant but quantized in amplitude". They also work
> stochastically together as fields to encode information, and not as
> individuals. They sure seem to be rather consistent in responding to
> minute changes and relaying information forward at a consistent speed,
> and they do so at low microsecond temporal accuracy. That's from
> electromicroassay data, so in fact it *is* possible there is a separate
> pathway for onset-detection of transients, whose bandwidth (in the sense
> of linear Fourier theory, now not so much applicable) is much wider than
> that of the cochlearly mediated resonance based one used to consciously
> detect sustained tones, and to do auditory scene analysis.
>
> Another one is the fact that already *decades* ago the idea of people
> not being able to sense absolute phase was debunked. True, we don't
> usually pay attention to whether sound comes at us compression or
> rarefaction first. But in fact auditory neurons respond highly
> nonlinearly to the fact, and in particular in dichotic hearing, the
> information *is* there to separate between the two cases, especially
> when overtones are present for comparison. You can demonstrably teach
> people to hear absolute phase, provided the stimulus isn't fully
> symmetrical about the mean, and in time; that wouldn't be possible if
> there wasn't something nonlinear in the way, and if there then is, that
> "something" can theoretically catch onto ultrasonics.
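
A toy version of the absolute-phase point: a signal and its polarity-inverted copy have identical power spectra, so anything that only looks at power per frequency can't tell them apart. Even a crude rectifying stage, here a hypothetical half-wave rectifier standing in for hair-cell behaviour, separates them as soon as the waveform isn't symmetric about the mean. The stimulus is my own arbitrary choice.

```python
import numpy as np

n = np.arange(256)
theta = 2 * np.pi * 4 * n / 256                  # 4 periods of the fundamental
x = np.sin(theta) + 0.5 * np.cos(2 * theta)      # fundamental + 2nd harmonic: asymmetric

def half_wave(s):
    """Crude, hypothetical stand-in for a rectifying receptor stage."""
    return np.maximum(s, 0.0)

same_power_spectrum = np.allclose(np.abs(np.fft.rfft(x)) ** 2,
                                  np.abs(np.fft.rfft(-x)) ** 2)
peak_pos, peak_neg = half_wave(x).max(), half_wave(-x).max()
print(same_power_spectrum, round(peak_pos, 2), round(peak_neg, 2))
```
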
>
> Again, is this stuff new? No it isn't. For example, in the days of yore,
> I probably talked about SRS Labs and the like even on-list. Sony's SACD
> with its single-bit DSD idea referred to the same ideas (which I went on
> to debunk then). And of course every single increase in sampling rate
> has been sold with some argument like this since the CD appeared on the
> scene. (Perhaps the only tenable argument for higher sample rates came
> from Acoustic Renaissance for Audio, at the time people were mired in
> the DVD-Audio vs. SACD debates. That argument in fact came from the
> Ambisonics- and Meridian-affiliated crowd, probably including Bob Stuart,
> Michael Gerzon and Peter Craven; it suggested something like 50kHz and 14
> bits with aggressive perceptual noise shaping would *precisely* fill the
> human auditory envelope.)
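
A quick back-of-the-envelope check on that 50kHz/14-bit figure, using the standard ideal-quantizer rule SNR ~ 6.02*N + 1.76 dB for a full-scale sine. It costs almost exactly the same raw bit rate as CD's 44.1kHz/16-bit, trading about 12 dB of flat noise floor for extra bandwidth; the perceptual noise shaping is then supposed to win that back, which isn't modelled here.

```python
def ideal_snr_db(bits):
    """Full-scale sine SNR of an ideal N-bit quantizer with uniform error (dB)."""
    return 6.02 * bits + 1.76

def pcm_bit_rate(fs_hz, bits, channels=1):
    """Raw LPCM bit rate in bits per second."""
    return fs_hz * bits * channels

for label, fs, bits in [("CD", 44_100, 16), ("proposal", 50_000, 14)]:
    print(f"{label}: {pcm_bit_rate(fs, bits)} bit/s per channel, "
          f"{ideal_snr_db(bits):.2f} dB")
```
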
>
> So, back to MQA. Does it really draw on the new sampling theory, for
> example? Of course it does. It utilizes a set of basis functions which
> are not bandlimited, so that it can theoretically reconstruct things
> like highly time-localised transients which bandlimited sampling cannot.
> But that's then at the cost of not being able to perfectly reconstruct
> bandlimited signals; the finite rate of innovation framework proves this
> too, in the converse direction, because its bound is entropically as
> tight as the original Shannon one. "There is no free lunch."
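
A toy check of the "no free lunch" direction: a periodic bandlimited signal comes back exactly from its uniform samples via trigonometric (periodic sinc) interpolation, whereas a compactly supported spline basis on the same grid leaves a clearly nonzero error. The spline here is plain linear B-spline interpolation, a stand-in and not MQA's actual kernels; signal and grid are arbitrary.

```python
import numpy as np

def trig_interp(samples, t):
    """Periodic-sinc (trigonometric) interpolation of N uniform samples on [0, 1).

    Exact when the signal has no content at or above N/2 cycles per period.
    """
    N = len(samples)
    coeffs = np.fft.fft(samples) / N
    k = np.fft.fftfreq(N, d=1.0 / N)                 # integer frequencies in FFT order
    return np.real(np.exp(2j * np.pi * np.outer(t, k)) @ coeffs)

def signal(t):
    """Bandlimited test signal: 3 and 11 cycles per period."""
    return np.cos(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 11 * t)

N = 32
grid = np.arange(N) / N
samples = signal(grid)
t_fine = np.linspace(0.0, 1.0, 1000, endpoint=False)

err_sinc = np.max(np.abs(trig_interp(samples, t_fine) - signal(t_fine)))
# Linear B-spline interpolation on the same grid, wrapping around periodically
err_spline = np.max(np.abs(np.interp(t_fine, grid, samples, period=1.0) - signal(t_fine)))
print(err_sinc, err_spline)
```
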
>
> It comes down to the fact that while the modern generalized sampling
> theory does admit time-frequency tradeoffs and sampling/reconstruction
> regimes different from the highly uniform and well-behaved one of the
> classical theory (and could in principle accommodate almost arbitrary
> nonlinearity), each such system relies on an a priori assumption of what
> your source model is, and what you're thus trying to reconstruct, based
> on your sample stream. In the case of Shannon-Nyquist, the a priori
> assumption is a signal which is bandlimited, or in the slightly
> generalized case, a signal whose spectrum is compact. In the case of
> MQA, the assumption is a signal which decomposes nicely in a
> time-variant basis composed of recurring polynomial splines of some kind.
>
> The latter model is not well suited to *anything* but time-domain work.
> As I said above, it'll necessarily lead to intermodulation distortion.
> If that can be held below the masking threshold, fine, it might not be
> noticeable. And yeah, if you do some signaling and adaptation on the
> fly, perfect reconstruction is still possible even for band-limited
> signals (signal a change to a prior which is bandlimited, and
> reconstruct via the above-mentioned N-determines-N idea, using brute
> force, running matrix inversion, and what in coding theory circles is
> called, among other things, "successive cancellation").
>
> This way of looking at it probably *does* explain why they claim
> adaptability to the precise source material. Because, when you work with
> these kinds of off-kilter bases in your decomposition, of course you can
> adapt the sampling basis on the fly. If your source material has a
> certain spectrum, you can optimize your sampling spline to have similar
> characteristics. That then leads to certain matching conditions, which
> make rate-distortion analysis and error accumulation constraints go
> through much more easily, as they automatically do in the Shannon-Nyquist
> framework. *However*...
>
> That stuff is much easier to do within the conventional critically
> sampled + quantizer + dither + lossless statistical compression
> framework. Like FLAC, or in fact Meridian Lossless Packing, from the
> same folks who instigated all this MQA nonsense (why? except to extract
> royalties?!?). Here's how: you brickwall and sample
> conventionally, at a very high rate. (If you want to go as far as that
> 1MHz bound I mentioned from analogue work, you bring in a coherent,
> higher frequency but lower precision converter to fill in.) You apply
> your favourite apodising LTI filter, delta-sigma-method and whatnot, to
> arrive at a phase and amplitude spectrum to your liking. It will now
> typically be something heavily bottom-sided in frequency, which you then
> quantize with well-thought-out dither; apodization takes out ringing,
> and the system remains fully translation-invariant even on the
> continuous-time side, so that if you just *have* to apply Craven's
> original minimum phase ideas, the receiver can do so entirely at its own
> discretion. No proprietary hardware or software needed, and you get to
> choose.
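
The "quantize with well-thought-out dither" step, in its most common textbook form (my choice here, not necessarily what anyone ships): non-subtractive TPDF dither of 2 LSB peak-to-peak. It makes the mean and variance of the total error independent of the signal, at the price of a total error power of q^2/4 instead of the undithered q^2/12. Bit depth and test signal are arbitrary.

```python
import numpy as np

def quantize_tpdf(x, bits, rng):
    """Requantize x (full scale +-1.0) to `bits` with TPDF dither; returns (y, lsb)."""
    q = 2.0 / 2 ** bits                                   # LSB size
    # Sum of two independent uniform(-q/2, q/2) variables: triangular PDF on (-q, q)
    d = rng.uniform(-q / 2, q / 2, x.shape) + rng.uniform(-q / 2, q / 2, x.shape)
    return np.round((x + d) / q) * q, q

rng = np.random.default_rng(0)
n = np.arange(200_000)
q8 = 2.0 / 2 ** 8
x = 3.3 * q8 * np.sin(2 * np.pi * 0.0123 * n)            # a quiet tone, ~3 LSB peak
y, q = quantize_tpdf(x, 8, rng)
err = y - x
print(err.mean() / q, err.var() / (q ** 2 / 4))           # ~0 and ~1
```
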
>
> Then you pack it via lossless digital means. Here the MLP work is
> first-rate, since it does numerical analysis of reversible, discretized
> filters, and all that. What happens here is that the unequal power of
> the analogue utility signal is systematically transformed into a
> lossless digital code. Whatever good the original, weird sampling
> machinery might have done, is more systematically transformed into a
> similarly or better performing digital coding format, using the
> well-developed machinery of statistical compression and numerical
> analysis. Without the sampling artifacts which necessarily come with
> non-bandlimited basis functions and time-variance. Best of both, nay,
> *all* worlds, say I!
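
A minimal sketch of the "pack it losslessly" step. MLP's actual predictors and entropy coder are more elaborate; this swaps in FLAC's fixed second-order predictor plus Rice coding, which is enough to show the two properties that matter: integer-exact reversibility, and fewer bits when the signal's spectrum is lopsided. The test tone and names are illustrative.

```python
import numpy as np

def residual_order2(x):
    """FLAC-style fixed order-2 prediction residual; first two samples kept verbatim."""
    x = np.asarray(x, dtype=np.int64)
    e = x.copy()
    e[2:] = x[2:] - 2 * x[1:-1] + x[:-2]
    return e

def restore_order2(e):
    """Exact integer inverse of residual_order2."""
    x = np.asarray(e, dtype=np.int64).copy()
    for i in range(2, len(x)):
        x[i] = e[i] + 2 * x[i - 1] - x[i - 2]
    return x

def rice_cost_bits(e, k):
    """Total bits to Rice-code signed residuals with parameter k (zigzag sign mapping)."""
    e = np.asarray(e, dtype=np.int64)
    u = np.where(e >= 0, 2 * e, -2 * e - 1)
    return int(np.sum((u >> k) + 1 + k))

n = np.arange(4096)
pcm = np.round(3000 * np.sin(2 * np.pi * 0.01 * n)).astype(np.int64)   # 16-bit-range samples
res = residual_order2(pcm)
best_bits = min(rice_cost_bits(res, k) for k in range(16))
print(best_bits / len(pcm), "bits/sample vs 16 raw")
```
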
>
> Of course it seems MQA *tries* to better quality as well. Its approach
> is well in line with the dithering and noise shaping and information
> hiding and active decoding principles which were in vogue in the
> literature circa 2010, and even earlier. It really *is* so that,
> psychoacoustically speaking, you can get better audio quality from low
> end formats such as CD by hiding metadata below the audible noise floor,
> instead of doing straight LPCM. The simplest way of doing this is to bury
> a dynamical compression factor there, which doesn't much need
> bandwidth/noise power, and which then can control a dynamical expander
> on the receiver side, which brings the added noise *far* below what the
> original linear recording medium would have given you. MQA and similar
> things then run with the idea, and embed more metadata, and in theory
> *can* be even better; but the choice to make the encoding such that only
> 13-bit accuracy is retained unencoded seems more than aggressive; it's
> just stupid, and far from audiophile quality.
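
A toy version of the "buried compression factor" idea. The hiding-below-the-noise-floor part is not modelled; the per-block gains are simply returned as side information. Each block is boosted by a power of two before a coarse quantizer, and the boost is undone at the receiver. With 8 bits and a -50 dBFS tone, chosen to make the effect obvious, straight quantization loses the tone entirely, while the companded path keeps the error roughly 48 dB lower.

```python
import numpy as np

def quantize(x, bits):
    """Plain rounding quantizer, full scale +-1.0 (float simulation, no clipping)."""
    q = 2.0 / 2 ** bits
    return np.round(x / q) * q

def companded(x, bits, block=256):
    """Per-block power-of-two boost before quantizing, undone at the receiver."""
    out = np.empty_like(x)
    shifts = []                                  # side information: one exponent per block
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        peak = np.max(np.abs(seg))
        s = max(int(np.floor(np.log2(1.0 / peak))), 0) if peak > 0 else 0
        shifts.append(s)
        out[start:start + block] = quantize(seg * 2.0 ** s, bits) / 2.0 ** s
    return out, shifts

def rms(e):
    return np.sqrt(np.mean(e ** 2))

n = np.arange(4096)
x = 10 ** (-50 / 20) * np.sin(2 * np.pi * 0.013 * n)    # quiet tone at -50 dBFS
err_plain = rms(quantize(x, 8) - x)
y, shifts = companded(x, 8)
err_comp = rms(y - x)
print(shifts[:4], 20 * np.log10(err_plain / err_comp))
```
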
>
>
> Finally, about MQA as an architecture, trademark, product, and such.
> It's all that, and it's also why it's mostly so worrisome. MQA, as a
> Meridian (one of the finest audio-DSP outfits out there, to be sure,
> which is why this is so sorry) spinoff, tries to tell audiophiles what
> they're getting is the Real Deal. It does so by turning on a blue light
> when the decoder syncs and recognizes some kind of flag saying that the
> signal comes straight from the "original master tape".
>
> That's a proficient marketing ploy, but it doesn't hold up to scrutiny.
> Because how the fuck do we know how that purported authentication works?
> The exact protocol certainly isn't public. Does it even employ public
> key cryptography, of the standardized sort, which would be in order? We
> don't know. Does it truly fingerprint audio streams? Independent testing
> appears to show it does not. Is it lossless, even in its "unfolded"
> configuration? No; *far* from coming out bit-perfect, it still adds
> considerable noise. Does it come out ahead in independent
> listening tests? No, with many listeners preferring the original.
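
Since the actual MQA authentication protocol is unpublished, here's only a sketch of what a bit-exact fingerprint could look like (a hypothetical scheme, not MQA's): hash the PCM bytes with SHA-256, so that changing even one LSB anywhere changes the digest. A real scheme would then sign that digest with a standardized public-key signature such as Ed25519 (RFC 8032); the signing step is not shown.

```python
import hashlib
import numpy as np

def pcm_fingerprint(samples):
    """SHA-256 over 16-bit little-endian PCM bytes (hypothetical scheme)."""
    pcm = np.asarray(samples).astype("<i2").tobytes()
    return hashlib.sha256(pcm).hexdigest()

n = np.arange(48_000)
master = np.round(8000 * np.sin(2 * np.pi * 440 * n / 48_000)).astype(np.int16)
tweaked = master.copy()
tweaked[12_345] += 1                      # a single 1-LSB change
print(pcm_fingerprint(master) == pcm_fingerprint(master.copy()))   # True
print(pcm_fingerprint(master) == pcm_fingerprint(tweaked))         # False
```
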
>
> You cannot even conduct fair listening tests on MQA. Because the patent
> holders do not let you encode arbitrary material of your own choice, in
> order to *do* any fair tests. Every bit of material has to be sent to an
> accredited (by them) encoding lab, which might or might not accept your
> submission. That goes for highly regarded mastering engineers too; none of
> them get to do it by themselves either. (And at sizable cost, subject to
> no refund.) Or you could maybe send your test signals to be published at
> Tidal, the new streaming service, touting its use of MQA...but then
> there appears to be a purpose-built machine filter at Tidal against
> anything that seems like a test signal. It passes test signals
> interspersed with live music, but not signals per se. (Goes without
> saying the thing isn't anything like free or open software.)
>
> Particularly offensive is that Tidal offers a "Hifi" option, which
> streams FLAC. We're told that at least is lossless. Not so: it's
> unflagged 16/44 MQA, and so lossy.
>
> So, in addition to the original papers underlying the technology being
> subject to lack of scrutiny, and there being serious mathematical and
> psychoacoustical objections to their premises, there are also various
> objections to the business model they've now led to. My pirate friends
> consider the whole enterprise to be an attempted IP coup, of the likes of
> Sony's SACD/DSD, right as the CD patents were expiring. The stuff
> sounds like snakeoil and all-round boogaloo -- much to my chagrin,
> because I and many others really *do* consider Stuart and Craven to be
> among the top-10 of audio engineers extant. What on God's green earth
> possessed them to sell something like this as the next best thing, when
> they *certainly* know better?!? :/
> --
> Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
> +358-40-3751464, 025E D175
> ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
