A bit of an aside, because I'm 1) not as technical as Sampo and 2) not really working in pro audio, but this is also a point that someone wanted to make, and I didn't want to reply with a canned response about MQA:
https://gearspace.com/board/showpost.php?p=15835460&postcount=460

On Sat, Jan 29, 2022 at 11:05 PM Sampo Syreeni <de...@iki.fi> wrote:

> On 2022-01-08, vicki melchior wrote:
>
> (I'm somewhat late to the fray, but hey, it's interesting stuff with which I'm well acquainted. Let the flamewars rage on.)
>
> > MQA is based on finite rate of innovation sampling (and reconstruction) theory. Primary authors are M. Unser, M. Vetterli, P.L. Dragotti, and others.
>
> I'm aware of the work from a long time before MQA came into the scene. What these guys did was to start with nonuniform sampling, and show that you can propagate the information theoretical sharp rate bound through a rather generic sampling and reconstruction setup, so that instead of constant entropy per sample in time as with the Shannonian setup, you can talk extremely generally about innovation (possibly averaged time varying entropy production of a source) using a sampling-like setup. Then, pretty much anything goes as long as your sampling process recovers more entropy than the source process innovates.
>
> This should not be a huge surprise. The basics come down to the age-old idea of degrees of freedom in the solution of equation groups. Barring issues of noise, stability, singularity and internal symmetry, you'd expect a general group of N equations in N unknowns to be uniquely soluble. The classical sampling problem is like this in the short term, and easily handled since we have linearity, an equispaced grid, and an L^2 space, which lets us go to an exact limit in the long term. What these guys showed is then "just" a (vast) generalization to nonuniform sampling, and all kinds of possible basis waveforms which can be vastly different from sinc pulses.
>
> Some of the applications aren't new at all, either. E.g. bandlimited sampling was well known before their work.
> There what you do is you do the exact same analysis as with the sampling theorem, but instead of assuming the spectrum is identical to zero from some given bandlimit upwards, you assume the spectrum is compact, i.e. its support is a finite union of disjoint line segments. Now the antialiasing and anti-imaging filters assume a form where their frequency response is a linear-phase rectangular pulse train. Their impulse response looks wonky, but otherwise the sampling theorem goes through as-is. What the machinery amounts to is a projection on an equidistant pulse train, just not of sinc(x-t)'s, but something else. The reconstruction is exactly the same as with the Shannon-Nyquist construction.
>
> In nonuniform sampling the reconstruction and analysis filters are no longer identical or time-invariant, but just biorthogonal. Is this idea new? Nope: these ideas have been well-known since wavelets came about, and especially the short term craze with using overcomplete filterbanks/bases in signal analysis and compression. (Overcomplete bases require extra optimization machinery, since analysis is then an underdetermined problem.) So what is it that we know about the connections of MQA to this sort of modern sampling theory?
>
> The gist of it has to do with basis choice. In the case of MQA, the chosen basis is an equidistantly spaced series of polynomial splines, of compact support. The rationale is Craven's idea that time resolution is hurt by the conventional sampling paradigm, which necessarily -- even in the bandlimited sampling framework -- leads to continuous time basis functions which are nonlocal. You just cannot have compact support in the time and frequency bases at the same time; this is in fact the most basic form of the uncertainty principle (a mathematical theorem about Fourier transforms, and not an empirical physical thingy, as some seem to think).
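
(Inline note: a quick numerical illustration of the compact-support point, as a minimal sketch only. It assumes a linear B-spline, i.e. a triangle on [-1, 1], as a stand-in for "a polynomial spline of compact support"; that is my choice for illustration and not MQA's actual kernel, which isn't given in this thread. The function names are hypothetical. For a unit-spaced sampling grid the Nyquist frequency is 0.5, and the triangle's spectrum, sinc^2(f), is still clearly nonzero well above it.)

```python
import math

def triangle(t):
    """Linear B-spline: compactly supported on [-1, 1] (illustrative kernel)."""
    return max(0.0, 1.0 - abs(t))

def spectrum_magnitude(kernel, f, half_width=1.0, steps=20000):
    """|Fourier transform| of an even kernel at frequency f.

    Uses the midpoint rule on [-half_width, half_width]; for an even
    kernel the transform reduces to the cosine integral.
    """
    dt = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        t = -half_width + (k + 0.5) * dt
        total += kernel(t) * math.cos(2.0 * math.pi * f * t)
    return abs(total * dt)

if __name__ == "__main__":
    # Nyquist for a unit-spaced grid is 0.5; look at and above it.
    for f in (0.0, 0.5, 1.5, 2.5):
        print(f, spectrum_magnitude(triangle, f))
```

At f = 1.5, three times Nyquist, the magnitude is about 0.045 of the DC value, so a basis like this cannot be bandlimited; that leakage is the price of the time localisation.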
>
> Furthermore do note that such bases can only be discretely shift-invariant, not continuously. That's because of one of the basic classification theorems in functional and harmonic analysis: every shift-invariant subspace of an L^2 (by slight modification, an L^1 one) space is spanned by some combination of quadrature sines and cosines. So essentially everything truly linear and time-invariant comes out of the bandlimited sampling framework. This implies that MQA, while linear in its sampling step, is not time-invariant, and it cannot be; it'll necessarily lead to intermodulation products related to the sample rate, just as if you amplitude modulated the signal before sampling with some harmonic combination of sinewaves having to do with the rate.
>
> Craven's approach actually grew out of an earlier paper in which he dealt with the problem (?) of preringing in DAC anti-imaging filters in the fully time-invariant way: instead of using anti-aliasing and anti-imaging filters with linear phase, he went with minimum phase. And indeed, those don't pre-ring, even while being brickwall in amplitude. They post-ring as fuck, but then they don't compound any more badly than linear phase sinc filters do (since a product of minimum phase filters is minimum phase, and the phase response of a minimum phase filter is wholly determined by amplitude response, multiplying two all-pass minimum phase filters has to be idempotent). So why worry about pre-ringing so much, as to forgo conventional sampling theory?
>
> Well, there's actually some little-known evidence in the deep psychoacoustical literature for that. One of the most worrisome ones I know about comes from the sixties, when researchers still used analog machinery. There's at least one paper somewhere which systematically tested how binaural (then called dichotic) localisation accuracy depended on the bandwidth (and so inverse time localisation) of brief pulses.
> In it, it was shown that localisation accuracy decreased until the bandwidth of the signal was something like a megahertz. That of course is on its face insane, since even abnormally acute children can't hear a sustained tone past the 24-26kHz mark, and most adolescents can't go past 18kHz. However, the results *are* there in the literature, so how do you account for them?
>
> The only obvious way to understand them theoretically is that the LTI framework we use to understand sound is incomplete as applied to psychoacoustics. What we Fourier analyse as being too high in frequency to hear, and measure as being unhearable as sustained tones, could still affect how we process and perceive sound if the ears and the following neural processes are in some ways nonlinear. It could for example be that, while we can't hear a sustained tone above 20kHz, in impulsive transients the exact first rise time might be measured via a mechanism separate from cochlear resonance, and might be smeared if your reconstruction filter 1) is too narrowband, so as to smear things, or 2) prerings.
>
> Is there any newer evidence that something like that might happen? Yes, a little bit. First, auditory neurons are *definitely* not linear, as we know from the missing fundamental phenomenon, and its explanation via the rectifying behavior of cochlear cilia. And of course not linear overall, because they communicate via pulse-like action potentials in the first place, so that they're essentially "continuous time shift-invariant but quantized in amplitude". They also work stochastically together as fields to encode information, and not as individuals. They sure seem to be rather consistent in responding to minute changes and relaying information forward at a consistent speed, and they do so at low microsecond temporal accuracy.
> That's from electromicroassay data, so in fact it *is* possible there is a separate pathway for onset-detection of transients, whose bandwidth (in the sense of linear Fourier theory, now not so much applicable) is much wider than that of the cochlearly mediated resonance based one used to consciously detect sustained tones, and to do auditory scene analysis.
>
> Another one is the fact that already *decades* ago the idea of people not being able to sense absolute phase was debunked. True, we don't usually pay attention to whether sound comes at us compression or rarefaction first. But in fact auditory neurons respond highly nonlinearly to the fact, and in particular in dichotic hearing, the information *is* there to separate between the two cases, especially when overtones are present for comparison. You can demonstrably teach people to hear absolute phase, provided the stimulus isn't fully symmetrical about the mean, and in time; that wouldn't be possible if there wasn't something nonlinear in the way, and if there then is, that "something" can theoretically catch onto ultrasonics.
>
> Again, is this stuff new? No it isn't. For example, in the days of yore, I probably talked about SRS Labs and the like even on-list. Sony's SACD with its single-bit DSD idea referred to the same ideas (which I went on to debunk then). And of course every single increase in sampling rate has been sold with some argument like this since the CD appeared on the scene. (Perhaps the only tenable argument for higher sample rates came from Audio Renaissance for Acoustics, at the time people were mired in the Audio-DVD vs. SACD debates. That argument in fact came from the Ambisonics and Meridian affiliated crowd, probably including Bob Stuart, Peter Gerzon and Peter Craven; it suggested something like 50kHz and 14 bits with aggressive perceptual noise shaping would *precisely* fill the human auditory envelope.)
>
> So, back to MQA.
> Does it really go into new sampling theory, for example? Of course it does. It utilizes a set of basis functions which are not bandlimited, so that it can theoretically reconstruct things like highly time-localised transients which bandlimited sampling cannot. But that's then at the cost of not being able to perfectly reconstruct bandlimited signals; the finite innovation framework works to prove this too, in its contrary, because the proof is entropically as tight as the original Shannon one. "There is no free lunch."
>
> It comes down to the fact that while the modern generalized sampling theory does admit time-frequency tradeoffs and sampling/reconstruction regimes different from the highly uniform and well-behaved one of the classical theory (and could in principle accommodate almost arbitrary nonlinearity), each such system relies on an a priori assumption of what your source model is, and what you're thus trying to reconstruct, based on your sample stream. In the case of Shannon-Nyquist, the a priori assumption is a signal which is bandlimited, or in the slightly generalized case, a signal whose spectrum is compact. In the case of MQA, the assumption is a signal which decomposes nicely in a time-variant basis composed of recurring polynomial splines of some kind.
>
> The latter model is not well suited to *anything* but time-domain work. As I said above, it'll necessarily lead to intermodulation distortion. If that can be held below the masking threshold, fine, it might not be noticeable. And yeah, if you do some signaling and adaptation on the fly, perfect reconstruction is still possible even for band-limited signals (signal a change to a prior which is bandlimited, and reconstruct via the above mentioned N-determines-N-idea, using brute force, running matrix inversion, and what in the coding theory circuit is called among others "successive cancellation").
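
(Inline note: the "N-determines-N" idea can be sketched in a few lines. This is only an illustration under assumptions of my own: the prior is a periodic signal with two harmonics, so N = 5 unknown coefficients, and the five nonuniform sample times and the coefficient values are made up. The brute-force solve is plain Gaussian elimination, not anything MQA is known to do.)

```python
import math

def basis_row(t, harmonics=2):
    """Values of [1, cos, sin, cos2, sin2, ...] at time t (period 1)."""
    row = [1.0]
    for k in range(1, harmonics + 1):
        row.append(math.cos(2.0 * math.pi * k * t))
        row.append(math.sin(2.0 * math.pi * k * t))
    return row

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

# Hypothetical source: 5 coefficients under the bandlimited periodic prior.
true_coeffs = [0.3, 1.0, -0.5, 0.25, 0.8]
# Five distinct, deliberately nonuniform sample instants in [0, 1).
times = [0.05, 0.21, 0.48, 0.63, 0.90]
samples = [sum(c * b for c, b in zip(true_coeffs, basis_row(t))) for t in times]

recovered = solve([basis_row(t) for t in times], samples)
print(recovered)
```

With the right prior, five samples pin down five unknowns even off the equispaced grid; with the wrong prior (say, a spline model for this signal) the same five numbers would reconstruct something else, which is the "a priori assumption" point made above.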
>
> This way of looking at it probably *does* explain why they claim adaptability to the precise source material. Because, when you work with these kinds of off-kilter bases in your decomposition, of course you can adapt the sampling basis on the fly. If your source material has a certain spectrum, you can optimize your sampling spline to have similar characteristics. That then leads to certain matching conditions, which make rate-distortion analysis and error accumulation constraints go through much easier, like they automatically do in the Shannon-Nyquist framework. *However*...
>
> That stuff is much easier to do within the conventional critically sampled + quantizer + dither + lossless statistical compression framework. Like FLAC, or in fact Meridian Lossless Packing, from the same folks who instigated all this MQA nonsense (why? except to extract royalties?!?). How you do it is, you brickwall and sample conventionally, at a very high rate. (If you want to go as far as that 1MHz bound I mentioned from analogue work, you bring in a coherent, higher frequency but lower precision converter to fill in.) You apply your favourite apodising LTI filter, delta-sigma-method and whatnot, to arrive at a phase and amplitude spectrum to your liking. It will now typically be something headily bottom-sided in frequency, which you then quantize with well-thought-out dither; apodization takes out ringing, and the system remains fully translation invariant even on the continuous-time side, so that if you just *have* to apply Craven's original minimum phase ideas, the receiver can do so fully by its own discretion. No proprietary hardware or software needed, and you get to choose.
>
> Then you pack it via lossless digital means. Here the MLP work is first rate, since it does numerical analysis of reversible, discretized filters, and all that.
> What happens here is that the unequal power of the analogue utility signal is systematically transformed into a lossless digital code. Whatever good the original, weird sampling machinery might have done, is more systematically transformed into a similarly or better performing digital coding format, using the well-developed machinery of statistical compression and numerical analysis. Without the sampling artifacts which necessarily come with non-bandlimited basis functions and time-variance. Best of both, nay, *all* worlds, say I!
>
> Of course it seems MQA *tries* to better quality as well. Its approach is well in line with the dithering and noise shaping and information hiding and active decoding principles which were in vogue in the literature circa 2010, and even earlier. It really *is* so that, psychoacoustically speaking, you can get better audio quality from low end formats such as CD by hiding metadata below the audible noise floor, instead of doing straight LPCM. The simplest way of doing this is to bury a dynamical compression factor there, which doesn't much need bandwidth/noise power, and which then can control a dynamical expander on the receiver side, which brings the added noise *far* below what the original linear recording medium would have given you. MQA and similar things then run with the idea, and embed more metadata, and in theory *can* be even better; but the choice to make the encoding such that only 13 bit accuracy is retained unencoded, seems more than aggressive; it's just stupid, and far from audiophile quality.
>
>
> Finally, about MQA as an architecture, trademark, product, and such. It's all that, and it's also why it's mostly so worrisome. MQA, as a Meridian (one of the finest audio-DSP outfits out there, to be sure, which is why this is so sorry) spinoff, tries to tell audiophiles what they're getting is the Real Deal.
> It does so by turning on a blue light when the decoder synchs and recognizes some kind of signal that the signal comes straight from the "original master tape".
>
> That's a proficient marketing ploy, but it doesn't hold up to scrutiny. Because how the fuck do we know how that purported authentication works? The exact protocol certainly isn't public. Does it even employ public key cryptography, of the standardized sort, which would be in order? We don't know. Does it truly fingerprint audio streams? Independent testing appears to show it does not. Is it lossless, even in its "unfolded" configuration? No, it is *far* from coming out bit-perfect, and still adds considerable noise. Does it come out ahead in independent listening tests? No, with many preferring the original.
>
> You cannot even conduct fair listening tests on MQA. Because the patent holders do not let you encode arbitrary material of your own choice, in order to *do* any fair tests. Every bit of material has to be sent to an accredited (by them) encoding lab, which might or might not accept your submission. Those highly regarded mastering engineers included; none of them get to do it by themselves either. (And at sizable cost, subject to no refund.) Or you could maybe send your test signals to be published at Tidal, the new streaming service, expounding their use of MQA...but then there appears to be a purpose-built machine filter at Tidal against anything that seems like a test signal. It passes test signals interspersed with live music, but not signals per se. (Goes without saying the thing isn't anything like free or open software.)
>
> Particularly offensive is that Tidal offers a "Hifi" option, which streams FLAC. We're told that at least is lossless. Not so: it's unflagged 16/44 MQA, and so lossy.
>
> So, in addition to the original papers underlying the technology being subject to lack of scrutiny, and there being serious mathematical and psychoacoustical objections to their premises, there are also various objections to the business model they've now led to. My pirate friends consider the whole enterprise to be an attempted IP coup, of the likes of Sony's SACD/DSD, right at the time of the CD patents' expiry. The stuff sounds like snakeoil and all-round boogaloo -- much to my chagrin, because I and many others really *do* consider Stuart and Craven to be among the top-10 of audio engineers extant. What on God's green earth possessed them to sell something like this as the next best thing, when they *certainly* know better?!? :/
> --
> Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
> +358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>