On 2023-06-01, Jan Jacob Hofmann wrote:
> is it possible/ reasonable to mix ambisonic encoded information of
> different order?
It's possible and it's reasonable, and as Fons Adriaensen said above, at
the rather high orders you're talking about, it's not far from optimal
either. This has also been talked about in the past, with the (granted,
somewhat shocking) revelation to me and some others that orders mixed
this way do *not* automatically decode optimally in either order's
decoder.
But theoretically, this ought to be purely a decoding side issue. When
you're mixing into or in B-format, you're essentially dealing with an
isotropic approximation of a soundfield, around a central point. That
approximation is always a physical one, and in ambisonic work, it's
going to be orthogonal by the basic math. If you want to add extra
directional accuracy, you'll add orders to your directional
decomposition. If you can't or won't, then you don't. But in the end,
the fact that the (3D) Fourier-Bessel series, rightly normalized,
preserves the power of point sources, and is an isotropic decomposition
of an inbound far field, guarantees that the *only* thing you lose at
lower order is directional accuracy. B-format is the representation
meant to capture the physics, so mixing two orders in it cannot lose
anything.
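
On the encoding side, the mixing described above can be sketched as plain zero-padding followed by a sum. This is a minimal illustration and not anyone's production code. It assumes ACN channel ordering, a shared order-independent normalization (SN3D or N3D) and equal signal lengths; the helper names `num_channels`, `pad_to_order` and `mix` are made up for the example.

```python
import numpy as np

def num_channels(order: int) -> int:
    """Number of full-sphere (3D) ambisonic components up to a given order."""
    return (order + 1) ** 2

def pad_to_order(sig: np.ndarray, target_order: int) -> np.ndarray:
    """Zero-pad a (channels, samples) signal up to target_order.

    Assumes ACN ordering, where the components of a lower order are
    exactly the leading channels of any higher order.
    """
    n_in, n_out = sig.shape[0], num_channels(target_order)
    if n_in > n_out:
        raise ValueError("signal already exceeds the target order")
    out = np.zeros((n_out, sig.shape[1]), dtype=sig.dtype)
    out[:n_in] = sig
    return out

def mix(*signals: np.ndarray) -> np.ndarray:
    """Sum signals of possibly different orders at the highest order present."""
    n_max = max(s.shape[0] for s in signals)
    order = int(round(np.sqrt(n_max))) - 1
    return sum(pad_to_order(s, order) for s in signals)

# Stand-in noise signals: a 7th order dry source and its 3rd order reverb.
rng = np.random.default_rng(0)
dry = rng.standard_normal((num_channels(7), 480))     # 64 channels
reverb = rng.standard_normal((num_channels(3), 480))  # 16 channels
bus = mix(dry, reverb)                                # 64 channels
```

The reverb only ever touches the first 16 channels of the bus. Nothing is lost in the sum; what the bus no longer records is which part of those 16 channels was band-limited to third order, and that is exactly the information a decoder would want.
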
So the real trouble comes when decoding B-format into D-format. If you
have a set of first order, POA signals, you have one particular, optimal
equation set for how you'd lay the sound out over your speakers. If you
had a second order HOA signal, running into something like 5.1, the
optimal set differs quite a lot, especially in the higher frequencies,
since the theory doesn't work by easy interference principles there, but
by second order psychoacoustical ones, coming from the stereo work of
Makita. Solving the problem optimally becomes rather finicky.
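
To make the order dependence concrete, here is a small sketch using one widely used high-frequency weighting, max-rE, for a regular full-sphere layout. It is only an illustration: it is not the finicky optimum for an irregular rig like 5.1, which needs numerical optimization, and the function name `max_re_weights` is made up for the example. The per-degree gain is g_l = P_l(r_E), with r_E the largest root of the Legendre polynomial P_(N+1) for a decoder of order N.

```python
import numpy as np
from numpy.polynomial import legendre

def max_re_weights(order: int) -> np.ndarray:
    """Per-degree max-rE gains for a full-sphere decoder of the given order."""
    # Legendre-series coefficients selecting the single polynomial P_(order+1).
    basis = np.zeros(order + 2)
    basis[-1] = 1.0
    r_e = legendre.legroots(basis).max()
    # Evaluate P_0 .. P_order at r_e, one degree at a time.
    return np.array([
        legendre.legval(r_e, [0.0] * degree + [1.0])
        for degree in range(order + 1)
    ])

first = max_re_weights(1)    # gains for degrees 0 and 1
second = max_re_weights(2)   # gains for degrees 0, 1 and 2
```

The degree-1 gain comes out at about 0.577 in the first order decoder and about 0.775 in the second order one. First order material pushed through the second order rule therefore gets a first-degree weighting it was never optimized for, and vice versa.
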
Then, solving it for mixed orders is even messier. ("Mixed order" is
not usually the term for this situation, but for leaving out certain
spherical harmonics, e.g. for horizontal, pantophonic work.) How could
we know in decoding only, blindly, that we have a superposition of say
first and second (arbitrary?) order signals, so that we could apply the
optimum decoding rule to them all, at the same time?
I've been toying around with this problem for a decade or so, and
haven't found a satisfactory solution to it all. My intuition says
this has something to do with non-negative matrix factorization and
convex optimization, but even if that's it, I'm not quite there yet.
Starting from Dolby Surround and HARPEX-like things, I've been toying
around with doing them in the pure spherical harmonic domain to
arbitrary order: a generalizable, infinite order decoder. For DirAC kind
of stuff, I've been toying around with just tensoring the STFT/MDCT
domain with the directional Fourier domain, complexly, and then putting
some classical LTI DSP, statistical learning and
information/compression/rate-distortion theory on top. All in an effort
to solve the problem of how to make full spatial audio pack well.
And then there was NFC-HOA. I was already making some progress, but that
totally stopped me. In that one, you can mix several orders of signals,
but suddenly you can't mix ones of different radii. Fuck, back to the
drawing board for me as well. :/
> The sound-information (synthesized) is encoded in Ambisonic 7th order
> while the spatial reverberation of that very sound is encoded „only“
> to third order.
In fact Fons asked you already: why go to such a high order? You'd need
an extraordinary number of speakers to utilize such a signal (full-sphere
7th order is 64 components), plus extraordinary computing power and a
lot of real life measurement of your speaker rig to even align your
decoding solution optimally. Whereas at a low, matched order, you can do
it right with a day's computation time.
> Reason for doing so: My reverberant information comes from several
> directions in space. If these would not have to be encoded all up to
> 7th order, it would save some calculation time and computation effort.
They really don't have to. Take a look at Ville Pulkki's DirAC work,
here in Finland. The gist of it is that it reconstructs specular
sources and reverberation separately. The former are identified via
time coherence and averaging, much like Dolby Surround does it in its
four constrained channels, and like HARPEX does it, better, in the
ambisonic setting.
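
The direct/diffuse split can be sketched with a time-averaged intensity vector. This is a heavily simplified broadband toy and not Pulkki's implementation: DirAC proper does this per time-frequency tile with short-time averaging. It assumes W is pressure-like and X, Y, Z are velocity-like, scaled so that a single plane wave has |[X, Y, Z]| equal to |W| (as with SN3D); the function name `analyse` is made up for the example.

```python
import numpy as np

def analyse(w, x, y, z):
    """Broadband azimuth (degrees) and diffuseness from first order signals."""
    v = np.stack([x, y, z])
    # Time-averaged intensity-like vector. With the B-format sign
    # convention assumed here, it points toward the source.
    intensity = np.mean(w * v, axis=1)
    # Time-averaged energy; equals |intensity| for a single plane wave.
    energy = 0.5 * np.mean(w**2 + np.sum(v**2, axis=0))
    diffuseness = 1.0 - np.linalg.norm(intensity) / max(energy, 1e-12)
    azimuth = np.degrees(np.arctan2(intensity[1], intensity[0]))
    return azimuth, diffuseness

rng = np.random.default_rng(1)
s = rng.standard_normal(48000)

# A single plane wave from 30 degrees azimuth, zero elevation.
az = np.radians(30.0)
plane_az, plane_psi = analyse(s, s * np.cos(az), s * np.sin(az), np.zeros_like(s))

# Crude stand-in for a diffuse field: independent noise in every channel.
noise = rng.standard_normal((4, 48000))
_, diffuse_psi = analyse(*noise)
```

The plane wave comes back with its azimuth and a diffuseness of essentially zero, while the uncorrelated noise comes back with a diffuseness close to one. Only the first kind of content needs sharp directional rendering.
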
Ville's work however is fully general and frequency dependent in its
source recognition. And it goes beyond that: it actually tries to
identify reverberant modes from a SoundField, by using the imaginary
axis of the Fourier transform in time. That has also been discussed
on-list years ago, when Angelo (I think) talked about his car
interiors.
> Also the reverberant information may well be more „blurry“ in respect
> to the actual sound, as it may stay in the background of perception
> anyway.