Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread dw

The above file _does_ work for me. (Only for me?)
Unfortunately for you, if you click the third icon in the player, the
spectrogram shows no obvious higher-frequency pinna cues.
One of Hugo Zuccarelli's demos does have good height cues for me and
many others, but it is the only one among the hundreds of binaural
recordings that I have heard.
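
If you would rather check offline, here is a minimal sketch of the same
test, assuming a downloaded WAV copy of the recording (the filename is a
placeholder): it estimates how much of the signal energy sits above
12 kHz, the band where pinna elevation cues mostly live.

import numpy as np
import soundfile as sf   # assumed available; pip install soundfile

def high_band_energy_ratio(path, split_hz=12000.0):
    x, fs = sf.read(path)
    if x.ndim > 1:
        x = x.mean(axis=1)                 # rough check: collapse to mono
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = spectrum.sum()
    return spectrum[freqs >= split_hz].sum() / total if total > 0 else 0.0

# ratio = high_band_energy_ratio("binaural_demo.wav")   # placeholder name
# print(f"{100 * ratio:.2f}% of the energy sits above 12 kHz")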


On 22/11/2014 02:30, Stefan Schreiber wrote:

dw wrote:

The state of the art finds it very difficult to render sounds below
the listener. To do it with a 'flat' frequency response, referenced to
ground/gravity, i.e. unaffected by normal, small head movements, is a
bonus. It is just a pity it might take a while to get used to.
I can't tell, after being spoilt by hundreds of hours of listening to
various binaural recordings, and not hearing above 12 kHz.


http://www.freesound.org/people/dwareing/sounds/255159/


This might be an observation worthy of some serious discussion.

Elevation cues depend a lot on pinna shape, and are related (mostly?)
to high(er) frequencies.


The HRTF set you are using might just not do it for you?

Would you notice a change if you tried to find an HRTF set which
actually fits you? (Given that you are probably not able to measure
your personal HRTF data...)


Best,

Stefan






Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread dw

It claimed to use state-of-the-art applications of binaural rendering.
The fact that I cited one white paper does not mean I have read only one
in my life, or that I base my opinions on those of the BBC.

I did not say you '_represent_ binaural science'.
I do not pretend to understand it. Others do. I think they are wrong,
given the lack of observational support for the implied predictions of
said theories.


On 22/11/2014 02:34, Stefan Schreiber wrote:

dw wrote:


On 19/11/2014 22:12, Stefan Schreiber wrote:




Your posting seems to be meaningless if not arrogant, BTW.


Let me put it in a more positive way then: your thinking is
representative of the state of the art in binaural science. :-)


Previous work http://www.bbc.co.uk/rd/publications/whitepaper250 
has shown that even with the state-of-the-art virtual surround 
systems we don't currently get a big improvement in quality over a 
conventional stereo down-mix. The perceived quality was found to vary 
significantly according to the source material used.

http://www.bbc.co.uk/rd/blog/2014/10/tommies-in-3d



I have actually discussed this study with some people (for example,
Günther Theile).


Experience with the Realiser A8 from Smyth Research and IRT's BRS system
seems to indicate that binaural systems with head-tracking and
personalized HRTF filters can't be distinguished from real (5.1)
speakers.


(They used BRIRs of the listening room and the reference 5.1 speaker 
system, of course.)


I don't believe that the BBC study is really flawless, BTW. (Günther 
Theile thought the same.)

(I am too lazy to discuss this now, have some other stuff to do.)

Your thinking is representative of the state of the art in binaural
science. :-)



Maybe you should read more than (just) one paper before claiming
that nobody besides you has a clue?  :-D


I also didn't claim to represent binaural science, if I remember correctly.

Best,

Stefan






Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread dw

On 22/11/2014 02:43, Stefan Schreiber wrote:


I don't believe that the BBC study is really flawless, BTW. (Günther 
Theile thought the same.)


Günther Theile is not one of my drinking buddies; I wish he were. BTW,
the Stax demo is not that great.




Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread dw


Stefan, you have not actually downloaded it; nobody has!
Unless you are assuming that a very low bitrate MP3 is the same as
24-bit FLAC, there is nothing to discuss.


On 22/11/2014 02:30, Stefan Schreiber wrote:



http://www.freesound.org/people/dwareing/sounds/255159/


This might be an observation worthy of some serious discussion.




Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread Stefan Schreiber

dw wrote:


It claimed to use state-of-the-art applications of binaural rendering.
The fact that I cited one white paper does not mean I have read only one
in my life, or that I base my opinions on those of the BBC.


and ...

Previous work http://www.bbc.co.uk/rd/publications/whitepaper250 has 
shown...




Ok, I allow myself to cite two short passages from Mr. Theile's mail:

The test method applied in the BBC study was based on “not known
target quality” (Chapter 5.1). However, the target is known: it should
be the real room which was measured and intended as the original
surround listening experience. Assessing a surround room synthesis
technology against the ITU stereo down-mix is synonymous with assessing
real surround monitoring against ITU 2-channel down-mix headphone
monitoring.

· The BBC comparison test does not clarify basic preferences of the
subjects regarding “5-ch surround vs. 2-ch stereo” and “2-ch loudspeaker
vs. 2-ch headphone listening”. Results plotted in Fig. 6 indicate that
listening group 2 prefers 2-ch headphone listening, and one cannot
exclude that this would also be found in comparison to loudspeaker
listening generally. In contrast, listening group 1 should basically
prefer surround sound and out-of-head localization.


...

It seems clear to me that further causes have effected this result. I 
suggest that the “individualised head-tracked BRIR system” was not 
calibrated correctly or ...



I wrote to several people before receiving this feedback:



http://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP250.pdf

Experimental evidence has suggested that for plausible virtual sound 
sources located outside of the head with good directional accuracy, 
HRIR measurements specific to the individual are required [4, 5]. 
However when impulse responses containing the room response, known as 
binaural room impulse responses (BRIRs), are used in combination with 
head-tracking, to compensate for head motion, plausible synthesis of 
virtual sound sources can be achieved with high localisation accuracy 
[6, 7]. The relative importance of these system components has been
addressed in terms of localisation [8]; however, the effects on overall
sound quality are not clear.


(page 5)

and then:

In the context of broadcast distribution, the state-of-the-art virtual 
surround systems were shown not to give a great improvement over an 
ITU down-mix for playback of 5.1 audio over headphones, when used in a 
black box approach. Many systems performed significantly worse than 
the downmix, including the dynamic individualised BRIR system. The 
best performing systems were graded similarly to the down-mix.


(in Section 7, Conclusions and Discussion. If so, we should pack our luggage...)

Well, and this is why I don't trust every study...

You could conclude
a) that binaural representation of surround sound is simply not worth
studying,

OR

b) that they didn't do things the right way.

Actually I believe the conclusion is more like b), because my
reference system


http://smyth-research.com/technology.html

just happens to reproduce 5.1/7.1 via headphones very well, according
to everybody who has listened to it.




It is a fact that the Smyth Research system has been rated highly by
every person I know who has listened to it, and there are plenty of
listening reviews available on the Internet. (Hi-Fi reviews etc.)


It is also obvious that this system can reproduce sound from the front,
as it is used for 5.1/7.1/2.0 studio mixing and as a Hi-Fi system.


Some reviews:

http://www.soundonsound.com/sos/jul13/articles/smyth-realiser-a8.htm

A big concern many people have about headphone monitoring is the lack 
of physical bass sensation, but I was genuinely surprised at how 
little difference the use of the Tactile output actually made to my 
mixing decisions in the long term. Just hearing the low-frequency mix 
components within such a believably speaker-like context seems to 
clarify most low-frequency level and quality questions on its own 
somehow, and in no less reliable a manner than 95 percent of nearfield 
monitoring systems I've heard, given the strong influence of room 
resonances on real-world bass reproduction. Furthermore, the 
Realiser's nifty Direct Bass feature (see the 'Better Than The Real 
Thing?' box) can remove the effects of LF room modes from its 
emulation entirely, delivering low-end fidelity that's well beyond the 
capabilities of the speaker system you originally sampled!


(LF properties of headphones...)


My biggest gripe about the Realiser has nothing to do with the sound, 
though: it's the clunkiness of the user interface. 



I also missed the psychological 'averaging' effect that you get with 
real speakers when you stroll round your room. Although you can move 
around quite a bit without losing the head-tracking, your virtual 
monitoring position remains riveted in the stereo sweet spot, which 
might not be the position that provides the most ...

Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread Sampo Syreeni

On 2014-11-17, Pulkki Ville wrote:

Sampo mentioned that he heard our demo at Aalto. Here is the title and
the abstract of the demo, which we first showed at the AES 55th
Conference on spatial audio.


I heard two separate demonstrations, actually. Ville showed me a
head-tracking binaural recording synched to 360-degree stereo video over
the Oculus. I actually got the beginnings of vertigo looking down; it
tracked that well. I believe that demo came from plain old first order
DirAC on 32 virtual speakers folded down into head-tracked binaural.
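
For the curious, here is a minimal sketch of what I take that fold-down
to be: each virtual loudspeaker feed convolved with the HRIR pair for
its direction and summed per ear. Shapes and names are illustrative, not
the Aalto code; head tracking would rotate the field before decoding.

import numpy as np
from scipy.signal import fftconvolve

def virtual_speakers_to_binaural(speaker_feeds, hrirs):
    # speaker_feeds: (n_speakers, n_samples) decoder outputs
    # hrirs: (n_speakers, 2, hrir_len), one left/right pair per direction
    n_speakers, n_samples = speaker_feeds.shape
    hrir_len = hrirs.shape[2]
    out = np.zeros((2, n_samples + hrir_len - 1))
    for s in range(n_speakers):
        for ear in (0, 1):
            out[ear] += fftconvolve(speaker_feeds[s], hrirs[s, ear])
    return out  # stereo binaural signal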


The clip with a choir in the front and full 360 tracking capability on 
the visuals and the sound at the same time was particularly effective: 
actually being able to look dead back and both see and hear the audience 
blew me back a bit.


However, even better stuff was to follow. Now, DirAC adapted to fourth
order ambisonics, and eventually even the Eigenmike. Archontis Politis (
https://people.aalto.fi/index.html?profilepage=isfor#!archontis_politis
) put up a whole listening test/panel for me, in one of Aalto's
anechoic caverns.


It was as impressive as ever: even at fourth order, I could clearly
discern how the directional averaging of ambisonics leads to too much
correlation. At first order the effect was downright stifling. And then
after DirAC processing, the first order track still sounded odd, but the
fourth order one actually -- finally -- approached the reference.


Now, I've been a critic of DirAC and especially SIRR in the past, not to
mention VBAP. I still sort of think they ought to be thrown out the
window within the DirAC framework, because they aren't principled and
ambisonic-minded enough. But... right now they *really* seem to
constitute the closest thing I at least have heard to an ideal infinite
order decoder. In particular, the DirAC machinery not only decorrelates
the sound indirectly by infinite order decoding, it also separately,
spatially whitens the signal set... and it makes a *real* difference in
realism and transparency. Seriously, you have to hear it to believe it.



Demo 3: Head-mounted head-tracked audiovisual reproduction.

Olli Santala, Mikko-Ville Laitinen, Ville Pulkki and Olli Rummukainen.
Aalto University, Department of Signal Processing and Acoustics

Audiovisual scenes are reproduced with headphones and a head-mounted 
display in this demonstration. The sound has been recorded with a real 
A-format microphone, and it is reproduced using binaural DirAC, which 
utilizes DirAC processing, virtual loudspeakers and head tracking.


Interestingly enough, one of the demos was captured near my ex's abode,
so I knew the auditory environment from before. It was spot on, if
only a bit pushed away and lacking in 'pop'. Maybe dynamics and the
high end, or something.


The head tracking was immaculate right until I made my signature move:
the head tilt. There the lag was easy to see. But then, it came from the
Rift's API, not Ville's code.
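
For reference, the rotation step itself is simple. A minimal sketch for
plain first order B-format with yaw/pitch/roll from the tracker --
generic textbook math under those assumptions, again not Ville's code:

import numpy as np

def rotate_foa(w, x, y, z, yaw, pitch, roll):
    # Build the head rotation (z-y-x convention) and apply its inverse to
    # the velocity components, so virtual sources stay fixed in the world.
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])  # yaw
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])  # pitch
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])  # roll
    head = rz @ ry @ rx
    xyz = head.T @ np.vstack([x, y, z])  # inverse rotation of the field
    return w, xyz[0], xyz[1], xyz[2]     # W is omnidirectional, untouched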


See the description of the audio rendering technique here: Laitinen,
M.-V., and Pulkki, V., "Binaural reproduction for directional audio
coding," in IEEE Workshop on Applications of Signal Processing to Audio
and Acoustics (WASPAA'09), 2009.


Ville, Archontis, I hope in time you link to the paper on the new
fourth order DirAC demo I heard. Because that shit is *seriously*
impressive. Close to spatially transparent even to my rather discerning
ear.


To me this demo is really cool since the auditory objects are nicely 
externalized, even in the field of vision.


And I can vouch for that now, too. That also probably has to do with 
DirAC's decorrelation processing, even at first order.


The trick could be that when the subject perceives the space
visually, he adapts quickly to the HRTFs used in the system.


Correct as well. Though I adapt pretty slowly myself, thanks to my 
(genetic?) hearing deficit. And yet it still worked pretty well. As such 
I'd rather blame the decorrelation processing, not so much adaptation.


We also update the head position at a rate of about 100 Hz, and then
correspondingly update the video and audio. This prevents nausea, and
also helps in the externalization of headphone audio.


The extreme externalisation I'd again blame on the decorrelative
processing. I've heard a number of binaural technologies before, and
even before I took my dynamic cues / rolled my head around, the demo
just worked.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


[Sursound] infinite order decoding proper, and redux

2014-11-22 Thread Sampo Syreeni

On 2014-11-19, dw wrote:

Still, I've yet to find a solution for b-to-binaural which is as 
convincing as some of the BRIR-based object-sound spatialization 
packages (e.g. DTS HeadphoneX and Visisonics Realspace). I think 
what's primarily lacking is externalization, which perhaps can be 
'faked' with BRIRs.


I've been thinking about this one for *ages*, because I've never had the
wherewithal to buy more than two speakers, nor especially the kind of
acoustic space/room/home where even POA would play out well. Binaural
rendering is then the only proper alternative left beyond that, even as
the main allure of ambisonics is extended, areal playback. So, how to do
even that right?


Well... The way I see it, we don't have a proper theory for that even 
now. But we do have lots of data even starting with the KEMAR set of 
HRTF responses. So we have to make do with that, and interpolation.


Linear interpolation is often used, but I feel it's wrong. Instead I
feel we have to go into the frequency domain, do cosine-law
interpolation in amplitude and linear interpolation in phase. And since
all of the extant HRTF sets -- KEMAR included -- are rather nonuniform,
we have to weight the eventual solution more or less by the area over
the sphere surface each of the impulse responses can be taken to be
representative of.
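
A minimal sketch of what I mean, between two measured responses; I read
'cosine law' here as a raised-cosine crossfade of magnitudes, and I
leave out the area weighting over a full set. Names are illustrative.

import numpy as np

def interpolate_hrir(h0, h1, t):
    # h0, h1: two measured HRIRs of equal length; t in [0, 1] is the
    # normalized position between their directions.
    H0, H1 = np.fft.rfft(h0), np.fft.rfft(h1)
    w = 0.5 - 0.5 * np.cos(np.pi * t)        # cosine-law crossfade weight
    mag = (1 - w) * np.abs(H0) + w * np.abs(H1)
    ph0 = np.unwrap(np.angle(H0))
    ph1 = np.unwrap(np.angle(H1))
    phase = (1 - t) * ph0 + t * ph1          # linear interpolation in phase
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(h0))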


That's certainly not too principled. Not a whole, covering theory. But 
it's good enough for starters. And then, it only covers how to do the 
rendering, it doesn't tell you how to go from A-format to B, or 
especially how to enhance low order B into infinite order.


In the latter sport, I'm reasonably certain that Ville's DirAC is the
way to go... but it ought to be reformulated on the way. I mean, it's an
engineer's version, not a mathematician's like Gerzon's.


The way I see it done is to go back to the early actively decoded Matrix
H work, and continue from there. That calls for approximating the
sonic power coming from various directions. So let's do that directly:
given a signal over a sphere expressed as its spherical surface harmonic
decomposition, let's just find a way to square and integrate that,
within the decomposition. That will double both the angular and the
temporal bandwidth, but so be it.
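
A minimal numerical sketch of the squaring step, purely illustrative:
evaluate the order-N expansion on a direction grid and square it into a
power map. The product of two order-N expansions is band-limited to
order 2N, which is exactly the angular bandwidth doubling I mentioned.
Coefficients are assumed complex and ACN-ordered.

import numpy as np
from scipy.special import sph_harm

def power_map(coeffs, order, n_az=72, n_el=36):
    az = np.linspace(0, 2 * np.pi, n_az, endpoint=False)  # azimuth
    el = np.linspace(0, np.pi, n_el)                      # polar angle
    AZ, EL = np.meshgrid(az, el)
    f = np.zeros_like(AZ, dtype=complex)
    i = 0
    for n in range(order + 1):
        for m in range(-n, n + 1):
            f += coeffs[i] * sph_harm(m, n, AZ, EL)
            i += 1
    return np.abs(f) ** 2          # directional power estimate on the grid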


Doing DirAC-style decorrelative processing might double the channel
count, for the imaginary channel. Doing it really well might call
for channel intercorrelations, and so raise the complexity to the second
power. But even then, even with fourth order signals, it'd be eminently
doable. So let's do that as well.


Then, add the true infinite order decoding: detection of signal sets
which are clearly under-rank. Those can be decoded into clean sources
and repanned using extremely high order ambisonics, theoretically
exceeding even the VBAP used by DirAC. All the while applying DirAC's
reimagined decorrelation machinery for fun and benefit.
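
A minimal sketch of how I would naively detect that: the eigenvalues of
the inter-channel covariance of a short frame tell you how many
independent sources are active; a single plane wave leaves the matrix
close to rank one. The threshold is an arbitrary assumption.

import numpy as np

def effective_rank(frame, rel_threshold=0.01):
    # frame: (n_channels, n_samples) snippet of an ambisonic signal
    cov = frame @ frame.T / frame.shape[1]
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    # count eigenvalues carrying a non-negligible share of the largest
    return int(np.sum(eigvals > rel_threshold * eigvals[0]))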


I'm reasonably sure a framework like this would exceed any and all
extant ones in reproductive accuracy and mathematical elegance. Plus it
would be fully generalizable over the whole ambisonic hierarchy, while
sounding better than the current incarnations of even DirAC.


It worked better for natural B-format recordings than for anechoic or
rendered sources.


Seriously, first order B-format (POA) sounds rather shitty once you hear 
a comparison signal. It's excellent considering what it has to deal 
with, but absent that, it's just plain shitty. Dull.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?

2014-11-22 Thread Stefan Schreiber

Sampo Syreeni wrote:


...
Ville, Archontis, I hope in time you link to the paper on the new
fourth order DirAC demo I heard. Because that shit is *seriously*
impressive. Close to spatially transparent even to my rather
discerning ear.


Using 4th order via (head-tracked) headphones alone should be pretty
impressive, as you are always in the sweet spot.

- Frequencies up to about 2,500 Hz can be represented in a complete
way, at least theoretically. (A back-of-envelope check follows after
this list.)

(The sweet spot which matters is head-sized.)

- Frequencies above that are technically out of the sweet spot, but the
Ambisonics blurring effect will be much less pronounced than at 1st
order (up to 18°, if I remember correctly, compared to up to 45° for
FOA...).

- To evaluate DirAC at 4th order, you would have to compare a
(reasonably good) conventional binaural 4th order decoder to the DirAC
4th order decoder...
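
A back-of-envelope check of the 2,500 Hz figure, using the usual
kr <= N rule of thumb for the radius inside which an order-N field is
reconstructed accurately. (The 9 cm head radius is my assumption.)

import math

c = 343.0   # speed of sound, m/s
r = 0.09    # assumed head-sized sweet spot radius, m
for order in (1, 4):
    f_max = order * c / (2 * math.pi * r)   # from k * r <= N
    print(f"order {order}: accurate up to ~{f_max:.0f} Hz")
# order 1: ~607 Hz; order 4: ~2427 Hz, close to the 2,500 Hz above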



Otherwise, the seriously impressive shit might be seriously impressive
with any 4th order decoder. (4th order Ambisonics resolution is much
higher than at 1st order, FAPP.)


You have to compare a DirAC-HOA binaural decoder against a
conventional HOA binaural decoder, because I am actually not so sure
that DirAC (in spite of giving sharp directions even at 1st order) does
not also introduce some audible artefacts. (Hard to judge this for
me/everybody, as no form of DirAC is available to the public, and not
even to other researchers.)




To me this demo is really cool since the auditory objects are nicely 
externalized, even in the field of vision.



And I can vouch for that now, too. That also probably has to do with 
DirAC's decorrelation processing, even at first order.



You mean VBAP decorrelates the channels, and this is in principle
better for sound field reproduction than the normal way?


Doubts... The merit of DirAC is that you receive sharpened
directions, which are then reproduced via VBAP. It is not the
decorrelation of VBAP which seems to matter.


The algorithm behind DirAC doesn't seem to know early reflections, as
you receive just one direction per time/frequency bin plus diffuse
elements. Given that your hearing will normally integrate the direct
sound and first reflections into perceived directions, the absence of
reflections in any frequency bin after DirAC processing means that you
will have changed the acoustical properties of the ambient part. This
seems to be a fundamental problem.
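
To make this concrete, a schematic sketch of the analysis model I mean:
per time/frequency bin, one direction estimated from the active
intensity vector plus a diffuseness coefficient, and nothing else -- an
early reflection gets no slot of its own. Normalization conventions
vary, so treat this as illustrative, not as the published DirAC
algorithm.

import numpy as np
from scipy.signal import stft

def dirac_style_analysis(w, x, y, z, fs, nfft=1024):
    # STFTs of the four B-format channels
    _, _, W = stft(w, fs, nperseg=nfft)
    V = np.stack([stft(s, fs, nperseg=nfft)[2] for s in (x, y, z)])
    # Active intensity per bin (B-format scaling and DOA sign convention
    # omitted; the DOA points opposite the energy flow).
    intensity = np.real(np.conj(W)[None] * V)
    energy = 0.5 * (np.abs(W) ** 2 + np.sum(np.abs(V) ** 2, axis=0))
    azimuth = np.arctan2(intensity[1], intensity[0])  # ONE direction/bin
    diffuseness = 1.0 - (np.linalg.norm(intensity, axis=0)
                         / np.maximum(energy, 1e-12))
    return azimuth, diffuseness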


Not being able to test whether this translates into audible artefacts, I
am pretty sure that there is at least *some* problem here. (Every 3D
audio game engine I know models reflections. A separation into direct
and diffuse elements - which is the model used in DirAC - is too
simplistic, and definitely not based on perceptual science.)


Best regards,

Stefan


P.S.: And hopefully DirAC - which seems to have existed for FOA at least
since 2006, if not earlier - will become usable in some form for all the
people who didn't make it to Helsinki. (But we can *walk* there if we
are not willing to pay for the plane tickets... :-D )


Just another reflection...




The trick could be that when the subject perceives the space
visually, he adapts quickly to the HRTFs used in the system.



Correct as well. Though I adapt pretty slowly myself, thanks to my 
(genetic?) hearing deficit. And yet it still worked pretty well. As 
such I'd rather blame the decorrelation processing, not so much 
adaptation.


We also update the head position at a rate of about 100 Hz, and then
correspondingly update the video and audio. This prevents nausea, and
also helps in the externalization of headphone audio.



The extreme externalisation I'd again blame on the decorrelative
processing. I've heard a number of binaural technologies before, and
even before I took my dynamic cues / rolled my head around, the demo
just worked.





Re: [Sursound] infinite order decoding proper, and redux

2014-11-22 Thread Stefan Schreiber

Sampo Syreeni wrote:


On 2014-11-19, dw wrote:

Still, I've yet to find a solution for b-to-binaural which is as 
convincing as some of the BRIR-based object-sound spatialization 
packages (e.g. DTS HeadphoneX and Visisonics Realspace). I think 
what's primarily lacking is externalization, which perhaps can be 
'faked' with BRIRs.




I've been thinking about this one for *ages*, because I've never had
the wherewithal to buy more than two speakers, nor especially the kind
of acoustic space/room/home where even POA would play out well.
Binaural rendering is then the only proper alternative left beyond
that, even as the main allure of ambisonics is extended, areal playback.
So, how to do even that right?


Well... The way I see it, we don't have a proper theory for that even 
now. But we do have lots of data even starting with the KEMAR set of 
HRTF responses. So we have to make do with that, and interpolation.


Linear interpolation is often used, but I feel it's wrong. Instead I
feel we have to go into the frequency domain, do cosine-law
interpolation in amplitude and linear interpolation in phase. And
since all of the extant HRTF sets -- KEMAR included -- are rather
nonuniform, we have to weight the eventual solution more or less by
the area over the sphere surface each of the impulse responses can be
taken to be representative of.


That's certainly not too principled. Not a whole, covering theory. But 
it's good enough for starters. And then, it only covers how to do the 
rendering, it doesn't tell you how to go from A-format to B, or 
especially how to enhance low order B into infinite order.


In the latter sport, I'm reasonably certain that Ville's DirAC is the
way to go... but it ought to be reformulated on the way. I mean, it's
an engineer's version, not a mathematician's like Gerzon's.


The way I see it done is to go back to the early actively decoded
Matrix H work, and continue from there. That calls for approximating
the sonic power coming from various directions. So let's do that
directly: given a signal over a sphere expressed as its spherical
surface harmonic decomposition, let's just find a way to square and
integrate that, within the decomposition. That will double both the
angular and the temporal bandwidth, but so be it.


Doing DirAC-style decorrelative processing might double the channel
count, for the imaginary channel. Doing it really well might
call for channel intercorrelations, and so raise the complexity to the
second power. But even then, even with fourth order signals, it'd be
eminently doable. So let's do that as well.


Then, add the true infinite order decoding: detection of signal sets
which are clearly under-rank. Those can be decoded into clean sources
and repanned using extremely high order ambisonics, theoretically
exceeding even the VBAP used by DirAC. All the while applying DirAC's
reimagined decorrelation machinery for fun and benefit.


I'm reasonably sure a framework like this would exceed any and all
extant ones in reproductive accuracy and mathematical elegance. Plus
it would be fully generalizable over the whole ambisonic hierarchy,
while sounding better than the current incarnations of even DirAC.



Your sketched algorithm is definitely not related to DirAC. (It is
a different and actually more sophisticated parametric decoder, to apply
the correct terminology.)


This is a very smart algorithm and might actually work, at first sight.
(Working this out will be far from easy, to say the least.)


Thanks for sharing this!

Stefan




It worked better for natural B-format recordings than for anechoic or
rendered sources.



Seriously, first order B-format (POA) sounds rather shitty once you 
hear a comparison signal. It's excellent considering what it has to 
deal with, but absent that, it's just plain shitty. Dull.


