Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
The above file _does_ work for me. (only?) Unfortunately for you, if you click the third icon in the player, the spectrogram shows no obvious higher-frequency pinna cues. One of Hugo Zuccarelli's demos does have good height cues for me, and many others, but that is the only one in hundreds of binaural recordings that I have heard.

On 22/11/2014 02:30, Stefan Schreiber wrote: dw wrote: The state of the art finds it very difficult to render sounds below the listener. To do it with a 'flat' frequency response, and referenced to ground/gravity, i.e. unaffected by normal, small head movements, is a bonus. It is just a pity it might take a while to get used to.. I can't tell, after being spoilt by hundreds of hours of listening to various binaural recordings, and not hearing above 12 kHz. http://www.freesound.org/people/dwareing/sounds/255159/

This might be an observation worth some serious discussion. Elevation cues depend a lot on pinna shapes, and are related (mostly?) to higher frequencies. The HRTF set you are using might just not do it for you? Would you notice any change if you tried to find an HRTF set which actually fits you? (Provided that you are probably not able to measure your personal HRTF data...) Best, Stefan

___ Sursound mailing list Sursound@music.vt.edu https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit account or options, view archives and so on.
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
It claimed to use state-of-the-art applications of binaural rendering. That I cited one white paper does not mean I have only read one in my life, or that I base my opinions on those of the BBC. I did not say you '_represent_ binaural science'. I do not pretend to understand it. Others do. I think they are wrong, due to the lack of observational support for the implied predictions of said theories.

On 22/11/2014 02:34, Stefan Schreiber wrote: dw wrote: On 19/11/2014 22:12, Stefan Schreiber wrote: Your posting seems to be meaningless if not arrogant, BTW. Let me put it in a more positive way then.. Your thinking is representative of the state of the art in binaural science. :-)

Previous work http://www.bbc.co.uk/rd/publications/whitepaper250 has shown that even with the state-of-the-art virtual surround systems we don't currently get a big improvement in quality over a conventional stereo down-mix. The perceived quality was found to vary significantly according to the source material used. http://www.bbc.co.uk/rd/blog/2014/10/tommies-in-3d

I actually have discussed this study with some people (for example, Günther Theile). Experience with the Realiser A8 from Smyth Research and IRT's BRS system seems to indicate that binaural systems with head-tracking and personalized HRTF filters can't be distinguished from real (5.1) speakers. (They used BRIRs of the listening room and the reference 5.1 speaker system, of course.) I don't believe that the BBC study is really flawless, BTW. (Günther Theile thought the same.) (I am too lazy to discuss this now, have some other stuff to do.)

Your thinking is representative of the state of the art in binaural science. :-) Maybe you should read more than (just) one paper before claiming that nobody besides you has a clue? :-D I also didn't claim to represent binaural science, if I remember correctly. Best, Stefan
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
On 22/11/2014 02:43, Stefan Schreiber wrote: I don't believe that the BBC study is really flawless, BTW. (Günther Theile thought the same.) Günther Theile is not one of my drinking buddies; I wish he were. BTW the Stax demo is not that great..
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
Stefan, you have not actually downloaded it; nobody has! Unless you are making the assumption that very low bitrate MP3 is the same as 24-bit FLAC, there is nothing to discuss. On 22/11/2014 02:30, Stefan Schreiber wrote: http://www.freesound.org/people/dwareing/sounds/255159/ This might be an observation worth some serious discussion.
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
dw wrote: It claimed to use state-of-the-art applications of binaural rendering. That I cited one white paper does not mean I have only read one in my life, or that I base my opinions on those of the BBC. and ... Previous work http://www.bbc.co.uk/rd/publications/whitepaper250 has shown...

Ok, I allow myself to cite two short passages from Mr. Theile's mail:

The test method applied in the BBC study was based on “not known target quality” (Chapter 5.1). However, the target is known: it should be the real room which was measured and intended as the original surround listening experience. Assessing a surround room synthesis technology in comparison to the ITU stereo down-mix is synonymous with assessing real surround monitoring in comparison to ITU 2-channel down-mix headphone monitoring.

- The BBC comparison test does not clarify basic preferences of the subjects regarding “5-ch surround vs. 2-ch stereo” and “2-ch loudspeaker vs. 2-ch headphone listening”. Results plotted in Fig. 6 indicate that listening group 2 prefers 2-ch headphone listening, and one cannot exclude that this would be found basically in comparison to loudspeaker listening. In contrast, listening group 1 should basically prefer surround sound and out-of-head localization. ... It seems clear to me that further causes have affected this result. I suggest that the “individualised head-tracked BRIR system” was not calibrated correctly or ...

I wrote to several people before receiving this feedback: http://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP250.pdf Experimental evidence has suggested that for plausible virtual sound sources located outside of the head with good directional accuracy, HRIR measurements specific to the individual are required [4, 5].
However, when impulse responses containing the room response, known as binaural room impulse responses (BRIRs), are used in combination with head-tracking to compensate for head motion, plausible synthesis of virtual sound sources can be achieved with high localisation accuracy [6, 7]. The relative importance of these system components has been addressed in terms of localisation [8]; however, the effects on overall sound quality are not clear. (p. 5)

and then: In the context of broadcast distribution, the state-of-the-art virtual surround systems were shown not to give a great improvement over an ITU down-mix for playback of 5.1 audio over headphones, when used in a black box approach. Many systems performed significantly worse than the down-mix, including the dynamic individualised BRIR system. The best performing systems were graded similarly to the down-mix. (in 7, Conclusions and Discussion. If so, we should pack our luggage...)

Well, and this is why I don't trust every study. You could conclude a) that binaural representation of surround sound is just not worth studying, OR b) that they didn't do things in the right way. Actually I believe the conclusion is more like b), because my reference system http://smyth-research.com/technology.html just happens to reproduce 5.1/7.1 via headphones very well, according to everybody who has listened to it. It is a fact that the Smyth Research system has been rated highly by every person I know who has listened to it, and there are plenty of listening reviews available on the Internet. (HiFi reviews etc. ...) It is also obvious that this system can reproduce sound from the front, as it is used for 5.1/7.1/2.0 studio mixing and as a HiFi system.
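For readers unfamiliar with how head-tracked BRIR rendering of the kind described above works in principle, here is a minimal sketch. The synthetic, randomly generated BRIRs stand in for real room measurements, and the nearest-neighbour switching is an illustrative assumption, not the Smyth Research algorithm (real systems crossfade or interpolate between measured responses):

```python
import numpy as np

# Hypothetical BRIR set: one stereo impulse-response pair per measured
# head yaw, in degrees. A real system measures these per ear and per
# loudspeaker in the actual listening room; here they are synthetic.
yaws = np.array([-30.0, -15.0, 0.0, 15.0, 30.0])
rng = np.random.default_rng(0)
decay = np.exp(-np.arange(256) / 64.0)
brirs = {y: rng.standard_normal((2, 256)) * decay for y in yaws}

def render_block(mono, head_yaw):
    """Convolve a block of source audio with the BRIR pair measured
    nearest to the tracked head orientation. Nearest-neighbour
    switching is the simplest scheme; real systems crossfade or
    interpolate between BRIRs to avoid audible clicks."""
    nearest = yaws[np.argmin(np.abs(yaws - head_yaw))]
    left, right = brirs[nearest]
    return np.stack([np.convolve(mono, left), np.convolve(mono, right)])

# One block of source audio, with the head turned 12 degrees:
# the 15-degree BRIR pair is selected.
out = render_block(rng.standard_normal(1024), head_yaw=12.0)
```

The point of the head tracking is that the virtual loudspeakers stay fixed in the room as the head turns, which is exactly the dynamic cue the BBC's static systems lacked.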
Some reviews: http://www.soundonsound.com/sos/jul13/articles/smyth-realiser-a8.htm A big concern many people have about headphone monitoring is the lack of physical bass sensation, but I was genuinely surprised at how little difference the use of the Tactile output actually made to my mixing decisions in the long term. Just hearing the low-frequency mix components within such a believably speaker-like context seems to clarify most low-frequency level and quality questions on its own somehow, and in no less reliable a manner than 95 percent of nearfield monitoring systems I've heard, given the strong influence of room resonances on real-world bass reproduction. Furthermore, the Realiser's nifty Direct Bass feature (see the 'Better Than The Real Thing?' box) can remove the effects of LF room modes from its emulation entirely, delivering low-end fidelity that's well beyond the capabilities of the speaker system you originally sampled! (LF properties of headphones...) My biggest gripe about the Realiser has nothing to do with the sound, though: it's the clunkiness of the user interface. I also missed the psychological 'averaging' effect that you get with real speakers when you stroll round your room. Although you can move around quite a bit without losing the head-tracking, your virtual monitoring position remains riveted in the stereo sweet spot, which might not be the position that provides the most
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
On 2014-11-17, Pulkki Ville wrote: Sampo mentioned that he heard our demo at Aalto. Here is the title and the abstract of the demo, which we first showed at the AES 55th conference on spatial audio.

I heard two separate demonstrations, actually. Ville showed me a head-tracking binaural recording synched to 360 degree stereo video over the Oculus. I actually got the beginnings of vertigo, looking down; it tracked that well. I believe that demo came from plain old first order DirAC on 32 virtual speakers folded down into head-tracked binaural. The clip with a choir in the front and full 360 tracking capability on the visuals and the sound at the same time was particularly effective: actually being able to look dead back and both see and hear the audience blew me back a bit.

However, even better stuff was to follow. Now, DirAC adapted to fourth order ambisonics, and eventually even the Eigenmike. Archontis Politis ( https://people.aalto.fi/index.html?profilepage=isfor#!archontis_politis ) put up a whole listening test/panel for me, in one of Aalto's anechoic caverns. It was as impressive as ever: even at fourth order, I could clearly discern how the directional averaging of ambisonics leads to too much correlation. At first order the effect was downright stifling. And then after DirAC processing, the first order track still sounded odd, but the fourth order one actually -- finally -- approached the reference.

Now, I've been a critic of DirAC and especially SIRR in the past, not to mention VBAP. I still sort of think they ought to be thrown out the window within the DirAC framework, because they aren't principled and ambisonic-minded enough. But... Right now they *really* seem to constitute the closest thing I at least have heard to an ideal infinite order decoder.
In particular, the DirAC machinery not only decorrelates the sound indirectly by infinite order decoding, it also separately, spatially whitens the signal set... and it makes a *real* difference in realism and transparency. Seriously, you have to hear it to believe it.

Demo 3: Head-mounted head-tracked audiovisual reproduction. Olli Santala, Mikko-Ville Laitinen, Ville Pulkki and Olli Rummukainen. Aalto University, Department of Signal Processing and Acoustics. Audiovisual scenes are reproduced with headphones and a head-mounted display in this demonstration. The sound has been recorded with a real A-format microphone, and it is reproduced using binaural DirAC, which utilizes DirAC processing, virtual loudspeakers and head tracking.

Interestingly enough, one of the demos was captured near my ex's abode, so that I knew the auditory environment from before. It was spot on, if only a bit pushed away and lacking in 'pop'. Maybe dynamics and the high end, or something. The head tracking was immaculate right till I made my signature move: the head tilt. There the lag was easy to see. But then, it came from the Rift's API, not Ville's code.

See description of the audio rendering technique here: Laitinen, M-V., and Ville Pulkki. Binaural reproduction for directional audio coding. Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA'09. IEEE Workshop on. IEEE, 2009.

Ville, Archontis, I hope in time you link to the paper involving the fourth order DirAC demo I just heard, because that shit is *seriously* impressive. Close to spatially transparent even to my rather discerning ear.

To me this demo is really cool since the auditory objects are nicely externalized, even in the field of vision. And I can vouch for that now, too. That also probably has to do with DirAC's decorrelation processing, even at first order. The trick could be that when the subject perceives the space visually, he adapts to the HRTFs used in the system fast. Correct as well.
Though I adapt pretty slowly myself, thanks to my (genetic?) hearing deficit. And yet it still worked pretty well. As such I'd rather blame the decorrelation processing, not so much adaptation.

We also update the head position at a rate of about 100 Hz, and then correspondingly update the video and audio. This prevents nausea, and also helps in externalization of headphone audio.

The extreme externalisation I'd again blame on the decorrelative processing. I've heard a number of binaural technologies before, and even before I took my dynamical cues/rolled my head around, the demo just worked. -- Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
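As background for the DirAC discussion in this thread: published descriptions of DirAC (e.g. the Laitinen & Pulkki WASPAA'09 paper cited above) estimate a direction and a diffuseness per time-frequency bin from the B-format intensity vector. A simplified, broadband sketch of that energetic analysis follows; it is my own paraphrase, assuming a pressure-normalized W (no FuMa -3 dB factor), whereas the real algorithm works per bin in a filter bank:

```python
import numpy as np

def dirac_analysis(W, X, Y, Z):
    """Broadband DirAC-style energetic analysis of one B-format block:
    the time-averaged intensity vector gives a direction estimate, and
    the intensity-to-energy ratio gives a diffuseness estimate
    (0 = single plane wave, 1 = fully diffuse field)."""
    I = np.array([np.mean(W * X), np.mean(W * Y), np.mean(W * Z)])
    E = 0.5 * np.mean(W**2 + X**2 + Y**2 + Z**2)
    azimuth = np.degrees(np.arctan2(I[1], I[0]))
    elevation = np.degrees(np.arctan2(I[2], np.hypot(I[0], I[1])))
    diffuseness = 1.0 - np.linalg.norm(I) / (E + 1e-12)
    return azimuth, elevation, diffuseness

# A single plane wave arriving from azimuth 90 degrees (hard left):
s = np.sin(2 * np.pi * 440 * np.arange(4800) / 48000.0)
az, el, psi = dirac_analysis(W=s, X=0 * s, Y=s, Z=0 * s)
# az comes out at 90, el at 0, psi near 0: a sharp, non-diffuse direction.
```

The non-diffuse part is then panned sharply (via VBAP onto virtual loudspeakers in the binaural version), and the diffuse part is decorrelated, which is presumably the processing Sampo credits for the externalization.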
[Sursound] infinite order decoding proper, and redux
On 2014-11-19, dw wrote: Still, I've yet to find a solution for b-to-binaural which is as convincing as some of the BRIR-based object-sound spatialization packages (e.g. DTS HeadphoneX and Visisonics Realspace). I think what's primarily lacking is externalization, which perhaps can be 'faked' with BRIRs.

I've been thinking about this one for *ages*, because I've never had the wherewithal to buy more than two speakers, nor especially the kind of acoustic space/room/home where even POA would play out well. Binaural rendering is then the only proper alternative left beyond that, even as the main allure of ambisonics is extended, areal playback. So, how to do even that right? Well...

The way I see it, we don't have a proper theory for that even now. But we do have lots of data, starting with the KEMAR set of HRTF responses. So we have to make do with that, and interpolation. Linear interpolation is often used, but I feel it's wrong. Instead I feel we have to go into the frequency domain, do cosine-law interpolation in amplitude, and linear interpolation in phase. And since all of the extant HRTF sets -- KEMAR included -- are rather nonuniform, we have to weight the eventual solution more or less by the area over the sphere surface each of the impulse responses can be taken to be representative of.

That's certainly not too principled. Not a whole, covering theory. But it's good enough for starters. And then, it only covers how to do the rendering; it doesn't tell you how to go from A-format to B, or especially how to enhance low order B into infinite order. In the latter sport, I'm reasonably certain that Ville's DirAC is the way to go... but it ought to be reformulated on the way. I mean, it's an engineer's version, not Gerzon's, as a mathematician. The way I see it done is to go back to the early actively decoded Matrix H work, and continue from there. That calls for approximating the sonic power coming from various directions.
So let's do that directly: given a signal over a sphere expressed as its spherical surface harmonic decomposition, let's just find a way to square and integrate that, within the decomposition. That will double both the angular and the temporal bandwidth, but so be it. Doing DirAC-style decorrelative processing might raise the channel count by two, for the imaginary channel. Doing it really well might call for channel intercorrelations, and so raise the complexity to the second power. But even then, even with fourth order signals, it'd be eminently doable. So let's do that as well.

Then, add the true infinite order decoding. Detection of signal sets which are clearly under-rank. Those can be decoded into clean sources, and repanned using extremely high order ambisonics, theoretically exceeding even the VBAP used by DirAC. All the while applying DirAC's reimagined decorrelation machinery for fun and benefit.

I'm reasonably sure a framework like this would exceed any and all extant ones in reproductive accuracy and mathematical elegance. Plus it would be fully generalizable over the whole ambisonic hierarchy, while sounding better than the current incarnations of even DirAC.

It worked better for natural B-format recordings than anechoic or rendered sources. Seriously, first order B-format (POA) sounds rather shitty once you hear a comparison signal. It's excellent considering what it has to deal with, but absent that, it's just plain shitty. Dull. -- Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
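The frequency-domain interpolation scheme proposed above -- cosine-law in amplitude, linear in phase -- can be sketched as follows. This is one reading of the proposal, not an established library routine; the constant-power cosine weights and the phase unwrapping are my assumptions:

```python
import numpy as np

def interp_hrir(h_a, h_b, frac):
    """Interpolate between two HRIRs measured at neighbouring directions.
    frac in [0, 1]: 0 returns h_a, 1 returns h_b. Magnitudes are
    crossfaded with constant-power cosine-law weights; phases are
    unwrapped and interpolated linearly, which approximates a smoothly
    moving interaural delay."""
    Ha, Hb = np.fft.rfft(h_a), np.fft.rfft(h_b)
    # Cosine-law (constant-power) crossfade of the magnitude spectra.
    wa, wb = np.cos(frac * np.pi / 2), np.sin(frac * np.pi / 2)
    mag = wa * np.abs(Ha) + wb * np.abs(Hb)
    # Linear interpolation of the unwrapped phase spectra.
    ph = (1 - frac) * np.unwrap(np.angle(Ha)) + frac * np.unwrap(np.angle(Hb))
    return np.fft.irfft(mag * np.exp(1j * ph), n=len(h_a))

# Two toy impulse responses standing in for neighbouring HRIRs:
h1 = np.sin(np.arange(64) / 3.0) * np.exp(-np.arange(64) / 16.0)
h2 = np.cos(np.arange(64) / 5.0) * np.exp(-np.arange(64) / 16.0)
h_mid = interp_hrir(h1, h2, 0.5)
```

A full renderer would still need the area weighting Sampo mentions for nonuniform measurement grids, e.g. Voronoi cell areas on the sphere, which this sketch omits.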
Re: [Sursound] Oculus Rift Visual Demo + Ambisonic Audio Available?
Sampo Syreeni wrote: ... Ville, Archontis, I hope in time you link to the paper involving the fourth order DirAC demo I just heard, because that shit is *seriously* impressive. Close to spatially transparent even to my rather discerning ear.

Using 4th order via (head-tracked) headphones alone should be pretty impressive, as you are always in the sweet spot.

- Frequencies up to about 2,500 Hz can be represented in a complete way, at least theoretically. (The sweet spot which matters is head-sized.)
- Frequencies above that are technically out of the sweet spot, but the Ambisonics blurring effect will be much less pronounced than at 1st order. (Up to 18°, if I remember correctly; compared to up to 45° for FOA...)
- To evaluate DirAC at 4th order, you would have to compare a (reasonable/good) conventional binaural 4th order decoder to the DirAC 4th order decoder... Otherwise, the seriously impressive shit might be seriously impressive with any 4th order decoder. (4th order Ambisonics resolution is much higher than 1st order, FAPP.)

You have to compare a DirAC-HOA binaural decoder against a conventional HOA binaural decoder, because I am actually not so sure that DirAC (in spite of giving sharp directions even at 1st order) does not also introduce some (audible) artefacts. (Hard to judge this for me/everybody, as no form of DirAC is available to the public, and not even to other researchers.)

To me this demo is really cool since the auditory objects are nicely externalized, even in the field of vision. And I can vouch for that now, too. That also probably has to do with DirAC's decorrelation processing, even at first order.

You mean VBAP decorrelates the channels, and this is principally better for sound field reproduction than the normal way? Doubts... The merit of DirAC is that you will receive sharpened directions, which will be reproduced via VBAP. It is not the decorrelation of VBAP which seems to matter.
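Stefan's "about 2,500 Hz" figure follows from the usual rule of thumb that an order-N Ambisonic field reconstructs accurately out to kr ≈ N, i.e. up to f = N·c/(2πr) for a listening region of radius r. A quick check (the 9 cm head radius is my assumption):

```python
import math

def max_accurate_freq_hz(order, radius_m, c=343.0):
    """Rule-of-thumb upper frequency for accurate reconstruction of an
    order-N Ambisonic field over a sphere of radius r: from kr <= N,
    f = N * c / (2 * pi * r)."""
    return order * c / (2 * math.pi * radius_m)

f_foa = max_accurate_freq_hz(1, 0.09)  # 1st order, head-sized region: ~600 Hz
f_hoa = max_accurate_freq_hz(4, 0.09)  # 4th order: ~2400 Hz, roughly Stefan's 2,500 Hz
```

So for headphone listening, where the head is always centred in the sweet spot, 4th order covers the frequency range where interaural phase cues dominate, which is consistent with Stefan's point.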
The algorithm behind DirAC doesn't seem to know about early reflections, as you receive just one direction per time/frequency bin, plus diffuse elements. Provided that your hearing will normally integrate direct sound and first reflections into perceived directions, the absence of reflections in any frequency bin after DirAC processing means that you will have changed the acoustical properties of the ambient part. This seems to be a fundamental problem. Not being able to test whether this will translate into audible artefacts, I am pretty sure that there is at least some problem here. (Any 3D audio game engine I know models reflections. A separation into direct and diffuse elements - which is the model used in DirAC - is too simplistic, and definitely not based on perceptual science.)

Best regards, Stefan

P.S.: And hopefully DirAC - which seems to have existed for FOA at least since 2006 if not earlier - will become available in some form usable by all the people who didn't make it to Helsinki. (But we can walk there if we are not willing to pay for the plane tickets... :-D ) Just another reflection...

The trick could be that when the subject perceives the space visually, he adapts to the HRTFs used in the system fast. Correct as well. Though I adapt pretty slowly myself, thanks to my (genetic?) hearing deficit. And yet it still worked pretty well. As such I'd rather blame the decorrelation processing, not so much adaptation. We also update the head position at a rate of about 100 Hz, and then correspondingly update the video and audio. This prevents nausea, and also helps in externalization of headphone audio. The extreme externalisation I'd again blame on the decorrelative processing. I've heard a number of binaural technologies before, and even before I took my dynamical cues/rolled my head around, the demo just worked.
Re: [Sursound] infinite order decoding proper, and redux
Sampo Syreeni wrote: On 2014-11-19, dw wrote: Still, I've yet to find a solution for b-to-binaural which is as convincing as some of the BRIR-based object-sound spatialization packages (e.g. DTS HeadphoneX and Visisonics Realspace). I think what's primarily lacking is externalization, which perhaps can be 'faked' with BRIRs.

I've been thinking about this one for *ages*, because I've never had the wherewithal to buy more than two speakers, nor especially the kind of acoustic space/room/home where even POA would play out well. Binaural rendering is then the only proper alternative left beyond that, even as the main allure of ambisonics is extended, areal playback. So, how to do even that right? Well...

The way I see it, we don't have a proper theory for that even now. But we do have lots of data, starting with the KEMAR set of HRTF responses. So we have to make do with that, and interpolation. Linear interpolation is often used, but I feel it's wrong. Instead I feel we have to go into the frequency domain, do cosine-law interpolation in amplitude, and linear interpolation in phase. And since all of the extant HRTF sets -- KEMAR included -- are rather nonuniform, we have to weight the eventual solution more or less by the area over the sphere surface each of the impulse responses can be taken to be representative of.

That's certainly not too principled. Not a whole, covering theory. But it's good enough for starters. And then, it only covers how to do the rendering; it doesn't tell you how to go from A-format to B, or especially how to enhance low order B into infinite order. In the latter sport, I'm reasonably certain that Ville's DirAC is the way to go... but it ought to be reformulated on the way. I mean, it's an engineer's version, not Gerzon's, as a mathematician. The way I see it done is to go back to the early actively decoded Matrix H work, and continue from there. That calls for approximating the sonic power coming from various directions.
So let's do that directly: given a signal over a sphere expressed as its spherical surface harmonic decomposition, let's just find a way to square and integrate that, within the decomposition. That will double both the angular and the temporal bandwidth, but so be it. Doing DirAC-style decorrelative processing might raise the channel count by two, for the imaginary channel. Doing it really well might call for channel intercorrelations, and so raise the complexity to the second power. But even then, even with fourth order signals, it'd be eminently doable. So let's do that as well.

Then, add the true infinite order decoding. Detection of signal sets which are clearly under-rank. Those can be decoded into clean sources, and repanned using extremely high order ambisonics, theoretically exceeding even the VBAP used by DirAC. All the while applying DirAC's reimagined decorrelation machinery for fun and benefit. I'm reasonably sure a framework like this would exceed any and all extant ones in reproductive accuracy and mathematical elegance. Plus it would be fully generalizable over the whole ambisonic hierarchy, while sounding better than the current incarnations of even DirAC.

Your sketched algorithm is definitely not related to DirAC. (It is another, and actually more sophisticated, parametric decoder, to apply the correct terminology.) This is a very smart algorithm and might actually work, at first sight. (To work this out will be far from easy, to say the least.) Thanks for sharing this! Stefan

It worked better for natural B-format recordings than anechoic or rendered sources. Seriously, first order B-format (POA) sounds rather shitty once you hear a comparison signal. It's excellent considering what it has to deal with, but absent that, it's just plain shitty. Dull.
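The "square and integrate over directions" idea discussed in this exchange can be illustrated in the horizontal plane at first order: steer a beam at each candidate direction, square, and time-average. This is a toy sketch under my own conventions (SN3D-style B-format without the FuMa W gain, 72 test directions), not the proposed algorithm itself, which would perform the squaring inside the spherical harmonic decomposition rather than on a direction grid:

```python
import numpy as np

def directional_power(W, X, Y, azimuths):
    """Time-averaged power of first-order beams steered at each azimuth.
    Squaring the order-1 field doubles the angular bandwidth to order 2,
    as noted in the post, so the map is still blurry -- but it peaks at
    the source direction."""
    dx, dy = np.cos(azimuths), np.sin(azimuths)
    # Beam toward each direction: W + dot(direction, [X, Y]).
    beams = W[None, :] + dx[:, None] * X + dy[:, None] * Y
    return np.mean(beams**2, axis=1)

azimuths = np.linspace(0.0, 2 * np.pi, 72, endpoint=False)
# A single source hard left (azimuth 90 degrees): X = 0, Y = W.
s = np.sin(2 * np.pi * np.arange(4800) / 48.0)
power = directional_power(W=s, X=0 * s, Y=s, azimuths=azimuths)
```

A parametric decoder along the lines Sampo sketches would then treat sharp peaks in such a map as under-rank source sets to be re-panned at high order, and the residual as diffuse energy to be decorrelated.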