On Sunday, 29 November 2015 at 16:15:32 UTC, Guillaume Piolat wrote:
There is also a sample-wise FFT I've come across, which is expensive but avoids chunking.

Hm, I don't know what that is :).

Looking for similar grains is the idea behind the popular autocorrelation pitch detection methods. They require at least two periods, though, else there is no autocorrelation peak. Rumor has it that the non-realtime Autotune works that way, along with many modern pitch detection methods.
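Something like this minimal sketch, in numpy for illustration (the frame length, frequency range, and everything else are arbitrary choices of mine):

import numpy as np

def detect_pitch(frame, fs, fmin=60.0, fmax=500.0):
    # The frame must span at least two periods of the lowest frequency
    # of interest, otherwise no autocorrelation peak can appear.
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)            # normalize so lag 0 == 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag, ac[lag]             # pitch in Hz, peak height as confidence

fs = 44100
t = np.arange(int(0.04 * fs)) / fs       # 40 ms frame: fits two periods of 60 Hz
print(detect_pitch(np.sin(2 * np.pi * 220.0 * t), fs))   # ~220 Hz, high confidence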

I thought they used Laroche and Dolson's FFT-based one combined with a peak detector, but maybe that was the real-time version.

There are other full spectral resynthesis methods that throw away phase information and represent each spectral component as bandpass-filtered noise. That is rather expressive, since you can do morphing with it (like you can with images). But since you throw away phase information, I guess some attacks suffer, so you have to special-case the attacks as "residue" samples that are left in the time domain (the difference between the original signal and what the spectral components can represent).
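To make the phase-discarding part concrete, a toy version of the idea as I imagine it (frame size and overlap are my guesses, and a real system would model the residue separately): keep only the per-frame magnitudes and resynthesize with random phases, so each bin acts like a band of filtered noise.

import numpy as np

def noise_resynth(x, n=1024, hop=256, seed=0):
    # Magnitude-only resynthesis: original phases are discarded and replaced
    # with random ones, so every bin behaves like narrowband filtered noise.
    # Transients smear badly, which is why attacks need the "residue" path.
    rng = np.random.default_rng(seed)
    win = np.hanning(n)
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - n, hop):
        mag = np.abs(np.fft.rfft(win * x[start:start + n]))
        phase = rng.uniform(0.0, 2.0 * np.pi, mag.shape)
        frame = np.fft.irfft(mag * np.exp(1j * phase), n)
        y[start:start + n] += win * frame
        norm[start:start + n] += win ** 2
    return y / np.maximum(norm, 1e-9)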

I don't know what "voicedness" is. Do you mean things like vibrato?

Vibrato is the pitch variation that occurs when the larynx is well relaxed.

Yes, so that will generate sidebands in the frequency spectrum, like FM synthesis, right? So in order to pick up fast vibrato I assume you would also need to analyse the spectrum?
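To check my own intuition here, a throwaway test (all the rates and depths are arbitrary):

import numpy as np

fs = 44100
t = np.arange(fs) / fs                       # one second, so bins are 1 Hz apart
f0, fv, depth = 440.0, 6.0, 10.0             # carrier, vibrato rate, vibrato depth (Hz)
# Phase modulation equivalent to a +/-10 Hz frequency wobble at 6 Hz.
x = np.sin(2 * np.pi * f0 * t + (depth / fv) * np.sin(2 * np.pi * fv * t))
spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
# Energy shows up at 440, 440 +/- 6, 440 +/- 12, ... Hz -- FM-style sidebands.
for f in (f0 - 2 * fv, f0 - fv, f0, f0 + fv, f0 + 2 * fv):
    print(f, spec[int(round(f))])

So a fast, deep vibrato spreads the line into a cluster of sidebands, and you would need spectral analysis (or fast enough pitch tracking) to resolve it.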

Voicedness is the difference between sssssss (unvoiced) and zzzzzz (voiced). A phoneme is voiced when there are periodic glottal closures and openings.

Ah! In the 90s I read a paper in Computer Music Journal where they did singing synthesis by emulating the vocal tract as a "physical" filter model. I'm not sure if they used FoF for generating the sound. I think there was a vinyl flexi disc with it too. :-) I have it somewhere...

You might find it interesting.

When the sound isn't voiced, there is no period. There isn't a "pitch" there. So pitch detection tends to come with a confidence measure.
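In the autocorrelation sketch above, that confidence is simply the normalized peak height: close to 1 for a clean voiced frame, close to 0 for sssss-like noise. For example:

pitch, conf = detect_pitch(frame, fs)   # conf is the normalized autocorrelation peak
voiced = conf > 0.5                     # threshold is an arbitrary choice of mine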

So it is a problem in real time, but in non-real time you can work your way backwards and fill in the missing parts before doing resynthesis, I guess?

The devil in that is that voicedness itself is half a lie, or let's say a leaky abstraction: it breaks down for distorted vocals.

Right. You have a lot of these problems in sound analysis, like sound separation. The brain is so impressive. I still have trouble understanding how we can hear in 3D with two ears, like distinguishing above from below. I understand the basics of it, but it is still impressive when you try to figure out _how_.

I guess that's why IRCAM can sell licenses to superVP. :)

Their papers on that topic are interesting; they group spectral peaks by formants and move them together.
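I don't know the exact superVP recipe, but a common building block behind that kind of formant handling is a spectral-envelope estimate, for instance by cepstral smoothing; a shifter can then divide the envelope out, move the peaks, and multiply it back in. A minimal sketch (the lifter length is an arbitrary choice of mine):

import numpy as np

def spectral_envelope(frame, n_lifter=40):
    # Log-magnitude spectrum of one windowed frame.
    logmag = np.log(np.abs(np.fft.rfft(frame)) + 1e-9)
    # Real cepstrum; keep only the slowly varying (low-quefrency) part,
    # which captures the formant structure rather than the harmonics.
    ceps = np.fft.irfft(logmag)
    ceps[n_lifter:-n_lifter] = 0.0
    return np.exp(np.fft.rfft(ceps).real)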

I've read the Laroche and Dolson paper in detail, and more or less know it by heart now, but maybe you are thinking of some other paper? Their paper was good on the science part, but they leave the artistic engineering part open to the reader... ;-) More insight on the artistic engineering part is most welcome!!


