As a follow-up, I just stumbled upon this article:
https://towardsdatascience.com/voice-translation-and-audio-style-transfer-with-gans-b63d58f61854

On 9/6/19 4:52 PM, Patric Schmitz wrote:
On 9/6/19 4:29 PM, Bruno Afonso wrote:
I'd love to hear if others have been using DNNs for audio. I am more interested in DNNs that process audio (i.e., output processed audio) than in the classic classification approaches, where people mostly borrow ideas from computer vision and classify based on spectrogram representations (think STFT).
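For context, the classification pipelines you mention typically start from a log-magnitude STFT front-end, something like this (a minimal sketch using scipy; the parameter values are just illustrative, not from any particular paper):

import numpy as np
from scipy.signal import stft

fs = 16000                                  # sample rate in Hz
x = np.random.randn(fs)                     # 1 s of placeholder audio
f, t, Z = stft(x, fs=fs, nperseg=512, noverlap=384)
S = 20 * np.log10(np.abs(Z) + 1e-10)        # log-magnitude spectrogram
# S (freq bins x frames) is what typically gets fed to a CNN classifier
print(S.shape)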

A former colleague of mine does research in this area, particularly on transforming the emotion of the singing voice. Have a look at this recent paper:
https://ieeexplore.ieee.org/abstract/document/8683865

They use a multi-layer LSTM network in what they call a sequence-to-sequence architecture, which learns a latent-space representation of f0 contours conditioned on different emotions (anger, fear, sadness, ...).
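Very roughly, the idea looks something like the PyTorch sketch below. This is my own sketch of a conditioned sequence-to-sequence model, not their exact architecture; all layer sizes and the number of emotion classes are made up:

import torch
import torch.nn as nn

class F0Seq2Seq(nn.Module):
    def __init__(self, hidden=128, latent=64, n_emotions=4, emo_dim=16):
        super().__init__()
        self.encoder = nn.LSTM(1, hidden, num_layers=2, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)
        self.emotion_emb = nn.Embedding(n_emotions, emo_dim)
        self.decoder = nn.LSTM(latent + emo_dim, hidden,
                               num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, f0, emotion):
        # f0: (batch, time, 1); emotion: (batch,) integer labels
        _, (h, _) = self.encoder(f0)
        z = self.to_latent(h[-1])             # latent code of the contour
        e = self.emotion_emb(emotion)         # target-emotion embedding
        cond = torch.cat([z, e], dim=-1)
        # feed the conditioned latent to the decoder at every time step
        cond_seq = cond.unsqueeze(1).expand(-1, f0.size(1), -1)
        y, _ = self.decoder(cond_seq)
        return self.out(y)                    # transformed f0 contour

model = F0Seq2Seq()
f0 = torch.randn(8, 200, 1)            # batch of 200-frame f0 contours
emo = torch.randint(0, 4, (8,))        # target emotion labels
print(model(f0, emo).shape)            # torch.Size([8, 200, 1])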

Then there is WaveNet, along with many recent applications of it and extensions to specific problem settings:
https://arxiv.org/abs/1609.03499
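Its core building block is a stack of causal, dilated 1-D convolutions with gated activations and residual connections. A toy sketch of just that idea (hyperparameters are illustrative; see the paper for the real setup):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.dilation = dilation
        self.filt = nn.Conv1d(channels, channels, 2, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, 2, dilation=dilation)
        self.res = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        # left-pad so the convolution is causal (sees no future samples)
        xp = F.pad(x, (self.dilation, 0))
        h = torch.tanh(self.filt(xp)) * torch.sigmoid(self.gate(xp))
        return x + self.res(h)               # residual connection

channels = 32
net = nn.Sequential(*[CausalDilatedBlock(channels, 2 ** i) for i in range(8)])
x = torch.randn(1, channels, 1000)      # (batch, channels, time)
print(net(x).shape)                     # same length; receptive field ~256

Doubling the dilation at each layer is what gives WaveNet its exponentially growing receptive field at linear cost in depth.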

Best,
Patric
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
