As a follow-up, I just stumbled upon this article:
https://towardsdatascience.com/voice-translation-and-audio-style-transfer-with-gans-b63d58f61854
On 9/6/19 4:52 PM, Patric Schmitz wrote:
On 9/6/19 4:29 PM, Bruno Afonso wrote:
I'd love to hear if others have been using DNNs for audio. I'm more
interested in DNNs that process audio (i.e., output processed audio)
than in classic classification approaches, where people mostly borrow
ideas from computer vision and classify based on spectrogram
representations (think STFT).
A former colleague is doing research in this area, particularly on the
transformation of emotion in the singing voice. Have a look at this recent paper:
https://ieeexplore.ieee.org/abstract/document/8683865
They use a multi-layer LSTM network in what they call a
sequence-to-sequence architecture, which learns a latent-space
representation of f0 contours conditioned on different emotions (anger,
fear, sadness, ...).
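To make the idea concrete, here is a toy sketch of that kind of setup: an
LSTM encoder compresses an f0 contour into a latent vector, and an LSTM
decoder regenerates a contour while being conditioned on a target-emotion
one-hot vector. This is not the paper's implementation; the layer sizes,
the readout, and the conditioning-by-concatenation scheme are all
illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A single LSTM cell with randomly initialized weights (untrained)."""
    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size
        # One stacked weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g          # new cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c

def encode(cell, contour):
    """Run the encoder over an f0 contour; final hidden state = latent code."""
    h = np.zeros(cell.hidden_size)
    c = np.zeros(cell.hidden_size)
    for f0 in contour:
        h, c = cell.step(np.array([f0]), h, c)
    return h

def decode(cell, latent, emotion_onehot, length):
    """Autoregressively decode a contour, conditioning each step on the
    previous f0 value plus the target-emotion one-hot vector."""
    h, c = latent.copy(), np.zeros(cell.hidden_size)
    f0, out = 0.0, []
    for _ in range(length):
        x = np.concatenate([[f0], emotion_onehot])
        h, c = cell.step(x, h, c)
        f0 = float(h[0])           # toy readout: first hidden unit as next f0
        out.append(f0)
    return np.array(out)

hidden = 16
emotions = ["anger", "fear", "sadness"]
enc = LSTMCell(input_size=1, hidden_size=hidden)
dec = LSTMCell(input_size=1 + len(emotions), hidden_size=hidden)

contour = np.sin(np.linspace(0, 3, 50))  # toy normalized f0 contour
latent = encode(enc, contour)
sad = np.eye(len(emotions))[2]           # one-hot for "sadness"
transformed = decode(dec, latent, sad, length=len(contour))
print(transformed.shape)                 # (50,)
```

With training (which the sketch omits entirely), the encoder/decoder pair
would be fit on paired or emotion-labeled contours so that the same latent
code decoded under different emotion vectors yields differently shaped f0
trajectories.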
Then there is WaveNet, along with many recent applications of it and
extensions to specific problem settings:
https://arxiv.org/abs/1609.03499
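For anyone who hasn't read the paper: the core building block of WaveNet
is the stacked dilated causal convolution, where the dilation doubles at
each layer so the receptive field grows exponentially with depth. A
minimal sketch of just that mechanic (random untrained filters, a single
stack, no gated activations or skip connections):

```python
import numpy as np

rng = np.random.default_rng(1)

def dilated_causal_conv(x, w, dilation):
    """Causal 1-D convolution: output[t] depends only on x[<= t]."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future leaks in
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])

dilations = [1, 2, 4, 8, 16]   # one "stack"; WaveNet repeats such stacks
kernel_size = 2
signal = rng.standard_normal(64)

y = signal
for d in dilations:
    w = rng.standard_normal(kernel_size) * 0.5
    y = np.tanh(dilated_causal_conv(y, w, d))

# Receptive field of the stack: 1 + (kernel_size - 1) * sum(dilations)
receptive_field = 1 + (kernel_size - 1) * sum(dilations)
print(receptive_field)  # 32
```

Five layers already cover 32 samples of context; the full model repeats
several such stacks to reach the thousands of samples needed at audio
sample rates.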
Best,
Patric
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp