On 9/6/19 4:29 PM, Bruno Afonso wrote:
> I'd love to hear if others have been using DNNs for audio. I am a bit more interested in DNNs that process audio (i.e., output processed audio) than in classic classification approaches, where people mostly borrow ideas from computer vision and classify based on spectrogram representations (think STFT).
A former colleague is doing research in this area, particularly on transforming the emotion of a singing voice. Have a look at this recent paper:
https://ieeexplore.ieee.org/abstract/document/8683865
They use a multi-layer LSTM network in what they call a sequence-to-sequence architecture, which learns a latent-space representation of f0 contours conditioned on different emotions (anger, fear, sadness, ...).
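To make the idea a bit more concrete, here is a rough sketch of an emotion-conditioned encoder/decoder over an f0 contour. It is not the paper's model: it assumes a single LSTM layer on each side instead of several, uses random untrained weights, and conditions the decoder by appending a one-hot emotion vector to its input at every step, which is only one common way to do it. All names (LSTMCell, encode, decode, the toy contour) are made up for illustration.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class LSTMCell:
    """Single LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hid, rng):
        # One weight matrix per gate: input, forget, candidate, output.
        # Each row sees [x, h, 1.0], so the last column acts as the bias.
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid + 1)]
                   for _ in range(n_hid)] for _ in range(4)]

    def step(self, x, h, c):
        z = x + h + [1.0]
        pre = [[sum(w * v for w, v in zip(row, z)) for row in gate]
               for gate in self.W]
        i = [sigmoid(a) for a in pre[0]]
        f = [sigmoid(a) for a in pre[1]]
        g = [math.tanh(a) for a in pre[2]]
        o = [sigmoid(a) for a in pre[3]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


EMOTIONS = ["anger", "fear", "sadness"]  # labels taken from the email


def one_hot(emotion):
    return [1.0 if e == emotion else 0.0 for e in EMOTIONS]


def encode(cell, contour, n_hid):
    """Run the encoder over the contour; the final (h, c) is the latent code."""
    h, c = [0.0] * n_hid, [0.0] * n_hid
    for f0 in contour:
        h, c = cell.step([f0], h, c)
    return h, c


def decode(cell, w_out, latent, emotion, n_steps):
    """Generate a contour from the latent code, conditioned on an emotion."""
    h, c = latent
    y, out = 0.0, []
    for _ in range(n_steps):
        # Previous output plus the emotion one-hot is fed in at every step.
        h, c = cell.step([y] + one_hot(emotion), h, c)
        y = sum(w * v for w, v in zip(w_out, h))
        out.append(y)
    return out


rng = random.Random(0)
N_HID = 8  # hypothetical latent size
encoder = LSTMCell(1, N_HID, rng)
decoder = LSTMCell(1 + len(EMOTIONS), N_HID, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HID)]

# Hypothetical normalised f0 contour (e.g. log-f0 minus its mean).
contour = [0.1 * math.sin(0.3 * t) for t in range(20)]
latent = encode(encoder, contour, N_HID)
angry = decode(decoder, w_out, latent, "anger", len(contour))
sad = decode(decoder, w_out, latent, "sadness", len(contour))
```

The same latent code decodes to a different contour for each emotion label, which is the property a trained model of this kind would exploit. With untrained weights the outputs carry no musical meaning, of course.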
Then there is WaveNet, along with many recent applications of it and extensions to specific problem settings:
https://arxiv.org/abs/1609.03499
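For anyone who has not read it, the core building block of WaveNet is a stack of causal dilated convolutions, with the dilation doubling per layer (1, 2, 4, ..., 512). Below is a minimal sketch of just that part in plain Python. It leaves out the gated activations, the residual and skip connections, and the softmax over quantized samples, and the function names and the all-ones filter taps are my own choices for the demonstration.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: never look at future samples
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)


def run_stack(x, weights, dilations):
    """Apply one causal dilated convolution per entry in `dilations`."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


DILATIONS = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
TAPS = [1.0, 1.0]  # hypothetical kernel-size-2 filter, not learned weights

# Feed a unit impulse through the stack to see how far it spreads.
impulse = [1.0] + [0.0] * 1099
response = run_stack(impulse, TAPS, DILATIONS)

print(receptive_field(len(TAPS), DILATIONS))  # 1024
```

With ten layers of kernel size 2, a single output sample sees 1024 input samples, while a non-dilated stack of the same depth would see only 11. That exponential growth is what makes sample-level audio modelling tractable.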
Best,
Patric