Hi Gaurav,

Thanks for starting this. I see the PR is out
<https://github.com/apache/incubator-mxnet/pull/13241>; I left some initial
review comments. Good work!

In addition to Sandeep's queries, I have the following:
1. Can we include a simple, classic audio dataset that users can import
directly and try out, like MNIST in vision? (e.g.
http://pytorch.org/audio/datasets.html#yesno)
2. Librosa provides some good audio feature extraction, so we can use it for
now. But it's slow, as you have to convert between MXNet NDArray and NumPy
arrays. In the long term, can we make the transforms use mxnet operators and
change your transforms to hybrid blocks? For example, the mxnet FFT operator
<https://mxnet.apache.org/api/python/ndarray/contrib.html?highlight=fft#mxnet.ndarray.contrib.fft>
can be used in a hybrid block transform, which should be a lot faster.

Here are some additional references from users already using mxnet for audio.
We should aim to make this easier and automate the file
load/preprocess/transform process.
1. https://github.com/chen0040/mxnet-audio
2. https://github.com/shuokay/mxnet-wavenet
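
To make the load/preprocess/transform point a bit more concrete, here is a
rough sketch of the kind of pipeline I have in mind. It uses only the Python
standard library so it runs anywhere. FolderAudioDataset, load_wav and
rms_energy are hypothetical names I made up for illustration; they are not
MXNet APIs and not what the proposal defines. It also assumes 16-bit mono WAV
files laid out as one sub-folder per class. A real version would presumably
build on the Gluon Dataset class and use proper feature extraction instead
of the toy RMS feature.

```python
import math
import os
import struct
import tempfile
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=0.01, rate=8000):
    """Write a short 16-bit mono sine wave so the sketch needs no data files."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)


def load_wav(path):
    """Return (samples scaled to [-1, 1), sample_rate) for a 16-bit mono WAV."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw)
    return [s / 32768.0 for s in samples], rate


class FolderAudioDataset:
    """Hypothetical dataset: one sub-folder per class, one WAV per sample."""

    def __init__(self, root, transform=None):
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        self.items = [
            (os.path.join(root, c, f), label)
            for label, c in enumerate(self.classes)
            for f in sorted(os.listdir(os.path.join(root, c)))
            if f.endswith(".wav")
        ]
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        # Load on access, then apply the optional transform to the samples.
        path, label = self.items[idx]
        samples, _rate = load_wav(path)
        if self.transform is not None:
            samples = self.transform(samples)
        return samples, label


def rms_energy(samples):
    """Toy feature: root-mean-square energy of the clip."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def build_demo_dataset():
    """Create a two-class dataset of synthetic clips in a temp directory."""
    root = tempfile.mkdtemp()
    for name, freq in (("high", 880.0), ("low", 220.0)):
        os.makedirs(os.path.join(root, name))
        write_sine_wav(os.path.join(root, name, "clip0.wav"), freq_hz=freq)
    return FolderAudioDataset(root, transform=rms_energy)
```

With something like this, `ds = build_demo_dataset()` followed by
`feature, label = ds[0]` gives one transformed sample and its class index,
which is roughly the out-of-the-box experience we have for vision datasets.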

Looking forward to seeing this feature out.
Thanks!

Best Regards

Lai


On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Thanks, Gaurav for starting this initiative. The design document is
> detailed and gives all the information.
> Starting to add this in "Contrib" is a good idea while we expect a few
> rough edges and cleanups to follow.
>
> I had the following queries:
> 1. Is there any analysis comparing LibROSA with other libraries w.r.t.
> features, performance, and community usage in the audio data domain?
> 2. What is the recommendation for the LibROSA dependency? Part of MXNet PyPI
> or ask the user to install if required? I prefer the latter, similar to
> protobuf in ONNX-MXNet.
> 3. I see LibROSA is a fully Python-based library. Are we getting blocked on
> the dependency for future use cases when we want to make transformations as
> operators and allow for cross-language support?
> 4. In performance design considerations, with lazy=True / False the
> performance difference is too scary (8 minutes to 4 hours!!). This requires
> some more analysis. If we know turning a flag off/on has 24X performance
> degradation, do we need to provide that control to the user? What is the
> impact of this on memory usage?
> 5. I see LibROSA has an ISC license (
> https://github.com/librosa/librosa/blob/master/LICENSE.md) which says it is
> free to use as long as the same license notice is included. I am not sure
> if this is OK. I request other committers/mentors to advise.
>
> Best,
> Sandeep
>
> On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh <gaurav.gire...@gmail.com>
> wrote:
>
> > Dear MXNet Community,
> >
> > I recently started looking into performing some simple sound multi-class
> > classification tasks with Audio Data and realized that as a user, I would
> > like MXNet to have an out of the box feature which allows us to load audio
> > data (at least 1 file format), extract features (or apply some common
> > transforms/feature extraction) and train a model using the Audio Dataset.
> > This could be a first step towards building and supporting APIs similar to
> > what we have for "vision" related use cases in MXNet.
> >
> > Below is the design proposal :
> >
> > Gluon - Audio Design Proposal
> > <https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio>
> >
> > I would highly appreciate your taking time to review and provide feedback,
> > comments/suggestions on this.
> > Looking forward to your support.
> >
> >
> > Best Regards,
> >
> > Gaurav Gireesh
> >
>
>
> --
> Sandeep Krishnamurthy
>
