Hi Lai!
Thank you for your comments!
Below are the answers to your comments/queries:
1) That's a good suggestion. I have added an example in the pull request
related to this:
https://github.com/apache/incubator-mxnet/pull/13241/commits/eabb68256d8fd603a0075eafcd8947d92e7df27f
I would be happy to include a dataset similar to MNIST to support that. I
have come across an example dataset used in the TensorFlow speech
recognition tutorial here
<https://www.tensorflow.org/tutorials/sequences/audio_recognition>. This
could be included.
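
To make the dataset idea concrete, here is a rough sketch of a folder-based audio dataset in the style of Gluon's vision ImageFolderDataset (labels taken from subdirectory names, as in the yesno dataset layout). The name `AudioFolderDataset` and the details here are placeholders for illustration, not necessarily what the PR implements:

```python
import os

class AudioFolderDataset:
    """Pairs each .wav file with a label derived from its parent folder name,
    mirroring the layout convention of gluon's vision ImageFolderDataset."""

    def __init__(self, root):
        self.items = []
        # One class per subdirectory, sorted so labels are deterministic
        self.synsets = sorted(
            d for d in os.listdir(root)
            if os.path.isdir(os.path.join(root, d)))
        for label, folder in enumerate(self.synsets):
            folder_path = os.path.join(root, folder)
            for fname in sorted(os.listdir(folder_path)):
                if fname.endswith('.wav'):
                    self.items.append(
                        (os.path.join(folder_path, fname), label))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        # Returns (filepath, label); actual waveform decoding would
        # happen here or in a transform applied on top.
        return self.items[idx]
```

A user would then point it at a directory like `speech_data/yes/` and `speech_data/no/` and iterate over (file, label) pairs, with decoding and transforms layered on top.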

2) Thank you for the suggestion; I shall look into the FFT operator that
you have pointed out. However, there are other kinds of features, such as
MFCCs, mel spectrograms, and so on, which are popular for feature
extraction from audio data and would be useful if implemented. I am not
sure if we have operators for these.
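
As a rough sketch of the kind of transform being discussed, here is a plain-numpy magnitude spectrogram, i.e. frame the waveform, window it, and take the FFT magnitude per frame. This is only an illustration of the math; how it would map onto MXNet operators inside a HybridBlock (and the frame/hop parameters chosen here) are assumptions, not part of the PR:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and take the FFT
    magnitude of each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    window = np.hanning(frame_len)
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames * window, axis=1))

sr = 8000                               # assumed sample rate (Hz)
t = np.arange(sr) / sr                  # one second of audio
wave = np.sin(2 * np.pi * 440.0 * t)    # a 440 Hz test tone

spec = magnitude_spectrogram(wave)
print(spec.shape)                       # (n_frames, frame_len // 2 + 1)
```

An MFCC or mel-spectrogram feature would add a mel filterbank (and a DCT, for MFCC) on top of this spectrogram, which is where an existing library like librosa currently saves a lot of work.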

3) The references look good too; I shall look into them. Thank you for
bringing them to my notice.

Regards,
Gaurav

On Tue, Nov 13, 2018 at 11:22 AM Lai Wei <roywei...@gmail.com> wrote:

> Hi Gaurav,
>
> Thanks for starting this. I see the PR is out
> <https://github.com/apache/incubator-mxnet/pull/13241>, left some initial
> reviews, good work!
>
> In addition to Sandeep's queries, I have the following:
> 1. Can we include a simple classic audio dataset for users to directly
> import and try out, like MNIST in vision? (e.g.:
> http://pytorch.org/audio/datasets.html#yesno)
> 2. Librosa provides some good audio feature extraction, so we can use it
> for now. But it is slow, as you have to convert between ndarray and numpy.
> In the long term, can we make the transforms use mxnet operators and change
> your transforms to hybrid blocks? For example, the mxnet FFT operator
> <https://mxnet.apache.org/api/python/ndarray/contrib.html?highlight=fft#mxnet.ndarray.contrib.fft>
> can be used in a hybrid block transformer, which will be a lot faster.
>
> Some additional references of users already running mxnet on audio; we
> should aim to make the file load/preprocess/transform process easier and
> more automated.
> 1. https://github.com/chen0040/mxnet-audio
> 2. https://github.com/shuokay/mxnet-wavenet
>
> Looking forward to seeing this feature out.
> Thanks!
>
> Best Regards
>
> Lai
>
>
> On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > Thanks, Gaurav for starting this initiative. The design document is
> > detailed and gives all the information.
> > Starting to add this in "Contrib" is a good idea while we expect a few
> > rough edges and cleanups to follow.
> >
> > I had the following queries:
> > 1. Is there any analysis comparing LibROSA with other libraries w.r.t.
> > features, performance, and community usage in the audio data domain?
> > 2. What is the recommendation for the LibROSA dependency? Part of the
> > MXNet PyPI package, or ask the user to install it if required? I prefer
> > the latter, similar to protobuf in ONNX-MXNet.
> > 3. I see LibROSA is a fully Python-based library. Are we getting blocked
> > by this dependency for future use cases, when we want to make the
> > transformations operators and allow for cross-language support?
> > 4. In the performance design considerations, the lazy=True / lazy=False
> > performance difference is alarming (8 minutes vs. 4 hours!). This
> > requires some more analysis. If we know that turning a flag on/off causes
> > a 24x performance degradation, should we provide that control to the
> > user? What is the impact of this on memory usage?
> > 5. I see LibROSA has an ISC license (
> > https://github.com/librosa/librosa/blob/master/LICENSE.md), which says it
> > is free to use with the same license notice. I am not sure if this is OK;
> > I request other committers/mentors to advise.
> >
> > Best,
> > Sandeep
> >
> > On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh <gaurav.gire...@gmail.com>
> > wrote:
> >
> > > Dear MXNet Community,
> > >
> > > I recently started looking into some simple multi-class sound
> > > classification tasks with audio data and realized that, as a user, I
> > > would like MXNet to have an out-of-the-box feature that allows us to
> > > load audio data (at least one file format), extract features (or apply
> > > some common transforms/feature extraction), and train a model using the
> > > audio dataset. This could be a first step towards building and
> > > supporting APIs similar to what we have for "vision"-related use cases
> > > in MXNet.
> > >
> > > Below is the design proposal :
> > >
> > > Gluon - Audio Design Proposal
> > > <https://cwiki.apache.org/confluence/display/MXNET/Gluon+-+Audio>
> > >
> > > I would greatly appreciate your taking the time to review and provide
> > > feedback, comments, and suggestions on this.
> > > Looking forward to your support.
> > >
> > >
> > > Best Regards,
> > >
> > > Gaurav Gireesh
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>