My 2 cents:

* There is frequency-domain processing available already (e.g. the spark.ml DCT transformer), but no FFT transformer yet, because complex numbers are not currently a Spark SQL datatype.
* We shouldn't assume signals are even, so we need complex numbers to implement the FFT.
* I have not closely studied the relative performance tradeoffs, so please do let me know if there's a significant difference in practice.
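To illustrate the even-signal point above: the DFT of a general real signal is complex-valued, while an even-symmetric real signal has a purely real DFT, which is why DCT-style transforms can stay inside real-valued columns but a general FFT cannot. A minimal NumPy sketch (illustrative only, not spark.ml code):

```python
import numpy as np

# A general real signal: its DFT has non-zero imaginary parts,
# so a Spark FFT transformer would need a complex datatype.
x = np.array([1.0, 3.0, 2.0, 5.0])
X = np.fft.fft(x)
assert np.abs(X.imag).max() > 0

# An even-symmetric signal (e[n] == e[(N - n) % N]): its DFT is
# purely real, which is the symmetry DCT-style transforms exploit.
e = np.array([4.0, 2.0, 7.0, 2.0])  # e[1] == e[3]
E = np.fft.fft(e)
assert np.allclose(E.imag, 0.0)
```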
On Tue, Sep 8, 2015 at 5:46 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> That is an option too. Implementing convolutions with FFTs should be
> considered as well: http://arxiv.org/pdf/1312.5851.pdf
>
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Tuesday, September 08, 2015 12:07 PM
> *To:* Ulanov, Alexander
> *Cc:* Ruslan Dautkhanov; Nick Pentreath; user; na...@yandex.ru
> *Subject:* Re: Spark ANN
>
> Just wondering, why do we need tensors? Is the implementation of convnets
> using im2col (see here <http://cs231n.github.io/convolutional-networks/>)
> insufficient?
>
> On Tue, Sep 8, 2015 at 11:55 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
> Ruslan, thanks for including me in the discussion!
>
> Dropout and other features such as the Autoencoder were implemented, but
> not merged yet, in order to leave room for improving the internal Layer
> API. For example, there is ongoing work on a convolutional layer that
> consumes/outputs 2D arrays. We’ll probably need to change the Layer’s
> input/output type to tensors. This will affect dropout, which will need
> some refactoring to handle tensors too. Also, all new components should
> have an ML pipeline public interface. There is an umbrella issue for deep
> learning in Spark, https://issues.apache.org/jira/browse/SPARK-5575, which
> includes various Autoencoder features, in particular
> https://issues.apache.org/jira/browse/SPARK-10408. You are very welcome
> to join and contribute, since there is a lot of work to be done.
>
> Best regards, Alexander
>
> *From:* Ruslan Dautkhanov [mailto:dautkha...@gmail.com]
> *Sent:* Monday, September 07, 2015 10:09 PM
> *To:* Feynman Liang
> *Cc:* Nick Pentreath; user; na...@yandex.ru
> *Subject:* Re: Spark ANN
>
> Found a dropout commit from avulanov:
> https://github.com/avulanov/spark/commit/3f25e26d10ef8617e46e35953fe0ad1a178be69d
>
> It probably hasn't made its way into MLlib (yet?).
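For readers unfamiliar with the im2col trick mentioned above: it unrolls every kernel-sized patch of the input into a column, so convolution becomes a single matrix multiply. A minimal NumPy sketch (single channel, stride 1, no padding; it computes cross-correlation, i.e. no kernel flip, as convnets typically do — not the CS231n or Spark code itself):

```python
import numpy as np

def im2col(img, k):
    """Unroll every k x k patch of a 2-D image into a column."""
    h, w = img.shape
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            cols.append(img[i:i + k, j:j + k].ravel())
    return np.array(cols).T  # shape: (k*k, num_patches)

def conv2d_im2col(img, kernel):
    """2-D cross-correlation via im2col: one matrix product."""
    k = kernel.shape[0]
    out = kernel.ravel() @ im2col(img, k)
    h, w = img.shape
    return out.reshape(h - k + 1, w - k + 1)
```

With multiple filters, the row vector `kernel.ravel()` becomes a matrix with one row per filter, so the whole layer is still one matrix multiply — which is what makes the approach BLAS-friendly.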
> --
> Ruslan Dautkhanov
>
> On Mon, Sep 7, 2015 at 8:34 PM, Feynman Liang <fli...@databricks.com> wrote:
>
> Unfortunately, not yet... Deep learning support (autoencoders, RBMs) is on
> the roadmap for 1.6 <https://issues.apache.org/jira/browse/SPARK-10324>
> though, and there is a Spark package
> <http://spark-packages.org/package/rakeshchalasani/MLlib-dropout> for
> dropout-regularized logistic regression.
>
> On Mon, Sep 7, 2015 at 3:15 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
> Thanks!
>
> It does not look like Spark ANN supports dropout/DropConnect or any other
> techniques that help avoid overfitting yet:
>
> http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
> https://cs.nyu.edu/~wanli/dropc/dropc.pdf
>
> P.S. There is a small copy-paste typo in
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/BreezeUtil.scala#L43
> (it should read B&C) :)
>
> --
> Ruslan Dautkhanov
>
> On Mon, Sep 7, 2015 at 12:47 PM, Feynman Liang <fli...@databricks.com> wrote:
>
> Backprop is used to compute the gradient here
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala#L579-L584>,
> which is then optimized by SGD or LBFGS here
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala#L878>.
>
> On Mon, Sep 7, 2015 at 11:24 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
> Haven't checked the actual code, but that doc says "MLPC employes
> backpropagation for learning the model..."?
>
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
> On Mon, Sep 7, 2015 at 8:18 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
> http://people.apache.org/~pwendell/spark-releases/latest/ml-ann.html
>
> The implementation seems to be missing backpropagation?
> Was there a good reason to omit BP?
> What are the drawbacks of a pure feedforward-only ANN?
>
> Thanks!
>
> --
> Ruslan Dautkhanov
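For readers following the dropout discussion in this thread: the technique itself is tiny. A minimal NumPy sketch of "inverted" dropout (illustrative only; this is not the implementation in the avulanov commit, and `dropout_forward` is a hypothetical helper name):

```python
import numpy as np

def dropout_forward(x, p_drop, rng, train=True):
    """Inverted dropout: zero each activation with probability p_drop at
    train time and scale survivors by 1/(1 - p_drop), so the expected
    activation is unchanged and inference needs no rescaling."""
    if not train or p_drop == 0.0:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask
```

At evaluation time (`train=False`) the layer is the identity, which is exactly why the inverted variant is preferred over scaling at test time.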