Hi Yexi,

I was reading your code and found that the MLP class is abstract-ish (both train functions throw exceptions). Is there a thread or ticket for a shippable implementation?

Yours Peng

On Thu 27 Feb 2014 06:56:51 PM EST, peng wrote:
With pleasure! The original Downpour paper proposes a parameter server
from which worker nodes download shards of the old model and upload
gradients. So if the parameter server is down, training has to be
delayed. It also requires that all model parameters be stored and
atomically updated on (and fetched from) a single machine, imposing
asymmetric HDD and bandwidth requirements. This design is necessary
only because each -=delta operation has to be atomic, which cannot be
ensured across the network (e.g. on HDFS).
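
To illustrate the point: within one machine the atomic -=delta is
trivial to enforce, e.g. with a lock. A toy sketch in Java (not Mahout
code; ParameterShard is a made-up name):

import java.util.concurrent.locks.ReentrantLock;

/** Toy single-machine parameter shard; the whole subtraction is one critical section. */
class ParameterShard {
  private final double[] weights;
  private final ReentrantLock lock = new ReentrantLock();

  ParameterShard(int size) {
    this.weights = new double[size];
  }

  /** Workers push gradients here; concurrent calls cannot interleave half-applied updates. */
  void applyGradient(double[] gradient, double learningRate) {
    lock.lock();
    try {
      for (int i = 0; i < weights.length; i++) {
        weights[i] -= learningRate * gradient[i];
      }
    } finally {
      lock.unlock();
    }
  }

  /** Consistent copy for workers that download the current model. */
  double[] snapshot() {
    lock.lock();
    try {
      return weights.clone();
    } finally {
      lock.unlock();
    }
  }
}

There is no equivalent of that lock across HDFS writes, which is why
the paper pins everything to one server.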

But that doesn't mean the operation cannot be decentralized:
parameters can be sharded across multiple nodes, and multiple
accumulator instances can each handle part of the vector subtraction.
This should be easy if you create a buffer for the stream of gradients
and allocate the proper numbers of producers and consumers on each
machine to make sure it doesn't overflow (see the sketch below).
Obviously this is far from the MR framework, but at least it can be
made homogeneous and slightly faster (because sparse data can be
distributed in a way that minimizes overlap, so gradients don't have
to cross the network as frequently).
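
A minimal sketch of that buffer, assuming each node owns one shard and
runs an accumulator that drains a bounded queue (GradientUpdate and
ShardAccumulator are illustrative names, not Mahout classes):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** One gradient slice addressed to the shard this node owns. */
class GradientUpdate {
  final int offset;        // start index of the slice within the shard
  final double[] values;   // gradient values produced by a worker

  GradientUpdate(int offset, double[] values) {
    this.offset = offset;
    this.values = values;
  }
}

/** Consumer side of the producer/consumer pattern: drains the buffer and applies -=delta locally. */
class ShardAccumulator implements Runnable {
  private final double[] shard;                       // the parameters this node owns
  private final BlockingQueue<GradientUpdate> buffer; // bounded, so producers block instead of overflowing it
  private final double learningRate;

  ShardAccumulator(double[] shard, int capacity, double learningRate) {
    this.shard = shard;
    this.buffer = new ArrayBlockingQueue<>(capacity);
    this.learningRate = learningRate;
  }

  /** Producers (workers computing gradients) call this; put() blocks when the buffer is full. */
  void submit(GradientUpdate update) throws InterruptedException {
    buffer.put(update);
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        GradientUpdate u = buffer.take();
        for (int i = 0; i < u.values.length; i++) {
          shard[u.offset + i] -= learningRate * u.values[i];
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // shut down cleanly
    }
  }
}

Several accumulators (one per shard, or per core) plus a bounded
buffer per shard is all the coordination that's needed; nothing has to
be atomic across the network.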

If we instead use a centralized architecture, then there must be >=1
backup parameter server for mission-critical training.

Yours Peng

E.g. we can simply use a producer/consumer pattern for all gradients
(as sketched above).

On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:
Peng,

Can you provide more details about your thought?

Regards,


2014-02-27 16:00 GMT-05:00 peng <[email protected]>:

That should be easy. But that defeats the purpose of using Mahout, as
there are already enough implementations of single-node
backpropagation (in which case a GPU is much faster).

Yexi:

Regarding Downpour SGD and Sandblaster, may I suggest that the
implementation had better have no parameter server? It's obviously a
single point of failure and, in terms of bandwidth, a bottleneck. I
heard that MLlib on top of Spark has a functional implementation (I've
never read or tested it), and it's possible to build the workflow on
top of YARN. None of those frameworks has a heterogeneous topology.

Yours Peng


On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:


      [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]

Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
---------------------------------------------------------------

I've read the papers. I didn't think about a distributed network. I
had in mind a network that will fit into memory but will require a
significant amount of computation.

I understand that there are better options for neural networks than
map reduce.
How about a non-map-reduce version?
I see that you think it is something that would make sense. (Doing a
non-map-reduce neural network in Mahout would be of substantial
interest.)
Do you think it would be a valuable contribution?
Is there a need for this type of algorithm?
I am thinking about multi-threaded batch gradient descent with
pretraining (RBM and/or Autoencoders).
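
Roughly this shape, for example (an illustrative sketch only, not an
actual patch; localGradient() stands in for real backpropagation, or
contrastive divergence for an RBM):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** One multi-threaded batch gradient step: split the batch, sum partial gradients, update once. */
class BatchGradientStep {
  static void step(double[] weights, double[][] batch, double learningRate, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      int chunk = (batch.length + threads - 1) / threads;
      List<Future<double[]>> partials = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int from = t * chunk;
        final int to = Math.min(batch.length, from + chunk);
        if (from >= to) break;
        // Each thread computes the gradient over its own slice of the batch.
        partials.add(pool.submit(() -> localGradient(weights, batch, from, to)));
      }
      double[] total = new double[weights.length];
      for (Future<double[]> f : partials) {
        double[] g = f.get();
        for (int i = 0; i < total.length; i++) {
          total[i] += g[i];
        }
      }
      for (int i = 0; i < weights.length; i++) {
        weights[i] -= learningRate * total[i] / batch.length;  // average gradient over the batch
      }
    } finally {
      pool.shutdown();
    }
  }

  /** Placeholder: a real implementation would backpropagate over rows [from, to) of the batch. */
  static double[] localGradient(double[] weights, double[][] batch, int from, int to) {
    return new double[weights.length];
  }
}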

I have looked into these old JIRAs. The RBM patch was withdrawn:
"I would rather like to withdraw that patch, because by the time i
implemented it i didn't know that the learning algorithm is not suited
for MR, so I think there is no point including the patch."



  GSOC 2013 Neural network algorithms
-----------------------------------

                  Key: MAHOUT-1426
                  URL: https://issues.apache.org/jira/browse/MAHOUT-1426
              Project: Mahout
           Issue Type: Improvement
           Components: Classification
             Reporter: Maciej Mazur

I would like to ask about the possibilities of implementing neural
network algorithms in Mahout during GSOC.
There is a classifier.mlp package with a neural network.
I can't see either RBM or Autoencoder in these classes.
There is only one word about Autoencoders in the NeuralNetwork class.
As far as I know, Mahout doesn't support convolutional networks.
Is it a good idea to implement one of these algorithms?
Is it a reasonable amount of work?
How hard is it to get into GSOC with Mahout?
Did anyone succeed last year?







