[
https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209446#comment-13209446
]
Ted Dunning commented on MAHOUT-976:
------------------------------------
Also, John has had very good results in Vowpal Wabbit with an all-reduce
operation in his learning system. He launches a map-only learning task that
reads its input repeatedly and, on every pass over the data, propagates the
gradient vector using all-reduce. All-reduce applies an associative
aggregation to a data structure along a tree imposed on the network: partial
results are combined on the way up to the root, and the aggregated result is
then passed back down the tree to all nodes.
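
To make the two phases concrete, here is a minimal single-process sketch of
all-reduce. It is only an illustration, not Vowpal Wabbit's implementation:
the binary-tree layout (node i has children 2i+1 and 2i+2), the function
names, and the choice of vector addition as the associative aggregation are
all assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum; any associative aggregation could be used instead."""
    return [x + y for x, y in zip(a, b)]


def tree_allreduce(
    local_values: Sequence[Vector],
    combine: Callable[[Vector, Vector], Vector] = add_vectors,
) -> List[Vector]:
    """Simulate all-reduce over nodes arranged as an implicit binary tree.

    local_values[i] is the value held by node i (hypothetical layout:
    node i's parent is node (i - 1) // 2, node 0 is the root).
    Returns what each node holds after the operation.
    """
    n = len(local_values)
    partial = [list(v) for v in local_values]

    # Reduce phase: visit nodes from the highest index down to 1. A node's
    # children always have larger indices, so by the time node i is folded
    # into its parent it already holds the aggregate of its whole subtree.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = combine(partial[parent], partial[i])

    # Broadcast phase: the root's aggregate is passed back down to every node.
    total = partial[0]
    return [list(total) for _ in range(n)]


# Four nodes, each holding a local gradient vector.
gradients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = tree_allreduce(gradients)
```

After the call every node holds the same summed gradient, so each can apply
the identical update and start the next pass without a new job being started.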
This allows fast iteration of learning and could also speed up our k-means
code massively. Typically, it improves speed by about two orders of
magnitude because the horrid cost of starting a Hadoop job goes away.
Would you be interested in experimenting with this in your parallel
implementation here?
> Implement Multilayer Perceptron
> -------------------------------
>
> Key: MAHOUT-976
> URL: https://issues.apache.org/jira/browse/MAHOUT-976
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.7
> Reporter: Christian Herta
> Priority: Minor
> Labels: multilayer, networks, neural, perceptron
> Attachments: MAHOUT-976.patch, MAHOUT-976.patch
>
> Original Estimate: 80h
> Remaining Estimate: 80h
>
> Implement a multi layer perceptron
> * via Matrix Multiplication
> * Learning by Backpropagation; implementing tricks by Yann LeCun et al.:
> "Efficient BackProp"
> * arbitrary number of hidden layers (also 0 - just the linear model)
> * connection between proximate layers only
> * different cost and activation functions (different activation function in
> each layer)
> * test of backprop by gradient checking
> * normalization of the inputs (storeable) as part of the model
>
> First:
> * implementation of "stochastic gradient descent" like gradient machine
> * simple gradient descent incl. momentum
> Later (new jira issues):
> * Distributed Batch learning (see below)
> * "Stacked (Denoising) Autoencoder" - Feature Learning
> * advanced cost minimization like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
> 1 Partitioning of the data into x chunks
> 2 Learning the weight changes as matrices in each chunk
> 3 Combining the matrices and updating the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed
> quasi online learning).
> Batch learning with delta-bar-delta heuristics for adapting the learning
> rates.
>
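
The three-step batch procedure in the description above can be sketched as
follows. This is a hedged illustration only, not the attached patch: it uses
the zero-hidden-layer case (a linear model with squared error) to stay short,
runs the chunks sequentially where a real job would run them in parallel, and
every name and parameter value here is hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (input features, target)


def chunk_gradient(weights: Sequence[float], chunk: Sequence[Sample]) -> List[float]:
    """Step 2: accumulate the weight changes (gradient sum) for one chunk."""
    grad = [0.0] * len(weights)
    for x, y in chunk:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += err * xi
    return grad


def train(
    data: Sequence[Sample],
    n_chunks: int = 2,
    learning_rate: float = 0.1,
    epochs: int = 500,
) -> List[float]:
    # Step 1: partition the data into chunks.
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    weights = [0.0] * len(data[0][0])
    for _ in range(epochs):
        # Step 2: each chunk computes its own weight changes
        # (sequential here; one worker per chunk in a distributed setting).
        grads = [chunk_gradient(weights, c) for c in chunks]
        # Step 3: combine the per-chunk changes and update the weights,
        # then go back to step 2.
        total = [sum(g[j] for g in grads) for j in range(len(weights))]
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, total)]
    return weights


# Toy data from y = 1 + 2x; the first feature is a constant bias input.
data = [([1.0, x], 1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
weights = train(data)
```

Because the per-chunk gradient sums are simply added, the result is the same
as full-batch gradient descent regardless of how the data is split.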