[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678668#comment-13678668 ]
Yexi Jiang commented on MAHOUT-976:
-----------------------------------

Hi, I read the source code from the patch files (all four versions) and have the following questions.

1) It seems that the source code has not fully implemented the distributed MLP. Based on my understanding, the algorithm designer intends to make the implemented MLP generic enough to be used in both the single-machine and the distributed scenario. For the single-machine scenario, the user can easily reuse the algorithm by writing code similar to the test cases. For the distributed version, however, the user has to implement a mapper to load all the training data, then create an MLP instance inside the mapper and train it with the incoming data. Moreover, the user has to come up with a way to merge the weight updates produced by each mapper instance, which is not trivial. Therefore, the current implementation does no more than a single-machine version of the MLP.

2) The dimension of the target Vector fed to trainOnline is always 1, because the actual label is always an integer and there is no post-processing that expands it into a multi-class (one-hot) vector. The call sequence is:

train -> trainOnline -> getDerivativeOfTheCostWithoutRegularization -> getOutputDeltas -> AbstractVector.assign(Vector v, DoubleDoubleFunction f)

The assign method checks whether v.size() equals this.size(); in the MLP scenario this means checking whether the size of the output layer equals the size of the class-label vector. The related code is:

------------------------------
public void train(long trackingKey, String groupKey, int actual, Vector instance) {
  // training with one pattern
  Vector target = new DenseVector(1);
  target.setQuick(0, (double) actual);
  trainOnline(instance, target);
}
------------------------------

It passes the test cases only because the tests create the MLP with an output layer of size 1. So I am wondering whether the argument list of train should be changed, or whether the argument 'actual' should be transformed internally.

I have implemented a BSP-based distributed MLP, and the code has already been committed to the Apache Hama machine learning package. The BSP version is not difficult to adapt to the MapReduce framework. If it is OK, I can change my existing code and contribute it to Mahout.
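For illustration, one way to address 2) would be to expand the integer label inside train() into a one-hot target whose dimension matches the output layer. This is only a rough sketch against the code quoted above, not part of the patch; 'numCategories' (the number of class labels, i.e. the size of the output layer) is an assumed field that the patch does not currently define.

------------------------------
public void train(long trackingKey, String groupKey, int actual, Vector instance) {
  // Sketch: turn the scalar class label into a one-hot target so that
  // target.size() matches the size of the output layer.
  // 'numCategories' is a hypothetical field holding the number of classes.
  Vector target = new DenseVector(numCategories); // entries default to 0.0
  target.setQuick(actual, 1.0);                   // mark the true class
  trainOnline(instance, target);
}
------------------------------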
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Assignee: Ted Dunning
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>             Fix For: Backlog
>
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multilayer perceptron
> * via matrix multiplication
> * learning by backpropagation; implementing tricks by Yann LeCun et al.: "Efficient Backprop"
> * arbitrary number of hidden layers (also 0 - just the linear model)
> * connections between adjacent layers only
> * different cost and activation functions (a different activation function in each layer)
> * test of backprop by gradient checking
> * normalization of the inputs (storable) as part of the model
>
> First:
> * implementation of "stochastic gradient descent" like gradient machine
> * simple gradient descent incl. momentum
>
> Later (new JIRA issues):
> * distributed batch learning (see below)
> * "Stacked (Denoising) Autoencoder" - feature learning
> * advanced cost minimization like 2nd-order methods, conjugate gradient, etc.
>
> Distribution of learning can be done by (batch learning):
> 1. Partitioning of the data into x chunks
> 2. Learning the weight changes as matrices in each chunk
> 3. Combining the matrices and updating the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed quasi-online learning).
> Batch learning with delta-bar-delta heuristics for adapting the learning rates.
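For step 3 of the batch-learning procedure above (and for the weight-update merging raised in question 1), a rough sketch of combining the per-chunk weight-change matrices of one layer by averaging and then applying them could look like the following. The names (combineChanges, layerChanges, currentWeights) are assumptions, not part of the patch.

------------------------------
// Sketch: element-wise average of the weight-change matrices computed on each
// data chunk, then a simple update of the layer weights. Uses
// org.apache.mahout.math.Matrix; all names are illustrative.
Matrix combineChanges(List<Matrix> layerChanges) {
  Matrix sum = layerChanges.get(0).clone();
  for (int i = 1; i < layerChanges.size(); i++) {
    sum = sum.plus(layerChanges.get(i));     // element-wise sum
  }
  return sum.divide(layerChanges.size());    // average over the chunks
}

Matrix updateWeights(Matrix currentWeights, Matrix averagedChange) {
  // step 3: apply the averaged change, then go back to step 2 with the new weights
  return currentWeights.plus(averagedChange);
}
------------------------------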
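The description also lists "test of backprop by gradient checking". As a reference point only (the CostFunction interface, eps, and tol below are illustrative and not the patch's API), a central finite-difference check against the analytic gradient could be written as:

------------------------------
// Sketch: numerical gradient check via central differences, comparing
// (cost(w + eps) - cost(w - eps)) / (2 * eps) with the analytic gradient.
interface CostFunction {
  double cost(Vector weights);
  Vector gradient(Vector weights);
}

static boolean checkGradient(CostFunction f, Vector w, double eps, double tol) {
  Vector analytic = f.gradient(w);
  for (int i = 0; i < w.size(); i++) {
    double original = w.get(i);
    w.set(i, original + eps);
    double costPlus = f.cost(w);
    w.set(i, original - eps);
    double costMinus = f.cost(w);
    w.set(i, original);                      // restore the weight
    double numeric = (costPlus - costMinus) / (2 * eps);
    if (Math.abs(numeric - analytic.get(i)) > tol) {
      return false;                          // analytic and numeric gradients disagree
    }
  }
  return true;
}
------------------------------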