[ https://issues.apache.org/jira/browse/MAHOUT-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678668#comment-13678668 ]

Yexi Jiang commented on MAHOUT-976:
-----------------------------------

Hi, 

I read the source code from the patch files (all four versions) and have the 
following questions.

1) It seems that the source code does not fully implement the distributed MLP.

Based on my understanding, the algorithm designer intends to make the 
implemented MLP generic enough to be used in both the single-machine scenario 
and the distributed scenario.

For the single-machine scenario, the user can easily reuse the algorithm by 
writing code similar to that in the test cases. But for the distributed 
version, the user has to implement a mapper that loads the training data, 
create an MLP instance inside that mapper, and train it with the incoming 
records. Moreover, the user has to come up with a way to merge the weight 
updates produced by each mapper instance, which is not trivial.
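
For illustration, here is a rough sketch of what such a mapper might look 
like. This is my own sketch, not code from the patch: the class name 
MultilayerPerceptron and the getWeightMatrices() accessor are assumptions, 
and only train(long, String, int, Vector) is taken from the patch.
------------------------------
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.MatrixWritable;
import org.apache.mahout.math.Vector;

// Sketch: each mapper trains its own local MLP on its input split and emits
// the resulting weight matrices, keyed by layer index, for a later merge step.
public class MlpTrainingMapper
    extends Mapper<LongWritable, Text, IntWritable, MatrixWritable> {

  private MultilayerPerceptron mlp; // assumed class name from the patch

  @Override
  protected void setup(Context context) {
    mlp = new MultilayerPerceptron(/* layer sizes, learning rate, ... */);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    // parse one CSV record of the form "label,f1,f2,..."
    String[] tokens = value.toString().split(",");
    int actual = Integer.parseInt(tokens[0]);
    Vector instance = new DenseVector(tokens.length - 1);
    for (int i = 1; i < tokens.length; i++) {
      instance.setQuick(i - 1, Double.parseDouble(tokens[i]));
    }
    mlp.train(key.get(), "", actual, instance); // updates local weights only
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // emit the locally learned weights; merging them is left to the user
    Matrix[] weights = mlp.getWeightMatrices(); // assumed accessor
    for (int layer = 0; layer < weights.length; layer++) {
      context.write(new IntWritable(layer), new MatrixWritable(weights[layer]));
    }
  }
}
------------------------------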

Therefore, it seems that the current implementation is effectively no more 
than a single-machine version of the MLP.
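
To make the merging point concrete, the missing combine step might look 
roughly like the following. Again, this is only a sketch of one possible 
strategy (averaging the per-mapper weight matrices layer by layer in a 
reducer); the patch does not provide anything like it.
------------------------------
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.MatrixWritable;

// Sketch: average the weight matrices emitted by the mappers, one reduce call
// per layer index, to produce a single merged weight matrix per layer.
public class MlpWeightAveragingReducer
    extends Reducer<IntWritable, MatrixWritable, IntWritable, MatrixWritable> {

  @Override
  protected void reduce(IntWritable layer, Iterable<MatrixWritable> values,
      Context context) throws IOException, InterruptedException {
    Matrix sum = null;
    int count = 0;
    for (MatrixWritable mw : values) {
      sum = (sum == null) ? mw.get().clone() : sum.plus(mw.get());
      count++;
    }
    if (sum != null) {
      context.write(layer, new MatrixWritable(sum.divide(count)));
    }
  }
}
------------------------------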



2) The dimension of the target Vector fed to trainOnline is always 1. This is 
because 'actual' is always a single integer, and there is no post-processing 
to expand it into a multi-class (one-hot) target vector.

The following is the call sequence.
train -> trainOnline -> getDerivativeOfTheCostWithoutRegularization -> 
getOutputDeltas -> AbstractVector.assign(Vector v, DoubleDoubleFunction f)

The assign method checks whether v.size() equals this.size(); in the MLP 
scenario, that means checking whether the size of the output layer equals the 
size of the class-label vector.

And the following is the related code.
------------------------------
public void train(long trackingKey, String groupKey, int actual,
      Vector instance) {
    // training with one pattern
    Vector target = new DenseVector(1);
    target.setQuick(0, (double) actual);
    trainOnline(instance, target);
  }
------------------------------

The reason it passes the test cases is that the tests create the MLP with an 
output layer of size 1.

So, I'm wondering whether the argument list of train should be changed, or 
whether the 'actual' argument should be transformed internally into a target 
vector of the proper size.
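
For example, the internal transformation could be as simple as the following 
sketch of the second option. It assumes the number of output units is 
available inside train; the helper class and method names are mine, not from 
the patch.
------------------------------
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

// Sketch: expand the integer class label into a one-hot target vector whose
// cardinality matches the output layer, so the cardinality check in assign()
// no longer fails for output layers larger than 1.
final class TargetEncoder {
  private TargetEncoder() {}

  static Vector toOneHotTarget(int actual, int numOutputUnits) {
    Vector target = new DenseVector(numOutputUnits); // all zeros
    target.setQuick(actual, 1.0);                    // mark the true class
    return target;
  }
}

// train(...) would then call:
//   trainOnline(instance, TargetEncoder.toOneHotTarget(actual, numOutputUnits));
------------------------------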



I have implemented a BSP-based distributed MLP, and the code has already been 
committed to the Apache Hama machine learning package. The BSP version is not 
difficult to adapt to the MapReduce framework. If it is OK, I can adapt my 
existing code and contribute it to Mahout.


                
> Implement Multilayer Perceptron
> -------------------------------
>
>                 Key: MAHOUT-976
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-976
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Assignee: Ted Dunning
>            Priority: Minor
>              Labels: multilayer, networks, neural, perceptron
>             Fix For: Backlog
>
>         Attachments: MAHOUT-976.patch, MAHOUT-976.patch, MAHOUT-976.patch, 
> MAHOUT-976.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> Implement a multi layer perceptron
>  * via Matrix Multiplication
>  * Learning by Backpropagation; implementing tricks by Yann LeCun et al.: 
> "Efficient BackProp"
>  * arbitrary number of hidden layers (also 0  - just the linear model)
>  * connection between proximate layers only 
>  * different cost and activation functions (different activation function in 
> each layer) 
>  * test of backprop by gradient checking 
>  * normalization of the inputs (storable) as part of the model
>  
> First:
>  * implementation of "stochastic gradient descent" like gradient machine
>  * simple gradient descent incl. momentum
> Later (new jira issues):  
>  * Distributed Batch learning (see below)  
>  * "Stacked (Denoising) Autoencoder" - Feature Learning
>  * advanced cost minimization like 2nd order methods, conjugate gradient etc.
> Distribution of learning can be done by (batch learning):
>  1 Partitioning of the data into x chunks 
>  2 Learning the weight changes as matrices in each chunk
>  3 Combining the matrices and updating the weights - back to 2
> Maybe this procedure can be done with random parts of the chunks (distributed 
> quasi online learning). 
> Batch learning with delta-bar-delta heuristics for adapting the learning 
> rates.    
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
