[ 
https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680476#comment-13680476
 ] 

Yexi Jiang commented on MAHOUT-975:
-----------------------------------

There are multiple problems (not only bugs) with the GradientMachine (based on 
Ted's revised version). If there is no time to look at this issue now, 
please ignore it until next week (when 0.8 is released).

1) The GradientMachine is a special case of MultiLayerPerceptron (MLP) that 
contains only one hidden layer. Is it necessary to keep it if the 
MultiLayerPerceptron is planned?

2) hiddenToOutput does not seem correct. The squashing (activation) function 
should also be applied to the output layer (see [1][2][3][4]). The output of 
each node (neuron) in the output layer then lies in (0, 1) if the sigmoid 
function is used, or in (-1, 1) if tanh is used.
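
To illustrate point 2, here is a minimal sketch of an output layer that applies 
the squashing function. It is not Mahout's code: the class 
OutputSquashingSketch, the method outputActivations and the weights[i][j] 
layout (hidden unit j to output unit i) are assumptions made for this example.

```java
final class OutputSquashingSketch {
    // Logistic sigmoid: maps any real input into (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Computes output activations from hidden activations.
    // Assumed layout: weights[i][j] connects hidden unit j to output unit i.
    static double[] outputActivations(double[] hidden, double[][] weights, double[] bias) {
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            double z = bias[i];
            for (int j = 0; j < hidden.length; j++) {
                z += weights[i][j] * hidden[j];
            }
            // The squashing function is applied at the output layer as well.
            out[i] = sigmoid(z);
        }
        return out;
    }
}
```

With the sigmoid, every output value lies strictly between 0 and 1; replacing 
sigmoid with Math.tanh would give the (-1, 1) range instead.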

3) There are several problems with the training method. In updateRanking, it 
is unclear which weight update strategy is used: the code claims to be 
back-propagation, but it is not implemented that way. 

3.1) It seems that only part of the outputWeight are updated (the weights 
associated with the good output node and the weights associated with the worst 
output node). Again, this is OK for a two-class problem.
For back-propagation, all the weights between the last hidden layer and the 
output layer should be updated. So, did the original designer intentionally 
design it like that, and can its correctness be guaranteed?
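
For comparison, a textbook back-propagation step for the hidden-to-output 
weights could look like the sketch below. It assumes sigmoid outputs and a 
squared-error loss E = 0.5 * sum_i (a_i - t_i)^2, neither of which is what 
GradientMachine currently uses; the class and method names are hypothetical.

```java
final class OutputDeltaSketch {
    // One gradient-descent step on the hidden-to-output weights.
    // Assumptions: sigmoid outputs a_i, squared error, weights[i][j] connects
    // hidden unit j to output unit i. Weights are updated in place and the
    // per-output deltas are returned.
    static double[] updateOutputWeights(double[][] weights, double[] hidden,
                                        double[] output, double[] target,
                                        double learningRate) {
        // Step 1: delta_i = dE/dz_i = (a_i - t_i) * a_i * (1 - a_i) for every output node.
        double[] delta = new double[output.length];
        for (int i = 0; i < output.length; i++) {
            delta[i] = (output[i] - target[i]) * output[i] * (1.0 - output[i]);
        }
        // Step 2: every weight into every output node is adjusted using its delta.
        for (int i = 0; i < weights.length; i++) {
            for (int j = 0; j < hidden.length; j++) {
                weights[i][j] -= learningRate * delta[i] * hidden[j];
            }
        }
        return delta;
    }
}
```

Unlike an update that touches only the "good" and "worst" output nodes, this 
step changes the weights of all output nodes whose delta is non-zero.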

In back-propagation, the delta of each node should be calculated first, and 
the weights of each node are then adjusted based on the corresponding delta. 
However, in the implemented code, 
   
 3.2) The GradientMachine (and MLP) actually can also be used for regression 
and prediction. The 'train' method of OnlineLearner restricts its power.
    
4) The corresponding test case is not sufficient to verify the correctness of 
the implementation.

5) Once all the previous problems have been fixed, it will be time to consider 
whether a map-reduce version of the algorithm is necessary.
 

References:
[1] Tom Mitchell. Machine Learning. Chapter 4.
[2] Jiawei Han. Data Mining: Concepts and Techniques. Chapter 6.
[3] Stanford Unsupervised Feature Learning and Deep Learning tutorial. 
http://ufldl.stanford.edu/wiki/index.php/Neural_Networks. Section Neural 
Network.
[4] Christopher Bishop. Neural Networks for Pattern Recognition. Chapter 4.


                
> Bug in Gradient Machine  - Computation of the gradient
> ------------------------------------------------------
>
>                 Key: MAHOUT-975
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-975
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>            Assignee: Ted Dunning
>             Fix For: Backlog
>
>         Attachments: GradientMachine2.java, GradientMachine.patch, 
> MAHOUT-975.patch
>
>
> The initialisation used to compute the gradient descent weight updates for 
> the output units appears to be wrong:
>  
> In the comment: "dy / dw is just w since  y = x' * w + b."
> This is wrong. dy/dw is x (ignoring the indices). The same initialisation is 
> done in the code.
> Check by using neural network terminology:
> The gradient machine is a specialized version of a multi layer perceptron 
> (MLP).
> In a MLP the gradient for computing the "weight change" for the output units 
> is:
> dE / dw_ij = dE / dz_i * dz_i / dw_ij with z_i = sum_j (w_ij * a_j)
> here: i index of the output layer; j index of the hidden layer
> (d stands for the partial derivatives)
> here: z_i = a_i (no squashing in the output layer)
> with the special loss (cost function) E = 1 - a_g + a_b = 1 - z_g + z_b
> with
> g index of output unit with target value: +1 (positive class)
> b: random output unit with target value: 0
> =>
> dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j (a_j: activity of the hidden 
> unit j)
> dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j (a_j: activity of the hidden 
> unit j)
> That matches what the comment would say if it were correct:
> dy/dw = x (x is here the activation of the hidden unit), multiplied by (-1) 
> for weights to the output unit with target value +1.
> ------------
> In neural network implementations it's common to compute the gradient
> numerically for a test of the implementation. This can be done by:
> dE/dw_ij = (E(w_ij + epsilon) -E(w_ij - epsilon) ) / (2* (epsilon))
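
The numerical check described in the issue can be sketched as follows. This is 
only an illustration under the issue's own definitions (E = 1 - z_g + z_b, 
z_i = sum_j w_ij * a_j, with g != b); the class GradientCheckSketch and its 
method names are hypothetical and not part of Mahout.

```java
final class GradientCheckSketch {
    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int k = 0; k < x.length; k++) {
            s += x[k] * y[k];
        }
        return s;
    }

    // Ranking loss as written in the issue: E = 1 - z_g + z_b,
    // where z_i = sum_j w[i][j] * a[j] and a holds the hidden activations.
    static double loss(double[][] w, double[] a, int g, int b) {
        return 1.0 - dot(w[g], a) + dot(w[b], a);
    }

    // Analytic gradient dE/dw_ij: -a_j for i == g, +a_j for i == b, 0 otherwise.
    static double analyticGradient(double[] a, int g, int b, int i, int j) {
        if (i == g) {
            return -a[j];
        }
        if (i == b) {
            return a[j];
        }
        return 0.0;
    }

    // Central difference: (E(w_ij + eps) - E(w_ij - eps)) / (2 * eps).
    // The perturbed weight is restored before returning.
    static double numericGradient(double[][] w, double[] a, int g, int b,
                                  int i, int j, double eps) {
        double saved = w[i][j];
        w[i][j] = saved + eps;
        double plus = loss(w, a, g, b);
        w[i][j] = saved - eps;
        double minus = loss(w, a, g, b);
        w[i][j] = saved;
        return (plus - minus) / (2.0 * eps);
    }
}
```

A unit test would loop over all (i, j) pairs and require the two gradients to 
agree within a small tolerance.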

