[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206660#comment-13206660 ]
Lance Norskog commented on MAHOUT-975:
--------------------------------------

The newest patch does not compile against the trunk. There is a singular/plural mismatch in the name of one of the variables.

I have tested this with the SGD classification example/bin/classify-20newsgroups.sh. The total accuracy dropped from 71% to 62%.

The SGD example for Apache emails (subset of commons vs. cocoon) does not work well, so I can't evaluate the patch with that.

Can you suggest a public dataset where this change works better than the trunk?

> Bug in Gradient Machine - Computation of the gradient
> ------------------------------------------------------
>
>                 Key: MAHOUT-975
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-975
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>            Reporter: Christian Herta
>         Attachments: GradientMachine.patch
>
>
> The initialisation to compute the gradient descent weight updates for the
> output units appears to be wrong:
>
> In the comment: "dy / dw is just w since y = x' * w + b."
> This is wrong. dy/dw is x (ignoring the indices). The same initialisation is
> done in the code.
> Check by using neural network terminology:
> The gradient machine is a specialized version of a multi layer perceptron
> (MLP).
> In an MLP the gradient for computing the "weight change" for the output units
> is:
> dE / dw_ij = dE / dz_i * dz_i / dw_ij with z_i = sum_j (w_ij * a_j)
> here: i index of the output layer; j index of the hidden layer
> (d stands for the partial derivatives)
> here: z_i = a_i (no squashing in the output layer)
> with the special loss (cost function) E = 1 - a_g + a_b = 1 - z_g + z_b
> with
> g: index of the output unit with target value +1 (positive class)
> b: random output unit with target value 0
> =>
> dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j (a_j: activity of the hidden unit j)
> dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j (a_j: activity of the hidden unit j)
> That is what the comment would give if it were correct:
> dy/dw = x (x is here the activation of the hidden unit) * (-1) for weights to
> the output unit with target value +1.
> ------------
> In neural network implementations it's common to compute the gradient
> numerically to test the implementation. This can be done by:
> dE/dw_ij = (E(w_ij + epsilon) - E(w_ij - epsilon)) / (2 * epsilon)
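
The numerical check described in the issue can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Mahout's GradientMachine code: it models just the linear output layer z_i = sum_j (w_ij * a_j) with the loss E = 1 - z_g + z_b, leaves out the hidden layer and any margin condition, and all function names (loss, analytic_gradient, numerical_gradient) are hypothetical. It compares the analytic gradient from the derivation (-a_j for row g, +a_j for row b, zero elsewhere) against the central difference formula.

```python
import random


def loss(W, a, g, b):
    """E = 1 - z_g + z_b with linear outputs z_i = sum_j W[i][j] * a[j]."""
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) for row in W]
    return 1.0 - z[g] + z[b]


def analytic_gradient(W, a, g, b):
    """dE/dw_gj = -a_j, dE/dw_bj = +a_j, and zero for all other output units."""
    grad = [[0.0] * len(a) for _ in W]
    for j, a_j in enumerate(a):
        grad[g][j] -= a_j
        grad[b][j] += a_j
    return grad


def numerical_gradient(W, a, g, b, epsilon=1e-5):
    """Central difference: (E(w + eps) - E(w - eps)) / (2 * eps) per weight."""
    grad = [[0.0] * len(a) for _ in W]
    for i in range(len(W)):
        for j in range(len(a)):
            original = W[i][j]
            W[i][j] = original + epsilon
            e_plus = loss(W, a, g, b)
            W[i][j] = original - epsilon
            e_minus = loss(W, a, g, b)
            W[i][j] = original  # restore the weight before moving on
            grad[i][j] = (e_plus - e_minus) / (2 * epsilon)
    return grad


def max_abs_difference(x, y):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


# Arbitrary test data: 4 hidden activations, 3 output units (assumed sizes).
rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(4)]
W = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
g, b = 0, 2  # g: unit with target +1, b: randomly chosen unit with target 0

max_diff = max_abs_difference(
    analytic_gradient(W, a, g, b), numerical_gradient(W, a, g, b)
)
print("max |analytic - numerical| =", max_diff)
```

Because E is linear in the weights here, the two gradients should agree up to floating-point rounding. Replacing the analytic rows with the weights w instead of the activations a, as the disputed code comment suggests, would make the difference clearly nonzero for generic inputs.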