Hi all, I have a question regarding the TD(lambda) training methodology described by Tesauro (see http://www.research.ibm.com/massive/tdl.html#h2:learning_methodology).
The formula for adapting the weights of the neural net is:

    w(t+1) - w(t) = a * [Y(t+1) - Y(t)] * sum_{k=1..t} lambda^(t-k) * nabla_w Y(k)

I would like to know whether nabla_w Y(k) in the formula above is the gradient of Y(k) with respect to the weights of the net at time t (i.e. the current net) or with respect to the weights of the net at time k. I assume the former.

Thanks in advance!

greetings,
boomslang

_______________________________________________
Bug-gnubg mailing list
Bug-gnubg@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-gnubg
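For what it's worth, the sum in that formula is usually maintained incrementally as an eligibility trace, e(t) = lambda * e(t-1) + nabla_w Y(t). Below is a minimal sketch of that form (not Tesauro's actual code), using a hypothetical linear value function in place of the neural net. Note that for a linear function the gradient is just the feature vector and does not depend on the weights, so the "time t vs. time k" distinction the question raises does not arise in this toy case:

```python
import numpy as np

def value(w, x):
    """Linear value estimate Y = w . x; its gradient w.r.t. w is simply x."""
    return float(w @ x)

def td_lambda_episode(w, states, alpha=0.5, lam=0.5):
    """Run online TD(lambda) over one episode and return the updated weights.

    Maintains the eligibility trace e(t) = lam * e(t-1) + nabla_w Y(t),
    so each step applies w(t+1) - w(t) = alpha * [Y(t+1) - Y(t)] * e(t),
    which reproduces the sum  sum_{k=1..t} lam^(t-k) * nabla_w Y(k).
    """
    w = w.copy()
    e = np.zeros_like(w)              # eligibility trace
    for t in range(len(states) - 1):
        e = lam * e + states[t]       # accumulate gradient from time t
        delta = value(w, states[t + 1]) - value(w, states[t])
        w += alpha * delta * e        # TD(lambda) weight update
    return w
```

For example, with states [1,0], [0,1], [0,0] and initial weights [1,0], only the first transition has a nonzero TD error, and the weights move from [1, 0] to [0.5, 0].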