Dear all,
There has been a lot of interest recently in performing deep learning
without backprop - in particular, papers by Geoff Hinton and Barak
Pearlmutter showing how supervised and self-supervised learning can be
achieved solely with forward message propagation. We wanted to mention
here our own work dating back to 2019, in which we realised real-time
reinforcement learning in a closed-loop setting via the forward
propagation of error signals. Naturally, forward propagation has the
attraction of biological plausibility, but from our point of view the
more important advantage is that the error signals are then defined in
the space of sensory inputs rather than motor outputs, which makes far
more sense for a self-contained autonomous agent. Rather than the reward
functions of Q-learning, we use reflex circuits to generate the errors:
the goal of sensory learning is to predict disturbances before they
occur, thereby silencing the reflex.
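To illustrate the reflex-as-error idea, here is a toy single-weight
sketch (to be clear, this is not the FCL code from the repository: the
delta-rule update and all constants are our own simplification). A
reflex fires when a disturbance arrives; a predictive cue is available
earlier, and a weight driven by the reflex error learns to cancel the
disturbance so the reflex falls silent:

```python
# Toy closed-loop sketch (not the authors' FCL implementation; all names
# and numbers are illustrative).  A reflex reacts to a disturbance; a
# predictive cue appears `delay` steps earlier.  A single weight w,
# driven by the reflex error, learns to pre-empt the disturbance.
eta, delay, trials, T = 0.1, 5, 100, 40
cue_t = 10                     # step at which the predictive cue appears (e.g. vision)
w = 0.0                        # predictive weight, initially silent
reflex_per_trial = []          # total reflex (error) activity per trial

for _ in range(trials):
    total_e = 0.0
    for t in range(T):
        # the cue fires at cue_t; its trace reaches the learner `delay`
        # steps later, exactly when the disturbance hits the reflex
        cue_trace = 1.0 if t == cue_t + delay else 0.0
        disturbance = 1.0 if t == cue_t + delay else 0.0
        pred_action = w * cue_trace          # predictive action counters the disturbance
        e = disturbance - pred_action        # residual reflex error
        w += eta * e * cue_trace             # correlate error with the cue trace
        total_e += abs(e)
    reflex_per_trial.append(total_e)

print(round(reflex_per_trial[0], 3), round(reflex_per_trial[-1], 3))  # prints: 1.0 0.0
```

Over the trials the reflex activity decays towards zero as the
predictive weight takes over, which is the behaviour the robot shows in
the video below.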
To see the system in operation, here is a real robot learning a simple
line-following task, where the system initially uses a light sensor to
drive a reflex steering action, and learning discovers predictive
information from vision which pre-empts this reflex:
https://sigmoid.social/@berndporr/109534507823054808
Because the algorithm works for deep networks, arbitrarily complex
predictive cues can in principle be discovered.
Source code and paper:
https://github.com/glasgowneuro/feedforward_closedloop_learning
We would be very interested to hear your views on this approach.
/Bernd Porr and Paul Miller
--
http://www.berndporr.me.uk
http://www.attys.tech
+44 (0)7840 340069