Re: [agi] dopamine and reward prediction error

2007-04-17 Thread Richard Loosemore

William Pearson wrote:

On 13/04/07, Richard Loosemore [EMAIL PROTECTED] wrote:

To convey this subtlety as simply as I can, I would suggest that you ask
yourself how much intelligence is being assumed in the preprocessing
system that does the work of (a) picking out patterns to be considered
by the system, and (b) picking the particular patterns that are to be
rewarded, according to some success criterion.  Here is the problem:
if you are not careful you will assume MORE intelligence in the
preprocessor than you were hoping to get the core of the system to
learn.  There are other issues, but that is one of the main ones.


For the record I agree with this critique of some of the neuroscience
views of reinforcement learning in the brain.


What I find tremendously frustrating is the fact that people are still
so dismally unaware of these issues that they come out with statements
such as the one in the quote:  speaking as if the idea of reward
assignment was a fantastic one, and as if the neuroscience discovery of
a possible mechanism really meant anything.  The neuroscience discovery
was bound to collapse:  I said as much the first time I heard of
it, and I am glad that it has now happened so quickly.  The depressing
part is that the folks who showed it to be wrong think that they can
still tinker with the mechanism and salvage something out of it.


I think they do this because they haven't found a better hypothesis
and have too much invested in the previous status quo. I'd be curious
to know whether your hypothesis for a motivation system has the potential
for the same simple signal, given to systems with different histories,
to cause each system to attempt to get that signal again (addiction
being the purest example of this). This is one of the important
phenomena I require a motivational system to explain.



This is an interesting question.

Addiction is clearly a pathology of the human motivational system, but 
the explanation for it could lie at any of several levels.  In some cases 
the system may be designed in such a way that a high-level imbalance 
occurs; in other cases the cause might be a low-level problem.


Example of the high level.  Suppose the system is designed to work by means 
of a checks-and-balances mechanism, where too much of a given type of 
desirable activity starts to cause the reward for that activity to 
decrease, making the system receptive to new ideas about what it would 
like to do.  The default settings for these habituation effects would 
vary between individuals, but overall we might expect that some activities 
would be given a very high tolerance, because in a normal environment the 
supply of that activity always runs out before the person can get enough 
of it.  Example:  playing computer games.  When the natural environment 
supplies a fascinating game situation, there are great benefits to getting 
good at it, so the system is wired to let the human get as much of the 
game playing as it wants, but only if the game is very complex and 
engaging.  Games that involve a quick series of challenges from the 
environment (usually another human being) are so rare that the human 
system basically says:  get as much of this as you possibly can, because 
it is rare, and you can never get enough of it.  But when the system says 
there are no limits on this activity, it does not allow for the invention 
of computer games, which can go on indefinitely.  Thus there is the 
possibility of addiction, because the design of the system wrongly assumed 
that the environment would never supply an unlimited amount of the 
activity.
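
As a purely illustrative sketch, the checks-and-balances idea can be
written down in a few lines of Python.  Every name and number below is
invented for illustration, not a claim about the actual human design:

# Illustrative only: reward for an activity habituates as recent
# consumption approaches a tolerance setting; an activity the design
# "assumed" to be rare in nature ships with an enormous tolerance.

def effective_reward(base, consumed, tolerance):
    """Reward falls toward zero as recent consumption nears the tolerance."""
    return base * max(0.0, 1.0 - consumed / tolerance)

# Eating habituates quickly; complex games were "assumed" rare, so the
# tolerance setting is effectively unlimited.
for consumed in (0, 5, 50, 500):
    print(consumed,
          effective_reward(1.0, consumed, tolerance=10),      # eating: satiates fast
          effective_reward(1.0, consumed, tolerance=10_000))  # gaming: barely satiates

In an environment that never cuts off the supply, the activity with the
near-unlimited tolerance keeps out-competing everything else, which is the
failure mode just described.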


Example of the low level.  Chemical addiction can cause one part of the 
brain to generate vast quantities of (a) novel stimuli (e.g. 
hallucinogens) or (b) satisfaction signals attached to an arbitrary 
activity (e.g. drugs that make people feel exceptionally good for no 
reason whatsoever).  In both of these cases the wiring of the system has 
been subverted:  the normal process starts to require the drug because the 
satisfaction caused by the drug is greater than that produced by any other 
activity.
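
A similarly hedged sketch of the low-level case, with invented values:
the drug writes a satisfaction signal directly into the system, at a level
no natural activity can match, so a simple reward-maximizing chooser locks
on:

# Illustrative only: the drug bypasses normal appraisal and injects a
# satisfaction signal larger than any the normal channel can produce.

natural_rewards = {"eat": 1.0, "socialize": 0.8, "play": 0.9}

def satisfaction(activity, drug_bonus=5.0):
    if activity == "take_drug":
        return drug_bonus              # wiring subverted: signal injected directly
    return natural_rewards[activity]   # normal channel, bounded values

options = list(natural_rewards) + ["take_drug"]
print(max(options, key=satisfaction))  # 'take_drug' wins every comparison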


Clearly, these are very simple types of explanation.  Quite mundane, 
really.  The problem with explaining any given addiction is that we 
cannot yet be sure at what level it is happening, so good explanations 
will have to wait until we can do more advanced work.


Stepping back a bit from your question:  note that my main interest is 
in building artificial motivation systems inspired by the human system, 
so the goal of precisely explaining breakdowns of the human system is 
not directly in my sights.  A subtle difference, but a big one.




Richard Loosemore.












[agi] dopamine and reward prediction error

2007-04-13 Thread Eugen Leitl

http://scienceblogs.com/developingintelligence/2007/04/the_death_of_a_beautiful_theor.php

The Death of a Beautiful Theory? Dopamine And Reward Prediction Error

Category: Artificial Intelligence • Cognitive Neuroscience • Computational Modeling
Posted on: April 11, 2007 12:07 PM, by Chris Chatham

Very early in the history of artificial intelligence research, it was
apparent that cognitive agents needed to be able to maximize reward by
changing their behavior. But this leads to a credit-assignment problem: how
does the agent know which of its actions led to the reward? An early solution
was to select the behavior with the maximal predicted rewards, and to later
adjust the likelihood of that behavior according to whether it ultimately led
to the anticipated reward. These temporal-difference errors in reward
prediction were first implemented in a 1950's checker-playing program, before
exploding in popularity some 30 years later.
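
To make the rule concrete, here is a minimal Python sketch of the
temporal-difference update itself; the names and parameter values are
invented for illustration (this is the core learning rule, not the checker
program):

def td_update(value, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Nudge value[state] toward the observed reward plus the discounted
    value of what followed; delta is the prediction error."""
    delta = reward + gamma * value[next_state] - value[state]
    value[state] += alpha * delta
    return delta

# A state's value rises when it led to more than was predicted (delta > 0)
# and falls when a predicted reward failed to arrive (delta < 0).
value = {"A": 0.0, "B": 0.0}
print(td_update(value, "A", "B", reward=1.0))  # positive error: under-prediction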

This repopularization seemed to originate from a tantalizing discovery: the
brain's most ancient structures were releasing dopamine in exactly the way
predicted by temporal-difference learning algorithms. Specifically, dopamine
release in the ventral tegmental area (VTA) decreased in response to stimuli
that were repeatedly paired without a reward - as though dopamine levels
dipped to signal the overprediction (and under-delivery) of a reward.
Secondly, dopamine release abruptly spikes in response to stimuli that are
suddenly paired with a reward - as though dopamine is signaling the
underprediction (and over-delivery) of a reward. Finally, when a
previously-rewarded stimulus is no longer rewarded, dopamine levels dip,
again suggesting overprediction and underdelivery of reward.
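
To see the mapping, here is a hedged illustration of the observations above
expressed as the sign of a TD error, delta = r + gamma*V(next) - V(cue);
all numbers are invented:

gamma, alpha = 1.0, 0.5
V = {"cue": 0.0, "end": 0.0}

def trial(reward):
    """One cue-then-outcome trial; returns the prediction error at the cue."""
    delta = reward + gamma * V["end"] - V["cue"]
    V["cue"] += alpha * delta
    return delta

print(trial(1.0))   # novel cue-reward pairing:   delta > 0  (dopamine spike)
for _ in range(20):
    trial(1.0)      # pairing repeated until the cue fully predicts the reward
print(trial(1.0))   # predicted reward delivered: delta ~ 0  (no response)
print(trial(0.0))   # predicted reward omitted:   delta < 0  (dopamine dip)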

Thus, a beautiful computational theory was garnering support from some
unusually beautiful data in neuroscience. Dopamine appeared to rise for items
that predicted a reward, to drop for items that predicted an absence of
reward, and to show no response to neutral stimuli. But as noted by Thomas
Huxley, in science many a beautiful theory has been destroyed by an ugly
fact.

These ugly facts are presented in Redgrave and Gurney's new NRN article that
is circulating in the field of computational neuroscience. Among the ugliest:

1) Dopamine spikes in response to novel items which have never been paired
with reward, and thus have no predictive value.

2) The latency and duration of dopamine spikes are constant across species,
experiments, stimulus modality and stimulus complexity. In contrast, reward
prediction should take longer to establish in some situations than others -
for example, reward prediction may be slower for more complex stimuli.

3) The dopamine signal actually occurs before animals have even been able to
fixate on a stimulus, which calls into question whether the signal is
mechanistically capable of serving the reward-prediction-error function.

4) VTA dopamine neurons fire simultaneously with (and possibly even before)
the completion of object recognition in the infero-temporal cortex, and
simultaneously with visual responses in the striatum and subthalamic nucleus.
It seems unlikely that VTA could compute a reward prediction error before
object recognition is complete.

5) The most likely visual signal to these VTA neurons may originate from the
superior colliculus, a region that is sensitive to spatial changes but not to
the kind of feature processing involved in object recognition per se.

6) Many of the experiments showing the apparent dopaminergic coding of reward
prediction error used stimuli that differed not only in reward value but also
in spatial location. The data in support of reward prediction error are
therefore confounded with hypotheses involving spatial selectivity.

Redgrave & Gurney suggest that VTA dopamine neurons fire too quickly, and with
too little detailed visual input, to actually accomplish the calculation of
errors in reward prediction. They advocate an alternative theory in which
temporal prediction is still key, but instead of encoding reward prediction
error, dopamine neurons are actually signalling the reinforcement of
actions/movements that immediately precede a biologically salient event.

To understand this claim, consider Redgrave & Gurney's point that most
temporally unexpected transient events in nature are also spatially
unpredictable. The theory is basically that the system registers its own
uncertainty, via the spatial reorienting response of the superior colliculus,
and attempts to reduce that uncertainty by pairing a running record of its
previous movements with the unexpected event.
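
A hedged Python sketch of that alternative, with invented names: no reward
prediction is computed anywhere; a salient, unexpected event simply
reinforces the short running record of movements that preceded it, the most
recent most strongly:

from collections import deque

recent_actions = deque(maxlen=5)   # short running record of movements
action_strength = {}

def act(action):
    recent_actions.append(action)

def salient_event():
    """An unexpected transient (flagged, say, by the superior colliculus)
    credits whatever the animal just did, most recent actions first."""
    for age, action in enumerate(reversed(recent_actions)):
        action_strength[action] = action_strength.get(action, 0.0) + 0.5 ** age

for a in ("turn", "step", "press_lever"):
    act(a)
salient_event()
print(action_strength)  # press_lever credited most: it immediately preceded the event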

Although this alternative theory is intriguing, there is not an abundance of
evidence supporting it: it seems to me more like a pastiche of fragments from
the apparently broken reward prediction error hypothesis.

We should also be cautious in discarding any theory as powerful as the reward
prediction error hypothesis on the basis of null evidence: in this case, we
simply don't know how reward prediction error could be calculated so quickly.
This kind 

Re: [agi] dopamine and reward prediction error

2007-04-13 Thread Richard Loosemore


This is actually a good illustration of how people can become obsessed 
with microtheories because they don't have a comprehensive, broad-based 
understanding of the problem.


The beautiful theory is actually not a beautiful theory at all!  It 
was complete nonsense right from the beginning.  The whole idea of reward 
assignment contains some very subtle traps having to do with what defines 
the reward.


To convey this subtlety as simply as I can, I would suggest that you ask 
yourself how much intelligence is being assumed in the preprocessing 
system that does the work of (a) picking out patterns to be considered 
by the system, and (b) picking the particular patterns that are to be 
rewarded, according to some success criterion.  Here is the problem: 
if you are not careful you will assume MORE intelligence in the 
preprocessor than you were hoping to get the core of the system to 
learn.  There are other issues, but that is one of the main ones.
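
A hedged toy example of the trap, with entirely hypothetical names: the
"learner" below is trivial, and every interesting decision has already
been made by the hand-written preprocessor:

def preprocess(raw_scene):
    # (a) Pattern extraction: the designer, not the learner, decided that
    # "edible" and "predator" are the categories that matter.
    return [obj for obj in raw_scene if obj in ("edible", "predator")]

def reward(pattern):
    # (b) Success criterion: the designer also decided what counts as good.
    return {"edible": +1.0, "predator": -1.0}[pattern]

weights = {}
for obj in preprocess(["rock", "edible", "predator"]):
    weights[obj] = weights.get(obj, 0.0) + 0.1 * reward(obj)

print(weights)  # looks like learning, but the intelligence sits in the designer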


The behaviorists founded an entire 'science' on this mistake.

What I find tremendously frustrating is the fact that people are still 
so dismally unaware of these issues that they come out with statements 
such as the one in the quote:  speaking as if the idea of reward 
assignment was a fantastic one, and as if the neuroscience discovery of 
a possible mechanism really meant anything.  The neuroscience discovery 
was bound to collapse:  I said as much the first time I heard of 
it, and I am glad that it has now happened so quickly.  The depressing 
part is that the folks who showed it to be wrong think that they can 
still tinker with the mechanism and salvage something out of it.


So long as this field (the general field of AI/Psychology/Neuroscience) 
is populated with people who have narrow perspectives, and who keep 
repeating mistakes and running around in circles, we are going to get 
absolutely nowhere.


That makes me think of something else I meant to say, but I will put 
that in a separate message.




Richard Loosemore







Eugen Leitl wrote:

http://scienceblogs.com/developingintelligence/2007/04/the_death_of_a_beautiful_theor.php

The Death of a Beautiful Theory? Dopamine And Reward Prediction Error

[remainder of the quoted article snipped; see Eugen Leitl's post above]

Re: [agi] dopamine and reward prediction error

2007-04-13 Thread William Pearson

On 13/04/07, Richard Loosemore [EMAIL PROTECTED] wrote:

To convey this subtlety as simply as I can, I would suggest that you ask
yourself how much intelligence is being assumed in the preprocessing
system that does the work of (a) picking out patterns to be considered
by the system, and (b) picking the particular patterns that are to be
rewarded, according to some success criterion.  Here is the problem:
if you are not careful you will assume MORE intelligence in the
preprocessor than you were hoping to get the core of the system to
learn.  There are other issues, but that is one of the main ones.


For the record I agree with this critique of some of the neuroscience
views of reinforcement learning in the brain.


What I find tremendously frustrating is the fact that people are still
so dismally unaware of these issues that they come out with statements
such as the one in the quote:  speaking as if the idea of reward
assignment was a fantastic one, and as if the neuroscience discovery of
a possible mechanism really meant anything.  The neuroscience discovery
was bound to collapse:  I said as much the first time I heard of
it, and I am glad that it has now happened so quickly.  The depressing
part is that the folks who showed it to be wrong think that they can
still tinker with the mechanism and salvage something out of it.


I think they do this because they haven't found a better hypothesis
and have too much invested in the previous status quo. I'd be curious
to know whether your hypothesis for a motivation system has the potential
for the same simple signal, given to systems with different histories,
to cause each system to attempt to get that signal again (addiction
being the purest example of this). This is one of the important
phenomena I require a motivational system to explain.

 Will Pearson
