Re: [agi] dopamine and reward prediction error
William Pearson wrote:

On 13/04/07, Richard Loosemore [EMAIL PROTECTED] wrote:

To convey this subtlety as simply as I can, I would suggest that you ask yourself how much intelligence is being assumed in the preprocessing system that does the work of (a) picking out patterns to be considered by the system, and (b) picking the particular patterns that are to be rewarded, according to some success criterion. Here is the problem: if you are not careful you will assume MORE intelligence in the preprocessor than you were hoping to get the core of the system to learn. There are other issues, but that is one of the main ones.

For the record, I agree with this critique of some of the neuroscience views of reinforcement learning in the brain.

What I find tremendously frustrating is the fact that people are still so dismally unaware of these issues that they come out with statements such as the one in the quote: speaking as if the idea of reward assignment was a fantastic idea, and as if the neuroscience discovery of a possible mechanism really meant anything. The neuroscience discovery was bound to collapse: I said as much the first time I heard of it, and I am glad that it has now happened so quickly. The depressing part is that the folks who showed it to be wrong think that they can still tinker with the mechanism and salvage something out of it.

I think they do this because they haven't found a better hypothesis and have too much invested in the previous status quo.

I'd be curious to know whether your hypothesis for a motivation system has the potential for the same simple signal, given to systems with different histories, to cause the system to attempt to get that signal again (addiction being the pure example of this). This is one of the important phenomena I require a motivational system to explain.

This is an interesting question. Addiction is clearly a pathology of the human motivational system, but the explanation for it could lie at a number of levels. Maybe the system is designed in such a way that a high-level imbalance occurs in some cases ... and then again, in other cases, it might be caused by a low-level problem.

Example at the high level: suppose the system is designed to work by means of a checks-and-balances mechanism, where too much of a given type of desirable activity starts to cause the reward for that activity to decrease, making the system susceptible to new ideas for what it would like to do. The default settings for these habituation effects would vary between individuals, but overall we might expect some activities to be tolerated at very high levels, because whenever a normal environment provides that source of activity, it always stops supplying the activity before the person can get enough of it.

Example: playing computer games. When the natural environment supplies a fascinating game situation, there are great benefits to getting good at it, so the system is wired to let the human get as much of the game playing as it wants, but only if the game is very complex and engaging. Games that involve a quick series of challenges from the environment (usually another human being) are so rare that the human system basically says: get as much of this as you possibly can, because it is rare and you can never get enough of it. But when the system says "no limits on this activity," it does not take account of the invention of computer games, which can go on indefinitely.

Thus there is the possibility of addiction, because the design of the system wrongly assumed that the environment would never allow the human to get an infinite amount of this activity.

Example at the low level: chemical addiction can cause one part of the brain to generate vast quantities of (a) novel stimuli (e.g. hallucinogens) or (b) satisfaction signals attached to an arbitrary activity (e.g. drugs that cause people to feel exceptionally good for no reason whatsoever). In both of these cases the wiring of the system has been subverted, producing a situation where the normal process starts to require the drug, because the satisfaction caused by the drug is greater than that of any other activity.

Clearly, these are very simple types of explanation. Quite mundane, really. The problem with explaining any given addiction is that we cannot yet be sure at which level it is happening, so good explanations will have to wait until we can do more advanced work.

Stepping back a bit from your question: note that my main interest is in building artificial motivation systems inspired by the human system, so the goal of precisely explaining breakdowns of the human system is not directly in my sights. Subtle difference, but a big one.

Richard Loosemore
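The high-level habituation idea above can be made concrete with a minimal sketch in Python. All of the activity names, parameter values, and update rules below are invented purely for illustration; they are not part of any actual proposed motivation system.

# Illustrative sketch only: a toy "checks and balances" motivation system in
# which repeating an activity lowers its effective reward (habituation), while
# unused activities slowly recover. Names and numbers are assumptions.

class ToyMotivationSystem:
    def __init__(self, activities):
        # activities: name -> (base_reward, habituation_rate)
        self.activities = activities
        self.habituation = {name: 0.0 for name in activities}  # 0 = fresh, 1 = sated

    def effective_reward(self, name):
        base, _ = self.activities[name]
        return base * (1.0 - self.habituation[name])

    def step(self, available):
        # Choose whatever currently looks most rewarding, then update habituation:
        # the chosen activity habituates, everything else slowly recovers.
        chosen = max(available, key=self.effective_reward)
        for name, (_, rate) in self.activities.items():
            if name == chosen:
                self.habituation[name] = min(1.0, self.habituation[name] + rate)
            else:
                self.habituation[name] = max(0.0, self.habituation[name] - 0.05)
        return chosen

# "game" is highly rewarding and habituates very slowly: the built-in
# assumption that the environment will withdraw it long before satiation.
system = ToyMotivationSystem({"game": (1.0, 0.01),
                              "food": (0.8, 0.30),
                              "social": (0.7, 0.30)})

# When the game is always on offer (an unbounded supply the design never
# anticipated), it wins the choice on every step: a toy analogue of addiction.
for t in range(15):
    print(t, system.step(["game", "food", "social"]))

The point of the sketch is only that the addictive outcome arises from a default setting (the near-zero habituation rate) interacting with an environment the design never anticipated, not from anything special about the reward signal itself.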
[agi] dopamine and reward prediction error
http://scienceblogs.com/developingintelligence/2007/04/the_death_of_a_beautiful_theor.php

The Death of a Beautiful Theory? Dopamine And Reward Prediction Error

Category: Artificial Intelligence • Cognitive Neuroscience • Computational Modeling
Posted on: April 11, 2007 12:07 PM, by Chris Chatham

Very early in the history of artificial intelligence research, it was apparent that cognitive agents needed to be able to maximize reward by changing their behavior. But this leads to a credit-assignment problem: how does the agent know which of its actions led to the reward? An early solution was to select the behavior with the maximal predicted reward, and to later adjust the likelihood of that behavior according to whether it ultimately led to the anticipated reward. These temporal-difference errors in reward prediction were first implemented in a 1950's checker-playing program, before exploding in popularity some 30 years later.

This repopularization seemed to originate from a tantalizing discovery: the brain's most ancient structures were releasing dopamine in exactly the way predicted by temporal-difference learning algorithms. Specifically, dopamine release in the ventral tegmental area (VTA) decreased in response to stimuli that were repeatedly paired without a reward - as though dopamine levels dipped to signal the overprediction (and under-delivery) of a reward. Secondly, dopamine release abruptly spikes in response to stimuli that are suddenly paired with a reward - as though dopamine is signaling the underprediction (and over-delivery) of a reward. Finally, when a previously-rewarded stimulus is no longer rewarded, dopamine levels dip, again suggesting overprediction and under-delivery of reward.

Thus, a beautiful computational theory was garnering support from some unusually beautiful data in neuroscience. Dopamine appeared to rise for items that predicted a reward, to drop for items that predicted an absence of reward, and to show no response to neutral stimuli. But as noted by Thomas Huxley, in science many a beautiful theory has been destroyed by an ugly fact. These ugly facts are presented in Redgrave and Gurney's new NRN article that is circulating the field of computational neuroscience. Among the ugliest:

1) Dopamine spikes in response to novel items which have never been paired with reward, and thus have no predictive value.

2) The latency and duration of dopamine spikes are constant across species, experiments, stimulus modality and stimulus complexity. In contrast, reward prediction should take longer to establish in some situations than others - for example, reward prediction may be slower for more complex stimuli.

3) The dopamine signal actually occurs before animals have even been able to fixate on a stimulus - this calls into question the extent to which the signal is mechanistically capable of serving the reward prediction error function.

4) VTA dopamine neurons fire simultaneously with (and possibly even before) the completion of object recognition in the infero-temporal cortex, and simultaneously with visual responses in the striatum and subthalamic nucleus. It seems unlikely that the VTA can perform both object recognition and the reward prediction error computation.

5) The most likely source of the visual signal to these VTA neurons may be the superior colliculus, a region that is sensitive to spatial changes but not to the kind of information that would be involved in object processing per se.

6) Many of the experiments showing the apparent dopaminergic coding of reward prediction error used stimuli that differed not only in reward value but also in spatial location. Therefore, data in support of reward prediction error are confounded with hypotheses involving spatial selectivity.

Redgrave and Gurney suggest that VTA dopamine neurons fire too quickly and with too little detailed visual input to actually accomplish the calculation of errors in reward prediction. They advocate an alternative theory in which temporal prediction is still key, but instead of encoding reward prediction, dopamine neurons are actually signalling the reinforcement of actions/movements that immediately precede a biologically salient event. To understand this claim, consider Redgrave and Gurney's point that most temporally unexpected transient events in nature are also spatially unpredictable. The theory is basically that a system notes its own uncertainty, via the spatial reorientation in the superior colliculus, and attempts to reduce that uncertainty by pairing a running record of previous movements with the unexpected event.

Although this alternative theory is intriguing, there is not an abundance of evidence supporting it: it seems to me more like a pastiche of fragments from the apparently broken reward prediction error hypothesis. We should also be cautious in discarding any theory as powerful as the reward prediction error hypothesis on the basis of null evidence: in this case, we simply don't know how reward prediction error could be calculated so quickly. This kind
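For readers who have not seen the algorithm the article describes, here is a minimal temporal-difference sketch of the reward prediction error that dopamine was claimed to encode. The single cue-then-reward setup, the variable names, and the numbers are simplifying assumptions made for illustration; they are not taken from the article or from the original experiments.

# Minimal sketch of a temporal-difference reward prediction error, reduced to
# a single cue -> reward association. All values here are illustrative.

alpha = 0.1    # learning rate
V_cue = 0.0    # how much reward the cue is currently predicted to deliver

def trial(rewarded):
    """One trial: a cue appears unexpectedly, then a reward is (or is not) delivered.
    Returns the prediction errors at the cue and at the outcome - the quantities
    that dopamine bursts and dips were claimed to encode."""
    global V_cue
    # The cue's own onset is unpredicted, so the error at the cue is simply
    # however much reward the cue has come to predict.
    delta_at_cue = V_cue - 0.0
    # At the outcome, the error is the delivered reward minus the cue's prediction.
    r = 1.0 if rewarded else 0.0
    delta_at_outcome = r - V_cue
    V_cue += alpha * delta_at_outcome
    return delta_at_cue, delta_at_outcome

# Before training: no response to the (still neutral) cue, a positive error at the reward.
print("first trial:   ", trial(rewarded=True))

# After repeated pairings: the positive error has moved to the reward-predicting cue,
# and the fully predicted reward itself produces essentially no error.
for _ in range(200):
    trial(rewarded=True)
print("after training:", trial(rewarded=True))

# Extinction: omitting the predicted reward yields a negative error - the "dip"
# for an over-predicted, under-delivered reward.
print("omitted reward:", trial(rewarded=False))

The three printed cases reproduce the three patterns the dopamine recordings appeared to match - a burst at an unpredicted reward, a burst at a reward-predicting cue, and a dip when a predicted reward is withheld - which are exactly the observations that points 1) through 6) above call into question.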
Re: [agi] dopamine and reward prediction error
This is actually a good illustration of how people can become obsessed with microtheories because they don't have a comprehensive, broad-based understanding of the problem. The beautiful theory is actually not a beautiful theory at all! It was complete nonsense right from the beginning. The whole idea of reward assignment contains some very subtle traps having to do with what defines the reward.

To convey this subtlety as simply as I can, I would suggest that you ask yourself how much intelligence is being assumed in the preprocessing system that does the work of (a) picking out patterns to be considered by the system, and (b) picking the particular patterns that are to be rewarded, according to some success criterion. Here is the problem: if you are not careful you will assume MORE intelligence in the preprocessor than you were hoping to get the core of the system to learn. There are other issues, but that is one of the main ones. The behaviorists founded an entire 'science' on this mistake.

What I find tremendously frustrating is the fact that people are still so dismally unaware of these issues that they come out with statements such as the one in the quote: speaking as if the idea of reward assignment was a fantastic idea, and as if the neuroscience discovery of a possible mechanism really meant anything. The neuroscience discovery was bound to collapse: I said as much the first time I heard of it, and I am glad that it has now happened so quickly. The depressing part is that the folks who showed it to be wrong think that they can still tinker with the mechanism and salvage something out of it.

So long as this field (the general field of AI/Psychology/Neuroscience) is populated with people who have narrow perspectives, and who keep repeating mistakes and running around in circles, we are going to get absolutely nowhere.

That makes me think of something else I meant to say, but I will put that in a separate message.

Richard Loosemore

Eugen Leitl wrote:

http://scienceblogs.com/developingintelligence/2007/04/the_death_of_a_beautiful_theor.php

The Death of a Beautiful Theory? Dopamine And Reward Prediction Error

Category: Artificial Intelligence • Cognitive Neuroscience • Computational Modeling
Posted on: April 11, 2007 12:07 PM, by Chris Chatham

Very early in the history of artificial intelligence research, it was apparent that cognitive agents needed to be able to maximize reward by changing their behavior. But this leads to a credit-assignment problem: how does the agent know which of its actions led to the reward? An early solution was to select the behavior with the maximal predicted reward, and to later adjust the likelihood of that behavior according to whether it ultimately led to the anticipated reward. These temporal-difference errors in reward prediction were first implemented in a 1950's checker-playing program, before exploding in popularity some 30 years later.

This repopularization seemed to originate from a tantalizing discovery: the brain's most ancient structures were releasing dopamine in exactly the way predicted by temporal-difference learning algorithms. Specifically, dopamine release in the ventral tegmental area (VTA) decreased in response to stimuli that were repeatedly paired without a reward - as though dopamine levels dipped to signal the overprediction (and under-delivery) of a reward. Secondly, dopamine release abruptly spikes in response to stimuli that are suddenly paired with a reward - as though dopamine is signaling the underprediction (and over-delivery) of a reward. Finally, when a previously-rewarded stimulus is no longer rewarded, dopamine levels dip, again suggesting overprediction and under-delivery of reward.

Thus, a beautiful computational theory was garnering support from some unusually beautiful data in neuroscience. Dopamine appeared to rise for items that predicted a reward, to drop for items that predicted an absence of reward, and to show no response to neutral stimuli. But as noted by Thomas Huxley, in science many a beautiful theory has been destroyed by an ugly fact. These ugly facts are presented in Redgrave and Gurney's new NRN article that is circulating the field of computational neuroscience. Among the ugliest:

1) Dopamine spikes in response to novel items which have never been paired with reward, and thus have no predictive value.

2) The latency and duration of dopamine spikes are constant across species, experiments, stimulus modality and stimulus complexity. In contrast, reward prediction should take longer to establish in some situations than others - for example, reward prediction may be slower for more complex stimuli.

3) The dopamine signal actually occurs before animals have even been able to fixate on a stimulus - this calls into question the extent to which the signal is mechanistically capable of serving the reward prediction error function.

4) VTA dopamine neurons fire
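To make the point about the preprocessor concrete, here is a deliberately caricatured sketch. Everything in it (the features, the actions, the reward rule) is hypothetical and invented for illustration, not drawn from any real system or from the article under discussion. The "core learner" is a trivial reward-weighted tally, while the hand-written preprocessing quietly supplies the real intelligence by deciding which patterns exist and which of them count as success.

# Caricature only (hypothetical names throughout): the learner is trivial;
# the hand-written preprocessor and reward function carry the intelligence.

from collections import defaultdict

def preprocess(raw_observation):
    # (a) Pick out the patterns the learner is allowed to consider.
    # Every choice here - which features, at what granularity - is a design
    # decision made by the human, not something the system discovered.
    return {"prey_nearby": raw_observation.get("distance_to_prey", 99) < 5,
            "is_dark": raw_observation.get("light_level", 1.0) < 0.2}

def reward(raw_observation, action):
    # (b) Decide which patterns and actions count as success. The success
    # criterion is likewise supplied from outside the learner.
    return 1.0 if (action == "pounce" and
                   raw_observation.get("distance_to_prey", 99) < 5) else 0.0

# The "core learner": tally reward per (pattern, action) pair and pick the
# best-scoring action. Almost nothing is learned here that was not already
# built into preprocess() and reward() above.
values = defaultdict(float)
ACTIONS = ["pounce", "wait", "flee"]

def act(features):
    key = tuple(sorted(features.items()))
    return max(ACTIONS, key=lambda a: values[(key, a)])

def update(features, action, r):
    key = tuple(sorted(features.items()))
    values[(key, action)] += r

# One step of the loop: observe, preprocess, act, collect reward, update.
obs = {"distance_to_prey": 3, "light_level": 0.9}
feats = preprocess(obs)
chosen = act(feats)
update(feats, chosen, reward(obs, chosen))
print(chosen, dict(values))

If this toy system ever "learns" to pounce on nearby prey, the credit belongs almost entirely to whoever wrote preprocess() and reward(), which is the trap described in the critique above.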
Re: [agi] dopamine and reward prediction error
On 13/04/07, Richard Loosemore [EMAIL PROTECTED] wrote:

To convey this subtlety as simply as I can, I would suggest that you ask yourself how much intelligence is being assumed in the preprocessing system that does the work of (a) picking out patterns to be considered by the system, and (b) picking the particular patterns that are to be rewarded, according to some success criterion. Here is the problem: if you are not careful you will assume MORE intelligence in the preprocessor than you were hoping to get the core of the system to learn. There are other issues, but that is one of the main ones.

For the record, I agree with this critique of some of the neuroscience views of reinforcement learning in the brain.

What I find tremendously frustrating is the fact that people are still so dismally unaware of these issues that they come out with statements such as the one in the quote: speaking as if the idea of reward assignment was a fantastic idea, and as if the neuroscience discovery of a possible mechanism really meant anything. The neuroscience discovery was bound to collapse: I said as much the first time I heard of it, and I am glad that it has now happened so quickly. The depressing part is that the folks who showed it to be wrong think that they can still tinker with the mechanism and salvage something out of it.

I think they do this because they haven't found a better hypothesis and have too much invested in the previous status quo.

I'd be curious to know whether your hypothesis for a motivation system has the potential for the same simple signal, given to systems with different histories, to cause the system to attempt to get that signal again (addiction being the pure example of this). This is one of the important phenomena I require a motivational system to explain.

Will Pearson