Re: [agi] [Science Daily] Our Unconscious Brain Makes The Best Decisions Possible
On Thu, Jan 1, 2009 at 3:05 PM, Jim Bromer jimbro...@gmail.com wrote:

On Mon, Dec 29, 2008 at 4:02 PM, Richard Loosemore r...@lightlink.com wrote:
My friend Mike Oaksford in the UK has written several papers giving a higher-level cognitive theory that says that people are, in fact, doing something like Bayesian estimation when they make judgments. In fact, people are very good at being Bayesians, contra the loud protests of the "I Am A Bayesian Rationalist" crowd, who think they were the first to do it.
Richard Loosemore

That sounds like an easy hypothesis to test. Except for a problem. Previous learning would be relevant to the solving of the problems and would produce results that could not be totally accounted for. Complexity, in the complicated sense of the term, is relevant to this problem, both in how previous learning might influence decision making and in the possible (likely) complexity of the process of judgment itself.

If extensive tests showed that people overwhelmingly made judgments that were Bayesianesque, then this conjecture would be important. The problem is that, since the numerous possible influences of previous learning have to be ruled out, I would suspect that any test for Bayesian-like reasoning would have to be kept so simple that it would not add anything new to our knowledge. If judgment were that simple, most of the programmers on this list would have really great AGI programs by now, because simple weighted decision making is really easy to program. The problem occurs when you realize that it is just not that easy. I think Anderson was the first to advocate weighted decision making in AI, and my recollection is that he was writing his theories back in the 1970's.
Jim Bromer

One other thing. My interest in studies of cognitive science is in how the results of some study might be related to advanced AI, what is called AGI in this group. The use of weighted reasoning seems attractive, and if these kinds of methods do actually conform to some cognitive processes, then that would be a tremendous justification for their use in AGI projects - along with the other methods that would be necessary to actually simulate or produce conceptually integrated judgement. But one of the major design problems with tests that use statistical methods to demonstrate that some cognitive function of reasoning conforms with statistical processes is that, since the artifacts of the statistical method itself may obscure the results, the design of the sample has to be called into question and the proposition restudied using other design models capable of accounting for possible sources of artifact error.
Jim Bromer
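For readers wondering what "simple weighted decision making" looks like in code, here is a minimal sketch in Python. The hypotheses, weights, and evidence are invented for illustration; this is not anyone's AGI design, just the trivial core that the hard problems (where the weights and hypotheses come from) sit on top of.

# Minimal sketch of weighted (Bayesian-style) decision making.
# All numbers below are made up for illustration.

def posterior(priors, likelihoods, observations):
    """Multiply prior weights by evidence weights, then normalize."""
    scores = dict(priors)
    for h in scores:
        for e in observations:
            scores[h] *= likelihoods[h].get(e, 0.01)  # small floor for unmodeled evidence
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

priors = {"rain": 0.3, "no_rain": 0.7}
likelihoods = {
    "rain":    {"dark_clouds": 0.8, "low_pressure": 0.7},
    "no_rain": {"dark_clouds": 0.2, "low_pressure": 0.4},
}

beliefs = posterior(priors, likelihoods, ["dark_clouds", "low_pressure"])
decision = max(beliefs, key=beliefs.get)  # pick the highest-weighted hypothesis
print(beliefs, decision)                  # "rain" ends up at roughly 0.75

The loop really is that easy to write; the difficulty the thread points at is everything around it: choosing the hypotheses, learning the weights from experience, and deciding which evidence is even relevant.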
Re: [agi] [Science Daily] Our Unconscious Brain Makes The Best Decisions Possible
I did not mean to direct this criticism at any one study or any one person. Not only can the design of a study be questioned on the basis of whether or not the question tends to lead to the kind of results that the study purports to show, but the methods of the analysis can also leave artifacts or other subtle influences on the results. This goes not only for statistical studies; it could be found in logical studies, numerical studies, linguistic studies, image-based studies and so on. OK, this isn't news, but some people haven't learned it yet.
Jim Bromer
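One generic way to account for one source of artifact error is to rerun the same analysis on data in which any real correspondence has been deliberately destroyed, and see whether the method still "finds" the effect. The sketch below is not a reconstruction of any study mentioned here; the judgment data, the ideal-Bayesian predictions, and the fit statistic are all invented for illustration.

import random

# Hypothetical data: judgments[i] is a subject's probability estimate on item i,
# bayes_pred[i] is what an ideal Bayesian observer would answer on the same item.
judgments  = [0.72, 0.40, 0.91, 0.35, 0.66, 0.58, 0.80, 0.25]
bayes_pred = [0.70, 0.45, 0.88, 0.30, 0.60, 0.55, 0.85, 0.20]

def fit(xs, ys):
    """Mean squared error: lower means the judgments look more Bayesian."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

observed = fit(judgments, bayes_pred)

# Null distribution: shuffle which judgment goes with which item, destroying any
# genuine correspondence.  If shuffled data fit almost as well, the apparent
# "Bayesian-like" result may be an artifact of the method rather than the subjects.
null_fits = []
for _ in range(10000):
    shuffled = judgments[:]
    random.shuffle(shuffled)
    null_fits.append(fit(shuffled, bayes_pred))

p_value = sum(f <= observed for f in null_fits) / len(null_fits)
print("observed MSE = %.4f, permutation p = %.4f" % (observed, p_value))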
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Abram,

Oh dammitall, I'm going to have to expose the vast extent of my profound ignorance to respond. Oh well...

On 1/1/09, Abram Demski abramdem...@gmail.com wrote:
Steve, Sorry for not responding for a little while. Comments follow:

PCA attempts to isolate components that give maximum information... so my question to you becomes, do you think that the problem you're pointing towards is suboptimal models that don't predict the data well enough, or models that predict the data fine but aren't directly useful for what you expect them to be useful for?

Since prediction is NOT the goal, but rather just a useful measure, I am only interested in recognizing that which can be recognized, and NOT in expending resources on understanding semi-random noise. Further, since compression is NOT my goal, I am not interested in combining features in ways that minimize the number of components. In short, there is a lot to be learned from PCA, but a perfect PCA solution is likely a less-than-perfect NN solution.

What I am saying is this: a good predictive model will predict whatever is desired. Unsupervised learning attempts to find such a model. But a good predictive model will probably predict lots of stuff we aren't particularly interested in, so supervised methods have been invented to predict single variables when those variables are of interest. Still, in principle, we could use unsupervised methods. Furthermore (as I understand it), if we are dealing with lots of variables and believe deep patterns are present, unsupervised learning can outperform supervised learning by grabbing onto patterns that may ultimately lead to the desired result, which supervised learning would miss because no immediate value was evident. But, anyway, my point is that I can only see two meanings for the word "goodness":
--usefulness in predicting the data as a whole
--usefulness in predicting reward in particular (the real goal)

I'm still hung up on "predicting", which may indeed be the best measure of value, but AGI efforts need understanding, which is subtly different. OK, so what is the difference? The tree of reality has many branches in the future - there are many possible futures. Understanding is the process of keeping track of which branch you are on, while predicting is taking shots at which branch will prevail. One may necessarily involve the other. Has anyone thought this through yet?

(Actually, I can think of a third: usefulness in *getting* reward (i.e., motor control). But I feel adding that to the discussion would be premature... there are interesting issues, but they are separate from the ones being discussed here...)

To that end... you weren't talking about using the *predictions* of the PCA model, but rather the principal components themselves. The components are essentially hidden variables to make the model run.

... or variables smushed together in ways that may work well for compression, but poorly for recognition.

What are the variables that you keep worrying might be smushed together? Can you give an example?

I thought I could, but then I ran into problems as you discussed below.

If PCA smushes variables together, that suggests 1 of 3 things:
--PCA found suboptimal components

Here, I am hung up on "found". This implies a multitude of solutions, yet there are guys out there who are beating on the matrix manipulations to solve PCA. Is this like non-zero-sum game theory, where there can be many solutions, some better than others?

--PCA found optimal components, but the hidden variables that got smooshed really are functionally equivalent (when looked at through the lens of the available visible variables)

Here, I am hung up on "functionally". This presumes supervised learning or divine observation.

--The true probabilistic situation violates the probabilistic assumptions behind PCA

The third option is by far the most probable, I think.

That's where I got stuck trying to come up with an example.

or in an attempt to complexify the model to make it more accurate in its predictions, by looking for links between the hidden variables, or patterns over time, et cetera.

Setting predictions aside, the next layer of PCA-like neurons would be looking for those links. Absolutely.

More on my ignorance... I and PCA hadn't really connected until a few months ago, when I attended a computer conference and listened to several presentations. The (possibly false, at least in some instances) impression I got was that the presenters didn't really understand some/many of the components that they were finding. One video compression presenter did identify the first few, but admittedly failed to identify later components. I can see that this process necessarily involves a tiny amount of a priori information, specifically, knowledge of:
1. The physical extent of features, e.g. as controlled by mutual inhibition.
2. The
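On the question of what "found" means for PCA: in the standard formulation there is essentially one answer, not a family of competing solutions. The components are the eigenvectors of the data covariance matrix, ordered by explained variance and unique up to sign whenever the eigenvalues are distinct, so the "matrix manipulations" compute a fixed decomposition rather than search among many solutions. A minimal sketch with NumPy and invented toy data:

import numpy as np

# Toy data, invented for illustration: 200 samples of 3 observed variables
# that are all noisy views of one underlying factor.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 1))
X = hidden @ np.array([[2.0, 1.0, -0.5]]) + 0.1 * rng.normal(size=(200, 3))

# PCA: eigen-decomposition of the covariance matrix of the centered data.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order

order = np.argsort(eigvals)[::-1]           # sort by explained variance, descending
components = eigvecs[:, order].T            # rows are the principal components
explained = eigvals[order] / eigvals.sum()
print(components[0])                        # points along the [2, 1, -0.5] direction (up to sign)
print(explained)                            # the first component carries nearly all the variance

Ambiguity appears only when two eigenvalues are (nearly) equal; then the corresponding components can be rotated freely within their subspace without changing the fit, which is closer to the many-solutions situation asked about above.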
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve,

I'm thinking that you are taking "understanding" to mean something like identifying the *actual* hidden variables responsible for the pattern, and finding the *actual* state of that variable. Probabilistic models instead *invent* hidden variables that happen to help explain the data. Is that about right?

If so, then explaining what I mean by "functionally equivalent" will help. Here is an example: suppose that we are looking at data concerning a set of chemical experiments. Suppose that the experimental conditions are not very well controlled, so that interesting hidden variables are present. Suppose that two of these are temperature and air pressure, but that the two have the same effect on the experiment. Then the unsupervised learning will have no way of distinguishing between the two, so it will only find one hidden variable representing them. So, they are functionally equivalent. This implies that, in the absence of further information, the best thing we can do to try to understand the data is to probabilistically model it.

Or perhaps when you say "understanding" it is short for "understanding the implications of", i.e., in an already-present model. In that case, perhaps we could separate the quality of predictions from the speed of predictions. A complicated-but-accurate model is useless if we can't calculate the information we need quickly enough. So, we also want an understandable model: one that doesn't take too long to create predictions. This would be different from looking for the best probabilistic model in terms of prediction accuracy. On the other hand, it is irrelevant in (practically?) all neural-network-style approaches today, because the model size is fixed anyway.

If the output is being fed to humans rather than further along the network, as in the conference example, the situation is very different. Human-readability becomes an issue. This paper is a good example of an approach that creates better human-readability rather than better performance: http://www.stanford.edu/~hllee/nips07-sparseDBN.pdf The altered algorithm also seems to have a performance that matches more closely with statistical analysis of the brain (which was the research goal), suggesting a correlation between human-readability and actual performance gains (since the brain wouldn't do it if it were a bad idea). In a probabilistic framework this is represented best by a prior bias for simplicity.

--Abram
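The temperature/pressure example can be made concrete with a few lines of NumPy (the data below is invented): when two hidden causes act on the observations through the same weights, only their sum is recoverable, and PCA reports a single dominant component rather than two.

import numpy as np

# Invented example: two hidden causes with identical effects on four measurements.
rng = np.random.default_rng(1)
n = 500
temperature = rng.normal(size=n)
pressure = rng.normal(size=n)

loadings = np.array([1.0, -0.5, 2.0, 0.3])    # same loading vector for both causes
X = np.outer(temperature + pressure, loadings) + 0.05 * rng.normal(size=(n, 4))

# PCA on the centered data: one component carries essentially all the variance.
Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # descending order
print(eigvals / eigvals.sum())   # approximately [0.99..., ~0, ~0, ~0]

Any split of that one recovered variable into a "temperature part" and a "pressure part" fits the data equally well, which is exactly the sense in which the two are functionally equivalent given only these observations.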
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Hey. I didn't even like this thread. I'll be right back.