Re: [agi] [Science Daily] Our Unconscious Brain Makes The Best Decisions Possible

2009-01-02 Thread Jim Bromer
On Thu, Jan 1, 2009 at 3:05 PM, Jim Bromer jimbro...@gmail.com wrote:
 On Mon, Dec 29, 2008 at 4:02 PM, Richard Loosemore r...@lightlink.com wrote:
 My friend Mike Oaksford in the UK has written several papers giving a
 higher-level cognitive theory which says that people are, in fact, doing
 something like Bayesian estimation when they make judgments.  In fact,
 people are very good at being Bayesians, contra the loud protests of the
 "I Am A Bayesian Rationalist" crowd, who think they were the first to do it.
 Richard Loosemore

 That sounds like an easy hypothesis to test, except for one problem:
 previous learning would be relevant to solving the test problems and
 would produce results that could not be fully accounted for.
 Complexity, in the complicated sense of the term, is relevant here,
 both in how previous learning might influence decision making and in
 the possible (likely) complexity of the judgment process itself.

 If extensive tests showed that people overwhelmingly made judgments
 that were Bayesianesque, then this conjecture would be important.  The
 problem is that, since the numerous possible influences of previous
 learning have to be ruled out, I suspect that any test for
 Bayesian-like reasoning would have to be kept so simple that it would
 not add anything new to our knowledge.

 If judgment were that simple, most of the programmers on this list would
 have really great AGI programs by now, because simple weighted
 decision making is really easy to program.  The problem occurs when
 you realize that it is just not that easy.

 I think Anderson was the first to advocate weighted decision making in
 AI, and my recollection is that he was writing up his theories back in
 the 1970s.

 Jim Bromer

One other thing.  My interest in studies of cognitive science is in how
the results of a given study might be related to advanced AI, what is
called AGI in this group.  The use of weighted reasoning seems
attractive, and if these kinds of methods do actually conform to some
cognitive processes, then that would be a tremendous justification for
their use in AGI projects - along with the other methods that would be
necessary to actually simulate or produce conceptually integrated
judgment.
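
To make the "really easy to program" part concrete - the sketch below is
purely illustrative, with invented cues and hand-picked weights, and it is
nobody's actual proposal - simple weighted (log-odds style) decision making
takes only a few lines of Python:

import math

# Invented cues with hand-picked log-odds weights; purely illustrative.
WEIGHTS = {"cue_a": 1.2, "cue_b": -0.8, "cue_c": 0.4}
PRIOR_LOG_ODDS = 0.0  # a 50/50 prior on the hypothesis

def decide(observed):
    """Add up weighted evidence and return P(hypothesis | observed cues)."""
    log_odds = PRIOR_LOG_ODDS
    for cue, present in observed.items():
        if present:
            log_odds += WEIGHTS.get(cue, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))  # logistic squashing

print(decide({"cue_a": True, "cue_b": True, "cue_c": False}))

The hard part is everything around this loop - deciding what the cues are,
where the weights come from, and how they change with context - not the
loop itself.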

But one of the major design problems with tests that use statistical
methods to demonstrate that some cognitive function of reasoning
conforms to statistical processes is that the artifacts of the
statistical method itself may obscure the results.  The design of the
sample therefore has to be called into question, and the proposition
restudied using other study designs capable of accounting for possible
sources of artifact error.
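
One purely illustrative way to check for that kind of artifact is a null
simulation: apply the same analysis to data that contains no real effect
and see how strong a result the method reports anyway.  For example, a
regression with many candidate influences and only a few trials looks
like a good fit even on pure noise.  A short Python sketch (numpy assumed
available; all numbers invented):

import numpy as np

rng = np.random.default_rng(0)
n_trials, n_predictors = 20, 15          # small sample, many candidate influences
X = rng.standard_normal((n_trials, n_predictors))
y = rng.standard_normal(n_trials)        # judgments that are pure noise

# Ordinary least-squares fit of the model to the noise.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - np.sum((y - X @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 2))   # typically a deceptively high fit on pure noise
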
Jim Bromer




Re: [agi] [Science Daily] Our Unconscious Brain Makes The Best Decisions Possible

2009-01-02 Thread Jim Bromer
I did not mean to direct this criticism at any one study or any one
person.  Not only can the design of a study be questioned on the basis
of whether the question tends to lead to the kind of results that the
study purports to show, but the methods of analysis can also leave
artifacts or other subtle influences on the results.  This goes not
only for statistical studies but also for logical studies, numerical
studies, linguistic studies, image-based studies, and so on.  OK, this
isn't news, but some people haven't taken it in yet.
Jim Bromer




Re: [agi] Introducing Steve's Theory of Everything in cognition.

2009-01-02 Thread Steve Richfield
Abram,

Oh dammitall, I'm going to have to expose the vast extent of my
profound ignorance to respond. Oh well...

On 1/1/09, Abram Demski abramdem...@gmail.com wrote:

 Steve,

 Sorry for not responding for a little while. Comments follow:

 
  PCA attempts to isolate components that give maximum
  information... so my question to you becomes, do you think that the
  problem you're pointing towards is suboptimal models that don't
  predict the data well enough, or models that predict the data fine but
  aren't directly useful for what you expect them to be useful for?
 
 
  Since prediction is NOT the goal, but rather just a useful measure, I am
  only interested in recognizing that which can be recognized, and NOT in
  expending resources on understanding semi-random noise.
  Further, since compression is NOT my goal, I am not interested in
  combining features in ways that minimize the number of components. In
  short, there is a lot to be learned from PCA, but a perfect PCA solution
  is likely a less-than-perfect NN solution.

 What I am saying is this: a good predictive model will predict
 whatever is desired. Unsupervised learning attempts to find such a
 model. But, a good predictive model will probably predict lots of
 stuff we aren't particularly interested in, so supervised methods have
 been invented to predict single variables when those variables are of
 interest. Still, in principle, we could use unsupervised methods.
 Furthermore (as I understand it), if we are dealing with lots of
 variables and believe deep patterns are present, unsupervised learning
 can outperform supervised learning by grabbing onto patterns that may
 ultimately lead to the desired result, which supervised learning would
 miss because no immediate value was evident. But, anyway, my point is
 that I can only see two meanings for the word "goodness":

 --usefulness in predicting the data as a whole
 --usefulness in predicting reward in particular (the real goal)


I'm still hung up on "predicting," which may indeed be the best measure of
value, but AGI efforts need "understanding," which is subtly different. OK, so
what is the difference?

The tree of reality has many branches in the future - there are many
possible futures. Understanding is the process of keeping track of which
branch you are on, while predicting is taking shots at which branch will
prevail. One may necessarily involve the other. Has anyone thought
this through yet?

 (Actually, I can think of a third: usefulness in *getting* reward (i.e.,
 motor control). But, I feel adding that to the discussion would be
 premature... there are interesting issues, but they are separate from
 the ones being discussed here...)

 
  To that end... you weren't talking about using the *predictions* of
  the PCA model, but rather the principal components themselves. The
  components are essentially hidden variables to make the model run.
 
 
  ... or variables smushed together in ways that may work well for
  compression, but poorly for recognition.

 What are the variables that you keep worrying might be smushed
 together? Can you give an example?


I thought I could, but then I ran into problems as you discussed below.

If PCA smushes variables together,
 that suggests 1 of 3 things:

 --PCA found suboptimal components


Here, I am hung up on "found." This implies a multitude of solutions, yet
there are guys out there who are beating on the matrix manipulations to
solve PCA. Is this like non-zero-sum game theory, where there can be many
solutions, some better than others?
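
For concreteness - and purely as an illustration, not a claim about any
particular implementation - the matrix manipulation in question is an
eigendecomposition of the data covariance, and with distinct eigenvalues
the components it returns are unique up to sign and ordering.  A minimal
numpy sketch:

import numpy as np

def pca(data, k):
    """Plain PCA via eigendecomposition of the sample covariance (illustrative)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come out in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort by descending variance
    return eigvals[order][:k], eigvecs[:, order[:k]]

rng = np.random.default_rng(1)
data = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))
variances, components = pca(data, k=3)
print(variances)

If that is right, any multitude of solutions would have to come from the
modeling assumptions wrapped around PCA rather than from the decomposition
itself.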

--PCA found optimal components, but the hidden variables that got
 smooshed really are functionally equivalent (when looked at through
 the lens of the available visible variables)


Here, I am hung up on "functionally." This presumes supervised learning or
divine observation.

--The true probabilistic situation violates the probabilistic
 assumptions behind PCA

 The third option is by far the most probable, I think.


That's where I got stuck trying to come up with an example.


  or in an attempt to complexify the model to make it more accurate in
  its predictions, by looking for links between the hidden variables, or
  patterns over time, et cetera.
 
 
  Setting predictions aside, the next layer of PCA-like neurons would be
  looking for those links.

 Absolutely.


More on my ignorance...

PCA and I hadn't really connected until a few months ago, when I attended
a computer conference and listened to several presentations. The (possibly
false, at least in some instances) impression I got was that the presenters
didn't really understand some or many of the components they were
finding. One video-compression presenter did identify the first few
components, but admittedly failed to identify later ones.

I can see that this process necessarily involves a tiny amount of a priori
information, specifically, knowledge of:
1.  The physical extent of features, e.g. as controlled by mutual
inhibition.
2.  The 

Re: [agi] Introducing Steve's Theory of Everything in cognition.

2009-01-02 Thread Abram Demski
Steve,

I'm thinking that you are taking "understanding" to mean something
like identifying the *actual* hidden variables responsible for the
pattern, and finding the *actual* state of each variable.
Probabilistic models instead *invent* hidden variables that happen to
help explain the data. Is that about right? If so, then explaining
what I mean by "functionally equivalent" will help. Here is an
example: suppose that we are looking at data from a set of
chemical experiments. Suppose that the experimental conditions are not
very well controlled, so that interesting hidden variables are
present. Suppose that two of these are temperature and air pressure,
but that the two have the same effect on the experiment. Then
unsupervised learning has no way of distinguishing between the
two, so it will find only one hidden variable representing them. So,
they are functionally equivalent.
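
A toy numerical version of that (purely illustrative; the numbers and
variable names are invented): if temperature and pressure act on the
measurements in exactly the same way, only their sum is visible in the
data, and a PCA-style analysis finds a single hidden component rather
than two.

import numpy as np

rng = np.random.default_rng(2)
n = 1000
temperature = rng.standard_normal(n)
pressure = rng.standard_normal(n)

# Both hidden variables drive the five measured quantities through the
# same effect vector, so only temperature + pressure is observable.
effect = rng.standard_normal(5)
observed = np.outer(temperature + pressure, effect)
observed += 0.01 * rng.standard_normal((n, 5))   # a little measurement noise

cov = np.cov(observed, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(eigvals)   # one large eigenvalue, the rest near zero: one variable found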

This implies that, in the absence of further information, the best
thing we can do to try to "understand" the data is to model it
probabilistically.

Or perhaps when you say "understanding" it is short for "understanding
the implications of," i.e., within an already-present model. In that case,
perhaps we could separate the quality of predictions from the speed of
predictions. A complicated-but-accurate model is useless if we can't
calculate the information we need quickly enough. So, we also want an
understandable model: one that doesn't take too long to produce
predictions. This would be different from looking for the best
probabilistic model in terms of prediction accuracy. On the other
hand, it is irrelevant in (practically?) all neural-network-style
approaches today, because the model size is fixed anyway.

If the output is being fed to humans rather than further along the
network, as in the conference example, the situation is very
different. Human-readability becomes an issue. This paper is a good
example of an approach that creates better human-readability rather
than better performance:

http://www.stanford.edu/~hllee/nips07-sparseDBN.pdf

The altered algorithm also seems to match statistical analyses of the
brain more closely (which was the research goal), suggesting a
correlation between human-readability and actual performance gains
(since the brain wouldn't do it if it were a bad idea). In a
probabilistic framework this is best represented by a prior bias for
simplicity.
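
As a sketch of what such a bias looks like in practice (illustrative only,
and not the method of the linked paper): adding an L1 penalty to a
least-squares fit is the standard way to encode a preference for models
with few active components, and it can be optimized with a few lines of
iterative soft-thresholding in Python.

import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_fit(X, y, penalty=2.0, iters=2000):
    """Least squares plus an L1 'simplicity prior', solved by ISTA (illustrative)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # safe step: 1 / largest eigenvalue of X^T X
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)               # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * penalty)
    return w

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 20))
true_w = np.zeros(20)
true_w[[2, 7]] = [3.0, -2.0]                   # only two components really matter
y = X @ true_w + 0.1 * rng.standard_normal(100)
print(np.nonzero(sparse_fit(X, y))[0])         # dominated by indices 2 and 7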

--Abram

Re: [agi] Introducing Steve's Theory of Everything in cognition.

2009-01-02 Thread Eric Burton
Hey. I didn't even like this thread. I'll be right back
