Re: [agi] Computing's coming Theory of Everything

2008-07-24 Thread Steve Richfield
Abram,

On 7/23/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> The Wikipedia article on PCA cites papers that show K-means clustering
> and PCA to be in a certain sense equivalent-- from what I read so far,
> the idea is that clustering is simply extracting discrete versions of
> the continuous variables that PCA extracts.
>
>
> http://en.wikipedia.org/wiki/Principal_component_analysis#Relation_to_K-means_clustering
>
> Does that settle it?


Sorry for the delay, but I have been working on a response to this.

1.  That the article making this point was first presented in 2004 makes
my original point: despite the advanced age of PCA, there are some
really new and exciting advances taking place, notwithstanding the many
comments here to the contrary, apparently by people who have NOT been
keeping up on this re-emerging field.

2.  Both clustering and PCA methods presume that data is collected,
analyzed, and some sort of decision is made. However, unless the
neurobiological and CS worlds have missed something really important (it
sure wouldn't be the first time), neurons probably do this incrementally,
though there ARE other viable possibilities, e.g. that groups of neurons
could work together to develop the principal component transformations or
something similar, during which time their output would indicate nothing of
value. If the underlying incremental presumption is true (wouldn't sound
math suggest it to be false?!), then a sort of "derivative" of PCA or
K-means with respect to time must be developed, to show how neurons should
change second-by-second as they learn. More fun with matrix algebra.
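One minimal sketch of what such an incremental rule might look like is
Oja's rule, a classic online update that converges toward the first
principal component one sample at a time (the learning rate and toy data
below are illustrative assumptions, not anything from this thread):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data stream: 2-D points stretched along the direction (1, 1).
    cov = np.array([[3.0, 2.0], [2.0, 3.0]])
    samples = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

    w = rng.normal(size=2)          # "synaptic" weight vector
    eta = 0.01                      # learning rate (assumed)
    for x in samples:               # one observation at a time
        y = w @ x                   # neuron output
        w += eta * y * (x - y * w)  # Oja's rule: Hebbian term minus decay
    w /= np.linalg.norm(w)

    # Compare against the batch eigenvector of the true covariance.
    evals, evecs = np.linalg.eigh(cov)
    print(w, evecs[:, np.argmax(evals)])  # agree up to sign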

3.  OK, I seem to be right back where I started - still looking for someone
who lives and breathes matrix math, and yet still speaks enough English for
us mere mortals to be able to communicate with. The problem (as I see it) is
one of perverse notation and/or orthogonal complexity, where people
manipulate matrix operators without really relating to what is happening
underneath. Only this way could K-means remain separate from PCA for so
long. Where is the idiot savant who could transform this field?

So, in answer to your question, no, it is NOT settled, though this may move
it to the next chapter in its development. Perhaps my predicted coming CS
"Theory of Everything" needs a more descriptive title, but I still see it
just sitting there, waiting for some matrix math gurus to unravel and
publish (in plain English) the methods to unify the now-disparate fields of
AI, NN, Compression, and Encryption.

Steve Richfield





Re: [agi] Computing's coming Theory of Everything

2008-07-23 Thread Abram Demski
The Wikipedia article on PCA cites papers that show K-means clustering
and PCA to be in a certain sense equivalent-- from what I read so far,
the idea is that clustering is simply extracting discrete versions of
the continuous variables that PCA extracts.

http://en.wikipedia.org/wiki/Principal_component_analysis#Relation_to_K-means_clustering

Does that settle it?
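For what it's worth, the cited relation is easy to poke at on toy data. In
this hedged sketch (two well-separated blobs, deliberately an easy case),
the discrete K-means labels track the sign of the continuous first
principal component:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 1, size=(200, 2)),
                   rng.normal(+3, 1, size=(200, 2))])

    pc1 = PCA(n_components=1).fit_transform(X).ravel()
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Fraction of points where cluster membership matches the sign of PC1
    # (up to the arbitrary permutation of cluster labels).
    match = np.mean((pc1 > 0) == (labels == labels[np.argmax(pc1)]))
    print(max(match, 1 - match))  # close to 1.0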

On Wed, Jul 23, 2008 at 2:21 AM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Ben,
>
> On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
>>>
>>> /Restating (not copying) my original posting, the challenge of effective
>>> unstructured learning is to utilize every clue and NOT just go with static
>>> clusters, etc. This includes temporal as well as positional clues,
>>> information content, etc. PCA does some but certainly not all of this, but
>>> considering that we were talking about clustering here just a couple of
>>> weeks ago, ratcheting up to PCA seems to be at least a step out of the
>>> basement./
>>
>> You should actually try PCA on real data before getting too excited about
>> it.
>
>
> Why, as I have already conceded that virgin PCA isn't a solution? I would
> expect it to fail in expected ways until it is repaired/recreated to address
> known shortcomings, e.g. that it works on linear luminosity rather than
> logarithmic luminosity. In short, I am not ready for data yet - until I am
> first tentatively happy with the math.
>
>>
>> Clustering and dimension reduction are related, but they are different and
>> equally valid techniques designed for different purposes.
>
>
> Perhaps you missed the discussion a couple of weeks ago, where I listed some
> of the UNstated assumptions in clustering that are typically NOT met in the
> real world, e.g.:
> 1.  It presumes that clusters exist, whether or not they actually do.
> 2.  It is unable to deal with data that has wildly different importance.
> 3.  Corollary to 2 above, any random input completely trashes it.
> 4.  It is designed for neurons/quantities where intermediate values have
> special significance, rather than for fuzzy indicators that are just midway
> between TRUE and FALSE. This might be interesting for stock market analysis,
> but has no (that I know of) parallel in our own neurons.
>
>>
>> It is absurd to say that one is "ratcheting up" from the other.
>
>
> I agree that they do VERY different jobs, but I assert that the one that
> clustering does has nothing to do with NN, AGI, or most of the rest of the
> real world. In short, I am listening and carefully considering all arguments
> here, but in this case, I am still standing behind my "ratcheting up"
> statement, at least until I hear a better challenge to it.
>
> Steve Richfield
>




Re: [agi] Computing's coming Theory of Everything

2008-07-23 Thread Abram Demski
This is getting long in embedded-reply format, but oh well

On Wed, Jul 23, 2008 at 12:24 PM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Abram,
>
> On 7/23/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>>
>> Replying in reverse order
>>
>> > Story: I once viewed being able to invert the Airy Disk transform (what
>> > makes a blur from a point of light in a microscope or telescope) as an
>> > EXTREMELY valuable thing to do to greatly increase their power, so I set
>> > about finding a transform function. Then, I wrote a program to test it,
>> > first making an Airy Disk blur and then transforming it back to the
>> > original
>> > point. It sorta worked, but there was lots of computational noise in the
>> > result, so I switched to double precision, whereupon it failed to work
>> > at
>> > all. After LOTS more work, I finally figured out that the Airy Disk
>> > function
>> > was a perfect spatial low pass filter, so that two points that were too
>> > close to be resolved as separate points made EXACTLY the same perfectly
>> > circular pattern as did a single point of the same total brightness. In
>> > single precision, I was inverting the computational noise, and doing a
>> > pretty good job of it. However, for about a month, I thought that I had
>> > changed the world.
>>
>> Neat. I have a professor who is doing some stuff with a similar
>> transform, but with a circle (/sphere) rather than a disc (/ball).
>
>
> The "Airy Disk" is the name of the transform. In fact, it is the central
> maxima surrounded by faint rings of rapidly diminishing brightness typical
> of what a star produces. Note that you can cut the radius of the first
> minima to ~2/3 by stopping out all but a peripheral ring on the lens, which
> significantly increases the resolution - a well known trick among
> experienced astronomers, but completely missed by the Hubble team! Just
> stopping out the middle of their mirror would make it equivalent to half
> again its present diameter, though its light-gathering ability would be
> greatly reduced. Of course, this could easily be switched in and out just as
> they are already switching other optical systems in and out.
>
> Can you tell me a little more about what your professor is doing?

He came up with a fast way of doing the transform, which allows him to
quickly identify points that have spherical shapes around them (of a
given radius). He does the transform for a few different
radius-values, so he detects spheres of different sizes, and then he
uses the resulting information to help classify points. An example
application would be picking out important structures in X-ray images
or CAT scans: train the system on points that doctors pick out, then
use it to pick out points in a new image. Spheres may not be the best
feature to use, but they work, and since his algorithm allows them to
be calculated extremely quickly, it becomes a good choice.
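Purely as a guess at the mechanics being described (not his actual
algorithm), one cheap way to get per-point "shell" features is to correlate
the image with thin ring kernels of several radii via FFT convolution, then
hand the per-radius responses to a classifier:

    import numpy as np
    from scipy.signal import fftconvolve

    def shell_kernel(radius, thickness=1.0, size=33):
        """Thin circular shell of the given radius, normalized to sum 1."""
        ax = np.arange(size) - size // 2
        r = np.hypot(*np.meshgrid(ax, ax))
        k = (np.abs(r - radius) < thickness).astype(float)
        return k / k.sum()

    def shell_features(image, radii=(3, 6, 9)):
        """One response map per radius; each pixel ends up with one
        feature per radius for a downstream classifier to train on."""
        return np.stack([fftconvolve(image, shell_kernel(r), mode="same")
                         for r in radii], axis=-1)

    img = np.zeros((64, 64))
    yy, xx = np.ogrid[:64, :64]
    img[np.abs(np.hypot(yy - 32, xx - 32) - 6) < 1] = 1.0  # ring, radius 6
    print(shell_features(img)[32, 32])  # strongest response at radius 6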

> Imagine a layer where the inputs represent probabilities of situations in
> the real world, and the present layer must recognize combinations that are
> important. This would seem to require ANDing (multiplication) rather than
> simple linear addition. However, if we first take the logarithms of the
> incoming probabilities, simple addition produces ANDed probabilities.
>
> OK, so lets make this a little more complicated by specifying that some of
> those inputs are correlated, and hence should receive reduced weighting. We
> can compute the weighted geometric mean of a group of inputs by simply
> multiplying each by its weight (synaptic efficacy), and adding the results
> together. Of course, the sum of these efficacies would be 1.0.
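A quick numeric check of both claims in that paragraph, under the stated
independence assumption (the probabilities and efficacies here are made up):

    import numpy as np

    p = np.array([0.8, 0.5, 0.9])   # incoming probabilities (illustrative)

    # ANDing independent probabilities via addition of their logarithms:
    print(np.exp(np.log(p).sum()), p.prod())   # both print 0.36

    # Weighted geometric mean via a weighted sum of logs, with the
    # "synaptic efficacies" summing to 1.0:
    w = np.array([0.5, 0.3, 0.2])
    print(np.exp((w * np.log(p)).sum()), (p ** w).prod())   # identical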

If I understand, what you are saying is that linear dependencies might
be squeezed out, but some nonlinear dependencies might become linear
for various reasons, including purposefully applying nonlinear
functions (log, sigmoid...) to the resulting variables.

It seems there are some standard ways of introducing nonlinearity:
http://en.wikipedia.org/wiki/Kernel_principal_component_analysis

On a related note, the standard classifier my professor applied to the
sphere-data worked by taking the data to a higher-dimensional space
that made nonlinear dependencies linear. It then found a plane that
cut between "yes" points and "no" points.



> Agreed. Nonlinearities, time information, scope, memory, etc. BTW, have you
> looked at asynchronous logic -  where they have MEMORY elements sprinkled in
> with the logic?! Why? Because they look for some indication of a subsequent
> event, e.g. inputs going to FALSE, before re-evaluating the inputs. This is
> akin to pipelining - which OF COURSE you would expect in highly parallel
> systems like us. Asynchronous logic has many of the same design issues as
> our own brains, and some REALLY counter-intuitive techniques have been
> developed, like 2-wire logic, where TRUE and FALSE are transmitted on two
> different wires to eliminate the need for synchronicity. There are seve

Re: [agi] Computing's coming Theory of Everything

2008-07-23 Thread Abram Demski
Replying in reverse order

> Story: I once viewed being able to invert the Airy Disk transform (what
> makes a blur from a point of light in a microscope or telescope) as an
> EXTREMELY valuable thing to do to greatly increase their power, so I set
> about finding a transform function. Then, I wrote a program to test it,
> first making an Airy Disk blur and then transforming it back to the original
> point. It sorta worked, but there was lots of computational noise in the
> result, so I switched to double precision, whereupon it failed to work at
> all. After LOTS more work, I finally figured out that the Airy Disk function
> was a perfect spatial low pass filter, so that two points that were too
> close to be resolved as separate points made EXACTLY the same perfectly
> circular pattern as did a single point of the same total brightness. In
> single precision, I was inverting the computational noise, and doing a
> pretty good job of it. However, for about a month, I thought that I had
> changed the world.

Neat. I have a professor who is doing some stuff with a similar
transform, but with a circle (/sphere) rather than a disc (/ball). I
thought it was information-preserving? Wouldn't two dots make
something of an oval?
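The failure mode in that story is easy to reproduce without any optics:
naively dividing by a low-pass filter's frequency response blows up
whatever noise sits in the killed frequencies. A toy 1-D sketch, with a
Gaussian frequency roll-off standing in (loosely) for the Airy pattern:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 256
    x = np.zeros(n)
    x[100] = x[104] = 1.0                  # two nearby "points of light"

    f = np.fft.rfftfreq(n)
    H = np.exp(-(f / 0.05) ** 2)           # low-pass "blur" response
    blurred = np.fft.irfft(np.fft.rfft(x) * H, n)
    noisy = blurred + rng.normal(0, 1e-6, n)   # tiny numerical noise

    # Naive inversion: divide by H. Where H is ~0, information about the
    # two separate points is gone, and only amplified noise comes back.
    recovered = np.fft.irfft(np.fft.rfft(noisy) / H, n)
    print(np.abs(recovered).max())         # astronomically large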

>> Yet, if we take
>> the multilevel approach, the 2nd level will be trying to take
>> advantage of dependencies in those variables...
>
>
> Probably not linear dependencies because these should have been wrung out in
> the previous level. Hopefully, the next layer would look at time sequencing,
> various combinations, etc.

Well, since I am not thinking of the algorithm as-is, I assumed that
it would be finding more than just linear dependencies. And if each
layer was linear, then wouldn't it still fail for the same reason?
(Because it would be looking for linear dependencies in variables that
are linearly independent, just as I had argued that it would be
looking for nonlinear dependence in nonlinearly dependent variables?)
In other words, the successive layers would need to be actually
different from each other (perhaps adding in time-information as you
suggested) to do anything useful. So again what we are looking for is
a useful division of the task into subtasks.

>> Hmm... the algorithm for a single level would need to "subtract" the
>> information encoded in the new variable each time, so that the next
>> iteration is working with only the still-unexplained properties of the
>> data.
>
>
> (Taking another puff) Unfortunately, PCA methods produce amplitude
> information but not phase information. This is a little like indefinite
> integration, where you know what is there, but not enough to recreate it.
>
> Further, maximum information channels would seem to be naturally orthogonal,
> so subtracting, even if it were possible, is probably unnecessary.

Yep, this is my point, I was just saying it a different way. Since
maximum information channels should be orthogonal, the algorithm needs
to do *something* like subtracting. (For example, if we are
compressing a bunch of points that nearly fall on a line, we should
first extract a variable telling us where on the line. We should then
remove that dimension from the data, so that we've got just a patch of
fuzz. Any significant variables in the fuzz will be independent of
line-location, because if they were not we would have caught them on
the first extraction. So then we extract the remaining variables from
this fuzz rather than the original data.)
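A minimal sketch of that extract-then-subtract loop (power iteration plus
deflation; the nearly-on-a-line data matches the example above, and all
parameters are illustrative):

    import numpy as np

    def extract_components(X, k, iters=200, seed=0):
        """Greedily pull out k maximum-variance directions, removing
        ("subtracting") each one so the next search sees only the
        still-unexplained residual."""
        X = X - X.mean(axis=0)
        rng = np.random.default_rng(seed)
        comps = []
        for _ in range(k):
            v = rng.normal(size=X.shape[1])
            for _ in range(iters):          # power iteration on X^T X
                v = X.T @ (X @ v)
                v /= np.linalg.norm(v)
            comps.append(v)
            X = X - np.outer(X @ v, v)      # deflate: remove that dimension
        return np.array(comps)

    rng = np.random.default_rng(3)
    # Points nearly on the line through (1, 1), plus a patch of fuzz.
    X = rng.normal(size=(500, 1)) * np.array([[1.0, 1.0]]) \
        + 0.05 * rng.normal(size=(500, 2))
    print(extract_components(X, 2))  # ~(1,1)/sqrt(2), then its orthogonal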

>> It is not even capable of
>> representing context-free patterns (for example, pictures of
>> fractals).
>
>
> Can people do this?

Yes, yes absolutely. Not in the visual cortex maybe, at least not in
the "lower" regions, but people can see the pattern at some level. I
can prove this by drawing the Sierpinski triangle for you.

The issue is which "invariant transforms" are supported by the system.
For example, the unaltered algorithm might not support
location-invariance in a picture, so people might add "eye-movements"
to the algorithm, making it slide around taking many sub-picture
samples. Next, people might want size-invariance, then
rotation-invariance. These three together might seem to cover
everything, but they do not. First, we've thrown out possibly useful
information along the way; people can ignore size sometimes, but it is
sometimes important, and even more so for rotation and location.
Second, more complicated types of invariance can be learned; there is
really an infinite variety. This is why relational methods are
necessary: they can see things from the beginning as both in a
particular location, and as being in a relationship to surroundings
that is location-independent. The same holds for size if we add the
proper formulas.  (Hmm... I admit that current relational methods
can't so easily account for rotation invariance... it would be
possible but very expensive...)

>> Such systems might produce some good results, but the formalism cannot
>> represent complex relational

Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Ben,

On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:

>
> /Restating (not copying) my original posting, the challenge of effective
>> unstructured learning is to utilize every clue and NOT just go with static
>> clusters, etc. This includes temporal as well as positional clues,
>> information content, etc. PCA does some but certainly not all of this, but
>> considering that we were talking about clustering here just a couple of
>> weeks ago, ratcheting up to PCA seems to be at least a step out of the
>> basement./
>>
>
> You should actually try PCA on real data before getting too excited about
> it.


Why, as I have already conceded that virgin PCA isn't a solution? I would
expect it to fail in expected ways until it is repaired/recreated to address
known shortcomings, e.g. that it works on linear luminosity rather than
logarithmic luminosity. In short, I am not ready for data yet - until I am
first tentatively happy with the math.



> Clustering and dimension reduction are related, but they are different and
> equally valid techniques designed for different purposes.


Perhaps you missed the discussion a couple of weeks ago, where I listed some
of the UNstated assumptions in clustering that are typically NOT met in the
real world, e.g.:
1.  It presumes that clusters exist, whether or not they actually do.
2.  It is unable to deal with data that has wildly different importance.
3.  Corollary to 2 above, any random input completely trashes it.
4.  It is designed for neurons/quantities where intermediate values have
special significance, rather than for fuzzy indicators that are just midway
between TRUE and FALSE. This might be interesting for stock market analysis,
but has no (that I know of) parallel in our own neurons.



> It is absurd to say that one is "ratcheting up" from the other.


I agree that they do VERY different jobs, but I assert that the one that
clustering does has nothing to do with NN, AGI, or most of the rest of the
real world. In short, I am listening and carefully considering all arguments
here, but in this case, I am still standing behind my "ratcheting up"
statement, at least until I hear a better challenge to it.

Steve Richfield





Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Benjamin Johnston


/Restating (not copying) my original posting, the challenge of 
effective unstructured learning is to utilize every clue and NOT just 
go with static clusters, etc. This includes temporal as well as 
positional clues, information content, etc. PCA does some but 
certainly not all of this, but considering that we were talking about 
clustering here just a couple of weeks ago, ratcheting up to PCA seems 
to be at least a step out of the basement./



You should actually try PCA on real data before getting too excited 
about it.


Clustering and dimension reduction are related, but they are different 
and equally valid techniques designed for different purposes. It is 
absurd to say that one is "ratcheting up" from the other.


-Ben





Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Derek,

On 7/22/08, Derek Zahn <[EMAIL PROTECTED]> wrote:
>
>
> >  Remembering that absolutely ANY function can be performed by
> > passing the inputs through a suitable non-linearity, adding them
> > up, and running the results through another suitable non-linearity,
> > it isn't clear what the limitations of "linear" operations are
>
> You might be interested in "kernel PCA"
> http://en.wikipedia.org/wiki/Kernel_principal_component_analysis
>

Which brings to mind other issues, e.g. the need to work with the logarithms
of intensities to make the same relative differences have the same impact on
the result. Then, what happens to the units when you take the logarithm of
the data?
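A two-line illustration of both points (the intensity values are assumed):
equal relative differences only get equal impact after the log, and the
units question resolves itself because only the logarithm of a ratio to
some reference intensity is dimensionless:

    import numpy as np

    lum = np.array([100.0, 110.0, 1000.0, 1100.0])   # raw intensities

    print(np.diff(lum)[[0, 2]])                  # [10. 100.] - unequal
    I_ref = 100.0                                # reference (assumed)
    print(np.diff(np.log(lum / I_ref))[[0, 2]])  # both ~0.0953 - equal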

I suspect that what I REALLY need is to find someone who has been twiddling
with PCA-related methods for the last 20 years and find out what sorts of
things work and what doesn't.

>
> Also, once you start looking beyond pure PCA the ideas begin to blur with
> the "clustering" you abhor, with things like kohonen networks, k-means
> clustering, etc.  I'm not a huge expert on these topics although I do think
> that dimensionality reduction (for generalization/categorization if nothing
> else) must be an important piece of the puzzle.  These methods including PCA
> are all mainstream in machine learning.
>

You seem to be saying much the same things as Richard. OK, let me say it
back in different words to see if I got it: There are enough smart people
working on this that if there were a nice solution, then it would have been
found long ago. Some not-so-nice approaches have been tried but they didn't
seem to produce anything valuable.

If I got the above correct, then I suspect that everyone subscribes to some
common fallacy, e.g. if they disclaimed the fallacy in a paper, then it would
never get published. The fallacy would be found among the solid beliefs in
the field. I wonder what these are? Hmmm...

>
> > Did you see anything there that was not biologically plausible?
>
> The fact that feature maps in the early visual system don't actually seem
> to detect the principal components as found in the methods of that paper.
> Instead, they appear to detect things that the principal components can
> also be usefully combined to represent (which are just the obvious features
> of a segmented visual field).
>
> >> For a much more detailed, capable, and perhaps more neurally
> >> plausible model of similar stuff, the work of Risto Miikkulainen's
> >> group is a lot of fun.
> >
> > Do you have a hyperlink?
>
> The book I'm thinking of is _Computational Maps in the Visual Cortex_.  see
> http://computationalmaps.org which has enough material to get the idea.
>

It must be REALLY good to command a 3-digit price for even a used copy.

Steve Richfield





Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Abram,

All good points. Detailed comments follow. First I must take a LONG drag,
because I must now blow a lot of smoke...

On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> On Tue, Jul 22, 2008 at 4:29 PM, Steve Richfield
> <[EMAIL PROTECTED]> wrote:
> > Abram,
> >
> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
> >>
> >> From the paper you posted, and from Wikipedia articles, the current
> >> meaning of PCA is very different from your generalized version. I
> >> doubt the current algorithms would even metaphorically apply...
> >
> >
> > Just more input points that are time-displaced from the present points,
> or
> > alternatively in simple cases, compute with the derivative of the inputs
> > rather than with their static value.
>
> Such systems might produce some good results, but the formalism cannot
> represent complex relational ideas.


All you need is a model, any model, capable of representing reality and its
complex relationships. I would think that simple cause-and-effect might
suffice, where events cause other events, that in turn cause still other
events. With a neuron or PCA coordinate for each prospective event, I could
see things coming together. The uninhibited neurons (or PCA coordinates) in
the last layer would be the possible present courses of action. Stimulate
them all and the best will inhibit the rest, and the best course of action
will take place.
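A toy lateral-inhibition loop along those lines (the inhibition strength
and relaxation rate are illustrative; this is a sketch of the dynamic, not
a neural model):

    import numpy as np

    def winner_take_all(drive, beta=1.2, dt=0.1, steps=500):
        """Each unit is excited by its own drive and inhibited by the
        summed activity of all the others; activities relax toward the
        rectified balance point."""
        a = drive.copy()
        for _ in range(steps):
            inhibition = beta * (a.sum() - a)           # everyone else
            target = np.maximum(drive - inhibition, 0.0)
            a += dt * (target - a)
        return a

    drive = np.array([0.9, 1.0, 0.8])   # candidate courses of action
    print(winner_take_all(drive))       # the strongest inhibits the rest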

> It is not even capable of
> representing context-free patterns (for example, pictures of
> fractals).


Can people do this?

> Of course, I'm referring to PCA "as it is", not "as it
> could be".
>
> >>
> >> Also, what would "multiple layers" mean in the generalized version?
> >
> >
> > Performing the PC-like analysis on the principal components derived in a
> > preceding PC-like analysis.
>
> If this worked, it would be another way of trying to break up the task
> into subtasks. It might help, I admit. It has an intuitive feel; it
> fits the idea of there being levels of processing in the brain. But if
> it helps, why?


Maybe we are just large data reduction engines?

> What clean subtask-division is it relying on?


As I have pointed out here many times before, we are MUCH shorter on
knowledge of reality than we are on CS technology. With this approach, we
might build AGIs without even knowing how they work.

> The idea
> of iteratively compressing data by looking for the highest-information
> variable repeatedly makes sense to me, it is a clear subgoal. But what
> is the subgoal here?
>
> Hmm... the algorithm for a single level would need to "subtract" the
> information encoded in the new variable each time, so that the next
> iteration is working with only the still-unexplained properties of the
> data.


(Taking another puff) Unfortunately, PCA methods produce amplitude
information but not phase information. This is a little like indefinite
integration, where you know what is there, but not enough to recreate it.

Further, maximum information channels would seem to be naturally orthogonal,
so subtracting, even if it were possible, is probably unnecessary.

> The variables then should be independent, right?


To the extent that they are not independent, they are not orthogonal, and
less information is produced.

> Yet, if we take
> the multilevel approach, the 2nd level will be trying to take
> advantage of dependencies in those variables...


Probably not linear dependencies because these should have been wrung out in
the previous level. Hopefully, the next layer would look at time sequencing,
various combinations, etc.

> Perhaps this will work due to inaccuracies in the algorithm, caused by
> approximate methods. The task of the higher levels, then, is to
> correct for the approximations.


This isn't my (blurred?) vision.

> But if this is their usefulness, then
> it needs to be shown that they are capable of it. After all, they will
> be running the same sort of approximation. It is possible that they
> will therefore miss the same sorts of things. So, we need to be
> careful in defining multilevel systems.


Story: I once viewed being able to invert the Airy Disk transform (what
makes a blur from a point of light in a microscope or telescope) as an
EXTREMELY valuable thing to do to greatly increase their power, so I set
about finding a transform function. Then, I wrote a program to test it,
first making an Airy Disk blur and then transforming it back to the original
point. It sorta worked, but there was lots of computational noise in the
result, so I switched to double precision, whereupon it failed to work at
all. After LOTS more work, I finally figured out that the Airy Disk function
was a perfect spatial low pass filter, so that two points that were too
close to be resolved as separate points made EXACTLY the same perfectly
circular pattern as did a single point of the same total brightness. In
single precision, I was inverting the computational noise, and doing a
pretty good job of it. However, for about a month, I thought 

Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Abram Demski
On Tue, Jul 22, 2008 at 4:29 PM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Abram,
>
> On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>>
>> From the paper you posted, and from Wikipedia articles, the current
>> meaning of PCA is very different from your generalized version. I
>> doubt the current algorithms would even metaphorically apply...
>
>
> Just more input points that are time-displaced from the present points, or
> alternatively in simple cases, compute with the derivative of the inputs
> rather than with their static value.

Such systems might produce some good results, but the formalism cannot
represent complex relational ideas. It is not even capable of
representing context-free patterns (for example, pictures of
fractals). Of course, I'm referring to PCA "as it is", not "as it
could be".

>>
>> Also, what would "multiple layers" mean in the generalized version?
>
>
> Performing the PC-like analysis on the principal components derived in a
> preceding PC-like analysis.

If this worked, it would be another way of trying to break up the task
into subtasks. It might help, I admit. It has an intuitive feel; it
fits the idea of there being levels of processing in the brain. But if
it helps, why? What clean subtask-division is it relying on? The idea
of iteratively compressing data by looking for the highest-information
variable repeatedly makes sense to me, it is a clear subgoal. But what
is the subgoal here?

Hmm... the algorithm for a single level would need to "subtract" the
information encoded in the new variable each time, so that the next
iteration is working with only the still-unexplained properties of the
data. The variables then should be independent, right? Yet, if we take
the multilevel approach, the 2nd level will be trying to take
advantage of dependencies in those variables...

Perhaps this will work due to inaccuracies in the algorithm, caused by
approximate methods. The task of the higher levels, then, is to
correct for the approximations. But if this is their usefulness, then
it needs to be shown that they are capable of it. After all, they will
be running the same sort of approximation. It is possible that they
will therefore miss the same sorts of things. So, we need to be
careful in defining multilevel systems.

>
> Steve Richfield
> 
>>
>> On Tue, Jul 22, 2008 at 2:58 PM, Steve Richfield
>> <[EMAIL PROTECTED]> wrote:
>> > Abram,
>> >
>> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>> >>
>> >> "Problem Statement: What are the optimal functions, derived from
>> >> real-world observations of past events, the timings of their comings
>> >> and goings, and perhaps their physical association, to extract each
>> >> successive parameter containing the maximum amount of information (in
>> >> a Shannon sense) usable in reconstructing the observed inputs."
>> >>
>> >> I see it now! It is typically very useful to decompose a problem into
>> >> sub-problems that can be solved either independently or with simple
>> >> well-defined interaction. What you are proposing is such a
>> >> decomposition, for the very general problem of compression. "Find an
>> >> encoding scheme for the data in dataset X that minimizes the number of
>> >> bits we need" can be split into subproblems of the form "find a
>> >> meaning for the next N bits of an encoding that maximizes the
>> >> information they carry". The general problem can be solved by applying
>> >> a solution to the simpler problem until the data is completely
>> >> compressed.
>> >
>> >
>> > Yes, we do appear to be on the same page here. The challenge is that
>> > there
>> > seems to be a prevailing opinion that these don't "stack" into
>> > multi-level
>> > structures. The reason that this hasn't been tested seems obvious from
>> > the
>> > literature - computers are now just too damn slow, but people here seem
>> > to
>> > think that there is another more basic reason, like it doesn't work. I
>> > don't
>> > understand this argument either.
>> >
>> > Richard, perhaps you could explain?
>> >>
>> >> "However, it still fails to consider temporal clues, unless of course
>> >> you just consider these to be another dimension."
>> >>
>> >> Why does this not count as a working solution?
>> >
>> >
>> > It might be. Note that delays from axonal transit times could quite
>> > easily
>> > and effectively present inputs "flat" with time presented as just
>> > another
>> > dimension. Now, the challenge of testing a theory with an additional
>> > dimension, that already clogs computers without the additional
>> > dimension.
>> > Ugh. Any thoughts?
>> >
>> > Perhaps I should write this up and send it to the various people working
>> > in
>> > this area. Perhaps people with the present test beds could find a way to
>> > test this, and the retired math professor would have a better idea as to
>> > exactly what needed to be optimized.
>> >
>> > Steve Richfield
>> > =
>> >>
>> >> On Tue, Jul 22, 2008 at 1:48 PM, Steve 

Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Abram,

On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> From the paper you posted, and from Wikipedia articles, the current
> meaning of PCA is very different from your generalized version. I
> doubt the current algorithms would even metaphorically apply...


Just more input points that are time-displaced from the present points, or
alternatively in simple cases, compute with the derivative of the inputs
rather than with their static value.
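One concrete reading of "more input points that are time-displaced from the
present points" is a delay embedding: augment each sample with lagged
copies of itself before running PCA, so time becomes just more dimensions.
A minimal sketch (the lag count and signal are illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    def delay_embed(x, lags=3):
        """Rows are [x(t), x(t-1), ..., x(t-lags)]: time-displaced values
        sit alongside the present value as extra input dimensions."""
        return np.column_stack([x[lags - k: len(x) - k]
                                for k in range(lags + 1)])

    t = np.arange(2000)
    x = np.sin(0.1 * t) + 0.1 * np.random.default_rng(4).normal(size=t.size)

    X = delay_embed(x, lags=3)
    print(PCA(n_components=2).fit(X).explained_variance_ratio_)
    # Two components carry essentially all of the oscillation.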

> Also, what would "multiple layers" mean in the generalized version?


Performing the PC-like analysis on the principal components derived in a
preceding PC-like analysis.

Steve Richfield


> On Tue, Jul 22, 2008 at 2:58 PM, Steve Richfield
> <[EMAIL PROTECTED]> wrote:
> > Abram,
> >
> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
> >>
> >> "Problem Statement: What are the optimal functions, derived from
> >> real-world observations of past events, the timings of their comings
> >> and goings, and perhaps their physical association, to extract each
> >> successive parameter containing the maximum amount of information (in
> >> a Shannon sense) usable in reconstructing the observed inputs."
> >>
> >> I see it now! It is typically very useful to decompose a problem into
> >> sub-problems that can be solved either independently or with simple
> >> well-defined interaction. What you are proposing is such a
> >> decomposition, for the very general problem of compression. "Find an
> >> encoding scheme for the data in dataset X that minimizes the number of
> >> bits we need" can be split into subproblems of the form "find a
> >> meaning for the next N bits of an encoding that maximizes the
> >> information they carry". The general problem can be solved by applying
> >> a solution to the simpler problem until the data is completely
> >> compressed.
> >
> >
> > Yes, we do appear to be on the same page here. The challenge is that
> there
> > seems to be a prevailing opinion that these don't "stack" into
> multi-level
> > structures. The reason that this hasn't been tested seems obvious from
> the
> > literature - computers are now just too damn slow, but people here seem
> to
> > think that there is another more basic reason, like it doesn't work. I
> don't
> > understand this argument either.
> >
> > Richard, perhaps you could explain?
> >>
> >> "However, it still fails to consider temporal clues, unless of course
> >> you just consider these to be another dimension."
> >>
> >> Why does this not count as a working solution?
> >
> >
> > It might be. Note that delays from axonal transit times could quite
> easily
> > and effectively present inputs "flat" with time presented as just another
> > dimension. Now, the challenge of testing a theory with an additional
> > dimension, that already clogs computers without the additional dimension.
> > Ugh. Any thoughts?
> >
> > Perhaps I should write this up and send it to the various people working
> in
> > this area. Perhaps people with the present test beds could find a way to
> > test this, and the retired math professor would have a better idea as to
> > exactly what needed to be optimized.
> >
> > Steve Richfield
> > =
> >>
> >> On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
> >> <[EMAIL PROTECTED]> wrote:
> >> > Ben,
> >> > On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
> >> >>>
> >> >>> You are confusing what PCA now is, and what it might become. I am
> more
> >> >>> interested in the dream than in the present reality.
> >> >>
> >> >> That is like claiming that multiplication of two numbers is the
> answer
> >> >> to
> >> >> AGI, and then telling any critics that they're confusing what
> >> >> multiplication
> >> >> is now with what multiplication may become.
> >> >
> >> >
> >> > Restating (not copying) my original posting, the challenge of
> effective
> >> > unstructured learning is to utilize every clue and NOT just go with
> >> > static
> >> > clusters, etc. This includes temporal as well as positional clues,
> >> > information content, etc. PCA does some but certainly not all of this,
> >> > but
> >> > considering that we were talking about clustering here just a couple
> of
> >> > weeks ago, ratcheting up to PCA seems to be at least a step out of the
> >> > basement.
> >> >
> >> > I think that perhaps I mis-stated or was misunderstood in my
> "position".
> >> > No
> >> > one has "the answer" yet, but given recent work, I think that perhaps
> >> > the
> >> > problem can now be stated. Given a problem statement, it (hopefully)
> >> > should
> >> > be "just some math" to zero in on the solution. OK...
> >> >
> >> > Problem Statement: What are the optimal functions, derived from
> >> > real-world
> >> > observations of past events, the timings of their comings and goings,
> >> > and
> >> > perhaps their physical association, to extract each successive
> parameter
> >> > containing the maximum amount of information (in a Shannon sense)
> usable
> >> > in
> >> > reconstruct

Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Abram Demski
From the paper you posted, and from Wikipedia articles, the current
meaning of PCA is very different from your generalized version. I
doubt the current algorithms would even metaphorically apply...

Also, what would "multiple layers" mean in the generalized version?

On Tue, Jul 22, 2008 at 2:58 PM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Abram,
>
> On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>>
>> "Problem Statement: What are the optimal functions, derived from
>> real-world observations of past events, the timings of their comings
>> and goings, and perhaps their physical association, to extract each
>> successive parameter containing the maximum amount of information (in
>> a Shannon sense) usable in reconstructing the observed inputs."
>>
>> I see it now! It is typically very useful to decompose a problem into
>> sub-problems that can be solved either independently or with simple
>> well-defined interaction. What you are proposing is such a
>> decomposition, for the very general problem of compression. "Find an
>> encoding scheme for the data in dataset X that minimizes the number of
>> bits we need" can be split into subproblems of the form "find a
>> meaning for the next N bits of an encoding that maximizes the
>> information they carry". The general problem can be solved by applying
>> a solution to the simpler problem until the data is completely
>> compressed.
>
>
> Yes, we do appear to be on the same page here. The challenge is that there
> seems to be a prevailing opinion that these don't "stack" into multi-level
> structures. The reason that this hasn't been tested seems obvious from the
> literature - computers are now just too damn slow, but people here seem to
> think that there is another more basic reason, like it doesn't work. I don't
> understand this argument either.
>
> Richard, perhaps you could explain?
>>
>> "However, it still fails to consider temporal clues, unless of course
>> you just consider these to be another dimension."
>>
>> Why does this not count as a working solution?
>
>
> It might be. Note that delays from axonal transit times could quite easily
> and effectively present inputs "flat" with time presented as just another
> dimension. Now, the challenge of testing a theory with an additional
> dimension, that already clogs computers without the additional dimension.
> Ugh. Any thoughts?
>
> Perhaps I should write this up and send it to the various people working in
> this area. Perhaps people with the present test beds could find a way to
> test this, and the retired math professor would have a better idea as to
> exactly what needed to be optimized.
>
> Steve Richfield
> =
>>
>> On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
>> <[EMAIL PROTECTED]> wrote:
>> > Ben,
>> > On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> You are confusing what PCA now is, and what it might become. I am more
>> >>> interested in the dream than in the present reality.
>> >>
>> >> That is like claiming that multiplication of two numbers is the answer
>> >> to
>> >> AGI, and then telling any critics that they're confusing what
>> >> multiplication
>> >> is now with what multiplication may become.
>> >
>> >
>> > Restating (not copying) my original posting, the challenge of effective
>> > unstructured learning is to utilize every clue and NOT just go with
>> > static
>> > clusters, etc. This includes temporal as well as positional clues,
>> > information content, etc. PCA does some but certainly not all of this,
>> > but
>> > considering that we were talking about clustering here just a couple of
>> > weeks ago, ratcheting up to PCA seems to be at least a step out of the
>> > basement.
>> >
>> > I think that perhaps I mis-stated or was misunderstood in my "position".
>> > No
>> > one has "the answer" yet, but given recent work, I think that perhaps
>> > the
>> > problem can now be stated. Given a problem statement, it (hopefully)
>> > should
>> > be "just some math" to zero in on the solution. OK...
>> >
>> > Problem Statement: What are the optimal functions, derived from
>> > real-world
>> > observations of past events, the timings of their comings and goings,
>> > and
>> > perhaps their physical association, to extract each successive parameter
>> > containing the maximum amount of information (in a Shannon sense) usable
>> > in
>> > reconstructing the observed inputs. IMHO these same functions will be
>> > exactly what you need to recognize what is happening in the world, what
>> > you
>> > need to act upon, which actions will have the most effect on the world,
>> > etc.
>> > PCA is clearly NOT there (e.g. it lacks temporal consideration), but
>> > seems
>> > to be a step closer than anything else on the horizon. Hopefully, given
>> > the
>> > "hint" of PCA, we can follow the path.
>> >
>> > You should find an explanation of PCA in any elementary linear algebra
>> > or
>> > statistics textbook. It has a range of applications (like any
>> >

Re: [agi] Is intelligence just a single non-linear function? [WAS Re: [agi] Computing's coming Theory of Everything]

2008-07-22 Thread Richard Loosemore

Steve Richfield wrote:

Richard,

Good - you hit this one on its head! Continuing...

On 7/22/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:


Steve Richfield wrote:

THIS is a big question. Remembering that absolutely ANY function
can be performed by passing the inputs through a suitable
non-linearity, adding them up, and running the results through
another suitable non-linearity, it isn't clear what the
limitations of "linear" operations are, given suitable
"translation" of units or point-of-view. Certainly, all fuzzy
logical functions can be performed this way. I even presented a
paper at the very 1st NN conference in San Diego, showing that
one of the two inhibitory synapses ever to be characterized was
precisely what was needed to perform an AND NOT to the
logarithms of probabilities of assertions being true, right down
to the discontinuity at 1.


Steve,

You are stating a well-known point of view which makes no sense, and
which has been widely discredited in cognitive science for five decades:

 
I don't really understand how it is possible to "discredit" a 
prospective solution that is not yet known, other than exhibiting 
people's inability to arrive at it, e.g. as people have been unable to 
parse English using POS-based approaches, given ~40 years to do so.


I am going to have to stop.  How can I explain how this idea became 
discredited, using only the space available in one list post, when it 
takes an entire course in cognitive science to drill it into the heads 
of undergraduate cog sci students (and quite often it does not click 
even then)?

Richard Loosemore

Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Abram,

On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> "Problem Statement: What are the optimal functions, derived from
> real-world observations of past events, the timings of their comings
> and goings, and perhaps their physical association, to extract each
> successive parameter containing the maximum amount of information (in
> a Shannon sense) usable in reconstructing the observed inputs."
>
> I see it now! It is typically very useful to decompose a problem into
> sub-problems that can be solved either independently or with simple
> well-defined interaction. What you are proposing is such a
> decomposition, for the very general problem of compression. "Find an
> encoding scheme for the data in dataset X that minimizes the number of
> bits we need" can be split into subproblems of the form "find a
> meaning for the next N bits of an encoding that maximizes the
> information they carry". The general problem can be solved by applying
> a solution to the simpler problem until the data is completely
> compressed.


Yes, we do appear to be on the same page here. The challenge is that there
seems to be a prevailing opinion that these don't "stack" into multi-level
structures. The reason that this hasn't been tested seems obvious from the
literature - computers are now just too damn slow, but people here seem to
think that there is another more basic reason, like it doesn't work. I don't
understand this argument either.

Richard, perhaps you could explain?

"However, it still fails to consider temporal clues, unless of course
> you just consider these to be another dimension."
>
> Why does this not count as a working solution?


It might be. Note that delays from axonal transit times could quite easily
and effectively present inputs "flat" with time presented as just another
dimension. Now, the challenge of testing a theory with an additional
dimension, that already clogs computers without the additional dimension.
Ugh. Any thoughts?

Perhaps I should write this up and send it to the various people working in
this area. Perhaps people with the present test beds could find a way to
test this, and the retired math professor would have a better idea as to
exactly what needed to be optimized.

Steve Richfield
=

> On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
> <[EMAIL PROTECTED]> wrote:
> > Ben,
> > On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
> >>>
> >>> You are confusing what PCA now is, and what it might become. I am more
> >>> interested in the dream than in the present reality.
> >>
> >> That is like claiming that multiplication of two numbers is the answer
> to
> >> AGI, and then telling any critics that they're confusing what
> multiplication
> >> is now with what multiplication may become.
> >
> >
> > Restating (not copying) my original posting, the challenge of effective
> > unstructured learning is to utilize every clue and NOT just go with
> static
> > clusters, etc. This includes temporal as well as positional clues,
> > information content, etc. PCA does some but certainly not all of this,
> but
> > considering that we were talking about clustering here just a couple of
> > weeks ago, ratcheting up to PCA seems to be at least a step out of the
> > basement.
> >
> > I think that perhaps I mis-stated or was misunderstood in my "position".
> No
> > one has "the answer" yet, but given recent work, I think that perhaps the
> > problem can now be stated. Given a problem statement, it (hopefully)
> should
> > be "just some math" to zero in on the solution. OK...
> >
> > Problem Statement: What are the optimal functions, derived from
> real-world
> > observations of past events, the timings of their comings and goings, and
> > perhaps their physical association, to extract each successive parameter
> > containing the maximum amount of information (in a Shannon sense) usable
> in
> > reconstructing the observed inputs. IMHO these same functions will be
> > exactly what you need to recognize what is happening in the world, what
> you
> > need to act upon, which actions will have the most effect on the world,
> etc.
> > PCA is clearly NOT there (e.g. it lacks temporal consideration), but
> seems
> > to be a step closer than anything else on the horizon. Hopefully, given
> the
> > "hint" of PCA, we can follow the path.
> >
> > You should find an explanation of PCA in any elementary linear algebra or
> > statistics textbook. It has a range of applications (like any transform),
> > but it might be best regarded as an/the elementary algorithm for
> > unsupervised dimension reduction.
> >
> > Bingo! However, it still fails to consider temporal clues, unless of
> course
> > you just consider these to be another dimension.
> >
> > When PCA works, it is more likely to be interpreted as a comment on the
> > underlying simplicity of the original dataset, rather than the power of
> PCA
> > itself.
> >
> > Agreed, but so far, I haven't seen any solid evidence that the world is
> 

Re: [agi] Is intelligence just a single non-linear function? [WAS Re: [agi] Computing's coming Theory of Everything]

2008-07-22 Thread Steve Richfield
Richard,

Good - you hit this one on its head! Continuing...

On 7/22/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:

> Steve Richfield wrote:
>
> THIS is a big question. Remembering that absolutely ANY function can be
>> performed by passing the inputs through a suitable non-linearity, adding
>> them up, and running the results through another suitable non-linearity, it
>> isn't clear what the limitations of "linear" operations are, given suitable
>> "translation" of units or point-of-view. Certainly, all fuzzy logical
>> functions can be performed this way. I even presented a paper at the very
>> 1st NN conference in San Diego, showing that one of the two inhibitory
>> synapses ever to be characterized was precisely what was needed to perform
>> an AND NOT to the logarithms of probabilities of assertions being true,
>> right down to the discontinuity at 1.
>>
>
> Steve,
>
> You are stating a well-known point of view which makes no sense, and which
> has been widely discredited in cognitive science for five decades:


I don't really understand how it is possible to "discredit" a prospective
solution that is not yet known, other than exhibiting people's inability to
arrive at it, e.g. as people have been unable to parse English using
POS-based approaches, given ~40 years to do so.



>  you are stating [one version of] the core of the Behaviorist manifesto.


Close, but not exactly. I believe that there is a common math basis with
some "tweaks" as needed for things that don't "fit the pattern".



> Yes, in principle you could argue that intelligent systems consist only of
> a black box with one gargantuan nonlinear function that maps inputs to
> outputs.


Remembering that there are ~200 different types of neurons, probably some
with different physical structure but the same math, and others with
different math, it would be good to arrive at a full understanding of at
least one of them, and move out from there.



> The trouble is that such a "flat" system is only possible in principle:  it
> would be ridiculously huge, and it gives us no clue about how it becomes
> learned through experience.
>
> So the fact that everything could IN PRINCIPLE be done in this simplistic,
> flat kind of system means nothing.  The devil is in the details and the
> details are just ridiculous.


Again, I am NOT proposing a single type of building block, but rather a
family with a common mathematical underpinning, plus whatever "special
sauce" these fail to provide.



> One problem is that this idea - this "Hey!! Let's Just Explain It With One
> Great Big Nonlinear Function, Folks" idea - keeps creeping back into the
> cognitive science-neural nets-artificial intelligence complex.


How about substituting ~200 for One.



> Otherwise sensible people keep accidentally reintroducing it without really
> understanding what they are doing;  without understanding the ramifications
> of this idea.
>
> That is why it is meaningless to say something like "Make that present-day
> PCA. Several people are working on its limitations, and there seems to be
> some reason for hope of much better things to come." There is little reason
> to hope for better things to come (except for the low level mechanisms that
> Derek quite correctly pointed out), because the whole PCA idea is a dead
> end.


I hear that you are quite convinced of this, and if this is true, then I
should also become quite convinced. I just don't yet see how to get there
(mentally burying PCA-like approaches and other similar NN-like views) given
that something like these seems to be working for us.

This seems to be going the way of the discussion on the viability of ad hoc
approaches to AGI we had a couple of months ago, where I asked for the prima
facie case that it should work, and got a bunch of opinions generally to the
effect that people felt that it could work, but couldn't state why they felt
this way. Is that the case here - that you feel that PCA-like approaches
can't work, but you can't make the prima facie case?

> A dead end as a general AGI theory, mark you.  It has its uses.


If I could see just one narrow application where something worked every bit
as well as neurons in people do, then there would be some sort of starting
point. Until then, nothing, not even PCA, would seem to "have its uses".

You have quite rightly moved the level of this discussion up to where it
belongs. Now the challenge seems to be for one of us to "put a stake through
the heart" of the other. You just got my spleen - would you care to take
another shot?!

Steve Richfield





Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Abram Demski
"Problem Statement: What are the optimal functions, derived from
real-world observations of past events, the timings of their comings
and goings, and perhaps their physical association, to extract each
successive parameter containing the maximum amount of information (in
a Shannon sense) usable in reconstructing the observed inputs."

I see it now! It is typically very useful to decompose a problem into
sub-problems that can be solved either independently or with simple
well-defined interaction. What you are proposing is such a
decomposition, for the very general problem of compression. "Find an
encoding scheme for the data in dataset X that minimizes the number of
bits we need" can be split into subproblems of the form "find a
meaning for the next N bits of an encoding that maximizes the
information they carry". The general problem can be solved by applying
a solution to the simpler problem until the data is completely
compressed.
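As a concrete (if toy) rendering of this decomposition: sequential PCA is
exactly such a greedy scheme, with variance standing in for Shannon
information (the two coincide for Gaussian data, where entropy grows with
log-variance). Here is a minimal numpy sketch, my own illustration rather
than anything from the papers under discussion; each step extracts the one
direction carrying the most remaining information, then deflates and repeats:

    import numpy as np

    def next_component(X, iters=200):
        # Power iteration: the direction of maximum remaining variance.
        cov = X.T @ X / len(X)
        v = np.random.default_rng(0).standard_normal(cov.shape[0])
        for _ in range(iters):
            v = cov @ v
            v /= np.linalg.norm(v)
        return v

    def greedy_decompose(X, k):
        # Repeatedly take the most informative direction, then remove
        # ("deflate") whatever it already explains, per the subproblem above.
        X = X - X.mean(axis=0)
        components = []
        for _ in range(k):
            v = next_component(X)
            components.append(v)
            X = X - np.outer(X @ v, v)
        return np.array(components)

    # Sanity check on low-rank data: the greedy components agree with the
    # top right-singular vectors of the data, up to sign.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 10))
    W = greedy_decompose(X, 3)
    V = np.linalg.svd(X - X.mean(axis=0))[2][:3]
    print(np.abs(W @ V.T).round(2))    # ~ identity matrix

The point is only that "find the next most informative parameter" is a
well-posed subproblem with a known solution in the linear, static case;
the open question in this thread is the temporal, incremental case.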

"However, it still fails to consider temporal clues, unless of course
you just consider these to be another dimension."

Why does this not count as a working solution?

On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Ben,
> On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
>>>
>>> You are confusing what PCA now is, and what it might become. I am more
>>> interested in the dream than in the present reality.
>>
>> That is like claiming that multiplication of two numbers is the answer to
>> AGI, and then telling any critics that they're confusing what multiplication
>> is now with what multiplication may become.
>
>
> Restating (not copying) my original posting, the challenge of effective
> unstructured learning is to utilize every clue and NOT just go with static
> clusters, etc. This includes temporal as well as positional clues,
> information content, etc. PCA does some but certainly not all of this, but
> considering that we were talking about clustering here just a couple of
> weeks ago, ratcheting up to PCA seems to be at least a step out of the
> basement.
>
> I think that perhaps I mis-stated or was misunderstood in my "position". No
> one has "the answer" yet, but given recent work, I think that perhaps the
> problem can now be stated. Given a problem statement, it (hopefully) should
> be "just some math" to zero in on the solution. OK...
>
> Problem Statement: What are the optimal functions, derived from real-world
> observations of past events, the timings of their comings and goings, and
> perhaps their physical association, to extract each successive parameter
> containing the maximum amount of information (in a Shannon sense) usable in
> reconstructing the observed inputs. IMHO these same functions will be
> exactly what you need to recognize what is happening in the world, what you
> need to act upon, which actions will have the most effect on the world, etc.
> PCA is clearly NOT there (e.g. it lacks temporal consideration), but seems
> to be a step closer than anything else on the horizon. Hopefully, given the
> "hint" of PCA, we can follow the path.
>
> You should find an explanation of PCA in any elementary linear algebra or
> statistics textbook. It has a range of applications (like any transform),
> but it might be best regarded as an/the elementary algorithm for
> unsupervised dimension reduction.
>
> Bingo! However, it still fails to consider temporal clues, unless of course
> you just consider these to be another dimension.
>
> When PCA works, it is more likely to be interpreted as a comment on the
> underlying simplicity of the original dataset, rather than the power of PCA
> itself.
>
> Agreed, but so far, I haven't seen any solid evidence that the world is NOT
> simple, though it appears pretty complex until you understand it.
>
> Thanks for making me clarify my thoughts.
>
> Steve Richfield
>




Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Steve Richfield
Ben,
On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
>
>
> You are confusing what PCA now is, and what it might become. I am more
>> interested in the dream than in the present reality.
>>
>
> That is like claiming that multiplication of two numbers is the answer to
> AGI, and then telling any critics that they're confusing what multiplication
> is now with what multiplication may become.


*Restating (not copying) my original posting, the challenge of effective
unstructured learning is to utilize every clue and NOT just go with static
clusters, etc. This includes temporal as well as positional clues,
information content, etc. PCA does some but certainly not all of this, but
considering that we were talking about clustering here just a couple of
weeks ago, ratcheting up to PCA seems to be at least a step out of the
basement.*
*I think that perhaps I mis-stated or was misunderstood in my "position". No
one has "the answer" yet, but given recent work, I think that perhaps the
problem can now be stated. Given a problem statement, it (hopefully) should
be "just some math" to zero in on the solution. OK...*
*Problem Statement: What are the optimal functions, derived from real-world
observations of past events, the timings of their comings and goings, and
perhaps their physical association, to extract each successive parameter
containing the maximum amount of information (in a Shannon sense) usable in
reconstructing the observed inputs. IMHO these same functions will be
exactly what you need to recognize what is happening in the world, what you
need to act upon, which actions will have the most effect on the world, etc.
PCA is clearly NOT there (e.g. it lacks temporal consideration), but seems
to be a step closer than anything else on the horizon. Hopefully, given the
"hint" of PCA, we can follow the path.*
You should find an explanation of PCA in any elementary linear algebra or
statistics textbook. It has a range of applications (like any transform),
but it might be best regarded as an/the elementary algorithm for
unsupervised dimension reduction.

*Bingo! However, it still fails to consider temporal clues, unless of course
you just consider these to be another dimension.*
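For what it is worth, the "just another dimension" reading can be made
concrete with a time-delay (Hankel) embedding, as in singular spectrum
analysis: describe each moment by a window of recent samples, then run
perfectly ordinary PCA on the result. A minimal sketch, my own illustration
and not anything from the compression papers:

    import numpy as np

    def delay_embed(x, lags):
        # Row t is the window [x[t], x[t+1], ..., x[t+lags-1]], so temporal
        # context becomes ordinary extra dimensions.
        return np.stack([x[i:len(x) - lags + i + 1] for i in range(lags)],
                        axis=1)

    # A signal with two rhythms plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    x = (np.sin(0.3 * t) + 0.5 * np.sin(1.1 * t)
         + 0.1 * rng.standard_normal(t.size))

    X = delay_embed(x, lags=40)
    X = X - X.mean(axis=0)
    eigvals = np.linalg.eigh(X.T @ X / len(X))[0][::-1]

    # The leading eigenvalues arrive in near-equal pairs, one sine/cosine
    # pair per rhythm: static PCA has picked up the temporal structure.
    print(eigvals[:6].round(2))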

When PCA works, it is more likely to be interpreted as a comment on the
underlying simplicity of the original dataset, rather than the power of PCA
itself.

*Agreed, but so far, I haven't seen any solid evidence that the world is NOT
simple, though it appears pretty complex until you understand it.

Thanks for making me clarify my thoughts.*
*Steve Richfield*





Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Richard Loosemore

Steve Richfield wrote:

Richard,
 
You are confusing what PCA now is, and what it might become. I am more 
interested in the dream than in the present reality. Detailed comments 
follow...
 
On 7/21/08, *Richard Loosemore* <[EMAIL PROTECTED]> wrote:


Steve Richfield wrote:

 Maybe not "complete" AGI, but a good chunk of one.


Mercy me!  It is not even a gleam in the eye of something that would
be half adequate.

 
Who knows what the building blocks of the first successful AGI will be. 


Ummm some of us believe we do.  We see the ingredients, we see the 
obvious traps and pitfalls, we see the dead ends.  We may even see the 
beginnings of a complete theory.  But that is by the by:  we see the 
dead ends clearly enough, and PCA is one of them.





Remember that YOU are made of wet neurons, and who knows, maybe they 
work by some as-yet-to-be-identified mathematics that will be uncovered 
in the quest for a better PCA.


 Do you have any favorites?


No.  The ones I have seen are not worth a second look.

 
I had the same opinion.


I have attached an earlier 2006 paper with *pictures* of the
learned transfer functions, which look a LOT like what is seen
in a cat's and monkey's visual processing.


... which is so low-level that it counts as peripheral wiring.

 
Agreed, but there is little difference between GOOD compression and 
understanding, so if these guys are truly able to (eventually) perform 
good compression, then maybe we are on the way to understanding.


Now that, I'm afraid, is simply not true.  What would make you say that
"there is little difference between GOOD compression and understanding"?
That statement is unsupportable.





Note that in the last section, where they consider multi-layer
applications, they apparently suggest using *only one*
PCA layer!


Of course they do:  that is what all these magic bullet people say.
They can't figure out how to do things in more than one layer, and
they do not really understand that it is *necessary* to do things in
more than one layer, so guess what?  They suggest that we don't *need*
more than one layer.

Sigh.  Programmer Error.

 
I noted this comment because it didn't ring true for me either. However, 
my take on this is that a real/future/good PCA will work for many 
layers, and not just the first.


Well, there is a sense in which I would agree with this, but the problem
is that by the time it has been extended sufficiently to become
multilayer, it will no longer be recognisable as PCA, it will not
necessarily have any of the original good features of PCA, and there may
be other mechanisms that do the same multi-layer understanding much
better than this hypothetical PCA+++. Finally, those other mechanisms
may be much easier to discover than PCA+++.


I simply do not believe that you can get there by starting from here. 
That is why I describe PCA as a dead end.





Note that the extensive training was LESS than what a baby sees 
during its first hour in the real world.


   To give you an idea of what I am looking for, does the algorithm go
   beyond single-level encoding patterns?

 Many of the articles, including the one above, make it clear
that they are up against a computing "brick wall". It seems that
algorithmic honing is necessary to prove whether the algorithms
are any good. Hence, no one has shown any practical application
(yet), though they note that JPEG encoding is a sort of grossly
degenerative example of their approach.
 Of course, the present computational difficulties are NO
indication that this isn't the right and best way to go, though
I agree that this is yet to be proven.


Hmm... you did not really answer the question here.

 
Increasing bluntness: How are they supposed to test multiple-layer 
methods when they have to run their computers for days just to test a 
single layer? PCs just don't last that long, and Microsoft has provided 
no checkpoint capability to support year-long executions.


What I meant was:  does anyone have any reason to believe that the 
scalability of the multilevel PCA will be such that we can get 
human-level intelligence out of it, using a computer smaller than the 
size of the entire universe?  In that context, it would not be an answer 
to say ... "well, those folks haven't been able to get powerful enough 
computers yet, so we don't know"  :-)






 Does your response indicate that you are willing to take a shot
at explaining some of the math murk in more recent articles? I
could certainly use any help that I can get. So far, it appears
that a PCA and matrix algebra glossary of terms and
abbreviations would go a LONG way to understanding these
articles. I wonder if one already exists?

Is intelligence just a single non-linear function? [WAS Re: [agi] Computing's coming Theory of Everything]

2008-07-22 Thread Richard Loosemore

Steve Richfield wrote:

THIS is a big question. Remembering that absolutely ANY function can be 
performed by passing the inputs through a suitable non-linearity, adding 
them up, and running the results through another suitable non-linearity, 
it isn't clear what the limitations of "linear" operations are, given 
suitable "translation" of units or point-of-view. Certainly, all fuzzy 
logical functions can be performed this way. I even presented a paper at 
the very 1st NN conference in San Diego, showing that one of the two 
inhibitory synapses ever to be characterized was precisely what was 
needed to perform an AND NOT to the logarithms of probabilities of 
assertions being true, right down to the discontinuity at 1.


Steve,

You are stating a well-known point of view which makes no sense, and 
which has been widely discredited in cognitive science for five decades: 
 you are stating [one version of] the core of the Behaviorist manifesto.


Yes, in principle you could argue that intelligent systems consist only 
of a black box with one gargantuan nonlinear function that maps inputs 
to outputs.


The trouble is that such a "flat" system is only possible in principle: 
 it would be ridiculously huge, and it gives us no clue about how it 
becomes learned through experience.


So the fact that everything could IN PRINCIPLE be done in this
simplistic, flat kind of system means nothing.  The devil is in the
details and the details are just ridiculous.


One problem is that this idea - this "Hey!! Let's Just Explain It With
One Great Big Nonlinear Function, Folks" idea - keeps creeping back
into the cognitive science-neural nets-artificial intelligence complex.
 Otherwise sensible people keep accidentally reintroducing it without
really understanding what they are doing;  without understanding the
ramifications of this idea.


That is why it is meaningless to say something like "Make that 
present-day PCA. Several people are working on its limitations, and 
there seems to be some reason for hope of much better things to come." 
There is little reason to hope for better things to come (except for the 
low level mechanisms that Derek quite correctly pointed out), because 
the whole PCA idea is a dead end.


A dead end as a general AGI theory, mark you.  It has its uses.




Richard Loosemore




Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Derek Zahn
 
>  Remembering that absolutely ANY function can be performed by 
> passing the inputs through a suitable non-linearity, adding them 
> up, and running the results through another suitable non-linearity, 
> it isn't clear what the limitations of "linear" operations are
 
You might be interested in "kernel PCA"  
http://en.wikipedia.org/wiki/Kernel_principal_component_analysis
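The core computation behind kernel PCA is small enough to show inline. A
bare-bones numpy sketch (RBF kernel matrix, double centering, then an
eigendecomposition), offered as an illustration of the idea rather than as
anyone's production code:

    import numpy as np

    def kernel_pca(X, n_components, gamma=1.0):
        # RBF (Gaussian) kernel matrix from pairwise squared distances.
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-gamma * sq)
        # Double-center the kernel matrix (centering in feature space).
        n = len(X)
        one = np.ones((n, n)) / n
        Kc = K - one @ K - K @ one + one @ K @ one
        # Eigendecomposition; keep the leading components.
        eigvals, eigvecs = np.linalg.eigh(Kc)
        idx = eigvals.argsort()[::-1][:n_components]
        alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
        return Kc @ alphas    # embedded coordinates of the training points

    # Two concentric rings: linearly inseparable, but the leading kernel
    # component separates them by radius.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 200)
    r = np.repeat([1.0, 3.0], 100)
    X = np.c_[r * np.cos(theta), r * np.sin(theta)]
    X += 0.05 * rng.standard_normal(X.shape)
    Z = kernel_pca(X, n_components=2, gamma=0.5)
    print(Z[:100, 0].mean().round(2), Z[100:, 0].mean().round(2))

This is one standard answer to the "limitations of linear operations"
question: do perfectly linear PCA, but in a nonlinearly transformed
feature space that the kernel supplies implicitly.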
 
Also, once you start looking beyond pure PCA the ideas begin to blur with the
"clustering" you abhor, with things like Kohonen networks, k-means clustering,
etc.  I'm not a huge expert on these topics, although I do think that
dimensionality reduction (for generalization/categorization if nothing else)
must be an important piece of the puzzle.  These methods, including PCA, are
all mainstream in machine learning.
 
> Did you see anything there that was not biologically plausible?
 
The fact that feature maps in the early visual system don't actually seem to 
detect the principal components as found in the methods of that paper.  
Instead, they appear to detect things that the principal components can also be 
usefully combined to represent (which are just the obvious features of a 
segmented visual field).  
 
>> For a much more detailed, capable, and perhaps more neurally 
>> plausible model of similar stuff, the work of Risto Miikkulainen's 
>> group is a lot of fun.
> 
> Do you have a hyperlink?
 
The book I'm thinking of is _Computational Maps in the Visual Cortex_.  see 
http://computationalmaps.org which has enough material to get the idea.
 
 
 




Re: [agi] Computing's coming Theory of Everything

2008-07-22 Thread Benjamin Johnston


You are confusing what PCA now is, and what it might become. I am more 
interested in the dream than in the present reality.



That is like claiming that multiplication of two numbers is the answer 
to AGI, and then telling any critics that they're confusing what 
multiplication is now with what multiplication may become.



You should find an explanation of PCA in any elementary linear algebra 
or statistics textbook. It has a range of applications (like any 
transform), but it might be best regarded as an/the elementary algorithm 
for unsupervised dimension reduction. When PCA works, it is more likely 
to be interpreted as a comment on the underlying simplicity of the 
original dataset, rather than the power of PCA itself.


-Ben





Re: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Steve Richfield
Richard,

You are confusing what PCA now is, and what it might become. I am more
interested in the dream than in the present reality. Detailed comments
follow...

On 7/21/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
> Steve Richfield wrote:
>
>>  Maybe not "complete" AGI, but a good chunk of one.
>>
>
> Mercy me!  It is not even a gleam in the eye of something that would be
> half adequate.


Who knows what the building blocks of the first successful AGI will be.
Remember that YOU are made of wet neurons, and who knows, maybe they work by
some as-yet-to-be-identified mathematics that will be uncovered in the quest
for a better PCA.

  Do you have any favorites?
>>
>
> No.  The ones I have seen are not worth a second look.


I had the same opinion.

 I have attached an earlier 2006 paper with *pictures* of the learned
>> transfer functions, which look a LOT like what is seen in a cat's and
>> monkey's visual processing.
>>
>
> ... which is so low-level that it counts as peripheral wiring.


Agreed, but there is little difference between GOOD compression and
understanding, so if these guys are truly able to (eventually) perform good
compression, then maybe we are on the way to understanding.

 Note that in the last section, where they consider multi-layer applications,
>> they apparently suggest using *only one* PCA layer!
>>
>
> Of course they do:  that is what all these magic bullet people say. They
> can't figure out how to do things in more than one layer, and they do not
> really understand that it is *necessary* to do things in more than one
> layer, so guess what?  They suggest that we don't *need* more than one layer.
>
> Sigh.  Programmer Error.


I noted this comment because it didn't ring true for me either. However, my
take on this is that a real/future/good PCA will work for many layers, and
not just the first.

Note that the extensive training was LESS than what a baby sees during its
first hour in the real world.

To give you an idea of what I am looking for, does the algorithm go
>> beyond single-level encoding patterns?
>>
>>  Many of the articles, including the one above, make it clear that they
>> are up against a computing "brick wall". It seems that algorithmic honing is
>> necessary to prove whether the algorithms are any good. Hence, no one has
>> shown any practical application (yet), though they note that JPEG encoding
>> is a sort of grossly degenerative example of their approach.
>>  Of course, the present computational difficulties are NO indication that
>> this isn't the right and best way to go, though I agree that this is yet to
>> be proven.
>>
>
> Hmm... you did not really answer the question here.


Increasing bluntness: How are they supposed to test multiple-layer methods
when they have to run their computers for days just to test a single layer?
PCs just don't last that long, and Microsoft has provided no checkpoint
capability to support year-long executions.

  Does your response indicate that you are willing to take a shot at
>> explaining some of the math murk in more recent articles? I could certainly
>> use any help that I can get. So far, it appears that a PCA and matrix
>> algebra glossary of terms and abbreviations would go a LONG way to
>> understanding these articles. I wonder if one already exists?
>>
>
> I'd like to help (and I could), but do you realise how pointless it is?


Not yet. I agree that it hasn't gone anywhere yet. Please make your case that
this will never go anywhere.

 All this brings up another question to consider: Suppose that a magical
>> processing method were discovered that did everything that AGIs needed, but
>> took WAY more computing power than is presently available. What would people
>> here do?
>> 1.  Go work on better hardware.
>> 2.  Work on faster/crummier approximations.
>> 3.  Ignore it completely and look for some other breakthrough.
>>
>
> Steve, you raise a deeply interesting question, at one level, because of
> the answer that it provokes:  if you did not have the computing power to
> prove that the "magical processing method" actually was capable of solving
> the problems of AGI, then you would not be in any position to *know* that it
> was capable of solving the problems of AGI.


This all depends on the underlying theoretical case. Early Game Theory
application was also limited by compute power, but holding the proof that
this was as good as could be done, they pushed for more compute power rather
than walking away and looking for some other approach. I remember when the
RAND Corp required 5 hours just to solve a 5X5 non-zero-sum game.

Your question answers itself, in other words.


Only in the absence of theoretical support/proof of optimality. PCA looked
like maybe such a proof might be in its future.

Steve Richfield



Re: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Steve Richfield
Derek,

On 7/21/08, Derek Zahn <[EMAIL PROTECTED]> wrote:
>
>
> > > I have attached an earlier 2006 paper with *pictures* of the learned
> > > transfer functions, which look a LOT like what is seen in a cat's and
> > > monkey's visual processing.
> >
> > ... which is so low-level that it counts as peripheral wiring.
>
> True.  Still, it is kind of cool stuff for folks interested in how neural
> systems might self-organize from sensory data.  The visual world has edges
> and borders at various scales and degrees of sharpness and it is interesting
> to see how that can be learned.  Unfortunately, although the linearity
> assumptions of PCA might just barely allow this sort of "proto-V1" as in the
> paper, it doesn't seem likely to extend further up in a feature abstraction
> hierarchy where more complex relationships would seem to require
> nonlinearities.
>

THIS is a big question. Remembering that absolutely ANY function can be
performed by passing the inputs through a suitable non-linearity, adding
them up, and running the results through another suitable non-linearity, it
isn't clear what the limitations of "linear" operations are, given suitable
"translation" of units or point-of-view. Certainly, all fuzzy logical
functions can be performed this way. I even presented a paper at the very
1st NN conference in San Diego, showing that one of the two inhibitory
synapses ever to be characterized was precisely what was needed to perform
an AND NOT to the logarithms of probabilities of assertions being true,
right down to the discontinuity at 1.
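Steve's construction here is, in modern terms, the universal-approximation
property of a single hidden layer. A toy numpy illustration of
"nonlinearity, weighted sum, nonlinearity", using random sigmoid units and
a least-squares readout; this is my own example and has nothing to do with
the particular inhibitory synapse described above:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 400)[:, None]
    target = np.sin(2 * x) + 0.3 * x ** 2    # an arbitrary smooth function

    # Input nonlinearity: 200 sigmoid units with random weights and biases.
    W = rng.normal(0, 4, (1, 200))
    b = rng.normal(0, 4, 200)
    H = 1.0 / (1.0 + np.exp(-(x @ W + b)))

    # "Adding them up": a least-squares linear readout over the units.
    # (The output nonlinearity is taken as the identity here, which is
    # enough for a real-valued target.)
    w = np.linalg.lstsq(H, target, rcond=None)[0]
    print("max approximation error:", float(np.abs(H @ w - target).max()))

The error should come out small, which illustrates the "in principle" part;
the objections elsewhere in this thread are about everything that the
in-principle argument leaves out.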

>
> Assuming the author's analysis is correct, the observation that the
> discovered eigenvectors form groups that can express rotations of edge (etc)
> filters at various frequencies is kind of nifty, even if it turns out not to
> be biologically plausible.
>

Did you see anything there that was not biologically plausible?

> I don't see any broad generality for AGI beyond very low-level sensory
> processing given the limits of PCA
>

Make that present-day PCA. Several people are working on its limitations,
and there seems to be some reason for hope of much better things to come.

 and the sheer volume of training data required to sort out the principal
> components of high-dimensional inputs.
>

Given crummy shitforbrains Hebbian neurons that aren't smart enough to
continuously normalize their synaptic weights, etc. This too needs MUCH more
work.
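In fairness to the neurons, Oja's 1982 variant of the Hebbian rule
addresses exactly this: a decay term keeps the weight vector normalized as
a side effect of the update itself, and the weights then converge to the
first principal component of the input stream. That is incremental PCA
from purely local, sample-at-a-time learning. A minimal sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    C = np.array([[3.0, 1.0], [1.0, 1.0]])    # input covariance
    L = np.linalg.cholesky(C)                 # to draw samples with cov C

    w = rng.standard_normal(2)
    eta = 0.01
    for _ in range(20000):
        x = L @ rng.standard_normal(2)        # one input sample
        y = w @ x                             # the neuron's output
        w += eta * y * (x - y * w)            # Hebbian term minus decay

    pc1 = np.linalg.eigh(C)[1][:, -1]         # true first principal component
    print(abs(w @ pc1))                       # ~1.0: w has aligned with PC1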

>
> For a much more detailed, capable, and perhaps more neurally plausible
> model of similar stuff, the work of Risto Miikkulainen's group is a lot of
> fun.
>

Do you have a hyperlink?

Thanks.

Steve Richfield





RE: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Derek Zahn
> > I have attached an earlier 2006 paper with *pictures* of the learned
> > transfer functions, which look a LOT like what is seen in a cat's and
> > monkey's visual processing.
>
> ... which is so low-level that it counts as peripheral wiring.
True.  Still, it is kind of cool stuff for folks interested in how neural 
systems might self-organize from sensory data.  The visual world has edges and 
borders at various scales and degrees of sharpness and it is interesting to see 
how that can be learned.  Unfortunately, although the linearity assumptions of 
PCA might just barely allow this sort of "proto-V1" as in the paper, it doesn't 
seem likely to extend further up in a feature abstraction hierarchy where more 
complex relationships would seem to require nonlinearities.  
 
Assuming the author's analysis is correct, the observation that the discovered 
eigenvectors form groups that can express rotations of edge (etc) filters at 
various frequencies is kind of nifty, even if it turns out not to be 
biologically plausible.  I don't see any broad generality for AGI beyond very 
low-level sensory processing given the limits of PCA and the sheer volume of 
training data required to sort out the principal components of high-dimensional 
inputs.
 
For a much more detailed, capable, and perhaps more neurally plausible model of 
similar stuff, the work of Risto Miikkulainen's group is a lot of fun.
 




Re: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Richard Loosemore

Steve Richfield wrote:

Richard,

On 7/21/08, *Richard Loosemore* <[EMAIL PROTECTED]> wrote:


Principal component analysis is not new, it has a long history,

 
Yes, as I have just discovered. What I do NOT understand is why anyone 
bothers with clustering (except through ignorance - my own excuse), 
which seems on its face to be greatly inferior.


and so far it is a very long way from being the basis for a complete
AGI,

 
Maybe not "complete" AGI, but a good chunk of one.


Mercy me!  It is not even a gleam in the eye of something that would be 
half adequate.





let alone a theory of everything in computer science.

 
OK, so that may be a bit of an exaggeration, but nonetheless it looks
like there is SOMETHING big out there that could potentially do the
particular jobs that I have outlined.


Is there any concrete reason to believe that this particular PCA
paper is doing something that is some kind of quantum leap beyond
what can be found in the (several thousand?) other PCA papers that
have already been written?

 
Do you have any favorites?


No.  The ones I have seen are not worth a second look.


I have attached an earlier 2006 paper with *pictures* of the learned
transfer functions, which look a LOT like what is seen in a cat's and
monkey's visual processing.


... which is so low-level that it counts as peripheral wiring.


Note that in the last section, where they consider multi-layer
applications, they apparently suggest using *only one* PCA layer!


Of course they do:  that is what all these magic bullet people say.
They can't figure out how to do things in more than one layer, and they
do not really understand that it is *necessary* to do things in more
than one layer, so guess what?  They suggest that we don't *need* more
than one layer.


Sigh.  Programmer Error.




To give you an idea of what I am looking for, does the algorithm go
beyond single-level encoding patterns?

 
Many of the articles, including the one above, make it clear that they 
are up against a computing "brick wall". It seems that algorithmic 
honing is necessary to prove whether the algorithms are any good. Hence, 
no one has shown any practical application (yet), though they note that 
JPEG encoding is a sort of grossly degenerative example of their approach.
 
Of course, the present computational difficulties are NO indication that
this isn't the right and best way to go, though I agree that this is yet
to be proven.


Hmm... you did not really answer the question here.




Can it find patterns of patterns, up to arbitrary levels of depth?
 And is there empirical evidence that it really does find a set of
patterns comparable to those found by the human cognitive mechanism,
without missing any obvious cases?

 
Again, take a look at the pictures and provide your own opinion. It 
sounds like you are a LOT more familiar with this than I am.


Bloated claims for the effectiveness of some form of PCA turn up
frequently in cog sci, NN and AI.  It can look really impressive
until you realize how limited and non-extensible it is.

 
Curiously, there were NO such claims in any of these articles. Just lots 
of murky math. The attached article is the least opaque of the bunch. I 
was just pointing out that if this ever really DOES come together, then 
WOW. Further, disparate people seem to be coming up with different 
pieces of the puzzle.
 
Does your response indicate that you are willing to take a shot at 
explaining some of the math murk in more recent articles? I could 
certainly use any help that I can get. So far, it appears that a PCA and 
matrix algebra glossary of terms and abbreviations would go a LONG way 
to understanding these articles. I wonder if one already exists?


I'd like to help (and I could), but do you realise how pointless it is? 
 I have enough other things to do that I am not getting on with 
seriously important tasks, never mind explaining PCA minutiae.



All this brings up another question to consider: Suppose that a magical 
processing method were discovered that did everything that AGIs needed, 
but took WAY more computing power than is presently available. What 
would people here do?

1.  Go work on better hardware.
2.  Work on faster/crummier approximations.
3.  Ignore it completely and look for some other breakthrough.


Steve, you raise a deeply interesting question, at one level, because of
the answer that it provokes:  if you did not have the computing power to
prove that the "magical processing method" actually was capable of
solving the problems of AGI, then you would not be in any position to
*know* that it was capable of solving the problems of AGI.


Your question answers itself, in other words.




Richard Loosemore






There is a NN parallel in electric circuit simulation programs like
SPICE. Here, the execution time goes up as roughly the *square* of the
circuit complexity ...

Re: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Vladimir Nesov
On Mon, Jul 21, 2008 at 10:32 PM, Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
> Steve,
>
> Principal component analysis is not new, it has a long history, and so far
> it is a very long way from being the basis for a complete AGI, let alone a
> theory of everything in computer science.
>
> Is there any concrete reason to believe that this particular PCA paper is
> doing something that is some kind of quantum leap beyond what can be found
> in the (several thousand?) other PCA papers that have already been written?
>
> To give you an idea of what I am looking for, does the algorithm go beyond
> single-level encoding patterns?  Can it find patterns of patterns, up to
> arbitrary levels of depth?  And is there empirical evidence that it really
> does find a set of patterns comparable to those found by the human cognitive
> mechanism, without missing any obvious cases?
>
> Bloated claims for the effectiveness of some form of PCA turn up frequently
> in cog sci, NN and AI.  It can look really impressive until you realize how
> limited and non-extensible it is.
>

Indeed, there are many techniques to perform "flat" clustering, and
some of them work really well. The trick is to use such techniques to
build up levels of representation, from "sensory" perception and up to
the higher concepts, with cross-checks everywhere and goal-directed
dynamics at the core.
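One toy rendering of that recipe, offered as my own sketch rather than
Vladimir's design, and with none of the cross-checks or goal-directed
dynamics: cluster the raw data, re-describe every point by soft distances
to the learned centers, then hand that description to the next level:

    import numpy as np

    def kmeans(X, k, iters=50, seed=0):
        # Plain k-means; returns per-point distances to the learned centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

    def stack_layers(X, widths):
        # Each layer re-describes the data as (soft) distances to the
        # previous layer's cluster centers: crude levels of representation.
        rep = X
        for k in widths:
            d = kmeans(rep, k)
            rep = np.exp(-d / d.mean())
        return rep

    rng = np.random.default_rng(1)
    X = np.r_[rng.normal(0, 1, (100, 5)), rng.normal(4, 1, (100, 5))]
    codes = stack_layers(X, widths=[8, 3])
    print(codes.shape)    # (200, 3): a two-level re-description of the data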

-- 
Vladimir Nesov
[EMAIL PROTECTED]
http://causalityrelay.wordpress.com/




Re: [agi] Computing's coming Theory of Everything

2008-07-21 Thread Richard Loosemore


Steve,

Principal component analysis is not new, it has a long history, and so 
far it is a very long way from being the basis for a complete AGI, let 
alone a theory of everything in computer science.


Is there any concrete reason to believe that this particular PCA paper 
is doing something that is some kind of quantum leap beyond what can be 
found in the (several thousand?) other PCA papers that have already been 
written?


To give you an idea of what I am looking for, does the algorithm go 
beyond single-level encoding patterns?  Can it find patterns of 
patterns, up to arbitrary levels of depth?  And is there empirical 
evidence that it really does find a set of patterns comparable to those 
found by the human cognitive mechanism, without missing any obvious cases?


Bloated claims for the effectiveness of some form of PCA turn up 
frequently in cog sci, NN and AI.  It can look really impressive until 
you realize how limited and non-extensible it is.




Richard Loosemore



Steve Richfield wrote:

Y'all,
 
I have long predicted a coming "Theory of Everything" (TOE) in CS that 
would, among other things, be the "secret sauce" that AGI so desperately 
needs. This year at WORLDCOMP I saw two presentations that seem to be 
running in the right direction. An earlier IEEE article by one of the 
authors seems to be right on target. Here is my own take on this...
 
Form:  The TOE would provide a way of unsupervised learning to rapidly 
form productive NNs, would provide a subroutine that AGI programs could 
throw observations into and SIGNIFICANT patterns would be identified, 
would be the key to excellent video compression, and indirectly, would 
provide the "perfect" encryption that nearly perfect compression would 
provide.
 
Some video compression folks in Germany have come up with "Principal 
Component Analysis" that works a little like clustering, only it also 
includes temporal consideration, so that things that come and go 
together are presumed to be related, thereby eliminating the 
"superstitious clustering" problem of static cluster analysis. There is 
just one "catch": This is buried in array transforms and compression 
jargon that baffles even me, a former in-house numerical analysis 
consultant to the physics and astronomy departments of a major 
university. Further, it is computationally intensive.
 
Teaser: Their article is entitled "A new method for Principal Component 
Analysis of high-dimensional data using Compressive Sensing" and applies 
methods that *benefit* from having many dimensions, rather than being 
plagued by them (e.g. as in cluster analysis).
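While the article itself stays opaque, the generic trick it gestures at is
easy to demonstrate: random projections preserve the dominant subspace, so
one can sketch a tall matrix down to a thin one and do the exact
decomposition only there. Below is a standard randomized-projection SVD,
emphatically NOT the algorithm from the article, just the same "many
dimensions are a blessing" flavor:

    import numpy as np

    def randomized_pca(X, k, oversample=8, seed=0):
        # Sketch the centered data onto a few random directions,
        # orthonormalize, then do the exact SVD in that small subspace.
        rng = np.random.default_rng(seed)
        Xc = X - X.mean(axis=0)
        G = rng.standard_normal((Xc.shape[1], k + oversample))
        Q = np.linalg.qr(Xc @ G)[0]       # basis for the sketched range
        B = Q.T @ Xc                      # small (k + oversample) x d matrix
        s, Vt = np.linalg.svd(B, full_matrices=False)[1:]
        return s[:k], Vt[:k]              # singular values and components

    # Check against the exact SVD on low-rank-plus-noise data.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 300))
    X += 0.01 * rng.standard_normal(X.shape)
    s_fast = randomized_pca(X, 5)[0]
    s_true = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:5]
    print(np.allclose(s_fast, s_true, rtol=1e-2))    # True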
 
Enter a retired math professor who has come up with some clever 
"simplifications" (to the computer, but certainly not to me) to make 
these sorts of computations tractable for real-world use. It looks like 
this could be quickly put to use, if only someone could translate this 
stuff from linear algebra to English for us mere mortals. He also 
authored a textbook that Amazon provides peeks into, but in addition to 
its 3-digit price tag, it was also rather opaque.
 
It's been ~40 years since I have had my head into matrix transforms, so 
I have ordered up some books to hopefully help me through it. Is there 
someone here who is fresh in this area who would like to take a shot at 
"translating" some obtuse mathematical articles into English - or at 
least providing a few pages of prosaic footnotes to explain their 
terminology?
 
I will gladly forward the articles that seem to be relevant to anyone 
who wants to take a shot at this.
 
Any takers?
 
Steve Richfield
 






