Replying in reverse order....

> Story: I once viewed being able to invert the Airy Disk transform (what
> makes a blur from a point of light in a microscope or telescope) as an
> EXTREMELY valuable thing to do to greatly increase their power, so I set
> about finding a transform function. Then, I wrote a program to test it,
> first making an Airy Disk blur and then transforming it back to the original
> point. It sorta worked, but there was lots of computational noise in the
> result, so I switched to double precision, whereupon it failed to work at
> all. After LOTS more work, I finally figured out that the Airy Disk function
> was a perfect spatial low pass filter, so that two points that were too
> close to be resolved as separate points made EXACTLY the same perfectly
> circular pattern as did a single point of the same total brightness. In
> single precision, I was inverting the computational noise, and doing a
> pretty good job of it. However, for about a month, I thought that I had
> changed the world.

Neat. I have a professor who is doing some work with a similar
transform, but over a circle (/sphere) rather than a disc (/ball). I
thought that transform was information-preserving? Wouldn't two dots
make something of an oval rather than a perfectly circular pattern?

>> Yet, if we take
>> the multilevel approach, the 2nd level will be trying to take
>> advantage of dependencies in those variables...
>
>
> Probably not linear dependencies because these should have been wrung out in
> the previous level. Hopefully, the next layer would look at time sequencing,
> various combinations, etc.

Well, since I am not thinking of the algorithm as-is, I assumed that
it would be finding more than just linear dependencies. And if each
layer were linear, wouldn't the stack still fail for the same reason?
(Because it would be looking for linear dependencies among variables
that are already linearly independent, just as I argued that a
nonlinear version would be looking for nonlinear dependence among
variables that are already nonlinearly independent.) In other words,
the successive layers would need to be genuinely different from each
other (perhaps adding in time information, as you suggested) to do
anything useful. So again, what we are looking for is a useful
division of the task into subtasks.
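Here is a quick sanity check of that point with made-up matrices: two
stacked linear extractions compose into a single linear extraction, so
a second linear level cannot find anything a single (wider) linear
level could not.

import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 10))
W1 = rng.standard_normal((10, 6))   # stand-in for the first linear level
W2 = rng.standard_normal((6, 3))    # stand-in for the second linear level

two_levels = (data @ W1) @ W2
one_level = data @ (W1 @ W2)        # the same map, done in a single level
print(np.allclose(two_levels, one_level))   # True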

>> Hmm... the algorithm for a single level would need to "subtract" the
>> information encoded in the new variable each time, so that the next
>> iteration is working with only the still-unexplained properties of the
>> data.
>
>
> (Taking another puff) Unfortunately, PCA methods produce amplitude
> information but not phase information. This is a little like indefinite
> integration, where you know what is there, but not enough to recreate it.
>
> Further, maximum information channels would seem to be naturally orthogonal,
> so subtracting, even if it were possible, is probably unnecessary.

Yep, this is my point; I was just saying it a different way. Since
maximum-information channels should be orthogonal, the algorithm needs
to do *something* like subtracting. (For example, if we are
compressing a bunch of points that nearly fall on a line, we should
first extract a variable telling us where along the line each point
sits. We should then remove that dimension from the data, so that
we're left with just a patch of fuzz. Any significant variables in the
fuzz will be independent of line-location, because if they were not,
we would have caught them in the first extraction. So we then extract
the remaining variables from this fuzz rather than from the original
data.)
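A minimal numpy sketch of the kind of "subtraction" I mean, with
made-up data: extract the position along the line, project that
dimension out, and hand only the leftover fuzz to the next extraction.

import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-5, 5, size=(500, 1))
direction = np.array([[0.8, 0.6, 0.0]])          # the "line" in 3D
data = t @ direction + 0.1 * rng.standard_normal((500, 3))

centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1 = vt[0]                                      # first principal direction

position_on_line = centered @ pc1                # the extracted variable
residual = centered - np.outer(position_on_line, pc1)   # "subtract" it

# The residual is orthogonal to pc1, so the next extraction only sees
# the still-unexplained fuzz.
print(position_on_line.var())
print(np.abs(residual @ pc1).max())              # ~0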

>> It is not even capable of
>> representing context-free patterns (for example, pictures of
>> fractals).
>
>
> Can people do this?

Yes, absolutely. Maybe not in the visual cortex, at least not in the
"lower" regions, but people can see the pattern at some level. I can
prove this by drawing the Sierpinski triangle for you.
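For instance, the whole picture falls out of a one-line rule (this
bitwise trick is just one convenient way to state the recursion), so
the pattern clearly has a compact representation at *some* level:

# A cell is filled iff its x and y coordinates share no set bits;
# printing this rule gives the Sierpinski triangle.
size = 32
for y in range(size):
    print("".join("*" if (x & y) == 0 else " " for x in range(size)))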

The issue is which "invariant transforms" the system supports. For
example, the unaltered algorithm might not support location-invariance
in a picture, so people might add "eye movements" to the algorithm,
making it slide around and take many sub-picture samples. Next, people
might want size-invariance, then rotation-invariance. These three
together might seem to cover everything, but they do not. First, we've
thrown out possibly useful information along the way; people can
ignore size sometimes, but size is sometimes important, and that is
even more true of rotation and location. Second, more complicated
types of invariance can be learned; there is really an infinite
variety. This is why relational methods are necessary: from the
beginning, they can see things both as being in a particular location
and as standing in a location-independent relationship to their
surroundings. The same holds for size if we add the proper formulas.
(Hmm... I admit that current relational methods can't account for
rotation invariance quite so easily... it would be possible, but very
expensive...)
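To make "relational" concrete, here is a toy sketch (the offset-matrix
representation is just my own illustration, not any particular
published method): each feature keeps its absolute position, but is
also described by its offsets to every other feature. The offsets are
untouched by translation, and after normalizing by the largest offset
they are untouched by scale, yet nothing has been thrown away.

import numpy as np

def relational_view(points):
    points = np.asarray(points, dtype=float)
    offsets = points[:, None, :] - points[None, :, :]   # pairwise offsets
    scale = np.abs(offsets).max()
    return points, offsets / (scale if scale > 0 else 1.0)

scene   = [(1, 1), (4, 1), (1, 5)]
shifted = [(11, 21), (14, 21), (11, 25)]          # same scene, moved
_, rel_a = relational_view(scene)
_, rel_b = relational_view(shifted)
print(np.allclose(rel_a, rel_b))                  # True: same relational structure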

>> Such systems might produce some good results, but the formalism cannot
>> represent complex relational ideas.
>
>
> All you need is a model, any model, capable of representing reality and its
> complex relationships. I would think that simple cause-and-effect might
> suffice, where events cause other events, that in turn cause still other
> events. With a neuron or PCA coordinate for each prospective event, I could
> see things coming together. The uninhibited neurons (or PCA coordinates) in
> the last layer would be the possible present courses of action. Stimulate
> them all and the best will inhibit the rest, and the best course of action
> will take place.
>>

Hidden causes happen to be a Turing-complete formalism, so sure. And
if we add the temporal dimension, this becomes genuinely relational,
since the temporal cause-effect chain can stretch backward and forward
and become an unbounded computation. But I am skeptical about teasing
this behavior out of the current algorithms.
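Just to illustrate the temporal side (a fixed two-state chain like
this is nowhere near the Turing-complete case, and the tables are made
up), unrolling hidden causes over time looks roughly like this, and
nothing stops the chain from being arbitrarily long:

import numpy as np

rng = np.random.default_rng(1)
transition = np.array([[0.9, 0.1],   # P(next hidden cause | current one)
                       [0.3, 0.7]])
emission = np.array([[0.8, 0.2],     # P(observed event | hidden cause)
                     [0.1, 0.9]])

def unroll(steps, start=0):
    hidden, observed = [start], []
    for _ in range(steps):
        observed.append(int(rng.choice(2, p=emission[hidden[-1]])))
        hidden.append(int(rng.choice(2, p=transition[hidden[-1]])))
    return hidden[:-1], observed

print(unroll(20))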

On Tue, Jul 22, 2008 at 7:04 PM, Steve Richfield
<[EMAIL PROTECTED]> wrote:
> Abram,
>
> All good points. Detailed comments follow. First I must take a LONG drag,
> because I must now blow a lot of smoke...
>
> On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>>
>> On Tue, Jul 22, 2008 at 4:29 PM, Steve Richfield
>> <[EMAIL PROTECTED]> wrote:
>> > Abram,
>> >
>> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>> >>
>> >> From the paper you posted, and from wikipedia articles, the current
>> >> meaning of PCA is very different from your generalized version. I
>> >> doubt the current algorithms would even metaphorically apply...
>> >
>> >
>> > Just more input points that are time-displaced from the present points,
>> > or
>> > alternatively in simple cases, compute with the derivative of the inputs
>> > rather than with their static value.
>>
>> Such systems might produce some good results, but the formalism cannot
>> represent complex relational ideas.
>
>
> All you need is a model, any model, capable of representing reality and its
> complex relationships. I would think that simple cause-and-effect might
> suffice, where events cause other events, that in turn cause still other
> events. With a neuron or PCA coordinate for each prospective event, I could
> see things coming together. The uninhibited neurons (or PCA coordinates) in
> the last layer would be the possible present courses of action. Stimulate
> them all and the best will inhibit the rest, and the best course of action
> will take place.
>>
>> It is not even capable of
>> representing context-free patterns (for example, pictures of
>> fractals).
>
>
> Can people do this?
>>
>> Of course, I'm referring to PCA "as it is", not "as it
>> could be".
>>
>> >>
>> >> Also, what would "multiple layers" mean in the generalized version?
>> >
>> >
>> > Performing the PC-like analysis on the principal components derived in a
>> > preceding PC-like analysis.
>>
>> If this worked, it would be another way of trying to break up the task
>> into subtasks. It might help, I admit. It has an intuitive feel; it
>> fits the idea of there being levels of processing in the brain. But if
>> it helps, why?
>
>
> Maybe we are just large data reduction engines?
>>
>> What clean subtask-division is it relying on?
>
>
> As I have pointed out here many times before, we are MUCH shorter on
> knowledge of reality than we are on CS technology. With this approach, we
> might build AGIs without even knowing how they work.
>>
>> The idea
>> of iteratively compressing data by looking for the highest-information
>> variable repeatedly makes sense to me, it is a clear subgoal. But what
>> is the subgoal here?
>>
>> Hmm... the algorithm for a single level would need to "subtract" the
>> information encoded in the new variable each time, so that the next
>> iteration is working with only the still-unexplained properties of the
>> data.
>
>
> (Taking another puff) Unfortunately, PCA methods produce amplitude
> information but not phase information. This is a little like indefinite
> integration, where you know what is there, but not enough to recreate it.
>
> Further, maximum information channels would seem to be naturally orthogonal,
> so subtracting, even if it were possible, is probably unnecessary.
>>
>> The variables then should be independent, right?
>
>
> To the extent that they are not independent, they are not orthogonal, and
> less information is produced.
>>
>> Yet, if we take
>> the multilevel approach, the 2nd level will be trying to take
>> advantage of dependencies in those variables...
>
>
> Probably not linear dependencies because these should have been wrung out in
> the previous level. Hopefully, the next layer would look at time sequencing,
> various combinations, etc.
>>
>> Perhaps this will work due to inaccuracies in the algorithm, caused by
>> approximate methods. The task of the higher levels, then, is to
>> correct for the approximations.
>
>
> This isn't my (blurred?) vision.
>>
>> But if this is their usefulness, then
>> it needs to be shown that they are capable of it. After all, they will
>> be running the same sort of approximation. It is possible that they
>> will therefore miss the same sorts of things. So, we need to be
>> careful in defining multilevel systems.
>
>
> Story: I once viewed being able to invert the Airy Disk transform (what
> makes a blur from a point of light in a microscope or telescope) as an
> EXTREMELY valuable thing to do to greatly increase their power, so I set
> about finding a transform function. Then, I wrote a program to test it,
> first making an Airy Disk blur and then transforming it back to the original
> point. It sorta worked, but there was lots of computational noise in the
> result, so I switched to double precision, whereupon it failed to work at
> all. After LOTS more work, I finally figured out that the Airy Disk function
> was a perfect spatial low pass filter, so that two points that were too
> close to be resolved as separate points made EXACTLY the same perfectly
> circular pattern as did a single point of the same total brightness. In
> single precision, I was inverting the computational noise, and doing a
> pretty good job of it. However, for about a month, I thought that I had
> changed the world.
>
> I also once had a proof for Fermat's Last Theorem that lasted about a week
> rattling around the math department of a major university.
>
> Hence, you are preaching to the choir regarding care in approach. I have
> already run down my fair share of blind alleys.
>
> Steve Richfield
> ==============
>>
>> >>> On Tue, Jul 22, 2008 at 2:58 PM, Steve Richfield
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Abram,
>> >> >
>> >> > On 7/22/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>> >> >>
>> >> >> "Problem Statement: What are the optimal functions, derived from
>> >> >> real-world observations of past events, the timings of their comings
>> >> >> and goings, and perhaps their physical association, to extract each
>> >> >> successive parameter containing the maximum amount of information
>> >> >> (in
>> >> >> a Shannon sense) usable in reconstructing the observed inputs."
>> >> >>
>> >> >> I see it now! It is typically very useful to decompose a problem
>> >> >> into
>> >> >> sub-problems that can be solved either independently or with simple
>> >> >> well-defined interaction. What you are proposing is such a
>> >> >> decomposition, for the very general problem of compression. "Find an
>> >> >> encoding scheme for the data in dataset X that minimizes the number
>> >> >> of
>> >> >> bits we need" can be split into subproblems of the form "find a
>> >> >> meaning for the next N bits of an encoding that maximizes the
>> >> >> information they carry". The general problem can be solved by
>> >> >> applying
>> >> >> a solution to the simpler problem until the data is completely
>> >> >> compressed.
>> >> >
>> >> >
>> >> > Yes, we do appear to be on the same page here. The challenge is that
>> >> > there
>> >> > seems to be a prevailing opinion that these don't "stack" into
>> >> > multi-level
>> >> > structures. The reason that this hasn't been tested seems obvious
>> >> > from
>> >> > the
>> >> > literature - computers are now just too damn slow, but people here
>> >> > seem
>> >> > to
>> >> > think that there is another more basic reason, like it doesn't work.
>> >> > I
>> >> > don't
>> >> > understand this argument either.
>> >> >
>> >> > Richard, perhaps you could explain?
>> >> >>
>> >> >> "However, it still fails to consider temporal clues, unless of
>> >> >> course
>> >> >> you just consider these to be another dimension."
>> >> >>
>> >> >> Why does this not count as a working solution?
>> >> >
>> >> >
>> >> > It might be. Note that delays from axonal transit times could quite
>> >> > easily
>> >> > and effectively present inputs "flat" with time presented as just
>> >> > another
>> >> > dimension. Now, the challenge of testing a theory with an additional
>> >> > dimension, that already clogs computers without the additional
>> >> > dimension.
>> >> > Ugh. Any thoughts?
>> >> >
>> >> > Perhaps I should write this up and send it to the various people
>> >> > working
>> >> > in
>> >> > this area. Perhaps people with the present test beds could find a way
>> >> > to
>> >> > test this, and the retired math professor would have a better idea as
>> >> > to
>> >> > exactly what needed to be optimized.
>> >> >
>> >> > Steve Richfield
>> >> > =================
>> >> >>
>> >> >> On Tue, Jul 22, 2008 at 1:48 PM, Steve Richfield
>> >> >> <[EMAIL PROTECTED]> wrote:
>> >> >> > Ben,
>> >> >> > On 7/22/08, Benjamin Johnston <[EMAIL PROTECTED]> wrote:
>> >> >> >>>
>> >> >> >>> You are confusing what PCA now is, and what it might become. I
>> >> >> >>> am
>> >> >> >>> more
>> >> >> >>> interested in the dream than in the present reality.
>> >> >> >>
>> >> >> >> That is like claiming that multiplication of two numbers is the
>> >> >> >> answer
>> >> >> >> to
>> >> >> >> AGI, and then telling any critics that they're confusing what
>> >> >> >> multiplication
>> >> >> >> is now with what multiplication may become.
>> >> >> >
>> >> >> >
>> >> >> > Restating (not copying) my original posting, the challenge of
>> >> >> > effective
>> >> >> > unstructured learning is to utilize every clue and NOT just go
>> >> >> > with
>> >> >> > static
>> >> >> > clusters, etc. This includes temporal as well as positional clues,
>> >> >> > information content, etc. PCA does some but certainly not all of
>> >> >> > this,
>> >> >> > but
>> >> >> > considering that we were talking about clustering here just a
>> >> >> > couple
>> >> >> > of
>> >> >> > weeks ago, ratcheting up to PCA seems to be at least a step out of
>> >> >> > the
>> >> >> > basement.
>> >> >> >
>> >> >> > I think that perhaps I mis-stated or was misunderstood in my
>> >> >> > "position".
>> >> >> > No
>> >> >> > one has "the answer" yet, but given recent work, I think that
>> >> >> > perhaps
>> >> >> > the
>> >> >> > problem can now be stated. Given a problem statement, it
>> >> >> > (hopefully)
>> >> >> > should
>> >> >> > be "just some math" to zero in on the solution. OK...
>> >> >> >
>> >> >> > Problem Statement: What are the optimal functions, derived from
>> >> >> > real-world
>> >> >> > observations of past events, the timings of their comings and
>> >> >> > goings,
>> >> >> > and
>> >> >> > perhaps their physical association, to extract each successive
>> >> >> > parameter
>> >> >> > containing the maximum amount of information (in a Shannon sense)
>> >> >> > usable
>> >> >> > in
>> >> >> > reconstructing the observed inputs. IMHO these same functions will
>> >> >> > be
>> >> >> > exactly what you need to recognize what is happening in the world,
>> >> >> > what
>> >> >> > you
>> >> >> > need to act upon, which actions will have the most effect on the
>> >> >> > world,
>> >> >> > etc.
>> >> >> > PCA is clearly NOT there (e.g. it lacks temporal consideration),
>> >> >> > but
>> >> >> > seems
>> >> >> > to be a step closer than anything else on the horizon. Hopefully,
>> >> >> > given
>> >> >> > the
>> >> >> > "hint" of PCA, we can follow the path.
>> >> >> >
>> >> >> > You should find an explanation of PCA in any elementary linear
>> >> >> > algebra
>> >> >> > or
>> >> >> > statistics textbook. It has a range of applications (like any
>> >> >> > transform),
>> >> >> > but it might be best regarded as an/the elementary algorithm for
>> >> >> > unsupervised dimension reduction.
>> >> >> >
>> >> >> > Bingo! However, it still fails to consider temporal clues, unless
>> >> >> > of
>> >> >> > course
>> >> >> > you just consider these to be another dimension.
>> >> >> >
>> >> >> > When PCA works, it is more likely to be interpreted as a comment
>> >> >> > on
>> >> >> > the
>> >> >> > underlying simplicity of the original dataset, rather than the
>> >> >> > power
>> >> >> > of
>> >> >> > PCA
>> >> >> > itself.
>> >> >> >
>> >> >> > Agreed, but so far, I haven't seen any solid evidence that the
>> >> >> > world
>> >> >> > is
>> >> >> > NOT
>> >> >> > simple, though it appears pretty complex until you understand it.
>> >> >> >
>> >> >> > Thanks for making me clarify my thoughts.
>> >> >> >
>> >> >> > Steve Richfield
>> >> >> >