date:20060828




 However, I think that a lossless model can 
reasonably derive this information by observing that p(x, x') is approximately 
equal to p(x) or p(x'). In other words, knowing both x and x' does not 
tell you any more than x or x' alone, or CDM(x, x') ~ 0.5. I think this is 
a reasonable way to model lossy behavior in humans.
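
For concreteness, here is a minimal sketch of the CDM figure mentioned above, assuming the usual definition CDM(x, y) = C(xy) / (C(x) + C(y)), where C is compressed size. Using zlib as the compressor is an assumption of this sketch and not the language model under discussion; the sample text and the names compressed_size and cdm are illustrative only.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """Compression dissimilarity: C(xy) / (C(x) + C(y))."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


# Illustrative inputs: a short English passage and same-length pseudo-random bytes.
text = (
    b"In showing that compression implies AI, I first make the simplifying "
    b"assumption that everyone shares the same language model. Then I relax "
    b"that assumption and argue that this makes it easier for a machine to "
    b"pass the Turing test."
)
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))

same = cdm(text, text)        # a string paired with itself: close to 0.5
unrelated = cdm(text, noise)  # unrelated data: close to 1
print(f"same={same:.2f} unrelated={unrelated:.2f}")
```

A byte-level compressor like this only detects surface repetition, so it cannot by itself score two paraphrases as near 0.5; that would need a compressor with a much richer model.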
How does a lossless model observe that "Jim is
extremely fat" and "James continues to be morbidly obese" are approximately
equal? I would assume that it would have to be via the same world model
that a lossy model would use -- which is way above the bitstream
level.

Also, I think that going at this via a probability 
model is not the way to go.

 knowing both x and x' does not tell you any 
more than x or x' alone

Can't you rephrase this with the following 
approximately equal phrases:

  You need to discard either x or x' to reach a 
  canonical form, or
  Discarding either x or x' is not a lossy 
  operation?
  
Mark

- Original Message - 
From: "Matt Mahoney" [EMAIL PROTECTED]
To: agi@v2.listbox.com
Sent: Sunday, August 27, 2006 10:32 PM
Subject: Re: [agi] Lossy ** lossless 
compressi
 In showing that compression implies AI, I first make the simplifying
 assumption that everyone shares the same language model. Then I relax that
 assumption and argue that this makes it easier for a machine to pass the Turing
 test.

 But I see your point. I argued that a lossless model knows everything that a
 lossy model does, plus more, because the lossless model knows p(x) and p(x'),
 while a lossy model only knows p(x) + p(x'). However I missed that the lossy
 model knows that x and x' are equivalent, while the lossless model does not.

 However, I think that a lossless model can reasonably derive this information
 by observing that p(x, x') is approximately equal to p(x) or p(x'). In other
 words, knowing both x and x' does not tell you any more than x or x' alone, or
 CDM(x, x') ~ 0.5. I think this is a reasonable way to model lossy behavior in
 humans.

 -- Matt Mahoney, [EMAIL PROTECTED]

 - Original Message -
 From: Philip Goetz [EMAIL PROTECTED]
 To: agi@v2.listbox.com
 Sent: Sunday, August 27, 2006 9:23:25 PM
 Subject: Re: [agi] Lossy ** lossless compressi

 On 8/25/06, Matt Mahoney [EMAIL PROTECTED] wrote:
  As I stated earlier, the fact that there is normal variation in human
  language models makes it easier for a machine to pass the Turing test.
  However, a machine with a lossless model will still outperform one with a
  lossy model because the lossless model has more knowledge.

 That would be true only if there were one correct language model, AND you
 knew what it was. Besides which, every human has a lossy model. It seems to
 me that by your argument, a machine with a lossless model would "out-perform"
 a human, and thus /fail/ the Turing test.

 - Phil

 ---
 To unsubscribe, change your address, or temporarily deactivate your subscription,
 please go to http://v2.listbox.com/member/[EMAIL PROTECTED]

Re: [agi] Lossy ** lossless compressi

2006-08-28 Thread Sampo Etelavuori




On 8/28/06, Mark Waser [EMAIL PROTECTED] wrote:

How does a lossless model observe that "Jim is extremely fat" and "James continues
to be morbidly obese" are approximately equal?


Actually I think I just may have invented one possible way to do that using a lossless probabilistic model in my previous email to this list. Did you read it? Anyway, in case you have a hard time figuring it out all by yourself, the idea behind it can be pretty straightforwardly generalized to be used with phrases, and thus I think phrases can be observed to be approximately equal if they can occur in pretty much identical contexts and affect the distribution of following words and phrases approximately equivalently.
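
A rough sketch of the idea in the paragraph above, under heavy assumptions: phrases are compared by the (previous word, next word) contexts they occur in, using raw counts and cosine similarity. The toy corpus, the function names, and the choice of cosine similarity are all illustrative and not taken from the earlier email.

```python
from collections import Counter
from math import sqrt


def context_counts(sentences, phrase):
    """Count (previous word, next word) pairs around each occurrence of phrase."""
    target = phrase.split()
    n = len(target)
    counts = Counter()
    for sentence in sentences:
        # Pad with boundary markers so every occurrence has both neighbours.
        words = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(words) - n):
            if words[i:i + n] == target:
                counts[(words[i - 1], words[i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, lower-cased and whitespace-tokenized.
sentences = [
    "jim is extremely fat and needs a diet",
    "the cat is extremely fat and sleeps all day",
    "jim is morbidly obese and needs a diet",
    "the cat is morbidly obese and sleeps all day",
    "he became morbidly obese last year",
    "jim is running late today",
]

fat = context_counts(sentences, "extremely fat")
obese = context_counts(sentences, "morbidly obese")
late = context_counts(sentences, "running late")

print(cosine(fat, obese))  # high: the two phrases share most of their contexts
print(cosine(fat, late))   # zero: no shared contexts in this corpus
```

With so little data the scores are only suggestive; how much text is needed before such counts become reliable is an open question here.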



Re: [agi] AGI open source license

On 8/28/06, Stephen Reed [EMAIL PROTECTED] wrote:
An assumption of mine that can be debated perhaps in a separate message thread, is that there should be effectively only one AGI, allowing for a federation of AGI's contrived to prevent war between them.

I've explained my opinion of the various AI conquer the world memes
before, this probably isn't the thread for a repeat :) However, I think
there might be something to be said for this idea for a completely
different reason.

OpenOffice works well under the GPL because of what it does: everyone
puts their own little OpenOffice installation on their own PC and uses
it for individual-scale tasks, and the GPL lets them do this.

Google wouldn't work at all well under the GPL. Why? Because if
everyone had their own little Google, it would be quite useless [1].
The system's usefulness comes from the fact that there is only one
Google, and it is _big_, in terms of both knowledge and the computing
resources to use that knowledge.

A serious AGI will have to end up making Google look like those '10
PRINT HELLO: GOTO 10' programs we used to write on our childhood
8-bit computers. If everyone just downloads their own copy and tweaks
it separately from everyone else's, the sum total of value generated
will be effectively zero.

Now, I don't think I'd have the license say "you must donate CPU cycles
in payment for using this", both because I don't see any way to enforce
it and because it would justifiably annoy people who want to e.g. play
with it on a laptop without an Internet connection.

What I would consider doing (haven't thought very much about this so
far, might be flaws in the idea, but I think it's at least worth a
look) if I were going to do open source AGI, is take an idea from the GPL
and say: "You may do anything you like with this on your own PC, but you
may not _distribute_ an incompatible version. Any modified version that
gets distributed must seamlessly hook into the network of other copies."

[1] I know there are some companies that have licensed Google to
catalog their intranet stuff, but this is small change by comparison,
and even this doesn't apply to AGI because the latter's knowledge
acquisition will be far less regular and at least for the foreseeable
future far less fully automatable than Google's.


Re: [agi] AGI open source license

2006-08-28 Thread Stephen Reed



--- Russell Wallace [EMAIL PROTECTED] wrote:

 A serious AGI will have to end up making Google look like those '10 PRINT
 HELLO: GOTO 10' programs we used to write on our childhood 8-bit
 computers.

Agreed.

 If everyone just downloads their own copy and tweaks it
 separately from everyone else's, the sum total of value generated will be
 effectively zero.

Yes, but suppose the government of China decides to
download an open source AGI and install it on one or
more of their Top 500 supercomputer facilities?
Certain individuals may have vast compute resources at
their disposal.

 Now, I don't think I'd have the license say you must donate CPU cycles in
 payment for using this both because I don't see any way to enforce it and
 because it would justifiably annoy people who want to e.g. play with it on a
 laptop without an Internet connection.

Let me elaborate:  I am considering a primary
deployment for my open source project using Jabber, an
open source chat protocol adopted by Google Talk among
others.  Any user could just chat with the AGI.  If
they want the AGI to perform tasks for them on their
own computer, then they would download some components
and be subject to the constraints of the license.
From public research solicitations I know that the US
CIA is interested in AI to assist their intelligence
analysts.  If they were to download and install an open
source AGI they would permit policy control from the
unclassified side, e.g. not violate US laws, but no
information could come out of a classified software
component back to the central, completely open AGI.

 What I would consider doing (haven't thought very much about this so far,
 might be flaws in the idea, but I think it's at least worth a look) if I
 were going to do open source AGI, is take an idea from GPL and say: You may
 do anything you like with this on your own PC, but you may not _distribute_
 an incompatible version. Any modified version that gets distributed, must
 seamlessly hook into the network of other copies.

Exactly, how to detect evil modifications is a safety
issue.



Re: [agi] AGI open source license

On 8/28/06, Stephen Reed [EMAIL PROTECTED] wrote:
Yes, but suppose the government of China decides to download an open source AGI and install it on one or more of their Top 500 supercomputer facilities?

Suppose the government of China decides to get hold of CAD, simulation
software etc and install it on their computers and use it for designing
bombs and missiles? Well then they can do that, and indeed doubtless
they have. Same answer.

How do we regulate the use of computers for nefarious deeds today? Well
mostly we don't, and when we try (e.g. the American government with
cryptography), it doesn't work and the attempt does far more harm than
good. What we do, by and large, is not bother - instead, we just outlaw
the nefarious deeds themselves, regardless of the tools that were used.
I think this will remain true in the future (or if it doesn't, we'll be
in big trouble).

That having been said, if you're serious about preventing the abuse of
your software, I think the only answer is, don't distribute it. Follow
the path of Novamente and indeed Google themselves (albeit for
different reasons) and keep the software on your own machines and sell
the services it provides.

That might be a better route with regard to resources. It's not clear
to me whether an open source AGI project relying on donated manpower
and computing power could obtain enough of those. Then again, maybe it
could; I don't really know either way.


Re: [agi] AGI open source license

On 8/28/06, Stephen Reed [EMAIL PROTECTED] wrote:
I assume that you fully understand the benefits and business case of an open source project, and that your point is made even with the former fully considered.

Yes. For that matter, my answer would be the same if you proposed a
closed source project that sold a binary distribution like Microsoft
Office. 
I would respond to the proprietary AGI alternative with the observation that one may suppose, as do I, that only one AGI is safer than many, possibly opposing, AGIs. With the proprietary model, there will be a market for others to enter. On the other hand an established open source project precludes competition, e.g. only one Wikipedia.

I think safety is maximized by maximizing the probability of successful
development of AGI within whatever time we have available rather than
trying to minimize the probability that one or more AGIs will be
abused, but that is a different question. If minimizing the probability
that an AGI will be abused is your priority, the best approach might be
to try to get there first and remain so far ahead of the competition as
to have a near monopoly, as e.g. IBM did in the mainframe market in its
heyday.


Re: [agi] AGI open source license


On 28/08/06, Russell Wallace [EMAIL PROTECTED] wrote:

On 8/28/06, Stephen Reed [EMAIL PROTECTED] wrote:
 Google wouldn't work at all well under the GPL. Why? Because if everyone
had their own little Google, it would be quite useless [1]. The system's
usefulness comes from the fact that there is only one Google, and it is
_big_, in terms of both knowledge and the computing resources to use that
knowledge.


But Google gets its knowledge from lots of little actors (web page
makers). I suspect the thing that will replace Google will get its
information from lots of little AIs, each attached to a
person, government or other organisation. While AGI will likely be a
Google replacer, it will also be an Outlook replacer. The
micro scale and the macro.

If the macro AGI can't translate between differences in language or
representation that the micro AGIs have acquired from being open
source, then we probably haven't done our job properly.

 Will Pearson


Re: [agi] AGI open source license


On 28/08/06, Russell Wallace [EMAIL PROTECTED] wrote:

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:


 If the macro AGI can't translate between differences in language or
 representation that the micro AGIs have acquired from being open
 source, then we probably haven't done our job properly.


 But I don't think that will. I think that job is impossible to do, or
rather that doing it would require a complete, fully-educated AGI - which is
precisely what we are trying to achieve, so we can't rely on its existence
while we are trying to build it.



I was thinking more long term than you. I agree in the first phase we
can't rely on it being able to translate different information from
different AGIs. But to start with I wouldn't attempt the Google killer,
merely the Outlook killer.

We may well not have enough computing resources available to do it on
the cheap using local resources. But that is the approach I am
inclined to take; I'll just wait until we do. The open source
distributed Google killer will have the problem of who decides what
goals the system has/starts with (depending upon your philosophy), and
how to upgrade the collective if the goals were incorrect to start
with. It is also not as amenable to experiment as the micro level
systems are.

Will


Re: [agi] AGI open source license

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:
I was thinking more long term than you. I agree in the first phase we can't rely on it being able to translate different information from different AGI. But to start with I wouldn't attempt the google killer, merely the outlook killer.

Okay, but... 
We may well not have enough computing resources available to do it on the cheap using local resources. But that is the approach I am inclined to take, I'll just wait until we do.

Computing power isn't the only issue, and probably not the most
important one; what do you think an Outlook killer could do that
Outlook doesn't already do, and how would it know how to do it?

The open source distributed google killer will have the problem of who decides what goals the system has/starts with (depending upon your philosophy)

Do what the users want you to do.

and how to upgrade the collective if the goals were incorrect to start with.

In the case of an open source AGI project, there would be no
requirement that all users form a collective as far as their goals are
concerned, only that they agree on running, maintaining and enhancing
the software to serve their separate goals, just as is the case with
e.g. the Internet today.
It is also not as amenable to experiment as the micro level systems are.

True.



Re: [agi] AGI open source license

2006-08-28 Thread Bill Hibbard

Hi Stephen,

As a small operation independent of Cyc, distributing
your AGI system as open source is likely to be a good
strategy.

As a small university PI developing visualization
software, distributing my systems as open source turned
out to be very good for my project. Our collaborators
and customers came to us out of the blue, without any
need for a marketing department we couldn't afford.
There were very few negatives (we did have a developer
in China who wanted me to pay them to get access to
some enhancements they had made, but I simply declined).

In my first major system, Vis5D, there was a problem
with divergent versions. We were able to work with
developers to unify the most important versions, but
it was hard work and there were still numerous
divergent versions. For another major system, VisAD, I
specifically designed it at a high level of abstraction
and with classes designed to be extended to allow
developers to make low level changes, and so far there
have not been divergent versions. I am a bit skeptical
whether legal wording in the license will restrain
developers from making divergent versions - as a small
operation, are you really prepared to take violators to
court? But if your design makes divergence less necessary,
most developers will see the advantage of a unified
version that permits sharing.
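
To illustrate the kind of design meant here, a minimal and purely hypothetical sketch follows. It is not VisAD's actual API (VisAD is written in Java); the names Renderer, BarRenderer, SparkRenderer and display are invented for this example. The point is only that a developer who needs a low level change writes a subclass and leaves the shared core untouched.

```python
from abc import ABC, abstractmethod


class Renderer(ABC):
    """Extension point: the shared core only ever talks to this interface."""

    @abstractmethod
    def render(self, values: list[float]) -> str:
        ...


class BarRenderer(Renderer):
    """A default implementation shipped with the core."""

    def render(self, values: list[float]) -> str:
        return "\n".join("#" * round(v) for v in values)


class SparkRenderer(Renderer):
    """A third-party extension: a low level change made without forking."""

    LEVELS = " .:-=+*#"

    def render(self, values: list[float]) -> str:
        top = max(values) or 1.0  # avoid dividing by zero for all-zero input
        scale = len(self.LEVELS) - 1
        return "".join(self.LEVELS[int(v / top * scale)] for v in values)


def display(values: list[float], renderer: Renderer) -> str:
    """Core code path, unchanged no matter which renderer is plugged in."""
    return renderer.render(values)


print(display([1, 2, 3], BarRenderer()))
print(display([0, 4, 8], SparkRenderer()))
```

Because both renderers go through the same display call, the extension can be shared back without anyone having to maintain a divergent copy of the core.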

By open source distribution you are expressing optimism
about human nature, and your developer community will
mostly justify that optimism. The best approach for the
few who disappoint you is to simply ignore them.

Good luck,
Bill

On Mon, 28 Aug 2006, Stephen Reed wrote:

 I would appreciate comments regarding additional
 constraints, if any, that should be applied to a
 traditional open source license to achieve a free but
 safe widespread distribution of software that may lead
 to AGI.

 As background, I was recently laid off by Cycorp, the
 creators of the Cyc knowledge base, and I am taking
 this opportunity to pursue my own AGI ideas full time.
 Although I am a ResearchCyc licensee I am considering
 a roadmap leading to a completely open source AGI.

 An assumption that some may challenge is that AGI
 software should be free in the first place.  I think
 that this approach has proved useful for both software
 (e.g. MySQL database) and knowledge (Wikipedia).
 Could additional terms and conditions for an AGI open
 source license retain these benefits yet be safe?

 The leading GNU General Public License forbids any further
 constraints beyond its own terms, so let's think about
 the Apache Software License (ASL).

 http://www.apache.org/licenses/LICENSE-2.0.html

 Here is a key clause:

 Subject to the terms and conditions of this License,
 each Contributor hereby grants to You a perpetual,
 worldwide, non-exclusive, no-charge, royalty-free,
 irrevocable copyright license to reproduce, prepare
 Derivative Works of, publicly display, publicly
 perform, sublicense, and distribute the Work and such
 Derivative Works in Source or Object form.

 An assumption of mine that can be debated perhaps in a
 separate message thread, is that there should be
 effectively only one AGI, allowing for a federation of
 AGI's contrived to prevent war between them.

 A second assumption is that the existing legal
 structure, in particular license enforcement
 throughout the world, can handle an open source AGI.

 What about an AGI open source license, similar to the
 above ASL in which the user must, to comply with the
 license, federate their downloaded AGI with the
 existing AGI system and thus subordinate it to ethical,
 legal and safety controls previously established?

 Governance of an open source distributed AGI, with
 users who could be citizens of enemy countries, is an
 issue that might be addressed by license terms and
 conditions - any thoughts?

 Cheers.
 -Steve


Re: [agi] AGI open source license

On 8/28/06, Bill Hibbard [EMAIL PROTECTED] wrote:
By open source distribution you are expressing optimism about human nature, and your developer community will mostly justify that optimism. The best approach for the few who disappoint you is to simply ignore them.

I agree. When I suggested a "no incompatible versions" clause in the
license, I wasn't thinking in terms of "then you can fight and win lots
of court cases!"; Linus Torvalds, Guido van Rossum et al haven't had to
do that after all. I think that, as in the case of the GPL, most people
would respect the terms of the license without having to be coerced;
and I agree that the first line of defense against incompatible forking
is to design the architecture such that incompatible forks aren't
needed.

(This is different from the question of "what if [insert favorite bad
guys] use it for nefarious purposes". I still think the only way to
guarantee that doesn't happen is to never let any copies of the code
out of your grasp.)



Re: [agi] AGI open source license

2006-08-28 Thread Stephen Reed



--- Russell Wallace [EMAIL PROTECTED] wrote:

 On 8/28/06, Bill Hibbard [EMAIL PROTECTED] wrote:
 
  By open source distribution you are expressing optimism
  about human nature, and your developer community will
  mostly justify that optimism. The best approach for the
  few who disappoint you is to simply ignore them.

 I agree. When I suggested a no incompatible versions clause in the
 license, I wasn't thinking in terms of then you can fight and win lots of
 court cases!; Linus Torvalds, Guido van Rossum et al haven't had to do that
 after all. I think that, as in the case of the GPL, most people would
 respect the terms of the license without having to be coerced; and I agree
 that the first line of defense against incompatible forking is to design the
 architecture such that incompatible forks aren't needed.

 (This is different from the question of what if [insert favorite bad guys]
 use it for nefarious purposes. I still think the only way to guarantee that
 doesn't happen is to never let any copies of the code out of your grasp.)

Thanks for the clarification.  Now I see how to
integrate your thinking with my own.  

-Steve


Re: [agi] AGI open source license


On 28/08/06, Russell Wallace [EMAIL PROTECTED] wrote:

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:



 We may well not have enough computing resources available to do it on
 the cheap using local resources. But that is the approach I am
 inclined to take, I'll just wait until we do.


 Computing power isn't the only issue, and probably not the most important
one; what do you think an Outlook killer could do that Outlook doesn't
already do, and how would it know how to do it?


Things like hooking it up to low quality sound and video feeds and having it
judge by posture/expression/time of day what the most useful piece of
information in the RSS feeds/email etc. to provide to the user is. We
would have to program a large amount of the behaviour to start with,
but through the dynamics and mechanisms we create it would also get more
information about what the individual user wanted.


 The open source
 distributed google killer will have the problem of who decides what
 goals the system has/starts with (depending upon your philosophy)


 Do what the users want you to do.


Hmm. Possibly what we are talking about is not so different.



 and
 how to upgrade the collective if the goals were incorrect to start
 with.


 In the case of an open source AGI project, there would be no requirement
that all users form a collective as far as their goals are concerned, only
that they agree on running, maintaining and enhancing the software to serve
their separate goals, just as is the case with e.g. the Internet today.


Wouldn't interoperability be maintained by the same sort of pressures
that mean that everyone's tweaked version of Open Office shares the
same file formats? The fact that the first mover that is incompatible
loses the benefits from remaining compatible?

Will


Re: [agi] AGI open source license

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:
Things like hooking it up to low quality sound video feeds and have it judge by posture/expression/time of day what the most useful piece of information in the RSS feeds/email etc to provide to the user is. We would have to program a large amounts of the behaviour to start with, but also by the dynamics and mechanism we create it would get more of an information about what the individual user wanted.

Hmm... okay... it's not obvious to me that would be useful, but maybe
it would. The nice thing about being a pessimist, one's surprises are
more likely to be pleasant ones. Surprise me ^.^ 
Wouldn't interoperability be maintained by the same sort of pressures that mean that everyones tweaked version of Open Office shares the same file formats? The fact that the first mover that is incompatible loses then benefits from remaining compatible?

Yes, I would rely primarily on such incentives to maintain compatibility.


Re: Sampo [agi] Lossy ** lossless compressi

Actually I think I just may have invented one possible way to do that 
using a lossless probabilistic model in my previous email to this list. 
Did you read it?


:-)  I read it.  I think that you have to be in a perfect world situation 
for what you propose to be feasible (i.e. it requires seeing the phrases in 
pretty much identical contexts -- which you have to recognize as 
identical -- and then being able to tell that they affect the
distribution of following words and phrases approximately equivalently --  
which is an equally large problem).  I'm afraid that it *really* does not 
look like a *feasible* solution at all (to me -- as I said, I think that 
going at all this via a probability model is not the way to go since it 
entails getting and analyzing those probabilities from far less data than I 
think is necessary for those operations).


   Mark

- Original Message - 
From: Sampo Etelavuori [EMAIL PROTECTED]

To: agi@v2.listbox.com
Sent: Monday, August 28, 2006 8:56 AM
Subject: **SPAM** Re: [agi] Lossy ** lossless compressi





On 8/28/06, Mark Waser [EMAIL PROTECTED] wrote:
How does a lossless model observe that "Jim is extremely fat" and "James
continues to be morbidly obese" are approximately equal?


Actually I think I just may have invented one possible way to do that 
using a lossless probabilistic model in my previous email to this list. 
Did you read it? Anyway, in case you have a hard time figuring it out all 
by yourself, the idea behind it can be pretty straightforwardly
generalized to be used with phrases and  thus I think phrases can be 
observed to be approximately equal if they can occur in pretty much 
identical contexts and affect the distribution of following words and 
phrases approximately equivalently.


Re: [agi] AGI open source license


On 28/08/06, Russell Wallace [EMAIL PROTECTED] wrote:

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:

 Things like hooking it up to low quality sound video feeds and have it
 judge by posture/expression/time of day what the most useful piece of
 information in the RSS feeds/email etc to provide to the user is. We
 would have to program a large amount of the behaviour to start with,
 but also by the dynamics and mechanism we create it would get more
 information about what the individual user wanted.


 Hmm... okay... it's not obvious to me that would be useful, but maybe it
would. The nice thing about being a pessimist, one's surprises are more
likely to be pleasant ones. Surprise me ^.^


Possibly I am not explaining things clearly enough. One of my
motivations for developing AI, apart from the challenge, is to enable
me to get the information I need, when I need it.

As a lot of the power I have in this world is through what I buy, I
need to have this information available when I might buy something,
which may be when I am in social situations etc. I can be a much better
ethical consumer with the details I need given to me at the right
time. As such I am interested in wearable and ubiquitous computing.
Due to the constraints wearable computers place upon the designer, you
really want the correct information given to the user and nothing else that
may distract them unnecessarily.

Knowing what the correct information is will entail knowing about the
user and the user's current environment. Whether they rate energy
efficiency or CO2 emissions as a priority, for example. It will also
entail the Google-like system you are focused upon.

I also think that a system designed to understand our body
language/gestures/moods will also be able to be more easily and
naturally trained as it has more information coming in about what we
want and we will not have to be so explicit in our instructions.

I'm also a pessimist in that I don't think an era of light will ensue
just because AI is invented, but I hope it will allow the few people
that care to close the information gap that exists between producers
and consumers. Or the government and the populace for that matter. And
provide an economy marginally closer to what is promised by free
market theory.

You have hinted at the normative value of AI, I'm curious what you
find it to be? Is it simply to speed up technological development so
that we can escape the gravity well?

 Will


Re: [agi] AGI open source license

On 8/28/06, William Pearson [EMAIL PROTECTED] wrote:
Possibly I am not explaining things clearly enough. One of my motivations for developing AI, apart from the challenge, is to enable me to get the information I need, when I need it.

As a lot of the power I have in this world is through what I buy, I need to have this information available when I might buy something, which may be when I am in social situations etc. I can be a lot better ethical consumer with the details I need at the right time given to me. As such I am interested in wearable and ubiquitous computing. Due to the constraints wearable computer place upon the designer, you really want the correct information given to you and nothing else that may distract the user unnecessarily.

Ah, so you see this on a wearable... okay... that makes a bit more
sense, and also of what you said earlier about computing power, since
wearables are much more constrained in that regard than desktops.
Knowing what the correct information is will entail knowing about theuser and the uses current environment. Whether they rate energy
efficiency or CO2 emissions as a priority, for example. It will alsoentail the google like system you are focused upon.
I should clarify: I think competing with Google in the search market is
a losing proposition, that's already wrapped up; I'd look for new
markets that nobody is serving well today. I use it only as an example
of a software system that needs a lot of knowledge and computing power
and is therefore run on a central rather than local basis.
You have hinted at the normative value of AI, I'm curious what youfind it to be? Is it simply to speed up technological development so
that we can escape the gravity well?

Break the boundaries of space and time that currently apply to human life. Specifically:

1) Escape the gravity well. Or more precisely, we can already do that,
but we can't live anywhere other than Earth, because the number of
tasks that need to be carried out to keep a person alive for a year
vastly exceeds the number of things a person can do in a year. Cracking
that complexity barrier needs qualitative technological advances.

2) Stop or at least slow down the loss of fifty-plus million lives per
year. That's a matter both of developing the hardware tools to work
proficiently at the molecular level (i.e. some form of nanotechnology)
and again the software tools to handle the complexity.


Re: [agi] Lossy ** lossless compressi

2006-08-28 Thread Matt Mahoney

On 8/28/06, Mark Waser wrote:
How does a lossless model observe that "Jim is extremely fat" and "James
continues to be morbidly obese" are approximately equal?

I realize this is far beyond the capabilities of current data compression
programs, which typically predict the next byte in the context of the last few
bytes using learned statistics.  Of course we must do better.  The model has to
either know, or be able to learn, the relationships between "Jim" and "James",
"is" and "continues to be", "fat" and "obese", etc.  I think a 1 GB corpus is
big enough to learn most of this knowledge using statistical methods.
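
For readers unfamiliar with the kind of compressor model described above, here
is a minimal sketch in Python of predicting the next byte from the last few
bytes using learned counts. It is an illustration only, not the code of any
actual compression program; the function names `train` and `predict`, the
context length of 2, and the sample text are all assumptions made for the
example.

```python
from collections import Counter, defaultdict

def train(data: bytes, order: int = 2):
    """Count which byte follows each context of `order` preceding bytes."""
    counts = defaultdict(Counter)
    for i in range(order, len(data)):
        counts[data[i - order:i]][data[i]] += 1
    return counts

def predict(counts, context: bytes):
    """Return the most frequent next byte for a context, or None if unseen."""
    seen = counts.get(context)
    return bytes([seen.most_common(1)[0][0]]) if seen else None

# Hypothetical sample text, chosen only to exercise the model.
model = train(b"the fat cat sat on the mat; the fat rat sat on the hat")
print(predict(model, b"th"))   # most likely byte after "th"
print(predict(model, b"zz"))   # a context the model has never seen
```

A model like this sees only a window of a few bytes, which is why it cannot
relate "fat" to "obese" when the two words are sentences apart.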
 
C:\res\data\wiki>grep -c . enwik9
File enwik9:
10920493 lines match
enwik9: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c " fat " enwik9
File enwik9:
1312 lines match
enwik9: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i -c " obese " enwik9
File enwik9:
111 lines match
enwik9: grep: input lines truncated - result questionable

C:\res\data\wiki>grep -i " obese " enwik9 |grep -c " fat "
File STDIN:
14 lines match
  
So we know that "obese" occurs in about 0.001% of all paragraphs, but in 1% of
paragraphs containing "fat".  This is an example of a distant bigram model,
which has been shown to improve word perplexity in offline models [1].  We can
improve on this method using e.g. latent semantic analysis [2] to exploit the
transitive property of semantics: if A appears near (means) B and B appears
near C, then A predicts C.
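
As a rough check of the arithmetic, the four counts reported by grep above can
be turned into a baseline probability, a conditional probability, and their
ratio. This is only a sketch in Python; the variable names are assumptions, and
the numbers are copied from the grep output rather than recomputed from enwik9.

```python
# Counts copied from the grep output above (enwik9, one paragraph per line).
total = 10920493   # non-empty lines
fat = 1312         # lines matching " fat "
obese = 111        # lines matching " obese "
both = 14          # lines matching both " obese " and " fat "

p_obese = obese / total              # baseline probability of "obese"
p_obese_given_fat = both / fat       # probability of "obese" when "fat" is present
lift = p_obese_given_fat / p_obese   # how much "fat" raises the odds of "obese"

print(f"p(obese)       = {p_obese:.4%}")
print(f"p(obese | fat) = {p_obese_given_fat:.2%}")
print(f"lift           = {lift:.0f}x")
```

On these counts, seeing "fat" in a paragraph makes "obese" roughly a thousand
times more likely than its baseline, which is the signal a distant bigram model
exploits.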
 
Likewise, syntax is learnable.  For example, if you encounter "the X is" you
know that X is a noun, so you can predict "a X was" or "Xs" rather than "he X"
or "Xed".  This type of knowledge can be exploited using similarity modeling
[3] to improve word perplexity.  (Thanks to Rob Freeman for pointing me to
this).
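
The idea that words appearing in the same contexts behave alike can be sketched
in a few lines of Python. This is a simplified stand-in, not the method of [3]:
it compares words by the cosine similarity of their (previous word, next word)
context counts. The toy corpus and the names `contexts` and `similarity` are
assumptions made for the illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (hypothetical, for illustration only).
corpus = [
    "the dog is here", "a dog was here", "the cat is here",
    "a cat was here", "he runs home", "he walks home",
]

# For every word, count the (previous word, next word) pairs around it.
contexts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word, nxt in zip(words, words[1:], words[2:]):
        contexts[word][(prev, nxt)] += 1

def similarity(a, b):
    """Cosine similarity between the context-count vectors of two words."""
    va, vb = contexts[a], contexts[b]
    dot = sum(va[c] * vb[c] for c in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

print(similarity("dog", "cat"))    # both occur in "the _ is" and "a _ was"
print(similarity("dog", "runs"))   # no shared contexts
```

Here "dog" and "cat" come out as near-identical because they fill the same
slots, while "dog" and "runs" share none, so a model can guess that a new word
seen in "the X is" will also fit "a X was".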

Let me give one more example using the same learning mechanism by which syntax 
is learned:

All men are mortal.  Socrates is a man.  Therefore Socrates is mortal.
All insects have 6 legs.  Ants are insects.  Therefore ants have 6 legs.

Now predict: All frogs are green.  Kermit is a frog.  Therefore...


[1] Rosenfeld, Ronald, A Maximum Entropy Approach to Adaptive Statistical
Language Modeling, Computer Speech and Language, 10, 1996.

[2] Bellegarda, Jerome R., Speech recognition experiments using multi-span
statistical language models, IEEE Intl. Conf. on Acoustics, Speech, and Signal
Processing, 717-720, 1999.

[3] Dagan, Ido, Lillian Lee, and Fernando C. N. Pereira, Similarity-Based
Models of Word Cooccurrence Probabilities, Machine Learning, 1999.
http://citeseer.ist.psu.edu/dagan99similaritybased.html
  
-- Matt Mahoney, [EMAIL PROTECTED] 
 
 



Re: [agi] Lossy ** lossless compressi

I think a 1 GB corpus is big enough to learn most of this knowledge using
statistical methods.
So we know that "obese" occurs in about 0.001% of all paragraphs, but in
1% of paragraphs containing "fat".


OK.  Now try "obese" and "morbidly" or "obese" and "clinically".  I suspect
that you are far more likely to statistically end up with "obese" being some
form of disease (that being the context where it is normally used) than you
are to end up with it meaning "fat".  Statistical methods get absolutely
trashed when you start switching contexts unless they can tell (or more
likely, are told) that you've switched contexts.  They are great at pulling
context-specific clusters out of specific contexts, but unless you get
cross-context explanatory data (that you'll probably interpret with other
than statistical methods -- see next section), I don't believe that
statistical methods will recognize "obese" and "fat" as synonyms.


Likewise, syntax is learnable.  For example, if you encounter "the X is"
you know that X is a noun, so you can predict "a X was" or "Xs" rather
than "he X" or "Xed".  This type of knowledge can be exploited using
similarity modeling [3] to improve word perplexity.
Let me give one more example using the same learning mechanism by which 
syntax is learned:

All men are mortal.  Socrates is a man.  Therefore Socrates is mortal.
All insects have 6 legs.  Ants are insects.  Therefore ants have 6 legs.
Now predict: All frogs are green.  Kermit is a frog.  Therefore...


This isn't a statistical method (see "other than statistical methods" above
:-).


= = = = =

So -- No, I *don't* believe that the 1GB corpus is big enough to learn most 
of this knowledge *USING STATISTICAL METHODS*.  I *do* believe that it is 
large enough for other methods though.




Re: [agi] AGI open source license