Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
FYI: there is still some way to go by shrinking transistors, from the current minimum of 45nm half-pitch down to probably 16nm, possibly even 11nm, though that is already questionable. This will ensure some 5 to 10 more years of Moore's law being fueled by transistor shrinking, and roughly an order of magnitude growth of performance per fixed price. 11nm is probably the hard limit for transistor shrinking because some very generic research shows that gates of 5nm or less are really way too thin to prevent electrons from tunneling, regardless of the exact structure and material of the gate. See e.g. http://en.wikipedia.org/wiki/11_nanometer for more details.

Regards
Nikolay

Steve Richfield wrote:

Matt, A couple of comments on your post, which I generally agree with...

On 4/19/08, Matt Mahoney [EMAIL PROTECTED] wrote:

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute. My understanding is they carry passive signals.

The last I heard, the ONLY thing that they know for sure is that when they impale them with an electrode, they only see slowly changing signals and nothing resembling bistable behavior, spikes, etc. Unknown is whether they CAN change rapidly - or perhaps rare rapid changes are their important function?! Theories abound for glial cells, e.g. the one advanced ~4 years ago in Scientific American, where the author asserted that they assisted in the programming of synapses.

Moore's law presumed a relatively unchanging architecture and rapidly advancing fabrication. This has broken down, now that transistors can easily be made SO small that the electrons jump right over the gates. Sure there will be further developments, e.g. multi-layer, but the easy stuff that Moore's law was built on is now GONE.

Actually Moore's law holds pretty well back to about 1900 if you consider the computing power of mechanical adding machines. (I believe Kurzweil studied this). Moore's law is about the cost of computing, not the size of transistors.

But, until they figure out something besides transistors to make computers from, Moore's law has worked in recent decades via transistor shrinkage, thereby making them cheaper. My point is that they can't shrink any more, so they aren't going to get any cheaper, except via slow improvements in methods of manufacturing the same (and not smaller/faster) parts.

The proposed architecture that Josh and I have been discussing could bring this to the market for about the same cost as a PC in a couple of years - with adequate funding.

I've heard that before.

NOT using the SAME fabrication equipment! Other proposals involved new proposed fabrication technologies.

2. Some rich benefactor will step forward and make this happen over the loud objections of millions of devoutly religious.

Nobody has that much money. AGI will happen because nobody wants to work for somebody else.

While I agree with you regarding AGI, there are several people who could easily afford the 10K processor, or a knowledge-based Internet, e.g. Dr. Eliza. These appear to both be necessary as underlying tools to make AGI really work, and should both return a really quick profit - like in the first year or two.
Steve Richfield

--
Nikolay Ognyanov, PhD
Chief Technology Officer
TravelStoreMaker.com Inc.
http://www.travelstoremaker.com/
Phone: +359 2 933 3832
Fax: +359 2 983 6475
Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
Matt,

On 4/17/08, Matt Mahoney [EMAIL PROTECTED] wrote:

Before giving my detailed comments, I'd like to comment that people who have spent decades in wet laboratories know a **LOT** more than they are willing to write down. Why? THEIR culture is to not write things down until you can PROVE them with CAPTURED laboratory EVIDENCE. Hence, a researcher may notice that he must look at ~200 synapses to find one that actually has an efficacy > 0, but since there is no practical way of capturing and proving that 200, and who is ever sure exactly WHAT their sub-micron electrode is connected to in living tissue, they don't dare publish these numbers. However, their culture doesn't seem to prohibit them from TALKING about these things, and THAT is how I have come up with the numbers that I use - and have absolutely NO written evidence to support. However, if you are REALLY interested in some of them, I could probably put you in contact with someone who would be willing to TALK about them from first-hand experience.

The Blue Brain project estimates 8000 synapses per neuron in mouse cortex.

I haven't read the report, but I presume that is a PHYSICAL number developed from microscopy. Typically, 1% have efficacies > 0. I haven't seen a more accurate estimate for humans, so your numbers are probably as good as mine.

Most came from William Calvin. The contact info on his web site actually gets to him, so that would be a good place to start for refinement.

I estimate 10^11 neurons,

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute.

10^15 synapses (1 bit each)

They appear to be performing rather precise analog computation. While there is a lot of noise in voltage, there is much less noise in current/ions. Further, either you must come up with interconnections from fractal means, which means that you need many more synapses, or you must store the topography, so you'll need a LOT more than 1 bit each. Either way, you'll have to allow for much more than one bit for the interconnection, plus a lot more for the characteristics that may also involve time-dependent things like differentiation (e.g. for temporal adjustments as in antique RTL logic) and integration (e.g. for averaging to detect low-level phenomena).

and a response time of 100 ms, or 10^16 OPS to replicate the processing of a human brain.

Glial cells constitute 90% of the brain and are MUCH slower than this. However, there is a double-pulse mechanism in spiking neurons that provides millisecond notice of significant events. Hence, more analysis is probably needed here, but this number could be off by an order of magnitude either way.

The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits.

Apples and oranges. You are comparing completed memory with work-in-progress to figure out just what to remember. Computers, of course, also have large ratios between the space to store data, and the RAM needed to develop that data.

This may be due to the constraints of slow neurons, parallelism, and the pulsed binary nature of nerve transmission. For example, the lower levels of visual processing in the brain involve massive replication of nearly identical spot filters which could be simulated in a machine by scanning a small filter coefficient array across the retina. It also takes large numbers of nerves to represent a continuous signal with any accuracy, e.g.
fine motor control or distinguishing nearly identical perceptions.

William Calvin and I had a long-standing argument about this. Finally, we sat on his pea-gravel-covered roof and pitched pea gravel at a target while having our arms blocked at various points by the other person. This was to separate this theory from mine, that motions are a sort of successive approximation, where groups of neurons watch what we are doing and send corrective signals. If the massive theory was correct, even a small interruption of movement would have made a huge error in accuracy, whereas if the successive approximation theory was correct, we would only lose some of the very last corrections for a small loss in accuracy. You might try this experiment yourself, but it was pretty clear to us that we lost amazingly little accuracy by having our throws physically interrupted.

However my work with text compression suggests that the cost of modeling 1 GB of text (about one human lifetime's worth) is considerably more than a few GB of memory. My guess is at least 10^12 bits just for ungrounded language modeling. If the model is represented as a set of (sparse) graphs, matrices, or neural networks, that's about 10^13 OPS.

Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power.
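A back-of-the-envelope restatement of the estimate above, using only the rough figures quoted in this thread (all of them assumptions, not measurements):

```python
neurons = 1e11          # total cells, ~90% of which are glia
synapses = 1e15         # ~10^4 per neuron
response_time = 0.1     # seconds (100 ms)

ops = synapses / response_time   # one synapse update per response time
bits = synapses * 1              # 1 bit per synapse, as assumed above
print(f"~{ops:.0e} OPS, ~{bits:.0e} bits of synaptic storage")
# -> ~1e+16 OPS, ~1e+15 bits
```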
Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
--- Steve Richfield [EMAIL PROTECTED] wrote:

On 4/17/08, Matt Mahoney [EMAIL PROTECTED] wrote: The Blue Brain project estimates 8000 synapses per neuron in mouse cortex.

I haven't read the report, but I presume that is a PHYSICAL number developed from microscopy. Typically, 1% have efficacies > 0.

If that's true, then the off synapses carry -lg(.99) = .014 bits of information and the on synapses carry -lg(.01) = 6.64 bits, for an average of 0.08 bits per synapse.

I estimate 10^11 neurons,

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute. My understanding is they carry passive signals.

The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits.

Apples and oranges. You are comparing completed memory with work-in-progress to figure out just what to remember. Computers, of course, also have large ratios between the space to store data, and the RAM needed to develop that data.

Landauer measured human ability to recall all sorts of data like pictures, spoken or written lists of random words or numbers, music clips, etc. What's missing is short term memory and some memory associated with perception and motor skills. I haven't seen any good numbers for these. But you are right that there is a difference between information content and the memory needed to represent it.

Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power.

This all sounds SO much like the 1960s mantra from Carnegie Mellon. At minimum, it would seem necessary to distill just what it was that they got wrong that present AGI folk have right. If I were an investor, this would be the FIRST thing that I would want to hear and understand.

Present AGI folks haven't got it right yet either.

Moore's law presumed a relatively unchanging architecture and rapidly advancing fabrication. This has broken down, now that transistors can easily be made SO small that the electrons jump right over the gates. Sure there will be further developments, e.g. multi-layer, but the easy stuff that Moore's law was built on is now GONE.

Actually Moore's law holds pretty well back to about 1900 if you consider the computing power of mechanical adding machines. (I believe Kurzweil studied this). Moore's law is about the cost of computing, not the size of transistors.

The proposed architecture that Josh and I have been discussing could bring this to the market for about the same cost as a PC in a couple of years - with adequate funding.

I've heard that before.

IMHO, one of two things will happen: 1. The Christians will prevail and this will NEVER EVER be allowed to happen, or,

Religion seems to be silent on technology when it doesn't involve human manipulation (cloning, stem cell research, etc). I foresee ethical problems with technologies like brain implants, uploading, reprogramming neurons, etc. on healthy people.

2. Some rich benefactor will step forward and make this happen over the loud objections of millions of devoutly religious.

Nobody has that much money. AGI will happen because nobody wants to work for somebody else.
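The per-synapse figures above can be recomputed directly; the average is just the binary entropy at p = 0.01, assuming the 1% "on" estimate quoted in the thread:

```python
from math import log2

p_on = 0.01
bits_off = -log2(1 - p_on)                        # ~0.0145 bits per "off" synapse
bits_on = -log2(p_on)                             # ~6.64 bits per "on" synapse
average = (1 - p_on) * bits_off + p_on * bits_on  # ~0.08 bits per synapse
print(bits_off, bits_on, average)
```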
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On 4/18/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote: Disk access rate is ~10 times faster than ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the harddisk.

Eh? Ethernet latency is sub-millisecond, and in a highly tuned system approaches the 10 microsecond range for something local. Much, much faster than disk if the remote node has your data in RAM and is relatively local. Note that relatively local can mean geographically regional. The round-trip RAM access time from my machine to a machine on the other side of town is a fraction of a millisecond over the Internet connection (not hypothetical, actually measured at ~400 microseconds). I wish disk access was even remotely that good. And this was with inexpensive Gigabit Ethernet.

LOL... you're right, I forgot to consider latency. Ethernet is much faster than harddisk if we measure access times. But there is another factor: the harddisk is owned by the user. Memory over the net is owned by others, so it must be shared. It's not easy to arrange a distributed and cooperative storage scheme. It's hard enough to solve core AGI problems; I simply don't have time to deal with that. Solid State Disks seem to be a promising solution.

YKY
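The random-access rates implied by those latencies, taking the round numbers mentioned above (~400 microseconds network round trip, ~10 ms disk seek) as assumptions:

```python
network_rtt = 400e-6   # seconds
disk_seek = 10e-3      # seconds
print(f"remote RAM over the network: ~{1 / network_rtt:,.0f} lookups/s")  # ~2,500
print(f"local disk (seek-bound):     ~{1 / disk_seek:,.0f} lookups/s")    # ~100
```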
Re: [agi] database access fast enough?
Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB.

I reject this hypothesis as ludicrously incorrect.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 4:58 PM Subject: Re: [agi] database access fast enough?

On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down?

The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now.

YKY
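YKY's arithmetic, spelled out; every input here is his stated guess, not data:

```python
from math import log2

opencyc_mb = 200
full_kb_mb = (opencyc_mb / 0.40, opencyc_mb / 0.10)   # if OpenCyc covers 10%-40%
hypothesis_factor = (1_000, 10_000)

low_gb = full_kb_mb[0] * hypothesis_factor[0] / 1_000    # 500 GB
high_gb = full_kb_mb[1] * hypothesis_factor[1] / 1_000   # 20,000 GB = 20 TB
print(f"{low_gb:.0f} GB to {high_gb / 1_000:.0f} TB")

# Doublings needed for 10 GB of RAM to reach the low end:
print(f"{log2(low_gb / 10):.1f} doublings")              # ~5.6, roughly a decade
```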
Re: [agi] database access fast enough?
I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert.

Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

- Original Message - From: Stephen Reed To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 5:20 PM Subject: Re: [agi] database access fast enough?

YKY, I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert.

But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations:
a. wet hair mother
b. far away mother
c. angry mother
d. mother hidden from view
e. mother in a crowd
f. mother's voice
g. mother in dim light
h. mother from below
i. and so on

Of course you can ignore fully grounded concepts as does current Cycorp for its applications, and as I will with Texai until it is past the bootstrap stage.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 3:58:43 PM Subject: Re: [agi] database access fast enough?

On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down?

The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now.
YKY
Re: [agi] database access fast enough?
--- Mark Waser [EMAIL PROTECTED] wrote: Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?) Google probably keeps a copy of the searchable part of the internet in about 1 PB of RAM, but this isn't AGI yet. I suppose an internet-wide distributed system could cache about 1 EB (10^18 bytes).

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

Then we have no disagreement. Notice that the loading-on-demand chunks require that we *duplicate* data. For example, facts about JK Rowling can be in a literature chunk as well as an entrepreneur chunk. The question is whether DBMSs support this. Materialized views may be the answer (http://en.wikipedia.org/wiki/Materialized_view). As I said before, minimizing disk access is still an important issue. And all this is peripheral to AGI. I wish I could just focus on AGI algorithms!

YKY
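A toy illustration (plain Python, not a DBMS feature) of the duplication described here: a fact is copied into every topic chunk that may need it, so loading one chunk is a single sequential read rather than many scattered fetches.

```python
facts = [
    ("JK Rowling is the author of Harry Porter", {"literature", "entrepreneurs"}),
    ("Harry Porter belongs to the fantasy genre", {"literature"}),
]

chunks = {}
for fact, topics in facts:
    for topic in topics:                       # deliberate duplication across chunks
        chunks.setdefault(topic, []).append(fact)

print(chunks["entrepreneurs"])   # the JK Rowling fact appears here as well
```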
Re: [agi] database access fast enough?
On 4/18/08, Matt Mahoney [EMAIL PROTECTED] wrote: What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?)

Matt, The world's knowledge is irrelevant to the goal of AGI. What we need is to build a commonsense AGI and then let it control other expert systems with specialized knowledge. So the pertinent question is how large the core commonsense KB is. I guess anywhere from 1 GB to 100 GB is possible, excluding hypotheses from learning, and episodic memory.

YKY
Re: [agi] database access fast enough?
--- Mark Waser [EMAIL PROTECTED] wrote: What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?) I have no idea (and the question is further muddled by what knowledge is and what formats are included). The question itself is fundamentally nonsensical in its current form.

I mean either algorithmic complexity, or more practically, how much memory you need (which depends on the data representation). But really, it depends on the goal, which I have been trying unsuccessfully for years to get YKY to pin down.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On Apr 16, 2008, at 9:51 PM, YKY (Yan King Yin) wrote: Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: keeps stuff you are most likely to access next in RAM) and cost-based optimization (read: optimizes performance by adaptively selecting query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access patterns and resource management that match this usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource cost statistics -- not always the best result (on occasion hand optimization could do better), but more likely to produce good results than simpler rule-based optimization on average.

Note that by good database engine I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open source solution.

I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle design decisions optimized for massive numbers of small concurrent update transactions and PostgreSQL design decisions optimized for massive numbers of small concurrent insert/delete transactions. Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.

J. Andrew Rogers
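A deliberately tiny sketch of the caching idea being described. Real engines use far more sophisticated ARC-style policies that also track access frequency, plus cost-based planning, but the basic shape is the same: keep hot pages in RAM, evict the least useful page on overflow, and only then touch the disk.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # hit: mark as recently used
            return self.pages[page_id]
        data = load_from_disk(page_id)           # miss: pay the disk cost once
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)       # evict least recently used page
        return data
```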
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: keeps stuff you are most likely to access next in RAM) and cost-based optimization (read: optimizes performance by adaptively selecting query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access patterns and resource management that match this usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource cost statistics -- not always the best result (on occasion hand optimization could do better), but more likely to produce good results than simpler rule-based optimization on average.

Note that by good database engine I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open source solution.

I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle design decisions optimized for massive numbers of small concurrent update transactions and PostgreSQL design decisions optimized for massive numbers of small concurrent insert/delete transactions. Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.

Thanks for the info -- I studied database systems almost a decade ago, so I can hardly remember the details =) ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of frequently used and recently used items. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives. The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

YKY
Re: [agi] database access fast enough?
To use an example: if a lot of people search for Harry Porter, then a conventional database system would make future retrieval of the Harry Porter node faster. But the requirement of the inference system is such that, if Harry Porter is fetched, then we would want *other* things that are associated with Harry Porter to be retrieved faster in the future, for example items such as JK Rowling or fantasy fiction.

YKY
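A sketch of the access pattern being asked for: fetching one node also warms the cache for its neighbours in the knowledge graph. The graph and node names here are toy data.

```python
graph = {
    "Harry Porter": ["JK Rowling", "fantasy fiction"],
    "JK Rowling": ["Harry Porter", "Queen Elizabeth II"],
}
cache = set()

def fetch(node):
    cache.add(node)
    cache.update(graph.get(node, []))   # prefetch associated nodes

fetch("Harry Porter")
print(cache)   # Harry Porter, plus JK Rowling and fantasy fiction
```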
Re: [agi] database access fast enough?
No. You are not correct. Most DBMSs compile and optimize complex queries as a separate operation before doing data retrieval -- but even the most complex query is actually implemented as a series of simple retrievals (which is what the database is truly designed to do). On the other hand, communication to and from your database -- particularly across a network -- is very likely to be a speed problem. My solution is to actually implement your inference in the database engine. That way the database handles all of your memory management, caching, storage, etc., etc.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 12:51 AM Subject: [agi] database access fast enough?

For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

YKY
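A miniature version of "implement the inference in the database engine": register application code as a SQL function so that filtering happens inside the scan instead of shipping every row back to the client. Server-class engines expose the same idea through stored procedures or loadable extension languages; the scoring rule below is made up purely for illustration.

```python
import sqlite3

def plausibility(confidence, age_days):
    return confidence / (1.0 + age_days / 365.0)   # toy scoring rule

conn = sqlite3.connect(":memory:")
conn.create_function("plausibility", 2, plausibility)
conn.execute("CREATE TABLE rules (body TEXT, confidence REAL, age_days REAL)")
conn.execute("INSERT INTO rules VALUES ('bird(X) -> flies(X)', 0.9, 400)")
for (body,) in conn.execute(
        "SELECT body FROM rules WHERE plausibility(confidence, age_days) > 0.4"):
    print(body)
```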
Re: [agi] database access fast enough?
YKY, Here is what I learned from implementing the Texai knowledge base. It persists symbolic statements about concepts.

1. I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on SuSE 64-bit Linux. My Java application driving MySQL dramatically slowed down when the number of rows exceeded 20 million, as compared to the initial load of 5 million rows.
2. I then tried Oracle Berkeley DB Java Edition (open source), which provides no SQL query facility; instead one programs directly to its API for inserts, queries, updates and so forth. It is faster than MySQL for my large KB, but uses four times as much disk space due to its method of inserting new rows at the end of the file and having lots of free space.
3. I then studied partitioning, which means to break up the monolithic KB into smaller databases in which accesses are expected to be clustered. And I studied sharding, which means to slice up a database into logical segments that are hosted by separate db engines, typically with separate disk filesystems.
4. I began writing my own storage engine for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.
5. Revisiting my project object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores.
6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, in SQL my schema had to provide separate tables for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g. what concepts subsume a given concept?).
7. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup.
8. When Texai is deployed, I expect that the application will log its transactions to disk as a background process as a safeguard against losing the volatile KB in tmpfs.

Hope this information is useful.
-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, April 16, 2008 11:51:35 PM Subject: [agi] database access fast enough?

For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?
YKY
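A stand-in sketch of the tmpfs trick in item 7, using Python and SQLite rather than the Java and Sesame setup actually described: copy the partition the agent currently needs into a RAM-backed filesystem (/dev/shm is tmpfs on most Linux systems), work against that copy at RAM speed, and persist changes in the background. The paths below are hypothetical.

```python
import shutil
import sqlite3

DISK_COPY = "/var/texai/kb-partition-3.db"
RAM_COPY = "/dev/shm/kb-partition-3.db"

shutil.copyfile(DISK_COPY, RAM_COPY)   # one slow, sequential disk read
kb = sqlite3.connect(RAM_COPY)         # subsequent random access runs at RAM speed
```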
Re: [agi] database access fast enough?
And, as far as I'm concerned, the last clause of item 4 and the transition to 5 and 6 clearly demonstrates why Steve seems to be making a lot of progress compared to everyone else. - Original Message - From: Stephen Reed To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 10:23 AM Subject: Re: [agi] database access fast enough? YKY Here is what I learned from implementing the Texai knowledge base. It persists symbolic statements about concepts. 1.. I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on SuSE 64-bit Linux. My Java application driving MySQL dramatically slowed down when the number of rows exceeded 20 million as compared to the initial load of 5 million rows. 2.. I then tried Oracle Berkeley DB Java Edition (open source) which provides no SQL query facility, instead one programs directly to its API for inserts, queries, updates and so forth. It is faster than MySQL for my large KB, but uses four times as much disk space due to its method of inserting new rows at the end of the file, and having lots of free space. 3.. I then studied partitioning, which means to break up the monolithic KB into smaller databases in which accesses are expected to be clustered. And I studied sharding, which means to slice up a database into logical segments that are hosted by separate db engines, typically with separate disk filesystems. 4.. I began writing my own storage engine, for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer. 5.. Revisiting my project object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores. 6.. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, in SQL my schema had to provide separate tables for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g. what concepts subsume a given concept?). 7.. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup. 8.. When Texai is deployed, I expect that the application will log its transactions to disk as a background process as a safeguard against losing the volatile KB in tmpfs. Hope this information is useful. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860 - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, April 16, 2008 11:51:35 PM Subject: [agi] database access fast enough? For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. 
The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

YKY
Re: [agi] database access fast enough?
On Apr 17, 2008, at 2:50 AM, YKY (Yan King Yin) wrote: ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of frequently used and recently used items. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives.

It is a cache replacement algorithm; what would be a right optimization objective for such an algorithm? There is a lot of cleverness in the use of the cache to maximize cache efficiency beyond the cache replacement algorithm -- it is one of the most heavily engineered parts of a database engine. As an FYI, ARC is patented by IBM. PostgreSQL uses a different but similar algorithm that is indistinguishable from ARC in benchmarks (having implemented ARC briefly, not realizing that it was patented).

The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here. It is pretty difficult to generate an access pattern use case that they cannot be optimized for with a good database engine. They are very densely engineered pieces of software, designed to be very fast while scaling well in multiple dimensions and adapting to varying workloads. On the other hand, if your use case is simple enough you can gain some significant speed for modest effort by writing your own engine that is purpose-built to be optimized for your needs.

J. Andrew Rogers
Re: [agi] database access fast enough?
On Apr 17, 2008, at 6:07 AM, Mark Waser wrote: I have to laugh at your total avoidance of Microsoft SQL Server which is arguably faster and better scaling for truly mixed use than everything except possibly Oracle on ordinary hardware; which is much easier to use than Oracle; and which is the easiest to actually put *GOOD* code in the database engine itself (particularly when compared to Oracle's *REALLY* poor java imitation). Discussing SQL Server does not generalize well in that they reimplement the core engine design with almost every release once they realize they hosed the design with the last release. For example, up until SQL Server 2005 the transaction engine was weak such that PostgreSQL could spank it in transaction throughput -- in 2005 they switched to a transaction model more like PostgreSQL and Oracle and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. SQL Server versions before the current two year old one were pretty much dogs in a lot of ways. The most recent version is as you state a pretty solid database engine. Oracle is a major pain in the ass to use but does scale well, though for many OLTP loads it is barely faster than PostgreSQL these days. If putting your code in the engine is the goal, PostgreSQL wins by a country mile. The entire engine from front to back is deeply hackable with very clean APIs and you can even safely bind binary code into the engine at runtime. That the transaction engine scales quite well is just a bonus. People have already written hooks for a dozen languages into it. I've written performance-sensitive customizations of PostgreSQL in the past, and for purposes like that it can often be much faster than the commercial alternatives, as the alternatives tend to be relatively feature poor and shallow when it comes to engine customization. Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. J. Andrew Rogers --- agi Archives: http://www.listbox.com/member/archive/303/=now RSS Feed: http://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244id_secret=101455710-f059c4 Powered by Listbox: http://www.listbox.com
Re: [agi] database access fast enough?
On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: You *REALLY* need to get up to speed on current database systems before you make more ignorant statements. First off, *most* databases RARELY go to the disk for reads. Memory is cheap and the vast majority of complex databases are generally small enough that they are normally held in memory during normal operation.

That's true as of now, but let's think one or two steps further: do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely?

Next, I suspect that whatever bundling you're talking about is likely to be along field boundaries and is likely going to be akin to just reading an entire FIELD table into memory (that will have the exact same structure as all other field tables but will be contiguous on disk so as to promote fast loads).

To clarify what I mean:
1. the DB contains a large number of facts / rules (perhaps stored as rows in SQL parlance)
2. many of these rows have to be fetched for inference (Resolution tests if a rule leads to a successful proof, but more often than not the rules are discarded)
3. the rows are scattered all around the DB

For example, let's say I want to infer something about Harry Porter and JK Rowling. I would want to fetch these facts / rules:
1. Harry Porter is a successful book series
2. Harry Porter belongs to the fantasy genre
3. JK Rowling is the author of Harry Porter
4. JK Rowling is now richer than Queen Elizabeth II
etc...

But I would probably NOT need facts / rules like:
1. Einstein is the creator of General Relativity
2. Water is heavier than oil
etc...

So we should keep track of which rules are usually used *together*, and perhaps bring them into physically contagious storage. I'm not sure which DB feature(s) allow this...

YKY
Re: [agi] database access fast enough?
Hi Stephen, Thanks for sharing this! VERY few people have experience with this stuff...

On 4/17/08, Stephen Reed [EMAIL PROTECTED] wrote: 4. I began writing my own storage engine, for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.

That seems like what we actually need.

My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup.

If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB.

YKY
Re: [agi] database access fast enough?
Hi Mark,

This is, by the way, my primary complaint about Novamente -- far too much energy, mind-space, time, and effort has gone into optimizing and repeatedly upgrading the custom atom table that should have been built on top of existing tools instead of being built totally from scratch.

Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API.

It's true that a highly-efficient, highly-customizable graph database could potentially serve the role of the AtomTable, within the NCE or OpenCog. But that observation is really not such a big deal. Potentially, one could just wrap someone else's graph DB behind the 2007 AtomTable API, and this change would be completely transparent to the AI processes using the AtomTable. However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead.

Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. But we've been over this before... And, this is purely a software implementation issue rather than an AI issue, of course. The NCE and OpenCog designs require **some** graph or hypergraph DB which supports the manual and automated creation of complex customized indices ... and supports refined cognitive control over what lives on disk and what lives in RAM, rather than leaving this up to some non-intelligent automated process. Given these requirements, the choice of how to realize them in software is not THAT critical ... and what we have there now works

-- Ben G
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here.

Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem...

YKY
Re: [agi] database access fast enough?
YKY said: If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB.

Right. I envision Texai deployed as distributed agents operating within a hierarchical control system. Each agent's mission will be scoped to require immediate access to only a cache of some KB partition. Hopefully infrequent, cache misses will incur the penalty you mention, either to local disk, or worse - to the network. I also expect the system to be adaptive to whatever the user's computer allows with regard to resources (e.g. more RAM begets faster response).

I am also considering torrent-style transfers to satisfy cache misses. As you point out, an AGI's KB query is likely to access other linked objects (e.g. spreading activation search). So given that users will likely have asymmetric Internet connection bandwidth, it may be faster for large chunks of cache-filling KB data to be obtained simultaneously in slices from a multitude of collaborating peer agents.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860
Re: [agi] database access fast enough?
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem.

I analyzed the scalability of distributed indexing for my thesis (linked at http://www.mattmahoney.net/agi.html ). For data randomly distributed in a vector space model up to log n dimensions, storage is O(n log n), retrieval time is effectively O(log n) and update is O(log^2 n). In practice you can do better because data tends to cluster, reducing the effective number of dimensions, and because accesses tend to be distributed nonuniformly. Data accessed frequently will tend to be cached in nearby nodes.

I realize you are asking about the relational model, but you can implement the most common transactions, e.g. retrieving or updating a small number of records at a time, by storing records of the form "author timestamp table field=value field=value". This also gives you transaction logging, rollback, and authentication, which will be important in any database with lots of users (I assume AGI). However I don't think it will be as powerful as records of the form "author timestamp arbitrary_text".

To use an example: if a lot of people search for Harry Porter, then a conventional database system would make future retrieval of the Harry Porter node faster. But the requirement of the inference system is such that, if Harry Porter is fetched, then we would want *other* things that are associated with Harry Porter to be retrieved faster in the future, for example items such as JK Rowling or fantasy fiction.

A huge relational database would retrieve the fact that Harry Porter won a gold medal for the high jump in the 1908 Olympics. A better language model (like Google) might figure out that you meant Harry Potter :-)

-- Matt Mahoney, [EMAIL PROTECTED]
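The simple record form described above ("author timestamp table field=value ..."), written as an append-only log; never updating in place is what makes logging and rollback come essentially for free. The file name and field names below are toy data.

```python
import time

def append_record(log_path, author, table, **fields):
    parts = [author, f"{time.time():.3f}", table]
    parts += [f"{key}={value}" for key, value in fields.items()]
    with open(log_path, "a") as log:
        log.write(" ".join(parts) + "\n")

append_record("kb.log", "yky", "books", title="Harry Porter", genre="fantasy")
```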
Re: [agi] database access fast enough?
Clustered indexing *WILL* solve your problem if you're willing to include all the data you're going to need in the index. It's definitely a trade-off . . . . but arguably a solid one.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 1:03 PM Subject: Re: [agi] database access fast enough?

On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here.

Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem...

YKY
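One concrete way to get the "include the data in the index" effect, sketched with SQLite: a WITHOUT ROWID table is stored in primary-key order, so every fact for a given topic sits physically together and a topic lookup becomes one contiguous range scan. Clustered or index-organized tables in the larger engines behave similarly; the schema here is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        topic   TEXT,
        fact_id INTEGER,
        body    TEXT,
        PRIMARY KEY (topic, fact_id)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Harry Porter", 1, "is a successful book series"),
    ("Harry Porter", 2, "belongs to the fantasy genre"),
    ("JK Rowling", 1, "is the author of Harry Porter"),
])
print(conn.execute(
    "SELECT body FROM facts WHERE topic = 'Harry Porter'").fetchall())
```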
Re: [agi] database access fast enough?
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical.

Also, I'm designing a learning algorithm that stores *hypotheses* in the KB along with accepted rules. This will multiply the size of the KB by a factor.

YKY

PS: In my last message, contagious should be contiguous... =)
Re: [agi] database access fast enough?
On Apr 17, 2008, at 9:08 AM, Mark Waser wrote: Yes, the newest versions of PostgreSQL could spank SQL Server 2000 after it was several years old. One tremendous advantage of PostgreSQL is its very short development cycle. Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. -- in 2005 they switched to a transaction model more like PostgreSQL and Oracle and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. Have you ever worked with an Oracle distributed database? Oracle does not distribute well. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work. There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. Oracle only scales well when you know how to properly use it. In most installations that I've seen, Oracle underperforms even SQL Server 2000 because the DBA didn't do what was necessary to make it perform optimally (because Oracle is *NOT* average-person friendly). I've made *a lot* of money optimizing people's Oracle installations that I shouldn't have been able to make if Oracle could get out of its own way. No argument here, one of the major problems of Oracle is that it is bloody impossible to use well without a full-time staff. I spent many years solving scaling problems on extremely large Oracle systems. The insidiousness of PostgreSQL in the market is that it is very Oracle-like at a high level but *massively* simpler and easier to use and administer while still delivering much of the performance and a significant subset of the features of Oracle. SQL Server has done well against Oracle for similar reasons. The main problem with SQL Server these days is that it does not run on Unix. Most of the major historical suckiness does not apply to the current version. Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. Yes.
The biggest problems with PostgreSQL are that it doesn't have a Microsoft compatibility mode and it isn't clear to corporations where you can get *absolutely guaranteed* support. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it. A significant portion of the main PostgreSQL developers do it as their official corporate job. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports. From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copying the data and rebinding a WINS name or an IP address, I would be in hog heaven even if support wasn't absolutely guaranteed since I could always switch back.
Re: [agi] database access fast enough?
Everyone, At startup, I simply had Dr. Eliza cycle through the heavily used part of the DB, so that it would run in RAM except for unusual access. Of course, its demo DB now easily fits into RAM. VM paging was a MUCH worse problem than DB access. I suspect that unless you lock the code into RAM, this may well forever be the case, because less-used routines (e.g. exception handlers) will get pushed out of RAM by the DB engine's scramble for buffer space, which of course you can limit by tweaking the DB engine. Also, has anyone here looked at using Flash Disks for DB? Vista now puts VM onto any available flash drives to gain performance. On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 100 terabytes. Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. Steve Richfield
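Steve's startup trick (touching the heavily used part of the DB once so that it stays resident) is easy to reproduce. A hedged sketch in Python, using SQLite as a stand-in engine and a hypothetical hot_records table, not Dr. Eliza's actual schema:

    import sqlite3

    conn = sqlite3.connect("kb.db")   # hypothetical knowledge-base file
    conn.execute("CREATE TABLE IF NOT EXISTS hot_records (id INTEGER PRIMARY KEY, body TEXT)")

    def warm_cache():
        # Read every heavily used row once at startup so later queries are
        # served from the OS page cache / DB buffer pool instead of disk.
        for _ in conn.execute("SELECT body FROM hot_records"):
            pass

    warm_cache()

Whether this helps in practice depends on the buffer-space tuning Steve mentions; if the engine or the VM evicts those pages again, the warm-up is wasted.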
Re: [agi] database access fast enough?
On Thu, Apr 17, 2008 at 2:42 PM, Mark Waser [EMAIL PROTECTED] wrote: Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). And . . . and . . . and . . . :-) It's far more than you're admitting to yourself. :-) That's simply not true, but I know of no way to convince you. The AtomTable work was full-time work for two guys for a few months in 2001, and since then it's been occasional part-time tweaking by two people who have been full-time engaged on other projects. We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API. Which is what everything should have been designed around anyways -- so effectively, last year was a major breaking change that affected *all* the software written to the old API. Yes, but calls to the AT were already well-encapsulated within the code, so changing from the old API to the new has not been a big deal. Absolutely. That's what I'm pushing for. Could you please, please publish the 2007 AtomTable API? That's actually far, far more important than the code behind it. Please, please . . . . publish the spec today . . . . pretty please with a cherry on top? It'll be done as part of the initial OpenCog release, which will be pretty soon now ... I don't have a date yet though... However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead. Which (pardon me, but . . . ) clearly shows that you're not a professional software engineer. I'm not, but many other members of the Novamente team are. My contention is that you all should be *a lot* further along than you are. You have more talent than anyone else but are moving at a truly glacial pace. 90% of Novamente LLC's efforts historically have gone into various AI consulting projects that pay the bills. Now, about 60% is going into consulting projects, and 40% is going into the virtual pet brain project. We have very rarely had funding to pay folks to work on AGI, so we've worked on it in bits and pieces here and there... Sad, but true... I understand that you believe that this is primarily due to other reasons but *I am telling you* that A LOT of it is also your own fault due to your own software development choices. You're wrong, but arguing the point over and over isn't getting us anywhere. Worse, fundamentally, currently, you're locking *everyone* into *your* implementation of the atom table. Well, that will not be the case in OpenCog. The OpenCog architecture will be such that other containers could be inserted if desired. Why not let someone else decide whether or not it is worth their time and effort to implement those specialized indices on another graph DB of their choice? If you would just open up the API and maybe accept some good enhancements (or, maybe even, if necessary, some changes) to it? Yes, that's going to happen within OpenCog. Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. Incorrect. If the API is identical and the speed is identical, whether it is a relational db or a graph db *behind the scenes* is irrelevant.
Design to your API -- *NOT* to the underlying technology. You keep making this mistake. The speed will not be identical for an important subset of queries, because of intrinsic limitations of the B-tree data structures used inside RDBs. We discussed this before. Seriously -- I think that you're really going to be surprised at how fast OpenCog might take off if you'd just relax some control and concentrate on the specifications and the API rather than the implementation issues that you're currently wasting time on. I am optimistic about the development speedup we'll see from OpenCog, but not for the reason you cite. Rather, I think that by opening it up in an intelligent way, we're simply going to get a lot more people involved, contributing their code, their time, and their ideas. This will accelerate things considerably, if all goes well. I repeat that NO implementation time has been spent on the AtomTable internals for quite some time now. A few weeks was spent on the API last year, by one person. I'm not sure why you want to keep exaggerating the time put into that component, when after all you weren't involved in its development at all (and I didn't even know you when the bulk of that development was being done!!) I don't care if, in OpenCog, someone replaces the AtomTable internals with something
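As an illustration of the design-to-the-API argument (a toy interface of my own, not the actual AtomTable API, which has not been published in this thread), the point is that callers only ever see the abstract operations, so the container behind them can be swapped without breaking client code:

    from abc import ABC, abstractmethod

    class NodeStore(ABC):
        # Minimal hypothetical graph-store interface; any backend may implement it.
        @abstractmethod
        def add_link(self, a, b): ...
        @abstractmethod
        def neighbors(self, name): ...

    class InMemoryStore(NodeStore):
        # One possible container: a plain dict of adjacency sets.  A graph DB,
        # a specialized index structure, or (arguably) an RDB could sit here instead.
        def __init__(self):
            self.adj = {}
        def add_link(self, a, b):
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
        def neighbors(self, name):
            return sorted(self.adj.get(name, ()))

    store = InMemoryStore()
    store.add_link("Harry Potter", "JK Rowling")
    print(store.neighbors("Harry Potter"))

The counterpoint above also fits this picture: the API can stay fixed while the argument continues over which backends can actually meet its performance contract.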
Re: [agi] database access fast enough?
Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. I disagree. First off, we're talking about the DEFAULT transactional model, locking mode, and where new records are placed. It has always been possible to tweak any of the databases to the other's transactional model. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. Bull. For anything except the heaviest OLTP loads, Microsoft was more than adequate. You don't need a semi to drive the highways. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. See? You're making my point. :-) You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. Again, you're making my point. Until memory became cheap and OLTP became more critical, Microsoft made the right choice of OLAP over OLTP. When the world changed, so did they. I'd call that a strength and flexibility, not a weakness. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work. So, is your claim that Oracle distributes better than Microsoft? If so, why? There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it. A significant portion of the main PostgreSQL developers do it as their official corporate job. Cool. I wasn't aware that it had made that many inroads. Awesome. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports.
From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copying the data and rebinding a WINS name or an IP address, I would be in hog heaven even if support wasn't absolutely guaranteed since I could always switch back. Given that there's a huge transition cost (changing scripts, procedures, etc.), I can't get *ANY* agreement for the thought of switching (and I'm sure that there are *MANY* more in my circumstances). The only corporate database that relatively easily ports back and forth with PostgreSQL is Oracle. Nonetheless, a number of people have ported applications to PostgreSQL from MS-SQL with good results; questions about porting nuances come up regularly on the PostgreSQL mailing lists. Beyond your basic ANSI compliance, database portability only sort of exists. Inevitably people use non-standard platform features that expose the specific capabilities of the engine being used to maximize performance. As a practical matter, you pick a database platform and stick with it as long as is reasonably possible.
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:20 PM, Mark Waser wrote: It has always been possible to tweak any of the databases to the other's transactional model. Eh? Choices in concurrency control and scheduling run very deep in a database engine, with ramifications that cascade through every other part of the system. Equivalent transaction isolation levels can behave very differently in practice depending on the internal transaction representation and management model. You cannot turn off these side-effects, and you cannot tweak a non-MVCC-ish model to behave like an MVCC-ish model at runtime in any way that matters. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). The rise of the Internet, with its massive OLTP load characteristic, kind of settled the issue. It is true though that Oracle-like OLTP monsters have significantly higher resource overhead for storing the same set of records. These days it is concurrency bottlenecks that will kill you. So, is your claim that Oracle distributes better than Microsoft? If so, why? Very mature implementation of the concepts, and almost every conceivable mechanism and model for doing it is hidden under the hood. Remember, they started introducing the relevant concepts ages ago in Oracle 7, though in practice it was mostly unusable until relatively recently. Consequently, their implementation is easily the most general in that it works moderately well across the broadest number of use cases because they've been tweaking that aspect for years. Other commercial implementations tend to only work for a much narrower set of use cases. In short, Oracle has a long head start. There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Ironically, a specific design decision that has created a fair amount of argument for years makes PostgreSQL the engine starting from the closest design point. PostgreSQL does not support threading and only uses a single process per query execution, originally for portability and data safety reasons -- the extreme hackability would be difficult to do otherwise. This made certain types of trivial parallelism for OLAP difficult. On the other hand, it has had distributed lock functionality for a number of versions now. If you look at newer models explicitly designed to make transactional databases scale better across distributed systems, you find that they are built on a design requirement of single processes per resource, strict access serialization, no local parallelism, and distributed locks. Which is not that far removed from where PostgreSQL is today, if you remove massive local concurrency support and its high overhead. There are a number of outfits (see www.greenplum.com for a very advanced implementation) that have hacked PostgreSQL to scale across very large clusters for OLAP by essentially making the necessary tweaks to approximate these types of models. The next step would be to rip out a lot of expensive bits based on classical design assumptions that make distributed write loads scale poorly.
In a sense, a design choice that traditionally put some limits on scaling PostgreSQL for OLAP has put it in exactly the right place to make implementation of next-generation architectures as natural an evolution as can be expected in this case. J. Andrew Rogers
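For readers who have not met the term, the MVCC-style model that PostgreSQL and Oracle share keeps old row versions around so that readers never block behind writers: each transaction sees whatever was committed when its snapshot was taken. A deliberately simplified Python caricature of the idea (real engines are enormously more involved):

    import itertools

    txn_ids = itertools.count(1)
    # versions[key] is a list of (committing_txn_id, value); old versions are kept.
    versions = {"balance": [(0, 100)]}

    def begin():
        return next(txn_ids)   # snapshot id: sees only what committed before it

    def read(snapshot, key):
        # Newest version committed before our snapshot was taken.
        visible = [v for v in versions[key] if v[0] < snapshot]
        return max(visible, key=lambda v: v[0])[1]

    def write(key, value):
        versions[key].append((next(txn_ids), value))

    reader = begin()
    write("balance", 50)              # a concurrent writer commits a new version
    print(read(reader, "balance"))    # the reader still sees 100 -- no lock taken

The cost is exactly the overhead mentioned above: the old versions have to be stored and eventually cleaned out.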
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:26 PM, Mark Waser wrote: Actually, it's far worse than that. For serious systems, most of the heavy lifting is done inside the database with stored procedures, which are not standard AT ALL. SQL is reasonably easy to port. Stored procedures that do a lot of work are not. The standard is SQL/PSM, which looks similar to Oracle's PL/SQL (and PostgreSQL's PL/pgSQL). As a practical matter, support is not consistent enough or widespread enough for it to be entirely usable for purposes of portability, though it is getting better. To be fair, full SQL/PSM support will not be core in PostgreSQL until the next release. J. Andrew Rogers
Re: [agi] database access fast enough?
YKY, I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: wet-hair mother, far-away mother, angry mother, mother hidden from view, mother in a crowd, mother's voice, mother in dim light, mother from below, and so on. Of course you can ignore fully grounded concepts, as Cycorp currently does for its applications, and as I will with Texai until it is past the bootstrap stage. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860 - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 3:58:43 PM Subject: Re: [agi] database access fast enough? On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down? The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now. YKY
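One cheap way to keep Stephen's thousands of observations "in abstracted form" is to fold each observation into a running prototype (a mean feature vector) per concept and then discard the raw observation. A hedged sketch with made-up feature names, not Texai's actual representation:

    from collections import defaultdict

    # prototypes[concept] = [observation count, running sums of each feature]
    prototypes = defaultdict(lambda: [0, defaultdict(float)])

    def observe(concept, features):
        entry = prototypes[concept]
        entry[0] += 1
        for name, value in features.items():
            entry[1][name] += value

    def prototype(concept):
        count, sums = prototypes[concept]
        return {name: total / count for name, total in sums.items()}

    observe("mother", {"wet_hair": 1.0, "distance_m": 0.5, "angry": 0.0})
    observe("mother", {"wet_hair": 0.0, "distance_m": 8.0, "angry": 1.0})
    print(prototype("mother"))   # abstracted 'mother' features so far

Whether a single averaged prototype is rich enough for grounding is exactly the open question in this exchange; the point is only that abstraction can keep per-concept storage far smaller than the raw observation stream.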
Computational requirements of AGI (Re: [agi] database access fast enough?)
--- Steve Richfield [EMAIL PROTECTED] wrote: On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 100 terabytes. Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. The Blue Brain project estimates 8000 synapses per neuron in mouse cortex. I haven't seen a more accurate estimate for humans, so your numbers are probably as good as mine. I estimate 10^11 neurons, 10^15 synapses (1 bit each) and a response time of 100 ms, or 10^16 OPS to replicate the processing of a human brain. The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits. This may be due to the constraints of slow neurons, parallelism, and the pulsed binary nature of nerve transmission. For example, the lower levels of visual processing in the brain involve massive replication of nearly identical spot filters which could be simulated in a machine by scanning a small filter coefficient array across the retina. It also takes large numbers of nerves to represent a continuous signal with any accuracy, e.g. fine motor control or distinguishing nearly identical perceptions. However my work with text compression suggests that the cost of modeling 1 GB of text (about one human lifetime's worth) is considerably more than a few GB of memory. My guess is at least 10^12 bits just for ungrounded language modeling. If the model is represented as a set of (sparse) graphs, matrices, or neural networks, that's about 10^13 OPS. Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power. But what matters is that the cost of AGI be less than human labor, currently US $10K per year worldwide and growing at 3-4% (5% GDP growth - 1.5% population growth). 
If my guess is right and Moore's law continues (halving costs every 1.5 to 2 years), then AGI is at least 10-15 years away. If it actually turns out there are no shortcuts to simulating the brain, then it is 30 years away. 1. Landauer, T. K., "How much do people remember? Some estimates of the quantity of learned information in long-term memory," Cognitive Science 10, pp. 477-493, 1986. -- Matt Mahoney, [EMAIL PROTECTED]
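The timeline arithmetic at the end is easy to reproduce: if the needed computing currently costs some multiple of the $10K-per-year price of human labor, and Moore's law halves costs every 1.5 to 2 years, the wait is log2(multiple) halving periods. A small Python sketch; the 100x cost multiple below is a placeholder assumption for illustration, not a figure from the post:

    import math

    def years_until_affordable(cost_multiple, halving_years):
        # Halvings needed for the cost to fall below the human-labor benchmark.
        return math.log2(cost_multiple) * halving_years

    # Placeholder: suppose hardware for ~10^16 OPS costs ~100x the $10K/yr benchmark today.
    for halving_years in (1.5, 2.0):
        years = years_until_affordable(100, halving_years)
        print(f"halving every {halving_years} yr -> roughly {years:.0f} years to parity")

With that placeholder the answer lands in the 10-13 year range, consistent with the 10-15 year estimate above.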
Re: [agi] database access fast enough?
On 4/18/08, Stephen Reed [EMAIL PROTECTED] wrote: I agree with your side of the debate about whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. Disk access rate is ~10 times faster than Ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the hard disk. Distributed AGI is a fascinating idea, but you have to solve a lot of algorithmic problems to make it work. If each agent has only a slice of the full KB, the average commonsense query would require cooperation among many agents. That's a very challenging algorithmic problem. I'm content to do simple, single-machine AGI. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: Yes, I actually agree with you -- I subconsciously tuned down my estimates as I was talking to Mark =) I think sensory processing is going to be a very hard problem, so we should postpone sensory grounding as late as possible, and instead focus on text. Don't forget that the AGI needs to have *episodic* memory as well. If we include that, secondary storage is certainly needed. YKY
Re: [agi] database access fast enough?
On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote: Disk access rate is ~10 times faster than Ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the hard disk. Eh? Ethernet latency is sub-millisecond, and in a highly tuned system approaches the 10 microsecond range for something local. Much, much faster than disk if the remote node has your data in RAM and is relatively local. Note that relatively local can mean geographically regional. The round-trip RAM access time from my machine to a machine on the other side of town is a fraction of a millisecond over the Internet connection (not hypothetical, actually measured at ~400 microseconds). I wish disk access was even remotely that good. And this was with inexpensive Gigabit Ethernet. J. Andrew Rogers
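Putting Andrew's numbers side by side: his measured ~400 microsecond remote-RAM round trip against a typical ~10 ms random seek for a spinning disk (the 10 ms figure is a generic assumption, not from his post) means a couple of dozen network fetches fit into the time of a single disk read:

    # Latency comparison; the remote-RAM figure is from the post, the disk figure is an assumption.
    remote_ram_us = 400       # measured round trip to a nearby node, in microseconds
    disk_seek_us = 10_000     # assumed random-access latency for a spinning disk

    print(f"remote RAM fetches per disk seek: {disk_seek_us / remote_ram_us:.0f}")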
RE: [agi] database access fast enough?
YKY Said: The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now. Don't forget about solid state hard drives (SSDs). Currently Solid State Drives speed up typical database applications by about 30 times. And that's without stripping out all the old caching-overhead code that databases used for handling the order-of-magnitude speed differences between RAM and hard drives. Large Storage Area Network vendors like EMC are looking to SSD drives to eliminate IO bottlenecks in corporate applications where large data warehouses reach 20 TB very quickly. And look for capacity to continue to double about every 18 months, driving the price down very quickly. And due to higher reliability and lower energy costs to run, it won't be too long before hard drives join the ranks of 8-track tape players, record players and 5 1/4-inch diskettes. http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1300939,00.html# http://www.storagesearch.com/ssd-fastest.html
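Taking the capacity-doubling claim at face value, one can project when a single SSD could hold the 500 GB to 20 TB knowledge-base estimates discussed earlier in the thread; the 128 GB starting capacity below is a placeholder assumption, not a figure from the post:

    import math

    def years_to_reach(target_gb, start_gb=128, doubling_years=1.5):
        # Whole doublings needed, times the claimed 18-month doubling period.
        return math.ceil(math.log2(target_gb / start_gb)) * doubling_years

    for target_gb in (500, 20_000):   # GB targets from the earlier KB-size estimates
        print(f"{target_gb} GB reachable in roughly {years_to_reach(target_gb):.1f} years")

Under those assumptions the low end is a few years out and the high end a bit over a decade, broadly consistent with the optimism expressed above.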