Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
FYI: there is still some way to go by shrinking transistors, from the current minimum of 45nm half-pitch down to probably 16nm, possibly even 11nm, though that is already questionable. This will ensure some 5 to 10 more years of Moore's law being fueled by transistor shrinking, and roughly an order of magnitude growth of performance per fixed price. 11nm is probably the hard limit for transistor shrinking because some very generic research shows that gates of 5nm or less are really way too thin to prevent electrons from tunneling, regardless of the exact structure and material of the gate. See e.g. http://en.wikipedia.org/wiki/11_nanometer for more details.

Regards
Nikolay

Steve Richfield wrote:

Matt, A couple of comments on your post, which I generally agree with...

On 4/19/08, Matt Mahoney [EMAIL PROTECTED] wrote:

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute. My understanding is they carry passive signals.

The last I heard, the ONLY thing that they know for sure is that when they impale them with an electrode, they only see slowly changing signals and nothing resembling bistable behavior, spikes, etc. Unknown is whether they CAN change rapidly - or perhaps rare rapid changes are their important function?! Theories abound for glial cells, e.g. the one advanced ~4 years ago in Scientific American, where the author asserted that they assisted in the programming of synapses.

Moore's law presumed a relatively unchanging architecture and rapidly advancing fabrication. This has broken down, now that transistors can easily be made SO small that the electrons jump right over the gates. Sure there will be further developments, e.g. multi-layer, but the easy stuff that Moore's law was built on is now GONE.

Actually Moore's law holds pretty well back to about 1900 if you consider the computing power of mechanical adding machines. (I believe Kurzweil studied this). Moore's law is about the cost of computing, not the size of transistors.

But, until they figure out something besides transistors to make computers from, Moore's law has worked in recent decades via transistor shrinkage, thereby making them cheaper. My point is that they can't shrink any more, so they aren't going to get any cheaper, except via slow improvements in methods of manufacturing the same (and not smaller/faster) parts.

The proposed architecture that Josh and I have been discussing could bring this to the market for about the same cost as a PC in a couple of years - with adequate funding.

I've heard that before.

NOT using the SAME fabrication equipment! Other proposals involved new proposed fabrication technologies.

2. Some rich benefactor will step forward and make this happen over the loud objections of millions of devoutly religious.

Nobody has that much money. AGI will happen because nobody wants to work for somebody else.

While I agree with you regarding AGI, there are several people who could easily afford the 10K processor, or a knowledge-based Internet, e.g. Dr. Eliza. These appear to both be necessary as underlying tools to make AGI really work, and should both return a really quick profit - like in the first year or two.
Steve Richfield

--
Nikolay Ognyanov, PhD
Chief Technology Officer
TravelStoreMaker.com Inc.
http://www.travelstoremaker.com/
Phone: +359 2 933 3832
Fax: +359 2 983 6475
Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
Matt,

On 4/17/08, Matt Mahoney [EMAIL PROTECTED] wrote:

Before giving my detailed comments, I'd like to comment that people who have spent decades in wet laboratories know a **LOT** more than they are willing to write down. Why? THEIR culture is to not write things down until you can PROVE them with CAPTURED laboratory EVIDENCE. Hence, a researcher may notice that he must look at ~200 synapses to find one that actually has an efficacy > 0, but since there is no practical way of capturing and proving that 200, and who is ever sure exactly WHAT their sub-micron electrode is connected to in living tissue, they don't dare publish these numbers. However, their culture doesn't seem to prohibit them from TALKING about these things, and THAT is how I have come up with the numbers that I use - and have absolutely NO written evidence to support. However, if you are REALLY interested in some of them, I could probably put you in contact with someone who would be willing to TALK about them from first-hand experience.

The Blue Brain project estimates 8000 synapses per neuron in mouse cortex.

I haven't read the report, but I presume that is a PHYSICAL number developed from microscopy. Typically, 1% have efficacies > 0. I haven't seen a more accurate estimate for humans, so your numbers are probably as good as mine.

Most came from William Calvin. The contact info on his web site actually gets to him, so that would be a good place to start for refinement.

I estimate 10^11 neurons,

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute.

10^15 synapses (1 bit each)

They appear to be performing rather precise analog computation. While there is a lot of noise in voltage, there is much less noise in current/ions. Further, either you must come up with interconnections from fractal means, which means that you need many more synapses, or you must store the topography, so you'll need a LOT more than 1 bit each. Either way, you'll have to allow for much more than one bit for the interconnection, plus a lot more for the characteristics that may also involve time-dependent things like differentiation (e.g. for temporal adjustments as in antique RTL logic) and integration (e.g. for averaging to detect low-level phenomena).

and a response time of 100 ms, or 10^16 OPS to replicate the processing of a human brain.

Glial cells constitute 90% of the brain and are MUCH slower than this. However, there is a double-pulse mechanism in spiking neurons that provides millisecond notice of significant events. Hence, more analysis is probably needed here, but this number could be off by an order of magnitude either way.

The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits.

Apples and oranges. You are comparing completed memory with work-in-progress to figure out just what to remember. Computers, of course, also have large ratios between the space to store data, and the RAM needed to develop that data.

This may be due to the constraints of slow neurons, parallelism, and the pulsed binary nature of nerve transmission. For example, the lower levels of visual processing in the brain involve massive replication of nearly identical spot filters which could be simulated in a machine by scanning a small filter coefficient array across the retina. It also takes large numbers of nerves to represent a continuous signal with any accuracy, e.g.
fine motor control or distinguishing nearly identical perceptions.

William Calvin and I had a long-standing argument about this. Finally, we sat on his pea-gravel-covered roof and pitched pea gravel at a target while having our arms blocked at various points by the other person. This was to separate this theory from mine, that motions are a sort of successive approximation, where groups of neurons watch what we are doing and send corrective signals. If the massive theory was correct, even a small interruption of movement would have made a huge error in accuracy, whereas if the successive approximation theory was correct, we would only lose some of the very last corrections for a small loss in accuracy. You might try this experiment yourself, but it was pretty clear to us that we lost amazingly little accuracy by having our throws physically interrupted.

However my work with text compression suggests that the cost of modeling 1 GB of text (about one human lifetime's worth) is considerably more than a few GB of memory. My guess is at least 10^12 bits just for ungrounded language modeling. If the model is represented as a set of (sparse) graphs, matrices, or neural networks, that's about 10^13 OPS.

Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power.
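A back-of-the-envelope restatement of the estimate above, using only the rough figures quoted in this thread (all of them assumptions, not measurements):

```python
neurons = 1e11          # total cells, ~90% of which are glia
synapses = 1e15         # ~10^4 per neuron
response_time = 0.1     # seconds (100 ms)

ops = synapses / response_time   # one synapse update per response time
bits = synapses * 1              # 1 bit per synapse, as assumed above
print(f"~{ops:.0e} OPS, ~{bits:.0e} bits of synaptic storage")
# -> ~1e+16 OPS, ~1e+15 bits
```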
Re: Computational requirements of AGI (Re: [agi] database access fast enough?)
--- Steve Richfield [EMAIL PROTECTED] wrote:

On 4/17/08, Matt Mahoney [EMAIL PROTECTED] wrote: The Blue Brain project estimates 8000 synapses per neuron in mouse cortex.

I haven't read the report, but I presume that is a PHYSICAL number developed from microscopy. Typically, 1% have efficacies > 0.

If that's true, then the off synapses carry -lg(.99) = .014 bits of information and the on synapses carry -lg(.01) = 6.64 bits, for an average of 0.08 bits per synapse.

I estimate 10^11 neurons,

90% of which are glial cells and not (technically) neurons at all, though all we care about is whether or not they compute. My understanding is they carry passive signals.

The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits.

Apples and oranges. You are comparing completed memory with work-in-progress to figure out just what to remember. Computers, of course, also have large ratios between the space to store data, and the RAM needed to develop that data.

Landauer measured human ability to recall all sorts of data like pictures, spoken or written lists of random words or numbers, music clips, etc. What's missing is short term memory and some memory associated with perception and motor skills. I haven't seen any good numbers for these. But you are right that there is a difference between information content and the memory needed to represent it.

Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power.

This all sounds SO much like the 1960s mantra from Carnegie Mellon. At minimum, it would seem necessary to distill just what it was that they got wrong that present AGI folk have right. If I were an investor, this would be the FIRST thing that I would want to hear and understand.

Present AGI folks haven't got it right yet either.

Moore's law presumed a relatively unchanging architecture and rapidly advancing fabrication. This has broken down, now that transistors can easily be made SO small that the electrons jump right over the gates. Sure there will be further developments, e.g. multi-layer, but the easy stuff that Moore's law was built on is now GONE.

Actually Moore's law holds pretty well back to about 1900 if you consider the computing power of mechanical adding machines. (I believe Kurzweil studied this). Moore's law is about the cost of computing, not the size of transistors.

The proposed architecture that Josh and I have been discussing could bring this to the market for about the same cost as a PC in a couple of years - with adequate funding.

I've heard that before.

IMHO, one of two things will happen: 1. The Christians will prevail and this will NEVER EVER be allowed to happen, or,

Religion seems to be silent on technology when it doesn't involve human manipulation (cloning, stem cell research, etc). I foresee ethical problems with technologies like brain implants, uploading, reprogramming neurons, etc. on healthy people.

2. Some rich benefactor will step forward and make this happen over the loud objections of millions of devoutly religious.

Nobody has that much money. AGI will happen because nobody wants to work for somebody else.
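The per-synapse figures above can be recomputed directly; the average is just the binary entropy at p = 0.01, assuming the 1% "on" estimate quoted in the thread:

```python
from math import log2

p_on = 0.01
bits_off = -log2(1 - p_on)                        # ~0.0145 bits per "off" synapse
bits_on = -log2(p_on)                             # ~6.64 bits per "on" synapse
average = (1 - p_on) * bits_off + p_on * bits_on  # ~0.08 bits per synapse
print(bits_off, bits_on, average)
```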
-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On 4/18/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote: Disk access rate is ~10 times faster than ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the harddisk.

Eh? Ethernet latency is sub-millisecond, and in a highly tuned system approaches the 10 microsecond range for something local. Much, much faster than disk if the remote node has your data in RAM and is relatively local. Note that relatively local can mean geographically regional. The round-trip RAM access time from my machine to a machine on the other side of town is a fraction of a millisecond over the Internet connection (not hypothetical, actually measured at ~400 microseconds). I wish disk access was even remotely that good. And this was with inexpensive Gigabit Ethernet.

LOL... you're right, I forgot to consider latency. Ethernet is much faster than harddisk if we measure access times. But there is another factor: the harddisk is owned by the user. Memory over the net is owned by others, so it must be shared. It's not easy to arrange a distributed and cooperative storage scheme. It's hard enough to solve core AGI problems; I simply don't have time to deal with that. Solid State Disks seem to be a promising solution.

YKY
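The random-access rates implied by those latencies, taking the round numbers mentioned above (~400 microseconds network round trip, ~10 ms disk seek) as assumptions:

```python
network_rtt = 400e-6   # seconds
disk_seek = 10e-3      # seconds
print(f"remote RAM over the network: ~{1 / network_rtt:,.0f} lookups/s")  # ~2,500
print(f"local disk (seek-bound):     ~{1 / disk_seek:,.0f} lookups/s")    # ~100
```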
Re: [agi] database access fast enough?
Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB.

I reject this hypothesis as ludicrously incorrect.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 4:58 PM Subject: Re: [agi] database access fast enough?

On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down?

The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now.

YKY
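YKY's arithmetic, spelled out; every input here is his stated guess, not data:

```python
from math import log2

opencyc_mb = 200
full_kb_mb = (opencyc_mb / 0.40, opencyc_mb / 0.10)   # if OpenCyc covers 10%-40%
hypothesis_factor = (1_000, 10_000)

low_gb = full_kb_mb[0] * hypothesis_factor[0] / 1_000    # 500 GB
high_gb = full_kb_mb[1] * hypothesis_factor[1] / 1_000   # 20,000 GB = 20 TB
print(f"{low_gb:.0f} GB to {high_gb / 1_000:.0f} TB")

# Doublings needed for 10 GB of RAM to reach the low end:
print(f"{log2(low_gb / 10):.1f} doublings")              # ~5.6, roughly a decade
```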
Re: [agi] database access fast enough?
I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert.

Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

- Original Message - From: Stephen Reed To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 5:20 PM Subject: Re: [agi] database access fast enough?

YKY, I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert.

But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations:
a. wet hair mother
b. far away mother
c. angry mother
d. mother hidden from view
e. mother in a crowd
f. mother's voice
g. mother in dim light
h. mother from below
i. and so on

Of course you can ignore fully grounded concepts as does current Cycorp for its applications, and as I will with Texai until it is past the bootstrap stage.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 3:58:43 PM Subject: Re: [agi] database access fast enough?

On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down?

The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now.
YKY
Re: [agi] database access fast enough?
--- Mark Waser [EMAIL PROTECTED] wrote: Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?) Google probably keeps a copy of the searchable part of the internet in about 1 PB of RAM, but this isn't AGI yet. I suppose an internet-wide distributed system could cache about 1 EB (10^18 bytes).

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Um. Neither side is arguing that the whole KB fit into RAM. I'm arguing that the necessary *core* for intelligence plus enough cached chunks (as you phrase it) to support the current thought processes WILL fit into RAM. It's obviously ludicrous that all the world's knowledge is going to fit into RAM at one time.

Then we have no disagreement. Notice that the loading-on-demand chunks require that we *duplicate* data. For example, facts about JK Rowling can be in a literature chunk as well as an entrepreneur chunk. The question is whether DBMSs support this. Materialized views may be the answer (http://en.wikipedia.org/wiki/Materialized_view). As I said before, minimizing disk access is still an important issue. And all this is peripheral to AGI. I wish I could just focus on AGI algorithms!

YKY
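A toy illustration (plain Python, not a DBMS feature) of the duplication described here: a fact is copied into every topic chunk that may need it, so loading one chunk is a single sequential read rather than many scattered fetches.

```python
facts = [
    ("JK Rowling is the author of Harry Porter", {"literature", "entrepreneurs"}),
    ("Harry Porter belongs to the fantasy genre", {"literature"}),
]

chunks = {}
for fact, topics in facts:
    for topic in topics:                       # deliberate duplication across chunks
        chunks.setdefault(topic, []).append(fact)

print(chunks["entrepreneurs"])   # the JK Rowling fact appears here as well
```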
Re: [agi] database access fast enough?
On 4/18/08, Matt Mahoney [EMAIL PROTECTED] wrote: What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?)

Matt, The world's knowledge is irrelevant to the goal of AGI. What we need is to build a commonsense AGI and then let it control other expert systems with specialized knowledge. So the pertinent question is how large the core commonsense KB is. I guess anywhere from 1 GB to 100 GB is possible, excluding hypotheses from learning, and episodic memory.

YKY
Re: [agi] database access fast enough?
--- Mark Waser [EMAIL PROTECTED] wrote: What is your estimate of the quantity of all the world's knowledge? (Or the amount needed to achieve AGI or some specific goal?) I have no idea (and the question is further muddled by what knowledge is and what formats are included). The question itself is fundamentally nonsensical in its current form.

I mean either algorithmic complexity, or more practically, how much memory you need (which depends on the data representation). But really, it depends on the goal, which I have been trying unsuccessfully for years to get YKY to pin down.

-- Matt Mahoney, [EMAIL PROTECTED]
Re: [agi] database access fast enough?
On Apr 16, 2008, at 9:51 PM, YKY (Yan King Yin) wrote: Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: keeps stuff you are most likely to access next in RAM) and cost-based optimization (read: optimizes performance by adaptively selecting query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access patterns and resource management that match this usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource cost statistics -- not always the best result (on occasion hand optimization could do better), but more likely to produce good results than simpler rule-based optimization on average.

Note that by good database engine I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open source solution.

I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle design decisions optimized for massive numbers of small concurrent update transactions and PostgreSQL design decisions optimized for massive numbers of small concurrent insert/delete transactions. Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.

J. Andrew Rogers
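A deliberately tiny sketch of the caching idea being described. Real engines use far more sophisticated ARC-style policies that also track access frequency, plus cost-based planning, but the basic shape is the same: keep hot pages in RAM, evict the least useful page on overflow, and only then touch the disk.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # hit: mark as recently used
            return self.pages[page_id]
        data = load_from_disk(page_id)           # miss: pay the disk cost once
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)       # evict least recently used page
        return data
```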
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote:

No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: keeps stuff you are most likely to access next in RAM) and cost-based optimization (read: optimizes performance by adaptively selecting query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access patterns and resource management that match this usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource cost statistics -- not always the best result (on occasion hand optimization could do better), but more likely to produce good results than simpler rule-based optimization on average.

Note that by good database engine I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open source solution.

I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle design decisions optimized for massive numbers of small concurrent update transactions and PostgreSQL design decisions optimized for massive numbers of small concurrent insert/delete transactions. Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.

Thanks for the info -- I studied database systems almost a decade ago, so I can hardly remember the details =) ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of frequently used and recently used items. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives. The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

YKY
Re: [agi] database access fast enough?
To use an example: if a lot of people search for Harry Porter, then a conventional database system would make future retrieval of the Harry Porter node faster. But the requirement of the inference system is such that, if Harry Porter is fetched, then we would want *other* things that are associated with Harry Porter to be retrieved faster in the future, for example items such as JK Rowling or fantasy fiction.

YKY
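A sketch of the access pattern being asked for: fetching one node also warms the cache for its neighbours in the knowledge graph. The graph and node names here are toy data.

```python
graph = {
    "Harry Porter": ["JK Rowling", "fantasy fiction"],
    "JK Rowling": ["Harry Porter", "Queen Elizabeth II"],
}
cache = set()

def fetch(node):
    cache.add(node)
    cache.update(graph.get(node, []))   # prefetch associated nodes

fetch("Harry Porter")
print(cache)   # Harry Porter, plus JK Rowling and fantasy fiction
```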
Re: [agi] database access fast enough?
No. You are not correct. Most DBMSs compile and optimize complex queries as a separate operation before doing data retrieval -- but even the most complex query is actually implemented as a series of simple retrievals (which is what the database is truly designed to do). On the other hand, communication to and from your database -- particularly across a network -- is very likely to be a speed problem. My solution is to actually implement your inference in the database engine. That way the database handles all of your memory management, caching, storage, etc., etc.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 12:51 AM Subject: [agi] database access fast enough?

For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

YKY
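A miniature version of "implement the inference in the database engine": register application code as a SQL function so that filtering happens inside the scan instead of shipping every row back to the client. Server-class engines expose the same idea through stored procedures or loadable extension languages; the scoring rule below is made up purely for illustration.

```python
import sqlite3

def plausibility(confidence, age_days):
    return confidence / (1.0 + age_days / 365.0)   # toy scoring rule

conn = sqlite3.connect(":memory:")
conn.create_function("plausibility", 2, plausibility)
conn.execute("CREATE TABLE rules (body TEXT, confidence REAL, age_days REAL)")
conn.execute("INSERT INTO rules VALUES ('bird(X) -> flies(X)', 0.9, 400)")
for (body,) in conn.execute(
        "SELECT body FROM rules WHERE plausibility(confidence, age_days) > 0.4"):
    print(body)
```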
Re: [agi] database access fast enough?
YKY, Here is what I learned from implementing the Texai knowledge base. It persists symbolic statements about concepts.

1. I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on SuSE 64-bit Linux. My Java application driving MySQL dramatically slowed down when the number of rows exceeded 20 million, as compared to the initial load of 5 million rows.
2. I then tried Oracle Berkeley DB Java Edition (open source), which provides no SQL query facility; instead one programs directly to its API for inserts, queries, updates and so forth. It is faster than MySQL for my large KB, but uses four times as much disk space due to its method of inserting new rows at the end of the file and having lots of free space.
3. I then studied partitioning, which means to break up the monolithic KB into smaller databases in which accesses are expected to be clustered. And I studied sharding, which means to slice up a database into logical segments that are hosted by separate db engines, typically with separate disk filesystems.
4. I began writing my own storage engine for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.
5. Revisiting my project object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores.
6. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, in SQL my schema had to provide separate tables for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g. what concepts subsume a given concept?).
7. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup.
8. When Texai is deployed, I expect that the application will log its transactions to disk as a background process as a safeguard against losing the volatile KB in tmpfs.

Hope this information is useful.
-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

- Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, April 16, 2008 11:51:35 PM Subject: [agi] database access fast enough?

For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?
YKY
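A stand-in sketch of the tmpfs trick in item 7, using Python and SQLite rather than the Java and Sesame setup actually described: copy the partition the agent currently needs into a RAM-backed filesystem (/dev/shm is tmpfs on most Linux systems), work against that copy at RAM speed, and persist changes in the background. The paths below are hypothetical.

```python
import shutil
import sqlite3

DISK_COPY = "/var/texai/kb-partition-3.db"
RAM_COPY = "/dev/shm/kb-partition-3.db"

shutil.copyfile(DISK_COPY, RAM_COPY)   # one slow, sequential disk read
kb = sqlite3.connect(RAM_COPY)         # subsequent random access runs at RAM speed
```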
Re: [agi] database access fast enough?
And, as far as I'm concerned, the last clause of item 4 and the transition to 5 and 6 clearly demonstrates why Steve seems to be making a lot of progress compared to everyone else. - Original Message - From: Stephen Reed To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 10:23 AM Subject: Re: [agi] database access fast enough? YKY Here is what I learned from implementing the Texai knowledge base. It persists symbolic statements about concepts. 1.. I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on SuSE 64-bit Linux. My Java application driving MySQL dramatically slowed down when the number of rows exceeded 20 million as compared to the initial load of 5 million rows. 2.. I then tried Oracle Berkeley DB Java Edition (open source) which provides no SQL query facility, instead one programs directly to its API for inserts, queries, updates and so forth. It is faster than MySQL for my large KB, but uses four times as much disk space due to its method of inserting new rows at the end of the file, and having lots of free space. 3.. I then studied partitioning, which means to break up the monolithic KB into smaller databases in which accesses are expected to be clustered. And I studied sharding, which means to slice up a database into logical segments that are hosted by separate db engines, typically with separate disk filesystems. 4.. I began writing my own storage engine, for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer. 5.. Revisiting my project object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores. 6.. After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java based and open source and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, in SQL my schema had to provide separate tables for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g. what concepts subsume a given concept?). 7.. My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup. 8.. When Texai is deployed, I expect that the application will log its transactions to disk as a background process as a safeguard against losing the volatile KB in tmpfs. Hope this information is useful. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860 - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, April 16, 2008 11:51:35 PM Subject: [agi] database access fast enough? For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. 
The nodes may be scattered around the DB. So it may require *many* disk accesses. My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

YKY
Re: [agi] database access fast enough?
On Apr 17, 2008, at 2:50 AM, YKY (Yan King Yin) wrote: ARC (Adaptive Replacement Cache) seems to be one of the most popular methods, and it's based on keeping track of frequently used and recently used items. Unfortunately, for AGI / inference purposes, those may not be the right optimization objectives.

It is a cache replacement algorithm; what would be a right optimization objective for such an algorithm? There is a lot of cleverness in the use of the cache to maximize cache efficiency beyond the cache replacement algorithm -- it is one of the most heavily engineered parts of a database engine. As an FYI, ARC is patented by IBM. PostgreSQL uses a different but similar algorithm that is indistinguishable from ARC in benchmarks (having implemented ARC briefly, not realizing that it was patented).

The requirement of inference is that we need to access a lot of *different* nodes, but the same nodes may not be required many times. Perhaps what we need is to *bundle* up nodes that are associated with each other, so we can read a whole block of nodes with 1 disk access. This requires a very special type of storage organization -- it seems that existing DBMSs don't have it =(

Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here. It is pretty difficult to generate an access pattern use case that they cannot be optimized for with a good database engine. They are very densely engineered pieces of software, designed to be very fast while scaling well in multiple dimensions and adapting to varying workloads. On the other hand, if your use case is simple enough you can gain some significant speed for modest effort by writing your own engine that is purpose-built to be optimized for your needs.

J. Andrew Rogers
Re: [agi] database access fast enough?
On Apr 17, 2008, at 6:07 AM, Mark Waser wrote: I have to laugh at your total avoidance of Microsoft SQL Server which is arguably faster and better scaling for truly mixed use than everything except possibly Oracle on ordinary hardware; which is much easier to use than Oracle; and which is the easiest to actually put *GOOD* code in the database engine itself (particularly when compared to Oracle's *REALLY* poor java imitation). Discussing SQL Server does not generalize well in that they reimplement the core engine design with almost every release once they realize they hosed the design with the last release. For example, up until SQL Server 2005 the transaction engine was weak such that PostgreSQL could spank it in transaction throughput -- in 2005 they switched to a transaction model more like PostgreSQL and Oracle and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. SQL Server versions before the current two year old one were pretty much dogs in a lot of ways. The most recent version is as you state a pretty solid database engine. Oracle is a major pain in the ass to use but does scale well, though for many OLTP loads it is barely faster than PostgreSQL these days. If putting your code in the engine is the goal, PostgreSQL wins by a country mile. The entire engine from front to back is deeply hackable with very clean APIs and you can even safely bind binary code into the engine at runtime. That the transaction engine scales quite well is just a bonus. People have already written hooks for a dozen languages into it. I've written performance-sensitive customizations of PostgreSQL in the past, and for purposes like that it can often be much faster than the commercial alternatives, as the alternatives tend to be relatively feature poor and shallow when it comes to engine customization. Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. J. Andrew Rogers --- agi Archives: http://www.listbox.com/member/archive/303/=now RSS Feed: http://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244id_secret=101455710-f059c4 Powered by Listbox: http://www.listbox.com
Re: [agi] database access fast enough?
On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: You *REALLY* need to get up to speed on current database systems before you make more ignorant statements. First off, *most* databases RARELY go to the disk for reads. Memory is cheap and the vast majority of complex databases are generally small enough that they are normally held in memory during normal operation.

That's true as of now, but let's think one or two steps further: do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely?

Next, I suspect that whatever bundling you're talking about is likely to be along field boundaries and is likely going to be akin to just reading an entire FIELD table into memory (that will have the exact same structure as all other field tables but will be contiguous on disk so as to promote fast loads).

To clarify what I mean:
1. the DB contains a large number of facts / rules (perhaps stored as rows in SQL parlance)
2. many of these rows have to be fetched for inference (Resolution tests if a rule leads to a successful proof, but more often than not the rules are discarded)
3. the rows are scattered all around the DB

For example, let's say I want to infer something about Harry Porter and JK Rowling. I would want to fetch these facts / rules:
1. Harry Porter is a successful book series
2. Harry Porter belongs to the fantasy genre
3. JK Rowling is the author of Harry Porter
4. JK Rowling is now richer than Queen Elizabeth II
etc...

But I would probably NOT need facts / rules like:
1. Einstein is the creator of General Relativity
2. Water is heavier than oil
etc...

So we should keep track of which rules are usually used *together*, and perhaps bring them into physically contagious storage. I'm not sure which DB feature(s) allow this...

YKY
Re: [agi] database access fast enough?
Hi Stephen, Thanks for sharing this! VERY few people have experience with this stuff...

On 4/17/08, Stephen Reed [EMAIL PROTECTED] wrote: 4. I began writing my own storage engine, for a fast, space-efficient, partitioned and sharded knowledge base, soon realizing that this was far too big a task for a sole developer.

That seems like what we actually need.

My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs which permits mounting a directory in RAM. I partitioned my KB into separate KBs of less than six million rows each. In Sesame these are less than one GB in size and I can therefore put any one of them in tmpfs - running that application-relevant part of the KB at RAM speed. Experiments demonstrate about a 10 times speedup.

If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB.

YKY
Re: [agi] database access fast enough?
Hi Mark,

This is, by the way, my primary complaint about Novamente -- far too much energy, mind-space, time, and effort has gone into optimizing and repeatedly upgrading the custom atom table that should have been built on top of existing tools instead of being built totally from scratch.

Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API.

It's true that a highly-efficient, highly-customizable graph database could potentially serve the role of the AtomTable, within the NCE or OpenCog. But that observation is really not such a big deal. Potentially, one could just wrap someone else's graph DB behind the 2007 AtomTable API, and this change would be completely transparent to the AI processes using the AtomTable. However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead.

Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. But we've been over this before... And, this is purely a software implementation issue rather than an AI issue, of course. The NCE and OpenCog designs require **some** graph or hypergraph DB which supports the manual and automated creation of complex customized indices ... and supports refined cognitive control over what lives on disk and what lives in RAM, rather than leaving this up to some non-intelligent automated process. Given these requirements, the choice of how to realize them in software is not THAT critical ... and what we have there now works

-- Ben G
Re: [agi] database access fast enough?
On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here.

Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem...

YKY
Re: [agi] database access fast enough?
YKY said: If the inference requires a rule outside the sub-KB, you'd have to do a very expensive swap. I think this only works if you're sure the entire inference is contained within a sub-KB.

Right. I envision Texai deployed as distributed agents operating within a hierarchical control system. Each agent's mission will be scoped to require immediate access to only a cache of some KB partition. Hopefully infrequent, cache misses will incur the penalty you mention, either to local disk, or worse - to the network. I also expect the system to be adaptive to whatever the user's computer allows with regard to resources (e.g. more RAM begets faster response).

I am also considering torrent-style transfers to satisfy cache misses. As you point out, an AGI's KB query is likely to access other linked objects (e.g. spreading activation search). So given that users will likely have asymmetric Internet connection bandwidth, it may be faster for large chunks of cache-filling KB data to be obtained simultaneously in slices from a multitude of collaborating peer agents.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860
Re: [agi] database access fast enough?
--- YKY (Yan King Yin) [EMAIL PROTECTED] wrote: For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem.

I analyzed the scalability of distributed indexing for my thesis (linked at http://www.mattmahoney.net/agi.html ). For data randomly distributed in a vector space model up to log n dimensions, storage is O(n log n), retrieval time is effectively O(log n) and update is O(log^2 n). In practice you can do better because data tends to cluster, reducing the effective number of dimensions, and because accesses tend to be distributed nonuniformly. Data accessed frequently will tend to be cached in nearby nodes.

I realize you are asking about the relational model, but you can implement the most common transactions, e.g. retrieving or updating a small number of records at a time, by storing records of the form "author timestamp table field=value field=value". This also gives you transaction logging, rollback, and authentication, which will be important in any database with lots of users (I assume AGI). However I don't think it will be as powerful as records of the form "author timestamp arbitrary_text".

To use an example: if a lot of people search for Harry Porter, then a conventional database system would make future retrieval of the Harry Porter node faster. But the requirement of the inference system is such that, if Harry Porter is fetched, then we would want *other* things that are associated with Harry Porter to be retrieved faster in the future, for example items such as JK Rowling or fantasy fiction.

A huge relational database would retrieve the fact that Harry Porter won a gold medal for the high jump in the 1908 Olympics. A better language model (like Google) might figure out that you meant Harry Potter :-)

-- Matt Mahoney, [EMAIL PROTECTED]
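The simple record form described above ("author timestamp table field=value ..."), written as an append-only log; never updating in place is what makes logging and rollback come essentially for free. The file name and field names below are toy data.

```python
import time

def append_record(log_path, author, table, **fields):
    parts = [author, f"{time.time():.3f}", table]
    parts += [f"{key}={value}" for key, value in fields.items()]
    with open(log_path, "a") as log:
        log.write(" ".join(parts) + "\n")

append_record("kb.log", "yky", "books", title="Harry Porter", genre="fantasy")
```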
Re: [agi] database access fast enough?
Clustered indexing *WILL* solve your problem if you're willing to include all the data you're going to need in the index. It's definitely a trade-off . . . . but arguably a solid one.

- Original Message - From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 1:03 PM Subject: Re: [agi] database access fast enough?

On 4/17/08, J. Andrew Rogers [EMAIL PROTECTED] wrote: Again, most good database engines can do this, as it is a standard access pattern for databases, and most databases can solve this problem multiple ways. As an example, clustering and index-organization features in databases address your issue here.

Thanks... clustered indexing looks promising, but I need to study it in more detail to see if it really solves the problem...

YKY
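One concrete way to get the "include the data in the index" effect, sketched with SQLite: a WITHOUT ROWID table is stored in primary-key order, so every fact for a given topic sits physically together and a topic lookup becomes one contiguous range scan. Clustered or index-organized tables in the larger engines behave similarly; the schema here is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        topic   TEXT,
        fact_id INTEGER,
        body    TEXT,
        PRIMARY KEY (topic, fact_id)
    ) WITHOUT ROWID
""")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Harry Porter", 1, "is a successful book series"),
    ("Harry Porter", 2, "belongs to the fantasy genre"),
    ("JK Rowling", 1, "is the author of Harry Porter"),
])
print(conn.execute(
    "SELECT body FROM facts WHERE topic = 'Harry Porter'").fetchall())
```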
Re: [agi] database access fast enough?
On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical.

Also, I'm designing a learning algorithm that stores *hypotheses* in the KB along with accepted rules. This will multiply the size of the KB by a factor.

YKY

PS: In my last message, contagious should be contiguous... =)
Re: [agi] database access fast enough?
On Apr 17, 2008, at 9:08 AM, Mark Waser wrote: Yes, the newest versions of PostgreSQL could spank SQL Server 2000 after it was several years old. One tremendous advantage of PostgreSQL is its very short development cycle. Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. -- in 2005 they switched to a transaction model more like PostgreSQL and Oracle and gained some parity. SQL Server still does not really distribute all that easily, unlike Oracle or PostgreSQL. Have you ever worked with an Oracle distributed database? Oracle does not distribute well. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work. There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. Oracle only scales well when you know how to properly use it. In most installations that I've seen, Oracle underperforms even SQL Server 2000 because the DBA didn't do what was necessary to make it perform optimally (because Oracle is *NOT* average-person friendly). I've made *a lot* of money optimizing people's Oracle installations that I shouldn't have been able to make if Oracle could get out of its own way. No argument here, one of the major problems of Oracle is that it is bloody impossible to use well without a full-time staff. I spent many years solving scaling problems on extremely large Oracle systems. The insidiousness of PostgreSQL in the market is that it is very Oracle-like at a high level but *massively* simpler and easier to use and administer while still delivering much of the performance and a significant subset of the features of Oracle. SQL Server has done well against Oracle for similar reasons. The main problem with SQL Server these days is that it does not run on Unix. Most of the major historical suckiness does not apply to the current version. Making deep and very flexible customization a safe core feature was a design decision tradeoff in PostgreSQL that is somewhat unique to it. You can do a lot of really cool software implementation tricks with it that Oracle and SQL Server do not do. Yes.
The biggest problems with PostgreSQL are that it doesn't have a Microsoft compatibility mode and it isn't clear to corporations where you can get *absolutely guaranteed* support. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it. A significant portion of the main PostgreSQL developers do it as their official corporate job. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports. From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copying the data and rebinding a WINS name or an IP address, I would be in hog heaven even if support wasn't absolutely guaranteed since I could always switch back.
Re: [agi] database access fast enough?
Everyone, At startup, I simply had Dr. Eliza cycle through the heavily used part of the DB, so that it would run in RAM except for unusual access. Of course, its demo DB now easily fits into RAM. VM paging was a MUCH worse problem than DB access. I suspect that unless you lock the code into RAM, this may well forever be the case, because less-used routines (e.g. exception handlers) will get pushed out of RAM by the DB engine's scramble for buffer space, which of course you can limit by tweaking the DB engine. Also, has anyone here looked at using Flash Disks for DB? Vista now puts VM onto any available flash drives to gain performance. On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 100 terabytes. Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. Steve Richfield
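Steve's startup trick (touching the heavily used part of the DB once so that it stays resident) is easy to reproduce. A hedged sketch in Python, using SQLite as a stand-in engine and a hypothetical hot_records table, not Dr. Eliza's actual schema:

    import sqlite3

    conn = sqlite3.connect("kb.db")   # hypothetical knowledge-base file
    conn.execute("CREATE TABLE IF NOT EXISTS hot_records (id INTEGER PRIMARY KEY, body TEXT)")

    def warm_cache():
        # Read every heavily used row once at startup so later queries are
        # served from the OS page cache / DB buffer pool instead of disk.
        for _ in conn.execute("SELECT body FROM hot_records"):
            pass

    warm_cache()

Whether this helps in practice depends on the buffer-space tuning Steve mentions; if the engine or the VM evicts those pages again, the warm-up is wasted.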
Re: [agi] database access fast enough?
On Thu, Apr 17, 2008 at 2:42 PM, Mark Waser [EMAIL PROTECTED] wrote: Really, work on the AtomTable has been a small percentage of work on the Novamente Cognition Engine ... and, the code running the AtomTable is now pretty much the same as it was in 2001 (though it was tweaked to make it 64-bit compatible, back in 2004 ... and there has been ongoing bug-removal as well...). And . . . and . . . and . . . :-) It's far more than you're admitting to yourself. :-) That's simply not true, but I know of no way to convince you. The AtomTable work was full-time work for two guys for a few months in 2001, and since then it's been occasional part-time tweaking by two people who have been full-time engaged on other projects. We wrote some new wrappers for the AtomTable last year (based on STL containers), but that didn't affect the internals, just the API. Which is what everything should have been designed around anyways -- so effectively, last year was a major breaking change that affected *all* the software written to the old API. Yes, but calls to the AT were already well-encapsulated within the code, so changing from the old API to the new has not been a big deal. Absolutely. That's what I'm pushing for. Could you please, please publish the 2007 AtomTable API? That's actually far, far more important than the code behind it. Please, please . . . . publish the spec today . . . . pretty please with a cherry on top? It'll be done as part of the initial OpenCog release, which will be pretty soon now ... I don't have a date yet though... However, I'm not convinced this would be a good idea. There are a lot of useful specialized indices in the AtomTable, and replicating all this in some other graph DB would wind up being a lot of work ... and we could use that time/effort on other stuff instead. Which (pardon me, but . . . ) clearly shows that you're not a professional software engineer. I'm not, but many other members of the Novamente team are. My contention is that you all should be *a lot* further along than you are. You have more talent than anyone else but are moving at a truly glacial pace. 90% of Novamente LLC's efforts historically have gone into various AI consulting projects that pay the bills. Now, about 60% is going into consulting projects, and 40% is going into the virtual pet brain project. We have very rarely had funding to pay folks to work on AGI, so we've worked on it in bits and pieces here and there... Sad, but true... I understand that you believe that this is primarily due to other reasons but *I am telling you* that A LOT of it is also your own fault due to your own software development choices. You're wrong, but arguing the point over and over isn't getting us anywhere. Worse, fundamentally, currently, you're locking *everyone* into *your* implementation of the atom table. Well, that will not be the case in OpenCog. The OpenCog architecture will be such that other containers could be inserted if desired. Why not let someone else decide whether or not it is worth their time and effort to implement those specialized indices on another graph DB of their choice? If you would just open up the API and maybe accept some good enhancements (or, maybe even, if necessary, some changes) to it? Yes, that's going to happen within OpenCog. Using a relational DB rather than a graph DB is not appropriate for the NCE design, however. Incorrect. If the API is identical and the speed is identical, whether it is a relational db or a graph db *behind the scenes* is irrelevant.
Design to your API -- *NOT* to the underlying technology. You keep making this mistake. The speed will not be identical for an important subset of queries, because of intrinsic limitations of the B-tree data structures used inside RDBs. We discussed this before. Seriously -- I think that you're really going to be surprised at how fast OpenCog might take off if you'd just relax some control and concentrate on the specifications and the API rather than the implementation issues that you're currently wasting time on. I am optimistic about the development speedup we'll see from OpenCog, but not for the reason you cite. Rather, I think that by opening it up in an intelligent way, we're simply going to get a lot more people involved, contributing their code, their time, and their ideas. This will accelerate things considerably, if all goes well. I repeat that NO implementation time has been spent on the AtomTable internals for quite some time now. A few weeks was spent on the API last year, by one person. I'm not sure why you want to keep exaggerating the time put into that component, when after all you weren't involved in its development at all (and I didn't even know you when the bulk of that development was being done!!) I don't care if, in OpenCog, someone replaces the AtomTable internals with something
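As an illustration of the design-to-the-API argument (a toy interface of my own, not the actual AtomTable API, which has not been published in this thread), the point is that callers only ever see the abstract operations, so the container behind them can be swapped without breaking client code:

    from abc import ABC, abstractmethod

    class NodeStore(ABC):
        # Minimal hypothetical graph-store interface; any backend may implement it.
        @abstractmethod
        def add_link(self, a, b): ...
        @abstractmethod
        def neighbors(self, name): ...

    class InMemoryStore(NodeStore):
        # One possible container: a plain dict of adjacency sets.  A graph DB,
        # a specialized index structure, or (arguably) an RDB could sit here instead.
        def __init__(self):
            self.adj = {}
        def add_link(self, a, b):
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
        def neighbors(self, name):
            return sorted(self.adj.get(name, ()))

    store = InMemoryStore()
    store.add_link("Harry Potter", "JK Rowling")
    print(store.neighbors("Harry Potter"))

The counterpoint above also fits this picture: the API can stay fixed while the argument continues over which backends can actually meet its performance contract.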
Re: [agi] database access fast enough?
Actually, this was a fundamental and known weakness in the SQL Server 2000 transactional model, being more like DB2 than Oracle. I disagree. First off, we're talking about the DEFAULT transactional model, locking mode, and where new records are placed. It has always been possible to tweak any of the databases to the other's transactional model. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). Because PostgreSQL has used the same kind of model as Oracle -- and for a very long time -- it has always been relatively strong at OLTP throughput. Until SQL Server 2005, the Microsoft offering was never really competitive. Bull. For anything except the heaviest OLTP loads, Microsoft was more than adequate. You don't need a semi to drive the highways. It had little to do with development timelines. On the other hand, PostgreSQL was a bit of a dog at OLAP until relatively recently. See? You're making my point. :-) You imply that the performance is due to some kind of linear development path, but in fact SQL Server 2005 changed its internal model to be like Oracle and PostgreSQL so that it could be competitive at OLTP. It is a matter of algorithm selection and tradeoffs, not engineering effort. SQL Server (until two years ago) has always had relatively poor lock concurrency, but gave very good baseline OLAP performance as a consequence of that decision. The reality is that it is much easier to make the Oracle/Postgres model perform satisfactorily at OLAP than to make the old SQL Server model perform satisfactorily at OLTP. Again, you're making my point. Until memory became cheap and OLTP became more critical, Microsoft made the right choice of OLAP over OLTP. When the world changed, so did they. I'd call that a strength and flexibility, not a weakness. I've worked with very large databases on several major platforms, including Oracle and SQL Server in many different guises. Oracle's parallel implementation may not distribute that well, but that is because traditional transactional semantics are *theoretically incapable* of distributing well. To the extent it is possible at all, Oracle does a very good job at making it work. So, is your claim that Oracle distributes better than Microsoft? If so, why? There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Sun Microsystems not only officially supports it, they do a lot of development on it, as does Fujitsu in Asia, Red Hat and a few other large companies that are heavily invested in it. A significant portion of the main PostgreSQL developers do it as their official corporate job. Cool. I wasn't aware that it had made that many inroads. Awesome. PostgreSQL is very broadly ANSI compatible (including a lot of ancillary database standards surrounding SQL), and to the extent it has a flavor it clearly borrows from Oracle rather than SQL Server. SQL Server has a lot of bits that do not conform to standards that everyone else supports.
From a historical perspective, PostgreSQL shares a transaction model with Oracle, started on Unix, and has been around since a time when SQL Server was not something you would want to emulate. PostgreSQL has matured to the point where it mostly follows standards to the extent possible but has enough unique features and capabilities that it has started to become a flavor of its own. If you could swap out an MS-SQL server *immediately* for a PostgreSQL server simply by copying the data and rebinding a WINS name or an IP address, I would be in hog heaven even if support wasn't absolutely guaranteed since I could always switch back. Given that there's a huge transition cost (changing scripts, procedures, etc.), I can't get *ANY* agreement for the thought of switching (and I'm sure that there are *MANY* more in my circumstances). The only corporate database that relatively easily ports back and forth with PostgreSQL is Oracle. Nonetheless, a number of people have ported applications to PostgreSQL from MS-SQL with good results; questions about porting nuances come up regularly on the PostgreSQL mailing lists. Beyond your basic ANSI compliance, database portability only sort of exists. Inevitably people use non-standard platform features that expose the specific capabilities of the engine being used to maximize performance. As a practical matter, you pick a database platform and stick with it as long as is reasonably possible.
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:20 PM, Mark Waser wrote: It has always been possible to tweak any of the databases to the other's transactional model. Eh? Choices in concurrency control and scheduling run very deep in a database engine, with ramifications that cascade through every other part of the system. Equivalent transaction isolation levels can behave very differently in practice depending on the internal transaction representation and management model. You cannot turn off these side-effects, and you cannot tweak a non-MVCC-ish model to behave like an MVCC-ish model at runtime in any way that matters. Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true). The rise of the Internet, with its massive OLTP load characteristic, kind of settled the issue. It is true though that Oracle-like OLTP monsters have significantly higher resource overhead for storing the same set of records. These days it is concurrency bottlenecks that will kill you. So, is your claim that Oracle distributes better than Microsoft? If so, why? Very mature implementation of the concepts, and almost every conceivable mechanism and model for doing it is hidden under the hood. Remember, they started introducing the relevant concepts ages ago in Oracle 7, though in practice it was mostly unusable until relatively recently. Consequently, their implementation is easily the most general in that it works moderately well across the broadest number of use cases because they've been tweaking that aspect for years. Other commercial implementations tend to only work for a much narrower set of use cases. In short, Oracle has a long head start. There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments. And PostgreSQL will probably implement them long before Oracle or MS. Ironically, a specific design decision that has created a fair amount of argument for years makes PostgreSQL the engine starting from the closest design point. PostgreSQL does not support threading and only uses a single process per query execution, originally for portability and data safety reasons -- the extreme hackability would be difficult to do otherwise. This made certain types of trivial parallelism for OLAP difficult. On the other hand, it has had distributed lock functionality for a number of versions now. If you look at newer models explicitly designed to make transactional databases scale better across distributed systems, you find that they are built on a design requirement of single processes per resource, strict access serialization, no local parallelism, and distributed locks. Which is not that far removed from where PostgreSQL is today, if you remove massive local concurrency support and its high overhead. There are a number of outfits (see www.greenplum.com for a very advanced implementation) that have hacked PostgreSQL to scale across very large clusters for OLAP by essentially making the necessary tweaks to approximate these types of models. The next step would be to rip out a lot of expensive bits based on classical design assumptions that make distributed write loads scale poorly.
In a sense, a design choice that traditionally put some limits on scaling PostgreSQL for OLAP has put it in exactly the right place to make implementation of next-generation architectures as natural an evolution as can be expected in this case. J. Andrew Rogers
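For readers who have not met the term, the MVCC-style model that PostgreSQL and Oracle share keeps old row versions around so that readers never block behind writers: each transaction sees whatever was committed when its snapshot was taken. A deliberately simplified Python caricature of the idea (real engines are enormously more involved):

    import itertools

    txn_ids = itertools.count(1)
    # versions[key] is a list of (committing_txn_id, value); old versions are kept.
    versions = {"balance": [(0, 100)]}

    def begin():
        return next(txn_ids)   # snapshot id: sees only what committed before it

    def read(snapshot, key):
        # Newest version committed before our snapshot was taken.
        visible = [v for v in versions[key] if v[0] < snapshot]
        return max(visible, key=lambda v: v[0])[1]

    def write(key, value):
        versions[key].append((next(txn_ids), value))

    reader = begin()
    write("balance", 50)              # a concurrent writer commits a new version
    print(read(reader, "balance"))    # the reader still sees 100 -- no lock taken

The cost is exactly the overhead mentioned above: the old versions have to be stored and eventually cleaned out.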
Re: [agi] database access fast enough?
On Apr 17, 2008, at 12:26 PM, Mark Waser wrote: Actually, it's far worse than that. For serious systems, most of the heavy lifting is done inside the database with stored procedures, which are not standard AT ALL. SQL is reasonably easy to port. Stored procedures that do a lot of work are not. The standard is SQL/PSM, which looks similar to Oracle's PL/SQL (and PostgreSQL's PL/pgSQL). As a practical matter, support is not consistent enough or widespread enough for it to be entirely usable for purposes of portability, though it is getting better. To be fair, full SQL/PSM support will not be core in PostgreSQL until the next release. J. Andrew Rogers
Re: [agi] database access fast enough?
YKY, I agree with your side of the debate about the whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: wet-hair mother, far-away mother, angry mother, mother hidden from view, mother in a crowd, mother's voice, mother in dim light, mother from below, and so on. Of course you can ignore fully grounded concepts, as Cycorp currently does for its applications, and as I will with Texai until it is past the bootstrap stage. -Steve Stephen L. Reed Artificial Intelligence Researcher http://texai.org/blog http://texai.org 3008 Oak Crest Ave. Austin, Texas, USA 78704 512.791.7860 - Original Message From: YKY (Yan King Yin) [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Thursday, April 17, 2008 3:58:43 PM Subject: Re: [agi] database access fast enough? On 4/18/08, Mark Waser [EMAIL PROTECTED] wrote: Yes. RAM is *HUGE*. Intelligence is *NOT*. Really? I will believe that if I see more evidence... right now I'm skeptical. And your *opinion* has what basis? Are you arguing that RAM isn't huge? That's easily disprovable. Or are you arguing that intelligence is huge? That too is easily disprovable. Which one do I need to knock down? The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now. YKY
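One cheap way to keep Stephen's thousands of observations "in abstracted form" is to fold each observation into a running prototype (a mean feature vector) per concept and then discard the raw observation. A hedged sketch with made-up feature names, not Texai's actual representation:

    from collections import defaultdict

    # prototypes[concept] = [observation count, running sums of each feature]
    prototypes = defaultdict(lambda: [0, defaultdict(float)])

    def observe(concept, features):
        entry = prototypes[concept]
        entry[0] += 1
        for name, value in features.items():
            entry[1][name] += value

    def prototype(concept):
        count, sums = prototypes[concept]
        return {name: total / count for name, total in sums.items()}

    observe("mother", {"wet_hair": 1.0, "distance_m": 0.5, "angry": 0.0})
    observe("mother", {"wet_hair": 0.0, "distance_m": 8.0, "angry": 1.0})
    print(prototype("mother"))   # abstracted 'mother' features so far

Whether a single averaged prototype is rich enough for grounding is exactly the open question in this exchange; the point is only that abstraction can keep per-concept storage far smaller than the raw observation stream.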
Computational requirements of AGI (Re: [agi] database access fast enough?)
--- Steve Richfield [EMAIL PROTECTED] wrote: On 4/17/08, Mark Waser [EMAIL PROTECTED] wrote: That's true as of now, but let's think one or two steps further: Do you really think a mature AGI's (say with 3-6 year-old human intelligence) KB can reside in RAM, entirely? Yes. RAM is *HUGE*. Intelligence is *NOT*. Hmm, thinking on the keyboard... ~100E9 computing cells with ~50K inputs each, of which ~200 are active. One theory is that you would only have to carry the active inputs, plus some fraction of the inactive inputs while you watched for things to happen to make them active. Let's say that we must track ~1E3 inputs, for a total of 100E12 or one hundred trillion inputs. We could use fractal means to generate the original configuration (as biological brains probably do), very low precision arithmetic with statistical rounding, etc., which would reduce each input to just a few bytes to maintain, say ~10. This makes a total of 1E15 or one quadrillion bytes to represent a simulated human's instantaneous state of construction. An entire checkpoint would take little more, because it would only include in addition the electrical state of each of the 100E9 cells. Note however, that the *FUNCTIONAL* state would only be 1/5 of this estimate because 4/5 of the represented inputs are presently inactive, for a total of only 100 terabytes. Note that ~90% of those 100E9 cells are slow-responding glial cells, so while the state is large, the computational requirements may be well short of a petaflop. Of course, this makes a LOT of assumptions that no one has yet bothered to confirm in the laboratory, and I do NOT want to ignite an estimates war, so I invite constructive comments from anyone with more recent data than I have. The Blue Brain project estimates 8000 synapses per neuron in mouse cortex. I haven't seen a more accurate estimate for humans, so your numbers are probably as good as mine. I estimate 10^11 neurons, 10^15 synapses (1 bit each) and a response time of 100 ms, or 10^16 OPS to replicate the processing of a human brain. The memory requirement is considerably higher than the information content of long term memory estimated by Landauer [1], about 10^9 bits. This may be due to the constraints of slow neurons, parallelism, and the pulsed binary nature of nerve transmission. For example, the lower levels of visual processing in the brain involve massive replication of nearly identical spot filters which could be simulated in a machine by scanning a small filter coefficient array across the retina. It also takes large numbers of nerves to represent a continuous signal with any accuracy, e.g. fine motor control or distinguishing nearly identical perceptions. However my work with text compression suggests that the cost of modeling 1 GB of text (about one human lifetime's worth) is considerably more than a few GB of memory. My guess is at least 10^12 bits just for ungrounded language modeling. If the model is represented as a set of (sparse) graphs, matrices, or neural networks, that's about 10^13 OPS. Remember that the goal of AGI is not to duplicate the human brain, but to do the work that humans are now paid to do. It still requires solving hard problems like language, vision, and robotics, which consume a significant fraction of the brain's computing power. But what matters is that the cost of AGI be less than human labor, currently US $10K per year worldwide and growing at 3-4% (5% GDP growth - 1.5% population growth). 
If my guess is right and Moore's law continues (halving costs every 1.5 to 2 years), then AGI is at least 10-15 years away. If it actually turns out there are no shortcuts to simulating the brain, then it is 30 years away. 1. Landauer, T. K., "How much do people remember? Some estimates of the quantity of learned information in long-term memory," Cognitive Science 10, pp. 477-493, 1986. -- Matt Mahoney, [EMAIL PROTECTED]
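The timeline arithmetic at the end is easy to reproduce: if the needed computing currently costs some multiple of the $10K-per-year price of human labor, and Moore's law halves costs every 1.5 to 2 years, the wait is log2(multiple) halving periods. A small Python sketch; the 100x cost multiple below is a placeholder assumption for illustration, not a figure from the post:

    import math

    def years_until_affordable(cost_multiple, halving_years):
        # Halvings needed for the cost to fall below the human-labor benchmark.
        return math.log2(cost_multiple) * halving_years

    # Placeholder: suppose hardware for ~10^16 OPS costs ~100x the $10K/yr benchmark today.
    for halving_years in (1.5, 2.0):
        years = years_until_affordable(100, halving_years)
        print(f"halving every {halving_years} yr -> roughly {years:.0f} years to parity")

With that placeholder the answer lands in the 10-13 year range, consistent with the 10-15 year estimate above.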
Re: [agi] database access fast enough?
On 4/18/08, Stephen Reed [EMAIL PROTECTED] wrote: I agree with your side of the debate about whole KB not fitting into RAM. As a solution, I propose to partition the whole KB into the tiniest possible cached chunks, suitable for a single agent running on a host computer with RAM resources of at least one GB. And I propose that AGI will consist not of one program running on one computer, but a vast multitude of separately hosted agents working in concert. Disk access rate is ~10 times faster than Ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the hard disk. Distributed AGI is a fascinating idea, but you have to solve a lot of algorithmic problems to make it work. If each agent has only a slice of the full KB, the average commonsense query would require cooperation among many agents. That's a very challenging algorithmic problem. I'm content to do simple, single-machine AGI. But my opinion of the OpenCyc concept coverage with respect to that of a human five-year-old differs greatly from yours. I concede that 20 OpenCyc facts are about the number a child might know, but in order to properly ground these concepts, I believe that a much larger number of feature vectors will have to be stored or available in abstracted form. For example, there is the concept of the child's mother. Properly grounding that one concept might require abstracting features from thousands of observations: Yes, I actually agree with you -- I subconsciously tuned down my estimates as I was talking to Mark =) I think sensory processing is going to be a very hard problem, so we should postpone sensory grounding as late as possible, and instead focus on text. Don't forget that the AGI needs to have *episodic* memory as well. If we include that, secondary storage is certainly needed. YKY
Re: [agi] database access fast enough?
On Apr 17, 2008, at 3:32 PM, YKY (Yan King Yin) wrote: Disk access rate is ~10 times faster than Ethernet access rate. IMO, if RAM is not enough the next thing to turn to should be the hard disk. Eh? Ethernet latency is sub-millisecond, and in a highly tuned system approaches the 10 microsecond range for something local. Much, much faster than disk if the remote node has your data in RAM and is relatively local. Note that relatively local can mean geographically regional. The round-trip RAM access time from my machine to a machine on the other side of town is a fraction of a millisecond over the Internet connection (not hypothetical, actually measured at ~400 microseconds). I wish disk access was even remotely that good. And this was with inexpensive Gigabit Ethernet. J. Andrew Rogers
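Putting Andrew's numbers side by side: his measured ~400 microsecond remote-RAM round trip against a typical ~10 ms random seek for a spinning disk (the 10 ms figure is a generic assumption, not from his post) means a couple of dozen network fetches fit into the time of a single disk read:

    # Latency comparison; the remote-RAM figure is from the post, the disk figure is an assumption.
    remote_ram_us = 400       # measured round trip to a nearby node, in microseconds
    disk_seek_us = 10_000     # assumed random-access latency for a spinning disk

    print(f"remote RAM fetches per disk seek: {disk_seek_us / remote_ram_us:.0f}")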
RE: [agi] database access fast enough?
YKY Said: The current OpenCyc KB is ~200 MB (correct me if I'm wrong). The RAM size of current high-end PCs is ~10 GB. My intuition estimates that the current OpenCyc is only about 10%-40% of a 5-year-old human's intelligence. Plus, learning requires that we store a lot of hypotheses. Let's say 1,000-10,000 times the real KB. That comes to 500 GB - 20 TB. It seems that if we allow several years for RAM size to double a few times, RAM may have a chance to catch up to the low end. Obviously not now. Don't forget about solid state hard drives (SSDs). Currently Solid State Drives speed up typical database applications by about 30 times. And that's without stripping out all the old caching-overhead code that databases used for handling the order-of-magnitude speed differences between RAM and hard drives. Large Storage Area Network vendors like EMC are looking to SSD drives to eliminate IO bottlenecks in corporate applications where large data warehouses reach 20 TB very quickly. And look for capacity to continue to double about every 18 months, driving the price down very quickly. And due to higher reliability and lower energy costs to run, it won't be too long before hard drives join the ranks of 8-track tape players, record players and 5 1/4-inch diskettes. http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1300939,00.html# http://www.storagesearch.com/ssd-fastest.html
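Taking the capacity-doubling claim at face value, one can project when a single SSD could hold the 500 GB to 20 TB knowledge-base estimates discussed earlier in the thread; the 128 GB starting capacity below is a placeholder assumption, not a figure from the post:

    import math

    def years_to_reach(target_gb, start_gb=128, doubling_years=1.5):
        # Whole doublings needed, times the claimed 18-month doubling period.
        return math.ceil(math.log2(target_gb / start_gb)) * doubling_years

    for target_gb in (500, 20_000):   # GB targets from the earlier KB-size estimates
        print(f"{target_gb} GB reachable in roughly {years_to_reach(target_gb):.1f} years")

Under those assumptions the low end is a few years out and the high end a bit over a decade, broadly consistent with the optimism expressed above.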