YKY,

Here is what I learned from implementing the Texai knowledge base (KB), which persists symbolic statements about concepts.

I designed an SQL schema to persist OpenCyc in its full CycL form, using MySQL on 64-bit SuSE Linux. My Java application driving MySQL slowed down dramatically once the number of rows exceeded 20 million, compared with the initial load of 5 million rows.

I then tried Oracle Berkeley DB Java Edition (open source). It provides no SQL query facility. Instead, one programs directly against its API for inserts, queries, updates and so forth. It is faster than MySQL for my large KB, but it uses four times as much disk space because it inserts new rows at the end of the file and leaves a lot of free space.

I then studied partitioning, which means breaking up the monolithic KB into smaller databases within which accesses are expected to be clustered. I also studied sharding, which means slicing a database into logical segments that are hosted by separate DB engines, typically with separate disk filesystems.

I began writing my own storage engine for a fast, space-efficient, partitioned and sharded knowledge base, but soon realized that this was far too big a task for a sole developer. Revisiting my project's object persistence needs, and thinking more about interoperability with semantic web technologies, I decided to convert my existing KB to an RDF-compatible form and then to evaluate RDF quad stores.

After some analysis, I chose to evaluate the Sesame 2 RDF store, which is Java-based and open source, and thus very compatible with my other components. In Texai, RDF queries have a simpler form than SQL queries when retrieving logical statements from a store. For example, my SQL schema had to provide a separate table for each object type: concept term, functional term, string, boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 rule, arity-4 rule and arity-5 rule. Many of these tables would have to be joined for a typical query (e.g. what concepts subsume a given concept?).

My development Linux computer has 4 GB of memory, and Linux has a feature called tmpfs, which permits mounting a directory in RAM. I partitioned my KB into separate KBs of fewer than six million rows each. In Sesame each of these occupies less than 1 GB, so I can put any one of them in tmpfs and run that application-relevant part of the KB at RAM speed. Experiments show roughly a tenfold speedup. When Texai is deployed, I expect the application to log its transactions to disk as a background process, as a safeguard against losing the volatile KB in tmpfs.

Hope this information is useful.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

----- Original Message ----
From: YKY (Yan King Yin) <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Wednesday, April 16, 2008 11:51:35 PM
Subject: [agi] database access fast enough?

For those using database systems for AGI, I'm wondering if the data retrieval rate would be a problem. Typically we need to retrieve many nodes from the DB to do inference. The nodes may be scattered around the DB. So it may require *many* disk accesses.

My impression is that most DBMS are optimized for complex queries but not for large numbers of simple retrievals -- am I correct about this?

YKY

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com