YKY

Here is what I learned from implementing the Texai knowledge base, which 
persists symbolic statements about concepts.
I designed an SQL schema to persist OpenCyc in its full CycL form, in MySQL on 
64-bit SuSE Linux.  My Java application driving MySQL slowed down dramatically 
once the number of rows exceeded 20 million, compared with the initial load of 
5 million rows.
I then tried Oracle Berkeley DB Java Edition (open source), which provides no 
SQL query facility; instead, one programs directly to its API for inserts, 
queries, updates and so forth.  It is faster than MySQL for my large KB, but 
uses four times as much disk space because it inserts new rows at the end of 
the file and leaves lots of free space.
I then studied partitioning, which means breaking up the monolithic KB into 
smaller databases in which accesses are expected to be clustered.  And I 
studied sharding, which means slicing up a database into logical segments that 
are hosted by separate db engines, typically with separate disk filesystems.
I began writing my own storage engine, for a fast, space-efficient, partitioned 
and sharded knowledge base, soon realizing that this was far too big a task for 
a sole developer.
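To make the sharding idea above concrete, here is a minimal Python sketch, 
not the Texai storage engine and not Java.  Each shard is a plain in-memory 
dict standing in for a separate db engine, and the names ShardedStore, put 
and get are hypothetical.  The routing rule (a CRC32 hash of the key modulo 
the shard count) is an assumption chosen for illustration.

```python
import zlib


class ShardedStore:
    """Toy key-value store that spreads keys across several shards.

    Each shard is a dict here; in a real deployment each one would be a
    separate database engine, possibly on its own filesystem (assumption).
    """

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # Only the one shard that owns the key is consulted.
        return self.shards[self._shard_for(key)].get(key)
```

Partitioning differs in that the split follows expected access patterns 
(for example, one database per application area) rather than a hash of the key.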
Revisiting my project's object persistence needs, and thinking more about 
interoperability with semantic web technologies, I decided to convert my 
existing KB to an RDF-compatible form and then to evaluate RDF quad stores.
After some analysis, I chose to evaluate the Sesame 2 RDF store, which is 
Java-based and open source and thus very compatible with my other components.  
In Texai, RDF queries have a simpler form than SQL queries when retrieving 
logical statements from a store.  For example, in SQL my schema had to provide 
a separate table for each object type: concept term, functional term, string, 
boolean, long integer, double, statement, arity-1 rule, arity-2 rule, arity-3 
rule, arity-4 rule and arity-5 rule.  Many of these tables would have to be 
joined for a typical query (e.g. what concepts subsume a given concept?).
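As a rough illustration of why the RDF form is simpler, the subsumption 
question above reduces to one triple pattern applied repeatedly.  The Python 
sketch below uses an in-memory list of triples rather than Sesame, and it is 
not the Sesame API.  The function names, the sample data and the use of 
rdfs:subClassOf as the subsumption predicate are all assumptions.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def subsumers(triples, concept, predicate="rdfs:subClassOf"):
    """Collect every concept that directly or transitively subsumes `concept`."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        # One pattern (current, predicate, ?parent) replaces the table joins.
        for (_, _, parent) in match(triples, subject=current, predicate=predicate):
            if parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


# Hypothetical sample data, for illustration only.
KB = [
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
    ("Dog", "rdfs:label", "dog"),
]
```

Every statement has the same subject, predicate, object shape, so no 
per-type tables are needed to answer the question.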
My development Linux computer has 4 GB of memory, and Linux has a feature 
called tmpfs which permits mounting a directory in RAM.  I partitioned my KB 
into separate KBs of fewer than six million rows each.  In Sesame these are 
less than one GB in size, so I can put any one of them in tmpfs, running that 
application-relevant part of the KB at RAM speed.  Experiments demonstrate a 
speedup of about 10 times.
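A minimal Python sketch of staging one partition in RAM follows.  It assumes 
a tmpfs directory has already been mounted by an administrator (for example 
with mount -t tmpfs) and that a partition is simply a directory of store 
files.  The function name stage_partition and both path arguments are 
hypothetical.

```python
import shutil
from pathlib import Path


def stage_partition(partition_dir, ram_root):
    """Copy one KB partition directory under a RAM-backed directory.

    `ram_root` is assumed to be a tmpfs mount point; this function only
    copies files and does not mount anything itself.
    """
    source = Path(partition_dir)
    target = Path(ram_root) / source.name
    if target.exists():
        # Drop any stale copy so the RAM copy mirrors the on-disk partition.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

The store would then be opened on the returned path instead of the on-disk 
path.  Anything written there is lost on reboot unless it is copied or logged 
back to disk.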
When Texai is deployed, I expect that the application will log its transactions 
to disk as a background process as a safeguard against losing the volatile KB 
in tmpfs.
Hope this information is useful.
-Steve
 
Stephen L. Reed

Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

----- Original Message ----
From: YKY (Yan King Yin) <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Wednesday, April 16, 2008 11:51:35 PM
Subject: [agi] database access fast enough?

 For those using database systems for AGI, I'm wondering if the data
retrieval rate would be a problem.

Typically we need to retrieve many nodes from the DB to do inference.
The nodes may be scattered around the DB.  So it may require *many*
disk accesses.  My impression is that most DBMS are optimized for
complex queries but not for large numbers of simple retrievals -- am I
correct about this?

YKY

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?&;
Powered by Listbox: http://www.listbox.com







      