Re: [BioRDF] Scalability

Susie Stephens Thu, 06 Apr 2006 15:29:59 -0700


I've embedded answers to your questions below.


Susie


Cutler, Roger (RogerCutler) wrote:

No problem.  Getting back to the main subject of the thread, I'm a
little curious whether you've got some Oracle perspective on this issue.
I understand that new Oracle databases are putting RDF into some sort of
triple-store, but I don't know much about the details.  Some questions
that occur to me, but maybe not exactly the right questions:

- Does the RDF just go in as-is or is it compressed in some way?  If
there is a size factor of something like 15 from the data itself, are
these RDF stores tending to be real bulky?

RDF data is compressed: repeated node and link values are stored only once, and when a value repeats in the data, only a reference to the already stored value is kept. Oracle RDF therefore does not multiply the size of the data. RDF is stored in the Oracle Database using an object-relational implementation, which allows users to manipulate RDF triples as objects.
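The value-sharing scheme described above is a form of dictionary encoding. The following is a minimal sketch in Python, assuming a simple in-memory store. The class and method names are hypothetical, and this illustrates the general idea only, not Oracle's internal storage schema.

```python
# Dictionary encoding for RDF triples (illustrative sketch, NOT Oracle's schema):
# each distinct node or link value is stored once, and the triples themselves
# hold only integer references to those stored values.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class DictionaryEncodedStore:
    def __init__(self) -> None:
        self.value_to_id: Dict[str, int] = {}          # each distinct value stored once
        self.id_to_value: List[str] = []               # reverse lookup: id -> value
        self.triples: List[Tuple[int, int, int]] = []  # references only, no strings

    def _intern(self, value: str) -> int:
        """Return the id for value, storing the value only the first time it is seen."""
        if value not in self.value_to_id:
            self.value_to_id[value] = len(self.id_to_value)
            self.id_to_value.append(value)
        return self.value_to_id[value]

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.append((self._intern(s), self._intern(p), self._intern(o)))

    def decode(self, ids: Tuple[int, int, int]) -> Triple:
        s, p, o = ids
        return (self.id_to_value[s], self.id_to_value[p], self.id_to_value[o])


if __name__ == "__main__":
    store = DictionaryEncodedStore()
    store.add("ex:geneA", "ex:encodes", "ex:proteinX")
    store.add("ex:geneA", "ex:locatedOn", "ex:chr7")
    store.add("ex:geneB", "ex:locatedOn", "ex:chr7")
    print(len(store.triples), "triples,", len(store.id_to_value), "distinct values")
    # 3 triples, 6 distinct values
```

Here nine value occurrences collapse into six stored strings. The saving grows as URIs such as predicates repeat across many triples.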

The RDF Data Model can take advantage of the database's scalability and performance features, such as indexing, parallelization, memory management and Real Application Clusters (RAC). It can also work with our image and text management capabilities and with the database security features.

Because some parsing is needed when the data is initially loaded, loading may be slower than with some other systems. In return, query performance is fast.

- Is there some sort of indexing and related join-like function?  If so,
what are the performance characteristics?

Several indexes are built on the internal storage structures. We do perform joins, but they are highly optimized, and our performance figures show that this design delivers very good performance. We have also extended SQL to provide SPARQL-like query capabilities, so the user does not have to be aware that the data is held in different tables internally.
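To show why indexes on the encoded triples keep joins tractable, here is a minimal sketch in Python. It assumes one hash index per triple position, which is a common triple-store design chosen for illustration and is not a description of the specific indexes Oracle builds. The function names are hypothetical.

```python
# Answering triple patterns through indexes instead of a full scan, plus an
# index nested-loop join (illustrative sketch; index layout is an assumption).
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[int, int, int]  # (subject id, predicate id, object id)


class IndexedTriples:
    def __init__(self, triples: List[Triple]) -> None:
        self.all: Set[Triple] = set(triples)
        # one index per position: value id -> triples holding that id in that position
        self.by_pos: List[Dict[int, Set[Triple]]] = [defaultdict(set) for _ in range(3)]
        for t in triples:
            for pos, value in enumerate(t):
                self.by_pos[pos][value].add(t)

    def match(self, s: Optional[int] = None, p: Optional[int] = None,
              o: Optional[int] = None) -> Set[Triple]:
        """Return triples matching the bound positions (None means variable)."""
        buckets = [self.by_pos[pos].get(value, set())
                   for pos, value in enumerate((s, p, o)) if value is not None]
        if not buckets:
            return set(self.all)  # fully unbound pattern: this is the full scan
        buckets.sort(key=len)     # start from the most selective bucket
        result = set(buckets[0])  # copy so the index itself is never mutated
        for bucket in buckets[1:]:
            result &= bucket
        return result


def join_on_subject(index: IndexedTriples, p1: int, o1: int, p2: int) -> List[Tuple[int, int]]:
    """Pairs (?s, ?x) where (?s p1 o1) and (?s p2 ?x) both hold, via index nested loops."""
    pairs = []
    for s, _, _ in index.match(p=p1, o=o1):
        for _, _, x in index.match(s=s, p=p2):
            pairs.append((s, x))
    return sorted(pairs)


if __name__ == "__main__":
    # ids: 0=geneA 1=encodes 2=proteinX 3=locatedOn 4=chr7 5=geneB
    idx = IndexedTriples([(0, 1, 2), (0, 3, 4), (5, 3, 4)])
    print(join_on_subject(idx, p1=3, o1=4, p2=1))  # [(0, 2)]
```

Each step of the join touches only the index buckets for bound values, so the work tracks the number of matching triples rather than the size of the whole store.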

As I said, I don't have any experience with the RDF stuff, but some
thoughts based on my experience with relational databases:

- Just because you've got your data in an Oracle (or any other) database
doesn't mean you are going to be able to get at it in a performant
manner.  The devil is in the details.

- Operations that initiate a full read of a Gigabyte database are
extremely painful.

- Big joins can also be extremely painful.  Would traversing a big bunch
of RDF look something like an incredibly complex hairball of complex
joins?  If so, is there a potential problem here?

Yes, the devil is certainly in the details, and big joins are indeed painful. However, the user does not have to write these big joins or worry about the details. The RDF query function provided by Oracle gives the user a simple SQL interface to query the internal tables. The internal operations are highly optimized, and where necessary, internal Oracle features have been enhanced. Some of these techniques are described in the VLDB paper by Chong et al. at http://www.oracle.com/technology/tech/semantic_technologies/pdf/vldb_2005.pdf
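To make the "hairball of joins" question concrete, the following is a minimal sketch that uses SQLite (via Python's standard sqlite3 module) as a stand-in for a relational triple store. It shows how a two-pattern graph query such as { ?g ex:locatedOn ex:chr7 . ?g ex:encodes ?p } becomes a self-join on a single triples table. The table, column and index names are hypothetical, and this does not reproduce Oracle's schema or its RDF query function.

```python
# A graph pattern translated into self-joins over one triples table
# (SQLite stand-in; schema and names are hypothetical, not Oracle's).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE rdf_value  (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    CREATE TABLE rdf_triple (s INTEGER NOT NULL, p INTEGER NOT NULL, o INTEGER NOT NULL);
    -- indexes so each triple pattern is an index lookup rather than a full scan
    CREATE INDEX idx_pos ON rdf_triple (p, o, s);
    CREATE INDEX idx_spo ON rdf_triple (s, p, o);
""")


def vid(value: str) -> int:
    """Store a value once (if new) and return its id."""
    cur.execute("INSERT OR IGNORE INTO rdf_value (value) VALUES (?)", (value,))
    cur.execute("SELECT id FROM rdf_value WHERE value = ?", (value,))
    return cur.fetchone()[0]


for s, p, o in [
    ("ex:geneA", "ex:encodes", "ex:proteinX"),
    ("ex:geneA", "ex:locatedOn", "ex:chr7"),
    ("ex:geneB", "ex:locatedOn", "ex:chr7"),
]:
    cur.execute("INSERT INTO rdf_triple VALUES (?, ?, ?)", (vid(s), vid(p), vid(o)))

# One self-join per extra triple pattern; rdf_value joins decode ids at the end.
QUERY = """
SELECT g.value, prot.value
FROM rdf_triple t1
JOIN rdf_triple t2  ON t2.s = t1.s   -- join on the shared subject variable
JOIN rdf_value g    ON g.id = t1.s
JOIN rdf_value prot ON prot.id = t2.o
WHERE t1.p = (SELECT id FROM rdf_value WHERE value = 'ex:locatedOn')
  AND t1.o = (SELECT id FROM rdf_value WHERE value = 'ex:chr7')
  AND t2.p = (SELECT id FROM rdf_value WHERE value = 'ex:encodes')
"""

rows = cur.execute(QUERY).fetchall()
print(rows)  # [('ex:geneA', 'ex:proteinX')]
```

Each additional triple pattern adds another self-join of rdf_triple. This is why the (p, o, s) and (s, p, o) indexes matter: without them, every join step could degrade into a scan of the whole table.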
