I just want to share a few of my desiderata for working with RDF data. A few of them contradict each other. They touch on the Graph/Model split and similar things.
One of them is streaming processing with tools like Spark, where the real point is raw speed, and that comes down to getting as close to "zero copy" processing as possible. Sometimes I am looking at a stream of triples and I want to filter out anywhere from 50% to 90% to 99.99% of them, and I am often doing some kind of map or reduce that works a triple at a time. The elephant in the room is parsing time and memory consumption, so something that is insanely fast and highly mutable (like the Hadoop Writable) is desirable.

I also want it to be optional in a pipeline to shove facts into an in-memory model, because sometimes that is a great way to get things done. It would be nice not to have to change my filtering code, and to have confidence that what is happening under the hood is efficient, without a lot of mindless copying.

On the other hand, I am also doing things where immutable data structures are the way to go; in particular, I am using Jena classes with production rules engines such as Drools. From my current viewpoint, RDFS and OWL are just "logical theories" which sit on the shelf together with logical theories on other topics such as invoices and postal addresses. In this model there is (i) a small rule base, (ii) a fair-sized "T-Box"-like knowledge base (say 1-1M triples), and (iii) a small "A-Box" knowledge base which is streaming past the system, in the sense that the system is doing a 'consultation' which may involve a number of decisions, and then we toss the A-Box out.

I like the feature set of Drools but may end up using something Clojure-based for a rules engine, basically because the source code of OPS5 in LISP is about 3k LOC and Drools core is orders of magnitude bigger. When I look at the data modelling problems people run into with "business rules engines", it is clear that RDF is the right answer for many such conundrums.
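To make the first two desiderata concrete, here is a rough sketch in plain Java of what I mean by keeping the filtering code unchanged while making the in-memory model optional. It is not Jena's or Hadoop's API: `MutableTriple`, `Fact` and `TripleFilterSketch` are hypothetical names made up for illustration, and the line format is a simplified N-Triples-like one. One mutable holder is reused for every input line (in the spirit of a Writable), and an immutable copy is made only for triples that survive the filter and only when the sink asks for one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical mutable holder, overwritten in place for every input line. */
final class MutableTriple {
    String subject, predicate, object;

    /**
     * Splits one "s p o ." line on the first two spaces and overwrites the fields.
     * Returns false for malformed lines. Note: substring still allocates; a real
     * zero-copy version would keep offsets into a byte buffer instead.
     */
    boolean set(String line) {
        int a = line.indexOf(' ');
        int b = a < 0 ? -1 : line.indexOf(' ', a + 1);
        if (b < 0) return false;
        int end = line.endsWith(" .") ? line.length() - 2 : line.length();
        if (end <= b) return false;
        subject = line.substring(0, a);
        predicate = line.substring(a + 1, b);
        object = line.substring(b + 1, end);
        return true;
    }
}

/** Hypothetical immutable snapshot, the kind of value a rules engine can hold safely. */
final class Fact {
    final String subject, predicate, object;

    Fact(MutableTriple t) {
        this.subject = t.subject;
        this.predicate = t.predicate;
        this.object = t.object;
    }
}

final class TripleFilterSketch {
    /** Runs the filter over the lines; the sink decides whether anything is copied. */
    static int run(Iterable<String> lines,
                   Predicate<MutableTriple> keep,
                   Consumer<MutableTriple> sink) {
        MutableTriple t = new MutableTriple(); // single instance, reused
        int kept = 0;
        for (String line : lines) {
            if (t.set(line) && keep.test(t)) {
                sink.accept(t);
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<a> <type> <Person> .",
                "<a> <name> \"Ann Lee\" .",
                "<b> <type> <Org> .");
        Predicate<MutableTriple> keep = t -> t.predicate.equals("<type>");

        // Streaming use: nothing is retained, no per-triple snapshot is made.
        int streamed = run(lines, keep, t -> { });

        // Same filter, but now the survivors are copied into an in-memory list.
        List<Fact> model = new ArrayList<>();
        run(lines, keep, t -> model.add(new Fact(t)));

        System.out.println(streamed + " kept, " + model.size() + " in model");
    }
}
```

The filter never learns whether a model sits behind it, which is the property I am after. The cost is the usual one with mutable holders: a sink must not keep a reference to the `MutableTriple` itself, only a copy.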
-- 
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes*
(607) 539 6254    paul.houle on Skype    [email protected]
https://legalentityidentifier.info/lei/lookup
