Re: Context Tags, Context Sets and Beyond Named Graphs...
On Mon, Jan 18, 2010 at 2:20 PM, Leigh Dodds leigh.do...@talis.com wrote:

Looks to me like you need Named Graphs plus a mechanism to describe combinations of graphs.

Exactly! For me, that was what I liked about the idea: having a mechanism to do the things I want that builds on all the work people are doing with Named Graphs.

...and these as more Named Graphs, or at least graphs that are derived from those in the underlying data store. I tend to refer to these as synthetic graphs. Most SPARQL implementations have the concept of at least one synthetic graph: the union of all Named Graphs in the system. But as I alluded to in a recent posting [1], there are many other ways that these graphs could be derived. Rather than building them into the implementation, they could be described using a simple domain-specific language. So I think Named Graphs plus graph algebra gives you much of what you want. Cheers, L.

[1] http://www.ldodds.com/blog/2009/11/managing-rdf-using-named-graphs/

That's a nice link. I like the term graph algebra, because that really is what I'm talking about. It's pretty clear that an almost unlimited number of synthetic graphs are possible: for instance, if there's a SPARQL query that generates a graph, that could define a named graph which is a lot like a view in SQL. In fact, I could see this being computed on the fly, or being materialized, like a temporary table in SQL.

Specifically, however, I need the ability to attach named graph tags cheaply to items in a local RDF store (specifying that a triple is in 10 named graphs without copying it 10 times), and to efficiently do graph algebra involving unions and intersections of the graphs defined by those tags. I'm thinking about using this on triple stores with between 1 billion and 100 billion triples. On the low end I expect to be able to do it on a single commodity computer, but I'll accept having to use some kind of cluster to handle the high end of this range.
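A minimal sketch of the cheap-tagging requirement, assuming a toy in-memory store: each triple is stored once, and every tag maps to a set of triple ids, so unions and intersections become set operations on ids. The class and method names are hypothetical and do not come from any existing triple store; a store at the scale mentioned above would need real index structures rather than Python sets.

```python
from collections import defaultdict


class TaggedTripleStore:
    """Toy store: each triple is kept once; tags refer to it by id."""

    def __init__(self):
        self.triples = []               # triple id -> (s, p, o)
        self.ids = {}                   # (s, p, o) -> triple id
        self.by_tag = defaultdict(set)  # tag -> set of triple ids

    def add(self, triple, tags):
        # Store the triple only the first time; later calls just add tags.
        tid = self.ids.get(triple)
        if tid is None:
            tid = len(self.triples)
            self.triples.append(triple)
            self.ids[triple] = tid
        for tag in tags:
            self.by_tag[tag].add(tid)
        return tid

    def union(self, *tags):
        ids = set()
        for tag in tags:
            ids |= self.by_tag.get(tag, set())
        return ids

    def intersection(self, *tags):
        sets = [self.by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets) if sets else set()

    def resolve(self, ids):
        # Turn a set of triple ids back into triples.
        return {self.triples[i] for i in ids}


store = TaggedTripleStore()
warp = (":WarpDrive", ":isA", ":SpacePropulsionDevice")
kirk = (":Kirk", ":commands", ":Enterprise")
store.add(warp, ["#InTheStarTrekUniverse", "#UsedInProjectX"])
store.add(warp, ["#UsedInProjectY"])  # same triple, one more tag, no copy
store.add(kirk, ["#InTheStarTrekUniverse"])

both = store.resolve(store.intersection("#InTheStarTrekUniverse", "#UsedInProjectX"))
either = store.resolve(store.union("#UsedInProjectX", "#UsedInProjectY"))
```

Here `both` and `either` each contain only the warp drive triple, and the store holds two triples even though three `add` calls were made.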
Optimization and good index structures would be essential to this.

Beyond that, it's pretty exciting to explore what's possible with synthetic graphs (on the side of the software stack facing the user) and with named graph tags on the inside. For instance, synthetic graphs could specify what sort of inference is used to extend the graph: much as early versions of Cyc had multiple get() functions, we could have some synthetic graphs with practically no inference capability, and others that go to extremes (analogical reasoning, CWA) to answer questions.

I also think physical partitioning is going to become extremely important for large-scale RDF systems, and named graph tags would be an effective mechanism for routing triples to specialized storage mechanisms. I might want to route 20,000 upper ontology triples to a specialized in-RAM engine that does expensive inference operations, route a web link graph with 5 billion triples to a specialized storage engine that does extreme compression, etc.
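The routing idea might look roughly like the sketch below. Everything named here is an assumption for illustration: the engine names, the `#UpperOntology` and `#WebLinkGraph` tags, and the partition count are made up. Triples with no routing tag fall back to a partition tag derived by hashing the subject, as suggested in the original post.

```python
import hashlib

# Hypothetical mapping from context tags to storage engines.
ROUTES = {
    "#UpperOntology": "in_ram_inference_engine",
    "#WebLinkGraph": "compressed_link_store",
}
NUM_PARTITIONS = 8  # assumed partition count


def partition_tag(subject, num_partitions=NUM_PARTITIONS):
    """Derive a stable physical-partition tag from the subject URL."""
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big") % num_partitions
    return f"#StoredInPhysicalPartition{n}"


def route(triple, tags):
    """Return (engine name, context set) for a triple and its tags."""
    for tag in tags:
        if tag in ROUTES:
            return ROUTES[tag], set(tags)
    # No routing tag: send to the default store, tagged with its partition.
    return "default_store", set(tags) | {partition_tag(triple[0])}


engine, context = route(
    ("http://example.org/Thing", ":subClassOf", ":Entity"),
    ["#UpperOntology"],
)
```

A layer like this would sit just above the storage layer, so the engines themselves never need to know how the routing decision was made.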
Context Tags, Context Sets and Beyond Named Graphs...
For a while I've been struggling with a number of practical problems working in RDF. Some of these are addressed by Named Graphs as they currently exist, but others aren't. Over the weekend I had an idea for something that I think is highly expressive but can also be implemented efficiently.

The idea is that the context of a triple can be, not a name, but a collection of tags that work like tags on delicious, flickr, etc. Tags are going to be namespaced like RDF properties, of course, but they could have meanings like:

#ImportedFromDBpedia3.3
#StoredInPhysicalPartition7
#ConfidentialSecurityLevel
#NotTrue
#InTheStarTrekUniverse
#UsedInProjectX
#UsedInProjectY
#VerifiedToBeTrue
#HypothesisToBeTested

Individually I call these Context Tags, and the set of them that is associated with a triple is a Context Set. Now, named graphs can be composed from boolean combinations of tags, such as

AND(#ImportedFromDBpedia3.3,#InTheStarTrekUniverse)
NOT(#NotTrue)
AND(NOT(#ConfidentialSecurityLevel),OR(#UsedInProjectX,#UsedInProjectY))

===

Note that this is a feature of the underlying storage layer that can be exploited by the layers of the RDF store above it: for instance, something just above the storage layer could hash the subject URL and then pass a physical partition tag to the actual storage layer. Similarly, security rules could be applied automatically.

===

There are many details to be filled in and features that could be added. For instance, I could imagine that it might be useful to allow multiple context sets to be attached to a triple. If you had the triple

:WarpDrive :isA :SpacePropulsionDevice

it might be desirable to assert this as

#RealWorld #NotTrue

and to also assert that it is

#InTheStarTrekUniverse

I've thought through the implications of this less, however.

The immediate use for this system that I see is that existing query mechanisms for named graphs can be applied to the computed graphs.
What inference mechanisms could do with context tags beyond this is a wide open question.

===

Anyhow, I want this pretty badly. (i) If you're selling this, I'm buying; (ii) if you want to build this, contact me.
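To make the boolean combinations concrete, here is a small sketch that treats each tag expression as a predicate over a triple's Context Set. The helper names (`TAG`, `AND`, `OR`, `NOT`, `computed_graph`) and the sample triples are illustrative assumptions, and each triple carries exactly one context set here; the multiple-context-set case is left open, as in the post.

```python
def TAG(name):
    return lambda ctx: name in ctx


def AND(*exprs):
    return lambda ctx: all(e(ctx) for e in exprs)


def OR(*exprs):
    return lambda ctx: any(e(ctx) for e in exprs)


def NOT(expr):
    return lambda ctx: not expr(ctx)


def computed_graph(tagged_triples, expr):
    """Return the triples whose context set satisfies the expression."""
    return {triple for triple, ctx in tagged_triples if expr(ctx)}


# Made-up sample data: (triple, context set) pairs.
spock = (":Spock", ":isA", ":Vulcan")
warp = (":WarpDrive", ":isA", ":SpacePropulsionDevice")
budget = (":ProjectX", ":budget", "42")
spec = (":ProjectY", ":specVersion", "2")
tagged_triples = [
    (spock, {"#ImportedFromDBpedia3.3", "#InTheStarTrekUniverse"}),
    (warp, {"#RealWorld", "#NotTrue"}),
    (budget, {"#ConfidentialSecurityLevel", "#UsedInProjectX"}),
    (spec, {"#UsedInProjectY"}),
]

# The three example expressions from the post.
g1 = AND(TAG("#ImportedFromDBpedia3.3"), TAG("#InTheStarTrekUniverse"))
g2 = NOT(TAG("#NotTrue"))
g3 = AND(NOT(TAG("#ConfidentialSecurityLevel")),
         OR(TAG("#UsedInProjectX"), TAG("#UsedInProjectY")))

star_trek_from_dbpedia = computed_graph(tagged_triples, g1)
not_false = computed_graph(tagged_triples, g2)
shareable_project_data = computed_graph(tagged_triples, g3)
```

This scans every triple, which is fine for a demonstration but is exactly what good index structures would need to avoid at scale.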
Re: Context Tags, Context Sets and Beyond Named Graphs...
Hi Jeni,

2010/1/18 Jeni Tennison j...@jenitennison.com:

Do you think that http://www.w3.org/2004/03/trix/rdfg-1/ is sufficient for describing the relationships between graphs (for these purposes) and if not, what do you think needs adding?

No, I don't think it's sufficient, certainly not for the kinds of use cases that Paul was describing. The RDFG schema only has two properties, defining equivalence and a subset relationship. I was thinking more along the lines of a means to describe the process of constructing a graph by operations on a set of other graphs, where those operations would include basic algebra operators.

Cheers, L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com
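One way to picture such a description is as a nested expression over graph names that an evaluator turns into a set of triples. The sketch below is only an illustration under assumed conventions: the tuple-based syntax, the operator names, and the example.org graph URIs are invented here and are not part of RDFG or any other vocabulary.

```python
def evaluate(description, graphs):
    """Build a synthetic graph from a nested description.

    A description is either a graph name (str) or a tuple of the form
    (operator, operand, operand, ...).
    """
    if isinstance(description, str):
        return set(graphs[description])
    op, *args = description
    if not args:
        raise ValueError(f"operator {op!r} needs at least one operand")
    operands = [evaluate(arg, graphs) for arg in args]
    if op == "union":
        return set().union(*operands)
    if op == "intersection":
        return set.intersection(*operands)
    if op == "difference":
        first, *rest = operands
        return first.difference(*rest)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical named graphs, each a set of triples.
t1 = (":a", ":p", ":b")
t2 = (":b", ":p", ":c")
t3 = (":c", ":p", ":d")
graphs = {
    "http://example.org/g/import": {t1, t2},
    "http://example.org/g/edits": {t2, t3},
    "http://example.org/g/retracted": {t3},
}

# "Everything imported or edited, minus what was retracted."
current = evaluate(
    ("difference",
     ("union", "http://example.org/g/import", "http://example.org/g/edits"),
     "http://example.org/g/retracted"),
    graphs,
)
```

The result could be computed on demand or cached, which matches the SQL view versus temporary table distinction raised earlier in the thread.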
Re: Context Tags, Context Sets and Beyond Named Graphs...
This description reminds me of NRL; maybe it is closer to what you need.

http://www.semanticdesktop.org/ontologies/nrl/

Axel

On Jan 18, 2010, at 21:47, Leigh Dodds wrote:

Hi Jeni,

2010/1/18 Jeni Tennison j...@jenitennison.com:

Do you think that http://www.w3.org/2004/03/trix/rdfg-1/ is sufficient for describing the relationships between graphs (for these purposes) and if not, what do you think needs adding?

No, I don't think it's sufficient, certainly not for the kinds of use cases that Paul was describing. The RDFG schema only has two properties, defining equivalence and a subset relationship. I was thinking more along the lines of a means to describe the process of constructing a graph by operations on a set of other graphs, where those operations would include basic algebra operators.

Cheers, L.

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com

--
axel.rauschma...@ifi.lmu.de
http://www.pst.ifi.lmu.de/~rauschma/