Re: Context Tags, Context Sets and Beyond Named Graphs...

2010-01-19 Thread Paul Houle
On Mon, Jan 18, 2010 at 2:20 PM, Leigh Dodds leigh.do...@talis.com wrote:


 Looks to me like you need Named Graphs plus a mechanism to describe
 combinations of graphs.


Exactly!

That for me was what I liked about the idea:  having a mechanism to do the
things I want that builds on all the work people are doing w/ Named Graphs.



 ...and these as more Named Graphs, or at least graphs that are derived
 from those in the underlying data store. I tend to refer to these as
 synthetic graphs. Most SPARQL implementations have the concept of at
 least one synthetic graph: the union of all Named Graphs in the
 system. But as I alluded to in a recent posting [1], there are many
 other ways that these graphs could be derived. Rather than building
 them into the implementation, they could be described using a
 simple domain specific language. So I think Named Graphs plus graph
 algebra gives you much of what you want.

 Cheers,

 L.

 [1]. http://www.ldodds.com/blog/2009/11/managing-rdf-using-named-graphs/


That's a nice link.

I like the term graph algebra,  because that really is what I'm talking
about.  It's pretty clear that an almost unlimited number of synthetic
graphs are possible:  for instance,  if there's a SPARQL query that
generates a graph,  that could define a named graph which is a lot like a
view in SQL.  In fact,  I could see this being computed on the fly,  or
being materialized,  like a temporary table in SQL.
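To make the view analogy concrete, here is a minimal sketch in Python. It is only an illustration: the class name SyntheticGraph is made up, and a plain Python predicate stands in for the SPARQL query, so nothing here reflects an existing store's API. The same graph can either be recomputed on every read or materialized once.

```python
from typing import Callable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class SyntheticGraph:
    """A graph defined by a query over a base store, much like a SQL view."""

    def __init__(self, base: List[Triple], query: Callable[[Triple], bool]) -> None:
        self._base = base      # underlying triples (hypothetical in-memory store)
        self._query = query    # stand-in for a SPARQL query that generates a graph
        self._cache: Optional[Set[Triple]] = None

    def materialize(self) -> None:
        """Evaluate the query once and keep the result, like a temporary table."""
        self._cache = {t for t in self._base if self._query(t)}

    def triples(self) -> Iterator[Triple]:
        """Read the materialized copy if there is one, else compute on the fly."""
        if self._cache is not None:
            return iter(self._cache)
        return (t for t in self._base if self._query(t))


store = [
    (":WarpDrive", ":isA", ":SpacePropulsionDevice"),
    (":Enterprise", ":hasPart", ":WarpDrive"),
]
is_a_view = SyntheticGraph(store, lambda t: t[1] == ":isA")
print(list(is_a_view.triples()))
```

The trade-off is the usual one for views: the on-the-fly version always reflects the base store, while the materialized version is a snapshot that has to be refreshed.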

Specifically,  however,  I need the ability to stick named graph tags
cheaply on items in a local RDF store (specify that a triple is in 10 named
graphs w/o copying it 10 times),  and to be able to efficiently do graph
algebra involving unions and intersections of graphs defined by those tags.
 I'm thinking about using this on triple stores with between 1 billion and 100
billion triples. On the low end I expect to be able to do it with a single
computer on commodity hardware,  but I'll accept having to use some kind of
cluster to handle stuff on the high end of this range.  Optimization and
good index structures would be essential to this.
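A rough sketch of the tagging requirement, in Python: each triple is stored once and gets an integer id, and an inverted index maps each tag to the ids that carry it, so unions and intersections become set operations on ids. The class name TaggedStore and its methods are invented for this sketch; a store at the sizes mentioned above would presumably need compressed bitmaps or similar on-disk structures rather than Python sets.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class TaggedStore:
    """Hypothetical store: one copy of each triple, many tags per triple."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []                       # each triple stored once
        self.ids: Dict[Triple, int] = {}                      # triple -> integer id
        self.by_tag: Dict[str, Set[int]] = defaultdict(set)   # tag -> triple ids

    def add(self, triple: Triple, tags: Set[str]) -> int:
        tid = self.ids.get(triple)
        if tid is None:                      # first time we see this triple
            tid = len(self.triples)
            self.triples.append(triple)
            self.ids[triple] = tid
        for tag in tags:                     # tagging only adds an id to an index
            self.by_tag[tag].add(tid)
        return tid

    def union(self, *tags: str) -> Set[int]:
        result: Set[int] = set()
        for tag in tags:
            result |= self.by_tag.get(tag, set())
        return result

    def intersection(self, *tags: str) -> Set[int]:
        if not tags:
            return set()
        # Start from the smallest posting set to keep intermediate results small.
        sets = sorted((self.by_tag.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


store = TaggedStore()
a = store.add((":WarpDrive", ":isA", ":SpacePropulsionDevice"),
              {"#InTheStarTrekUniverse", "#UsedInProjectX"})
b = store.add((":Rocket", ":isA", ":SpacePropulsionDevice"),
              {"#UsedInProjectX", "#UsedInProjectY"})
print(store.intersection("#UsedInProjectX", "#UsedInProjectY"))
```

Putting a triple into a tenth graph costs one more id in one more posting set, not another copy of the triple.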

Beyond that,  it's pretty exciting to explore what's possible with
synthetic graphs (on the side of the software stack facing the user) and
with named graph tags on the inside. For instance,  synthetic graphs
could specify what sort of inference is used to extend the graph:  much as
early versions of Cyc had multiple get() functions,  we could have some
synthetic graphs with practically no inference capability,  and other ones
that go to extremes (analogical reasoning,  CWA) to answer questions.

 For instance,  I think physical partitioning is going to become extremely
important for large-scale RDF systems:  named graph tags would be an
effective mechanism to route triples to specialized storage mechanisms:  for
instance,  I might want to route 20,000 upper ontology triples to a
specialized in-RAM engine that does expensive inference operations,  route a
web link graph with 5 billion triples to a specialized storage engine that
does extreme compression,  etc.
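As a sketch of what tag-based routing could look like (Python; every name here, including the tags #UpperOntology and #WebLinkGraph and the ListEngine stand-in, is hypothetical), a router looks at a triple's tags and hands the triple to whichever engine is registered for one of them, falling back to a default engine otherwise.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class ListEngine:
    """Stand-in for a specialized storage engine; it only records triples."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.triples: List[Triple] = []

    def store(self, triple: Triple) -> None:
        self.triples.append(triple)


class Router:
    """Sends each triple to the engine registered for one of its tags."""

    def __init__(self, default: ListEngine) -> None:
        self.default = default
        self.routes: Dict[str, ListEngine] = {}   # tag -> engine

    def register(self, tag: str, engine: ListEngine) -> None:
        self.routes[tag] = engine

    def route(self, triple: Triple, tags: Set[str]) -> ListEngine:
        engine = self.default
        # Sorted so the choice is deterministic if several tags have routes.
        for tag in sorted(tags):
            if tag in self.routes:
                engine = self.routes[tag]
                break
        engine.store(triple)
        return engine


in_ram = ListEngine("in-RAM inference engine")
compressed = ListEngine("compressed link store")
general = ListEngine("general store")

router = Router(default=general)
router.register("#UpperOntology", in_ram)
router.register("#WebLinkGraph", compressed)

chosen = router.route((":Device", ":subClassOf", ":Artifact"), {"#UpperOntology"})
print(chosen.name)
```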


Context Tags, Context Sets and Beyond Named Graphs...

2010-01-18 Thread Paul Houle
For a while I've been struggling with a number of practical problems working
in RDF.  Some of these are addressed by Named Graphs as they currently exist,
but others aren't.

Over the weekend I had an idea for something that I think is highly
expressive but also can be implemented efficiently.

The idea is that the context of a triple can be,  not a name,  but a
collection of tags that work like tags on delicious,  flickr,  etc.  Tags
are going to be namespaced like RDF properties,  of course,  but they could
have meanings like:

#ImportedFromDBpedia3.3
#StoredInPhysicalPartition7
#ConfidentialSecurityLevel
#NotTrue
#InTheStarTrekUniverse
#UsedInProjectX
#UsedInProjectY
#VerifiedToBeTrue
#HypothesisToBeTested

Individually I call these Context Tags,  and the set of them that is
associated with a triple is a Context Set.

Now,  named graphs can be composed from boolean combinations of tags,  such
as

AND(#ImportedFromDBpedia3.3,#InTheStarTrekUniverse)

NOT(#NotTrue)

AND(NOT(#ConfidentialSecurityLevel),OR(#UsedInProjectX,#UsedInProjectY))
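The three expressions above can be evaluated with a few lines of Python. This is a sketch under assumptions: expressions are written as nested tuples such as ("AND", x, y), a bare string is a tag, and the function names matches and select are invented here. A triple belongs to the composed graph when the expression holds for its context set.

```python
from typing import List, Set, Tuple, Union

Triple = Tuple[str, str, str]
Expr = Union[str, tuple]   # a tag, or ("AND" | "OR" | "NOT", sub-expressions...)


def matches(expr: Expr, context_set: Set[str]) -> bool:
    """Return True if the boolean tag expression holds for this context set."""
    if isinstance(expr, str):
        return expr in context_set
    op, *args = expr
    if op == "AND":
        return all(matches(a, context_set) for a in args)
    if op == "OR":
        return any(matches(a, context_set) for a in args)
    if op == "NOT":
        (arg,) = args   # NOT takes exactly one operand
        return not matches(arg, context_set)
    raise ValueError(f"unknown operator: {op!r}")


def select(tagged: List[Tuple[Triple, Set[str]]], expr: Expr) -> List[Triple]:
    """Compute the graph (list of triples) defined by a tag expression."""
    return [triple for triple, tags in tagged if matches(expr, tags)]


data = [
    ((":WarpDrive", ":isA", ":SpacePropulsionDevice"),
     {"#ImportedFromDBpedia3.3", "#InTheStarTrekUniverse"}),
    ((":ProjectPlan", ":status", ":Draft"),
     {"#UsedInProjectX", "#ConfidentialSecurityLevel"}),
    ((":Rocket", ":isA", ":SpacePropulsionDevice"),
     {"#UsedInProjectY", "#VerifiedToBeTrue"}),
]

star_trek = ("AND", "#ImportedFromDBpedia3.3", "#InTheStarTrekUniverse")
not_false = ("NOT", "#NotTrue")
shareable = ("AND", ("NOT", "#ConfidentialSecurityLevel"),
             ("OR", "#UsedInProjectX", "#UsedInProjectY"))

print(select(data, shareable))
```

This scans every triple, which is fine for showing the semantics; an efficient implementation would evaluate the same expression against per-tag indexes instead.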

===

Note that this is a feature of the underlying storage layer that can be
exploited by layers of the RDF store that are above:  for instance,
 something just above the storage layer could hash the subject URL and then
pass a physical partition tag to the actual storage layer.  Similarly,
 security rules could be applied automatically.
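For the partitioning case, a layer above storage might do something like the following (Python sketch; the partition count of 8, the choice of SHA-1, and the function names are all assumptions made for illustration). The subject URL is hashed to a stable partition number, which is then attached as one more context tag before the triple is handed down.

```python
import hashlib
from typing import Set, Tuple

Triple = Tuple[str, str, str]

NUM_PARTITIONS = 8  # assumed value, purely for illustration


def partition_tag(subject: str, num_partitions: int = NUM_PARTITIONS) -> str:
    """Derive a stable partition tag from the subject URL."""
    # hashlib is used instead of hash() so the result is stable across processes.
    digest = hashlib.sha1(subject.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_partitions
    return f"#StoredInPhysicalPartition{index}"


def with_partition_tag(triple: Triple, tags: Set[str]) -> Set[str]:
    """Return the context set the storage layer would receive."""
    return set(tags) | {partition_tag(triple[0])}


triple = ("http://example.org/WarpDrive", ":isA", ":SpacePropulsionDevice")
print(with_partition_tag(triple, {"#InTheStarTrekUniverse"}))
```

Because the tag depends only on the subject, all triples about one subject land in the same partition.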

===

There are many details to be filled in and features that could be added.
 For instance,  I could imagine that it might be useful to allow multiple
context sets to be attached to a triple,  for instance,  if you had the
triple

:WarpDrive :isA :SpacePropulsionDevice

it might be desirable to assert this as

#RealWorld #NotTrue

and to also assert that it is

#InTheStarTrekUniverse

I've thought through the implications of this less,  however.
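One possible reading of multiple context sets, sketched in Python (this semantics is only an assumption, and the names assert_in and holds are made up): a triple carries a list of context sets, and it belongs to a graph if any one of its sets satisfies the graph's condition. The example shows why the two assertions above cannot simply be merged into one set.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
ContextSet = FrozenSet[str]

# Each triple maps to a list of independent context sets.
contexts: Dict[Triple, List[ContextSet]] = {}


def assert_in(triple: Triple, *tags: str) -> None:
    """Attach one more context set to the triple."""
    contexts.setdefault(triple, []).append(frozenset(tags))


def holds(triple: Triple, condition: Callable[[ContextSet], bool]) -> bool:
    """Assumed semantics: the triple is in the graph if ANY context set matches."""
    return any(condition(cs) for cs in contexts.get(triple, []))


def true_in_star_trek(cs: ContextSet) -> bool:
    return "#InTheStarTrekUniverse" in cs and "#NotTrue" not in cs


def true_in_real_world(cs: ContextSet) -> bool:
    return "#RealWorld" in cs and "#NotTrue" not in cs


warp = (":WarpDrive", ":isA", ":SpacePropulsionDevice")
assert_in(warp, "#RealWorld", "#NotTrue")
assert_in(warp, "#InTheStarTrekUniverse")

print(holds(warp, true_in_star_trek), holds(warp, true_in_real_world))

# Merging both assertions into a single set loses the distinction:
merged = frozenset().union(*contexts[warp])
print(true_in_star_trek(merged))
```

With separate sets the triple counts as true in the Star Trek universe and not true in the real world; with one merged set the #NotTrue tag would leak into the Star Trek reading.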


The immediate use for this system that I see is that existing query
mechanisms for named graphs can be applied to the computed graphs.
 Inference mechanisms that can do other things with context tags are a wide
open question.

===

Anyhow,  I want this pretty bad.  (i) If you're selling this,  I'm buying;
 (ii) if you want to build this,  contact me.


Re: Context Tags, Context Sets and Beyond Named Graphs...

2010-01-18 Thread Leigh Dodds
Hi Jeni,

2010/1/18 Jeni Tennison j...@jenitennison.com:
 Do you think that http://www.w3.org/2004/03/trix/rdfg-1/ is sufficient for
 describing the relationships between graphs (for these purposes) and if not,
 what do you think needs adding?

No, I don't think it's sufficient, certainly not for the kinds of use
cases that Paul was describing. The RDFG schema only has two
properties defining equivalency and a sub-set relationship.

I was thinking more along the lines of a means to describe the process
of constructing a graph by operations on a set of other graphs, where
those operations would include basic algebra operators.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Context Tags, Context Sets and Beyond Named Graphs...

2010-01-18 Thread Axel Rauschmayer
This description reminds me of NRL, maybe it is closer to what you need.
http://www.semanticdesktop.org/ontologies/nrl/

Axel

On Jan 18, 2010, at 21:47 , Leigh Dodds wrote:

 Hi Jeni,
 
 2010/1/18 Jeni Tennison j...@jenitennison.com:
 Do you think that http://www.w3.org/2004/03/trix/rdfg-1/ is sufficient for
 describing the relationships between graphs (for these purposes) and if not,
 what do you think needs adding?
 
 No, I don't think it's sufficient, certainly not for the kinds of use
 cases that Paul was describing. The RDFG schema only has two
 properties defining equivalency and a sub-set relationship.
 
 I was thinking more along the lines of a means to describe the process
 of constructing a graph by operations on a set of other graphs, where
 those operations would include basic algebra operators.
 
 Cheers,
 
 L.
 
 -- 
 Leigh Dodds
 Programme Manager, Talis Platform
 Talis
 leigh.do...@talis.com
 http://www.talis.com
 
 

-- 
axel.rauschma...@ifi.lmu.de
http://www.pst.ifi.lmu.de/~rauschma/