[gradle-dev] RDF, SemWeb, reasoning, and Gradle.

Luke Daley Tue, 11 Dec 2012 17:42:17 -0800

Hi all,

I'm back on this bandwagon. While I was away, I was reading about and
researching this topic, and I'm more convinced than ever that our future lies
in this direction. I've got more learning to do, but I wanted to share some
interesting things at this point.

At the heart of the stack of practices/technologies that is collectively called
the Semantic Web is RDF - Resource Description Framework (you can substitute
“resource” with “it” or “thing”). In some ways, RDF can be thought of as an
initiative similar to XML. Unlike XML, however, RDF is not a serialisation
format. RDF is a formal system for stating facts about things, in what is
fundamentally a graph structure. This has serious implications when compared to
a hierarchical model (e.g. XML) or a relational one. Also built into the concept
is what is known as the AAA principle: Anyone can say Anything about Anything.
Put more practically, the system has built into it the idea of enriching a graph
with new facts/connections by aggregating graphs. There are many sources of RDF
data and many ways to embed RDF in common serialisation formats (e.g. XML,
JSON), and even ways to convert relational information into RDF on the fly.
Once you have data in RDF (however you got it there), it becomes trivial to
aggregate the data and use the enriched model. The key thing here is that built
into the system is the idea that there are always more things to learn about
something. More facts can be discovered over time. Said a different way, the
distributed nature of the data is embraced.
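
To make the aggregation idea concrete, here is a minimal sketch in Python. It
does not use a real RDF library: each graph is assumed to be a plain set of
(subject, predicate, object) tuples, with strings standing in for IRIs, so
merging graphs from different sources is just a set union. The two "sources"
are hypothetical and reuse the example facts from this mail.

```python
# A minimal sketch: an RDF-style graph modelled as a set of
# (subject, predicate, object) triples. Plain strings stand in for
# the IRIs a real RDF toolkit would use.
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def merge(*graphs: Graph) -> Graph:
    """Aggregate graphs: merging is simply the union of their triples."""
    merged: Graph = set()
    for g in graphs:
        merged |= g
    return merged


# Two hypothetical sources, each saying something about overlapping things.
people: Graph = {("luke", "is-a", "male"), ("luke", "lives-in", "london")}
places: Graph = {("london", "is-in", "uk")}

combined = merge(people, places)
for triple in sorted(combined):
    print(triple)
```

Neither source needs to know about the other, and a third source could later
add more facts about "london" without anything here changing.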

I am convinced that this is the toolset and mindset by which we should be
modelling our domain. We all know the power of graph structures, and this idea
of collecting facts about things resonates very strongly with the direction
that we are heading in with our dependency management model.

There are other aspects as well.

There are specialised data stores known as triple stores. RDF is based on the
concept of triple statements: «subject» «predicate» «object» (e.g. luke is-a
male, london is-in uk, luke lives-in london). Triple stores can store huge
numbers of facts, which can then be queried. There are many interesting things
about triple stores, but one of the most pertinent for us is that they are
effectively schemaless. If we are to be collecting facts about things that we
don't explicitly model (and I can guarantee we will be), then this is critical.
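
As a rough illustration of "schemaless", here is a toy in-memory triple store
in Python. It is only a sketch under simplifying assumptions (a flat set and a
linear scan; real triple stores typically keep several indexes over the
subject/predicate/object orderings). The class and method names are
hypothetical. The point is that a fact with a predicate nobody planned for is
stored and queried exactly like any other.

```python
# A toy in-memory "triple store": an illustrative sketch only.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # No schema: any predicate can be stored without declaring it first.
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None is a wildcard."""
        for s, p, o in self._triples:
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield (s, p, o)


store = TripleStore()
store.add("luke", "is-a", "male")
store.add("london", "is-in", "uk")
store.add("luke", "lives-in", "london")
# A fact we never "modelled" up front is accepted just the same.
store.add("luke", "works-for", "gradleware")

print(sorted(store.match(subject="luke")))
```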

There are different querying options, but the emerging standard is SPARQL.
Think SQL, but for graph data. SPARQL can be used to select paths through the
graph of facts. Given our simple graph (luke is-a male, london is-in uk, luke
lives-in london), you could write a query like this:

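# ex: is a placeholder namespace, used here for illustration only
PREFIX ex: <http://example.org/>
SELECT ?who
WHERE {
  ?who ex:is-a ex:male .
  ?who ex:lives-in ?city .
  ?city ex:is-in ex:uk .
}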

Who are the males who live in cities in the UK? That's pretty standard graph
querying stuff. If you've ever done logic programming, you might recognise the
idea. To bring it home in practical terms, think of queries like: what are all
the source jars for libraries that are Java 7 compatible?
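
For anyone who hasn't met this style of querying, here is a small Python
sketch of how a basic graph pattern like the one in the query can be
evaluated: each triple pattern is matched against the data, and variable
bindings are carried forward so that every pattern has to agree on them (a
naive nested-loop join). This is not how a production SPARQL engine works
(they reorder patterns and use indexes), and the function names are
hypothetical.

```python
# Sketch: evaluating a SPARQL-style basic graph pattern by matching each
# triple pattern in turn and joining on shared variables ("?name").
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # setdefault binds a fresh variable, or returns its existing value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to something else
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("luke", "is-a", "male"),
    ("london", "is-in", "uk"),
    ("luke", "lives-in", "london"),
}
patterns = [
    ("?who", "is-a", "male"),
    ("?who", "lives-in", "?city"),
    ("?city", "is-in", "uk"),
]

print(sorted({b["?who"] for b in query(graph, patterns)}))  # ['luke']
```

The same function could answer the source-jar question, given triples with
(hypothetical) predicates for "source jar of" and "compatible with".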

There is a very strong emphasis on modelling within RDF/SemWeb, but also an
acceptance that no model is completely correct/perfect/universal. You can
merge/adapt models without having to transform one schema into another with
code; you just need to make connections between the two models, linking the
entities. This reduces the pressure to model everything you'll need up front,
or to get it exactly right the first time.
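
Here is a minimal Python sketch of "linking the entities" rather than
transforming one schema into another. It assumes two hypothetical sources that
name the same person differently, plus one linking fact in the spirit of
owl:sameAs (the "same-as" predicate and the "gradle:"/"hr:" identifiers are
made up for illustration). Identifiers joined by such links are collapsed onto
one representative with a small union-find.

```python
# Sketch: linking two independently modelled graphs by asserting that two
# identifiers denote the same entity, instead of writing transformation code.
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "same-as"  # hypothetical predicate standing in for owl:sameAs


def canonicalise(graph: Set[Triple]) -> Set[Triple]:
    """Collapse identifiers connected by SAME_AS links onto one representative."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent pointers up to the root of x's group.
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    for s, p, o in graph:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                # Keep the lexicographically smaller root so results are deterministic.
                lo, hi = sorted((rs, ro))
                parent[hi] = lo

    return {(find(s), p, find(o)) for s, p, o in graph if p != SAME_AS}


# Two hypothetical sources that identify the same person differently.
build_graph = {("gradle:luke", "lives-in", "london")}
hr_graph = {("hr:ldaley", "role", "principal-engineer")}
links = {("gradle:luke", SAME_AS, "hr:ldaley")}

merged = canonicalise(build_graph | hr_graph | links)
for triple in sorted(merged):
    print(triple)
```

Neither source graph was rewritten by hand; one extra fact was enough to make
their statements about the same person line up.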

One aspect that is still unclear to me is the role that reasoning could play.
Going beyond RDF, you can start to use richer modelling languages to describe
higher order concepts. As an example, we could model dependency “compatibility”
in such a way that a reasoning tool could reason out a compatible set of
dependencies from a graph. The appeal here is that extending the capability of
the system (e.g. adding variants, architectures, etc.) would be a matter of
extending the model, not writing more algorithms to interpret the data graph.
I haven't gone too far down this road yet as I'm just getting started, but it's
pretty easy to see the potential. The potential is especially staggering when
you consider the enterprise component graph and what kind of facts you could
reason out.
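
To give a feel for "reasoning out" facts, here is a small Python sketch of
rule-based forward chaining (Datalog-style), which is far less expressive than
the ontology languages and reasoners mentioned above. The predicates
"targets", "compatible-with" and "usable-on" and the library name are
hypothetical. Each rule derives new triples from known ones, and the rules are
reapplied until nothing new appears (a fixed point).

```python
# Sketch of forward chaining over triples with hypothetical predicates.
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]


def transitive_compat(facts: Set[Triple]) -> Set[Triple]:
    # (A compatible-with B), (B compatible-with C) => (A compatible-with C)
    compat = [(s, o) for s, p, o in facts if p == "compatible-with"]
    return {(a, "compatible-with", c)
            for a, b in compat for b2, c in compat if b == b2}


def targets_is_usable(facts: Set[Triple]) -> Set[Triple]:
    # (L targets P) => (L usable-on P)
    return {(s, "usable-on", o) for s, p, o in facts if p == "targets"}


def usable_on_compatible(facts: Set[Triple]) -> Set[Triple]:
    # (L usable-on P), (Q compatible-with P) => (L usable-on Q)
    usable = [(s, o) for s, p, o in facts if p == "usable-on"]
    compat = [(s, o) for s, p, o in facts if p == "compatible-with"]
    return {(lib, "usable-on", q)
            for lib, plat in usable for q, p2 in compat if p2 == plat}


def infer(facts: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply every rule until the set of known facts stops growing."""
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


facts = {
    ("java7", "compatible-with", "java6"),
    ("java6", "compatible-with", "java5"),
    ("lib-a", "targets", "java5"),
}
rules = [transitive_compat, targets_is_usable, usable_on_compatible]
closure = infer(facts, rules)

print(sorted(o for s, p, o in closure if s == "lib-a" and p == "usable-on"))
# ['java5', 'java6', 'java7']
```

Supporting something new, such as architectures, would mean adding facts and
perhaps another rule, while `infer` itself stays unchanged.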

I've got much more reading/study to do on this topic. I'm convinced it's in our
best interests to pursue, though I don't expect everyone else to at this point.
It's worth noting that this is not an experimental, academic technology. It is
in serious use in some very large-scale, sophisticated systems.

I'm mentioning all this now to try to pique your interest and encourage you to
take any opportunity that might arise in the near future to learn more about
this stuff.

--
Luke Daley
Principal Engineer, Gradleware
http://gradleware.com
