Hi all,

I'm back on this bandwagon. While I've been away, I've been reading up on this
topic, and I'm more convinced than ever that our future lies in this direction.
I've got more learning to do, but I wanted to share some interesting things at
this point.

At the heart of the stack of practices/technologies collectively called the
Semantic Web is RDF, the Resource Description Framework (you can substitute
“resource” with “it” or “thing”). In some ways, RDF can be thought of as an
initiative similar to XML. Unlike XML, however, RDF is not a serialisation
format. RDF is a formal system for stating facts about things, in what is
fundamentally a graph structure. This has serious implications when compared
to a hierarchical model (e.g. XML) or a relational one. Also built into the
concept is what is known as the AAA principle: Anyone can say Anything about
Anything. Said more practically, built into the system is the idea of enriching
a graph with new facts/connections by aggregating graphs. There are many
sources of RDF data and many ways to embed RDF in common serialisation formats
(e.g. XML, JSON), and there are even ways to convert relational information
into RDF on the fly. Once you have data in RDF (however you got it there), it
becomes trivial to aggregate the data and use the enriched model. The key thing
here is that built into the system is the idea that there is always more to
learn about something; more facts can be discovered over time. Said a
different way, the distributed nature of the data is embraced.
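To make the aggregation idea concrete, here is a minimal Python sketch (an
illustration only, not any particular RDF library) that treats a graph as a
plain set of (subject, predicate, object) tuples. Short names like `luke` and
`lives-in` are placeholders for the IRIs real RDF would use. Under that
simplification, aggregating graphs is just set union; real RDF merges also have
to keep blank nodes from different graphs apart, which this sketch ignores.

```python
# Sketch only: an RDF-style graph as a set of (subject, predicate, object)
# tuples. Plain strings stand in for IRIs; blank nodes are not modelled.
Triple = tuple[str, str, str]

# Two independently published graphs that mention some of the same things.
graph_a: set[Triple] = {
    ("luke", "is-a", "male"),
    ("luke", "lives-in", "london"),
}
graph_b: set[Triple] = {
    ("london", "is-in", "uk"),
    ("luke", "lives-in", "london"),  # the same fact stated twice collapses on merge
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Aggregate graphs by set union: no schema reconciliation step needed."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def facts_about(graph: set[Triple], subject: str) -> set[Triple]:
    """Everything the graph currently says about one subject."""
    return {t for t in graph if t[0] == subject}


combined = merge(graph_a, graph_b)
print(len(combined))                          # 3
print(sorted(facts_about(combined, "luke")))  # [('luke', 'is-a', 'male'), ('luke', 'lives-in', 'london')]
```

Neither source graph had to know about the other; the combined graph simply
knows more than either one did.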

I am convinced that this is the toolset and mindset by which we should be 
modelling our domain. We all know the power of graph structures, and this idea 
of collecting facts about things resonates very strongly with the direction 
that we are heading in with our dependency management model. 

There are other aspects as well.

There are specialised data stores known as triple stores. RDF is based on the
concept of triple statements: «subject» «predicate» «object» (e.g. luke is-a
male, london is-in uk, luke lives-in london). Triple stores can store huge
numbers of facts, which can then be queried. There are many interesting things
about triple stores, but one of the most pertinent for us is that they are
effectively schema-less. If we are to collect facts about things that we don't
explicitly model (and I can guarantee we will), then this is critical.
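A toy in-memory triple store shows what “schema-less” means in practice. This
is a hypothetical Python sketch, not the API of any real triple store: facts
are stored as tuples, lookups are pattern matches where `None` means “any
value”, and adding a kind of fact nobody planned for (here a made-up
`has-sources` predicate on made-up jar names) needs no migration. Real stores
add indexes so that lookups don't scan every triple.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class TripleStore:
    """Hypothetical minimal triple store: a set of facts with no schema."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        """Store one fact; any predicate is accepted, there is nothing to declare."""
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None is a wildcard.
        This scans every triple; a real store would use indexes."""
        pattern = (s, p, o)
        for triple in self._triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple


store = TripleStore()
store.add("luke", "is-a", "male")
store.add("london", "is-in", "uk")
store.add("luke", "lives-in", "london")
# A kind of fact we never modelled up front: it is just another triple.
store.add("lib-a.jar", "has-sources", "lib-a-sources.jar")

print(sorted(store.match(s="luke")))
print(list(store.match(p="has-sources")))
```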

There are different querying options, but the emerging standard is SPARQL.
Think SQL, but for graph data. SPARQL can be used to select paths through the
graph of facts. Given our simple graph (luke is-a male, london is-in uk, luke
lives-in london), you can write a query like:

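# ex: is a placeholder namespace; "a" is SPARQL shorthand for rdf:type (is-a)
PREFIX ex: <http://example.org/>
SELECT ?who
WHERE {
  ?who a ex:Male .
  ?who ex:livesIn ?city .
  ?city ex:isIn ex:uk .
}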

Who are the males who live in cities in the UK? That's pretty standard graph
querying stuff. If you've ever done logic programming, you might recognise the
idea. To bring it home in practical terms, think of queries like: what are all
the source jars for libraries that are Java 7 compatible?
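The query above is what SPARQL calls a basic graph pattern: every triple
pattern must match, and a variable shared between patterns (`?who`, `?city`)
must take the same value in each, which acts as a join. The following Python
sketch evaluates such patterns naively over a set of triples. It illustrates
the idea under simplifying assumptions (plain string terms, `?`-prefixed
variables, no prefixes, filters or optimisation) and is not how a real SPARQL
engine is implemented; the jar and Java names in the second query are
hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one fact, extending the current binding.
    Returns None if the fact contradicts the pattern or an earlier binding."""
    result = dict(binding)
    for want, got in zip(pattern, triple):
        if is_var(want):
            if want in result and result[want] != got:
                return None  # variable already bound to a different value
            result[want] = got
        elif want != got:
            return None  # constant term does not match
    return result


def query(graph: set[Triple], patterns: list[Triple], select: list[str]) -> list[tuple[str, ...]]:
    """Evaluate a basic graph pattern: every pattern must match and shared
    variables must agree, which joins the patterns together."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        next_bindings: list[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    # Project the selected variables, dropping duplicate rows (like SELECT DISTINCT).
    return sorted({tuple(b[v] for v in select) for b in bindings})


graph = {
    ("luke", "is-a", "male"),
    ("london", "is-in", "uk"),
    ("luke", "lives-in", "london"),
}
print(query(graph,
            [("?who", "is-a", "male"), ("?who", "lives-in", "?city"), ("?city", "is-in", "uk")],
            ["?who"]))  # [('luke',)]

# Hypothetical dependency facts for the "Java 7 source jars" question.
deps = {
    ("lib-a.jar", "compatible-with", "java-7"),
    ("lib-a.jar", "has-sources", "lib-a-sources.jar"),
    ("lib-b.jar", "compatible-with", "java-5"),
    ("lib-b.jar", "has-sources", "lib-b-sources.jar"),
}
print(query(deps,
            [("?lib", "compatible-with", "java-7"), ("?lib", "has-sources", "?src")],
            ["?src"]))  # [('lib-a-sources.jar',)]
```

The same generic `query` function answers both questions; only the facts and
the patterns differ.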

There is a very strong emphasis on modelling within RDF/SemWeb, but also an
acceptance that no model is completely correct/perfect/universal. You can
merge/adapt models without having to transform one schema into another with
code. You just need to make connections between the two models by linking
their entities. This reduces the pressure to model everything you'll need, or
to get it exactly right, up front.

One aspect that is still unclear to me is the role that reasoning could play.
Going beyond RDF, you can use richer modelling languages to describe
higher-order concepts. As an example, we could model dependency “compatibility”
in such a way that a reasoning tool could work out a compatible set of
dependencies from a graph. The appeal here is that extending the capability of
the system (e.g. adding variants, architectures etc.) would be a matter of
extending the model, not of writing more algorithms to interpret the data
graph. I haven't gone too far down this road yet as I'm just getting started,
but it's pretty easy to see the potential. The potential is especially
staggering when you consider the enterprise component graph and what kind of
facts you could reason out.
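To give a feel for “reasoning out” facts, here is a toy forward-chaining loop
in Python: rules derive new triples from existing ones until nothing new
appears. This is a deliberately simple stand-in with hand-written Python rules,
not OWL or any real reasoner, and the library names, platform names and the
`targets`/`backward-compatible-with`/`runs-on` predicates are all hypothetical.

```python
from typing import Callable

Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], set[Triple]]


def transitive(pred: str) -> Rule:
    """Rule factory: (a pred b) and (b pred c) imply (a pred c)."""
    def rule(g: set[Triple]) -> set[Triple]:
        return {(a, pred, c)
                for (a, p1, b) in g if p1 == pred
                for (b2, p2, c) in g if p2 == pred and b2 == b}
    return rule


def runs_on(g: set[Triple]) -> set[Triple]:
    """Rule: a library runs on the platform it targets, and on any platform
    that is backward-compatible with that target."""
    inferred = {(lib, "runs-on", j) for (lib, p, j) in g if p == "targets"}
    inferred |= {(lib, "runs-on", r)
                 for (lib, p1, j) in g if p1 == "targets"
                 for (r, p2, j2) in g if p2 == "backward-compatible-with" and j2 == j}
    return inferred


def infer(facts: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Naive forward chaining: apply every rule until no new facts appear."""
    closure = set(facts)
    while True:
        new = set().union(*(rule(closure) for rule in rules)) - closure
        if not new:
            return closure
        closure |= new


# Hypothetical facts: library and platform names are illustrative only.
facts = {
    ("lib-a", "targets", "java-6"),
    ("lib-b", "targets", "java-8"),
    ("java-7", "backward-compatible-with", "java-6"),
    ("java-8", "backward-compatible-with", "java-7"),
}
closure = infer(facts, [transitive("backward-compatible-with"), runs_on])
on_java_8 = sorted(s for (s, p, o) in closure if p == "runs-on" and o == "java-8")
print(on_java_8)  # ['lib-a', 'lib-b']
```

Nobody stated that `lib-a` runs on Java 8; it follows from the rules. Supporting
something new, such as architectures, would mean adding facts and another rule
to the list, while `infer` stays the same. A real system would more likely lean
on an OWL or rule-language reasoner than on hand-written rules like these.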

I've got much more reading/study to do on this topic. I'm convinced it's in our
best interests to pursue it, though I don't expect everyone else to be
convinced at this point. It's worth noting that this is not an experimental or
academic technology. It is in serious use in some very large-scale,
sophisticated systems.

I'm mentioning all this now to try to pique your interest and encourage you to
take any opportunity that arises in the near future to learn more about this
stuff.

-- 
Luke Daley
Principal Engineer, Gradleware 
http://gradleware.com


---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email

