Hi all,

I'm back on this bandwagon. While I was away, I was reading and researching this topic. I'm more convinced than ever that our future lies in this direction. I've got more learning to do, but I wanted to share some interesting things at this point.

At the heart of the stack of practices and technologies collectively called the Semantic Web is RDF, the Resource Description Framework (you can substitute “resource” with “it” or “thing”). RDF can be thought of as an initiative similar to XML in some ways. Unlike XML, RDF is not a serialisation format. RDF is a formal system for stating facts about things, and its structure is fundamentally a graph. This has serious implications when compared to a hierarchical model (e.g. XML) or a relational one.

Also built into the concept is what is known as the AAA principle: Anyone can say Anything about Anything. Put more practically, built into the system is the idea of enriching a graph with new facts/connections by aggregating graphs. There are many sources of RDF data and many ways to embed RDF in common serialisation formats (e.g. XML, JSON), and even ways to convert relational information into RDF on the fly. Once you have data in RDF (however you got it), it becomes trivial to aggregate the data and use the enriched model.

The key thing here is that built into the system is the idea that there are always more things to learn about something. More facts can be discovered over time. Said a different way, the distributed nature of the data is embraced. I am convinced that this is the toolset and mindset with which we should be modelling our domain. We all know the power of graph structures, and this idea of collecting facts about things resonates very strongly with the direction that we are heading in with our dependency management model.

There are other aspects as well. There are specialised data stores known as triple stores. RDF is based on the concept of triple statements: «subject» «predicate» «object» (e.g. luke is-a male, london is-in uk, luke lives-in london). Triple stores can store huge numbers of facts, which can then be queried. There are many interesting things about triple stores, but one of the most pertinent for us is that they are effectively schema-less.
If we are to be collecting facts about things that we don't explicitly model (and I can guarantee we will be) then this is critical.

There are different querying options, but the emerging standard is SPARQL. Think SQL, but for graph data. SPARQL can be used to select paths through the graph of facts. Given our simple graph (luke is-a male, london is-in uk, luke lives-in london), you can query like this (prefix declarations omitted):

    SELECT ?who WHERE {
      ?who :is-a :male .
      ?who :lives-in ?city .
      ?city :is-in :uk .
    }

In other words: who are the males who live in cities in the uk? That's pretty standard graph querying stuff. If you've ever done logic programming, you might recognise the idea. To bring it home in practical terms, think of queries like: what are all the source jars for libraries that are Java 7 compatible?

There is a very strong emphasis on modelling within RDF/SemWeb, but also an acceptance that no model is completely correct, perfect or universal. You can merge/adapt models without having to transform one schema into another with code. You just need to make connections between the two models, linking the entities. This reduces the pressure to model everything you'll need, or to get it exactly right up front.

One aspect that is still unclear to me is the role that reasoning could play. Going beyond RDF, you can start to use richer modelling languages to describe higher-order concepts. As an example, we could model dependency “compatibility” in such a way that a reasoning tool could work out a compatible set of dependencies from a graph. The appeal here is that extending the capability of the system (e.g. adding variants, architectures etc.) would be a matter of extending the model, not writing more algorithms to interpret the data graph. I haven't gone too far down this road yet as I'm just getting started, but it's pretty easy to see the potential. The potential is especially staggering when you consider the enterprise component graph and what kind of facts you could reason out.

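To make the triple, aggregation and query ideas above a bit more concrete, here is a minimal sketch in plain Python. It is a toy, not a real RDF library or SPARQL engine: facts are plain string tuples, aggregating two graphs is just a set union, and the query is a naive pattern match. The names `match`, `query`, `graph_a` and `graph_b` are made up for illustration, and the split of facts across two "sources" is an assumption of mine.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> matched value


def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `fact`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match exactly.
            return None
    return extended


def query(graph: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a naive join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for fact in graph:
                extended = match(pattern, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Two independent sources of facts (hypothetical split, for illustration).
graph_a: Set[Triple] = {("luke", "is-a", "male"), ("luke", "lives-in", "london")}
graph_b: Set[Triple] = {("london", "is-in", "uk")}

# Aggregating graphs is a set union; no schema migration is involved.
merged = graph_a | graph_b

# "Who are the males who live in cities in the uk?"
patterns: List[Triple] = [
    ("?who", "is-a", "male"),
    ("?who", "lives-in", "?city"),
    ("?city", "is-in", "uk"),
]

who = sorted({b["?who"] for b in query(merged, patterns)})
print(who)
```

Note that the same query against `graph_a` alone finds nothing, because the fact linking london to the uk only arrives with the second graph. That is the "enrich by aggregating" idea in miniature. A real triple store would index the facts rather than scan them, but the shape of the problem is the same.
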
I've got much more reading and study to do on this topic. I'm convinced it's in our best interests to pursue, though I don't expect everyone else to be convinced at this point. It's worth noting that this is not an experimental, academic technology. It is in serious use in some very large-scale, sophisticated systems. I'm mentioning all this now to try to pique your interest and encourage you to take any opportunity that might arise in the near future to learn more about this stuff.

--
Luke Daley
Principal Engineer, Gradleware
http://gradleware.com
