I think this is a very nice example, and it shows the power of seeing Atom as RDF. If we accept that the Atom syntax is an RDF syntax [0], then we can draw the two entries in the ASCII graph notation [1] as follows.


_f1 ---is a---> <Feed>
 |----head---->  ....
 |----entry---> _e1 --is a---> <Entry>
                 |----id-----> <http://123>
                 |---date----> "2005-02-02T13:05:04"^^xsd:dateTime
                 |---geo:x--->"10.1"
                 |---geo:y--->"57.3"

_f2 ---is a---> <Feed>
 |----head---->  ....
 |----entry---> _e2 --is a---> <Entry>
                 |----id-----> <http://123>
                 |---date----> "2005-02-03T07:05:04"^^xsd:dateTime
                 |---seismo:magnitude--->"7"


Let us now take the point of view of a simple machine confronted with these two entries. Should it merge them or keep them separate?


The machine that parses this could look up the Atom OWL document located at the Atom namespace URI, where it would find that the <atom:id> and <atom:date> relations are functional [2], so each can have only one value. Since _e1 and _e2 are related to different dates, and two different xsd:dateTime values cannot be the same value, it follows that _e1 and _e2 are not the same and so should not be merged.
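To make that reasoning concrete, here is a minimal Python sketch (not part of any Atom tool) that stores the two entries from the first diagram as triples and checks them against the functional properties. The property names and the assumption that atom:id and atom:date are declared functional are taken from the paragraph above; they are not read from a real ontology.

```python
from typing import List, Set, Tuple

# The two entries from the first diagram as (subject, predicate, object)
# triples. Blank nodes are plain strings here.
Triple = Tuple[str, str, str]

graph: List[Triple] = [
    ("_e1", "rdf:type", "Entry"),
    ("_e1", "atom:id", "http://123"),
    ("_e1", "atom:date", "2005-02-02T13:05:04"),
    ("_e1", "geo:x", "10.1"),
    ("_e1", "geo:y", "57.3"),
    ("_e2", "rdf:type", "Entry"),
    ("_e2", "atom:id", "http://123"),
    ("_e2", "atom:date", "2005-02-03T07:05:04"),
    ("_e2", "seismo:magnitude", "7"),
]

# Assumption: the Atom ontology declares these as owl:FunctionalProperty.
FUNCTIONAL = {"atom:id", "atom:date"}


def values(g: List[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in g if s == subject and p == predicate}


def conflicts(g: List[Triple], a: str, b: str, functional: Set[str]) -> List[str]:
    """Functional properties on which `a` and `b` have different values.

    If a and b were the same resource, a functional property would force
    those values to be equal; distinct literals cannot be, so any entry in
    the result means a and b must be different resources.
    """
    return [prop for prop in sorted(functional)
            if values(g, a, prop) and values(g, b, prop)
            and values(g, a, prop) != values(g, b, prop)]


clash = conflicts(graph, "_e1", "_e2", FUNCTIONAL)
print(clash)                                      # ['atom:date']
print("keep separate" if clash else "may merge")  # keep separate
```

Note that the absence of a clash only means the machine *may* merge; it is the clash on atom:date that actually licenses keeping the entries apart.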

Now the problem with the graph above is that it does not make sense to assign a geo location to an entry. This will probably not be allowed by the geo ontology, which will probably specify that the subject of such a relation must belong to some location class. One could imagine some weird monster called an Entry-geolocation, but it would have all kinds of weird properties. So let us throw that monster out of the window: even though it makes XML sense, it makes no other sense at all.


This is probably what was meant:

_f1 ---is a---> <Feed>
 |----head----> _h1
 |               |---author---> _a1 ---is a---> <Person>
 |                               |----email---> [EMAIL PROTECTED]
 |
 |----entry---> _e1 --is a---> <Entry>
                 |----id-----> <http://123>
                 |---date----> "2005-02-02T13:05:04"^^xsd:dateTime
                 |--content--> _p1 ---is a---> <foaf:Person>
                                |--foaf:home--> _loc1 ---is a---> <geo:Location>
                                |                |---geo:x---> "10.1"
                                |                |---geo:y---> "57.3"
                                |--foaf:mbox--> <mailto:[EMAIL PROTECTED]>



_f2 ---is a---> <Feed>
 |----head---->  ....
 |----entry---> _e2 --is a---> <Entry>
                 |----id-----> <http://example/123>
                 |---date----> "2005-03-02T07:05:04"^^xsd:dateTime
                 |--content--> _event1 ---is a---> <phy:Event>
                                |--phy:date--> "2005-02-02T13:05:04"^^xsd:dateTime
                                |--phy:loc---> _loc2 ---is a---> <geo:Location>
                                |               |---geo:x---> "10.1"
                                |               |---geo:y---> "57.3"
                                |--seismo:magnitude--> "7"



So here again the machine can see that it has two entries, since _e1 and _e2 have different dates and different ids. But the machine can go on its merry way and deduce some other interesting things. It can notice that the location of the home of person _p1 is the same as the location of _event1. It knows this because, by downloading the OWL ontology description of the geo ontology, it will find that the geo:x and geo:y properties are functional too, and that combined they are inverse functional [3]. So it knows that _loc1 and _loc2 are the same, and these can then be merged. As a result it knows that _event1 occurred at the location of the home of person _p1. It still does not know whether this occurrence was at a time during which _p1 had his home there. Perhaps another entry somewhere else will have that information. Perhaps somewhere else there would be foaf:knows information that relates the author of the feed _f1 to _p1, and this would again be very useful information.
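The location merge can be sketched in a few lines of Python. This is only an illustration under the assumption of footnote [3], that (geo:x, geo:y) taken together identify a location; the node names follow the second diagram, and nodes that share the same key values are rewritten to one canonical node (sometimes called "smushing").

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The relevant part of the second diagram, with _loc1 and _loc2 distinct.
graph: List[Triple] = [
    ("_p1", "foaf:home", "_loc1"),
    ("_loc1", "geo:x", "10.1"),
    ("_loc1", "geo:y", "57.3"),
    ("_event1", "phy:loc", "_loc2"),
    ("_loc2", "geo:x", "10.1"),
    ("_loc2", "geo:y", "57.3"),
    ("_event1", "seismo:magnitude", "7"),
]

# Assumption from footnote [3]: (geo:x, geo:y) together identify a location.
KEY = ("geo:x", "geo:y")


def smush(g: List[Triple], key: Tuple[str, ...]) -> Dict[str, str]:
    """Map each node that has exactly one value for every key property to a
    canonical node; nodes with identical key values share one canonical node."""
    first_with_key: Dict[Tuple[str, ...], str] = {}
    canon: Dict[str, str] = {}
    for s in sorted({s for s, _, _ in g}):
        vals = [sorted(o for s2, p, o in g if s2 == s and p == prop) for prop in key]
        if all(len(v) == 1 for v in vals):  # functional: one x, one y
            k = tuple(v[0] for v in vals)
            canon[s] = first_with_key.setdefault(k, s)
    return canon


def rewrite(g: List[Triple], canon: Dict[str, str]) -> List[Triple]:
    """Replace nodes by their canonical node; duplicate triples collapse."""
    def c(n: str) -> str:
        return canon.get(n, n)
    return sorted({(c(s), p, c(o)) for s, p, o in g})


canon = smush(graph, KEY)
merged = rewrite(graph, canon)
home = {o for s, p, o in merged if s == "_p1" and p == "foaf:home"}
event_loc = {o for s, p, o in merged if s == "_event1" and p == "phy:loc"}
print(canon)              # {'_loc1': '_loc1', '_loc2': '_loc1'}
print(home == event_loc)  # True
```

After the rewrite, _p1's home and _event1's location are literally the same node, which is the deduction described above. Nothing in the graph says *when* _p1 lived there, so the sketch cannot conclude that either.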



OK. So what can we deduce from the above?
- it is possible to have nonsensical but well-formed XML
- well-defined ontologies can help create extensions to Atom that make sense
- these ontologies then allow software that knows about them to make certain deductions that can be very valuable.



Henry Story



[0] http://www.imc.org/atom-syntax/mail-archive/msg11850.html
[1] one does not have to understand graph theory to understand this, just as one does not have to understand set theory to check that one has been given the correct change.
As a notational convention, variables such as _f1 and _f2 are blank nodes. <Entry>, <Feed> and <http://123> are URIs. ----date----> is a directed relation between the node on the left and the node on the right, whose URI is atom:date.
[2] functional in the same way that f(x)=y is a function: for every x there is only one mapped y. If f is the times-2 function, then f(7)=14. This is why the square root function is defined to return only the non-negative root: otherwise it would not be functional, since sqrt(4) would be both -2 and +2.
[3] I am not sure this can be expressed in OWL as it currently stands, but clearly it could be.
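The definition in footnote [2] can also be checked mechanically. A small Python sketch, treating a relation as a list of (x, y) pairs; the function name is just for illustration:

```python
from typing import Dict, Hashable, Iterable, Tuple


def is_functional(pairs: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """True if no x is paired with two different y values."""
    seen: Dict[Hashable, Hashable] = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


times_two = [(x, 2 * x) for x in range(-5, 6)]
# Relating each square back to *both* of its roots, e.g. (4, -2) and (4, 2).
both_roots = [(y * y, y) for y in range(-3, 4)]
# Keeping only the non-negative root, as sqrt does.
principal_root = [(y * y, y) for y in range(0, 4)]

print(is_functional(times_two))       # True
print(is_functional(both_roots))      # False
print(is_functional(principal_root))  # True
```

The same test applied to (entry, date) pairs is what tells the machine that one entry cannot have two dates.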


On 8 Jan 2005, at 13:00, David Powell wrote:



Saturday, January 8, 2005, 9:59:12 AM, you wrote:

Say your system is aggregating material from two sensors, and you get
the following, one from each:

<entry>
  <id>http://123</id>
  <date>2005-02-02</date>
  <geo:x>10.1</geo:x>
  <geo:y>57.3</geo:y>
</entry>

<entry>
  <id>http://123</id>
  <date>2005-02-03</date>
  <seismo:magnitude>7</seismo:magnitude>
</entry>

It isn't clear how these should be merged - does the entry with the
later date replace the earlier one? The (presumably) desired behaviour
is for the geo+seismo properties all to appear as elements under
(properties of) the entry. Mapping syntax to a model can help decide
what to do  *in the general case* rather than per-extension. As it
happens, RSS/RDF would only go part of the way with something like
this - you'd get the desired merged entry, but with two dates.

I think the reason that RDF is only helping partially here is that you have a simplified model of an entry that only models the state of an entry at a point in time, when actually entries are expected to vary over time and have multiple instances.

This is fine, it is a useful simplification; models are supposed to be
simplifications.

But you can't then expect to merge two different instances of the
entry under this model using simple RDF graph merging, because the
model is an over-simplification.

Eg: If you merged:

<entry>
  <id>http://123</id>
  <date>2005-02-03</date>
  <seismo:magnitude>7</seismo:magnitude>
</entry>

and

<entry>
  <id>http://123</id>
  <date>2005-02-04</date>
  <seismo:magnitude>7.5</seismo:magnitude>
</entry>

... you would get:

<entry>
  <id>http://123</id>
  <date>2005-02-03</date>
  <date>2005-02-04</date>
  <seismo:magnitude>7.5</seismo:magnitude>
  <seismo:magnitude>7</seismo:magnitude>
</entry>

Not very useful.
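The result above is easy to reproduce. The sketch below is a simplification and not how any real RDF toolkit maps Atom: it treats <id> as the subject and every other child element as a property, then merges by taking the union of the triples. The seismo namespace URI is a made-up placeholder, since the snippets above leave the prefix undeclared.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the original snippets leave seismo: undeclared.
SEISMO = "http://example.org/seismo#"

ENTRY_A = f"""<entry xmlns:seismo="{SEISMO}">
  <id>http://123</id>
  <date>2005-02-03</date>
  <seismo:magnitude>7</seismo:magnitude>
</entry>"""

ENTRY_B = f"""<entry xmlns:seismo="{SEISMO}">
  <id>http://123</id>
  <date>2005-02-04</date>
  <seismo:magnitude>7.5</seismo:magnitude>
</entry>"""


def to_triples(xml_text):
    """Use the entry's <id> as the subject and every other child element
    as a (predicate, literal) pair: the simplified, single-state model."""
    root = ET.fromstring(xml_text)
    subject = root.findtext("id")
    return {(subject, child.tag, child.text)
            for child in root if child.tag != "id"}


# Plain graph merge: the union of both triple sets.
merged = sorted(to_triples(ENTRY_A) | to_triples(ENTRY_B))
for triple in merged:
    print(triple)
```

It prints four triples for http://123: two dates and two magnitudes, which is the unhelpful merged entry shown above.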


A solution for RDF-based applications wanting to merge incoming RSS and get a consistent RDF model is to either have special application-level merging logic that knows about the simplification that has been made, or to have a model where instances of an entry are modelled explicitly. If the instances have ids, then you can allow for both posting new instances of entries, and annotating existing instances with extra information. It's a tradeoff between flexibility and complexity.
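The second option can be sketched as follows, purely as an illustration: each version of an entry gets its own instance id (the urn:x-instance: ids and the Instance/Store names are hypothetical), merging only ever adds instances, annotations attach to one specific instance, and the application picks the latest version when it needs a single current state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instance:
    """One published version of an entry (hypothetical model)."""
    instance_id: str   # identifies this version
    entry_id: str      # the atom:id shared by all versions
    date: str          # same-format ISO 8601 dates sort chronologically
    props: Tuple[Tuple[str, str], ...] = ()


@dataclass
class Store:
    instances: Dict[str, Instance] = field(default_factory=dict)
    notes: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def post(self, inst: Instance) -> None:
        # Merging is monotonic: a new version is added, never overwrites.
        self.instances.setdefault(inst.instance_id, inst)

    def annotate(self, instance_id: str, pred: str, value: str) -> None:
        # Extra information attached to one specific version.
        self.notes.setdefault(instance_id, []).append((pred, value))

    def versions(self, entry_id: str) -> List[Instance]:
        return sorted((i for i in self.instances.values()
                       if i.entry_id == entry_id), key=lambda i: i.date)

    def latest(self, entry_id: str) -> Instance:
        return self.versions(entry_id)[-1]


store = Store()
store.post(Instance("urn:x-instance:1", "http://123", "2005-02-03",
                    (("seismo:magnitude", "7"),)))
store.post(Instance("urn:x-instance:2", "http://123", "2005-02-04",
                    (("seismo:magnitude", "7.5"),)))
store.annotate("urn:x-instance:1", "geo:x", "10.1")

print(len(store.versions("http://123")))  # 2
print(store.latest("http://123").props)   # (('seismo:magnitude', '7.5'),)
```

Both versions survive the merge, so nothing is lost, but the "which one is current" decision is application logic layered on top, which is the complexity side of the tradeoff.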


-- Dave


