RE: Representing NULL in RDF

Andy Turner Tue, 04 Jun 2013 03:26:47 -0700

Hi,

You may not know a person's date/time of birth, but if they were born, this can 
be a property of their linked data. If you know a person's age to a given level 
of accuracy at a specific time, then you can derive bounds for their date/time 
of birth and provide a likelihood value for that date/time of birth. Similarly, 
once people die they have a date/time of death.
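
As a rough illustration of deriving those bounds, here is a minimal Python 
sketch. It assumes the age is known in whole years on a given observation date; 
the function name birth_bounds is made up for this example, and it does not 
attempt the likelihood value.

```python
from datetime import date, timedelta
from typing import Tuple


def birth_bounds(age_years: int, observed_on: date) -> Tuple[date, date]:
    """Return (earliest, latest) possible birth dates, both inclusive,
    for someone who was age_years old (whole years) on observed_on."""

    def shift_years(d: date, years: int) -> date:
        # Move back by whole years; 29 Feb falls back to 28 Feb when the
        # target year is not a leap year.
        try:
            return d.replace(year=d.year - years)
        except ValueError:
            return d.replace(year=d.year - years, day=28)

    # Born on this day: the person turns age_years exactly on observed_on.
    latest = shift_years(observed_on, age_years)
    # Born one day earlier than this and they would already be age_years + 1.
    earliest = shift_years(observed_on, age_years + 1) + timedelta(days=1)
    return earliest, latest


bounds = birth_bounds(30, date(2013, 6, 4))
print(bounds)
```

A coarser age (for example "in their thirties") would simply widen the interval 
by calling the function for the lowest and highest age and taking the outer 
bounds.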

There is a way of querying linked data to select/filter all living people that 
may be in a specific age range at a specific time. This can be done either using 
explicit null values or with the implicit null of no property being specified.
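
A small Python sketch of that kind of filter, treating an explicit null (None) 
and a missing property the same way. The tuples stand in for triples, and the 
predicate names birthDate and deathDate, like the people, are invented for the 
example.

```python
from datetime import date

# Hypothetical data: (subject, predicate, value) tuples standing in for triples.
people = [
    ("alice", "birthDate", date(1980, 5, 1)),
    ("bob", "birthDate", date(1950, 1, 1)),
    ("bob", "deathDate", date(2010, 3, 2)),
    ("carol", "birthDate", date(1990, 7, 9)),   # no deathDate at all (implicit null)
    ("erin", "birthDate", None),                # birth date unknown (explicit null)
    ("frank", "birthDate", date(1985, 1, 1)),
    ("frank", "deathDate", None),               # explicit null: not known to be dead
]


def living_in_age_range(triples, min_age, max_age, on):
    """Subjects not known to be dead on `on` whose age is within the range."""
    dead = {s for s, p, v in triples
            if p == "deathDate" and v is not None and v <= on}
    selected = []
    for s, p, born in triples:
        if p != "birthDate" or born is None or s in dead:
            continue
        # Whole years elapsed, minus one if the birthday has not yet occurred.
        age = on.year - born.year - ((on.month, on.day) < (born.month, born.day))
        if min_age <= age <= max_age:
            selected.append(s)
    return sorted(selected)


print(living_in_age_range(people, 20, 30, date(2013, 6, 4)))
```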

I think that for entities which are defined as having specific attributes, it 
is better to have null values in the RDF/XML when these are unknown, as I think 
that makes the computation easier.

For triple store RDF, I think Hugh is right.

Of course we have to deal with uncertainty and mess in data about people, as we 
often do not know things accurately or for certain! Sometimes we might discover 
conflicts that increase our uncertainty about specific attributes.

Regards,

Andy

________________________________________
From: Hugh Glaser [h...@ecs.soton.ac.uk]
Sent: 04 June 2013 10:35
To: Jan Michelfeit
Cc: <public-lod@w3.org>
Subject: Re: Representing NULL in RDF

If there is a "*standard or generally accepted*" way of doing things, then, as 
has been pointed out, it is to ignore the NULL.
Or rather, the norm is that NULL (and "unknown" and anything else like it; I'll 
use NULL as shorthand) is ignored and doesn't generate a triple.
In fact it is really important to do so, as NULL most often simply represents 
that the value is not known, in my experience.
Making a triple in such situations is one of the RDF101 basic mistakes, as I'm 
sure you know, since it causes all sorts of sensible queries to do very strange 
things.
For example, if the field is a person's age, then it would mean that a simple 
query asking for people of the same age as someone of unknown age would give 
you all the other people whose ages were not known.
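
To make that concrete, here is a minimal Python sketch in which tuples stand in 
for triples and None stands in for a NULL that has been written out as a value. 
The names and the "age" predicate are invented for the illustration.

```python
# Hypothetical data: the same people, with and without NULL "triples".
with_null_triples = [
    ("alice", "age", 34),
    ("bob", "age", None),    # NULL materialised as a value
    ("carol", "age", None),  # NULL materialised as a value
    ("dave", "age", 34),
]
without_null_triples = [t for t in with_null_triples if t[2] is not None]


def same_age_as(triples, person):
    """Everyone else whose "age" value equals that of `person`."""
    ages = {s: v for s, p, v in triples if p == "age"}
    if person not in ages:
        return []  # nothing asserted about this person, so nothing matches
    return sorted(s for s, v in ages.items()
                  if s != person and v == ages[person])


print(same_age_as(with_null_triples, "bob"))     # carol is wrongly "the same age"
print(same_age_as(without_null_triples, "bob"))  # no spurious matches
print(same_age_as(without_null_triples, "alice"))
```

With the NULL values left out, the query for someone of unknown age simply 
returns nothing, while the query for someone of known age is unaffected.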

If you are in a generic world where you cannot bring any extra information to 
the table, then this is all you can do.

Beyond that, I think that you have to ask exactly what is meant (as you do) and 
then model it.
Basically, is there something that is being said by the NULL, and if so, how 
should that be captured in RDF?
So your
> 4. The value is withheld, e.g., when the data consumer is not allowed to 
> access it.

should be a "visibility" or "privacy" triple.
I think this may be what you are doing in (3) below, but I have some concerns 
about the way you do it there.
Similarly for others such as
> 2. The value is unknown, i.e., it should be there but we don't know it.
which is where you ask the question of whether you want to represent that 
someone's age is actually missing, with a triple.

You need to ask what the new property should be attached to.
It is an important question whether it should be "part of" the value itself.
So, for a "visibility" triple, it may be more that the subject of the row is 
having the property withheld than that the value is a nonVisibleValue.
It is the person's foaf:givenName that is not being recorded, not some property 
of a field from a DB.
There are patterns in various domains that try to tackle these sorts of 
problems - in programming languages it is similar to the problem of returning 
an exception instead of a value, and things like Union types can get used.
But remember that you want things to be easy to query for the most basic 
question, and it is likely that you want to simply have a triple that says
:foo foaf:givenName "Jan"
which is what a user expects.
That then allows
SELECT ?name WHERE { :foo foaf:givenName ?name }
In fact, if you have things like your :nullableValue construct, then you can't 
use predicates such as foaf:givenName at all, since the domain/range 
constraints are bad (I think).
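
A sketch of that shape in Python, with tuples standing in for triples. The 
predicate name withheldProperty is purely hypothetical, made up here to show 
the extra information hanging off the subject rather than off the value.

```python
# Hypothetical data: plain value triples, plus a separate statement about
# the subject whose given name is being withheld.
graph = [
    ("foo", "givenName", "Jan"),
    ("bar", "withheldProperty", "givenName"),  # assumed predicate name
]


def given_names(triples):
    """The basic question stays a one-pattern lookup."""
    return sorted(v for s, p, v in triples if p == "givenName")


def subjects_withholding(triples, prop):
    """Who has `prop` withheld? A separate, equally simple lookup."""
    return sorted(s for s, p, v in triples
                  if p == "withheldProperty" and v == prop)


print(given_names(graph))
print(subjects_withholding(graph, "givenName"))
```

The point of the design is that the ordinary name query never has to know that 
withholding exists; only consumers who care about it ask the second question.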

Of course you may well find that there is another field in the DB that actually 
has the information already, and is being transformed into RDF as well, in 
which case the NULL field can simply be discarded.

I think for these two I would just leave them without a triple:
> 1. The value is not applicable, i.e. property p does not exist or does not 
> make sense in the context.
> 3. The value doesn't exist, i.e. the property doesn't have a value (e.g. year 
> of death for a person alive).

I don't think I would go off into RDFS and OWL specifically to capture things - 
it is likely that the DB is simply modelling things in an unclear way, and the 
challenge of transforming to RDF is to work out what the fuzziness was and 
shine a light on it.
Remember that the purpose of the whole exercise is to construct some RDF that 
is easy to query - or at least I hope that is the purpose!
So not having triples for things that don't have values is good.
And having triples that give more information about things is also good, as 
they are very easy to query.
In fact, using RDFS and OWL for what is likely to be simple stuff from a DB is 
only likely to provide checking at assertion time, and not make querying any 
easier - and since you are transforming from a DB, it is likely that the data 
you are transforming is well-formed.

Finally, I know this generates controversy, but I would always avoid bnodes if 
it is possible/sensible to do - generating a URI is not hard, and can be useful 
in the long run. In your example, you could just as easily say "Use a node to 
give more details about the questioned value."
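
For what it is worth, a minimal Python sketch of minting a stable URI for such 
a node instead of using a bnode. The base URI, the path layout, and the 
function name mint_value_uri are all assumptions for the example, not an 
established convention.

```python
from urllib.parse import quote


def mint_value_uri(base, subject_id, prop):
    """Build a deterministic URI for the node describing one property value.

    The layout <base>/<subject>/<property>/value is only an illustrative
    choice; any scheme works as long as it is stable and unique.
    """
    return "{}/{}/{}/value".format(
        base.rstrip("/"),
        quote(subject_id, safe=""),  # percent-encode anything unsafe in a path
        quote(prop, safe=""),
    )


# http://example.org/data/ is a placeholder base, not a real dataset.
uri = mint_value_uri("http://example.org/data/", "foo", "bProp")
print(uri)
```

Because the URI is derived from the row key and the property, re-running the 
transformation gives the same node each time, which a fresh bnode would not.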

Sorry, I've gone on a bit, but I just went with the flow!

Best
Hugh

On 3 Jun 2013, at 22:39, Jan Michelfeit <michelfeit....@gmail.com>
 wrote:

> Hi,
> thank you all for your answers.
>
>> ... One "represents" a null by failing to include the relationship
>> ... RDF semantics make no assumptions about what the absence of a 
>> proposition/statement means
>
> I agree. The question was actually about *distinguishing* between the 
> mentioned cases.
>
> From your suggestions and a quite comprehensive answer at SO [1], I see 
> these solutions:
>
> (1) Use ontology to specify proper constraints. This may be cardinality of 
> the questioned property or, as suggested by Phillip, assertion "that anything 
> with a year of death is necessarily a dead person".
>
> (2) Use an RDF container and possibly rdf:nil (thanks to Barry, and to Robert 
> for his example).
>
> (3) Use a blank node to give more details about the questioned value. Example 
> [2]:
>   :foo :aProp [a :nullableValue; rdf:value "value"] ;
>        :bProp [a :nullableValue; :reason :notAvailable ]
>
> Regards,
> Jan
>
> [1] http://stackoverflow.com/a/16889273/2032064
> [2] http://stackoverflow.com/a/16898786/2032064
>
