Re: Subjects as Literals

Pat Hayes Thu, 08 Jul 2010 10:08:58 -0700


On Jul 6, 2010, at 9:51 PM, Sampo Syreeni wrote:

On 2010-07-05, Pat Hayes wrote:
This objection strikes me as completely wrong-headed. Of course literals are machine processable.
What precisely does "Sampo" as a plain literal mean to a computer? Do give me the fullest semantics you can.

In RDF, it means the five-character string ess-ay-em-pee-oh, in that order. It does not mean anything else. This meaning is fixed by the RDF specification documents themselves. BTW, these are Unicode characters, so consult the Unicode documentation for more detail on what exactly is meant by a "character" (it is surprisingly complicated, and makes fascinating reading.)

As in, is it the Finnish Sampo as in me, my neighbour, or what would be roughly translated as "cornucopia" in some languages?

As you did not specify any language tag, the characters are presumed to be in the English ("Latin") alphabet. Technically, the characters are all in Unicode plane 0.
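To make the point about characters and planes concrete, here is a minimal sketch using only the Python standard library. The helper name describe is hypothetical and not part of any RDF toolkit; it simply lists each character of the string with its code point, Unicode name, and plane number.

```python
import unicodedata

def describe(literal: str):
    """Return (character, code point, Unicode name, plane) for each character.

    The plane is the code point divided by 0x10000, so plane 0 is the
    Basic Multilingual Plane.
    """
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch), ord(ch) >> 16)
        for ch in literal
    ]

rows = describe("Sampo")
for row in rows:
    print(row)
```

Nothing here says who or what "Sampo" refers to; the sketch only inspects the string itself, which is all the plain literal denotes.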

You could of course just answer that it's just a literal, but then you'd be telling precisely the same thing I did: that sort of thing has only axiomatic semantics, lacking the real world denotation which is needed if we want to actually apply this stuff to something tangible.

Not at all. Character strings may not be 'tangible', but they are real things in the world. Being tangible isn't a necessary condition for being real. The world comprises many things, probably more kinds of thing than any of us are capable of imagining at any given moment (the 'horatio principle': it is a mistake to want to exclude things from the universe of everyone else's discourse, or to presume that one's own ontological biases are going to be universally shared by others.)

So what is it? As opposed to me as an OID (I don't think the URI namespace registration went through yet): 1.3.6.1.4.1.12798.1.2049 ? I mean, if your semweb killer app ordered that, the user should mostly receive a no-thanks for hairy mail prostitution. If they ordered the third kind of Sampo -- they should probably receive hard psychedelics instead. (And yes, I know this is rather concrete bound. I think it should be, too.)
Well, nobody is suggesting allowing literals as predicates [...]
Why? Is there a lesson to be learnt there?

Only that the world in general probably isn't ready yet for this kind of generalized logic. It is being used by specialists and those who really need it, like the security agencies (who have been using it for several years now).

But it is easy to give 'ridiculous' examples for any syntactic possibility. I can write apparent nonsense using nothing but URIs, but this is not an argument for disallowing URIs in RDF.
In fact it could be. Whatever format you accept, you should be liberal with, but at the same time you should always have an unambiguous, safe, productive and well-documented interpretation for it all.
This is WRONG. The type specifiers *completely* disambiguate the text in the body of the literal.
A language signifier tacked onto a plain literal doesn't, as I just showed.


Actually it does. The literal denotes the string, no more and no less.

An integer annotation on a number just says it's a number

And that ends the matter, right there. A number is a real thing in the world, it is the denotation of a numeral. It doesn't "carry" anything else. If you want to talk about numbers of zlotys, or numbers of centimeters, then you need ontologies of zlotys and centimeters (or, perhaps, new datatypes for these things.)

, not what unit it perhaps carries; those are two completely different kinds of numbers, carrying different operational semantics.

No, they are not different kinds of *numbers*. There is only one kind of number, AKA the natural numbers (I'm ignoring reals, rationals, and complex numbers.)
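A rough sketch of the distinction, in Python rather than RDF: the number is one thing, and the unit lives in a separate structure around it. The Quantity class and the unit names are illustrative assumptions standing in for whatever ontology of zlotys and centimeters one would actually use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """Hypothetical stand-in for an ontology term pairing a number with a unit."""
    value: int   # the number itself, as denoted by a numeral such as "10"
    unit: str    # the unit is extra information, not part of the number

price = Quantity(10, "zloty")
length = Quantity(10, "centimeter")

# The two quantities differ, but the number inside each is the same number.
print(price == length)              # compares value and unit together
print(price.value == length.value)  # compares only the numbers
```

The design choice is the one argued above: a typed integer literal only has to denote ten, and anything about units is stated separately.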

With literals, typing has come up but it hasn't been fully integrated with the rest of the RDF grammar; you can still say things like 'ten(integer) much-likes "Sampo"@fi' without any usual type system catching the error.

Literal types don't check 'errors' in RDF. (Though this one ought to be caught by any RDF parser, in fact.) This is a complicated issue in the design of RDF, one which absorbed a great deal of the WG's time. It's probably not relevant to go into this here; it has to do with keeping RDF monotonic. I can wax lyrical on this if you really want me to.

I'd say that's pretty far from well defined semantics. Even in the simplest, axiomatic sense. The literal is then the primary culprit -- otherwise you and others have done a swell job in tightening it up.
For plain literals, the meaning of the literal is the string itself, a unique string of characters.
That I know too.


Well then, isn't that unambiguous enough for you?

With Schema derived or otherwise strictly derived types, the level of disambiguation can be the same as or even better than with URI's, true. But then that goes the other way around, too: URI's could take the place of any such precise type.
No, they cannot. For numbers, for example, one would need infinitely many URIs; but in any case, why bother creating all these URIs?
There are just as many URI's in abstract as there are integers. Just take oid:integer:1 and go right past oid:integer:<googol> if necessary. Certainly even today the practical maximum GET strings over even HTTP go right up to thousands of digits of potential numerical capacity, quite without the need to compress further.
In theory, it can be argued that we can think about only so many discrete concepts. As long as they are discrete, they can be enumerated, and as long as the number stays finite, we could just give all of them separate numbers. Then just tack them onto a very big namespace prefix, like my number above. Theoretically it's easy; in practice you'd like the kind of hierarchical namespace that URI's and OID's buy you. But still, naming something like 10^100 discrete objects would still be easy.

Of course, but then you are presuming that your URI scheme obeys the rules of a datatyped literal, but they don't. If I see the URI sampo:thingie.567, who tells me that I should apply the decimal rules for figuring out that this means five hundred and sixty seven? And even if you can put some weird PHP script at the end of sampo:thingie which can autogenerate some (what? OWL? HTML? RDFa?) which 'tells' me what that number means, that doesn't help me when I see sampo:thingie.568. Not to mention the issue of why should I use YOUR URI-numerals? What if someone else wants to take over the natural numbers, and they have a faster server? So we need aleph-0 sameAs links between sampo:thingie.<numeral> and someotherguy:betternumber.<numeral> ? This is completely absurd, worse than email spam, to choke up the Web with HTTP requests for disambiguating decimal numerals.
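The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry L2V, and the functions literal_value and uri_value, are hypothetical names, and Python's int stands in for the lexical-to-value mapping that the xsd:integer datatype defines. The URI side deliberately returns nothing, since no rule tells a reader to parse the spelling of an opaque name.

```python
from typing import Callable, Dict, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical registry: datatype IRI -> lexical-to-value mapping.
# Python's int() is used here as an approximation of the xsd:integer mapping.
L2V: Dict[str, Callable[[str], int]] = {XSD_INTEGER: int}

def literal_value(lexical: str, datatype: str) -> int:
    """Value of a typed literal, fixed by the datatype with no network lookup."""
    return L2V[datatype](lexical)

def uri_value(uri: str) -> Optional[int]:
    """A URI is an opaque name: its spelling licenses no decimal parsing."""
    return None

a = literal_value("567", XSD_INTEGER)
b = literal_value("568", XSD_INTEGER)
print(a, b, b - a)
print(uri_value("sampo:thingie.567"))
```

Knowing the one datatype is enough to interpret every numeral, including ones never seen before, whereas each new URI would need its own explanation.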

And then !!!:
We have (universally understood) names for the numbers already, called numerals. For dates, times and so forth, there are many formats in use throughout human societies, of course. That is WHY the work of establishing datatype standards was done. To ignore all this, to reject a widely accepted standard, and advocate reversion to a home-made URI scheme seems to me to be blatantly irresponsible.
What I want is for more stuff to be standardized and their format shared. That is *squarely* my problem, here: RDF literals invite misuse. Perhaps if we banned plain literals, it would be better. But right now, few people type their literals well, and the typing mechanism even invites people to treat typed values as separate from the rest of the triple oriented data model. Which is extra work; which means your typical lazy nerd won't like it enough to implement it proper.

I have heard this argument many times, and I absolutely reject it. It is an argument against the Web, and ultimately an argument from arrogance. These lazy nerds can (and do) mistype URIs just as often as literal strings. But in fact, the world seems to manage. They - this great crowd of stupid people who can't be trusted to type a number correctly - regularly do things like order on-line and check their bank balances and charge things to their credit cards. I wonder how anyone can permit them to do this, it's such a *risk*.


Pat Hayes

Personally, I'd like to see data standardized as broadly as possible. I'd like to have broad datasets out there, with well defined semantics. That is pretty much why I then oppose literals within the semantic web: they encourage sloppy typing which can kill the whole deal. Especially if we start to allow them all-round.
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2


------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
