Re: TDB Literal Canonicalization

Andy Seaborne Tue, 16 Aug 2011 09:57:46 -0700


On 14/08/11 22:05, Ian Emmons wrote:

Andy,

Sorry about the attachments.  I'm not sure why they were eaten.  I've
pasted the two files into the email body below, along with the
output.

I'm afraid that as soon as I retried my test program (with a couple
of minor changes) in light of your advice, I was unable to duplicate
the behavior that I thought I had observed.  Rather, I found
different, but still puzzling behavior.  I suspect I simply made a
mistake previously.  Here is a quick summary of my experiment:

* I am comparing a numeric literal in a query to an integer literal
in a model.

* The variables are:
  - Memory model versus TDB model
  - Comparison within a filter versus in the triple pattern itself
  - Integer versus decimal
  - Canonical versus non-canonical lexical form

* Complete results can be seen below, but the unexpected result is
this:  When the literal in the query is in the triple pattern and is
type decimal, then a memory model produces a positive match, but a
TDB model does not.

* I am using TDB 0.8.10 (and the Jena and ARQ that come with it).

Is this what you expect?


Yes, it is what I expect with TDB currently.

Jena in-memory does comparisons by value and keeps terms separate;
TDB comparisons in patterns are done by comparing the NodeIds.

TDB canonicalizes integers and decimals but keeps them separate, so they
are different NodeIds.
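
To make the difference concrete, here is a minimal Python sketch of the two comparison styles. It is a toy model, not TDB's actual NodeId encoding: the `NodeTable` class, the `canonical_lexical` helper, and the canonicalization rules inside it are assumptions made up for illustration. It only shows why canonicalizing within each datatype still leaves an integer 47 and a decimal 47.0 with different ids, even though they are equal by value.

```python
from decimal import Decimal

XSD_INTEGER = "xsd:integer"
XSD_DECIMAL = "xsd:decimal"


def canonical_lexical(lexical, datatype):
    """Canonicalize a lexical form within its own datatype (toy rules)."""
    value = Decimal(lexical)
    if datatype == XSD_INTEGER:
        # Drops leading zeros and an explicit plus sign.
        return str(int(value))
    # Decimal: strip trailing zeros but keep at least one fractional digit.
    text = format(value.normalize(), "f")
    return text if "." in text else text + ".0"


class NodeTable:
    """Hypothetical node table: one id per (canonical lexical form, datatype)."""

    def __init__(self):
        self._ids = {}

    def node_id(self, lexical, datatype):
        key = (canonical_lexical(lexical, datatype), datatype)
        # A new key receives the next unused id; a known key keeps its id.
        return self._ids.setdefault(key, len(self._ids))


table = NodeTable()
int_id = table.node_id("47", XSD_INTEGER)
int_id_noncanonical = table.node_id("047", XSD_INTEGER)
dec_id = table.node_id("47.0", XSD_DECIMAL)
dec_id_noncanonical = table.node_id("47.00", XSD_DECIMAL)

# Pattern matching by id: same datatype matches, different datatype does not.
same_type_match = int_id == int_id_noncanonical and dec_id == dec_id_noncanonical
cross_type_match = int_id == dec_id

# Comparison by value, as a filter or an in-memory model would do it.
value_match = Decimal("47") == Decimal("47.0")
```

In this sketch `same_type_match` and `value_match` come out true while `cross_type_match` comes out false, which mirrors the behaviour reported above for a decimal literal in a triple pattern.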


Is

:x :p 47 .
:x :p 47.0 .

one triple or two?

For TDB, it could keep values only and get the comparison you expected
(not unreasonably), but to keep access efficient it would have to do so
by keeping one triple for the example. Probably, I'd keep integer values
as integers even if they are decimals in the data:


"47.0"^^xsd:decimal input would be "47"^^xsd:integer output.
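
A rough Python sketch of that values-only idea follows. It is an illustration under stated assumptions, not how TDB is implemented: the `canonicalize` function and the tuple representation of literals and triples are hypothetical. Decimals whose value is integral are rewritten as integers, so the two example triples collapse into one stored triple.

```python
from decimal import Decimal

XSD_INTEGER = "xsd:integer"
XSD_DECIMAL = "xsd:decimal"


def canonicalize(lexical, datatype):
    """Map an integer or decimal literal to one canonical (lexical, datatype) pair."""
    if datatype not in (XSD_INTEGER, XSD_DECIMAL):
        # Other datatypes are left untouched in this sketch.
        return (lexical, datatype)
    value = Decimal(lexical)
    if value == value.to_integral_value():
        # Integral values are always stored as integers.
        return (str(int(value)), XSD_INTEGER)
    # Non-integral decimals keep their type, with trailing zeros removed.
    return (format(value.normalize(), "f"), XSD_DECIMAL)


# The two triples from the example, with literals as (lexical, datatype) pairs.
data = [
    (":x", ":p", ("47", XSD_INTEGER)),
    (":x", ":p", ("47.0", XSD_DECIMAL)),
]

# A set of canonicalized triples stands in for the store.
stored = {(s, p, canonicalize(*o)) for s, p, o in data}
triple_count = len(stored)
```

With this rule the answer to "one triple or two?" is one: both inputs are stored as `"47"^^xsd:integer`, at the cost of no longer being able to tell which lexical form and datatype the data originally used.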

        Andy
