Re: How is UTF-8 handled in TDB

Andy Seaborne Thu, 23 Feb 2012 10:07:54 -0800

On 23/02/12 17:05, Tim Harsch wrote:

So I knew that TDB used an id in place of a string, except in the
case of inlined values.  Are you saying that non-inlined values use
an MD5 digest?  I did not know that.


To go from string to id, yes.  It's needed to look up query constants.
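
As a rough illustration of that string-to-id step, here is a minimal sketch in Python. It is not TDB's actual code or on-disk layout; the `NodeTable` class and its method names are made up for this example. It only shows the idea that a digest of the term's bytes serves as the lookup key, so a query constant can be resolved to an id without scanning stored strings.

```python
import hashlib


class NodeTable:
    """Toy string<->id table (hypothetical; not TDB's real implementation)."""

    def __init__(self):
        self._digest_to_id = {}  # MD5 digest of the term -> id
        self._id_to_term = []    # id -> original term string

    @staticmethod
    def _digest(term: str) -> bytes:
        # Hash the UTF-8 bytes exactly as given; no normalization is applied.
        return hashlib.md5(term.encode("utf-8")).digest()

    def get_or_allocate(self, term: str) -> int:
        """Return the id for a term, allocating a new one on first sight."""
        key = self._digest(term)
        node_id = self._digest_to_id.get(key)
        if node_id is None:
            node_id = len(self._id_to_term)
            self._digest_to_id[key] = node_id
            self._id_to_term.append(term)
        return node_id

    def lookup(self, term: str):
        """Return the id for a query constant, or None if it was never stored."""
        return self._digest_to_id.get(self._digest(term))

    def term(self, node_id: int) -> str:
        """Return the original string for an id."""
        return self._id_to_term[node_id]
```

A constant that was never loaded has no id, so in this sketch a pattern using it cannot match anything.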

So, if no normalization is done on literals how does Fuseki/TDB pass
the normalization tests of SPARQL DAWG?  My understanding of this is
still limited but I'm assuming that normalization tests won't pass
for two non-normalized literals (that are non-equal without
normalization; but would be after) unless both literals in a
comparison were first normalized (either as pre-step or at string
table load time or at query time).

Thanks, Tim


Which tests exactly?

normalization-01 explicitly shows that normalized and non-normalized forms don't match. The results do not include Alice; there is one match for Eve, not two.
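
To see why the two forms don't match without normalization, here is a small generic Python sketch. It uses an arbitrary character rather than the data from the test: the precomposed and decomposed spellings are different code point sequences, so they compare unequal and have different UTF-8 bytes, and they only become equal if something explicitly normalizes them (NFC here).

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT (two code points)


def same_term(a: str, b: str) -> bool:
    """Plain comparison, as done when no normalization is applied."""
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    """Comparison after explicitly normalizing both sides to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two spellings differ as strings and as UTF-8 bytes.
differ_as_strings = not same_term(composed, decomposed)
differ_as_bytes = composed.encode("utf-8") != decomposed.encode("utf-8")

# They only match once both sides are normalized.
match_after_nfc = same_after_nfc(composed, decomposed)
```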


normalization 02,03

If you follow the link to the email, it's about IRI normalization - that's different to Unicode normalization.


http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0096

As a query engine isn't an end system (data goes in and out), normalization of URIs isn't required, and some argue it should not be done.
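
For contrast with the Unicode case, here is a small Python sketch of what IRI normalization is about. The helper `normalize_scheme_host` is hypothetical and deliberately simplified: it only lower-cases the scheme and authority, which is one of several normalization steps, and lower-casing the whole authority is itself a simplification. Without such a step, two spellings of the same address are simply different strings.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme_host(iri: str) -> str:
    """Lower-case scheme and authority only (a simplified, hypothetical helper)."""
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: also lower-cases any userinfo
        parts.path,
        parts.query,
        parts.fragment,
    ))


a = "HTTP://Example.ORG/people/alice"
b = "http://example.org/people/alice"

# Compared as plain strings, the two IRIs are different terms.
plain_equal = a == b

# They only coincide if something applies normalization first.
normalized_equal = normalize_scheme_host(a) == normalize_scheme_host(b)
```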


        Andy
