Hello,

I'm doing some work matching between XML documents. One set has no characters 
outside the basic ASCII range, while the other has a mix of Ø, Ö and lots of 
others. Some values are in upper case and some in mixed case. I need to match 
a "James" in one file to "JAMES" in another, and so on. To do the comparisons 
I've been looking at BaseX's support for collations.

The example from the documentation works perfectly:

declare default collation 'http://basex.org/collation?strength=primary';
"Straße" = "Strasse",
"Jérome" = "Jerome",
"James" = "JAMES"

But it doesn't work when testing attribute (or node) values in a statement like 
this:

declare default collation 'http://basex.org/collation?strength=primary';
let $doc := doc('
                <root>
                        <test name="Straße">Straße</test>
                        <test name="Strasse">Strasse</test>
                </root>
')
return count($doc/root/test[@name = "Strasse"])

I would expect that to return a count of 2, but it returns a count of 1.

I can get round this by calling fn:compare(), but it feels like a hack:

declare default collation 'http://basex.org/collation?strength=primary';
let $doc := doc('
                <root>
                        <test name="Straße">Straße</test>
                        <test name="Strasse">Strasse</test>
                </root>
')
return count($doc/root/test[0=fn:compare(@name,"Strasse")])

Is this behaviour intended? I can see that ignoring the collation for = might 
help query speed and index use, but I couldn't find it stated in the 
documentation. My quick read of the specification suggested that eq, gt, lt 
etc. would follow the behaviour of fn:compare.

I think that I'm probably doing this completely the wrong way and should be 
using some of the other features of Full-Text, but I'm not sure. If anyone can 
point me in the right direction, I will be very grateful.

Many thanks, James

