On 14.11.2013, at 20:39, James Ball <[email protected]> wrote:

> Hello,
> 
> I'm doing some work matching between XML documents - one set has no 
> characters outside the basic ASCII range while the other has a mix of Ø 
> and Ö and lots of others. Some are in UPPER case and some in Mixed. I need to 
> match a "James" in one file to "JAMES" in another and so on. To do the 
> comparisons I've been looking at BaseX's support for collations.
> 
> Following the example in the documentation like this works perfectly:
> 
> declare default collation 'http://basex.org/collation?strength=primary';
> "Straße" = "Strasse",
> "Jérome" = "Jerome",
> "James" = "JAMES"
> 
> But it doesn't work when testing attribute (or node) values in a statement 
> like this:
> 
> declare default collation 'http://basex.org/collation?strength=primary';
> let $doc := doc('
>               <root>
>                       <test name="Straße">Straße</test>
>                       <test name="Strasse">Strasse</test>
>               </root>
> ')
> return count($doc/root/test[@name = "Strasse"])

Applying @name/data() already helps:

declare default collation 'http://basex.org/collation?strength=primary';
let $doc := doc('
                <root>
                        <test name="Straße">Straße</test>
                        <test name="Strasse">Strasse</test>
                </root>
')
return count($doc/root/test[@name/data() = "Strasse"])
-> 2
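
If you would rather not depend on the default collation declaration, a
minimal sketch is to pass the collation explicitly to the three-argument
form of fn:compare. It assumes the same BaseX collation URI as above; the
variable name $coll is only illustrative, and the document constructor is
used just to keep the example self-contained:

(: sketch: collation passed explicitly instead of declared as default :)
let $coll := 'http://basex.org/collation?strength=primary'
let $doc := document {
  <root>
    <test name="Straße">Straße</test>
    <test name="Strasse">Strasse</test>
  </root>
}
(: fn:compare atomizes @name and returns 0 when the strings are equal
   under the given collation :)
return count($doc/root/test[compare(@name, 'Strasse', $coll) = 0])

Given the fn:compare result reported in the quoted message, this would be
expected to count both elements as well.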

> I would expect that to return a count of 2 but it returns a count of 1.
> 
> I can get round this by calling fn:compare() like this but it feels like a 
> hack:
> 
> declare default collation 'http://basex.org/collation?strength=primary';
> let $doc := doc('
>               <root>
>                       <test name="Straße">Straße</test>
>                       <test name="Strasse">Strasse</test>
>               </root>
> ')
> return count($doc/root/test[0=fn:compare(@name,"Strasse")])
> 
> Is this behaviour as intended? I can see that it might make query speed and 
> indexes much better to ignore collation for = but I couldn't find it stated 
> in the documentation. My quick read of the specification suggested that the 
> operation of fn:compare would drive the behaviour of eq, gt, lt etc.
> 
> I think that I'm probably doing this completely the wrong way and I should be 
> using some of the other features of Full-Text but I'm not sure. If anyone can 
> point me in the right direction I will be very grateful.
> 
> Many thanks, James

_______________________________________________
BaseX-Talk mailing list
[email protected]
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk