Hi Ticker

> Problem is that resources/sort/cp65001.txt doesn't give ordering to
> lots of characters; it looks like it covers only about 10,500 of the
> 1,112,064 possible code-points. Many of these non-ordered characters
> are being used by the names in the tile in question.

I used the program in extra/src/uk/me/parabola/util/CollationRules.java
to generate some of the tables.

This uses the file "allkeys.txt" which can be obtained
from https://www.unicode.org/Public/UCA/latest/allkeys.txt

The document explaining the Unicode collation rules that references
that file is http://www.unicode.org/reports/tr10/ (UTS #10). It includes
a section on programmatically deriving the weights for characters that
do not have explicit entries in the table.
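
As a rough illustration of that derivation, here is a minimal sketch of
the general implicit-weight formula as I understand it from UTS #10. The
class and method names are made up for this mail and are not part of
mkgmap or CollationRules.java. The base value depends on the kind of
character (the report gives FB40 and FB80 for two groups of Han
ideographs and FBC0 for anything else), and a few scripts such as Tangut
have their own variant of the formula that is not shown here.

```java
class ImplicitWeights {
    // Base values for the general formula, as given in UTS #10.
    static final int BASE_CORE_HAN = 0xFB40;    // core CJK unified ideographs
    static final int BASE_OTHER_HAN = 0xFB80;   // other unified ideographs
    static final int BASE_UNASSIGNED = 0xFBC0;  // any other code point

    // Returns the two primary weights {AAAA, BBBB} derived for a code
    // point that has no explicit entry in allkeys.txt.
    // (Hypothetical helper, for illustration only.)
    static int[] implicitPrimaryWeights(int codePoint, int base) {
        int aaaa = base + (codePoint >> 15);
        int bbbb = (codePoint & 0x7FFF) | 0x8000;
        return new int[] { aaaa, bbbb };
    }

    public static void main(String[] args) {
        int cp = 0x10FFFD;
        int[] w = implicitPrimaryWeights(cp, BASE_UNASSIGNED);
        System.out.printf("U+%04X -> %04X %04X%n", cp, w[0], w[1]);
    }
}
```

Both weights are above the range used by explicit entries, so such
characters end up after everything that is listed in the table.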

> Assuming the actual ordering of unspecified code-points doesn't really
> matter, I propose to change the logic slightly so undefined Unicode is
> sorted on its 16-bit value after the range of known sorts.

I think that is a good initial approach to get things working.
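
Just to make sure we mean the same thing, a minimal sketch of that
fallback might look like the following. This is not the mkgmap sort
code; FallbackOrder and its methods are hypothetical names, the table is
a plain map from code point to position, and I compare whole code points
rather than 16-bit values, which is my own assumption.

```java
import java.util.HashMap;
import java.util.Map;

class FallbackOrder {
    // Added to unlisted code points so that they always sort after every
    // listed one (table positions are non-negative ints, so below 2^32).
    private static final long UNLISTED_OFFSET = 1L << 32;

    // Code point -> position in the known ordering (e.g. from the sort file).
    private final Map<Integer, Integer> positions = new HashMap<>();

    void addKnown(int codePoint, int position) {
        positions.put(codePoint, position);
    }

    // Listed code points use their table position; all others use their
    // numeric value, shifted past the range of known positions.
    long sortKey(int codePoint) {
        Integer pos = positions.get(codePoint);
        return pos != null ? pos : UNLISTED_OFFSET + codePoint;
    }

    // Compares two strings code point by code point using sortKey.
    int compare(String a, String b) {
        int[] ca = a.codePoints().toArray();
        int[] cb = b.codePoints().toArray();
        int n = Math.min(ca.length, cb.length);
        for (int i = 0; i < n; i++) {
            int c = Long.compare(sortKey(ca[i]), sortKey(cb[i]));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(ca.length, cb.length);
    }

    public static void main(String[] args) {
        FallbackOrder order = new FallbackOrder();
        order.addKnown('a', 1);
        order.addKnown('b', 2);
        // U+4E2D is not in the table, so it sorts after both 'a' and 'b'.
        System.out.println(order.compare("b", "\u4e2d") < 0);
    }
}
```

The ordering among the unlisted characters is arbitrary but stable,
which should be enough for the index to be consistent.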

Steve

_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
