Mary Holstege wrote:
On Tue, 01 Apr 2008 07:05:45 -0700, Marc Moskowitz
<[EMAIL PROTECTED]> wrote:
I'm trying to sort transliterations of Chinese words by standard
pinyin sorting (syllable alphabetically, then by tone, followed by
the next syllable). Is there a collation in either English or Chinese
that deals correctly with this? If not, is there some way of creating
a user-defined sort order? I know that I can create a sortable form
for each word that sorts correctly by codepoint, but I would rather
do something more efficient if possible.
Marc Moskowitz
Interactive Factory
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
The collation named "http://marklogic.com/collation/zh" ought to
do what you want. Pinyin is the default ordering for (mainland)
Chinese. There is no way of defining your own collation. In
theory you could write your own ordering function that operated
on the strings, but it would be fairly painful and slow, I imagine.
//Mary
Mary Holstege
Lead Engineer
Mark Logic Corporation
Mary,
The standard zh collation sorts Chinese characters correctly, but I'm
trying to sort the pinyin transliterations. For example, this XQuery:
default collation="http://marklogic.com/collation/zh"
let $words := ('fù-bèi shòu dí', 'fùdi', 'fùgǎo', 'fūzi', 'fùtòng',
               'fùxiè', 'fù-mu')
for $x in $words
order by $x
return $x
returns
fù-bèi shòu dí
fù-mu
fùdi
fùgǎo
fùtòng
fùxiè
fūzi
which is in codepoint order, instead of the correct order:
fūzi (1st tone comes before 4th)
fù-bèi shòu dí
fùdi
fùgǎo
fù-mu (hyphens should be ignored)
fùtòng
fùxiè
Am I correct that the supported way to sort this text is to create a
sortable form for each of these strings at document load time?
-Marc
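
For illustration, here is a minimal sketch of what such a sortable form could look like if it were computed in a preprocessing step before load. It is written in Python rather than XQuery and is not a MarkLogic API; the function name pinyin_sort_key is hypothetical. It makes three assumptions that are not stated in the thread: the neutral tone sorts after the 4th tone, "ü" is keyed as "v", and a syllable that begins with a vowel is marked with an apostrophe, so that any consonant directly followed by a vowel starts a new syllable. Each syllable becomes its toneless letters plus a tone digit, so plain codepoint comparison of the keys orders by syllable, then by tone, then by the next syllable.

```python
import re
import unicodedata

# Combining marks that carry the four pinyin tones after NFD decomposition.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
# Assumption: a syllable with no tone mark (neutral tone) sorts after the 4th tone.
NEUTRAL_TONE = 5

# Heuristic: a consonant (or zh/ch/sh) directly followed by a vowel starts a syllable.
SYLLABLE_START = re.compile(r"(zh|ch|sh|[bcdfghjklmnpqrstwxyz])(?=[aeiouv])")
# Anything that is not a letter or a combining mark (space, hyphen, apostrophe)
# only separates syllables and is otherwise ignored.
SEPARATORS = re.compile(r"[^a-z\u0300-\u036f]+")


def pinyin_sort_key(word):
    """Return a key string whose codepoint order follows pinyin syllable order."""
    text = unicodedata.normalize("NFD", word.lower())
    text = text.replace("u\u0308", "v")      # key u-umlaut as 'v'
    text = SYLLABLE_START.sub(r" \1", text)  # put a break before each syllable
    parts = []
    for syllable in SEPARATORS.split(text):
        if not syllable:
            continue
        # The first tone mark found decides the tone; none means neutral.
        tone = next((TONE_MARKS[c] for c in syllable if c in TONE_MARKS),
                    NEUTRAL_TONE)
        # Drop all combining marks to get the toneless spelling.
        base = "".join(c for c in syllable if not unicodedata.combining(c))
        parts.append(f"{base}{tone}")
    return " ".join(parts)


words = ['fù-bèi shòu dí', 'fùdi', 'fùgǎo', 'fūzi', 'fùtòng', 'fùxiè', 'fù-mu']
for w in sorted(words, key=pinyin_sort_key):
    print(pinyin_sort_key(w), "<-", w)
```

With the word list from the query above, sorting on this key gives fūzi, fù-bèi shòu dí, fùdi, fùgǎo, fù-mu, fùtòng, fùxiè, which is the order described as correct. The key could be stored alongside each entry at load time and sorted with an ordinary codepoint collation.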