In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>, Pander Musubi <pander.mus...@gmail.com> wrote:
> Hi all, > > I would like to sort according to this order: > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C', > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f', > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?', > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?', > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R', > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v', > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') > > How can I do this? The default sorted() does not give the desired result. I'm assuming that doesn't correspond to some standard locale's collating order, so we really do need to roll our own encoding (and that you have a good reason for wanting to do this). I'm also assuming that what I'm seeing as question marks are really accented characters in some encoding that my news reader just isn't dealing with (it seems to think your post was in ISO-2022-CN (Simplified Chinese). I'm further assuming that you're starting with a list of unicode strings, the contents of which are limited to the above alphabet. I'm even further assuming that the volume of data you need to sort is small enough that efficiency is not a huge concern. Given all that, I would start by writing some code which turned your alphabet into a pair of dicts. One maps from the code point to a collating sequence number (i.e. ordinals), the other maps back. Something like (for python 2.7): alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?', [...] 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') map1 = {c: n for n, c in enumerate(alphabet)} map2 = {n: c for n, c in enumerate(alphabet)} Next, I would write some functions which encode your strings as lists of ordinals (and back again) def encode(s): "encode('foo') ==> [34, 19, 19]" # made-up ordinals return [map1[c] for c in s] def decode(l): "decode([34, 19, 19]) ==> 'foo'" return ''.join(map2[i] for i in l) Use these to convert your strings to lists of ints which will sort as per your specified collating order, and then back again: encoded_strings = [encode(s) for s in original_list] encoded_strings.sort() sorted_strings = [decode(l) for l in encoded_strings] That's just a rough sketch, and completely untested, but it should get you headed in the right direction. Or at least one plausible direction. Old-time perl hackers will recognize this as the Schwartzian Transform. -- http://mail.python.org/mailman/listinfo/python-list