Re: Custom alphabetical sort

2015-05-02 Thread Joel Goldstick
On Sat, May 2, 2015 at 3:25 PM, Dave Angel  wrote:
> On 05/02/2015 11:35 AM, Pander Musubi wrote:
>>
>> On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi  wrote:
>>>
>>> Hi all,
>>>
>>> I would like to sort according to this order:
>>>
>>> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
>>> 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c',
>>> 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>>> 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
>>> 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O',
>>> 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r',
>>> 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù',
>>> 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>>>
>>> How can I do this? The default sorted() does not give the desired result.
>>>
>>> Thanks,
>>>
>>> Pander
>>
>>
>> Meanwhile Python 3 supports locale aware sorting, see
>> https://docs.python.org/3/howto/sorting.html
>>
>
> You're aware that the message you're responding to is 16 months old?
>
> And answered pretty thoroughly, starting with the fact that the OP's desired
> order didn't match any particular locale.
>
>
>
> --
> DaveA
> --
> https://mail.python.org/mailman/listinfo/python-list

Dave, he is the OP.  Talking to himself?

-- 
Joel Goldstick
http://joelgoldstick.com
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Custom alphabetical sort

2015-05-02 Thread Dave Angel

On 05/02/2015 11:35 AM, Pander Musubi wrote:

> On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi  wrote:
>>
>> Hi all,
>>
>> I would like to sort according to this order:
>>
>> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
>> 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c',
>> 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>> 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
>> 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O',
>> 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r',
>> 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù',
>> 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>>
>> How can I do this? The default sorted() does not give the desired result.
>>
>> Thanks,
>>
>> Pander
>
> Meanwhile Python 3 supports locale aware sorting, see
> https://docs.python.org/3/howto/sorting.html

You're aware that the message you're responding to is 16 months old?

And answered pretty thoroughly, starting with the fact that the OP's 
desired order didn't match any particular locale.




--
DaveA


Re: Custom alphabetical sort

2015-05-02 Thread Pander Musubi
On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi  wrote:
> Hi all,
> 
> I would like to sort according to this order:
> 
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 
> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 
> 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 
> 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 
> 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> 
> How can I do this? The default sorted() does not give the desired result.
> 
> Thanks,
> 
> Pander

Meanwhile Python 3 supports locale aware sorting, see 
https://docs.python.org/3/howto/sorting.html
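
A minimal sketch of what locale-aware sorting can look like in Python 3, using
the standard locale module. The helper name locale_sorted and its locale_name
parameter are made up for this illustration. Which locale names are available
depends on the operating system, and since the order requested in the original
post was said not to match any particular locale, this may well not reproduce
it.

```python
import locale


def locale_sorted(words, locale_name=None):
    """Return words sorted by the collation rules of the active locale.

    locale_name is a hypothetical convenience parameter: when given, it is
    passed to locale.setlocale() for LC_COLLATE before sorting. Locale names
    are platform-specific (for example 'nl_NL.UTF-8' on many Linux systems).
    """
    if locale_name is not None:
        # Raises locale.Error if that locale is not installed.
        locale.setlocale(locale.LC_COLLATE, locale_name)
    # strxfrm turns each string into a key that compares in locale order.
    return sorted(words, key=locale.strxfrm)


print(locale_sorted(["pear", "apple", "fig"]))
```

Note that locale.setlocale() changes a process-wide setting, so it is probably
best called once at program start rather than inside a library function.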


Re: Custom alphabetical sort

2012-12-28 Thread wxjmfauth
On Friday, 28 December 2012 00:17:53 UTC+1, Ian wrote:
> On Thu, Dec 27, 2012 at 3:17 PM, Terry Reedy  wrote:
> >> PS Py 3.3 warranty: ~30% slower than Py 3.2
> >
> > Do you have any actual timing data to back up that claim?
> > If so, please give specifics, including build, os, system, timing code, and
> > result.
>
> There was another thread about this one a while back.  Using IDLE on Windows
> XP:
>
> >>> import timeit, locale
> >>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
> >>> locale.setlocale(locale.LC_ALL, 'French_France')
> 'French_France.1252'
>
> >>> # Python 3.2
> >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
> ...                   "import locale; from __main__ import li", number=10))
> 1.1581226105552531
>
> >>> # Python 3.3.0
> >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
> ...                   "import locale; from __main__ import li", number=10))
> 1.4595282361305697
>
> 1.460 / 1.158 = 1.261
>
> >>> li = li * 100
> >>> import random
> >>> random.shuffle(li)
>
> >>> # Python 3.2
> >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
> ...                   "import locale; from __main__ import li", number=1000))
> 1.233450899485831
>
> >>> # Python 3.3.0
> >>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
> ...                   "import locale; from __main__ import li", number=1000))
> 1.5793845307155152
>
> 1.579 / 1.233 = 1.281
>
> So about 26% slower for sorting a short list of French words and about
> 28% slower for a longer list.  Replacing the strings with ASCII and
> removing the 'key' argument gives a comparable result for the long
> list but more like a 40% slowdown for the short list.



Not related to this thread, for information.

My sorting algorithm does a little more than "locale.strxfrm".
locale.strxfrm works fine with the list I gave as an example,
but it fails in many other cases. One of the stumbling blocks
is the "œ", which must be treated as "oe". This is not the
place to discuss this kind of linguistic aspect.

My algorithm does not use unicodedata or unicode normalization.
It mainly relies on many character / substring substitutions
to create the primary keys.
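
The library mentioned in this thread is not shown, so the following is only a
sketch of the general substitution idea, not the actual algorithm. The names
SUBSTITUTIONS and primary_key are invented for the illustration, and the table
covers just two ligatures; accented letters are deliberately left alone here.

```python
# Hypothetical substitution table; a real one would be much longer.
SUBSTITUTIONS = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def primary_key(word):
    """Build a sort key by replacing each ligature with its letters."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in word)


words = ["nœud", "noir", "noduleux", "nom"]

# Plain code point order puts the ligature after every ASCII letter.
print(sorted(words))

# With the substitution, "nœud" is ordered as if it were spelled "noeud".
print(sorted(words, key=primary_key))
```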

jmf


Re: Custom alphabetical sort

2012-12-27 Thread Ian Kelly
On Thu, Dec 27, 2012 at 3:17 PM, Terry Reedy  wrote:
>> PS Py 3.3 warranty: ~30% slower than Py 3.2
>
>
> Do you have any actual timing data to back up that claim?
> If so, please give specifics, including build, os, system, timing code, and
> result.

There was another thread about this one a while back.  Using IDLE on Windows XP:

>>> import timeit, locale
>>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'

>>> # Python 3.2
>>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
...                   "import locale; from __main__ import li", number=10))
1.1581226105552531

>>> # Python 3.3.0
>>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
...                   "import locale; from __main__ import li", number=10))
1.4595282361305697

1.460 / 1.158 = 1.261

>>> li = li * 100
>>> import random
>>> random.shuffle(li)

>>> # Python 3.2
>>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
...                   "import locale; from __main__ import li", number=1000))
1.233450899485831

>>> # Python 3.3.0
>>> min(timeit.repeat("sorted(li, key=locale.strxfrm)",
...                   "import locale; from __main__ import li", number=1000))
1.5793845307155152

1.579 / 1.233 = 1.281

So about 26% slower for sorting a short list of French words and about
28% slower for a longer list.  Replacing the strings with ASCII and
removing the 'key' argument gives a comparable result for the long
list but more like a 40% slowdown for the short list.
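
For anyone wanting to repeat this kind of measurement, here is a small sketch
built on timeit.Timer. The helper best_sort_time and the ASCII sample list are
assumptions made for the illustration; to time locale-based sorting as above,
locale.strxfrm could be passed as the key after setting a suitable locale.
The numbers will of course differ by machine and Python build.

```python
import timeit


def best_sort_time(words, key=None, repeat=3, number=1000):
    """Return the fastest of `repeat` timings, each covering `number` sorts."""
    timer = timeit.Timer(lambda: sorted(words, key=key))
    return min(timer.repeat(repeat=repeat, number=number))


sample = ["noel", "noir", "noeud", "noduleux", "noetique", "noese"]
print(best_sort_time(sample))
```

Taking the minimum of several repeats, as in the session above, is the usual
way to reduce noise from other activity on the machine.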


Re: Custom alphabetical sort

2012-12-27 Thread Terry Reedy

On 12/27/2012 1:17 PM, wxjmfa...@gmail.com wrote:

> On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:
>> I would like to sort according to this order:
>> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
>> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç',
>> 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 'F', 'g',
>> 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k',
>> 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô',
>> 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u',
>> 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y',
>> 'Y', 'z', 'Z')
>
> One way is to create a list of 2-lists / 2-tuples, like
>
> [(modified_word_1, word_1), (modified_word_2, word_2), ...]
>
> and to use the native sorting, which will use the first element
> modified_word_2 as primary key.
>
> The task lies in the creation of the primary keys.
>
> I did it once for French (seriously) and for German (less
> seriously) scripts. (Only as an exercise for fun).
>
> >>> rob = ['noduleux', 'noël', 'noèse', 'noétique',
> ... 'nœud', 'noir', 'noirâtre']
> >>> z = list(rob)
> >>> random.shuffle(z)
> >>> z
> ['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique',
> 'noduleux']
> >>> zo = libfrancais.sortfr(z)
> >>> zo
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> 'noirâtre']
> >>> zo == rob
> True
>
> PS Py 3.3 warranty: ~30% slower than Py 3.2


Do you have any actual timing data to back up that claim?
If so, please give specifics, including build, os, system, timing code, 
and result.


--
Terry Jan Reedy




Re: Custom alphabetical sort

2012-12-27 Thread wxjmfauth
On Monday, 24 December 2012 16:32:56 UTC+1, Pander Musubi wrote:
> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C',
> 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö',
> 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.

-

One way is to create a list of 2-lists / 2-tuples, like

[(modified_word_1, word_1), (modified_word_2, word_2), ...]

and to use the native sorting, which will use the first element
modified_word_2 as primary key.

The task lies in the creation of the primary keys.

I did it once for French (seriously) and for German (less
seriously) scripts. (Only as an exercise for fun).

Eg.

>>> rob = ['noduleux', 'noël', 'noèse', 'noétique',
... 'nœud', 'noir', 'noirâtre']
>>> z = list(rob)
>>> random.shuffle(z)
>>> z
['noirâtre', 'noèse', 'noir', 'noël', 'nœud', 'noétique',
'noduleux']
>>> zo = libfrancais.sortfr(z)
>>> zo
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
'noirâtre']
>>> zo == rob
True

PS Py 3.3 warranty: ~30% slower than Py 3.2

jmf



Re: Custom alphabetical sort

2012-12-26 Thread Joshua Landau
On 25 December 2012 06:18, Dave Angel  wrote:

> On 12/24/2012 06:19 PM, Pander Musubi wrote:

 

> > Thanks very much for this efficient code.
>
> Perhaps you missed Ian Kelly's correction of Thomas Bach's approach:
>
> d = { k: v for v, k in enumerate(cs) }
>
>
> def collate(x):
>     return list(map(d.get, x))
>
> sorted(data, key=collate)
>
> I'd use Ian Kelly's approach.


Well, he was first to it :P


> It's not only more compact,


I take offence* here! The only difference was "list(map(d.get, x))" vs
"[hashindex[s] for s in string]" (11 chars) and my longer naming scheme. If
you really care enough about those to sway your judgement, shame on you! ;)

* Not really

> it shouldn't
> give an exception for a character not in the table.


That was a choice, not a bug. I didn't want undefined behaviour, so I
thought I'd leave it to crash on "bad" input rather than sort in a way that
may be unwanted. Even Ian Kelly gave this as a way of coding it.


> At least, not for
> Python 2.x.  I'm not sure about Python 3, since it can give an exception
> comparing None to int.



Please note that this post was written in humour (but with truth) to delay
sleep. No offence to Ian or you intended ;).

Happy After-Christmas!


Re: Custom alphabetical sort

2012-12-24 Thread Dave Angel
On 12/24/2012 06:19 PM, Pander Musubi wrote:
> 

> to prevent
>
> Traceback (most recent call last):
>   File "./sort.py", line 23, in 
> things_to_sort.sort(key=string2sortlist)
>   File "./sort.py", line 15, in string2sortlist
> return [hashindex[s] for s in string]
> KeyError: '\xc3'
>
> Thanks very much for this efficient code.

Perhaps you missed Ian Kelly's correction of Thomas Bach's approach:

d = { k: v for v, k in enumerate(cs) }


def collate(x):
    return list(map(d.get, x))

sorted(data, key=collate)

I'd use Ian Kelly's approach.  It's not only more compact, it shouldn't
give an exception for a character not in the table.  At least, not for
Python 2.x.  I'm not sure about Python 3, since it can give an exception
comparing None to int.
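
To illustrate the Python 3 concern: d.get returns None for a character that is
not in the table, and Python 3 raises TypeError when it has to order None
against an int. The sketch below shows that failure and one possible way
around it, which is to turn every character into a tuple so that known
characters sort in table order and unknown ones come afterwards by code point.
The table cs is shortened here, and collate_safe is a name invented for the
illustration; whether unknown characters should sort last is an assumption.

```python
cs = (" ", ".", "'", "-", "a", "A", "ä", "Ä", "b", "B")  # shortened table
d = {ch: i for i, ch in enumerate(cs)}


def collate(x):
    """Version from the thread: unknown characters become None."""
    return list(map(d.get, x))


def collate_safe(x):
    """Known characters sort by table index, unknown ones afterwards."""
    return [(0, d[ch]) if ch in d else (1, ord(ch)) for ch in x]


data = ["bz", "ba", "äb", "ab"]  # "z" is not in the shortened table

try:
    sorted(data, key=collate)
except TypeError as exc:
    print("Python 3 cannot order None against int:", exc)

print(sorted(data, key=collate_safe))
```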


-- 

DaveA



Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith  wrote:
> > In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
> >  Pander Musubi  wrote:
> >
> > > Hi all,
> > >
> > > I would like to sort according to this order:
> > >
> > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> > >
> > > How can I do this? The default sorted() does not give the desired result.
> >
> > Given all that, I would start by writing some code which turned your
> > alphabet into a pair of dicts.  One maps from the code point to a
> > collating sequence number (i.e. ordinals), the other maps back.
> > Something like (for python 2.7):
> >
> > alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
> >             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
> >             [...]
> >             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > map1 = {c: n for n, c in enumerate(alphabet)}
> > map2 = {n: c for n, c in enumerate(alphabet)}
> >
> > Next, I would write some functions which encode your strings as lists of
> > ordinals (and back again)
> >
> > def encode(s):
> >    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
> >    return [map1[c] for c in s]
> >
> > def decode(l):
> >    "decode([34, 19, 19]) ==> 'foo'"
> >    return ''.join(map2[i] for i in l)
> >
> > Use these to convert your strings to lists of ints which will sort as
> > per your specified collating order, and then back again:
> >
> > encoded_strings = [encode(s) for s in original_list]
> > encoded_strings.sort()
> > sorted_strings = [decode(l) for l in encoded_strings]
>
> This isn't needed and the not-so-new way to do this is through .sort's key
> argument.
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> changes to
>
> original_list.sort(key=encode)
>
> [Which happens to be faster ]
>
> Hence you need neither map2 nor decode:
>
> ## CODE ##
>
> alphabet = (
>   ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
>   'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
>   'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E',
>   'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>   'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î',
>   'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
>   'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô',
>   'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
>   'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û',
>   'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
>   'y', 'Y', 'z', 'Z'
> )
>
> hashindex = {character:index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
>   return [hashindex[s] for s in string]
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
>                   for _ in range(20)]
>
> print(things_to_sort[:15])
>
> things_to_sort.sort(key=string2sortlist)
>
> print(things_to_sort[:15])
>
> ## END CODE ##
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
> Thomas Bach's method.

With Python2.7 I had to use

alphabet = (
u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', 
u'9', u'a', u'A', u'ä', u'Ä', u'á', u'Á', u'â', u'Â',
u'à', u'À', u'å', u'Å', u'b', u'B', u'c', u'C', u'ç', u'Ç', u'd', u'D', u'e', 
u'E', u'ë', u'Ë', u'é', u'É', u'ê', u'Ê', u'è', u'È',
u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'ï', u'Ï', u'í', u'Í', u'î', 
u'Î', u'ì', u'Ì', u'j', u'J', u'k', u'K', u'l', u'L',
u'm', u'M', u'n', u'ñ', u'N', u'Ñ', u'o', u'O', u'ö', u'Ö', u'ó', u'Ó', u'ô', 
u'Ô', u'ò', u'Ò', u'ø', u'Ø', u'p', u'P', u'q', u'Q',
u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'ü', u'Ü', u'ú', u'Ú', u'û', 
u'Û', u'ù', u'Ù', u'v', u'V', u'w', u'W', u'x', u'X',
u'y', u'Y', u'z', u'Z'
)

to prevent

Traceback (most recent call last):
  File "./sort.py", line 23, in 
things_to_sort.sort(key=string2sortlist)
  File "./sort.py", line 15, in string2sortlist
return [hashindex[s

Re: Custom alphabetical sort

2012-12-24 Thread Steven D'Aprano
On Mon, 24 Dec 2012 11:18:37 -0500, Roy Smith wrote:

> In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
>  Pander Musubi  wrote:
> 
>> Hi all,
>>
>> I would like to sort according to this order:
[...]
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this).  I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese)).

Good lord man, what sort of crappy newsreader software are you using? (It 
claims to be "MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)" -- I think 
anything as bad as that shouldn't advertise what it is.) The OP's post 
was correctly labelled with an encoding, and not an obscure one:

Content-Type: text/plain; charset=ISO-8859-1

which if I remember correctly is Latin-1. If your newsreader can't handle 
that, surely it should default to UTF-8, which should give you the right 
results sans question marks.




-- 
Steven


Re: Custom alphabetical sort

2012-12-24 Thread Mark Lawrence

On 24/12/2012 17:40, Roy Smith wrote:

> In article <46db479a-d16f-4f64-aaf2-76de65418...@googlegroups.com>,
>  Pander Musubi  wrote:
>
> > > I'm assuming that doesn't correspond to some standard locale's collating
> > > order, so we really do need to roll our own encoding (and that you have
> > > a good reason for wanting to do this).
> >
> > It is for creating a Dutch dictionary.
>
> Wait a minute.  You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???



There's a built-in called secret that's only available to those who are 
Dutch and members of the PSU.


A slight aside, I understand that the BDFL is currently on holiday.  For 
those who want a revolution now is as good a time as any :)


--
Cheers.

Mark Lawrence.



Re: Custom alphabetical sort

2012-12-24 Thread Joshua Landau
On 24 December 2012 16:18, Roy Smith  wrote:

> In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
>  Pander Musubi  wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>



> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
> '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
> [...]
> 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>    return [map1[c] for c in s]
>
> def decode(l):
>    "decode([34, 19, 19]) ==> 'foo'"
>    return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>

This isn't needed and the not-so-new way to do this is through .sort's key
argument.

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

changes to

original_list.sort(key=encode)

[Which happens to be faster ]

Hence you need neither map2 nor decode:

## CODE ##

alphabet = (
' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë',
'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò',
'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù',
'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
 'y', 'Y', 'z', 'Z'
)

hashindex = {character:index for index, character in enumerate(alphabet)}
def string2sortlist(string):
    return [hashindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
import random
things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
                  for _ in range(20)]

print(things_to_sort[:15])

things_to_sort.sort(key=string2sortlist)

print(things_to_sort[:15])

## END CODE ##

Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
Thomas Bach's method.


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
> > > I'm assuming that doesn't correspond to some standard locale's collating
> > > order, so we really do need to roll our own encoding (and that you have
> > > a good reason for wanting to do this).
> >
> > It is for creating a Dutch dictionary.
>
> Wait a minute.  You're telling me that Python, of all languages, doesn't
> have a built-in way to sort Dutch words???

Not when you want Roman characters with diacritics to be sorted in the normal 
a-Z range.


Re: Custom alphabetical sort

2012-12-24 Thread Roy Smith
In article <46db479a-d16f-4f64-aaf2-76de65418...@googlegroups.com>,
 Pander Musubi  wrote:

> > I'm assuming that doesn't correspond to some standard locale's collating 
> > order, so we really do need to roll our own encoding (and that you have 
> > a good reason for wanting to do this).
> 
> It is for creating a Dutch dictionary.

Wait a minute.  You're telling me that Python, of all languages, doesn't 
have a built-in way to sort Dutch words???


Re: Custom alphabetical sort

2012-12-24 Thread Ian Kelly
On Dec 24, 2012 9:37 AM, "Pander Musubi"  wrote:

> > >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
> >
> > '5aAàÀåBCçËÉíÎLÖøquùx'
>
> This doesn't work for words with more than one character:

Try this instead:

def collate(x):
    return list(map(d.get, x))

sorted(data, key=collate)

I would also probably change "d.get" to "d.__getitem__" for a clearer error
message in the case the string contains characters that it doesn't know how
to sort.
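
A short sketch of the collate function with the d.__getitem__ variant
suggested above, so that an unknown character fails loudly with a KeyError
naming it. The table cs is cut down to a handful of characters for the
illustration, and the sample words are made up.

```python
cs = (" ", ".", "'", "-", "a", "A", "ä", "Ä", "b", "B")  # shortened table
d = {ch: i for i, ch in enumerate(cs)}


def collate(x):
    """Map each character to its table index; KeyError if it is unknown."""
    return list(map(d.__getitem__, x))


# Each word becomes a list of indices, and lists compare element by element.
print(sorted(["Ab", "äa", "ab", "a"], key=collate))

try:
    collate("abc")  # "c" is not in the shortened table
except KeyError as exc:
    print("character not in table:", exc)
```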


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> I'm assuming that doesn't correspond to some standard locale's collating
> order, so we really do need to roll our own encoding (and that you have
> a good reason for wanting to do this).

It is for creating a Dutch dictionary. This sorting order is not to be found in 
an existing locale.

> I'm also assuming that what I'm
> seeing as question marks are really accented characters in some encoding
> that my news reader just isn't dealing with (it seems to think your post
> was in ISO-2022-CN (Simplified Chinese)).
>
> I'm further assuming that you're starting with a list of unicode
> strings, the contents of which are limited to the above alphabet.

Correct.

> I'm
> even further assuming that the volume of data you need to sort is small
> enough that efficiency is not a huge concern.

Well, it is for 200,000 - 450,000 words but the code is allowed to be slow. It
will not be used for a web application or anything else that requires a quick
response.

> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}

OK, similar to Thomas' proposal.

> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>    return [map1[c] for c in s]
>
> def decode(l):
>    "decode([34, 19, 19]) ==> 'foo'"
>    return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> That's just a rough sketch, and completely untested, but it should get
> you headed in the right direction.  Or at least one plausible direction.
> Old-time perl hackers will recognize this as the Schwartzian Transform.

I will test it and let you know. :) Pander
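
For reference, the decorate-sort-undecorate pattern (the Schwartzian
Transform mentioned in the quoted text) can be sketched as below. This variant
keeps each original string next to its encoded key, so no decode step is
needed. The alphabet is shortened and the sample words are invented for the
illustration; map1 and encode follow the names used in the quoted message.

```python
alphabet = (" ", ".", "'", "-", "a", "A", "ä", "Ä", "b", "B")  # shortened
map1 = {c: n for n, c in enumerate(alphabet)}


def encode(s):
    """Translate a string into a list of collating ordinals."""
    return [map1[c] for c in s]


original_list = ["ba", "Äb", "ab", "äb"]

decorated = [(encode(s), s) for s in original_list]  # decorate
decorated.sort()                                     # sort on the ordinals
sorted_strings = [s for _, s in decorated]           # undecorate

print(sorted_strings)
```

In current Python the same ordering is obtained more directly with
sorted(original_list, key=encode), which performs the decoration internally.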


Re: Custom alphabetical sort

2012-12-24 Thread Pander Musubi
On Monday, December 24, 2012 5:11:03 PM UTC+1, Thomas Bach wrote:
> On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
> > 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c',
> > 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
> > 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
> > 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O',
> > 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r',
> > 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù',
> > 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> One option is to use sorted's key parameter with an appropriate
> mapping in a dictionary:
>
> >>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8',
> ... '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b',
> ... 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê',
> ... 'Ê', 'è', 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í',
> ... 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n',
> ... 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø',
> ... 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü',
> ... 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y',
> ... 'Y', 'z', 'Z')
>
> >>> d = { k: v for v, k in enumerate(cs) }
>
> >>> import random
>
> >>> ''.join(sorted(random.sample(cs, 20), key=d.get))
> '5aAàÀåBCçËÉíÎLÖøquùx'

This doesn't work for words with more than one character:

>>> test=('øasdf', 'áá', 'aa', 'a123','á1234', 'Aaa', )
>>> sorted(test, key=d.get)
['\xc3\xb8asdf', '\xc3\xa1\xc3\xa1', 'aa', 'a123', '\xc3\xa11234', 'Aaa']
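
The likely reason, sketched in Python 3 with a shortened table: key=d.get
looks up each whole word in d, but the keys of d are single characters, so
every word gets None as its key. In Python 2 all those None keys compare
equal and the list keeps its input order, as in the output above; Python 3
would raise TypeError instead. Looking up the characters one by one gives a
usable key. The function name collate is borrowed from elsewhere in the
thread; the shortened cs is an assumption made to keep the example small.

```python
# Shortened table, in the same relative order as the full one.
cs = (" ", "1", "2", "3", "4", "a", "A", "á", "Á", "d", "f", "ø", "s")
d = {k: v for v, k in enumerate(cs)}

test = ("øasdf", "áá", "aa", "a123", "á1234", "Aaa")

# Whole words are not keys of d, so every lookup gives None.
print([d.get(word) for word in test])


def collate(word):
    """Look up each character separately to build a per-word key."""
    return [d[ch] for ch in word]


print(sorted(test, key=collate))
```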


> Regards,
>   Thomas.



Re: Custom alphabetical sort

2012-12-24 Thread Roy Smith
In article <40d108ec-b019-4829-a969-c8ef51386...@googlegroups.com>,
 Pander Musubi  wrote:

> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does not give the desired result.

I'm assuming that doesn't correspond to some standard locale's collating 
order, so we really do need to roll our own encoding (and that you have 
a good reason for wanting to do this).  I'm also assuming that what I'm 
seeing as question marks are really accented characters in some encoding 
that my news reader just isn't dealing with (it seems to think your post 
was in ISO-2022-CN, i.e. Simplified Chinese).

I'm further assuming that you're starting with a list of unicode 
strings, the contents of which are limited to the above alphabet.  I'm 
even further assuming that the volume of data you need to sort is small 
enough that efficiency is not a huge concern.

Given all that, I would start by writing some code which turned your 
alphabet into a pair of dicts.  One maps from the code point to a 
collating sequence number (i.e. ordinals), the other maps back.  
Something like (for Python 2.7):

alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
[...]
'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}

Next, I would write some functions which encode your strings as lists of 
ordinals (and back again):

def encode(s):
    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
    return [map1[c] for c in s]

def decode(l):
    "decode([34, 19, 19]) ==> 'foo'"
    return ''.join(map2[i] for i in l)

Use these to convert your strings to lists of ints which will sort as 
per your specified collating order, and then back again:

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

That's just a rough sketch, and completely untested, but it should get 
you headed in the right direction.  Or at least one plausible direction.  
Old-time Perl hackers will recognize this as the Schwartzian Transform.
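
For reference, the sketch above can be filled in roughly as follows. This
is an untested-in-the-thread illustration, assuming Python 3 and Unicode
strings whose characters all come from the alphabet; the ALPHABET string
spells out the accented characters from the original post, and the sample
words in original_list are invented for the example.

```python
# Sketch of the encode / sort / decode approach, assuming Python 3.
ALPHABET = (" .'-0123456789"
            "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhH"
            "iIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØ"
            "pPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ")

map1 = {c: n for n, c in enumerate(ALPHABET)}  # character -> ordinal
map2 = dict(enumerate(ALPHABET))               # ordinal -> character

def encode(s):
    """Turn a string into a list of collating ordinals."""
    return [map1[c] for c in s]

def decode(ordinals):
    """Turn a list of collating ordinals back into a string."""
    return ''.join(map2[i] for i in ordinals)

# Hypothetical sample data, not from the original post.
original_list = ['zebra', 'Ärger', 'apple', 'Apple', 'ähnlich']

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()  # lists of ints compare element by element
sorted_strings = [decode(l) for l in encoded_strings]
print(sorted_strings)

# The same ordering without the decode step, using sorted's key parameter.
same_result = sorted(original_list, key=encode)
```

Passing encode as the key lets sorted() do the decorating and
undecorating itself, so map2 and decode() are only needed if the encoded
form is wanted for something else.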
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Custom alphabetical sort

2012-12-24 Thread Thomas Bach
On Mon, Dec 24, 2012 at 07:32:56AM -0800, Pander Musubi wrote:
> I would like to sort according to this order:
> 
> (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 
> 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 
> 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f', 
> 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 
> 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 
> 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R', 
> 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 
> 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> 

One option is to use sorted's key parameter with an appropriate
mapping in a dictionary:

>>> cs = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8',
... '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B',
... 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è',
... 'È', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î',
... 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o',
... 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
... 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù',
... 'Ù', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')

>>> d = { k: v for v, k in enumerate(cs) }

>>> import random

>>> ''.join(sorted(random.sample(cs, 20), key=d.get))
'5aAàÀåBCçËÉíÎLÖøquùx'
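
Note that d.get returns None for anything that is not a single character
from cs, and under Python 3 sorting a mix of None and int keys raises a
TypeError. A hedged sketch of a key function for whole words that also
tolerates characters outside the alphabet follows. Sorting unknown
characters after all known ones, by code point, is an assumption made
here; nothing in the thread says how they should be ordered. The names
ALPHABET, rank and sort_key are illustrative.

```python
# Sketch only, assuming Python 3. Unknown characters are ranked after the
# whole alphabet (an assumption) and then ordered by code point.
ALPHABET = (" .'-0123456789"
            "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhH"
            "iIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØ"
            "pPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ")

rank = {c: i for i, c in enumerate(ALPHABET)}
UNKNOWN = len(ALPHABET)  # rank shared by every character not in ALPHABET

def sort_key(word):
    """Per-character (rank, char) pairs; char breaks ties among unknowns."""
    return [(rank.get(c, UNKNOWN), c) for c in word]

words = ['a~', 'a!', 'az', 'a ']
result = sorted(words, key=sort_key)
print(result)
```

For characters inside the alphabet the rank alone decides the order,
since every known character has a distinct rank.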

Regards,
Thomas.
-- 
http://mail.python.org/mailman/listinfo/python-list