[issue47259] string sorting often incorrect

2022-04-08 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I don't think splashing this everywhere else in the docs would be helpful.  
Tools like list.sort, sorted, min, max, nlargest, nsmallest use whatever sort 
order is provided by the underlying object whether it be a string, tuple, 
float, or int.
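
A minimal sketch of that point, using made-up sample values: each tool simply applies the comparison the elements already define.

```python
import heapq

# Tuples compare element by element, left to right.
pairs = sorted([(2, "a"), (1, "b"), (1, "a")])   # [(1, 'a'), (1, 'b'), (2, 'a')]

# Ints and floats compare by numeric value.
lowest = min([3, 1.5, 2])                        # 1.5

# Strings compare by Unicode code point, so uppercase ASCII letters
# come before lowercase ones.
first_two = heapq.nsmallest(2, ["banana", "Zebra", "apple"])  # ['Zebra', 'apple']
```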

The section on expressions is the intended place to cover how comparisons are 
defined for core objects:  
https://docs.python.org/3/reference/expressions.html#value-comparisons

As suggested, I will edit the sorting howto to be clearer that locale aware 
sort ordering refers to alphabetical orderings which can vary (for example, the 
Spanish ll sorts differently in different locales).
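
A rough sketch of locale aware sorting with locale.strxfrm as the key. The helper name sort_for_locale, the sample words, and the locale name "es_ES.UTF-8" are assumptions for illustration; whether that locale is installed, and how it orders "ll", depends on the platform, so no expected output is shown.

```python
import locale

def sort_for_locale(words, locale_name):
    """Sort words with the collation rules of locale_name, if it is available."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not available here: fall back to code point order.
        return sorted(words)
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the previous collation setting for the rest of the process.
        locale.setlocale(locale.LC_COLLATE, saved)

words = ["luz", "llama", "lomo"]
result = sort_for_locale(words, "es_ES.UTF-8")  # hypothetical locale name
```

Note that locale.setlocale changes process-wide state, so this pattern is not safe to call from multiple threads at once.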

--
assignee:  -> rhettinger
components: +Documentation -Interpreter Core

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47259] string sorting often incorrect

2022-04-08 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
nosy: +rhettinger




[issue47259] string sorting often incorrect

2022-04-08 Thread Steven D'Aprano


Change by Steven D'Aprano :


--
nosy: +steven.daprano




[issue47259] string sorting often incorrect

2022-04-08 Thread Pierre Ossman

New submission from Pierre Ossman :

There is a big gotcha in Python that is easily overlooked and should at the 
very least be more prominently pointed out in the documentation.

Sorting strings will produce results that are very confusing for humans.

It happens to work for ASCII, but will generally produce bad results for other 
things as code points do not always follow the alphabetical order.
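
A small illustration with invented sample words: "é" has a higher code point than "z", so the default sort places "éclair" after "zebra", while a French or English reader would expect it between "apple" and "zebra".

```python
words = ["zebra", "\u00e9clair", "apple"]   # "\u00e9" is "é"

# Code points of the three initial letters.
code_points = {ch: ord(ch) for ch in "az\u00e9"}   # a=97, z=122, é=233

# The default sort follows those numbers, not the alphabet.
default_order = sorted(words)   # ['apple', 'zebra', 'éclair']
```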

The expressions chapter¹ mentions this fact, but you have to dig quite a bit to 
reach that. It also mentions that normalization is an issue, but it never 
mentions the issue about code point order versus alphabetical order.
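
The normalization issue can be sketched as follows, using unicodedata from the standard library. The same visible character can be stored as one code point or as a base letter plus a combining mark, and the two forms compare unequal until they are normalized.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The raw strings differ even though they render the same.
equal_raw = composed == decomposed   # False

# After normalizing both to NFC they compare equal.
equal_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)   # True
```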

The sorting tutorial mentions under "Odds and ends"² that you need to use a 
special key or comparison function to get locale aware sorting. It doesn't 
mention that this also includes respecting alphabetical order, which might be 
overlooked unless you are very familiar with how the sorting works. The 
tutorial is also something you have to dig a bit to reach.

Ideally string comparison would always be locale aware in a high level language 
such as Python. However, a smaller step would be a note on sorted()³ that extra 
care needs to be taken for strings as the default behaviour will produce 
unexpected results once your strings include anything outside the English 
alphabet.

¹ https://docs.python.org/3/reference/expressions.html
² https://docs.python.org/3/howto/sorting.html#odd-and-ends
³ https://docs.python.org/3/library/functions.html#sorted

--
components: Interpreter Core
messages: 416972
nosy: CendioOssman
priority: normal
severity: normal
status: open
title: string sorting often incorrect
