New submission from STINNER Victor <victor.stin...@haypocalc.com>:

To factor out shared code and to fix encoding issues in the time module, I added 
functions to decode from and encode to the locale encoding: 
PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize() and 
PyUnicode_EncodeLocale() (issue #13560). During tests, I realized that 
os.strerror() should also use the current locale encoding.

Do you think that the codec should be exposed in Python?

--

The C functions are used by:

 * the locale module to decode the result of locale functions
 * Py_Main() to decode the PYTHONWARNINGS environment variable 
(PyUnicode_DecodeFSDefault could be used here, but PyUnicode_DecodeFSDefault 
would just call PyUnicode_DecodeLocale because the Python codec is not loaded 
yet, a funny bootstrap issue)
 * PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefault[AndSize]() before 
the locale encoding is known and the Python codec is fully ready
 * os.strerror() and PyErr_SetFromErrno*() to decode the error message
 * time.strftime() to encode the format and decode the result when the 
wcsftime() function is not available, and on Windows. On Windows, wcsftime() is 
available but avoided to work around an encoding issue in the timezone (see 
issue #10653)
 * the time module to decode time.tzname

The codec can be useful for developers interacting with C functions depending 
on the locale. Examples: strerror(), strftime(), ... Using the filesystem 
encoding would be wrong for such functions because the locale encoding can be 
changed by setlocale() with LC_CTYPE or LC_ALL. Using the filesystem encoding 
would lead to mojibake.

Even if the most common use cases of C functions depending on the locale are 
already covered by the Python standard library, developers may want to bind new 
functions using ctypes (or something else), and I believe that the locale 
encoding would be useful for these bindings.
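
A minimal sketch of such a binding, assuming a POSIX system where 
ctypes.CDLL(None) exposes the libc symbols. The helper names locale_encoding() 
and c_strerror() are hypothetical and not part of the patch. The sketch decodes 
the char* returned by strerror() with the current LC_CTYPE encoding by hand, 
which is roughly what a "locale" codec would do in one step:

```python
import ctypes
import errno
import locale


def locale_encoding():
    # Encoding of the current LC_CTYPE locale; passing False avoids
    # calling setlocale() as a side effect.
    return locale.getpreferredencoding(False)


def c_strerror(code):
    # Hypothetical binding: call the C strerror() directly through ctypes.
    libc = ctypes.CDLL(None)  # assumption: POSIX, libc symbols are visible
    libc.strerror.restype = ctypes.c_char_p
    libc.strerror.argtypes = [ctypes.c_int]
    raw = libc.strerror(code)  # bytes encoded in the locale encoding
    # surrogateescape keeps undecodable bytes instead of raising.
    return raw.decode(locale_encoding(), "surrogateescape")


message = c_strerror(errno.ENOENT)
```

Decoding the same bytes with sys.getfilesystemencoding() could give a different 
result after a setlocale() call, which is the mojibake case described above.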

--

The problem with a new codec is that it becomes more difficult to choose the 
right encoding:

 * filesystem encoding: filenames, directory names, hostname, environment 
variables, command line arguments
 * mbcs (ANSI code page, Windows only): basically just an alias of the 
filesystem encoding
 * locale: write bindings for new C functions?

I suppose that this issue can be solved by writing documentation explaining the 
usage of each codec.
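
As a rough illustration of that choice, using only functions that already 
exist in the standard library (the file name is made up for the example): 
filenames go through the filesystem encoding, while the result of a 
locale-dependent C function would go through the LC_CTYPE encoding.

```python
import locale
import os
import sys

# Filenames, environment variables, command line arguments:
# use the filesystem encoding through os.fsencode()/os.fsdecode().
raw_name = os.fsencode("report.txt")
name = os.fsdecode(raw_name)

# Results of locale-dependent C functions: use the LC_CTYPE encoding.
# The two encodings are often equal, but they are not guaranteed to be.
fs_encoding = sys.getfilesystemencoding()
lc_encoding = locale.getpreferredencoding(False)
```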

--

Attached patch adds the new locale codec.

The major limitation of the current implementation is that the codec only 
supports the strict and the surrogateescape error handlers. I don't plan to 
implement other error handlers because I don't think that they would be useful, 
but it would be possible to implement them.
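
To show what the two supported error handlers do, here is a short sketch that 
uses the existing "utf-8" codec as a stand-in, since the "locale" codec is only 
proposed here. The behaviour of strict and surrogateescape shown below is that 
of the standard handlers:

```python
raw = b"caf\xe9"  # Latin-1 bytes, invalid as UTF-8

# strict: undecodable bytes raise an exception.
try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape: each undecodable byte 0xNN becomes U+DCNN, and
# encoding with the same handler restores the original bytes.
text = raw.decode("utf-8", "surrogateescape")
round_trip = text.encode("utf-8", "surrogateescape")
```

The lossless round trip is what makes surrogateescape suitable for data coming 
from the operating system.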

--

It would be "nice" to fix os.strerror() and time.strftime() in Python 3.2, but I 
don't want to, because it would require adding the locale codec and I don't 
want to make such a change in a stable version. The issue only concerns the few 
people who change their locale encoding at runtime. I hope that everybody uses 
UTF-8 and never changes their locale encoding to something else ;-)

----------
components: Library (Lib)
messages: 149660
nosy: haypo, loewis
priority: normal
severity: normal
status: open
title: Add a new codec: "locale", the current locale encoding
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13619>
_______________________________________