David Malcolm wrote:
> I'm thinking of making this downstream change to Fedora's site.py (and
> possibly in future RHEL releases) so that the default encoding
> automatically picks up the encoding from the locale:
> 
>  def setencoding():
>      """Set the string encoding used by the Unicode implementation.  The
>      default is 'ascii', but if you're willing to experiment, you can
>      change this."""
>      encoding = "ascii" # Default value set by _PyUnicode_Init()
> -    if 0:
> +    if 1:
>          # Enable to support locale aware default string encodings.
>          import locale
>          loc = locale.getdefaultlocale()
>          if loc[1]:
>              encoding = loc[1]
>      if 0:
>          # Enable to switch off string to Unicode coercion and implicit
>          # Unicode to string conversion.
>          encoding = "undefined"
>      if encoding != "ascii":
>          # On Non-Unicode builds this will raise an AttributeError...
>          sys.setdefaultencoding(encoding) # Needs Python Unicode build !
> 
> I've written up extensive notes on the change and the history of the
> issue here:
> https://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale
> 
> Please let me know if there are any errors on that page!
> 
> The aim is to avoid strange behavior changes when running a script
> within a shell pipeline/cronjob as opposed to at a tty (and to capture
> some of the bizarre corner cases; for example, I found the behavior of
> the pango/pygtk modules particularly surprising).
> 
> I mention it here as a "heads-up" about the change:
>   - in case other distributions may want to do the same (or already do
> so, though in my very brief survey no-one else seemed to), and
>   - in case doing so breaks things in a way I'm not expecting; can
> anyone see any flaws in my arguments?
>   - in case other people find my notes on the issue useful
> 
> Hope this is helpful; can anyone see any potential problems with this
> change?

Yes: such a change is unsupported by Python. The code you are
changing should really have been removed many releases ago;
it was originally intended only to serve as a basis for experimentation
on choosing the "right" default encoding.

The only supported default encodings in Python are:

 Python 2.x: ASCII
 Python 3.x: UTF-8

If you change these, you are on your own and strange things will
start to happen. The default encoding affects not only
the translation between Python and the outside world, but also
all internal conversions between 8-bit strings and Unicode.
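For illustration, here is a minimal Python 3 sketch (the helper name report_encodings is hypothetical) that puts the fixed interpreter-wide default encoding next to the locale-dependent encoding. Only the second value changes between a tty session and a cron job or pipeline; on Python 2.x the first value would be 'ascii' rather than 'utf-8'.

```python
import locale
import sys


def report_encodings():
    """Return the interpreter-wide default encoding alongside the
    locale-dependent preferred encoding (hypothetical helper)."""
    return {
        # Fixed by the interpreter: 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Derived from the locale settings (LANG, LC_ALL, LC_CTYPE on POSIX),
        # so it can differ between a terminal and a pipeline or cron job.
        "locale": locale.getpreferredencoding(False),
    }


# With no explicit codec, str.encode() and bytes.decode() use UTF-8 on
# Python 3, independent of the locale.
encoded = "café".encode()
decoded = encoded.decode()
```

Calling report_encodings() under different LANG settings should show the "locale" entry moving while the "default" entry stays put.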

Hacks like the one in the pango module (setting the
default encoding to 'utf-8' by reloading the site module in
order to get the sys.setdefaultencoding() API back) are just
downright wrong and will cause serious problems, since Unicode
objects cache their default encoded representation.
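Why the cache matters can be shown with a toy model. The class ToyUnicode below is hypothetical and is not CPython's implementation; it only mimics the behavior described above: encoded bytes are cached on first use and never invalidated, so switching the default at runtime leaves equal strings with different encoded forms.

```python
class ToyUnicode:
    """Hypothetical toy model, not CPython's real code: a text object that
    caches its default-encoded bytes the first time they are requested."""

    default_encoding = "ascii"  # stands in for the process-wide default

    def __init__(self, text):
        self.text = text
        self._cached = None  # filled lazily, then never invalidated

    def default_encoded(self):
        if self._cached is None:
            self._cached = self.text.encode(ToyUnicode.default_encoding)
        return self._cached


def demonstrate_stale_cache():
    """Encode the same text before and after the default is changed."""
    ToyUnicode.default_encoding = "latin-1"
    early = ToyUnicode("café")
    early.default_encoded()  # caches the Latin-1 bytes

    # Later, something (e.g. a module reloading site) switches the default.
    ToyUnicode.default_encoding = "utf-8"
    late = ToyUnicode("café")

    # Equal text, but the two objects now report different encoded bytes.
    return early.default_encoded(), late.default_encoded()
```

In this model the first object keeps returning b'caf\xe9' while the second returns b'caf\xc3\xa9', even though both hold the same text.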

Please don't enable the use of a locale based default encoding.

If all you want to achieve is getting the encodings of
stdout and stdin set up correctly for pipes, you should
instead change the .encoding attribute of those streams (only).
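A minimal sketch of the per-stream approach, assuming Python 3.7 or later, where io.TextIOWrapper.reconfigure() exists (it postdates this thread, and Python 2 file objects do not offer it). The helper set_stream_encoding is hypothetical; it falls back to the locale's preferred encoding and leaves sys.getdefaultencoding() untouched.

```python
import io
import locale
import sys


def set_stream_encoding(stream, encoding=None, errors="strict"):
    """Change the encoding of one text stream (e.g. sys.stdout) without
    touching the interpreter-wide default encoding (hypothetical helper)."""
    if encoding is None:
        # Fall back to what the locale says, as a terminal session would.
        encoding = locale.getpreferredencoding(False)
    # TextIOWrapper.reconfigure() is available on Python 3.7+.
    stream.reconfigure(encoding=encoding, errors=errors)
    return stream
```

A script could call set_stream_encoding(sys.stdout) only when sys.stdout.isatty() is false, so that pipeline and cron output match the terminal, while implicit conversions elsewhere keep using the fixed default.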

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 20 2010)
   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev