STINNER Victor <victor.stin...@haypocalc.com> added the comment:

About pwd, we have 7 fields:
 - username: the regex looks like « [a-za-z0-...@]
[a-za-z0-...@\/]*$? », so it's ASCII only
 - password: ASCII only? On my Ubuntu, /etc/passwd uses "x" for all 
passwords, and /etc/shadow uses MD5 hashes with a prefix 
like "$1$x6vJEXyc$" (MD5 marker + salt)
 - user identifier: integer (ASCII)
 - main group identifier: integer (ASCII)
 - GECOS: user text
 - shell: filename
 - home directory: filename

We can expect GECOS and filenames to be encoded in the "default system 
locale" (e.g. latin-1 or UTF-8). A user is allowed to change their own 
GECOS field. If the user account uses a different locale and sets a 
non-ASCII GECOS, decoding the string (to unicode) will fail.
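
To illustrate the failure mode, here is a minimal sketch. The GECOS value 
is made up, and try_decode() is a hypothetical helper, not part of any 
patch: it only shows that bytes written under a latin-1 locale cannot be 
decoded on a system that assumes UTF-8.

```python
# Made-up GECOS bytes, as a user with a latin-1 locale might have stored them.
gecos_raw = "R\xe9mi".encode("latin-1")  # b'R\xe9mi'


def try_decode(raw, encoding):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xE9 followed by an ASCII byte is not a valid UTF-8 sequence.
decoded_utf8 = try_decode(gecos_raw, "utf-8")
# latin-1 maps every byte to a code point, so this always succeeds.
decoded_latin1 = try_decode(gecos_raw, "latin-1")
```

The fact that latin-1 never fails is exactly why it is tempting as an 
automagic charset, and why it hides the real encoding of the data.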

Your patch latin1.diff is wrong: the charset is not always latin-1 or 
always UTF-8; it depends on the system default charset. You should use 
sys.getfilesystemencoding() or locale.getpreferredencoding() to get 
the right encoding. If you used latin-1 as an automagic charset to get 
text as bytes, that is not the right solution: use the bytes type to get 
real bytes (as you implemented with your get*b() functions).
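
A rough sketch of what I mean by asking the system for the encoding. 
decode_field() is a hypothetical name used only for illustration, and the 
path is a made-up example; the two lookup functions are the real ones 
named above.

```python
import locale
import sys

# Both calls report what the running system is configured to use,
# instead of hard-coding latin-1 or UTF-8.
fs_encoding = sys.getfilesystemencoding()
preferred_encoding = locale.getpreferredencoding()


def decode_field(raw, encoding=fs_encoding):
    """Hypothetical helper: decode one passwd field with the system charset.

    Raises UnicodeDecodeError if the bytes do not match that charset.
    """
    return raw.decode(encoding)


# Pure ASCII bytes decode the same way under any ASCII-compatible charset.
home = decode_field(b"/home/example")
```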

The situation is similar to the bytes/unicode filename debate (see 
issue #3187). I think we can assume that a correctly configured 
system uses the same locale for all user accounts => use 
unicode. But for compatibility with old systems mixing different 
locales, or new systems with locale problems => use bytes.

The default should be unicode, but we need to be able to get all fields 
as bytes. Example:
  pwd.getpwnam(str) -> str fields (and integers for uid/gid)
  pwd.getpwnamb(bytes) -> bytes fields (and integers for uid/gid)
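
The str/bytes split above could look roughly like this. It is only a 
sketch of the idea, not the pwd implementation: parse_passwd_bytes() and 
parse_passwd_str() are hypothetical names, the sample line is made up, and 
the parsing assumes the usual colon-separated /etc/passwd layout.

```python
def parse_passwd_bytes(line):
    """Hypothetical bytes variant: text fields stay bytes, uid/gid are ints."""
    name, passwd, uid, gid, gecos, home, shell = line.rstrip(b"\n").split(b":")
    # uid and gid are ASCII digits, so decoding them as ASCII is safe.
    return (name, passwd, int(uid.decode("ascii")), int(gid.decode("ascii")),
            gecos, home, shell)


def parse_passwd_str(line, encoding):
    """Hypothetical str variant: decode every bytes field, keep ints as-is."""
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in parse_passwd_bytes(line)
    )


# Made-up entry in /etc/passwd format.
sample = b"alice:x:1000:1000:Alice Example:/home/alice:/bin/sh\n"

record_bytes = parse_passwd_bytes(sample)
record_str = parse_passwd_str(sample, "utf-8")
```

The bytes variant can never fail on a badly encoded GECOS; the str variant 
can, which is why both are needed.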

We already have bytes/unicode function pairs using the "b" suffix: 
os.getcwd()->str and os.getcwdb()->bytes.

Note: The GECOS field problem was already reported in issue #3023 (by 
baikie).

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4859>
_______________________________________