Re: Beginner python 3 unicode question

Laszlo Nagy Sat, 16 Nov 2013 13:21:53 -0800

Why is it behaving differently on the command line? What should I do to fix this?

I was experimenting with this a bit more and found some more confusing things. Can somebody please enlighten me?


Here is a test function:


    def password_hash(self,password):
        public = bytearray([random.randint(0,255) for _ in range(5)])
        private = bytearray([random.randint(0,255)])
        pwd = bytearray(password.encode())
        digest = hashlib.sha1(public+pwd+private).digest()
        print("digest",digest,type(digest))
        print("de",digest.encode())
        # and some more stuff here...

This function was called inside a script, and gave me this:

('digest', '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b', <type 'str'>)

Traceback (most recent call last):
  File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 478, in <module>
    pwmgr.run(parser,args)
  File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 241, in run
    self.authdb.user_create(name,password,propvalues)
  File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 205, in user_create
    "password":(password and Binary(self.password_hash(password))) or None,
  File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 134, in password_hash
    print("de",digest.encode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
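
For what it is worth, the tuple-style output and <type 'str'> shown for the script are what Python 2 prints for this print(...) call. In Python 2, str.encode() first decodes the byte string with the ASCII codec, which is where a UnicodeDecodeError can come from. A minimal sketch to check which interpreter actually runs the script (the helper name interpreter_info is made up for illustration and is not part of the original code):

```python
import sys


def interpreter_info():
    """Return a short description of the interpreter running this code."""
    # Hypothetical helper, for illustration only.
    major, minor = sys.version_info[0], sys.version_info[1]
    return "Python %d.%d at %s" % (major, minor, sys.executable)


print(interpreter_info())
```

Calling this at the top of the script and again in the interactive shell should show whether both really use the same Python.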


Then I have tried the very same thing from the interactive shell:

gandalf@gandalf-HP-G62-Notebook-PC:~/Python/Projects/appserver$ python3
Python 3.3.1 (default, Sep 25 2013, 19:29:01)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> digest = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'
>>> digest.encode()
b'\xc2\xa0\xc2\x98\xc2\x8b\xc3\xbf\x04\xc3\xb9V;\xc2\xbd\x1eIHzh\x10-\xc3\x85!\x14\x1b'
>>>

WHAT??? It seems that the default value of the encoding parameter of the str.encode method is different when I start it interactively. But this contradicts its documentation:


>>> print(digest.encode.__doc__)
S.encode(encoding='utf-8', errors='strict') -> bytes

Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

So is the default utf-8 or not? Should the documentation be updated? Or do we have a bug in the interactive shell?
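
A sketch of what Python 3 itself does with these two kinds of objects, which may help separate the cases. It assumes the code runs under Python 3, and the input b"example" is a placeholder rather than the real password data. In Python 3 a hashlib digest is a bytes object, which has no encode method, while a string typed into the shell is a str of code points, so encoding it as UTF-8 yields two bytes for each character above U+007F:

```python
import hashlib

# Placeholder input; any bytes value works here.
digest = hashlib.sha1(b"example").digest()

# In Python 3 the digest is a bytes object, and bytes has no .encode().
digest_is_bytes = isinstance(digest, bytes)
digest_has_encode = hasattr(digest, "encode")

# A str literal containing \xa0 is the single code point U+00A0.
text = "\xa0"
utf8_bytes = text.encode()              # default codec is UTF-8
latin1_bytes = text.encode("latin-1")   # one byte per code point 0-255

# For a printable form of a digest, hexdigest() avoids encoding questions.
hex_form = hashlib.sha1(b"example").hexdigest()

print(digest_is_bytes, digest_has_encode)
print(utf8_bytes, latin1_bytes)
print(len(digest), len(hex_form))
```

If the second value on the first output line is True, the code is running under Python 2, where str is a byte string that does have an encode method.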




