[issue34437] print statement using \x results in improper and extra bytes

2018-08-19 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

You wrote:

> There are 6 bytes not 4 and where did the c3, bd, and c2 come from?

In Python 2, strings are byte strings; in Python 3, strings are Unicode text 
strings by default. What you are seeing is the UTF-8 encoding of that text 
string.

py> "\xfd\x84\x04\x08".encode('utf-8')
b'\xc3\xbd\xc2\x84\x04\x08'

So the behaviour in Python 3 is correct and not a bug; it has simply changed 
(intentionally) from Python 2.
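
To get the original four bytes out of Python 3, one option is to work with a 
bytes object and write it to a binary stream instead of calling print() on a 
text string. The following is only a minimal sketch: the helper name 
write_raw and the in-memory sink are made up for illustration, and 
sys.stdout.buffer is assumed to be available (it can be missing if stdout has 
been replaced by some other object).

```python
import io
import sys

payload_text = "\xfd\x84\x04\x08"    # str: four code points, not four bytes
payload_bytes = b"\xfd\x84\x04\x08"  # bytes: exactly four bytes

# Encoding the text string as UTF-8 expands the two code points >= 0x80.
utf8 = payload_text.encode("utf-8")
# Latin-1 maps code points 0x00-0xFF one-to-one onto single bytes.
latin1 = payload_text.encode("latin-1")


def write_raw(data, stream=None):
    """Write bytes unchanged to a binary stream (hypothetical helper).

    Falls back to sys.stdout.buffer, the binary layer under sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(data)
    stream.flush()


# Demonstrate against an in-memory binary stream rather than the terminal.
sink = io.BytesIO()
write_raw(payload_bytes, sink)
```

Calling write_raw(payload_bytes) with no stream argument sends the bytes to 
standard output without any text encoding step and without the trailing 
newline that print() adds.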

A web search may help you find more about this:

https://duckduckgo.com/?q=python3+write+bytes+to+stdout

--
nosy: +steven.daprano
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 

___



[issue34437] print statement using \x results in improper and extra bytes

2018-08-19 Thread Nathan Benson

New submission from Nathan Benson :

While writing some shellcode I uncovered an unusual bug where Python 3 seems to 
print out incorrect (and extra) hex bytes using the print statement with \x. 
Needless to say I was pulling my hair out trying to figure out why my shellcode 
wasn’t working.  Python 2 behaves as expected.

I haven't tested the latest version of Python 3, but all the versions prior to 
that seem to have the bug.  I’ve been able to reproduce the bug in Ubuntu Linux 
and on my Mac.

For example, when printing "\xfd\x84\x04\x08" I expect to get back 
"fd 84 04 08", but Python 3 seems to add bytes beginning with c2 and c3 and 
toss in random bytes.

For the purpose of these demonstrations:

  Akame:~ jfa$ python2 --version
  Python 2.7.15

  Akame:~ jfa$ python3 --version
  Python 3.7.0


Here is Python 2 operating as expected:

Akame:~ jfa$ python2 -c 'print("\xfd\x84\x04\x08")' | hexdump -C
00000000  fd 84 04 08 0a                                    |.....|
00000005


Here is Python 3 with the exact same print statement:

Akame:~ jfa$ python3 -c 'print("\xfd\x84\x04\x08")' | hexdump -C
00000000  c3 bd c2 84 04 08 0a                              |.......|
00000007

There are 6 bytes not 4 and where did the c3, bd, and c2 come from?

Playing around with it a little more, it seems the problem arises when 
printing bytes whose first hex digit is 8, 9, or a-f:

Here is a-f:

Akame:~ jfa$ for b in {a..f}; do echo "\x${b}0"; python3 -c "print(\"\x${b}0\")" | hexdump -C; done
\xa0
00000000  c2 a0 0a                                          |...|
00000003
\xb0
00000000  c2 b0 0a                                          |...|
00000003
\xc0
00000000  c3 80 0a                                          |...|
00000003
\xd0
00000000  c3 90 0a                                          |...|
00000003
\xe0
00000000  c3 a0 0a                                          |...|
00000003
\xf0
00000000  c3 b0 0a                                          |...|
00000003


Here is 0-9 (notice everything is fine until 8):

Akame:~ jfa$ for b in {0..9}; do echo "\x${b}0"; python3 -c "print(\"\x${b}0\")" | hexdump -C; done
\x00
00000000  00 0a                                             |..|
00000002
\x10
00000000  10 0a                                             |..|
00000002
\x20
00000000  20 0a                                             | .|
00000002
\x30
00000000  30 0a                                             |0.|
00000002
\x40
00000000  40 0a                                             |@.|
00000002
\x50
00000000  50 0a                                             |P.|
00000002
\x60
00000000  60 0a                                             |`.|
00000002
\x70
00000000  70 0a                                             |p.|
00000002
\x80
00000000  c2 80 0a                                          |...|
00000003
\x90
00000000  c2 90 0a                                          |...|
00000003



Here are the same tests with Python 2:

Akame:~ jfa$ for b in {a..f}; do echo "\x${b}0"; python2 -c "print(\"\x${b}0\")" | hexdump -C; done
\xa0
00000000  a0 0a                                             |..|
00000002
\xb0
00000000  b0 0a                                             |..|
00000002
\xc0
00000000  c0 0a                                             |..|
00000002
\xd0
00000000  d0 0a                                             |..|
00000002
\xe0
00000000  e0 0a                                             |..|
00000002
\xf0
00000000  f0 0a                                             |..|
00000002


Akame:~ jfa$ for b in {0..9}; do echo "\x${b}0"; python2 -c "print(\"\x${b}0\")" | hexdump -C; done
\x00
00000000  00 0a                                             |..|
00000002
\x10
00000000  10 0a                                             |..|
00000002
\x20
00000000  20 0a                                             | .|
00000002
\x30
00000000  30 0a                                             |0.|
00000002
\x40
00000000  40 0a                                             |@.|
00000002
\x50
00000000  50 0a                                             |P.|
00000002
\x60
00000000  60 0a                                             |`.|
00000002
\x70
00000000  70 0a                                             |p.|
00000002
\x80
00000000  80 0a                                             |..|
00000002
\x90
00000000  90 0a                                             |..|
00000002


As you can see, Python 2 works as expected, while Python 3, when printing a 
\x escape whose first hex digit is 8, 9, or a-f, seems to replace the byte 
with a c2 or c3 followed by another byte of data.
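
The c2/c3 pattern matches the two-byte form of UTF-8, which is used for code 
points U+0080 through U+07FF: the lead byte is 110xxxxx and the continuation 
byte is 10xxxxxx. A minimal sketch of that arithmetic follows, assuming the 
output encoding is UTF-8 as it was in the runs above; the function name 
utf8_two_byte is made up for illustration.

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF using the UTF-8 two-byte form."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte form covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)    # 110xxxxx: upper five payload bits
    cont = 0x80 | (code_point & 0x3F)  # 10xxxxxx: lower six payload bits
    return bytes([lead, cont])


# Code points 0x80-0xBF get lead byte c2; 0xC0-0xFF get lead byte c3.
encoded_fd = utf8_two_byte(0xFD)
encoded_84 = utf8_two_byte(0x84)
```

Code points below 0x80 are encoded as a single identical byte, which is why 
\x00 through \x70 came out unchanged in the Python 3 runs.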

--
messages: 323773
nosy: Nathan Benson
priority: normal
severity: normal
status: open
title: print statement using \x results in improper and extra bytes
type: behavior
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7
