On 2020-02-27 17:14, Serge Guelton wrote:
On Thu, Feb 27, 2020 at 10:51:39AM -0500, Charalampos Stratakis wrote:
Hello folks,

I recently observed a failure on the s390x fedora rawhide buildbot, on the 
clang builds, when clang got updated to version 10:
     https://bugs.python.org/issue39689

The call:
     struct.unpack('?', b'\xf0')
means to unpack a "native bool", i.e. native size and alignment. Internally, 
this does:

     static PyObject *
     nu_bool(const char *p, const formatdef *f)
     {
         _Bool x;
         memcpy((char *)&x, p, sizeof x);
         return PyBool_FromLong(x != 0);
     }

i.e., copies "sizeof x" (1 byte) of memory to a temporary buffer x, and then 
treats that as _Bool.

While I don't have access to the C standard, I believe it says that assigning 
a true value to _Bool can coerce it to a unique "true" value. It seems that if 
a char doesn't have the exact bit pattern for true or false, reinterpreting its 
bytes as _Bool is undefined behavior. Is that correct?

Clang 10 on s390x seems to take advantage of this: it probably only looks at 
the last bit(s) so a _Bool with a bit pattern of 0xf0 turns out false.
But the tests assume that 0xf0 should unpack to True.
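
To make the arithmetic behind that guess concrete, here is a minimal sketch in C. The names `truth_low_bit`, `truth_nonzero` and `check_0xf0` are hypothetical and only model the two interpretations; this is not the code Clang generates, nor what CPython does.

```c
#include <assert.h>

/* Hypothetical model: a compiler that assumes a _Bool only ever holds
   0 or 1 may look at just the lowest bit of its storage. */
int truth_low_bit(unsigned char byte)
{
    return byte & 1;
}

/* Hypothetical model: what the test expects, i.e. any nonzero byte
   counts as true. */
int truth_nonzero(unsigned char byte)
{
    return byte != 0;
}

/* Self-check for the byte used in the failing test. */
void check_0xf0(void)
{
    assert(truth_low_bit(0xf0) == 0);   /* 1111 0000: lowest bit clear */
    assert(truth_nonzero(0xf0) == 1);   /* but the byte is not zero */
}
```

For 0xf0 the two readings disagree, which would match the False/True mismatch seen on the buildbot.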

I don't think it's specific to Clang 10, or the s390x arch. Have a look at

     https://godbolt.org/z/3n-LqN

clang indeed just checks for the lowest bit. Is it correct? I think so. _Bool
can only hold two values, 0 and 1, [0] which is different from an int, which is
true or false depending on whether it is different from or equal to 0. GCC and
Clang agree on that:

     https://godbolt.org/z/koc4Pb
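
For contrast, converting a value to _Bool (as opposed to copying raw bytes into one) is well defined: any nonzero value becomes 1. A minimal sketch, where `bool_from_value` and `check_value_conversion` are hypothetical names used only for illustration and are not taken from the links above:

```c
#include <assert.h>

/* Value conversion to _Bool is well defined: zero stays 0, any nonzero
   value becomes 1. No bytes are reinterpreted here. */
int bool_from_value(int v)
{
    _Bool b = (_Bool)v;
    return b;   /* promoted back to int: always 0 or 1 */
}

/* Hypothetical self-check; call it from main() to run the assertions. */
void check_value_conversion(void)
{
    assert(bool_from_value(0xf0) == 1);
    assert(bool_from_value(0) == 0);
    assert(bool_from_value(-1) == 1);
}
```

So the problem is not the value 0xf0 itself, only the way it reaches the _Bool object.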

So yeah, according to that rule, the value set in `p` didn't come from a _Bool
if it has the 0xf0 value. So you're reinterpreting memory between two different
types (type-punning), and that's UB.

Quick and obvious fix:

      static PyObject *
      nu_bool(const char *p, const formatdef *f)
      {
          char x;
          memcpy((char *)&x, p, sizeof x);
          return PyBool_FromLong(x != 0);
      }

(This assumes size of _Bool is the same as size of char, which I guess is also UB? But I guess we can add a build-time assertion for that, and say we don't support platforms where that's not the case.)
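
A build-time assertion along those lines could look like the sketch below. It assumes a C11 compiler (for `static_assert` from `<assert.h>`); `byte_to_truth` is a hypothetical standalone helper that mirrors the fix without the CPython types, not the actual patch. As far as I know the size of _Bool is implementation-defined, so the assertion simply rejects platforms where it is not one byte.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

/* Build-time check: refuse to compile where the native bool is not
   exactly one char wide. */
static_assert(sizeof(_Bool) == sizeof(char),
              "native _Bool is expected to be one byte");

/* Hypothetical helper (not CPython's nu_bool): copy the byte into an
   unsigned char, which has no invalid bit patterns, then compare it
   against zero. Returns 1 for any nonzero byte, 0 otherwise. */
int byte_to_truth(const char *p)
{
    unsigned char x;
    memcpy(&x, p, sizeof x);
    return x != 0;
}
```

With this, a byte of 0xf0 is reported as true regardless of how the compiler chooses to test a _Bool.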


So thanks! I'm left with a question for CPython's struct experts, which is better kept to the bug tracker: https://bugs.python.org/issue39689#msg362815
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/364VZPYLOTVTXD6SXH4T4E36K25WM4B2/
