[PATCH] D30773: Make git-clang-format python 3 compatible

Michał Górny via Phabricator via cfe-commits Sat, 11 Mar 2017 03:11:00 -0800

mgorny added inline comments.

================
Comment at: tools/clang-format/git-clang-format:306
+    try:
+        return to_string(bytes.decode('utf-8'))
+    except AttributeError: # 'str' object has no attribute 'decode'.
----------------
EricWF wrote:
> mgorny wrote:
> > EricWF wrote:
> > > mgorny wrote:
> > > > This logic looks really weird to me. What is the purpose of having both 
> > > > `to_string()` and `convert_string()`? Why do `to_bytes()` and 
> > > > `to_string()` use `isinstance()` to recognize types, and here you rely 
> > > > on exceptions? Why is `to_string()` called after decoding?
> > > `to_string` is called after decoding because in python2 the result of 
> > > decoding is a `unicode` type, and we need to encode it a `str` type. 
> > > Hense to_string.
> > No offense intended but this sounds really horrible. In modern Python, 
> > everything is either `bytes` or `unicode`. The difference basically is that 
> > `str` in py2 was pretty much `bytes` (except that `bytes` explicitly 
> > removes some operations that are unsuitable for bytestrings), and that 
> > `str` in py3 is equivalent to `unicode` before.
> > 
> > So if you are specifically converting to `str`, it means that you want to 
> > have two distinct types in py2/py3. Which really sounds like you're doing 
> > something wrong.
> No offence taken. I had to do way to much work to answer your questions 
> accurately which means the code is way too complicated or non-obvious.
> 
> However there are a couple of things you got wrong. In python2 everything is 
> normally `bytes`, `str`, or `unicode`. The `to_string` method converts both 
> `unicode` and `bytes` to the type `str`. 
> 
> > So if you are specifically converting to str, it means that you want to 
> > have two distinct types in py2/py3. Which really sounds like you're doing 
> > something wrong.
> 
> I don't think having the Python library return different types in py2/py3 
> means something is wrong. In fact that's exactly what's happening and  that's 
> exactly what `convert_string` is trying to fix. In python 2 `stdout, stderr, 
> and stdin` return strings, but in python 3 they return `bytes`. 
> `convert_string` is meant to transform these different types into `str`.
> 
> Regardless I'll remove the call to `to_bytes` from inside `to_string` because 
> it's too confusing.
In Python 2, `bytes` and `str` is the same type, e.g.:
```
In [13]: bytes('foo').__class__
Out[13]: str
```

In other words, in each version of Python there are two kinds of strings: 
binary strings (`std::string` in C++) and Unicode strings (alike 
`std::u32string`). In Python 2, `str` is binary string (`bytes` is also 
accepted for compatibility) and `unicode` is the Unicode string. In Python 3, 
`bytes` is binary string and `str` is Unicode string.

---

Now, let's consider subprocess streams. In Python 2 they use `str` -- which 
means binary strings. In Python 3 they use `bytes` -- i.e. also binary strings. 
So there's really no change here, except that Python 3 is more strict in 
handling different types.

So if you plan to use the data as binary data, you can just use the data 
directly. If you need to use Unicode strings, you can reliably use `.decode()` 
and `.encode()` to convert. If you really believe you need to use `str` (i.e. 
two distinct types at the same time), it just means that the code is most 
likely wrong and partially relies on characteristic of one type, and partially 
on the other.

---

For completeness, I should also mention that string types used by standard 
system streams (i.e. `sys.std*`) *have* changed between Python 2 and 3. Python 
2 is operating them in `bytes` mode by default, while Python 3 is operating 
them (transcoding) in `str` (i.e. unicode) mode. In the latter case, you can 
use the `.buffer` attribute to access the underlying `bytes` stream -- e.g. 
`sys.stdin.buffer` will yield bytes.

https://reviews.llvm.org/D30773

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30773: Make git-clang-format python 3 compatible

Reply via email to