[issue19251] bitwise ops for bytes of equal length
cowlicks added the comment: I'll look through the list @serhiy.storchaka posted and make sure this still seems sane to me. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue19251> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue19251] bitwise ops for bytes of equal length
cowlicks added the comment:

@gvanrossum in this previous comment https://bugs.python.org/issue19251?@ok_message=msg%20264184%20created%0Aissue%2019251%20message_count%2C%20messages%20edited%20ok&@template=item#msg257964 I pointed out code from the wild which would be more readable, and posted preliminary benchmarks. But there is a typo in it; I should have written:

    def __mix_single_column(self, a):
        t = len(a) * bytes([reduce(xor, a)])
        a ^= t ^ xtime(a ^ (a[1:] + a[0:1]))

As @gregory.p.smith points out, my claim about security isn't very clear. This would be "more secure" for two reasons. First, code would be easier to read and therefore easier to verify, but this is the same point as readability. Second, a binary bitwise op on two bytes objects enforces that the objects be the same length, so unexpected bugs in code samples like these would be avoided:

    bytes(x ^ y for x, y in zip(a, b))

    (int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).to_bytes(len(a), 'big')

    # XOR each byte of the roundKey with the state table
    def addRoundKey(state, roundKey):
        for i in range(len(state)):
            state[i] = state[i] ^ roundKey[i]
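To make the length point concrete, here is a minimal sketch in current Python. The helper name xor_bytes is hypothetical and is not part of the patch; it only imitates, in pure Python, the equal-length check that the proposed "bytes ^ bytes" would perform.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two bytes objects, refusing inputs of different lengths.

    Hypothetical helper for illustration; not the code from the patch.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return bytes(x ^ y for x, y in zip(a, b))


key = b"\x01\x02\x03\x04"
short = b"\xff\xff"

# zip() stops at the shorter input, so the mismatch goes unnoticed
# and the result is silently two bytes long instead of four.
silently_truncated = bytes(x ^ y for x, y in zip(key, short))

# The checked version raises instead of returning a truncated result.
try:
    xor_bytes(key, short)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With the zip idiom a wrong-length key yields a shorter result and no error; the checked helper turns the same mistake into an exception.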
[issue19251] bitwise ops for bytes of equal length
cowlicks added the comment:

To reiterate, this change would make a lot of code more readable, more secure, and faster.

The concerns about this being a numpy-like vector operation are confusing to me. The current implementation is already vector-like, but lacks size checking. Isn't "int ^ int" really just the xor of two arbitrarily long arrays of binary data? At least with "bytes ^ bytes" we can enforce that the arrays be the same size.
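A small illustration of the "lacks size checking" point, using only int.from_bytes and int.to_bytes as they exist today. The variable names are made up for the example.

```python
a = b"\x00\x00\x12\x34"   # 4 bytes
b = b"\x56"               # 1 byte, clearly not the same size as a

# An int carries no record of how many bytes it came from, so the
# shorter operand simply behaves as if it had leading zero bytes.
as_int = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

# The caller has to supply the output length by hand; nothing checks
# that a and b were the same size to begin with.
result = as_int.to_bytes(len(a), "big")
```

No exception is raised here even though the two inputs differ in length, which is the behaviour a length-checked "bytes ^ bytes" would rule out.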
[issue19251] bitwise ops for bytes of equal length
cowlicks added the comment:

@Andrew Barnert

> Maybe if you're coming to Python from...

I'm not sure if you're trying to argue that my expectations are unusual? Python is my first programming language. To reiterate: I expected cpython to support bitwise operations on binary data. I don't think that is so strange.

No, I have not looked at PyPI. What I did was have an idea to do this, and there happened to be an open bug on it that needed a patch. So I wrote one. And yes, I realize NumPy can do this, but it is still a very large dependency.

Anyway, here are some random projects which would look a lot nicer with this:

An implementation of the blake2 hash function in pure python. Consider this line: https://github.com/buggywhip/blake2_py/blob/master/blake2.py#L234

    self.h = [self.h[i] ^ v[i] ^ v[i+8] for i in range(8)]

Which would become something like:

    self.h ^= v[:8] ^ v[8:]

Which is much easier to read and much faster.

Or consider this function from this aes implementation: https://github.com/bozhu/AES-Python/blob/master/aes.py#L194-L201

    def __mix_single_column(self, a):
        # please see Sec 4.1.2 in The Design of Rijndael
        t = a[0] ^ a[1] ^ a[2] ^ a[3]
        u = a[0]
        a[0] ^= t ^ xtime(a[0] ^ a[1])
        a[1] ^= t ^ xtime(a[1] ^ a[2])
        a[2] ^= t ^ xtime(a[2] ^ a[3])
        a[3] ^= t ^ xtime(a[3] ^ u)

This would become something like:

    def __mix_single_column(self, a):
        a ^= a ^ xtime(a ^ (a[1:] + a[0:1]))

Clearer and faster.

Another piece of code this would improve: https://github.com/mgoffin/keccak-python/blob/master/Keccak.py#L196-L209

These were easy to find, so I'm sure there are more. I think these demonstrate that, despite what people *should* be doing, they are doing things in a way that could be substantially improved with this patch.

This does resemble NumPy's vectorized functions, but it is much more limited in scope since there is no broadcasting involved.
Here is a quick benchmark:

    $ ./python -m timeit -n 10 -s "a=b'a'*64; b=b'b'*64" "(int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little')).to_bytes(64, 'little')"
    10 loops, best of 3: 0.942 usec per loop
    $ ./python -m timeit -n 10 -s "a=b'a'*64; b=b'b'*64" "a ^ b"
    10 loops, best of 3: 0.041 usec per loop

NumPy is the slowest, but I'm probably doing something wrong, and it's in IPython so I'm not timing the import:

    In [13]: %timeit bytes(np.frombuffer(b'b'*64, dtype=np.int8) ^ np.frombuffer(b'a'*64, dtype=np.int8))
    10 loops, best of 3: 3.69 µs per loop

So "a ^ b" with the patch is about 20 times faster than the int.from_bytes round trip.
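For anyone who wants to repeat the comparison without a patched interpreter, the following is a rough sketch that times the two pure-Python idioms mentioned in this thread on an unpatched build. The function names are hypothetical, the loop counts are arbitrary, and the numbers it prints will differ from the ones above; it cannot time the patched "a ^ b" itself.

```python
import timeit

a = b"a" * 64
b = b"b" * 64


def xor_via_int(x, y):
    # Round trip through arbitrary-precision integers.
    n = int.from_bytes(x, "little") ^ int.from_bytes(y, "little")
    return n.to_bytes(len(x), "little")


def xor_via_zip(x, y):
    # Byte-by-byte generator expression.
    return bytes(p ^ q for p, q in zip(x, y))


# Both idioms must agree before their timings are worth comparing.
assert xor_via_int(a, b) == xor_via_zip(a, b)

LOOPS = 1000
for func in (xor_via_int, xor_via_zip):
    best = min(timeit.repeat(lambda: func(a, b), number=LOOPS, repeat=3))
    print(f"{func.__name__}: {best / LOOPS * 1e6:.3f} usec per loop")
```

Wrapping each idiom in a function adds call overhead that the command-line timeit runs above do not have, so treat the output as a relative comparison only.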
[issue19251] bitwise ops for bytes of equal length
cowlicks added the comment:

I've attached a diff that adds ^, |, and & to bytes and bytearray objects on the master branch (err, the analogous hg thing). It also includes a test file which is definitely in the wrong place, but demonstrates what is working.

Personally, this came up while I was playing with toy crypto problems. I expected it to already be part of the language, but it wasn't. I think this is a natural expectation. I don't think it is obvious to newer python users that they need to cast bytes to ints to do bitwise operations on them. And I do not understand how bitwise operations work on arbitrary precision integers, so perhaps that is not as simple a concept as "bytes xor bytes".

Some folks have suggested using NumPy, but that is a very heavy dependency, and not useful outside of cpython.

I searched for code this would clean up in the wild. It was easy to find. This would also catch bugs where the bytes objects have different lengths but the code does no length check. Like this code I found:

    # XOR each byte of the roundKey with the state table
    def addRoundKey(state, roundKey):
        for i in range(len(state)):
            state[i] = state[i] ^ roundKey[i]

p.s. this is my first cpython issue/patch, so please let me know if I'm doing something wrong.

--
keywords: +patch
nosy: +cowlicks
Added file: http://bugs.python.org/file41569/bitwise_bytes.diff
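The attached diff changes the C implementation of bytes and bytearray. As a rough pure-Python approximation of the behaviour it describes, a bytes subclass can define the three operators itself. The class name BitBytes, the helper _apply, and the choice of ValueError for a length mismatch are all assumptions made for this sketch; they are not taken from the diff.

```python
import operator


class BitBytes(bytes):
    """Hypothetical bytes subclass with length-checked ^, | and &."""

    def _apply(self, other, op):
        if not isinstance(other, (bytes, bytearray)):
            return NotImplemented
        if len(self) != len(other):
            # Assumed behaviour: reject operands of different lengths.
            raise ValueError("operands must have the same length")
        return BitBytes(op(x, y) for x, y in zip(self, other))

    def __xor__(self, other):
        return self._apply(other, operator.xor)

    def __or__(self, other):
        return self._apply(other, operator.or_)

    def __and__(self, other):
        return self._apply(other, operator.and_)


state = BitBytes(b"\x0f\xf0")
round_key = b"\xff\xff"
mixed = state ^ round_key   # element-wise XOR of equal-length inputs
```

With such a type, the addRoundKey loop above reduces to a single "state ^ roundKey" expression, and a key of the wrong length raises instead of either failing with an IndexError partway through or silently ignoring the extra key bytes.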