[issue19251] bitwise ops for bytes of equal length

2016-04-26 Thread cowlicks

cowlicks added the comment:

I'll look through the list @serhiy.storchaka posted and make sure this still 
seems sane to me.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19251>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19251] bitwise ops for bytes of equal length

2016-04-26 Thread cowlicks

cowlicks added the comment:

@gvanrossum, in this previous comment
https://bugs.python.org/issue19251#msg257964

I pointed out code from the wild that would be more readable, and posted 
preliminary benchmarks. But there is a typo in it. I should have written:

    def __mix_single_column(self, a):
        t = len(a) * bytes([reduce(xor, a)])
        a ^= t ^ xtime(a ^ (a[1:] + a[0:1]))


As @gregory.p.smith points out, my claim about security isn't very clear. This 
would be "more secure" for two reasons. First, the code would be easier to read 
and therefore to verify, though that is really the readability argument again. 
Second, a binary bitwise op on two bytes objects would enforce that the objects 
are the same length, so unexpected bugs in code like these samples would be 
avoided:

bytes(x ^ y for x, y in zip(a, b))

(int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).to_bytes(len(a), 'big')

# XOR each byte of the roundKey with the state table
def addRoundKey(state, roundKey):
    for i in range(len(state)):
        state[i] = state[i] ^ roundKey[i]
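
To make the length problem concrete, here is a small sketch of how each of the three idioms above behaves when the operands differ in length. The function names (xor_zip, xor_int, add_round_key) are mine, chosen for illustration; the bodies follow the samples above.

```python
def xor_zip(a, b):
    # zip() stops at the shorter input, so the result is silently truncated.
    return bytes(x ^ y for x, y in zip(a, b))


def xor_int(a, b):
    # Ints carry no length: a shorter b is effectively left-padded with zeros,
    # and a longer b makes to_bytes() overflow.
    return (int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).to_bytes(len(a), 'big')


def add_round_key(state, round_key):
    # Only len(state) is consulted: a longer key is silently ignored past that
    # point, and a shorter key raises IndexError partway through the update.
    for i in range(len(state)):
        state[i] = state[i] ^ round_key[i]


a = b'\x01\x02\x03'
short = b'\xff\xff'

truncated = xor_zip(a, short)    # only two bytes come back
misaligned = xor_int(a, short)   # three bytes, short treated as b'\x00\xff\xff'

state = bytearray(a)
add_round_key(state, b'\xff\xff\xff\xff')  # the fourth key byte is never used
```

None of these calls reports the mismatch. Only xor_int with a longer second operand fails loudly, and then with an OverflowError rather than a message about lengths.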

--




[issue19251] bitwise ops for bytes of equal length

2016-04-25 Thread cowlicks

cowlicks added the comment:

To reiterate, this change would make a lot of code more readable, more secure, 
and faster.

The concerns about this being a NumPy-like vector operation are confusing to 
me. The current implementation is already vector-like, but lacks size checking. 
Isn't "int ^ int" really just the xor of two arbitrarily long arrays of binary 
data? At least with "bytes ^ bytes" we can enforce that the arrays are the same 
size.
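
A short illustration of that point, as a sketch only (the variable names are mine): for non-negative ints, "int ^ int" behaves like an xor of two bit arrays where the shorter one is implicitly zero-extended, and the original byte length is not remembered.

```python
narrow = int.from_bytes(b'\xff', 'big')      # one byte of input
wide = int.from_bytes(b'\xff\xff', 'big')    # two bytes of input

# No error and no size check: narrow is treated as 0x00ff.
mixed = narrow ^ wide

# Leading zero bytes vanish once the data is an int, so the length
# has to be tracked separately to convert back.
padded = int.from_bytes(b'\x00\x00\x01', 'big')
round_trip = padded.to_bytes(3, 'big')
```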

--




[issue19251] bitwise ops for bytes of equal length

2016-01-11 Thread cowlicks

cowlicks added the comment:

@Andrew Barnert
> Maybe if you're coming to Python from...
I'm not sure if you're trying to argue that my expectations are unusual. Python 
is my first programming language. To reiterate: I expected CPython to support 
bitwise operations on binary data. I don't think that is so strange.

No, I have not looked at PyPI. What I did was have an idea to do this, and there 
happened to be an open bug on it that needed a patch. So I wrote one.

And yes, I realize NumPy can do this, but it is still a very large dependency.

Anyway, here are some random projects which would look a lot nicer with this:

An implementation of the blake2 hash function in pure python. Consider this 
line:
https://github.com/buggywhip/blake2_py/blob/master/blake2.py#L234

self.h = [self.h[i] ^ v[i] ^ v[i+8] for i in range(8)]

Which would become something like:

self.h ^= v[:8] ^ v[8:]

Which is much easier to read and much faster.

Or consider this function from this aes implementation:
https://github.com/bozhu/AES-Python/blob/master/aes.py#L194-L201

    def __mix_single_column(self, a):
        # please see Sec 4.1.2 in The Design of Rijndael
        t = a[0] ^ a[1] ^ a[2] ^ a[3]
        u = a[0]
        a[0] ^= t ^ xtime(a[0] ^ a[1])
        a[1] ^= t ^ xtime(a[1] ^ a[2])
        a[2] ^= t ^ xtime(a[2] ^ a[3])
        a[3] ^= t ^ xtime(a[3] ^ u)

This would become something like:

    def __mix_single_column(self, a):
        a ^= a ^ xtime(a ^ (a[1:] + a[0:1]))

Clearer and faster. 
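
For comparison, here is a sketch of the same column mix that runs on today's Python and can be checked against the element-by-element version. It rests on two assumptions that are mine, not the linked project's: xor_bytes is a hypothetical helper standing in for the proposed ^ on bytes, and xtime is taken to be the usual doubling in GF(2^8) with the AES polynomial, applied per byte. The whole-column form includes the t term repeated across the column.

```python
from functools import reduce
from operator import xor


def xtime(x):
    # Assumed definition: multiply by 2 in GF(2^8), reducing by 0x11B.
    return ((x << 1) ^ 0x1B) & 0xFF if x & 0x80 else x << 1


def xor_bytes(a, b):
    # Hypothetical stand-in for "bytes ^ bytes", with the length check.
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def mix_single_column_scalar(column):
    # Element-by-element form, returning a new bytes object.
    a = list(column)
    t = a[0] ^ a[1] ^ a[2] ^ a[3]
    u = a[0]
    a[0] ^= t ^ xtime(a[0] ^ a[1])
    a[1] ^= t ^ xtime(a[1] ^ a[2])
    a[2] ^= t ^ xtime(a[2] ^ a[3])
    a[3] ^= t ^ xtime(a[3] ^ u)
    return bytes(a)


def mix_single_column_vector(column):
    # Whole-column form: xor with the rotated column, double, then fold in t.
    a = bytes(column)
    t = bytes([reduce(xor, a)]) * len(a)
    rotated = a[1:] + a[:1]
    doubled = bytes(xtime(x) for x in xor_bytes(a, rotated))
    return xor_bytes(a, xor_bytes(t, doubled))


column = bytes.fromhex("db135345")
mixed = mix_single_column_vector(column)
```

Both functions should agree on any 4-byte column. The sample column is a commonly quoted MixColumns test input.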

Another piece of code this would improve:
https://github.com/mgoffin/keccak-python/blob/master/Keccak.py#L196-L209

These were easy to find so I'm sure there are more. I think these demonstrate 
that despite what people *should* be doing, they are doing things in a way that 
could be substantially improved with this patch.

This does resemble NumPy's vectorized functions, but it is much more limited in 
scope since there is no broadcasting involved.

Here is a quick benchmark:

$ ./python -m timeit -n 10 -s "a=b'a'*64; b=b'b'*64" "(int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little')).to_bytes(64, 'little')"
10 loops, best of 3: 0.942 usec per loop

$ ./python -m timeit -n 10 -s "a=b'a'*64; b=b'b'*64" "a ^ b"
10 loops, best of 3: 0.041 usec per loop

NumPy is the slowest, but I'm probably doing something wrong, and it's in 
IPython so I'm not timing the import:

In [13]: %timeit bytes(np.frombuffer(b'b'*64, dtype=np.int8) ^ np.frombuffer(b'a'*64, dtype=np.int8))
10 loops, best of 3: 3.69 µs per loop

So the patched "a ^ b" is about 20 times faster than the int.from_bytes version.
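
For anyone who wants to reproduce the pure-Python side of this on an unpatched interpreter, here is a minimal timeit sketch. It only covers the two idioms that work today; the function names and the loop count are arbitrary choices of mine, and no timings are implied.

```python
import timeit

a = b'a' * 64
b = b'b' * 64


def xor_int(a, b):
    # Round trip through arbitrary-precision ints.
    return (int.from_bytes(a, 'little') ^ int.from_bytes(b, 'little')).to_bytes(len(a), 'little')


def xor_zip(a, b):
    # Byte-by-byte generator expression.
    return bytes(x ^ y for x, y in zip(a, b))


loops = 1000
for name, fn in (("int round trip", xor_int), ("zip generator", xor_zip)):
    # Best of three runs, reported per call in microseconds.
    best = min(timeit.repeat(lambda: fn(a, b), number=loops, repeat=3))
    print(f"{name}: {best / loops * 1e6:.3f} usec per loop")
```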

--




[issue19251] bitwise ops for bytes of equal length

2016-01-10 Thread cowlicks

cowlicks added the comment:

I've attached a diff that adds ^, |, and & to bytes and bytearray objects on the 
master branch (err the analogous hg thing).

It also includes a test file which definitely is in the wrong place, but 
demonstrates what is working.

Personally, this came up while I was playing with toy crypto problems. I 
expected this to already be part of the language, but it wasn't. I think this 
is a natural expectation. I don't think it is obvious to newer Python users 
that they need to cast bytes to ints to do bitwise operations on them.

And I do not understand how bitwise operations work on arbitrary precision 
integers. So perhaps that is not as simple a concept as "bytes xor bytes".

Some folks have suggested using NumPy, but that is a very heavy dependency, and 
not useful outside of CPython.

I searched for code this would clean up in the wild. It was easy to find.

This would also catch bugs in code that combines bytes objects of different 
lengths without any check, like this code I found:

# XOR each byte of the roundKey with the state table
def addRoundKey(state, roundKey):
    for i in range(len(state)):
        state[i] = state[i] ^ roundKey[i]
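
Without the patch, the check has to be written by hand. A minimal sketch of what that might look like, using a snake_case name of my own and assuming the state is a mutable sequence such as a bytearray:

```python
def add_round_key(state, round_key):
    """XOR round_key into state in place, refusing mismatched lengths."""
    if len(state) != len(round_key):
        raise ValueError(
            f"length mismatch: state has {len(state)} bytes, "
            f"round key has {len(round_key)}"
        )
    for i, key_byte in enumerate(round_key):
        state[i] ^= key_byte


state = bytearray(b'\x00\x0f')
add_round_key(state, b'\xff\xff')
```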

P.S. This is my first CPython issue/patch, so please let me know if I'm doing 
something wrong.

--
keywords: +patch
nosy: +cowlicks
Added file: http://bugs.python.org/file41569/bitwise_bytes.diff
