The documentation for os.urandom says that it reads /dev/urandom, so that's probably not it. Is the Julia file layer caching too much? I imagine that this syscall is expensive on a per byte basis, unlike normal file reads where block reads are closer to constant cost.
On Sunday, October 19, 2014, Stefan Karpinski <[email protected]> wrote: > I kind of doubt that readbytes is all that slow, I wonder if Python is > doing something besides actually reading from /dev/urandom or if these are > somehow otherwise not really comparable bits of code. > > On Sun, Oct 19, 2014 at 6:57 AM, Tim Holy <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > >> If you profile, you'll discover that this is simply measuring the >> performance >> of `readbytes!`. Can you reduce your example to focus on this point and >> file as >> an issue? >> >> Best, >> --Tim >> >> On Saturday, October 18, 2014 01:19:50 PM alexander maznev wrote: >> > Apologies for the semi-redundant theme :) >> > >> > I noticed the Julia rand function did not seem to work with BigInt >> numbers >> > (idk if there is a different one somewhere else that does) but it's >> only a >> > few lines to add one. For example the python ecdsa library uses the >> > following custom randrange, which is nice and concise, but the >> > implementation is rather naive in that it approximates to the top byte, >> > meaning a worst case mean of 256 reads from urandom for ranges of the >> type >> > 256**k + 1. >> > >> > def randrange(order, entropy=None): >> > if entropy is None: >> > entropy = os.urandom >> > assert order > 1 >> > bytes = orderlen(order) >> > dont_try_forever = 10000 >> > while dont_try_forever > 0: >> > dont_try_forever -= 1 >> > candidate = string_to_number(entropy(bytes)) + 1 >> > if 1 <= candidate < order: >> > return candidate, (10000 - dont_try_forever) >> > continue >> > raise RuntimeError("randrange() tried hard but gave up, either >> > something" >> > " is very wrong or you got realllly unlucky. >> Order >> > was" >> > " %x" % order) >> > >> > Julia (bit-based version). >> > >> > function randrange(order) >> > upper_2 = length(base(2, order-1)); >> > upper_256 = int(upper_2/8 +.5); tries=0; >> > f= open("/dev/urandom") >> > while true >> > tries+=1 >> > ent_256 = readbytes(f, upper_256) >> > ent_2 = "" >> > for x in ent_256 >> > ent_2 = *(ent_2, base(2,x,8)) >> > end >> > rand_num = parseint(BigInt, ent_2[1:upper_2], 2) >> > if rand_num < order >> > close(f); return rand_num, tries >> > else continue; >> > end >> > end >> > end >> > >> > function timeit(n=1000) >> > t = time(); tries = 0; order = BigInt(256)^31; >> > for i in range(1,n) >> > x, tr = randrange(order) >> > tries +=tr >> > end >> > @printf("avg run. time: %llf, avg num. tries: %f \n", ((time() -t) >> / n >> > ), tries/n) >> > end >> > >> > >> > >> > Turns out the Julia randrange is slower, even though the average number >> of >> > worst case reads is only 2, at a power of 31, that will be a 32 byte >> > ent_256, so 32 string concatenations, and 32 base2 conversions, - but it >> > still comes out slower than the python randrange (even with its average >> of >> > 256 os.urandom calls). If one goes up to a bigger number, i.e. 256^751, >> the >> > smart bit collecting randrange overtakes the naive python one. On the >> other >> > hand below is a bit-based version of the python randrange. It beats the >> > Julia one at all sizes by a factor of about 25x. >> > >> > python (bit-based version, worst case average 2 reads) >> > import os >> > from numpy import base_repr as base >> > >> > def randrange(order): >> > upper_2 = (order-1).bit_length() >> > upper_256 = int(upper_2/8 + 1); tries=0 >> > while True: >> > tries +=1 >> > ent_256 = os.urandom(upper_256) >> > ent_2 = "" >> > for x in ent_256: >> > ent_2 += base(x, 2, 8)[-8:] >> > rand_num = int(ent_2[:upper_2], base=2) >> > if rand_num < order: >> > return rand_num, tries >> > else:continue >> >> >
