Re: Is this secure?

2010-03-03 Thread Michael Rudolf

Am 03.03.2010 04:51, schrieb Lie Ryan:

import itertools
def gen():
 valid_chars = 'abcdefghijklmnopqrstuvwxyz'
 for char in itertools.repeat(valid_chars):
 yield char

gen = gen()
def gen_rand_string(length):
 chars = (next(gen) for i in range(length))
 return ''.join(chars)

since it gives me a perfect distribution of letters,


It does not. Only if not (length(valid_chars) % length)

Regards,
Michael
--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-03-02 Thread Lie Ryan
On 02/25/2010 06:16 AM, mk wrote:
> On 2010-02-24 20:01, Robert Kern wrote:
>> I will repeat my advice to just use random.SystemRandom.choice() instead
>> of trying to interpret the bytes from /dev/urandom directly.
> 
> Out of curiosity:
> 
> def gen_rand_string(length):
> prng = random.SystemRandom()
> chars = []
> for i in range(length):
> chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
> return ''.join(chars)
> 
> if __name__ == "__main__":
> chardict = {}
> for i in range(1):
> ##w = gen_rand_word(10)
> w = gen_rand_string(10)
> count_chars(chardict, w)
> counts = list(chardict.items())
> counts.sort(key = operator.itemgetter(1), reverse = True)
> for char, count in counts:
> print char, count
> 
> 
> s 3966
> d 3912
> g 3909
> h 3905
> a 3901
> u 3900
> q 3891
> m 3888
> k 3884
> b 3878
> x 3875
> v 3867
> w 3864
> y 3851
> l 3825
> z 3821
> c 3819
> e 3819
> r 3816
> n 3808
> o 3797
> f 3795
> t 3784
> p 3765
> j 3730
> i 3704
> 
> Better, although still not perfect.
> 

I give you this:

I give you this:

import itertools
def gen():
valid_chars = 'abcdefghijklmnopqrstuvwxyz'
for char in itertools.repeat(valid_chars):
yield char

gen = gen()
def gen_rand_string(length):
chars = (next(gen) for i in range(length))
return ''.join(chars)

since it gives me a perfect distribution of letters, it must be a very
secure random password generation scheme.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-03-02 Thread Aahz
In article ,
Robert Kern   wrote:
>On 2010-02-28 01:28 AM, Aahz wrote:
>> In article,
>> Robert Kern  wrote:
>>>
>>> If you are storing the password instead of making your user remember
>>> it, most platforms have some kind of keychain secure password
>>> storage. I recommend reading up on the APIs available on your targeted
>>> platforms.
>>
>> Are you sure?  I haven't done a lot of research, but my impression was
>> that Windows didn't have anything built in.
>
>You're right, not built-in, but Windows does provide enough crypto
>services for a cross-platform Python implementation to be built:
>
>   http://pypi.python.org/pypi/keyring

Thanks you!  That's a big help!
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-03-02 Thread Robert Kern

On 2010-02-28 01:28 AM, Aahz wrote:

In article,
Robert Kern  wrote:


If you are storing the password instead of making your user remember
it, most platforms have some kind of keychain secure password
storage. I recommend reading up on the APIs available on your targeted
platforms.


Are you sure?  I haven't done a lot of research, but my impression was
that Windows didn't have anything built in.


You're right, not built-in, but Windows does provide enough crypto services for 
a cross-platform Python implementation to be built:


  http://pypi.python.org/pypi/keyring

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-28 Thread Paul Rubin
a...@pythoncraft.com (Aahz) writes:
> Are you sure?  I haven't done a lot of research, but my impression was
> that Windows didn't have anything built in.

I don't know much about the windows but there is the CAPI and then
there is all the TCPA (i.e. DRM) stuff.  Maybe it can be used somehow.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-27 Thread Aahz
In article ,
Robert Kern   wrote:
>
>If you are storing the password instead of making your user remember
>it, most platforms have some kind of keychain secure password
>storage. I recommend reading up on the APIs available on your targeted
>platforms.

Are you sure?  I haven't done a lot of research, but my impression was
that Windows didn't have anything built in.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-26 Thread Peter Pearson
On Wed, 24 Feb 2010 21:02:07 +0100, mk  wrote:
[snip]
>
> rand_str_SystemRandom_seeding
> mean 3845.15384615 std dev 46.2016419186
> l 3926 1.75 std devs away from mean
> y 3916 1.53 std devs away from mean
> d 3909 1.38 std devs away from mean
> a 3898 1.14 std devs away from mean
> p 3898 1.14 std devs away from mean
> c 3889 0.95 std devs away from mean
> u 3884 0.84 std devs away from mean
> j 3873 0.60 std devs away from mean
> n 3873 0.60 std devs away from mean
> w 3866 0.45 std devs away from mean
> x 3863 0.39 std devs away from mean
> r 3855 0.21 std devs away from mean
> m 3852 0.15 std devs away from mean
> b 3841 -0.09 std devs away from mean
> t 3835 -0.22 std devs away from mean
> o 3829 -0.35 std devs away from mean
> k 3827 -0.39 std devs away from mean
> i 3821 -0.52 std devs away from mean
> s 3812 -0.72 std devs away from mean
> q 3806 -0.85 std devs away from mean
> v 3803 -0.91 std devs away from mean
> g 3799 -1.00 std devs away from mean
> h 3793 -1.13 std devs away from mean
> e 3782 -1.37 std devs away from mean
> f 3766 -1.71 std devs away from mean
> z 3758 -1.89 std devs away from mean

Chi2 = 14.43, 25 d.f., prob = 0.046362.
The observed distribution is SIGNIFICANTLY CLOSER
to the uniform distribution than reasonable by chance.

> rand_str_SystemRandom_noseeding
> mean 3845.15384615 std dev 55.670522726
> i 3961 2.08 std devs away from mean
> r 3911 1.18 std devs away from mean
> e 3910 1.16 std devs away from mean
> m 3905 1.08 std devs away from mean
> a 3901 1.00 std devs away from mean
> u 3893 0.86 std devs away from mean
> t 3882 0.66 std devs away from mean
> w 3872 0.48 std devs away from mean
> s 3870 0.45 std devs away from mean
> c 3868 0.41 std devs away from mean
> n 3866 0.37 std devs away from mean
> q 3865 0.36 std devs away from mean
> k 3863 0.32 std devs away from mean
> y 3848 0.05 std devs away from mean
> j 3836 -0.16 std devs away from mean
> v 3830 -0.27 std devs away from mean
> f 3829 -0.29 std devs away from mean
> z 3829 -0.29 std devs away from mean
> g 3827 -0.33 std devs away from mean
> l 3818 -0.49 std devs away from mean
> b 3803 -0.76 std devs away from mean
> d 3803 -0.76 std devs away from mean
> p 3756 -1.60 std devs away from mean
> x 3755 -1.62 std devs away from mean
> h 3744 -1.82 std devs away from mean
> o 3729 -2.09 std devs away from mean

Chi2 = 20.96, 25 d.f., prob = 0.304944.
The observed distribution is not significantly different
from the uniform distribution.

> rand_str_custom
> mean 3517.15384615 std dev 40.7541336343
> i 3586 1.69 std devs away from mean
> a 3578 1.49 std devs away from mean
> e 3575 1.42 std devs away from mean
> m 3570 1.30 std devs away from mean
> q 3562 1.10 std devs away from mean
> c 3555 0.93 std devs away from mean
> g 3552 0.86 std devs away from mean
> w 3542 0.61 std devs away from mean
> p 3536 0.46 std devs away from mean
> x 3533 0.39 std devs away from mean
> s 3528 0.27 std devs away from mean
> o 3524 0.17 std devs away from mean
> d 3516 -0.03 std devs away from mean
> t 3515 -0.05 std devs away from mean
> h 3511 -0.15 std devs away from mean
> v 3502 -0.37 std devs away from mean
> z 3502 -0.37 std devs away from mean
> b 3500 -0.42 std devs away from mean
> f 3496 -0.52 std devs away from mean
> u 3492 -0.62 std devs away from mean
> l 3486 -0.76 std devs away from mean
> r 3478 -0.96 std devs away from mean
> n 3476 -1.01 std devs away from mean
> j 3451 -1.62 std devs away from mean
> k 3450 -1.65 std devs away from mean
> y 3430 -2.14 std devs away from mean

Chi2 = 12.28, 25 d.f., prob = 0.015815.
The observed distribution is SIGNIFICANTLY CLOSER
to the uniform distribution than reasonable by chance.

> It would appear that SystemRandom().choice is indeed best (in terms of 
> how much the counts stray from mean in std devs), but only after seeding 
> it with os.urandom.

I don't see any reason to worry about any of the three, except
perhaps that the first and last are surprisingly uniform.

-- 
To email me, substitute nowhere->spamcop, invalid->net.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-26 Thread Peter Pearson
On Wed, 24 Feb 2010 20:16:24 +0100, mk  wrote:
> On 2010-02-24 20:01, Robert Kern wrote:
>> I will repeat my advice to just use random.SystemRandom.choice() instead
>> of trying to interpret the bytes from /dev/urandom directly.
>
> Out of curiosity:
>
> def gen_rand_string(length):
>  prng = random.SystemRandom()
>  chars = []
>  for i in range(length):
>  chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
>  return ''.join(chars)
>
> if __name__ == "__main__":
>  chardict = {}
>  for i in range(1):
> ##w = gen_rand_word(10)
>  w = gen_rand_string(10)
>  count_chars(chardict, w)
>  counts = list(chardict.items())
>  counts.sort(key = operator.itemgetter(1), reverse = True)
>  for char, count in counts:
>  print char, count
>
>
> s 3966
> d 3912
> g 3909
> h 3905
> a 3901
> u 3900
> q 3891
> m 3888
> k 3884
> b 3878
> x 3875
> v 3867
> w 3864
> y 3851
> l 3825
> z 3821
> c 3819
> e 3819
> r 3816
> n 3808
> o 3797
> f 3795
> t 3784
> p 3765
> j 3730
> i 3704
>
> Better, although still not perfect.

What would be perfect?  Surely one shouldn't be happy if all the
tallies come out exactly equal: that would be a blatant indication
of something very nonrandom going on.

The tallies given above give a chi-squared value smack in the
middle of the range expected for random sampling of a uniform
distribution (p = 0.505).  So the chi-squared metric of
goodness-of-fit to a unifom distribution says you're doing fine.

-- 
To email me, substitute nowhere->spamcop, invalid->net.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-25 Thread Robert Kern

On 2010-02-25 09:03 AM, mk wrote:


2. The app will have GUI and it will be locally installed; it's not
going to be web app, it will just be online in the sense of downloading
data frequently from the net.


If you are storing the password instead of making your user remember it, most 
platforms have some kind of keychain secure password storage. I recommend 
reading up on the APIs available on your targeted platforms.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-25 Thread mk

On 2010-02-25 02:31, Paul Rubin wrote:

It might be helpful if you could say what your application does, or
anyway give an idea of what its actual security requirements are.
Generating and emailing someone a random password is a fairly standard
method for (e.g.) web forums to verify that the person has supplied a
working email address, basically as a first level spam filter.  Your
scheme is probably ok for that.  If you're doing something with more
demanding security requirements, then as mentioned before, there is a
whole lot of stuff you have to pay attention to, and focusing narrowly
on password generation isn't that useful.


OK some data:

1. I'm going to probably use HTTPS (I meant HTTP over SSL, but wrote 
HTTP instead of being precise)


2. The app will have GUI and it will be locally installed; it's not 
going to be web app, it will just be online in the sense of downloading 
data frequently from the net.


3. I can't disclose the details on what the app will be doing, but it's 
not going to be terribly security-critical, medium-level at most - what 
happens in the app itself is not _directly_ related to money.


4. The app will be based on subscription model, costing $10-$20 per 
month. It's not really doing online banking or smth like that.


5. The worst thing that can happen when security of some account is 
compromised is that the user will have to reset the password, resending 
it to predefined email (I don't really see the way of organizing it in a 
different manner).


I'm thinking about optionally passphrase-securing the password saved in 
GUI. In that case 'diceware' approach would be helpful.


I certainly do not want to focus narrowly on password generation: the 
entire security model will have to be worked out, but the project didn't 
get to that stage yet, it's all still in planning stages. I just wanted 
to have this one part (password generation) researched before I get to 
other stages so I don't have to implement this later in haste and do 
smth wrong.


Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-25 Thread mk

On 2010-02-25 02:07, Steven D'Aprano wrote:

On Wed, 24 Feb 2010 18:23:17 +0100, mk wrote:


Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for them,
so they will not have to remember and type them in.


So to break your application's security model, all somebody has to do is
use their PC and they have full access to their account?

Or get hold of the copy and paste buffer?

Or the application's config files?


Yes. There's no way around this, short of forcing them to use hardware 
key, which is an overkill for this application.



So I have little in
the way of limitations of password length - even though in *some* cases
somebody might have to (or be ignorant enough) to retype the password
instead of pasting it in.



Or your users might be sensible enough to not trust a role-your-own
security model, and prefer to memorize the password than to trust that
nobody will get access to their PC.


The app is not that critical, it's about quarterly subscription to the 
service, and the users will be able to reset the password anyway. If it 
were that critical, I'd use the hardware keys; if hardware keys are not 
used, once somebody gains an (unconstrained) access to the user's PC, 
there's not much that app developer can do. I've read somewhere a 
warning from PuTTY developer that even though the key is (normally) 
protected by the passphrase, losing even an encrypted key is quite 
likely to lead to its compromise. There's even some software for that on 
the net:


http://www.neophob.com/serendipity/index.php?/archives/127-PuTTY-Private-Key-cracker.html



The main application will access the data using HTTP (probably), so the
main point is that an attacker is not able to guess passwords using
brute force.



And why would they bother doing that when they can sniff the wire and get
the passwords in plain text? You should assume your attackers are
*smarter* than you, not trust them to be foolish.


I should have written HTTPS.

Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Paul Rubin
mk  writes:
> Anyway, the passwords for authorized users will be copied and pasted
> from email into in the application GUI which will remember it for
> them, so they will not have to remember and type them in. 

It occurs to me that you don't even need to mess with letters in that
case:

   password = os.urandom(5).encode('hex')

will generate a string of 10 hex digits that you can give to the user.
(That is for Python 2.x but I think it might be broken in Python 3).

It might be helpful if you could say what your application does, or
anyway give an idea of what its actual security requirements are.
Generating and emailing someone a random password is a fairly standard
method for (e.g.) web forums to verify that the person has supplied a
working email address, basically as a first level spam filter.  Your
scheme is probably ok for that.  If you're doing something with more
demanding security requirements, then as mentioned before, there is a
whole lot of stuff you have to pay attention to, and focusing narrowly
on password generation isn't that useful.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Steven D'Aprano
On Wed, 24 Feb 2010 18:23:17 +0100, mk wrote:

> Anyway, the passwords for authorized users will be copied and pasted
> from email into in the application GUI which will remember it for them,
> so they will not have to remember and type them in.

So to break your application's security model, all somebody has to do is 
use their PC and they have full access to their account?

Or get hold of the copy and paste buffer?

Or the application's config files?



> So I have little in
> the way of limitations of password length - even though in *some* cases
> somebody might have to (or be ignorant enough) to retype the password
> instead of pasting it in.

Or your users might be sensible enough to not trust a role-your-own 
security model, and prefer to memorize the password than to trust that 
nobody will get access to their PC.



> The main application will access the data using HTTP (probably), so the
> main point is that an attacker is not able to guess passwords using
> brute force.

And why would they bother doing that when they can sniff the wire and get 
the passwords in plain text? You should assume your attackers are 
*smarter* than you, not trust them to be foolish.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Michael Rudolf

Am 24.02.2010 21:06, schrieb mk:


I just posted a comparison with calculating std deviations for various
methods - using os.urandom, SystemRandom.choice with seeding and without
seeding.


I saw them


They all seem to have slightly different distributions.


No they don't. Just run those tests again and you will see that you 
cannot put them in any order or behaviour. They are all correct now, 
except that you cannot seed SystemRandom, as it is *not* a PRNG (at 
least here, it is a wrapper for /dev/random)


Regards,
Michael
--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Paul Rubin
mk  writes:
> def rand_str_custom(n):
> s = os.urandom(n)
> return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) < 234])

Note that simply throws away some of the chars.  You have to replace
them, not throw them away.

> rand_str_SystemRandom_seeding
> mean 3845.15384615 std dev 46.2016419186
> l 3926 1.75 std devs away from mean
> y 3916 1.53 std devs away from mean
... 

What do you think you're measuring here?  Yes, if you're doing 1000's of
draws from a distribution, you'd expect a few of them to be 1.75 sigma
from the mean.  Since there are 26 letters, you'd expect a multinomial
distribution which you can test for with the multinomial test or some
approximation from the article:

   http://en.wikipedia.org/wiki/Multinomial_test

I wish I knew more statistics than I do, since there is probably some
more familiar statistical test (e.g. the T-test) that you can use as the
number of trials gets large, since each bin of the multinomial
distribution should eventually start to look like a normal distribution
due to the central limit theorem.  Others here know a lot more about
this stuff than I do, and can probably give better advice.

Anyway though, the output of os.urandom should be extremely hard to
distinguish from real randomness (that's the whole point of a
cryptographic PRNG).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Robert Kern

On 2010-02-24 14:02 PM, mk wrote:


It would appear that SystemRandom().choice is indeed best (in terms of
how much the counts stray from mean in std devs), but only after seeding
it with os.urandom.


Calling random.seed() does not affect SystemRandom() whatsoever. You are getting 
perfectly acceptable distributions for all three variants.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 20:30, Michael Rudolf wrote:

The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z.


writing secure code is hard...


So true. That's why one should stick to standard libs when it comes to
crypto or security in general. It's just to easy to mess it up. Just ask
Debian about whether touching OpenSSL was a good idea ;)


That was brain-dead hiccup, for crying out loud how could they do smth 
so stupid.



def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
> 22])


Off-by-one-error: you're skipping len(range(22))==23 hits.


Argh, it's late here.


Well, I really think that you should use repeated Random.choice on an
alphabet.
Or Random.Systemrandom.choice if you don't trust the PRNG.


I just posted a comparison with calculating std deviations for various 
methods - using os.urandom, SystemRandom.choice with seeding and without 
seeding.


They all seem to have slightly different distributions.

Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 20:19, Robert Kern wrote:

On 2010-02-24 13:09 PM, mk wrote:

On 2010-02-24 20:01, Robert Kern wrote:

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.


Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?


You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.


Right, this explains the 'a' outlier. Fixed. But still:

import operator
import os
import random
import math

def rand_str_custom(n):
s = os.urandom(n)
return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) < 
234])


def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

def rand_str_SystemRandom_seeding(length):
seed = os.urandom(32)
random.seed(seed)
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def rand_str_SystemRandom_noseeding(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def sd(x):
sd.sum  += x
sd.sum2 += x*x
sd.n+= 1.0
sum, sum2, n = sd.sum, sd.sum2, sd.n
return math.sqrt(sum2/n - sum*sum/n/n)

def gen_rand_with_fun(fun):
print fun.__name__
chardict = {}
for i in range(1):
w = fun(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
nums = [c[1] for c in counts]
sd.sum = sd.sum2 = sd.n = 0
mean = (1.0*sum(nums))/len(nums)
stddev = map(sd, nums)[-1]
print 'mean', mean, 'std dev', stddev
for char, count in counts:
print char, count, '%.2f' % ((count - mean)/stddev), 'std devs 
away from mean'


if __name__ == "__main__":
gen_rand_with_fun(rand_str_SystemRandom_seeding)
print
gen_rand_with_fun(rand_str_SystemRandom_noseeding)
print
gen_rand_with_fun(rand_str_custom)




rand_str_SystemRandom_seeding
mean 3845.15384615 std dev 46.2016419186
l 3926 1.75 std devs away from mean
y 3916 1.53 std devs away from mean
d 3909 1.38 std devs away from mean
a 3898 1.14 std devs away from mean
p 3898 1.14 std devs away from mean
c 3889 0.95 std devs away from mean
u 3884 0.84 std devs away from mean
j 3873 0.60 std devs away from mean
n 3873 0.60 std devs away from mean
w 3866 0.45 std devs away from mean
x 3863 0.39 std devs away from mean
r 3855 0.21 std devs away from mean
m 3852 0.15 std devs away from mean
b 3841 -0.09 std devs away from mean
t 3835 -0.22 std devs away from mean
o 3829 -0.35 std devs away from mean
k 3827 -0.39 std devs away from mean
i 3821 -0.52 std devs away from mean
s 3812 -0.72 std devs away from mean
q 3806 -0.85 std devs away from mean
v 3803 -0.91 std devs away from mean
g 3799 -1.00 std devs away from mean
h 3793 -1.13 std devs away from mean
e 3782 -1.37 std devs away from mean
f 3766 -1.71 std devs away from mean
z 3758 -1.89 std devs away from mean

rand_str_SystemRandom_noseeding
mean 3845.15384615 std dev 55.670522726
i 3961 2.08 std devs away from mean
r 3911 1.18 std devs away from mean
e 3910 1.16 std devs away from mean
m 3905 1.08 std devs away from mean
a 3901 1.00 std devs away from mean
u 3893 0.86 std devs away from mean
t 3882 0.66 std devs away from mean
w 3872 0.48 std devs away from mean
s 3870 0.45 std devs away from mean
c 3868 0.41 std devs away from mean
n 3866 0.37 std devs away from mean
q 3865 0.36 std devs away from mean
k 3863 0.32 std devs away from mean
y 3848 0.05 std devs away from mean
j 3836 -0.16 std devs away from mean
v 3830 -0.27 std devs away from mean
f 3829 -0.29 std devs away from mean
z 3829 -0.29 std devs away from mean
g 3827 -0.33 std devs away from mean
l 3818 -0.49 std devs away from mean
b 3803 -0.76 std devs away from mean
d 3803 -0.76 std devs away from mean
p 3756 -1.60 std devs away from mean
x 3755 -1.62 std devs away from mean
h 3744 -1.82 std devs away from mean
o 3729 -2.09 std devs away from mean

rand_str_custom
mean 3517.15384615 std dev 40.7541336343
i 3586 1.69 std devs away from mean
a 3578 1.49 std devs away from mean
e 3575 1.42 std devs away from mean
m 3570 1.30 std devs away from mean
q 3562 1.10 std devs away from mean
c 3555 0.93 std devs away from mean
g 3552 0.86 std devs away from mean
w 3542 0.61 std devs away from mean
p 3536 0.46 std devs away from mean
x 3533 0.39 std devs away from mean
s 3528 0.27 std devs away from mean
o 3524 0.17 std devs away from mean
d 3516 -0.03 std devs away from mean
t 3515 -0.05 std devs away from mean
h 3511 -0.15 std devs away from mean
v 3502 -0.37 std devs away from mean
z 3502 -0.37 std devs away from mean
b 3500 -0.42 std devs away from m

Re: Is this secure?

2010-02-24 Thread Michael Rudolf

Am 24.02.2010 19:35, schrieb mk:

On 2010-02-24 18:56, Michael Rudolf wrote:


The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z.


writing secure code is hard...


So true. That's why one should stick to standard libs when it comes to 
crypto or security in general. It's just to easy to mess it up. Just ask 
Debian about whether touching OpenSSL was a good idea ;)




You might want to skip the values 0-22
to achieve a truly uniform distribution.

Hmm perhaps you meant to skip values over 256 - 22 ?


That's the same thing as x mod y equals x+N*y mod y for every natural N.


def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
 > 22])


Off-by-one-error: you're skipping len(range(22))==23 hits.
OK, I just see that I wrote misleading 0-22 while I meant range(22).



While with this:
def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
< 235])


Same off-by-one.


FYI: Electronic Cash PINs in europe (dont know about the rest of the
world) were computed the same way (random hexdigit and just mod it when
it's too large) leading to a high probability that your first digit was
a 1 :)

Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)


Well - "Schadenfreude" *is* in fact a german word :)
"Schaden" is the event or result of misfortune, "Freude" is joy.


Well, I really think that you should use repeated Random.choice on an 
alphabet.

Or Random.Systemrandom.choice if you don't trust the PRNG.

Regards,
Michael
--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Paul Rubin
mk  writes:
> So I have little in the way of limitations of password length ...>
> The main application will access the data using HTTP (probably), so
> the main point is that an attacker is not able to guess passwords
> using brute force.

If it's HTTP instead of HTTPS and you're sending the password in the
clear, then a serious attacker can simply eavesdrop the connection and
pick up the password.  Again, if the application is a web forum or
something like that, the security requirements probably aren't terribly
high.  If it's (say) a financial application with potentially motivated
attackers, you've got to be a lot more careful than I think you're being
right now, and you should really get security specialists involved.

> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:

Yes, 2**10 = 1024 so (57/25)**10 is a little more than that. 

> Even then I'm not getting completely uniform distribution for some reason:

Exact equality of the counts would be surprising and a sign that
something was wrong with the generation process.  It would be like
flipping a coin 1 times and getting exactly 5000 heads.  The
binomial distribution tells you that the number should be close to 5000,
but that it's unlikely to be -exactly- 5000.

Also, as Michael Rudolf mentioned, getting a letter by taking n%26 where
n is drawn uniformly from [0..255] doesn't give a uniform distribution
because 256 is not a multiple of 26.  I had thought about making an
adjustment for that when I posted, but it didn't seem worth cluttering
up the code.  Uniformity for its own sake doesn't gain you anything;
what matters is entropy.  If you compute the entropy difference between
the slightly nonuniform distribution and a uniform one, it's very small.

To get a more uniform distribution I usually just take a larger n,
rather than conditionalizing the draws.  For example, in the
diceware-like code I posted, I read 10 random bytes (giving a uniform
random number on [0..2**80]) from urandom for each word.  That is still
not perfectly uniform, but it's closer to the point where the difference
would be very hard to detect.

> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/

Well, that's partly a matter of practice, but I'll mention one way I
simplified the code, which was by reading more bytes from /dev/urandom
than was really necessary.  I read one byte for each random letter
(i.e. throwing away about 3 random bits for each letter) while you tried
to encode the urandom data cleverly and map 4 random bytes to 5
alphabetic letters.  /dev/urandom uses a cryptographic PRNG and it's
pretty fast, so reading a few extra bytes from it to simplify your code
doesn't really cost you anything.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Robert Kern

On 2010-02-24 13:16 PM, mk wrote:

On 2010-02-24 20:01, Robert Kern wrote:

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.


Out of curiosity:

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

if __name__ == "__main__":
chardict = {}
for i in range(1):
## w = gen_rand_word(10)
w = gen_rand_string(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count


s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.


This distribution is well within expectations.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Paul Rubin
Robert Kern  writes:
> I will repeat my advice to just use random.SystemRandom.choice()
> instead of trying to interpret the bytes from /dev/urandom directly.

SystemRandom is something pretty new so I wasn't aware of it.  But
yeah, if I were thinking more clearly I would have suggested os.urandom
instead of opening /dev/urandom.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Robert Kern

On 2010-02-24 13:09 PM, mk wrote:

On 2010-02-24 20:01, Robert Kern wrote:

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.


Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?


You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 20:01, Robert Kern wrote:

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.


Out of curiosity:

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

if __name__ == "__main__":
chardict = {}
for i in range(1):
##w = gen_rand_word(10)
w = gen_rand_string(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count


s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.

Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 20:01, Robert Kern wrote:

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.


Oh I hear you -- for production use I would (will) certainly consider 
this. However, now I'm interested in the problem itself: why is the damn 
distribution not uniform?


Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Robert Kern

On 2010-02-24 12:35 PM, mk wrote:


While with this:

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
< 235])

a 3852

...


1. I'm systematically getting 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.


I will repeat my advice to just use random.SystemRandom.choice() instead of 
trying to interpret the bytes from /dev/urandom directly.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 18:56, Michael Rudolf wrote:


The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z.


writing secure code is hard...

I'm going to switch to PHP: Python world wouldn't lose much, but PHP 
would gain a lot.



You might want to skip the values 0-22
to achieve a truly uniform distribution.


Hmm perhaps you meant to skip values over 256 - 22 ? Bc I'm getting this 
(reduced the run to 1000 generated strings):


def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) 
if ord(x) > 22])



z 3609
b 3608
s 3567
e 3559
j 3556
r 3555
g 3548
p 3540
m 3538
q 3532
h 3528
y 3526
v 3524
i 3500
x 3496
c 3488
k 3488
l 3487
u 3487
a 3469
o 3465
d 3455
t 3439
f 3437
n 3417
w 3175

While with this:

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) 
if ord(x) < 235])


a 3852
w 3630
s 3623
v 3582
y 3569
p 3568
c 3558
k 3558
b 3556
r 3553
x 3546
m 3534
n 3522
o 3515
h 3510
d 3505
u 3487
t 3486
i 3482
f 3477
e 3474
g 3460
q 3453
l 3437
z 3386
j 3382

1. I'm systematically getting 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.



FYI: Electronic Cash PINs in europe (dont know about the rest of the
world) were computed the same way (random hexdigit and just mod it when
it's too large) leading to a high probability that your first digit was
a 1 :)


Schadenfreude is deriving joy from others' misfortunes; what is the 
German word, if any, for deriving solace from others' misfortunes? ;-)


Regards,
mk


--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 18:59, Steve Holden wrote:


Aw shucks when will I learn to do the stuff in 3 lines well instead of
20, poorly. :-/


When you've got as much experience as Paul?


And how much experience does Paul have? (this is mostly not a facile 
question)


For my part, my more serious effort (on and off) with programming in 
Python is under a year.


Regards,
mk

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Michael Rudolf

Am 24.02.2010 18:23, schrieb mk:

Even then I'm not getting completely uniform distribution for some reason:
d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])


The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is 
approx. 10) more often than w-z. You might want to skip the values 0-22 
to achieve a truly uniform distribution.


FYI: Electronic Cash PINs in europe (dont know about the rest of the 
world) were computed the same way (random hexdigit and just mod it when 
it's too large) leading to a high probability that your first digit was 
a 1 :)


Regards,
Michael
--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Steve Holden
mk wrote:
> On 2010-02-24 03:50, Paul Rubin wrote:
>> The stuff about converting 4 random bytes to a decimal string and then
>> peeling off 2 digits at a time is pretty awful, and notice that since
>> 2**32 is 4294967296, in the cases where you get 10 digits, the first
>> 2-digit pair is never higher than 42.
> 
> Yikes! I didn't think about that. This is probably where (some part of)
> probability skewing comes from.
> 
> Anyway, the passwords for authorized users will be copied and pasted
> from email into in the application GUI which will remember it for them,
> so they will not have to remember and type them in. So I have little in
> the way of limitations of password length - even though in *some* cases
> somebody might have to (or be ignorant enough) to retype the password
> instead of pasting it in.
> 
> In that case the "diceware" approach is not necessary, even though I
> will certainly remember this approach for a case when users will have to
> remember & type the passwords in.
> 
> The main application will access the data using HTTP (probably), so the
> main point is that an attacker is not able to guess passwords using
> brute force.
> 
> Using A-z with 10-char password seems to provide 3 orders of magnitude
> more combinations than a-z:
> 
 57 ** 10
> 36201456891249L
 25 ** 10
> 95367431640625L
> 
> Even though I'm not sure it is worth it, assuming 1000 brute-force
> guesses per second (which over the web would amount pretty much to DOS),
> this would take # days:
> 
 57 ** 10 / (1000 * 3600 * 24)
> 4190200595L
 25 ** 10 / (1000 * 3600 * 24)
> 1103789L
> 
> Even then I'm not getting completely uniform distribution for some reason:
> 
> d 39411
> l 39376
> f 39288
> a 39275
> s 39225
> r 39172
> p 39159
> t 39073
> k 39071
> u 39064
> e 39005
> o 39005
> n 38995
> j 38993
> h 38975
> q 38958
> c 38938
> b 38906
> g 38894
> i 38847
> m 38819
> v 38712
> z 35321
> y 35228
> w 35189
> x 35075
> 
> Code:
> 
> import operator
> 
> def gen_rand_word(n):
> with open('/dev/urandom') as f:
> return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])
> 
> def count_chars(chardict, word):
> for c in word:
> try:
> chardict[c] += 1
> except KeyError:
> chardict[c] = 0
> 
> if __name__ == "__main__":
> chardict = {}
> for i in range(10):
> w = gen_rand_word(10)
> count_chars(chardict, w)
> counts = list(chardict.items())
> counts.sort(key = operator.itemgetter(1), reverse = True)
> for char, count in counts:
> print char, count
> 
>> I'd write your code something like this:
>>
>>  nletters = 5
>>
>>  def randomword(n):
>>  with open('/dev/urandom') as f:
>>  return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])
>>
>>  print randomword(nletters)
> 
> Aw shucks when will I learn to do the stuff in 3 lines well instead of
> 20, poorly. :-/
> 
When you've got as much experience as Paul?

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread mk

On 2010-02-24 03:50, Paul Rubin wrote:

The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42.


Yikes! I didn't think about that. This is probably where (some part of) 
probability skewing comes from.


Anyway, the passwords for authorized users will be copied and pasted 
from email into in the application GUI which will remember it for them, 
so they will not have to remember and type them in. So I have little in 
the way of limitations of password length - even though in *some* cases 
somebody might have to (or be ignorant enough) to retype the password 
instead of pasting it in.


In that case the "diceware" approach is not necessary, even though I 
will certainly remember this approach for a case when users will have to 
remember & type the passwords in.


The main application will access the data using HTTP (probably), so the 
main point is that an attacker is not able to guess passwords using 
brute force.


Using A-z with 10-char password seems to provide 3 orders of magnitude 
more combinations than a-z:


>>> 57 ** 10
36201456891249L
>>> 25 ** 10
95367431640625L

Even though I'm not sure it is worth it, assuming 1000 brute-force 
guesses per second (which over the web would amount pretty much to DOS), 
this would take # days:


>>> 57 ** 10 / (1000 * 3600 * 24)
4190200595L
>>> 25 ** 10 / (1000 * 3600 * 24)
1103789L

Even then I'm not getting completely uniform distribution for some reason:

d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

if __name__ == "__main__":
chardict = {}
for i in range(10):
w = gen_rand_word(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count


I'd write your code something like this:

 nletters = 5

 def randomword(n):
 with open('/dev/urandom') as f:
 return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

 print randomword(nletters)


Aw shucks when will I learn to do the stuff in 3 lines well instead of 
20, poorly. :-/


Regards,
mk



--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-24 Thread Paul Rubin
Steven D'Aprano  writes:
> Given a random six character password taken out of an alphabet of 52 
> characters, it takes over nine billion attempts to brute force it. 
> Reducing the alphabet by 50% cuts that down to less than 200 million. To 
> make up for that loss of 1 bit of entropy, you need two extra characters 
> in your password.

One extra character comes pretty close (within 1.3 bits).  Even two
extra chars is probably (subjective) easier for a user to deal with than
a completely random mixture of upper/lower case.  You don't get the
extra bit per character if that distribution is anything other than
random, of course.

For something like a web password (each guess takes a server hit), where
the resource guarded is not very valuable, 5 chars is probably enough
for most purposes.  For something like an encryption key subject to
offline attacks, 6 mixed-case characters will barely slow a real
attacker down.

As before, my suggestion is still diceware.  I've used random
alphanumerics in the past but they're too big a hassle, they have to be
written down, etc.  

And of course, if you're doing something serious, use a hardware token.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Steven D'Aprano
On Tue, 23 Feb 2010 18:39:53 -0800, Paul Rubin wrote:

> Steven D'Aprano  writes:
>> Paul, if you were anyone else, I'd be sneering uncontrollably about
>> now, but you're not clueless about cryptography, so what have I missed?
>> Why is reducing the number of distinct letters by more than 50%
>> anything but a disaster? This makes the task of brute-forcing the
>> password exponentially easier.
> 
> Reducing the number of distinct letters by 50% decreases the entropy per
> character by 1 bit.  

You say that as if 1 bit of entropy isn't much :)

Given a random six character password taken out of an alphabet of 52 
characters, it takes over nine billion attempts to brute force it. 
Reducing the alphabet by 50% cuts that down to less than 200 million. To 
make up for that loss of 1 bit of entropy, you need two extra characters 
in your password.


> That stuff about mixing letters and digits and
> funny symbols just makes the password a worse nuisance to remember and
> type, for a small gain in entropy that you can compute and make up for.

Well, everybody has their own ways of remembering passwords, and I'd much 
prefer to remember an eight character password with "funny symbols" that 
I chose myself, than a six character password with nothing but letters 
that was chosen for me.

Of course, I flatter myself that I know how to choose good passwords, and 
I hate remembering long random strings even from a reduced alphabet (e.g. 
I hate memorizing eight digit phone numbers, and am completely incapable 
of remembering ten digit mobile phone numbers).



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Lawrence D'Oliveiro
In message <7xwry39tpi@ruckus.brouhaha.com>, Paul Rubin wrote:

> More generally still, passwords regardless of their entropy content are
> a sucky way to encapsulate cryptographic secrets.

They’re a shared secret. How else would you represent a shared secret, if 
not with a shared secret?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Paul Rubin
Lie Ryan  writes:
> If an attacker knows the that the random number generator have an
> extreme skew and he knows the distribution of the letters, how much
> advantage would it give the attacker? My initial guess is that the more
> skewed the letters are, the better the advantage, since an attacker
> using brute-force can write his program to prefer the most likely letters?

A useable (conservative) estimate is that the attacker's workload is 1/p
where p is the probability of the most likely password.  That basically
says the password strength can be measured by the min-entropy.
Cryptographers often use that approach.  If you want to be more precise,
you can do a conditional probability calculation assuming the attacker
works down the list of possible passwords in order of decreasing
probability, stopping when they hit the right one.

More generally still, passwords regardless of their entropy content are
a sucky way to encapsulate cryptographic secrets.  We keep using them
because every alternative has drawbacks of its own.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Lie Ryan
On 02/24/10 14:09, Robert Kern wrote:
> On 2010-02-23 20:43 , Steven D'Aprano wrote:
>> On Wed, 24 Feb 2010 02:40:13 +, Steven D'Aprano wrote:
>>
>>> On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:
>>>
 The question is: is this secure? That is, can the string generated this
 way be considered truly random?
>>>
>>> Putting aside the philosophical question of what "truly random" means, I
>>> presume you mean that the letters are uniformly distributed. The answer
>>> to that is, they don't like uniformly distributed.
>>
>> Er, they don't *look* uniformly distributed.
>>
>> (Of course, being random, perhaps they are and I just got unlucky.)
> 
> You'd have to be very, *very* unlucky to get a sample of that size so
> far from uniformly distributed if the generating process actually were
> uniform.
> 
> Of course, uniformity isn't really necessary. You just need enough
> entropy in the distribution (amongst other things like protection of the
> seed from being known or guessed). A skewed distribution of characters
> is perfectly fine provided that you had enough characters in the
> password to meet the desired entropy requirement. A skewed distribution
> does require more characters to meet a specified entropy requirement
> than a uniform distribution, of course.
> 
> That said, for a naive strategy like "pick an independent random
> character, repeat", you should just use a uniform distribution. It makes
> the analysis easier. Worthwhile generators that give skewed
> distributions usually do so for a good reason, like generating
> pronounceable passwords.

If an attacker knows the that the random number generator have an
extreme skew and he knows the distribution of the letters, how much
advantage would it give the attacker? My initial guess is that the more
skewed the letters are, the better the advantage, since an attacker
using brute-force can write his program to prefer the most likely letters?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Robert Kern

On 2010-02-23 20:43 , Steven D'Aprano wrote:

On Wed, 24 Feb 2010 02:40:13 +, Steven D'Aprano wrote:


On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:


The question is: is this secure? That is, can the string generated this
way be considered truly random?


Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.


Er, they don't *look* uniformly distributed.

(Of course, being random, perhaps they are and I just got unlucky.)


You'd have to be very, *very* unlucky to get a sample of that size so far from 
uniformly distributed if the generating process actually were uniform.


Of course, uniformity isn't really necessary. You just need enough entropy in 
the distribution (amongst other things like protection of the seed from being 
known or guessed). A skewed distribution of characters is perfectly fine 
provided that you had enough characters in the password to meet the desired 
entropy requirement. A skewed distribution does require more characters to meet 
a specified entropy requirement than a uniform distribution, of course.


That said, for a naive strategy like "pick an independent random character, 
repeat", you should just use a uniform distribution. It makes the analysis 
easier. Worthwhile generators that give skewed distributions usually do so for a 
good reason, like generating pronounceable passwords.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Paul Rubin
mk  writes:
>> You might look at the sitewww.diceware.comfor an approach to this,
>> which you can implement with a program.  The docs there are pretty
>> thoughtful and may help you understand the relevant issues.
>
> Thanks. But I would also be grateful for indicating what is wrong/ugly
> in my code.

The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42.  There are also some effects on
the lower digits.  The total entropy loss probably isn't fatal but as
described, it's ugly.

I'd write your code something like this:

nletters = 5

def randomword(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

print randomword(nletters)

I wouldn't rely on a 5 letter combination for a high security
application, but it might be ok for some low security purposes.  Two
random 5-letter combinations separated by a hyphen will be much better,
and is probably easier to type than a solid block of 10 letters.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Paul Rubin
Steven D'Aprano  writes:
> Putting aside the philosophical question of what "truly random" means, I 
> presume you mean that the letters are uniformly distributed. The answer 
> to that is, they don't like uniformly distributed.

That is a good point, the way those letters are generated (through the
decimal conversion stuff), they won't be all that uniform.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Paul Rubin
Steven D'Aprano  writes:
> Paul, if you were anyone else, I'd be sneering uncontrollably about now, 
> but you're not clueless about cryptography, so what have I missed? Why is 
> reducing the number of distinct letters by more than 50% anything but a 
> disaster? This makes the task of brute-forcing the password exponentially 
> easier.

Reducing the number of distinct letters by 50% decreases the entropy per
character by 1 bit.  That stuff about mixing letters and digits and
funny symbols just makes the password a worse nuisance to remember and
type, for a small gain in entropy that you can compute and make up for.
The main thing you have to make sure is that the min-entropy is
sufficient for your purposes, and it's generally more convenient to do
that by making the password a little bit longer than by imposing
contortions on the person typing it.  Ross Anderson's "Security
Engineering" chapter about passwords is worth reading too:

  http://www.cl.cam.ac.uk/~rja14/Papers/SE-03.pdf

When I mentioned entropy loss to the OP though, I mostly meant loss from
getting rid of the letter z.  The (binary) Shannon entropy of the
uniform probability distribution on 26 letters is 4.7004397 bits; on 25
letters, it's 4.6438561 bits.  The difference isn't enough to give an
attacker that much advantage.

I like the diceware approach to passphrase generation and I've been
using it for years.  www.diceware.com explains it in detail and the docs
there are quite well-thought-out and informative.  Keep in mind that the
entropy needed for an online password (attacker has to make a server
query for every guess, and hopefully gets locked out after n wrong
tries) and an offline one (attacker has something like a hash of the
password and can run a completely offline search) are different.

Here is a program that I use sometimes:

from math import log
dictfile = '/usr/share/dict/words'

def genrandom(nbytes):
with open('/dev/urandom') as f:
return int(f.read(nbytes).encode('hex'), 16)

def main():
wordlist = list(x.strip() for x in open(dictfile) if len(x) < 7)
nwords = len(wordlist)
print "%d words, entropy=%.3f bits/word"% (
nwords, log(nwords, 2))
print '-'.join(wordlist[genrandom(10)%nwords] for i in xrange(6))

main()
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Steven D'Aprano
On Wed, 24 Feb 2010 02:40:13 +, Steven D'Aprano wrote:

> On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:
> 
>> The question is: is this secure? That is, can the string generated this
>> way be considered truly random?
> 
> Putting aside the philosophical question of what "truly random" means, I
> presume you mean that the letters are uniformly distributed. The answer
> to that is, they don't like uniformly distributed.

Er, they don't *look* uniformly distributed.

(Of course, being random, perhaps they are and I just got unlucky.)


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Steven D'Aprano
On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> The question is: is this secure? That is, can the string generated this
> way be considered truly random?

Putting aside the philosophical question of what "truly random" means, I 
presume you mean that the letters are uniformly distributed. The answer 
to that is, they don't like uniformly distributed.

This isn't a sophisticated statistical test, it's the equivalent of a 
back-of-the-envelope calculation: I generated 100,000 random strings with 
your code, and counted how often each letter appears:

If the letters are uniformly distributed, you would expect all the 
numbers to be quite close, but instead they range from 15063 to 25679:

{'a': 15063, 'c': 20105, 'b': 15100, 'e': 25465, 'd': 25458, 'g': 25597, 
'f': 25589, 'i': 25045, 'h': 25679, 'k': 22945, 'j': 25531, 'm': 16187, 
'l': 16252, 'o': 16076, 'n': 16012, 'q': 16069, 'p': 16119, 's': 16088, 
'r': 16087, 'u': 15951, 't': 16081, 'w': 16236, 'v': 15893, 'y': 15834, 
'x': 15956}

Eye-balling it, it looks vaguely two-humped, one hump around 15-16K, the 
second around 22-25K. Sure enough, here's a quick-and-dirty graph:

a  | ***
b  | ***
c  | ***
d  | ***
e  | ***
f  | 
g  | 
h  | 
i  | ***
j  | 
k  | **
l  | **
m  | **
n  | *
o  | **
p  | **
q  | **
r  | **
s  | **
t  | **
u  | *
v  | *
w  | **
x  | *
y  | *


The mean of the counts is 19056.72, and the mean deviation is 3992.28. 
While none of this is statistically sophisticated, it does indicate to me 
that your function is nowhere even close to uniform. It has a very strong 
bias.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Steven D'Aprano
On Tue, 23 Feb 2010 11:19:59 -0800, Paul Rubin wrote:

> mk  writes:
>> I need to generate passwords and I think that pseudo-random generator
>> is not good enough, frankly. So I wrote this function:... The question
>> is: is this secure? That is, can the string generated this way be
>> considered truly random? (I abstract from not-quite-perfect nature of
>> /dev/urandom at the moment; I can always switch to /dev/random which is
>> better)
> 
> urandom is fine and the entropy loss from the numeric conversions and
> eliminating 'z' in that code before you get letters out is not too bad.

What?

You're going from a possible alphabet of 62 (excluding punctuation) or 94 
(inc punctuation available on an American keyboard) distinct letters down 
to 25, and you say that's "not too bad"?

Paul, if you were anyone else, I'd be sneering uncontrollably about now, 
but you're not clueless about cryptography, so what have I missed? Why is 
reducing the number of distinct letters by more than 50% anything but a 
disaster? This makes the task of brute-forcing the password exponentially 
easier.

Add the fact that the passwords are so short (as little as two characters 
in my tests) and this is about as far from secure as it is possible to be.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Steven D'Aprano
On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> Hello,
> 
> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:
[snip]
> (yes I know that this way generated string will not contain 'z' because
> 99/4 + 97 = 121 which is 'y')

You're worried about the security of the PRNG but then generate a TWO to 
FIVE character lowercase password with no digits, punctuation or the 
letter 'z'? That's priceless!

Python's PRNG is not suitable for producing cryptographically strong 
streams of random bytes, but it is perfectly strong enough for generating 
good passwords.



> The question is: is this secure? 

No. 

You are wasting your time trying to fix something which isn't a problem, 
and introducing a much bigger problem instead. You are MUCH MUCH MUCH 
better off with a six or ten character password taken from upper and 
lowercase letters, plus digits, plus punctuation, than a four digit 
password taken from lowercase letters only. Even if the first case has 
some subtle statistical deviation from uniformity, and the second is 
"truly random" (whatever that means), it doesn't matter.

Nobody is going to crack your password because the password generator is 
0.01% more likely to generate a "G" than a "q". But they *will* brute-
force your password if you have a four digit password taken from a-y only.



> That is, can the string generated this
> way be considered truly random? 

Define truly random.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Lawrence D'Oliveiro
In message , mk wrote:

> I need to generate passwords and I think that pseudo-random generator is
> not good enough, frankly. So I wrote this function:

Much simpler:

import subprocess

data, _ = subprocess.Popen \
  (
args = ("pwgen", "-nc"),
stdout = subprocess.PIPE
  ).communicate()
print data

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Robert Kern

On 2010-02-23 13:59 PM, mk wrote:

On Feb 23, 7:19 pm, Paul Rubin  wrote:


The code is pretty ugly.  The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer.


Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?


I would recommend using random.SystemRandom.choice() on a sequence of acceptable 
characters. E.g. (untested)


import random
import string


characters = string.letters + string.digits + '~...@#$%^&*()-+=,;./\?><|'
# ... or whatever.

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice(characters))
return ''.join(chars)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Robert Kern

On 2010-02-23 13:19 PM, Paul Rubin wrote:


I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters.  Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program.  The docs there are pretty
thoughtful and may help you understand the relevant issues.


I like RFC 1751 for this:

http://gitweb.pycrypto.org/?p=crypto/pycrypto-2.x.git;a=blob;f=lib/Crypto/Util/RFC1751.py;h=1c98a212c22066adabfee521b495eeb4f9d7232b;hb=HEAD

Shortened URL:

http://tr.im/Pv9B

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread mk
On Feb 23, 7:19 pm, Paul Rubin  wrote:

> The code is pretty ugly.  The main problem is you end up with a password
> that's usually 5 letters but sometimes just 4 or fewer.  

Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?

> You might look at the sitewww.diceware.comfor an approach to this,
> which you can implement with a program.  The docs there are pretty
> thoughtful and may help you understand the relevant issues.

Thanks. But I would also be grateful for indicating what is wrong/ugly
in my code.

Regards,
mk


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is this secure?

2010-02-23 Thread Paul Rubin
mk  writes:
> I need to generate passwords and I think that pseudo-random generator
> is not good enough, frankly. So I wrote this function:...
> The question is: is this secure? That is, can the string generated
> this way be considered truly random? (I abstract from
> not-quite-perfect nature of /dev/urandom at the moment; I can always
> switch to /dev/random which is better)

urandom is fine and the entropy loss from the numeric conversions and
eliminating 'z' in that code before you get letters out is not too bad.
The code is pretty ugly.  The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer.  Passwords that
short are vulnerable to dictionary attacks.  Longer passwords made from
random letters are difficult to remember.

I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters.  Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program.  The docs there are pretty
thoughtful and may help you understand the relevant issues.
-- 
http://mail.python.org/mailman/listinfo/python-list