On Sunday, 10 August 2014 17:31:01 UTC+1, Steven D'Aprano wrote:
> Devin Jeanpierre wrote:
>
> > On Fri, Aug 8, 2014 at 2:01 AM, Paul Wolf <paulwolf...@gmail.com> wrote:
> >> This is a proposal with a working implementation for a random string
> >> generation template syntax for Python. `strgen` is a module for
> >> generating random strings in Python using a regex-like template language.
> >> Example:
> >>
> >> >>> from strgen import StringGenerator as SG
> >> >>> SG("[\l\d]{8:15}&[\d]&[\p]").render()
> >> u'F0vghTjKalf4^mGLk'
> >
> > Why aren't you using regular expressions? I am all for conciseness,
> > but using an existing format is so helpful...
>
> You've just answered your own question:
>
> > Unfortunately, the equivalent regexp probably looks like
> > r'(?=.*[0-9])(?=.*[A-Z])(?=.*[a-z])[a-zA-Z0-9]{8:15}'
>
> Apart from being needlessly verbose, regex syntax is not appropriate because
> it specifies too much, specifies too little, and specifies the wrong
> things. It specifies too much: regexes like ^ and $ are meaningless in this
> case. It specifies too little: there's no regex for the "shuffle operator".
> And it specifies the wrong things: regexes like (?= ...) as used in your
> example are for matching, not generating strings, and it isn't clear
> what "match any character but don't consume any of the string" means when
> generating strings.
>
> Personally, I think even the OP's specified language is too complex. For
> example, it supports literal text, but given the use-case (password
> generators) do we really want to support templates like "password[\d]"? I
> don't think so, and if somebody did, they can trivially say "password" +
> SG('[\d]').render().
>
> Larry Wall (the creator of Perl) has stated that one of the mistakes with
> Perl's regular expression mini-language is that the Huffman coding is
> wrong. Common things should be short, uncommon things can afford to be
> longer. Since the most common thing for password generation is to specify
> character classes, they should be short, e.g. d rather than [\d] (one
> character versus four).
>
> The template given could potentially be simplified to:
>
> "(LD){8:15}&D&P"
>
> where the round brackets () are purely used for grouping. Character codes
> are specified by a single letter. (I use uppercase to avoid the problem
> that l & 1 look very similar. YMMV.) The model here is custom format codes
> from spreadsheets, which should be comfortable to anyone who is familiar
> with Excel or OpenOffice. If you insist on having the facility to including
> literal text in your templates, might I suggest:
>
> "'password'd" # Literal string "password", followed by a single digit.
>
> but personally I believe that for the use-case given, that's a mistake.
>
> Alternatively, date/time templates use two-character codes like %Y %m etc,
> which is better than
>
> > (I've been working on this kind of thing with regexps, but it's still
> > incomplete.)
> >
> >> * Uses SystemRandom class (if available, or falls back to Random)
> >
> > This sounds cryptographically weak. Isn't the normal thing to do to
> > use a cryptographic hash function to generate a pseudorandom sequence?
>
> I don't think that using a good, but not cryptographically-strong, random
> number generator to generate passwords is a serious vulnerability. What's
> your threat model? Attacks on passwords tend to be one of a very few:
>
> - dictionary attacks (including tables of common passwords and
> simple transformations of words, e.g. 'pas5w0d');
>
> - brute force against short and weak passwords;
>
> - attacking the hash function used to store passwords (not the password
> itself), e.g. rainbow tables;
>
> - keyloggers or some other way of stealing the password (including
> phishing sites and the ever-popular "beat them with a lead pipe
> until they give up the password");
>
> - other social attacks, e.g. guessing that the person's password is their
> date of birth in reverse.
>
> But unless the random number generator is *ridiculously* weak ("9, 9, 9, 9,
> 9, 9, ...") I can't see any way to realistically attack the password
> generator based on the weakness of the random number generator. Perhaps I'm
> missing something?
>
> > Someone should write a cryptographically secure pseudorandom number
> > generator library for Python. :(
>
> Here, let me google that for you :-)
>
> https://duckduckgo.com/html/?q=python+crypto
>
> --
> Steven
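The "SystemRandom class (if available, or falls back to Random)" behaviour quoted above can be sketched with the standard library alone. This is a hypothetical illustration, not strgen's source: `make_rng` and `random_string` are invented names. `random.SystemRandom` draws from `os.urandom()`, the operating system's randomness source, which raises `NotImplementedError` when no such source exists; `random.Random` is the seedable Mersenne Twister.

```python
import random
import string


def make_rng():
    """Prefer the OS randomness source; fall back to the seedable PRNG.

    Hypothetical helper sketching "SystemRandom if available, else Random";
    it is not taken from strgen's source.
    """
    rng = random.SystemRandom()
    try:
        # SystemRandom reads os.urandom(), which raises NotImplementedError
        # when the platform has no randomness source.
        rng.random()
    except NotImplementedError:
        # Mersenne Twister: seedable (hence reproducible), fine for test
        # data, not for secrets.
        rng = random.Random()
    return rng


def random_string(alphabet, length, rng=None):
    """Return `length` characters drawn independently and uniformly from `alphabet`."""
    rng = rng or make_rng()
    return "".join(rng.choice(alphabet) for _ in range(length))


# e.g. a 12-character alphanumeric string
print(random_string(string.ascii_letters + string.digits, 12))
```

For test data, passing `random.Random(seed)` as `rng` makes runs reproducible. (Python 3.6, released after this thread, added the `secrets` module, which is built on `SystemRandom`.)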
I should clarify that password generation is only one of several use cases
that strgen is intended to support. It is also meant for:

Test data generation:

    [\l]{1:20}&[._]{0:1}@[\l]{15}.(com|net|org)

produces email addresses that use word characters and might have a period or
an underscore in the first part. Or:

    ((john|robert|harry)|(mary|agnes|shelly)) (smith|jones|taylor)

produces names with a roughly equal distribution of female and male first
names.

I contemplated, but did not implement, a feature where you can give strgen
named functions that generate the required string (using whatever selection
process that implementation chooses):

    ($malefirstname|$femalefirstname) $lastname

where

    def malefirstname():
        # get a name from the database at random
        ...

Voucher generation:

    [\d]{10}

produces 10-digit voucher numbers.

It should be noted that security is not a concern in any of the foregoing.

> Since the most common thing for password generation is to specify
> character classes, they should be short, e.g. d rather than [\d] (one
> character versus four).

But that assumes only the standard character classes, not custom ones like
"[aeiuy]", not to mention Unicode ranges for languages other than English.

> If you insist on having the facility to including literal text in your templates,

I do :-), as per above.

> might I suggest:
>
> "'password'd" # Literal string "password", followed by a single digit.

As per above, I think the more verbose notation for character classes is
necessary. That said, your suggestion is not a bad one. I could have taken a
route where you define the character classes with aliases and then construct
a very lean template. That is effectively what the (unimplemented) function
expressions in the example above do.

The ability to produce weak passwords ('[abc]{3}') is something I chose not to
take up in the strgen module, because it should be (mostly) agnostic about
what constitutes good security, and because it needs to support the broader
set of use cases described above.
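Since the `$name` feature was only contemplated, here is a minimal sketch of how alternation plus named functions could expand. It is not strgen code or strgen's API: `render`, `NAME_FUNCTIONS` and the name lists are hypothetical, and only literal text, `(a|b)` groups and `$name` references are handled; strgen's character classes, quantifiers and `&` shuffle are left out.

```python
import random
import re

# Innermost "(...)" group containing no nested parentheses.
_GROUP = re.compile(r"\(([^()]*)\)")
# A "$name" reference to a user-supplied generator function.
_FUNC = re.compile(r"\$(\w+)")


def render(template, functions=None, rng=None):
    """Expand (a|b|c) alternation groups, then hypothetical $name references."""
    functions = functions or {}
    rng = rng or random.SystemRandom()
    # Resolve groups innermost-first: in ((a|b|c)|(d|e|f)) each inner group
    # collapses to one name, then the outer group picks between them 50/50.
    while _GROUP.search(template):
        template = _GROUP.sub(lambda m: rng.choice(m.group(1).split("|")), template)
    # Replace each surviving $name with functions[name](); unknown names raise KeyError.
    return _FUNC.sub(lambda m: functions[m.group(1)](), template)


_rng = random.SystemRandom()


def malefirstname():
    # Hypothetical stand-in for "get a name from the database at random".
    return _rng.choice(["john", "robert", "harry"])


def femalefirstname():
    return _rng.choice(["mary", "agnes", "shelly"])


def lastname():
    return _rng.choice(["smith", "jones", "taylor"])


NAME_FUNCTIONS = {
    "malefirstname": malefirstname,
    "femalefirstname": femalefirstname,
    "lastname": lastname,
}

print(render("((john|robert|harry)|(mary|agnes|shelly)) (smith|jones|taylor)"))
print(render("($malefirstname|$femalefirstname) $lastname", NAME_FUNCTIONS))
```

Resolving groups innermost-first is what makes the outer choice an even split between the male and female groups, however many names each inner group lists. Expanding `$name` only after alternation means functions in branches that were not chosen are never called.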
--
https://mail.python.org/mailman/listinfo/python-list