2012/7/16 Ángel González <keis...@gmail.com>:
>> 1a) If you want to support character classes, you can do it with pcre:
>> http://www.php.net/manual/en/regexp.reference.character-classes.php
> That's more or less what I have thought.
> If it's a string surrounded by square brackets, it's a character class,
> else treat as a literal list of characters.
> ] and - can be provided with the old trick of provide "] as first
> character", "make - the first or last one".

Right thought. But introducing a new scheme of character-class
identifiers, or a new way of describing character classes, is confusing.
As a PHP developer I think "Oh no, not again new magic charsets".

I suggest again to use PCRE for that. The difference from your proposal
is not so big.

Examples:

"/[[:alnum:]]/" will return "abc...XYZ0123456789".

We can also do this with "/[a-zA-Z0-9]/". Or "/[a-z0-9]/i". Or
"/[[:alpha:][:digit:]]/".

You see: with PCRE you can express the same thing in many different
ways, and you continue to use this "standard". [And PCRE supports UTF-8.
Currently not important. But who knows?]

And maybe we can think about dropping the leading "/[" and the trailing
"]/", but a "/" at the end should optionally be allowed, so that regex
modifiers (like "/i") can be added.

> Having to detect character limits makes it uglier.

Exactly. That's why I think we don't need so much magic in the second
parameter. The character list is just a list of characters. No magic. We
can extend this with a third parameter to tell the function which
charset the list is in. And maybe a fourth to select the random
algorithm, but I think it's possibly better to have one function per
algorithm, because that's how random currently works.

If I had to write it in PHP, it would look like this:

    pseudofunction str_random($len, $characters, $encoding = 'ASCII', $algo)
    {
        $result = '';
        $chlen = mb_strlen($characters, $encoding);
        for ($i = 0; $i < $len; $i++) {
            $result .= mb_substr($characters, myrandom(0, $chlen, $algo), 1);
        }
        return $result;
    }

Without testing anything. It's just an idea.

This is a working PHP function, but $encoding doesn't work (some stupid error?)
and it does not use $algo:

    function str_random($len, $characters, $encoding = 'ASCII', $algo = null)
    {
        $result = '';
        $chlen = mb_strlen($characters, $encoding);
        for ($i = 0; $i < $len; $i++) {
            // rand() includes both bounds, so the highest valid index is $chlen - 1.
            // $encoding is passed to mb_substr() as well; leaving it out is a
            // likely cause of the $encoding problem mentioned above.
            $result .= mb_substr($characters, rand(0, $chlen - 1), 1, $encoding);
        }
        return $result;
    }

> About supporting POSIX classes, that could be cool. But you then need a way
> to enumerate them. Note that isalpha() will be provided by the C
> library, so you can't count on having its data. It's possible that PCRE,
> which we bundle, contains the needed unicode tables.

It works without much thought when written in PHP as above, but I don't
know whether it can be done as easily in C.

>> 3. Because generating a string from character-classes is very handy in
>> general for some other things (many string functions have it), I
>> suggest that it is not part of random_string(). Make a new function
>> str_from_character_class(), or if you use pcre like above
>> pcre_str_from_character_class()?
> How would you use such function? If you want to make a string out of them,

Oh, there are many use cases. For example (I renamed the function to
"str_charset()", because it is just a string of a charset):

    // Search for whitespace characters
    strpbrk("Hello World", str_charset('/[\s]/'));

    // Remove invisible chars at the beginning or end (does not make much
    // sense, because a regex is maybe faster in this case)
    trim("\rblaa\n", str_charset('/[^[:print:]]/'));

    // Remove invisible chars: with very big strings this could be much
    // faster than a regex. str_replace() takes search, replace, subject.
    str_replace(str_split(str_charset('/[^[:print:]]/')), '', "\rblaa\n");

There are many other more or less useful things you can do with a
charset string. :)

-- 
Alex Aulbach

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
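
For illustration, the two ideas discussed in this mail (a charset string built from a PCRE character class, and a random string drawn from a plain list of characters) could be sketched in userland PHP roughly as follows. This is only a sketch under assumptions: str_charset() and str_random() are the hypothetical names used in this thread and not existing PHP functions, the enumeration is limited to the 128 ASCII code points, the result is ordered by code point (digits first, so not "abc...XYZ0123456789"), and random_int() (PHP 7+) stands in for the still undecided $algo parameter.

```php
<?php
// Hypothetical helper from this thread, not part of PHP.
// Returns every ASCII character (code points 0..127) matched by $pattern.
function str_charset(string $pattern): string
{
    $result = '';
    for ($code = 0; $code < 128; $code++) {
        $ch = chr($code);
        // preg_match() returns 1 when the single character matches.
        if (preg_match($pattern, $ch) === 1) {
            $result .= $ch;
        }
    }
    return $result;
}

// Hypothetical helper from this thread, not part of PHP.
// Picks $len characters at random from $characters, a plain list with no magic.
function str_random(int $len, string $characters, string $encoding = 'ASCII'): string
{
    $chlen = mb_strlen($characters, $encoding);
    if ($chlen === 0) {
        throw new InvalidArgumentException('Character list must not be empty');
    }
    $result = '';
    for ($i = 0; $i < $len; $i++) {
        // random_int() includes both bounds, so the last valid index is $chlen - 1.
        $index = random_int(0, $chlen - 1);
        $result .= mb_substr($characters, $index, 1, $encoding);
    }
    return $result;
}
```

Used together, a call such as str_random(8, str_charset('/[[:alnum:]]/')) would yield eight random alphanumeric characters, while str_random() itself stays free of any character-class parsing.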