On 17/07/12 13:34, Alex Aulbach wrote:
>> That's more or less what I had thought.
>> If it's a string surrounded by square brackets, it's a character class;
>> otherwise, treat it as a literal list of characters.
>> ] and - can be provided with the old tricks: "put ] as the first
>> character" and "make - the first or last one".
> Right thought. But introducing a new scheme of character-class
> identifiers or a new way of describing character classes is
> confusing. As a PHP developer I think "Oh no, not again new magic
> charsets".
Not really new. Those escaping tricks are exactly how you had to put ]
and - into character classes of traditional regular expressions. But I
agree it can be confusing.
What about a flag parameter, then?
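For illustration only, a minimal PHP sketch of the bracket convention discussed above: a string wrapped in [ ] is expanded as a character class with a-z style ranges, and anything else is taken as a literal list. The name expand_charset() is hypothetical (it is not an existing PHP function). The sketch works on bytes only and does not handle POSIX classes such as [:alpha:]. Because it simply strips the outer brackets, ] needs no special position here; only - has to go first or last to be taken literally.

```php
<?php
// Hypothetical helper (not an existing PHP function): expands "[a-z0-9]"-style
// specs into the literal bytes they denote. A string not wrapped in [ ] is
// returned unchanged, i.e. treated as a literal list of characters.
// Byte-oriented: no POSIX classes, no multibyte/UTF-8 awareness.
function expand_charset(string $spec): string
{
    $n = strlen($spec);
    if ($n < 3 || $spec[0] !== '[' || $spec[$n - 1] !== ']') {
        return $spec; // literal list
    }
    $body = substr($spec, 1, -1); // strip the outer brackets
    $len = strlen($body);
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $c = $body[$i];
        // "x-y" is a range only when a character follows the "-";
        // a "-" at the very end (or one not between two chars) stays literal.
        if ($i + 2 < $len && $body[$i + 1] === '-') {
            $from = ord($c);
            $to = ord($body[$i + 2]);
            if ($from > $to) {
                throw new InvalidArgumentException("Invalid range in $spec");
            }
            for ($o = $from; $o <= $to; $o++) {
                $out .= chr($o);
            }
            $i += 2; // skip the "-" and the range end
        } else {
            $out .= $c;
        }
    }
    return $out;
}

echo expand_charset('[a-f0-3_-]'), "\n"; // prints abcdef0123_-
```

A flag parameter, as suggested, would replace the bracket sniffing with an explicit switch, so a literal list that happens to start with [ and end with ] could not be misread as a class.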
> I suggest again to use PCRE for that. The difference from your proposal
> is not so big. Examples:
>
> "/[[:alnum:]]/" will return "abc...XYZ0123456789". We can also do this
> with "/[a-zA-Z0-9]/". Or "/[a-z0-9]/i". Or "/[[:alpha:][:digit:]]/".
>
> You see: you can do the same thing in many different ways with PCRE.
> And you keep using this "standard".
>
> [And PCRE supports UTF-8. Not important currently. But who knows?]
>
> And maybe we can think about removing the leading "/[" and the
> trailing "]/", but it should optionally be possible to add a "/" at
> the end followed by some regex modifiers (like "/i").
Those could go in the flag. The slashes are not really needed: they are
delimiter syntax that PHP's preg_* functions add on top of the regex
(and the delimiter can be a different character, although / is the
usual choice).
>> Having to detect character limits makes it uglier.
> Exactly. That's why I think we don't need so much magic in the second
> parameter. The character list is just a list of characters. No magic.
> We can extend this with a third parameter to tell the function which
> charset (encoding) it is in. And maybe a fourth to select the random
> algorithm, but I think it's possibly better to have a function for
> each algorithm, because that's how random currently works.
>
> If I were to write it in PHP, it would look like this:
>
> pseudofunction str_random($len, $characters, $encoding = 'ASCII', $algo)
> {
>     $result = '';
>     $chlen = mb_strlen($characters, $encoding);
>     for ($i = 0; $i < $len; $i++) {
>         $result .= mb_substr($characters, myrandom(0, $chlen, $algo), 1);
>     }
>     return $result;
> }
>
> Without testing anything. It's just an idea.
>
> This is a working PHP function, but $encoding doesn't work (some
> stupid error?)
> and not using $algo:
>
> function str_random($len, $characters, $encoding = 'ASCII', $algo = null)
> {
>     $result = '';
>     $chlen = mb_strlen($characters, $encoding);
>     for ($i = 0; $i < $len; $i++) {
>         $result .= mb_substr($characters, rand(0, $chlen), 1);
>     }
>     return $result;
> }
>
>
>> About supporting POSIX classes, that could be cool. But then you need a
>> way to enumerate them. Note that isalpha() will be provided by the C
>> library, so you can't count on having its data. It's possible that
>> PCRE, which we bundle, contains the needed Unicode tables.
> It works without much thinking in PHP, as written above, but I dunno
> if this could be done equally in C.
The above code doesn't support POSIX character classes; it just picks
characters out of a string (which, I agree, is simple).
>>> 3. Because generating a string from character classes is very handy in
>>> general for some other things (many string functions have it), I
>>> suggest that it is not part of random_string(). Make a new function
>>> str_from_character_class(), or if you use PCRE like above,
>>> pcre_str_from_character_class()?
>> How would you use such a function? If you want to make a string out of them,
> Oh, there are many use cases.
>
> For example (I renamed the function to "str_charset()", because it is
> just a string of a charset):
>
> // Search for whitespace characters
> strpbrk("Hello World", str_charset('/[\s]/'));
So you're expanding all whitespace characters into a string and then
iterating over them with strpbrk(); a preg_match('/\s/', ...) would have
been more efficient.
> // remove invisible chars at the beginning or end (not much sense,
> // because a regex is probably faster in this case)
> trim("\rblaa\n", str_charset('/[^[:print:]]/'));
>
> // remove invisible chars: when doing this with very big strings it
> // could be much faster than with regex.
> str_replace(str_split(str_charset('/[^[:print:]]/')), '', "\rblaa\n");
I don't see why expanding the class to a string, then converting it to
an array, to finally call str_replace() would be faster :S
Also, that str_split() of all non-printable characters (even assuming
you didn't hit the memory limit with the many Unicode characters you
would get) will break on code points above 127, since str_split() works
on bytes.
> There are many other more or less useful things you can do with a
> charset string. :)
I'm not really convinced it's the right way to do them :)

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
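A minimal sketch of the quoted str_random() with the likely "stupid error" fixed, assuming the mbstring extension is loaded. In the quoted version, $encoding reaches mb_strlen() but not mb_substr(), so mb_substr() falls back to the internal encoding; and rand(0, $chlen) has an inclusive upper bound, so it can pick an index one past the last character, making mb_substr() return '' and the result come out too short. Using random_int() (PHP 7+) instead of rand() is my own substitution, not part of the proposal; $algo is still accepted but ignored.

```php
<?php
// Sketch only, assuming the mbstring extension is loaded.
// Differences from the quoted version:
//  - $encoding is also passed to mb_substr(); without it, mb_substr() uses
//    the internal encoding, which may not match what mb_strlen() was told.
//  - The upper bound is $chlen - 1: rand()/random_int() bounds are inclusive,
//    so rand(0, $chlen) can pick an index past the last character.
//  - random_int() replaces rand(); this swap is an assumption, not part of
//    the proposal. $algo is still accepted but ignored.
function str_random(int $len, string $characters, string $encoding = 'ASCII', $algo = null): string
{
    $chlen = mb_strlen($characters, $encoding);
    if ($chlen === 0) {
        throw new InvalidArgumentException('The character list must not be empty');
    }
    $result = '';
    for ($i = 0; $i < $len; $i++) {
        $result .= mb_substr($characters, random_int(0, $chlen - 1), 1, $encoding);
    }
    return $result;
}

echo str_random(8, 'äöüß', 'UTF-8'), "\n";
```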
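To make the last two replies concrete, a small PHP sketch that relies only on PCRE, which is bundled with PHP. It shows str_split() cutting a UTF-8 character into bytes while preg_split() with the u modifier keeps it whole, and it matches the classes directly with preg_match()/preg_replace() instead of expanding them first. The str_charset() function from the quoted mail is hypothetical and is not used here.

```php
<?php
// 1. str_split() splits bytes, not characters: "é" is two bytes in UTF-8.
$bytes = str_split("é");                                  // two 1-byte pieces
$chars = preg_split('//u', "é", -1, PREG_SPLIT_NO_EMPTY); // one character

// 2. Match the class directly instead of expanding it to a string first.
$hasSpace = preg_match('/\s/', "Hello World") === 1;     // instead of strpbrk()
$clean = preg_replace('/[^[:print:]]/', '', "\rblaa\n"); // instead of str_replace()

var_dump(count($bytes), count($chars), $hasSpace, $clean);
```

For UTF-8 subjects the patterns need the u modifier; without it PCRE also works on bytes, much like str_split().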