2012/7/16 Ángel González <keis...@gmail.com>:
>> 1a) If you want to support character classes, you can do it with pcre:
>> http://www.php.net/manual/en/regexp.reference.character-classes.php

> That's more or less what I have thought.
> If it's a string surrounded by square brackets, it's a character class,
> else
> treat as a literal list of characters.
> ] and - can be provided with the old trick of provide "] as first
> character",
> "make - the first or last one".

Good idea. But introducing a new scheme of character-class
identifiers, or a new way of describing character classes, is
confusing. As a PHP developer I would think: "Oh no, not another set
of magic charsets".

I suggest again using PCRE for that. The difference from your proposal
is not that big. Examples:

"/[[:alnum:]]/" will return "abc...XYZ0123456789". We can also do this
with "/[a-zA-Z0-9]/". Or "/[a-z0-9]/i". Or "/[[:alpha:][:digit:]]/".

You see: with PCRE you can express the same thing in many different
ways. And you keep using this "standard".
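
A minimal sketch of how such a function could enumerate a class in
userland PHP, assuming only the 7-bit ASCII range is of interest. The
name str_charset() and the limit of 128 are my assumptions for
illustration, not an existing API:

```php
<?php
// Hypothetical helper: returns every character in the range 0..127
// that matches the given PCRE pattern, in byte order.
function str_charset($pattern)
{
    $result = '';
    for ($code = 0; $code < 128; $code++) {
        $ch = chr($code);
        // preg_match() returns 1 when the single character matches.
        if (preg_match($pattern, $ch) === 1) {
            $result .= $ch;
        }
    }
    return $result;
}

echo str_charset('/[[:digit:]]/'), "\n"; // 0123456789
echo str_charset('/[a-f]/i'), "\n";      // ABCDEFabcdef
```

The characters come back in byte order, so different spellings of the
same class always yield the same string.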

[And PCRE supports UTF-8. Not important right now. But who knows?]

And maybe we can think about dropping the leading "/[" and the
trailing "]/", but a "/" at the end should remain optionally possible,
so that regex modifiers (like "/i") can be added.
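
One way the shorthand could be expanded, as a rough sketch. The name
charset_pattern() is hypothetical, and the sketch assumes the class
itself contains no literal "/", because the last "/" is always read as
the start of the modifiers:

```php
<?php
// Hypothetical helper: expands the shorthand "a-z0-9" or "a-z0-9/i"
// into the full PCRE pattern "/[a-z0-9]/" or "/[a-z0-9]/i".
function charset_pattern($spec)
{
    $modifiers = '';
    // Everything after the last "/" is taken as modifier letters.
    if (preg_match('~^(.*)/([a-zA-Z]*)$~s', $spec, $m) === 1) {
        $spec = $m[1];
        $modifiers = $m[2];
    }
    return '/[' . $spec . ']/' . $modifiers;
}

echo charset_pattern('a-z0-9'), "\n";   // /[a-z0-9]/
echo charset_pattern('a-z0-9/i'), "\n"; // /[a-z0-9]/i
```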


> Having to detect character limits makes it uglier.

Exactly. That's why I think the second parameter does not need so much
magic. The character list is just a list of characters. No magic.
We can extend this with a third parameter that tells the function
which charset (encoding) the list is in. And maybe a fourth to select
the random algorithm, but I think it's probably better to have one
function per algorithm, because that's how random currently works.

Written in PHP, it would look like this:

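pseudofunction str_random($len, $characters, $encoding = 'ASCII', $algo)
{
    $result = '';
    $chlen = mb_strlen($characters, $encoding);
    for ($i = 0; $i < $len; $i++) {
        // myrandom() is a placeholder; assuming inclusive bounds like
        // rand(), the highest valid index is $chlen - 1.
        $pos = myrandom(0, $chlen - 1, $algo);
        $result .= mb_substr($characters, $pos, 1, $encoding);
    }
    return $result;
}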

Untested. It's just an idea.

This is a working PHP function. It does not use $algo yet, and
$encoding only takes effect if it is also passed to mb_substr():

function str_random($len, $characters, $encoding = 'ASCII', $algo = null)
{
    $result = '';
    $chlen = mb_strlen($characters, $encoding);
    for ($i = 0; $i < $len; $i++) {
        // rand() includes both bounds, so the upper bound is $chlen - 1.
        $pos = rand(0, $chlen - 1);
        $result .= mb_substr($characters, $pos, 1, $encoding);
    }
    return $result;
}
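
The idea of a selectable algorithm could also be sketched by passing
a callable instead of a name. This is only an illustration: the
function name str_random_with(), the parameter order and the UTF-8
default are my assumptions, and it needs the mbstring extension:

```php
<?php
// Hypothetical variant: $algo is any callable that takes ($min, $max)
// and returns an integer within that inclusive range.
function str_random_with($len, $characters, $algo = 'mt_rand', $encoding = 'UTF-8')
{
    $chlen = mb_strlen($characters, $encoding);
    if ($chlen < 1) {
        return ''; // nothing to pick from
    }
    $result = '';
    for ($i = 0; $i < $len; $i++) {
        $pos = call_user_func($algo, 0, $chlen - 1);
        $result .= mb_substr($characters, $pos, 1, $encoding);
    }
    return $result;
}

echo str_random_with(8, 'abcdef0123456789'), "\n"; // uses mt_rand()
echo str_random_with(4, 'abc', 'rand'), "\n";      // uses rand()
```

Any generator with a (min, max) signature fits here, which keeps the
character handling in one place while the random source stays
exchangeable.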


> About supporting POSIX classes, that could be cool. But you then need a way
> to enumerate them. Note that isalpha() will be provided by the C
> library, so you
> can't count on having its data. It's possible that PCRE, which we bundle,
> contains the needed unicode tables.

In PHP it works without further thought, as in the code written above,
but I don't know whether this can be done equally easily in C.


>> 3. Because generating a string from character-classes is very handy in
>> general for some other things (many string functions have it), I
>> suggest that it is not part of random_string(). Make a new function
>> str_from_character_class(), or if you use pcre like above
>> pcre_str_from_character_class()?
> How would you use such function? If you want to make a string out of them,

Oh, there are many use cases.

For example (I renamed the function to "str_charset()", because it
just returns the characters of a charset as a string):

// Search for whitespace characters
strpbrk("Hello World", str_charset('/[\s]/'));

// Remove invisible chars at the beginning or end (does not make much
// sense, because a regex is maybe faster in this case)
trim("\rblaa\n", str_charset('/[^[:print:]]/'));

// Remove invisible chars: with very big strings this could be much
// faster than a regex.
str_replace(str_split(str_charset('/[^[:print:]]/')), '', "\rblaa\n");

There are many other more or less useful things you can do with a
charset-string. :)


-- 
Alex Aulbach

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
