On 05/04/2011 03:10 PM, Ashley Sheridan wrote:
> On Wed, 2011-05-04 at 13:46 -0600, Jason Gerfen wrote:
> 
>> On 05/04/2011 01:27 PM, Ashley Sheridan wrote:
>>> On Wed, 2011-05-04 at 13:20 -0600, Jason Gerfen wrote:
>>>
>>>> I am running into a problem using the REGEXP option with filter_var().
>>>>
>>>> The string I am using: 09VolunteerApplication.doc
>>>> The PCRE regex I am using:
>>>> /^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di
>>>>
>>>> The function in its entirety:
>>>> return (!filter_var('09VolunteerApplication.doc',
>>>> FILTER_VALIDATE_REGEXP,
>>>> array('options'=>array('regexp'=>'/^[a-z0-9]\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di'))))
>>>> ? false : true;
>>>>
>>>> Anyone have any insight into this?
>>>>
>>>
>>>
>>> You missed a + in your regex. At the moment you're only checking that
>>> the filename starts with a single a-z letter or digit, followed
>>> directly by the period. After that you're matching one to four
>>> characters from the bracketed set rather than one of the listed
>>> extensions; are you sure you want to do that? Square brackets match
>>> individual characters, not strings. Use parentheses to choose between
>>> whole strings.
>>>
>>> Try this:
>>>
>>> '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di'
>>>
>>> One other thing you should be aware of: filenames won't always
>>> consist of just the letters a-z and numbers 0-9. They may contain
>>> accented or foreign letters, hyphens, spaces and a number of other
>>> characters, depending on the client machine's OS. Windows, for example,
>>> allows far fewer characters than Unix-like OSes such as Mac OS and
>>> Linux.
>>>
>>
>> Both are valid PCRE regexes. However, the rules regarding the use of
>> parentheses for an XOR of strings do not explain a similar regex
>> working with filter_var() like so:
>>
>> return (filter_var('kc-1', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[kc\-1|kc\-color|gr\-1|fa\-1|un\-1|un\-color|ben\-1|bencolor|sage\-1|sr\-1|st\-1]{1,8}$/Di'))))
>> ? true : false;
>>
>> The above returns string(4) "kc-1"
>>
>> Another test using the following works similarly:
>>
>> return (filter_var('u0368839', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
>> true : false;
>>
>> The above returns string(8) "u0368839"
>>
>> And
>> return (filter_var('gp123456', FILTER_VALIDATE_REGEXP,
>> array('options'=>array('regexp'=>'/^[gp|u|gx]{1,2}[\d+]{6,15}$/Di')))) ?
>> true : false;
>>
>> returns string(8) "gp123456"
>>
>> As you can see these three examples use the start [] as XOR conditionals
>> for multiple strings as prefixes.
>>
>>
>>
> 
> 
> Not quite, you think they match correctly because that's all you're
> testing for, and you're not looking for anything that might disprove
> that. Using your last example, it will also match these strings:
> 
> gu0368839
> xx0368839
> p0368839
> 
> 
> I tested your first regex with '09VolunteerApplication.doc' and it
> doesn't work at all until you add in that plus after the basename match
> part of the regex:
> 
> ^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$
> 
> However, your regex (with the plus) also matches these strings:
> 
> 09VolunteerApplication.docp
> 09VolunteerApplication.docj
> 09VolunteerApplication.doc|    <-- note it's matching the literal bar character
> 
> Making the changes I suggested means the regex works as you expect:
> 
> ^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$
> 
> Square brackets in a regex match a set of characters, not a literal
> string, and without a quantifier they match only a single character from
> that set. So in your example, you're matching an extension of one to four
> characters drawn from '|cdefgjlnopstvx', and a basename containing only 1
> character that is either a-z or a digit.
> 
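
For anyone reading this in the archive, here is a minimal sketch of the
difference described above, using plain preg_match() so that filter_var()
is out of the picture. The two patterns are the ones from this thread; the
variable names $classPattern, $groupPattern and $names are just mine for
illustration.

```php
<?php
// Character class version: the extension is 1 to 4 characters taken
// from the set inside the brackets (which includes the literal "|").
$classPattern = '/^[a-z0-9]+\.[doc|pdf|txt|jpg|jpeg|png|docx|csv|xls]{1,4}$/Di';

// Group version: the extension must be exactly one of the listed strings.
$groupPattern = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

$names = array(
    '09VolunteerApplication.doc',
    '09VolunteerApplication.docp',
    '09VolunteerApplication.doc|',
);

foreach ($names as $name) {
    // preg_match() returns 1 on a match and 0 on no match.
    printf("%-30s class=%d group=%d\n", $name,
        preg_match($classPattern, $name),
        preg_match($groupPattern, $name));
}
```

The first name should report class=1 group=1, while the other two should
report class=1 group=0, since "docp" and "doc|" are built from characters
in the set but are not in the list of extensions.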

You are right; after a few other tests I stand corrected. My apologies.
However, as I read the documentation for filter_var() and its regexp
option, a return value of false, which is what I am getting, indicates a
problem with the regex.

In addition, I would like to point out that the same regex works as it
should with the older preg_match() function, while with filter_var() the
character class followed by the quantifier (+) fails validation.

var_dump(filter_var('09VolunteerApplication.doc',
FILTER_VALIDATE_REGEXP,
array('options'=>array('regexp'=>'/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/Di'))));

returns false (which I read as an invalid regex) when the character class
[a-z0-9]+ is used with filter_var() and the FILTER_VALIDATE_REGEXP
option.

var_dump(preg_match('/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls){1,4}$/i',
'09VolunteerApplication.doc'));

returns int(1), indicating a valid regex as well as a valid match.
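
As far as I can tell from the manual, filter_var() returns false whenever
the filter fails, so false on its own does not say whether the pattern is
broken or the value simply did not match. A rough way to tell the two
apart is to lean on preg_match(), which returns 1 for a match, 0 for no
match and false on error. This is only a sketch: classifyRegexResult() is
a hypothetical helper name of mine, not anything from PHP itself, and the
deliberately broken pattern in the last call is made up for the test.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether the pattern is
// unusable, or usable but not matching the subject.
function classifyRegexResult($pattern, $subject)
{
    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error, e.g. when the pattern fails to compile. @ hides the warning.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'invalid pattern';
    }
    return $result === 1 ? 'match' : 'no match';
}

$pattern = '/^[a-z0-9]+\.(doc|pdf|txt|jpg|jpeg|png|docx|csv|xls)$/Di';

echo classifyRegexResult($pattern, '09VolunteerApplication.doc'), "\n";
echo classifyRegexResult($pattern, '09VolunteerApplication.exe'), "\n";
// Unterminated character class, so the pattern cannot compile.
echo classifyRegexResult('/^[a-z0-9+$/', 'anything'), "\n";
```

The three calls should print "match", "no match" and "invalid pattern" in
that order.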

I believe this should be reported as a bug but I appreciate your
assistance and insights.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
