Edit report at https://bugs.php.net/bug.php?id=61018&edit=1

 ID:                 61018
 Comment by:         danielklein at airpost dot net
 Reported by:        dey101+php at gmail dot com
 Summary:            Unexplained bool(false) returned from preg_match
 Status:             Open
 Type:               Bug
 Package:            PCRE related
 PHP Version:        5.3.10
 Block user comment: N
 Private report:     N

 New Comment:

I have simplified the error to the following:
<?php
$string = 'ABCDEFGHIJ12345678.';
var_dump(preg_match('/^(?:\w*)*$/i', $string));
$string = 'ABCDEFGHIJ1234567.';
var_dump(preg_match('/^(?:\w*)*$/i', $string));
?>

Outputs:
boolean false
int 0

A pattern like /(\w*)*/ is VERY inefficient, as it must try every combination 
before failing, i.e. matching:
'ABCDEFGHIJ12345678', ''
'ABCDEFGHIJ1234567', '8', ''
'ABCDEFGHIJ1234567', '', '8', ''
'ABCDEFGHIJ123456', '78', ''
'ABCDEFGHIJ123456', '7', '8', ''
'ABCDEFGHIJ123456', '7', '', '8', ''
'ABCDEFGHIJ123456', '', '78', ''
...
'', 'A', '', 'B', '', 'C', '', 'D', '', 'E', '', 'F', '', 'G', '', 'H', '', 
'I', '', 'J', '', '1', '', '2', '', '3', '', '4', '', '5', '', '6', '', '7', 
'', '8', ''

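It is most likely hitting one of PCRE's internal limits (pcre.backtrack_limit 
or pcre.recursion_limit) before it completes; preg_last_error() should show 
which one. I would suggest that this is not a bug, as the number of 
combinations to try roughly doubles with each extra character in the input 
string.
You should try something like '/^(?:(?>\w*))*$/i' instead to avoid the 
undesired backtracking.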
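
To check this, the patterns can be compared side by side. This is only a 
sketch: the helper name describe_match() is made up for illustration, and 
whether the first pattern actually returns false depends on the PHP/PCRE 
version and on the pcre.backtrack_limit and pcre.recursion_limit settings.

```php
<?php
// Hypothetical helper: run preg_match() and report the result together with
// preg_last_error(), which is where PCRE limit errors are reported.
function describe_match($pattern, $string)
{
    $result = preg_match($pattern, $string);
    if ($result === false) {
        return 'error, preg_last_error() = ' . preg_last_error();
    }
    return $result ? 'match' : 'no match';
}

$string = 'ABCDEFGHIJ12345678.';

// Nested quantifiers: may hit a PCRE limit, depending on version and settings.
echo describe_match('/^(?:\w*)*$/i', $string), "\n";

// Atomic group: \w* cannot be re-split on backtracking, so the match fails fast.
echo describe_match('/^(?:(?>\w*))*$/i', $string), "\n";

// Possessive quantifier: a shorter spelling of the same idea.
echo describe_match('/^(?:\w*+)*$/i', $string), "\n";
```

The atomic and possessive variants should both report "no match" for this 
string, since the trailing period can never be matched by \w.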


Previous Comments:
------------------------------------------------------------------------
[2012-02-15 18:39:27] mattfic...@php.net

I have verified that the output from this repro script is the same on both 
Windows and Linux (both using 5.3.10), so this is no longer a Windows-specific 
bug report.

------------------------------------------------------------------------
[2012-02-14 13:42:21] dey101+php at gmail dot com

I did not have access to a Linux test platform. If you have verified that the 
bug exists on multiple platforms, please feel free to reclassify it as a 
general bug.

------------------------------------------------------------------------
[2012-02-13 23:40:06] mattfic...@php.net

Thank you for your report and for helping to make PHP better.

When I ran your script on Windows 2008 and Linux (using the TS build of PHP 
5.3.10), the output was the same on both OSes. I don't think this is a PHP on 
Windows bug.

If you would like, I can reclassify this bug as a general bug, not specific to 
Windows.

Or, am I missing something? Is this really a PHP on Windows problem?



Windows 2008 SP1 x64 output (TS build):

Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)


Linux-x64-gentoo output:
Regex: /^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI1234567890
    Result: (match) int(1)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (match) int(1)
  Host: ABCDEFGHIJ-123456789
    Result: (match) int(1)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (no match) int(0)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (no match) int(0)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/
  Host: ABCDEFGHIJ1234567890.
    Result: (match) int(1)
  Host: ABCDEFGHI234567890.
    Result: (match) int(1)
  Host: ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHI123456789
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ-123456789
    Result: (no match) int(0)
  Host: ABCDEFGHI-123456789
    Result: (no match) int(0)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (error) bool(false)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (error) bool(false)

Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)

------------------------------------------------------------------------
[2012-02-08 18:43:42] dey101+php at gmail dot com

Description:
------------
PHP VC9 x86 Thread Safe (from http://windows.php.net/download/)

I am using a regex to validate whether a string is a valid hostname (host or 
FQDN).

For strings of certain lengths, a pattern that tries to match a literal period 
at the end causes preg_match to return false if the string does not contain a 
period. It also returns false if the string has a period at the end and the 
regex does not try to match it.

The regex uses subpatterns () to apply the zero-or-more repetition 
quantifier *. I tried both capturing () and non-capturing (?:) groups; both 
yield the same result. However, if I use the one-or-more quantifier +, it does 
not return bool(false). Using {0,} instead of * does not change the outcome.
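
One way to sidestep the nested quantifiers is to write each label so that it 
can only be split in one way. The following is a sketch, not a verified fix 
for this report: it assumes the intent of the last regex in the test script is 
"dot-separated labels of alphanumerics and hyphens, each starting and ending 
with an alphanumeric", and the names $label, $fqdn and is_hostname() are made 
up for illustration.

```php
<?php
// One label: runs of alphanumerics separated by runs of hyphens.
// Every character belongs to exactly one run, so there is only one way to
// split a label and failing inputs are rejected without exponential retries.
$label = '[[:alnum:]]+(?:-+[[:alnum:]]+)*';

// Zero or more "label." prefixes followed by a final label.
$fqdn = '/^(?:' . $label . '\.)*' . $label . '$/';

// Hypothetical helper: true on match, false on no match or PCRE error.
function is_hostname($regex, $host)
{
    return preg_match($regex, $host) === 1;
}

var_dump(is_hostname($fqdn, 'ABCDEFGHIJ1234567890'));          // bool(true)
var_dump(is_hostname($fqdn, 'ABCDEFGHIJ1234567890.'));         // bool(false)
var_dump(is_hostname($fqdn, 'WWW.ABCDEFGHIJ-1234567890.COM')); // bool(true)
```

The helper deliberately treats a PCRE error the same as "no match"; a caller 
that needs to tell them apart would have to check for a false return value 
separately.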

It seems that the cutoff length for the string is about 20 characters. Below 
that, the results are int(0) or int(1) depending on whether the regex matches; 
above that, bool(false) is returned.

If the subpattern is part of a longer string, it does work as anticipated.

Matching a literal period at the beginning of the pattern does not yield an 
error.

Substituting a-zA-Z0-9 for the [:alnum:] character class does not affect the 
results.

error_get_last() does not return anything, and nothing shows up in the logs 
with error_reporting(-1) set either.
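
As far as I know, PCRE match failures of this kind do not raise a PHP warning, 
so error_get_last() has nothing to report; preg_last_error() is the function 
to call after preg_match() returns false. A minimal sketch of how an error 
branch could report it (the helper name preg_error_name() is made up; the 
constants are the standard PREG_* ones):

```php
<?php
// Hypothetical helper: translate a preg_last_error() code into its constant name.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown (' . $code . ')';
}

$result = preg_match('/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/', 'ABCDEFGHIJ1234567890');
if ($result === false) {
    // preg_last_error() refers to the most recent preg_* call.
    echo '(error) ', preg_error_name(preg_last_error()), "\n";
} else {
    echo $result ? '(match) ' : '(no match) ', "\n";
}
```

Which branch runs depends on the PHP/PCRE version and its configured limits.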

Test script:
---------------
$regexs = array
(
        '/^[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/',
        '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*$/',
        '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)+$/',
        '/^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/'
);

$hosts = array
(
        'ABCDEFGHIJ1234567890.', // long string with period at end
        'ABCDEFGHI234567890.', // slightly shorter string with period at end
        'ABCDEFGHIJ1234567890', // long string no period
        'ABCDEFGHI1234567890', // a little shorter
        'ABCDEFGHI123456789', // even shorter
        'ABCDEFGHIJ-1234567890', // long with hyphen
        'ABCDEFGHIJ-123456789', // shorter with hyphen
        'ABCDEFGHI-123456789', // even shorter with hyphen
        'WWW.ABCDEFGHIJ-1234567890.COM', // a FQDN with long string and hyphen
        'WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM' // a really long FQDN
);

foreach ($regexs as $regex)
{
        echo "\nRegex: $regex\n";

        foreach ($hosts as $host)
        {
                echo "  Host: $host\n";

                $result = preg_match($regex, $host);

                echo '    Result: ';
                if ($result === false)
                {
                        echo '(error) ';
                        print_r(error_get_last()); // never prints anything?
                }
                else
                {
                        echo ($result) ? '(match) ' : '(no match) ';
                }

                var_dump($result);
        }
}

Expected result:
----------------
None of the results should yield bool(false).

Actual result:
--------------
// just the output from the last regex, but others yield bool(false)
Regex: /^(?:[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*\.)*[[:alnum:]](?:[[:alnum:]\-]*[[:alnum:]])*$/
  Host: ABCDEFGHIJ1234567890.
    Result: (error) bool(false)
  Host: ABCDEFGHI234567890.
    Result: (error) bool(false)
  Host: .ABCDEFGHIJ1234567890
    Result: (no match) int(0)
  Host: ABCDEFGHIJ1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHI123456789
    Result: (match) int(1)
  Host: ABCDEFGHIJ-1234567890
    Result: (error) bool(false)
  Host: ABCDEFGHIJ-123456789
    Result: (error) bool(false)
  Host: ABCDEFGHI-123456789
    Result: (match) int(1)
  Host: WWW.ABCDEFGHIJ-1234567890.COM
    Result: (match) int(1)
  Host: WWW.SUB-SUBDOMAIN.SUBDOMAIN.ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ-12345-67890-abcd-efgh-hijk.COM
    Result: (match) int(1)


------------------------------------------------------------------------


