From: jrose at lgb-inc dot com
Operating system: Linux
PHP version: 4.3.1
PHP Bug Type: *Regular Expressions
Bug description: PREG_SPLIT_NO_EMPTY causes ctl chars to not be excluded with
[^a-zA-Z0-9] or \W
I slammed into this whilst converting Access data into MySQL.
I was attempting to break apart the following into words (* here indicates
that it fails to match /[\w,.-?]/):
|*|W|a|s|h|i|n|g|t|o|n|,|*|D|C|*|* // text
057617368696e67746f6e2c20444320200 // hex
I first tried splitting on any white space or commas:
preg_split( '/[\\s,]+/', $string, PREG_SPLIT_NO_EMPTY );
When this didn't work, I examined the hexadecimal values as above, and,
assuming that control characters weren't included in the \s group, tried
several things, including the very simple:
preg_split( '/\\W/', $string, PREG_SPLIT_NO_EMPTY );
Nothing worked, and ultimately I had to use preg_match_all() to split the
string up.
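For reference, a minimal sketch of the preg_match_all() workaround. The report does not show the pattern that was actually used, so the '/\w+/' pattern and the $matches and $words variable names are assumptions:

```php
<?php
// Sample data from this report: control characters around the words.
$string = chr(5) . 'Washington,' . chr(4) . 'DC' . chr(2);

// Collect runs of word characters instead of splitting on the separators.
// Assumed pattern; the original workaround is not shown in the report.
preg_match_all('/\w+/', $string, $matches);

// $matches[0] holds the full-pattern matches.
$words = $matches[0];
echo implode('|', $words), "\n"; // prints Washington|DC
```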
Example:
$string = chr(5) . 'Washington,' . chr(4) . 'DC' . chr(2);
$parts = preg_split( '/\\W/', $string, PREG_SPLIT_NO_EMPTY );
echo join('|', $parts);
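One thing worth checking, offered as a sketch rather than a diagnosis: preg_split() takes the limit as its third argument and the flags as its fourth. PREG_SPLIT_NO_EMPTY has the integer value 1, so a three-argument call like the one above is read as a limit of 1, which returns the whole string as a single element. Passing -1 (no limit) as the third argument and the flag as the fourth would look like this; the variable names $as_reported and $with_flags are only for illustration:

```php
<?php
$string = chr(5) . 'Washington,' . chr(4) . 'DC' . chr(2);

// As reported: the flag lands in the limit slot (PREG_SPLIT_NO_EMPTY === 1),
// so at most one piece is returned and nothing is split.
$as_reported = preg_split('/\W/', $string, PREG_SPLIT_NO_EMPTY);

// A limit of -1 means "no limit"; the flag goes in the fourth position.
$with_flags = preg_split('/\W/', $string, -1, PREG_SPLIT_NO_EMPTY);

echo count($as_reported), "\n";        // prints 1
echo implode('|', $with_flags), "\n";  // prints Washington|DC
```

If the second form gives the expected output on 4.3.1, the control characters are being matched by \W and the issue is only the argument order.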
--
Edit bug report at http://bugs.php.net/?id=23904&edit=1