From:             sdamir at gmail dot com
Operating system: Linux 2.6.18
PHP version:      5.2.0
PHP Bug Type:     *Regular Expressions
Bug description:  UTF8 support

Description:
------------
I am trying to match all alphabetic UTF-8 characters. I know (tested) that
in Perl, if $string is UTF-8 encoded and I use a regex like =~ /\w/, it will
match alphabetic characters from any script (Cyrillic, Chinese, English,
etc.). However, this is not the case for PHP. I read that I need to use
special patterns like \pL, but this doesn't work for me either: it matches
some characters, but Cyrillic letters aren't matched. I don't know if this
is a bug or if I am doing something wrong, but I have searched everywhere
and visited many IRC support channels, and no one has an answer to this.

Reproduce code:
---------------
<?php 

// setlocale(LC_ALL, 'en_US.utf8'); // if I set the locale to en_US, it
// matches some characters like öåä but not Cyrillic; en_US.utf8 won't
// match anything.

$str=" Срећа ";

utf8_encode($str); // return value is discarded, so $str is unchanged

var_dump($str); 
preg_match("/[\w\pL]/u",$str, $r); 
var_dump($r);

?> 

Expected result:
----------------
string(3) " s "
array(1) {
  [0]=>
  string(1) "С"
}


Actual result:
--------------
string(12) " Срећа "
array(0) {
}
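
To narrow this down, it may help to separate the pattern from the encoding
of the source file. The following is only a minimal sketch: it writes the
same Cyrillic word as explicit UTF-8 byte escapes and tests \pL on its own.
It assumes PCRE was built with UTF-8 and Unicode property support, and the
variable names ($found, $r) are illustrative.

```php
<?php
// Sketch only: assumes PCRE has UTF-8 and Unicode property support.

// " Срећа " written as explicit UTF-8 bytes, so the result does not
// depend on the encoding the script file is saved in.
$str = " \xD0\xA1\xD1\x80\xD0\xB5\xD1\x9B\xD0\xB0 ";

// Two ASCII spaces plus five 2-byte Cyrillic letters.
var_dump(strlen($str));

// With the /u modifier both pattern and subject are treated as UTF-8;
// \pL matches any character with the Unicode "letter" property.
$found = preg_match('/\pL/u', $str, $r);

// preg_match() returns 1 on a match and 0 on no match; it can return
// false on error, in which case preg_last_error() gives the PCRE
// error code.
var_dump($found, $r);
if ($found === false) {
    var_dump(preg_last_error());
}
```

If this sketch matches while the reproduce code above does not, the
difference is more likely in how the string reaches PHP (file encoding or
entity conversion) than in the pattern itself.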


-- 
Edit bug report at http://bugs.php.net/?id=39744&edit=1