ID:               39744
 User updated by:  sdamir at gmail dot com
 Reported By:      sdamir at gmail dot com
 Status:           Open
 Bug Type:         *Regular Expressions
 Operating System: Linux 2.6.18
 PHP Version:      5.2.0
 New Comment:

I don't know why, but your bug system converted the non-ASCII letters in
my PHP code into &#NNNN; numeric entities.
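
One way to keep the tracker from rewriting the letters is to write them
as byte escapes. This is only a sketch: it assumes the intended string is
the five code points shown in the entities below (1057, 1088, 1077, 1115,
1072) encoded as UTF-8, and the variable name $matched is introduced here
for illustration.

```php
<?php
// UTF-8 bytes for U+0421 U+0440 U+0435 U+045B U+0430 (five Cyrillic
// letters), written as escapes so nothing can turn them into entities.
$str = " \xD0\xA1\xD1\x80\xD0\xB5\xD1\x9B\xD0\xB0 ";

// strlen() counts bytes: 5 letters * 2 bytes + 2 spaces.
var_dump(strlen($str)); // int(12)

// \pL is "any Unicode letter"; the /u modifier tells PCRE to treat the
// pattern and the subject as UTF-8.
$matched = preg_match('/\pL/u', $str, $r);
var_dump($matched);
var_dump($r);
```

The byte length agrees with the string(12) in the actual result, which
suggests the original input was already UTF-8 before it was pasted here.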


Previous Comments:
------------------------------------------------------------------------

[2006-12-05 15:48:49] sdamir at gmail dot com

Description:
------------
I am trying to match all alphabetic UTF-8 characters. I know (tested)
that in Perl, if $string is UTF-8 encoded and I use a regex like =~ /\w/,
it matches all alphabetic characters (Cyrillic, Chinese, English, etc.).
However, this is not the case in PHP. I read that I need to use special
patterns like \pL, but this doesn't work for me either: it matches some
characters, but Cyrillic letters aren't matched. I don't know whether
this is a bug or I am doing something wrong, but I have searched
everywhere and asked in many IRC support channels, and no one has an
answer.

Reproduce code:
---------------
<?php 

// setlocale(LC_ALL, 'en_US.utf8');
// If I set the locale to en_US, it matches some characters like öåä but
// not Cyrillic; with en_US.utf8 it won't match anything.

$str=" &#1057;&#1088;&#1077;&#1115;&#1072; ";

utf8_encode($str);

var_dump($str); 
preg_match("/[\w\pL]/u",$str, $r); 
var_dump($r);

?> 

Expected result:
----------------
string(3) " s "
array(1) {
  [0]=>
  string(1) "&#1057;"
}


Actual result:
--------------
string(12) " &#1057;&#1088;&#1077;&#1115;&#1072; "
array(0) {
}
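
An empty $r on its own does not show whether the pattern ran and found
nothing or whether the call failed. A minimal sketch for telling the two
apart, assuming the subject is the UTF-8 string described by the entities
above and that preg_last_error() is available (PHP 5.2.0 and later):

```php
<?php
// Same five Cyrillic letters as UTF-8 byte escapes (an assumption about
// what the original script contained).
$str = " \xD0\xA1\xD1\x80\xD0\xB5\xD1\x9B\xD0\xB0 ";

// @ suppresses the warning PCRE raises when a pattern cannot be compiled,
// e.g. if the library was built without Unicode property support.
$result = @preg_match('/\pL/u', $str, $r);

if ($result === false) {
    // The call itself failed: bad pattern or invalid UTF-8 subject.
    echo "preg_match failed, preg_last_error() = ", preg_last_error(), "\n";
} elseif ($result === 0) {
    // The pattern compiled and ran, but no letter was found.
    echo "pattern ran but nothing matched\n";
} else {
    echo "matched ", strlen($r[0]), " byte(s)\n";
}
```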



------------------------------------------------------------------------

