From:             jmmolina at free dot fr
Operating system: Windows 2000 Pro
PHP version:      5.0.2
PHP Bug Type:     Feature/Change Request
Bug description:  UNICODE support to name variables and other PHP labels

Description:
------------
From the Variables chapter of the manual, we can read:

« Variable names follow the same rules as other labels in PHP. A valid
variable name starts with a letter or underscore, followed by any number
of letters, numbers, or underscores. As a regular expression, it would be
expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'

    Note: For our purposes here, a letter is a-z, A-Z, and the ASCII
characters from 127 through 255 (0x7f-0xff). »
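To illustrate the rule quoted above, here is a minimal sketch that applies the same regular expression, anchored with ^ and $, to a few candidate names. The function name is_valid_label() and the sample names are hypothetical, and the sketch assumes PHP 7 or later. Without the /u modifier, PCRE matches bytes, so the \x7f-\xff range accepts any single byte in that range, such as the ISO-8859-1 byte 0xE6 for « æ ».

```php
<?php
// Label rule quoted from the manual, anchored so the whole name must match.
// No /u modifier: PCRE works on raw bytes, so \x7f-\xff means "any byte
// from 0x7F to 0xFF", regardless of which character that byte encodes.
const LABEL_PATTERN = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';

// Hypothetical helper: true if $name matches the manual's label rule.
function is_valid_label(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

$candidates = [
    'human_class', // letters and underscore: valid
    '_nom',        // leading underscore: valid
    '1abc',        // leading digit: invalid
    'my-var',      // hyphen is not allowed: invalid
    "\xE6",        // "æ" as a single ISO-8859-1 byte: valid
];

foreach ($candidates as $name) {
    // Show non-printable bytes as hex so the output stays readable.
    $shown = preg_match('/^[\x20-\x7e]*$/', $name) === 1
        ? $name
        : '0x' . strtoupper(bin2hex($name));
    printf("%s: %s\n", $shown, is_valid_label($name) ? 'valid' : 'invalid');
}
```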

As many languages use other character sets, I was wondering whether there
are any plans to support Unicode in variable names and other PHP labels.
As I told Zend support: « It would allow French and Asian developers to
write their scripts in their own language. Not sure about the impact on
performance though. » For example, in French we often use the « œ »
character; we call it « e in o » because it looks like « oe », and it's
used in words like « cœur » (heart) or « nœud » (node). The problem is
that some characters are supported and others are not. For example, the
« æ » character (« e in a ») is part of the ISO-8859-1 (Latin-1)
character set, where its code is 0xE6. This means PHP does support
scripts using this character, because 0xE6 is between 0x7f and 0xff. But
« œ » is not a Latin-1 character: it is the Unicode character U+0153,
« Latin Small Ligature Oe ».
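The difference between the two characters shows up in their Unicode code points. This is a minimal sketch, assuming the file is saved as UTF-8 and the mbstring extension is loaded (mb_ord() requires PHP 7.2 or later); code_point() is a hypothetical helper. Keep in mind that the PHP 5 scanner compares bytes rather than code points, so what it actually sees also depends on the encoding the script is saved in.

```php
<?php
// Hypothetical helper: format a single UTF-8 character as "U+XXXX".
// Assumes the mbstring extension (mb_ord() exists since PHP 7.2).
function code_point(string $char): string
{
    return sprintf('U+%04X', mb_ord($char, 'UTF-8'));
}

foreach (['æ', 'œ'] as $char) {
    $cp = mb_ord($char, 'UTF-8');
    printf("%s  %s  fits in one byte (<= 0xFF): %s\n",
        $char, code_point($char), $cp <= 0xFF ? 'yes' : 'no');
}
// æ  U+00E6  fits in one byte (<= 0xFF): yes
// œ  U+0153  fits in one byte (<= 0xFF): no
```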

To sum things up, the idea is to allow developers to write PHP scripts in
their natural language. French developers would be able to write scripts
in French, using our odd « Latin Small Ligature » characters, and Chinese
and Japanese developers would be able to use their favourite kanji to
name their variables, classes, and so on.

I think the PHP team chose this regular expression to improve script
parsing performance, but I'm sure there is a way to support Unicode. It
could be an option enabled from the PHP configuration file, for example,
or from an Apache .htaccess file. Besides the performance penalty, there
might be another problem: allowing the whole Unicode character set means
we could name our variables « cœur♥ » (the last character is U+2665, a
black heart) or « ♀♂ » (the female and male symbols) instead of
« human_class ». I'm sure the PHP team will point out other issues, but
as I'm not a hardcore Zend Engine developer, that's all I can think of :).
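One way to address the « cœur♥ » concern would be to accept Unicode letters and digits but not symbols. Below is a minimal sketch of such a rule. It is hypothetical and is not what PHP implements. It uses PCRE Unicode properties (\p{L} for letters, \p{Nd} for decimal digits) with the /u modifier, and assumes UTF-8 source and a PCRE build with Unicode property support. The function name is_unicode_label() is illustrative.

```php
<?php
// Hypothetical Unicode-aware label rule (NOT PHP's actual rule):
// start with a Unicode letter or underscore, then any mix of letters,
// decimal digits and underscores. The /u modifier makes PCRE read the
// subject as UTF-8 code points instead of raw bytes.
const UNICODE_LABEL_PATTERN = '/^[\p{L}_][\p{L}\p{Nd}_]*$/u';

function is_unicode_label(string $name): bool
{
    return preg_match(UNICODE_LABEL_PATTERN, $name) === 1;
}

$candidates = [
    'nœud',        // œ (U+0153) is a letter: accepted
    'cœur',        // accepted
    '漢字',        // kanji are letters (category Lo): accepted
    'human_class', // plain ASCII: accepted
    'cœur♥',       // U+2665 is a symbol (So), not a letter: rejected
    '♀♂',          // U+2640 and U+2642 are symbols: rejected
];

foreach ($candidates as $name) {
    printf("%s: %s\n", $name, is_unicode_label($name) ? 'accepted' : 'rejected');
}
```

Restricting labels to letter and digit categories keeps names like « nœud » or kanji while excluding pictographic symbols, which is one possible answer to the « ♀♂ » instead of « human_class » problem.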

I have attached the « nœud » class script if you want to try it. The PHP
parser returns a parse error at line 9 (« private $nom; »).

Jean-Marc Molina.

Reproduce code:
---------------
<?php

/**
 * Class nœud (French for "node").
 */

class nœud
{
        private $nom; // "nom" is French for "name"

        public function __construct ()
        {
                // "Nœud sans nom" is French for "Unnamed node"
                $this->nom = "Nœud sans nom";
        }
}

?>


-- 
Edit bug report at http://bugs.php.net/?id=30800&edit=1