ID:               30800
 Updated by:       [EMAIL PROTECTED]
 Reported By:      jmmolina at free dot fr
-Status:           Open
+Status:           Assigned
 Bug Type:         Feature/Change Request
 Operating System: Windows 2000 Pro
 PHP Version:      5.0.2
-Assigned To:      
+Assigned To:      derick
 New Comment:

Something like this is under consideration, most likely for PHP 5.2.


Previous Comments:
------------------------------------------------------------------------

[2004-11-15 20:03:47] jmmolina at free dot fr

Description:
------------
From the variables chapter we can read:

« Variable names follow the same rules as other labels in PHP. A valid
variable name starts with a letter or underscore, followed by any
number of letters, numbers, or underscores. As a regular expression, it
would be expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'

    Note: For our purposes here, a letter is a-z, A-Z, and the ASCII
characters from 127 through 255 (0x7f-0xff). »
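The quoted rule can be checked with preg_match and the manual's own pattern. This is a minimal sketch; the helper name is_valid_php_label is hypothetical. Because the pattern has no u modifier, PCRE applies it to bytes rather than characters, so the result depends on the encoding the label is stored in:

```php
<?php
// Hypothetical helper (not part of PHP): applies the label regex quoted
// from the manual. Without the /u modifier PCRE works on raw bytes, so
// any byte from 0x7f to 0xff is accepted as a "letter".
function is_valid_php_label(string $label): bool
{
    return preg_match('/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/', $label) === 1;
}

var_dump(is_valid_php_label('human_class')); // bool(true)
var_dump(is_valid_php_label('2cool'));       // bool(false): starts with a digit
var_dump(is_valid_php_label("c\xE6ur"));     // bool(true): Latin-1 "æ" is the single byte 0xE6
var_dump(is_valid_php_label("n\xC5\x93ud")); // bool(true): UTF-8 "œ" is the bytes 0xC5 0x93
```

The last line shows that, as a byte pattern, the range 0x7f-0xff also covers every byte of a UTF-8 multibyte sequence, since all such bytes are 0x80 or above.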

As many languages use other character sets, I was wondering if there
was any plan to support Unicode in variable names and other PHP labels.
As I told Zend support: « It would allow French and Asian developers to
write their scripts in their own language. Not sure about the impact on
performance though. » For example, in French we often use the « œ »
character; we call it « e in o » because it looks like « oe », and it is
used in words like « cœur » (heart) or « nœud » (node). The problem is
that some characters are supported and others are not. For example, the
« æ » character (« e in a ») is part of the ISO-8859-1 (Latin-1)
character set, where its code is 0xE6. This means PHP accepts this
character in a Latin-1 encoded script, because 0xE6 is between 0x7f and
0xff. But « œ » is not a Latin-1 character: it is the Unicode character
U+0153, « Latin Small Ligature Oe ».
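The difference between the two characters comes down to their code points: « æ » is U+00E6, which fits in a single Latin-1 byte, while « œ » is U+0153, which is above 0xFF. A small sketch, assuming UTF-8 input; the helpers two_byte_utf8_code_point and fits_in_latin1 are hypothetical and only handle the two-byte UTF-8 range needed here:

```php
<?php
// Hypothetical helper: decodes a two-byte UTF-8 sequence (code points
// U+0080 to U+07FF) into its Unicode code point. Enough for æ and œ.
function two_byte_utf8_code_point(string $bytes): int
{
    $lead = ord($bytes[0]);
    $cont = ord($bytes[1]);
    // Lead byte 110xxxxx carries 5 bits, continuation byte 10xxxxxx carries 6.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

// Hypothetical helper: true if the code point has a single-byte
// ISO-8859-1 (Latin-1) encoding, i.e. it is at most 0xFF.
function fits_in_latin1(int $codePoint): bool
{
    return $codePoint <= 0xFF;
}

// UTF-8 byte sequences written as escapes, independent of file encoding.
$samples = ['ae' => "\xC3\xA6", 'oe' => "\xC5\x93"];
foreach ($samples as $name => $utf8) {
    $cp = two_byte_utf8_code_point($utf8);
    printf("%s U+%04X latin1=%s\n", $name, $cp, fits_in_latin1($cp) ? 'yes' : 'no');
}
// ae U+00E6 latin1=yes
// oe U+0153 latin1=no
```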

To sum up, the idea is to allow developers to write PHP scripts in their
natural language. French developers would be able to write scripts in
French, using our weird « Latin Small Ligature » characters, and Chinese
and Japanese developers would be able to use their favourite kanji to
name their variables, classes, and so on.

I think the PHP team chose this regular expression to improve script
parsing performance, but I'm sure there's a way to support Unicode. It
could be an option enabled in the PHP configuration file, for example,
or through an Apache .htaccess file. Besides the performance penalty,
there might be another problem: allowing the whole Unicode character set
means we would be able to name our variables « cœur♥ » (the last
character is U+2665, BLACK HEART SUIT) or « ♀♂ » (the female and male
signs) instead of « human_class ». I'm sure the PHP team will point out
other issues, but as I'm not a hardcore Zend Engine developer, that's
all I can think of :).
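One way to picture the second problem: a rule based on Unicode character categories would admit « œ » and kanji while still rejecting « cœur♥ » and « ♀♂ », because U+2665 and the gender signs are symbols, not letters. The sketch below is a hypothetical rule written with PCRE's \p{L} and \p{N} properties and the u modifier; it only illustrates the idea and is not how the Zend Engine tokenizes labels, nor does it say anything about performance:

```php
<?php
// Hypothetical Unicode-aware label rule (NOT what PHP implements):
// a letter or underscore, then letters, digits or underscores, where
// "letter" and "digit" are the Unicode categories L and N. The /u
// modifier makes PCRE read the subject as UTF-8 characters.
// Assumes this file is saved as UTF-8.
function is_unicode_label(string $label): bool
{
    return preg_match('/^[\p{L}_][\p{L}\p{N}_]*\z/u', $label) === 1;
}

foreach (['nœud', '変数', 'human_class', 'cœur♥', '♀♂'] as $label) {
    printf("%s: %s\n", $label, is_unicode_label($label) ? 'accepted' : 'rejected');
}
// nœud: accepted
// 変数: accepted
// human_class: accepted
// cœur♥: rejected
// ♀♂: rejected
```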

I have attached the « nœud » class script if you want to try it. The PHP
parser returns a parse error at line 9 (« private $nom; »).

Jean-Marc Molina.

Reproduce code:
---------------
<?php

/**
 * The nœud (node) class.
 */

class nœud
{
        private $nom;
        
        public function __construct ()
        {
                $this->nom = "Nœud sans nom"; // "Unnamed node"
        }
}

?>



------------------------------------------------------------------------