Edit report at http://bugs.php.net/bug.php?id=43148&edit=1

 ID:                 43148
 Comment by:         anton85s at mail dot ru
 Reported by:        banu_daniel1 at yahoo dot com
 Summary:            filesize and unicode filenames
 Status:             Bogus
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   windows xp 32 bits
 PHP Version:        5.2.4
 Block user comment: N

 New Comment:

"it just passes the filename to the OSes filesystem func and if it fails
- we can do nothing about it."

But it doesn't pass the filename to the Unicode version of the
filesystem function, right? That means PHP could at least be modified to
use the correct filesystem function, instead of the non-Unicode ones for
all calls.


Previous Comments:
------------------------------------------------------------------------
[2007-11-12 10:03:04] tony2...@php.net

PHP doesn't care if it's Unicode or not, it just passes the filename to
the OSes filesystem func and if it fails - we can do nothing about it.

------------------------------------------------------------------------
[2007-11-02 17:48:17] carsten_sttgt at gmx dot de

> but the problem is still there even on windows xp
> so this is the problem filesize function does not
> work with filenames with unicode characters.



Ok, after some more tests, I can reproduce this problem. Just look at
this shell log:

| D:\>cd D:\Apache2.2\htdocs\test\αβγδεζηθ
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>dir /b
| index.html
| phpinfo.php
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type index.html
| <html><body><h1>It works!</h1></body></html>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>type phpinfo.php
| <?php phpinfo(); ?>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>pear-request http://localhost/
| test/%ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8/index.html
| <html><body><h1>It works!</h1></body></html>
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>php -r "echo getcwd();"
| D:\Apache2.2\htdocs\test\aß?de???
|
| D:\Apache2.2\htdocs\test\αβγδεζηθ>cd..
|
| D:\Apache2.2\htdocs\test>php -r "var_dump(stat('αβγδεζηθ'));"
|
| Warning: stat(): stat failed for aß?de??? in Command line code on
|  line 1
| bool(false)
|
| D:\Apache2.2\htdocs\test>



As you can see, I can't execute a PHP script in this folder
("αβγδεζηθ") or use the PHP filesystem functions with this path. But I
can access this folder correctly with Apache via HTTP.
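
For reference, the path in the pear-request URL is nothing more than the
UTF-8 bytes of the folder name, percent-encoded. A minimal PHP sketch,
assuming the script file itself is saved as UTF-8 (the variable name is
only for illustration):

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal holds UTF-8 bytes.
$name = "αβγδεζηθ";

// rawurlencode() percent-encodes every non-ASCII byte of the string.
// It emits uppercase hex digits; strtolower() only matches the log's style.
echo strtolower(rawurlencode($name)), "\n";
// prints %ce%b1%ce%b2%ce%b3%ce%b4%ce%b5%ce%b6%ce%b7%ce%b8
```

So Apache receives plain UTF-8 bytes in the request, and it is up to the
server how it maps them to a name on disk.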





> on linux version i don't have this problem.



That's the difference. On Linux (and in PHP) a file name is just a byte
string, usually UTF-8. Windows uses UTF-16 for file names, or the current
codepage of the installed locale when the non-Unicode functions are used.





Just look at this script "test.php" (encoded in UTF-8):

| <?php
| mkdir('αβγδεζηθ');
| var_dump(is_dir('αβγδεζηθ'));
| ?>



and the shell log:

| D:\Apache2.2\htdocs\test>php test.php
| bool(true)
|
| D:\Apache2.2\htdocs\test>dir /b
| test.php
| αβγδεζηθ
|
| D:\Apache2.2\htdocs\test>



As you can see, you can create and access a path with such a name from
PHP, but only inside PHP. In Windows or Apache you must use another
(wrong) name, because PHP just treats the UTF-8 byte sequence as if each
byte were a Latin1 char.



This can be a quick fix for you, but is indeed not correct.
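
The effect described above can be reproduced at the byte level. The
following is only a sketch, assuming the script is saved as UTF-8 and the
iconv extension is available; the variable names are made up for the
example. It shows how the UTF-8 bytes of the name look when every byte is
read as one Latin1 char:

```php
<?php
// Assumption: this file is saved as UTF-8, like test.php above.
$utf8Name = "αβγδεζηθ";

// Each Greek letter takes two bytes in UTF-8, so 8 letters are 16 bytes.
echo strlen($utf8Name), "\n";      // prints 16
echo bin2hex($utf8Name), "\n";     // prints ceb1ceb2ceb3ceb4ceb5ceb6ceb7ceb8

// Treat every byte as one Latin1 char and re-encode the result as UTF-8,
// only so that the misread name can be displayed.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8Name);
echo $misread, "\n";               // prints Î±Î²Î³Î´ÎµÎ¶Î·Î¸
```

That 16-char name is the kind of "wrong" name meant above: it is built
from the same bytes, just interpreted in a single-byte codepage.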



The problem is that the PHP C sources only use the simple (narrow)
string and filesystem functions, which only work with the codepage of the
current locale. They do not use the wide-char string and filesystem
functions from the Windows SDK, as Apache does.
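
To illustrate that codepage dependency (a sketch only, not a fix): a name
can reach the non-Unicode filesystem functions intact only if the active
codepage can represent it. The helper name to_codepage() and the
codepages CP1253 (Greek) and CP1252 (Western European) are assumptions
chosen for this example, and the iconv extension is assumed to be
available:

```php
<?php
/**
 * Hypothetical helper (not part of PHP): convert a UTF-8 file name to a
 * single-byte codepage before passing it to a filesystem function.
 * Returns false when iconv() reports that the conversion failed.
 */
function to_codepage($utf8Name, $codepage)
{
    // No //TRANSLIT or //IGNORE suffix: unrepresentable chars should make
    // iconv() fail instead of silently changing the name.
    return @iconv('UTF-8', $codepage, $utf8Name);
}

// Assumption: this file is saved as UTF-8.
$name = "αβγδεζηθ";

// The Greek codepage has one byte per Greek letter.
$greek = to_codepage($name, 'CP1253');
var_dump(bin2hex($greek));

// The Western European codepage has no Greek letters at all, so the
// conversion is expected to fail here (exact behaviour depends on the
// iconv implementation in use).
var_dump(to_codepage($name, 'CP1252'));
```

On a system running with the Greek codepage, the converted bytes could
presumably be passed to mkdir() or stat(). With a Western European
codepage there is no such byte sequence for this name, which is why only
the wide-char functions would solve the problem in general.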



BTW:

With a current PHP6 snapshot (full Unicode support?), this doesn't work
either.



Regards,

Carsten



BTW:

There is another bug in this bugtracker: you can't use UTF-8 chars in
bug reports. After a comment is submitted, UTF-8 chars are replaced with
entities, but all comments are placed between <pre> tags. Thus the
browser shows the entities and not the correct chars.



Please open this html page with a browser:

| <html>
| <head>
| <meta http-equiv=content-type content="text/html; charset=UTF-8">
| </head>
| <body>
| &#945;&#946;&#947;&#948;&#949;&#950;&#951;&#952;
| </body>
| </html>

and replace all entities in my comment with the chars you can see in the
browser.

------------------------------------------------------------------------
[2007-11-01 22:11:12] banu_daniel1 at yahoo dot com

No, I didn't see that. I removed that " and the result is exactly the
same ( Array ( ) ).

I've tried with other folders (non-UTF) and it works.

------------------------------------------------------------------------
[2007-11-01 21:57:27] carsten_sttgt at gmx dot de

> $dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);
              --^

Please remove my typo... (you have not seen that?):

| $dirs = glob('D:/Downloads/*', GLOB_ONLYDIR);



Regards,

Carsten

------------------------------------------------------------------------
[2007-11-01 21:42:27] banu_daniel1 at yahoo dot com

$dirs = glob('"D:/Downloads/*', GLOB_ONLYDIR);

print_r($dirs);



result is

Array ( )

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    http://bugs.php.net/bug.php?id=43148


