Edit report at https://bugs.php.net/bug.php?id=47096&edit=1

 ID:                 47096
 Comment by:         salsi at icosaedro dot it
 Reported by:        nuabaranda at web dot de
 Summary:            move_uploaded_file not OS encoding aware
 Status:             Open
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   win32 only - Windows XP
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

 New Comment:

As PHP operates under Windows as a "non-Unicode aware program", file names are 
bare arrays of bytes, represented in PHP as "string"; Windows converts these 
strings to and from Unicode according to the currently selected code page 
(see "Control Panel", "Regional and Language Options", "Administrative" tab 
panel, "Language for non-Unicode programs"). Unfortunately, UTF-8 is not 
available there, so whatever locale you choose, some Unicode file names may 
still remain inaccessible to PHP.

For example, if your system locale uses a Western European encoding (code page 
1252), there is no way to refer to a file whose name is "日本語"; you can access 
such a name only on a Windows system with the Japanese locale set (code page 
932), and only if the "string" that represents the name is encoded as code 
page 932 requires, that is "\x93\xfa\x96\x7b\x8c\xea".
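
To make the example concrete, here is a minimal sketch that tries to encode 
the same name for both code pages. It assumes the iconv extension is loaded 
and that the underlying iconv library knows the names "CP932" and "CP1252" and 
fails, rather than substituting characters, on unmappable input (true for 
glibc, not guaranteed elsewhere). The helper encode_for_code_page() is a 
hypothetical name introduced only for this illustration.

```php
<?php
// "日本語" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding of this source file.
$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Hypothetical helper: returns the name encoded for the given code page,
// or null when iconv is missing or the conversion fails.
function encode_for_code_page(string $utf8, string $codePage): ?string {
    if (!function_exists('iconv')) {
        return null;
    }
    // iconv() returns false (and raises a notice or warning, silenced here)
    // when the target charset is unknown or a character cannot be mapped.
    $bytes = @iconv('UTF-8', $codePage, $utf8);
    return $bytes === false ? null : $bytes;
}

$jp = encode_for_code_page($utf8, 'CP932');
$we = encode_for_code_page($utf8, 'CP1252');

echo 'CP932:  ', $jp === null ? 'not representable' : bin2hex($jp), "\n";
echo 'CP1252: ', $we === null ? 'not representable' : bin2hex($we), "\n";
```

Where the assumptions hold, the CP932 result should be the six bytes quoted 
above, while the CP1252 conversion should fail.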

So, if you have the name of a file (along with its path) as a Unicode string 
$u (for example UTF-8 encoded) and you want to save a file with that name 
under Windows, you must first call setlocale(LC_CTYPE, 0) to retrieve the 
current locale and, from it, the current code page; then you must convert $u 
to an array of bytes according to that code page. If one or more code points 
have no counterpart in the current code page, the file cannot be saved with 
that name from PHP. Full stop.
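
The steps above can be sketched as follows. This is not the FileName 
implementation, only an illustration under stated assumptions: the locale 
string is assumed to have the Windows form "Language_Country.codepage", the 
input is assumed to be well-formed UTF-8, and only a deliberately partial 
table for code page 1252 is included. All function names are hypothetical.

```php
<?php
// Extract the code page number from a Windows-style locale string such as
// "Italian_Italy.1252" (this format is an assumption of the sketch).
function code_page_from_locale(string $locale): ?int {
    return preg_match('/\.(\d+)$/', $locale, $m) ? (int) $m[1] : null;
}

// Decode a UTF-8 string into Unicode code points.
// Assumes well-formed input; malformed sequences are not detected.
function utf8_to_codepoints(string $s): array {
    $cps = [];
    $len = strlen($s);
    for ($i = 0; $i < $len; $i += $n + 1) {
        $b = ord($s[$i]);
        if ($b < 0x80)     { $cp = $b;        $n = 0; }
        elseif ($b < 0xE0) { $cp = $b & 0x1F; $n = 1; }
        elseif ($b < 0xF0) { $cp = $b & 0x0F; $n = 2; }
        else               { $cp = $b & 0x07; $n = 3; }
        for ($k = 1; $k <= $n; $k++) {
            $cp = ($cp << 6) | (ord($s[$i + $k]) & 0x3F);
        }
        $cps[] = $cp;
    }
    return $cps;
}

// Partial CP1252 encoder: ASCII and U+00A0..U+00FF map to the same byte
// value; only three of the 0x80..0x9F entries are listed here.
function utf8_to_cp1252(string $utf8): ?string {
    static $extra = [0x20AC => 0x80, 0x201C => 0x93, 0x201D => 0x94];
    $out = '';
    foreach (utf8_to_codepoints($utf8) as $cp) {
        if ($cp < 0x80 || ($cp >= 0xA0 && $cp <= 0xFF)) {
            $out .= chr($cp);
        } elseif (isset($extra[$cp])) {
            $out .= chr($extra[$cp]);
        } else {
            return null; // no counterpart (or not in this partial table)
        }
    }
    return $out;
}

// Returns the bytes to pass to the file system functions, or null if the
// name cannot be expressed (or the code page is not covered by the sketch).
function encode_file_name(string $u, string $locale): ?string {
    return code_page_from_locale($locale) === 1252 ? utf8_to_cp1252($u) : null;
}

$locale = (string) setlocale(LC_CTYPE, '0'); // "0" queries without changing
$name = encode_file_name("Gr\xC3\xBC\xC3\x9Fe.txt", $locale);
echo $name === null ? "cannot be saved under this locale" : bin2hex($name), "\n";
```

A real implementation needs a complete table per code page, for example 
generated from the Microsoft mapping files listed in the references.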

To complicate the implementation of such an algorithm, neither mbstring nor 
iconv is aware of all the Windows code pages, so you must write these 
conversion routines yourself. This is just what I have done experimentally 
in PHP, and it appears to work nicely 
(http://www.icosaedro.it/phplint/libraries.cgi?lib=stdlib/it/icosaedro/io/FileName.html).
Hopefully some day something similar will be available in the PHP core 
library, or some other abstraction layer of classes will provide full access 
to the Unicode realm.

References:

http://en.wikipedia.org/wiki/Windows_code_page

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/


Previous Comments:
------------------------------------------------------------------------
[2011-09-23 03:02:09] xd-yang at qq dot com

Since basename() is locale aware, why not move_uploaded_file()?
A common workaround is to use iconv() to explicitly convert the destination 
filename, usually from UTF-8 to an ANSI encoding (like GB2312). But this 
becomes complicated and impractical in a multilingual CMS such as WordPress. 
Can this issue be solved in the future?

------------------------------------------------------------------------
[2009-02-26 09:46:51] mm107137 at spamcorptastic dot com

I have the same problem on a Debian host (OVH hosting).
Filenames with French accents passed to move_uploaded_file() are destroyed.
There are no problems if the filename is not passed as UTF-8.

Very annoying

------------------------------------------------------------------------
[2009-02-06 20:21:49] mindfreakthemon at gmail dot com

And on Windows 7 and Vista under Apache 2.2 that bug exists too.

------------------------------------------------------------------------
[2009-01-14 09:26:41] nuabaranda at web dot de

Description:
------------
Files with filenames containing non-ASCII characters like German umlauts get 
destroyed when saved with move_uploaded_file(). The UTF-8 special characters 
get translated byte-wise into CP1251 characters when determining the Windows 
filenames, thus destroying the original special characters.



------------------------------------------------------------------------


