Edit report at https://bugs.php.net/bug.php?id=47096&edit=1

 ID:                 47096
 Comment by:         salsi at icosaedro dot it
 Reported by:        nuabaranda at web dot de
 Summary:            move_uploaded_file not OS encoding aware
 Status:             Open
 Type:               Bug
 Package:            Filesystem function related
 Operating System:   win32 only - Windows XP
 PHP Version:        5.2.8
 Block user comment: N
 Private report:     N

 New Comment:

As PHP operates under Windows as a "non-Unicode aware program", file
names are bare arrays of bytes, represented in PHP as "string"; Windows
converts these strings to and from Unicode according to the currently
selected "code page table" (see "Control Panel", "Regional and Language
Options", "Administrative" tab panel, "Language for non-Unicode
programs"). Unfortunately, UTF-8 encoding is not available there, so
whatever locale you choose, some Unicode file names may still remain
inaccessible to PHP.

For example, if your system locale uses a Western European encoding
(code page 1252), there is no way to refer to a file whose name is
"日本語"; only on a Windows system with the Japanese locale set (code
page 932) can you access such a name, provided that the "string"
representing it is properly encoded as code page 932 requires, that is
"\x93\xfa\x96\x7b\x8c\xea".

So, if you have a generic file name (along with its path) as a Unicode
string $u (for example UTF-8 encoded) and you want to save a file under
that name on Windows, you must first call setlocale(LC_CTYPE, 0) to
retrieve the current locale and, from it, the current code page; then
you must convert $u to an array of bytes according to that code page.
If one or more code points have no counterpart in the current code
page, the file cannot be saved with that name from PHP. Period.

To complicate the implementation of such an algorithm, neither mbstring
nor iconv is aware of all the Windows code pages, so you must write
these conversion routines yourself.
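
To illustrate the steps just described, here is a minimal sketch. It is
not the FileName class mentioned in this comment: the function names
codePageFromLocale() and encodeFileNameForCodePage() are hypothetical,
and the sketch assumes that the mbstring extension is loaded and happens
to know the target code page (which, as noted, is not true for every
Windows code page). It also assumes the Windows locale string ends in
".<code page number>", as in "Japanese_Japan.932".

```php
<?php
/**
 * Hypothetical helper: extract a code page name such as "CP932" from a
 * Windows locale string such as "Japanese_Japan.932".
 * Returns null if the string does not end in ".<digits>".
 */
function codePageFromLocale(string $locale): ?string
{
    if (preg_match('/\.(\d+)$/', $locale, $m) !== 1) {
        return null;
    }
    return 'CP' . $m[1];
}

/**
 * Hypothetical helper: encode a UTF-8 file name for the given code page.
 * Returns null if some character has no counterpart in that code page.
 * Assumption: mbstring recognizes $codePage (PHP 8 throws a ValueError
 * for encodings it does not know).
 */
function encodeFileNameForCodePage(string $utf8Name, string $codePage): ?string
{
    $encoded = mb_convert_encoding($utf8Name, $codePage, 'UTF-8');

    // mbstring substitutes or drops characters it cannot map, so a
    // round trip back to UTF-8 reveals whether anything was lost.
    if (mb_convert_encoding($encoded, 'UTF-8', $codePage) !== $utf8Name) {
        return null;
    }
    return $encoded;
}
```

A caller would pass the result of setlocale(LC_CTYPE, '0') to
codePageFromLocale() and, if both helpers return non-null values, use
the encoded bytes as the destination name. For "日本語" and "CP932" the
second helper should yield the byte string quoted above; for "CP1252"
it should return null. The round-trip check is deliberately
conservative: a name that does not survive the round trip is rejected
even if Windows might still accept it.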

I have written such routines experimentally in PHP, and they appear to
work nicely
(http://www.icosaedro.it/phplint/libraries.cgi?lib=stdlib/it/icosaedro/io/FileName.html).
Hopefully some day something similar will be available in the PHP core
library, or some other abstraction layer of classes will provide full
access to the Unicode realm.

References:
http://en.wikipedia.org/wiki/Windows_code_page
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/


Previous Comments:
------------------------------------------------------------------------
[2011-09-23 03:02:09] xd-yang at qq dot com

Since basename() is locale aware, why not move_uploaded_file()? A
common workaround is to use iconv() to explicitly convert the
destination file name, usually from UTF-8 to an ANSI encoding (such as
GB2312). But this becomes complicated and impractical in a multilingual
CMS such as WordPress.
Can this issue be solved in the future?

------------------------------------------------------------------------
[2009-02-26 09:46:51] mm107137 at spamcorptastic dot com

I have the same problem on a Debian host (OVH hosting). File names with
French accents passed to move_uploaded_file() are destroyed. There is
no problem if the file name is not passed as UTF-8. Very annoying.

------------------------------------------------------------------------
[2009-02-06 20:21:49] mindfreakthemon at gmail dot com

The bug also exists on Windows 7 and Vista under Apache 2.2.

------------------------------------------------------------------------
[2009-01-14 09:26:41] nuabaranda at web dot de

Description:
------------
Files with file names containing non-ASCII characters, such as German
umlauts, get destroyed when saved with move_uploaded_file(). The UTF-8
special characters are translated byte-wise into CP1251 characters when
the Windows file names are determined, thus destroying the original
special characters.

------------------------------------------------------------------------