Hi Everyone,
Like many of you, I'm sure, I've been programming for a long time on
various projects.
My typical platform is Windows although I do work on Linux machines as
well (usually Debian flavor).
At my work, we've recently deployed ownCloud (7.0.2, community edition)
and it's working well, although we've run into problems with Unicode
characters in filenames that cause trouble when the rows are inserted
into the database.
The root issue appears to be the way PHP communicates with Windows. I
don't know whether this issue affects other operating systems as well.
(I have reported this as a bug at
https://github.com/owncloud/core/issues/12112 but I'm perfectly happy to
help with contributing to this project. Some of that information will
be repeated here.)
Basically, when PHP's functions read a filename from the Windows file
system and that name contains a non-ASCII character, PHP's mbstring
reports the name as UTF-8, but it is actually encoded (on en-US
systems anyway) as Windows-1252.
Knowing this, we can get the correct codepage...

    $target_encoding = "UTF-8";
    $default_codepage = "UTF-8";
    if ( 'WIN' === substr( PHP_OS, 0, 3 ) ) {
        // setlocale() returns e.g. "English_United States.1252";
        // keep only the digits after the dot.
        $codepage = 'Windows-'
            . trim( strstr( setlocale( LC_CTYPE, "" ), '.' ), '.' );
    } else {
        $codepage = $default_codepage;
    }

... and then convert it:

    $encoded_filename = mb_convert_encoding( $filename,
        $target_encoding, $codepage );

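To make the conversion concrete, here is a minimal sketch, assuming the
name arrives from the file system as Windows-1252 bytes. The helper name
to_utf8_filename() and the sample file name are made up for
illustration; only mb_convert_encoding() comes from the routine above.

```php
<?php
// Hypothetical helper (not part of ownCloud): convert a file name from
// the file system codepage to the target encoding.
function to_utf8_filename( $filename, $codepage, $target_encoding = 'UTF-8' ) {
    if ( $codepage === $target_encoding ) {
        return $filename; // already in the target encoding
    }
    return mb_convert_encoding( $filename, $target_encoding, $codepage );
}

// "Résumé.txt" as Windows-1252 bytes: each e-acute is the single byte 0xE9.
$raw  = "R\xE9sum\xE9.txt";
$utf8 = to_utf8_filename( $raw, 'Windows-1252' );

// In UTF-8 the same character takes two bytes (0xC3 0xA9).
echo bin2hex( $raw ), "\n";
echo bin2hex( $utf8 ), "\n";
```

The hex dumps make the difference visible even on a console that cannot
display the accented characters.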
So my thought is to add a default codepage setting to config.php,
initially filled in by the installer as UTF-8 or, on Windows, with the
value from the routine above.
There should also be a codepage setting for each external storage
(defaulting to the 'system one' for local/smb) in case other file
systems are in play. This would allow the admin to account for special
or mixed environments.
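As a rough sketch of how the lookup could work, assuming a system-wide
default plus optional per-storage overrides: the config keys
'filesystem_codepage' and 'storage_codepages', the mount point, and the
function name are all hypothetical, invented for this example rather
than taken from ownCloud.

```php
<?php
// Hypothetical config layout; both keys are made up for this sketch.
$CONFIG = array(
    'filesystem_codepage' => 'Windows-1252', // system-wide default
    'storage_codepages'   => array(
        '/mnt/nas' => 'UTF-8',               // per-storage override
    ),
);

// Return the codepage for one storage, falling back to the system
// default and finally to UTF-8 if nothing is configured.
function codepage_for_storage( array $config, $mount_point ) {
    if ( isset( $config['storage_codepages'][ $mount_point ] ) ) {
        return $config['storage_codepages'][ $mount_point ];
    }
    if ( isset( $config['filesystem_codepage'] ) ) {
        return $config['filesystem_codepage'];
    }
    return 'UTF-8';
}

echo codepage_for_storage( $CONFIG, '/mnt/nas' ), "\n";
echo codepage_for_storage( $CONFIG, '/local' ), "\n";
```

Keeping the fallback chain in one function would mean the storage
backends never need to know where the value came from.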
The sync clients should probably communicate their local codepage as
well, just to ensure that it all translates properly (if needed). (I'll
confess I haven't done any WebDAV programming and don't know whether
any codepage translation occurs.)
Other notes and potential gotchas:
1. Folder names should probably be converted from the codepage too.
2. Unfortunately we will likely need to decode back to the codepage to
open the file within PHP, e.g.
    $decoded_filename = mb_convert_encoding( $encoded_filename,
        $codepage, $target_encoding );
3. It's also worth noting that MySQL's utf8/utf8_bin is not sufficient
for true UTF-8 support, since it stores at most three bytes per
character. On MySQL 5.5.3+ it needs to be utf8mb4 with the utf8mb4_bin
or utf8mb4_unicode_ci collation. (ref:
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html)
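Notes 2 and 3 can be sketched together, again assuming Windows-1252 on
disk. Both function names are hypothetical. The first converts a stored
UTF-8 name back to the codepage before the file is opened; the second
flags names containing a code point above U+FFFF, which takes four
bytes in UTF-8 and therefore would not fit in a MySQL utf8 column.

```php
<?php
// Hypothetical helper for note 2: convert the stored UTF-8 name back
// to the file system codepage before touching the disk.
function to_codepage_filename( $encoded_filename, $codepage, $target_encoding = 'UTF-8' ) {
    return mb_convert_encoding( $encoded_filename, $codepage, $target_encoding );
}

// Hypothetical helper for note 3: true if the UTF-8 string contains a
// code point above U+FFFF (a four-byte sequence in UTF-8).
function needs_utf8mb4( $utf8_string ) {
    return 1 === preg_match( '/[\x{10000}-\x{10FFFF}]/u', $utf8_string );
}

$codepage = 'Windows-1252';
$on_disk  = "R\xE9sum\xE9.txt";                       // bytes as read from disk
$stored   = mb_convert_encoding( $on_disk, 'UTF-8', $codepage );
$reopened = to_codepage_filename( $stored, $codepage );

var_dump( $reopened === $on_disk );                   // round trip is lossless: true
var_dump( needs_utf8mb4( $stored ) );                 // only two-byte sequences: false
var_dump( needs_utf8mb4( "\xF0\x9F\x98\x80.txt" ) );  // U+1F600 needs four bytes: true
```

A name that cannot be represented in the target codepage would not
survive the round trip, so a real implementation would need to decide
how to handle that case.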
--
Lee Thompson
[email protected]