Hi Everyone,

Like many of you, I'm sure, I've been programming for a long time on a variety of projects.

My usual platform is Windows, although I also work on Linux machines (usually Debian-based).

At work we recently deployed ownCloud (7.0.2, community edition). It's working well, although we've run into problems with Unicode characters in filenames causing failures when the corresponding rows are inserted into the database.

The root cause appears to be the way PHP interacts with the Windows file system; I don't know whether other operating systems are affected as well.

(I have reported this as a bug at https://github.com/owncloud/core/issues/12112, but I'm happy to help contribute a fix to the project. Some of that information is repeated here.)


Basically, when PHP's file functions read a filename containing a Unicode character from the Windows file system, the name appears to PHP's mbstring to be UTF-8, but it is actually encoded in the system codepage (Windows-1252 on en-US systems, at least).

With that in mind, we can get the correct codepage:

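$target_encoding  = "UTF-8";
$default_codepage = "UTF-8";

if ( 'WIN' === substr( PHP_OS, 0, 3 ) ) {
    // On Windows this returns e.g. "English_United States.1252"; the digits
    // after the dot are the ANSI codepage. (Passing "" also sets LC_CTYPE
    // from the environment as a side effect.)
    $locale   = setlocale( LC_CTYPE, "" );
    $codepage = ( false !== $locale && false !== strpos( $locale, '.' ) )
        ? 'Windows-' . trim( strstr( $locale, '.' ), '.' )
        : $default_codepage;
} else {
    $codepage = $default_codepage;
}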
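The parsing part of the detection can also be pulled into a small function so it can be tested on any OS. This is only a sketch: `codepage_from_locale` is a hypothetical name, it assumes Windows returns locale strings of the form "Language_Country.NNNN", and it maps codepage 65001 to UTF-8. For some codepages (East Asian ones in particular) the resulting "Windows-NNNN" name may not be one mbstring recognizes, so it would be worth checking against mb_list_encodings().

```php
<?php
/**
 * Hypothetical helper: derive an mbstring encoding name from the string
 * returned by setlocale() on Windows, e.g. "English_United States.1252".
 * Falls back to $default when no numeric codepage suffix is present.
 */
function codepage_from_locale($locale, $default = 'UTF-8')
{
    if (!is_string($locale)) {
        return $default;               // setlocale() returned false
    }
    $dot = strrpos($locale, '.');
    if (false === $dot) {
        return $default;               // no ".NNNN" suffix, e.g. "C"
    }
    $cp = (string) substr($locale, $dot + 1);
    if (!preg_match('/^[0-9]+$/', $cp)) {
        return $default;               // e.g. "en_US.UTF-8" on Linux
    }
    if ('65001' === $cp) {
        return 'UTF-8';                // Windows codepage 65001 is UTF-8
    }
    return 'Windows-' . $cp;
}

echo codepage_from_locale('English_United States.1252'), "\n"; // Windows-1252
```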

...and then convert the filename:

$encoded_filename = mb_convert_encoding( $filename, $target_encoding, $codepage );
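To illustrate the round trip, here is a self-contained sketch using a hard-coded Windows-1252 filename instead of a real directory listing (the "café.txt" name is just an example). It shows that the raw bytes are not valid UTF-8, which is what trips up the database insert, and that converting back recovers the original bytes, which is what would be needed to open the file again.

```php
<?php
// Raw bytes as a Windows-1252 file system would hand them to PHP:
// 0xE9 is "é" in Windows-1252, but it is not valid UTF-8 on its own.
$filename        = "caf\xE9.txt";
$codepage        = 'Windows-1252'; // assumed result of the detection routine
$target_encoding = 'UTF-8';

var_dump(mb_check_encoding($filename, 'UTF-8'));         // bool(false)

// Convert to UTF-8 before storing the name in the database...
$encoded_filename = mb_convert_encoding($filename, $target_encoding, $codepage);
var_dump(mb_check_encoding($encoded_filename, 'UTF-8')); // bool(true)
echo bin2hex($encoded_filename), "\n";                   // 636166c3a92e747874

// ...and convert back before passing the name to fopen() and friends.
$decoded_filename = mb_convert_encoding($encoded_filename, $codepage, $target_encoding);
var_dump($decoded_filename === $filename);               // bool(true)
```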



So my proposal is to add a default codepage setting to config.php, filled in by the installer: UTF-8 normally, or on Windows, the value detected by the routine above.

There should also be a codepage setting for each external storage (defaulting to the system codepage for local/SMB storage) in case other file systems are in play; this would let the admin account for special or mixed environments.
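A rough sketch of how the system-wide default and per-storage overrides might fit together. Everything here is hypothetical: 'filesystem_codepage' is not an existing ownCloud config.php option, and the 'codepage' storage option and `storage_codepage()` helper are invented for illustration.

```php
<?php
// Hypothetical config.php entry written by the installer.
$CONFIG = array(
    'filesystem_codepage' => 'Windows-1252',
);

/**
 * Hypothetical helper: a storage-specific codepage wins; otherwise use the
 * system-wide default from config.php, and finally fall back to UTF-8.
 */
function storage_codepage(array $config, array $storageOptions)
{
    if (!empty($storageOptions['codepage'])) {
        return $storageOptions['codepage'];
    }
    if (!empty($config['filesystem_codepage'])) {
        return $config['filesystem_codepage'];
    }
    return 'UTF-8';
}

echo storage_codepage($CONFIG, array()), "\n";                          // Windows-1252
echo storage_codepage($CONFIG, array('codepage' => 'ISO-8859-1')), "\n"; // ISO-8859-1
```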

The sync clients should probably report their local codepage as well, to ensure everything is translated properly where needed. (I'll confess I haven't done any WebDAV programming and don't know whether any codepage translation already occurs.)


Other notes and potential gotchas:

1. Folder names should probably be converted from the codepage too.

2. Unfortunately, we will likely need to convert names back to the codepage to open the files from within PHP (e.g. $decoded_filename = mb_convert_encoding( $encoded_filename, $codepage, $target_encoding );).

3. It's also worth noting that MySQL's utf8 character set (e.g. with utf8_bin collation) stores at most three bytes per character, so it is not sufficient for true UTF-8 support. Full support needs utf8mb4 (available since MySQL 5.5.3) with a collation such as utf8mb4_bin or utf8mb4_unicode_ci. (ref: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html)
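To make the utf8 vs. utf8mb4 point concrete, here is a small sketch that flags names containing characters outside the Basic Multilingual Plane (U+10000 and above). Those need four bytes in UTF-8, so they are exactly the names MySQL's 3-byte utf8 columns cannot store. The `needs_utf8mb4()` name is hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns true if a UTF-8 string contains characters
 * at U+10000 or above (4 bytes in UTF-8). Invalid UTF-8 makes preg_match()
 * return false, so this also returns false in that case.
 */
function needs_utf8mb4($utf8)
{
    return 1 === preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8);
}

var_dump(needs_utf8mb4("caf\xC3\xA9.txt"));        // bool(false): "é" is 2 bytes
var_dump(needs_utf8mb4("party \xF0\x9F\x8E\x89")); // bool(true): U+1F389 is 4 bytes
```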


--
Lee Thompson

_______________________________________________
Devel mailing list
http://mailman.owncloud.org/mailman/listinfo/devel
