2010/8/11 Željko Marjanović <[email protected]>:
> Is it possible to determine the character encoding the SSH/SFTP server is
> using? I have read the protocol specs for SFTP v3 and there is no mention
> of it, but in v4 default encoding is UTF-8. Is it safe to assume and use
> UTF-8 for default encoding?

Short answer: yes, if you're connecting to machines running modern Unices.

The reason the v3 spec didn't mandate UTF-8 for filenames is probably that some servers can't guarantee it. On Linux, for instance, you can give a file a name in any encoding you like; the filesystem just stores a sequence of bytes [1][2]. When `ls` displays the contents of a directory, it decides how to decode the filenames based on the user's locale, which is typically set through the LANG environment variable. On my Ubuntu machine, for instance, this is en_GB.UTF-8, so all filename data is interpreted as UTF-8. If an Arabic filename happened to be encoded in MacArabic, it would be garbled in the listing.

That explains the problems seen with a local `ls`, but a remote listing over SFTP faces exactly the same issues: the filenames sent to the client can be a mix of UTF-8 and non-UTF-8. I have no idea how SFTP v4 expects servers to guarantee they supply UTF-8 when the server doesn't even know the encoding of its own filenames!

In practice, however, modern Unices default to UTF-8, so it would be unusual to encounter a filename in a different encoding. My project assumes all filenames are UTF-8. A more correct solution would be to default to UTF-8 but give the user an option to specify a custom encoding.

[1] http://serverfault.com/questions/82821/how-to-tell-the-language-encoding-of-a-filename-on-linux
[2] http://www.linux.com/archive/feed/58689

HTH

Alex

-- 
Swish - Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
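
P.S. For anyone wanting to try the "default to UTF-8, but let the user override it" approach, here is a rough sketch in Python. It is only an illustration and not code from Swish or libssh2: the function name `decode_filename` and the `fallback_encoding` parameter are made up for this example, and it assumes the client receives each filename as raw bytes.

```python
from typing import Optional


def decode_filename(raw: bytes, fallback_encoding: Optional[str] = None) -> str:
    """Decode a filename received from the server, trying UTF-8 first.

    `fallback_encoding` is a hypothetical user-supplied setting, used only
    when the bytes are not valid UTF-8.
    """
    try:
        # Strict decode: raises if the bytes are not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        if fallback_encoding is not None:
            # The user's guess may also be wrong, so substitute U+FFFD for
            # undecodable bytes rather than failing the whole listing.
            return raw.decode(fallback_encoding, errors="replace")
        # No override given: keep what is readable, mark the rest.
        return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_filename("café".encode("utf-8")))
    print(decode_filename("café".encode("latin-1"), fallback_encoding="latin-1"))
```

Note that this only covers display. A client that later sends the name back to the server (to open or rename the file) would need to keep the original bytes around, since a decode with replacement characters cannot be reversed.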
