I've run into many problems with diacritics in the past, caused by some JDK bugs, but I assumed they had all been fixed by now. Perhaps there's something I'm not understanding.

I have several files with diacritics in their names, e.g. "La Cathédrale Engloutie.m4a". A catalog containing their names was prepared on Mac OS X with JDK 1.8.0_40 and saved in UTF-8. The catalog is read, explicitly specifying UTF-8 as the encoding, on a Raspberry Pi running Raspbian with JDK 1.8.0_33. Everything looks correct: I see the proper characters in the UI and in the log files.

The problem arises when I try to open a file whose name contains diacritics (it happens only with some of these files, not all of them): I get an exception because the file is not found, with both java.io and java.nio. Following some suggestions, I made it work by passing the file name through Paths.get(Normalizer.normalize(path.toString(), Normalizer.Form.NFD)). This changes the UTF-8 encoding of é from c3 a9 (the precomposed U+00E9, which doesn't work) to 65 cc 81 (a plain e followed by the combining acute accent U+0301).
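To make the byte sequences above concrete, here is a small sketch that normalizes é to NFC and NFD and dumps the UTF-8 bytes of each. The class and method names are illustrative only; the é is written as the escape \u00e9 so the source compiles regardless of javac's source encoding.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizationBytes {

    // Returns the UTF-8 bytes of s as a lowercase hex string, e.g. "c3a9".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u00e9" is the precomposed é (U+00E9).
        String composed = Normalizer.normalize("\u00e9", Normalizer.Form.NFC);
        String decomposed = Normalizer.normalize("\u00e9", Normalizer.Form.NFD);

        System.out.println("NFC: " + utf8Hex(composed));   // NFC: c3a9
        System.out.println("NFD: " + utf8Hex(decomposed)); // NFD: 65cc81
        // Same text to a human, but different code points, hence different file names.
        System.out.println("equal? " + composed.equals(decomposed)); // equal? false
    }
}
```

Both strings render identically, yet String.equals and the file system treat them as distinct, which is consistent with the "file not found" symptom.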

First, I don't understand why I have to take care of this myself. I'm aware that different file systems use different encodings, but I assumed the JVM performed all the necessary conversions. Both systems are configured with:

LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

The Java system properties are:

file.encoding: UTF-8
file.encoding.pkg: sun.io
sun.io.unicode.encoding: UnicodeLittle (ARM)
sun.io.unicode.encoding: UnicodeBig (Mac)
sun.jnu.encoding: UTF-8
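A quick way to check what a given JVM will actually use is to print the properties listed above together with the default charset. The class name EncodingReport is illustrative. The sun.* keys are JDK-internal and may be missing on other JDK versions, so the sketch falls back to "(not set)". As far as I understand, sun.jnu.encoding is the one that governs how file name Strings are converted to and from bytes on Unix, while none of these settings involves Unicode normalization: two canonically equivalent names with different code points still become different byte sequences.

```java
import java.nio.charset.Charset;

public class EncodingReport {

    // Properties quoted in this post; the sun.* ones are internal and may be absent.
    static final String[] KEYS = {
        "file.encoding", "file.encoding.pkg", "sun.io.unicode.encoding", "sun.jnu.encoding"
    };

    // Builds one "key: value" line per property, plus the JVM's default charset.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : KEYS) {
            sb.append(key).append(": ")
              .append(System.getProperty(key, "(not set)"))
              .append('\n');
        }
        sb.append("Charset.defaultCharset(): ").append(Charset.defaultCharset()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it on both machines (and in the same shell environment used for rsync) makes it easy to compare the settings side by side.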

The files on the ARM board were copied from the Mac with rsync. I'm not sure whether LC_ALL, LANG or similar variables were already set when the rsync was performed.

If it's correct that I have to deal with this, is there any official documentation I can refer to? Also, I don't understand why NFD is the normalization form that works, rather than one of the other three (NFC, NFKC, NFKD).
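A plausible explanation, which is an assumption not verified against these particular files, is that HFS+ on the Mac stores names in a decomposed (NFD-like) form and rsync copied those bytes unchanged, so the Linux file system now holds NFD names while the catalog holds NFC strings. Rather than guessing the form, one workaround is to list the directory and compare names after normalizing both sides to the same form. The helpers findByNormalizedName and resolveNormalized below are hypothetical, not a JDK API, and the choice of NFC as the comparison form is arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NormalizedLookup {

    // Hypothetical helper: returns the entry of names that equals wanted
    // once both are normalized to NFC, whatever form each was stored in.
    static Optional<String> findByNormalizedName(Collection<String> names, String wanted) {
        String key = Normalizer.normalize(wanted, Normalizer.Form.NFC);
        return names.stream()
                .filter(n -> Normalizer.normalize(n, Normalizer.Form.NFC).equals(key))
                .findFirst();
    }

    // Hypothetical helper: tries the exact name first, then falls back to a
    // directory scan that ignores normalization differences.
    static Optional<Path> resolveNormalized(Path dir, String fileName) throws IOException {
        Path direct = dir.resolve(fileName);
        if (Files.exists(direct)) {
            return Optional.of(direct);
        }
        try (Stream<Path> entries = Files.list(dir)) {
            // The names exactly as the file system returned them.
            List<String> names = entries
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
            return findByNormalizedName(names, fileName).map(dir::resolve);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java NormalizedLookup <directory> <file name>
            System.out.println(resolveNormalized(Paths.get(args[0]), args[1])
                    .map(Path::toString).orElse("not found"));
            return;
        }
        // Demo without touching the file system: an NFD name found via an NFC query.
        List<String> names = Arrays.asList("Other.m4a", "La Cathe\u0301drale Engloutie.m4a");
        System.out.println(findByNormalizedName(names, "La Cath\u00e9drale Engloutie.m4a").isPresent());
    }
}
```

Scanning the directory costs one listing per miss, so it may be worth caching the normalized names if many lookups hit the same folder.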

Thanks.



--
Fabrizio Giudici - Java Architect @ Tidalwave s.a.s.
"We make Java work. Everywhere."
http://tidalwave.it/fabrizio/blog - fabrizio.giud...@tidalwave.it
