JDK 1.8.0 33/40, diacritics and file problems

Fabrizio Giudici Fri, 24 Apr 2015 16:39:43 -0700

OK, I've run into many problems in the past with diacritics, as there were some JDK problems, but I supposed they had all been fixed by now. But perhaps there's something I'm not understanding.

I have several files with diacritics in their names, e.g. "La Cathédrale Engloutie.m4a". A catalog contains their names; it was prepared on Mac OS X with JDK 1.8.0_40 and saved with UTF-8 encoding. The catalog is read, of course specifying UTF-8 as the encoding, on a Raspberry Pi running Raspbian with JDK 1.8.0_33. Everything looks correct, as I see the proper characters in the UI and log files.

The problem arises when I try to open a file with diacritics in its name (this doesn't happen with all such files, only with some): I get an exception because the file is not found (both with io and nio). Thanks to some suggestions, I made it work by passing the file name through Paths.get(Normalizer.normalize(path.toString(), NFD)). This transforms the UTF-8 encoding of the é from c3 a9 (doesn't work) to 65 cc 81 (works).
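To make the workaround concrete, here is a minimal sketch of it. The class name NormalizedPaths and the helpers utf8Hex and toNfdPath are just names made up for this illustration; only Normalizer, Paths and StandardCharsets come from the JDK. It shows the byte change described above for a single é and wraps the Paths.get(...) call in a small method.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;

// Illustrative sketch only: class and method names are hypothetical.
public class NormalizedPaths {

    /** Returns the UTF-8 bytes of s as space-separated lowercase hex. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Rebuilds a path from its NFD-normalized (decomposed) string form. */
    static Path toNfdPath(Path path) {
        return Paths.get(Normalizer.normalize(path.toString(), Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String composed = "\u00e9"; // precomposed e-acute, as read from the catalog
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(utf8Hex(composed));   // c3 a9
        System.out.println(utf8Hex(decomposed)); // 65 cc 81

        // Any paths given on the command line are shown in their NFD form.
        for (String arg : args) {
            System.out.println(toNfdPath(Paths.get(arg)));
        }
    }
}
```

The \u00e9 escape is used instead of a literal é so that the result does not depend on the encoding javac assumes for the source file.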

Now, first, I don't understand why I have to take care of this. I'm aware that different file systems use different encodings, but I supposed that all the conversions were done by the JVM. BTW, both systems are configured with:


LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

The Java system properties are:

file.encoding: UTF-8
file.encoding.pkg: sun.io

sun.io.unicode.encoding: UnicodeLittle (ARM)
sun.io.unicode.encoding: UnicodeBig (Mac)

sun.jnu.encoding: UTF-8

The files on the ARM were rsynced from the Mac. I'm not sure that LC_ALL/LANG/whatever were already set when the rsync was performed.

If it's correct that I have to deal with it, is there any official documentation I can reference? BTW, I don't know why NFD normalisation is the one that works, and not one of the other three.
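For reference, a small sketch that prints the UTF-8 bytes of é under each of the four forms defined by java.text.Normalizer.Form. The class name NormalizationForms and its helpers are hypothetical names for this illustration. It only shows what the normalizer itself produces for this one character; it says nothing about what a given file system stores or accepts.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: class and method names are hypothetical.
public class NormalizationForms {

    /** Returns the UTF-8 bytes of s as space-separated lowercase hex. */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Maps each normalization form to the hex bytes of s normalized with it. */
    static Map<Normalizer.Form, String> hexByForm(String s) {
        Map<Normalizer.Form, String> result = new LinkedHashMap<>();
        for (Normalizer.Form form : Normalizer.Form.values()) {
            result.put(form, utf8Hex(Normalizer.normalize(s, form)));
        }
        return result;
    }

    public static void main(String[] args) {
        // "\u00e9" is the precomposed e-acute.
        hexByForm("\u00e9").forEach((form, hex) -> System.out.println(form + ": " + hex));
    }
}
```

For this character the two decomposing forms (NFD, NFKD) yield 65 cc 81 and the two composing forms (NFC, NFKC) yield c3 a9, so the test that matters for a file name like the one above is composed versus decomposed.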


Thanks.



--
Fabrizio Giudici - Java Architect @ Tidalwave s.a.s.
"We make Java work. Everywhere."
http://tidalwave.it/fabrizio/blog - fabrizio.giud...@tidalwave.it
