I thought Mac OS X had a standard normalization for Unicode filenames, whereas Linux just treats whatever it gets as bytes, so it is up to the software creating the file. Am I correct?
(e.g. see: http://stackoverflow.com/questions/9757843/unicode-encoding-for-filesystem-in-mac-os-x-not-correct-in-python )

The point is on Linux, as you say, "it's a mess". I'm not sure there is anything that can be done by Java or your app that will work 100% of the time.

Scott

On Tue, Apr 28, 2015 at 9:11 AM, Mike Hearn <m...@plan99.net> wrote:

> > They were rsynced from Mac OS X.
>
> I said *original* app. Rsync is not the original app and most likely does
> not attempt to re-encode or re-normalise Unicode strings.
>
> > I feared that. In the end it might be even reasonably doable, if I can
> > take advantage of some preconditions... for instance: is it safe to assume
> > that, given a specific instance of a filesystem, everything is
> > encoded/normalised in the same way?
>
> Probably not. Most software that handles unicode does not do code point
> normalisation. Hence my emphasis on what app created the file name in the
> first place.
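
For what it's worth, here is a minimal sketch of the mismatch being discussed, in Python since the Stack Overflow question linked above is about Python. It assumes the two names differ only in normalisation form: one precomposed (NFC), one decomposed (roughly what HFS+ stores). The file name and the helper functions `same_name` and `find_by_normalised_name` are made up for illustration. Java offers the same operation through `java.text.Normalizer`.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two file names after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_by_normalised_name(wanted: str, candidates: list[str]) -> list[str]:
    """Return every candidate equal to `wanted` once both are in NFC."""
    key = unicodedata.normalize("NFC", wanted)
    return [c for c in candidates if unicodedata.normalize("NFC", c) == key]


# Hypothetical file name, written two ways:
composed = "caf\u00e9.txt"     # "é" as a single precomposed code point
decomposed = "cafe\u0301.txt"  # "e" followed by a combining acute accent

# The raw strings (and hence the raw bytes on disk) differ...
print(composed == decomposed)
# ...but they compare equal once both are normalised.
print(same_name(composed, decomposed))

# Looking up a name in a listing whose normalisation form is unknown.
listing = [decomposed, "other.txt"]
print(find_by_normalised_name(composed, listing))
```

This only helps with lookups and comparisons. It does not tell you which form a given file was created with, and on Linux two files whose names differ only in normalisation can coexist in the same directory, so a match is not guaranteed to be unique.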