Hi,

i wrote:
> > Truncation nowadays has to take into respect that UTF-8 may
> > consist of multiple bytes and should avoid to leave incomplete
> > byte sequences.
> > (Does the kernel have a function for this ?)

Jan Kara wrote:
> Well, such truncation function would have to be specific to encoding the fs
> uses.

But the problem of truncating a string that may contain multi-byte UTF-8
characters is generic. Rock Ridge gives no clue about the character set
used with the names. (libisofs can do so via its SUSP protocol AAIP.)
Nowadays most Unix-like systems use UTF-8 anyway. So if we truncate, then
we should avoid leaving byte sequences at the end which would demand more
bytes to follow if interpreted as UTF-8.

> > The truncated names are not necessarily unique within the
> > directory.

> Well, true but is it worth the bother? I mean realistically, do people use
> media with more than 255 characters in a file name or is it mostly a
> theoretical concern?

One can easily produce such names with genisoimage. libisofs refuses to
produce names longer than 255 bytes.

Whether such names can be present in backup situations depends on the
local filesystems. Home user backup is my motivation to care for ISO 9660
and optical drives. So i had to implement qualified truncation in order to
get the minimum fidelity needed for backups.

I doubt anybody types 250+ bytes by hand. But in the three-byte UTF-8
range, we reach the limit with fewer than 90 characters (85 three-byte
characters already fill 255 bytes). Also there may be automated tools with
insane ideas about file naming.

The problem is that there will be no method to access the second file of
an identical name pair. One can study the behavior now with two names of
length 254 which differ only by bytes near their end. The heavy truncation
helps to create non-unique names.

One could use libisofs, e.g. via xorriso, to copy such inaccessible files
out of the ISO onto hard disk. (Provided my truncation method is as good
as i hope.)

The simplest way to get unique names would be the mount(8) option
"norock". Then you get to see Joliet names or ISO 9660 names of harmless
length. But guessing the original name from an ISO 9660 name can then be
an adventure of its own.
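
To illustrate the UTF-8 aware truncation mentioned above, here is a minimal
sketch in plain userspace C. It is not kernel code and not the libisofs
implementation. The names utf8_seq_len() and utf8_trunc_len() are
hypothetical and chosen only for this example. The sketch assumes the name
is meant to be UTF-8 and only checks the last few bytes before the cut. It
does not validate the whole string.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes which a UTF-8 sequence starting
 * with byte c is supposed to have. Returns 0 if c cannot start a
 * sequence (continuation byte 10xxxxxx or an invalid lead byte). */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* plain ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((c & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((c & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;
}

/* Hypothetical helper: return a length <= max so that the kept bytes
 * name[0..result-1] do not end with a UTF-8 sequence that would demand
 * more bytes to follow. */
size_t utf8_trunc_len(const char *name, size_t len, size_t max)
{
    const unsigned char *s = (const unsigned char *) name;
    size_t start, need;

    if (len <= max)
        return len;             /* nothing to truncate */
    if (max == 0)
        return 0;

    /* Step back from the last kept byte over at most 3 continuation
     * bytes to find the byte which should start the last sequence. */
    start = max - 1;
    while (start > 0 && max - start < 4 && (s[start] & 0xC0) == 0x80)
        start--;

    need = utf8_seq_len(s[start]);
    if (need > max - start)
        return start;           /* last sequence is incomplete: drop it */
    return max;                 /* complete, or not plausible UTF-8 */
}
```

For example, with the 6 byte name made of two Euro signs (E2 82 AC each)
and max set to 5, the function returns 3, so the second character is
dropped as a whole rather than cut after its second byte. If the bytes near
the cut do not look like UTF-8 at all, the plain byte limit is used.
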
The MD5 suffix of libisofs would allow computing the truncated name from
the known original name.


Have a nice day :)

Thomas

