And BTW, OpenSolaris userland still has just one C locale... :-(

On Thu, 2006-11-30 at 13:45 -0800, Ienup Sung wrote:
> Yes, we have numerous locales with different codesets. Solaris 10,
> as an example, has 165 locales with 23 different codesets.
> In many cases, codesets use quite similar representation forms and yet
> the mappings between the code point values and actual characters/glyphs
> are quite different.
>
> Underlying file systems also have various ways of storing characters,
> although many new file systems are converging on Unicode. (Even then,
> those rather new file systems that use Unicode sometimes use
> different Unicode encodings that are not byte-for-byte compatible with
> each other.)
>
> To solve the problem of not correctly showing non-ASCII characters while
> keeping maximum compatibility with existing applications and with the
> numerous locales and codesets, it appears that we either tag each file
> with its codeset or adopt Unicode, in particular UTF-8, as the file system
> codeset as one thing, and then add transparent codeset conversion as
> the other. These two could go together or be supported separately.
>
> Re the file name length, having a big enough limit will obviously
> help as long as there is a clear way to keep backward compatibility
> with minimal breakage. Sticking to the current user land
> length definitions is another way, i.e., no change regarding
> the length for the existing (traditional) file systems.
>
> Ienup
>
> Joerg Schilling wrote at 11/30/06 12:59:
> > Ienup Sung <[EMAIL PROTECTED]> wrote:
> >
> >>I think distinguishing between UTF-8 and ISO8859-? codesets by
> >>examining byte values or patterns used in file names is quite difficult
> >>and not always possible. I'd be interested to hear from you on
> >>what would be the best way of achieving that.
> >
> > I did not think about other codings but only about ISO-8859-1 as it is
> > the most popular single byte coding.
> > I did not yet think about the idea
> > completely. But there is another idea to (mostly) deal with the problem.
> > See below....
> >
> >>PS. BTW, I think we have about three (or perhaps more) file name length
> >>restrictions or constraints, and various problems stem from
> >>any possible combination of the three:
> >>
> >>- Different locales/codesets use different numbers of bytes to
> >>  represent the same characters.
> >>- Multiple user land side max filename length definitions.
> >>- Multiple per file system max filename length definitions.
> >
> > In order to avoid unneeded problems, I recommend changing MAXNAMELEN in
> > usr/src/uts/common/fs/lookup.c (and maybe a few other files) to 1024.
> > This would already allow using hsfs with Joliet without limitations.
> > If you would like to test, use the undocumented mount option "jolietlong"
> > and a Joliet CD with very long file names. If you do not change lookup.c,
> > you will be able to see long file names (up to 330 bytes - 110 UCS-2 chars)
> > but not to stat/open the files. If you change MAXNAMELEN, you may also
> > use them.
> >
> >>And I think we already have this problem of "mismatch" in terms of
> >>the number of bytes and the number of characters allowed in file names at
> >>various levels, vertically and horizontally.
> >>
> >>While people usually don't see the problem very often (since not that many
> >>people create and use really lengthy file names daily), I think the problem
> >>does exist today, and not just on traditional file systems
> >>such as UFS but also on rather new Unicode file systems such as NTFS,
> >>HFS+, UDF, and so on since, for instance, NTFS allows 255 16-bit units for
> >>a filename, which can translate into 255 UTF-16 characters or
> >>127 UTF-16 characters. Similarly for UDF; it could be 127 or 254
> >>characters depending on which compression id is used.
> >
> > If NTFS allows 255 UTF-2 chars, you need to set MAXPATHNAME to at least 765.
> >
> > I did not check ZFS on disk structures, but on UFS MAXPATHNAME could be
> > enhanced to 503 to allow longer UNICODE names. If MAXPATHNAME is 503, then
> > we could allow 251 ISO-8859-1 chars from the 8-bit range or 167 katakana
> > characters.
> >
> > Jörg
>
> _______________________________________________
> opensolaris-discuss mailing list
> opensolaris-discuss@opensolaris.org

--
Erast
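
For anyone who wants to check the byte counts quoted above, here is a rough Python sketch of the arithmetic. It is not Solaris code: it assumes the names end up stored as UTF-8, and utf8_len() plus the two sample characters are made up for illustration only.

```python
# Sketch only: assumes file names are stored on disk as UTF-8.
# utf8_len() is a hypothetical helper, not part of any Solaris interface.

def utf8_len(name: str) -> int:
    """Return the number of bytes `name` needs when encoded as UTF-8."""
    return len(name.encode("utf-8"))

# One sample character per case discussed in the thread.
LATIN1_HIGH = "\u00f6"  # o-umlaut, 8-bit range of ISO-8859-1: 2 bytes in UTF-8
KATAKANA = "\u30a2"     # katakana A: 3 bytes in UTF-8

cases = {
    "255 katakana (NTFS, 255 16-bit units)": KATAKANA * 255,
    "110 katakana (Joliet, 110 UCS-2 chars)": KATAKANA * 110,
    "251 ISO-8859-1 8-bit chars": LATIN1_HIGH * 251,
    "167 katakana": KATAKANA * 167,
}

for label, name in cases.items():
    print(f"{label}: {utf8_len(name)} bytes")
```

This reports 765, 330, 502 and 501 bytes, so the last two cases fit in a 503-byte limit while the NTFS worst case needs the full 765.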
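
On Ienup's point that telling UTF-8 from ISO-8859-? by byte patterns is difficult and not always possible, the following Python sketch shows roughly how far a byte-level check can go. It is only a heuristic under my own assumptions: classify_name() and its three labels are hypothetical, and a "utf8-plausible" result is a guess, not proof, because every byte sequence also decodes as ISO-8859-1.

```python
# Heuristic sketch only; classify_name() and its labels are hypothetical.

def classify_name(raw: bytes) -> str:
    """Classify a raw file name as 'ascii', 'utf8-plausible' or 'not-utf8'."""
    if all(b < 0x80 for b in raw):
        # Pure 7-bit names look the same in ASCII, ISO-8859-x and UTF-8.
        return "ascii"
    try:
        raw.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        # Cannot be UTF-8, so it is some other codeset (possibly ISO-8859-1).
        return "not-utf8"
    # Well-formed UTF-8, but the same bytes are also legal ISO-8859-1.
    return "utf8-plausible"


for raw in (b"readme.txt", b"J\xf6rg", "J\u00f6rg".encode("utf-8")):
    print(raw, classify_name(raw))
```

The ambiguity shows up in the last case: the UTF-8 bytes for "Jörg" are also a legal ISO-8859-1 name that reads "JÃ¶rg", which is why an explicit codeset tag per file or a fixed file system codeset is more reliable than guessing.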