And BTW, the OpenSolaris userland still has just one C locale... :-(

On Thu, 2006-11-30 at 13:45 -0800, Ienup Sung wrote:
> Yes, we have numerous locales with different codesets. Solaris 10,
> as an example, we have 165 locales with 23 different codesets.
> In many cases, codesets use quite similar representation forms and yet
> the mappings between the code point values and actual characters/glyphs
> are quite different.
> 
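A small illustration of the point above, as a sketch in Python: the same byte value decodes to a different character depending on the codeset. The three codesets are picked only as examples and are not taken from the Solaris locale list.

```python
# One byte value, three single-byte codesets (chosen only as examples).
raw = b"\xe4"

decoded = {cs: raw.decode(cs) for cs in ("iso8859-1", "iso8859-5", "koi8-r")}

for codeset, char in decoded.items():
    # Show the codeset, the resulting character, and its Unicode code point.
    print(f"{codeset:10} {char!r} U+{ord(char):04X}")
```

Each codeset yields a different character for the same byte, which is why a file name written in one locale can look wrong in another.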
> Underlying file systems also have various ways of storing characters,
> although many new file systems are converging on Unicode. (Even then,
> the newer file systems that use Unicode sometimes use different Unicode
> encodings that are not byte-for-byte compatible with one another.)
> 
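To make the byte-level incompatibility concrete, here is a minimal Python sketch that encodes one name in three Unicode encodings. The file name is hypothetical, and which encoding a given file system uses on disk is not something this sketch tries to model.

```python
name = "é.txt"  # hypothetical file name with one non-ASCII character

utf8 = name.encode("utf-8")
utf16le = name.encode("utf-16-le")
utf16be = name.encode("utf-16-be")

for label, data in (("UTF-8", utf8), ("UTF-16LE", utf16le), ("UTF-16BE", utf16be)):
    # Same characters, different byte sequences and lengths.
    print(f"{label:9} {len(data):2} bytes  {data.hex(' ')}")
```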
> To solve the problem of non-ASCII characters not being shown correctly while
> keeping maximum compatibility with existing applications and with the
> numerous locales and codesets, it appears that we either tag each file with
> its codeset, or adopt Unicode, in particular UTF-8, as the file system
> codeset as one step and add transparent codeset conversion as another.
> These two steps could be supported together or separately.
> 
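A rough sketch of what "UTF-8 on disk plus transparent conversion" could look like, in Python for brevity. The names FS_CODESET, to_fs_name and from_fs_name are hypothetical and only illustrate the idea; nothing here reflects an actual Solaris interface.

```python
FS_CODESET = "utf-8"  # assumed on-disk codeset for this sketch

def to_fs_name(name: bytes, locale_codeset: str) -> bytes:
    """Convert a name from the caller's locale codeset to the on-disk codeset."""
    return name.decode(locale_codeset).encode(FS_CODESET)

def from_fs_name(name: bytes, locale_codeset: str) -> bytes:
    """Convert an on-disk name back to the caller's locale codeset.

    Raises UnicodeEncodeError if that codeset cannot represent a character.
    """
    return name.decode(FS_CODESET).encode(locale_codeset)

latin1_name = "grün.txt".encode("iso8859-1")
on_disk = to_fs_name(latin1_name, "iso8859-1")
print(latin1_name, "->", on_disk)
```

The failure case in from_fs_name is the hard part of such a design: a name created in one locale may have no representation at all in another.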
> Regarding the file name length, a big enough limit will obviously
> help, as long as there is a clear way to keep backward compatibility
> with minimal breakage. Sticking to the current user land
> length definitions is another option, i.e., no change to
> the length for the existing (traditional) file systems.
> 
> Ienup
> 
> Joerg Schilling wrote at 11/30/06 12:59:
> > Ienup Sung <[EMAIL PROTECTED]> wrote:
> > 
> > 
> >>I think distinguishing between UTF-8 and ISO8859-? codesets by
> >>examining byte values or patterns used in file names is quite difficult
> >>and not always possible. I'd be interested to hear from you on
> >>what would be the best way of achieving that.
> > 
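One way to see why byte-pattern detection is only a heuristic, sketched in Python. The function classify is hypothetical; it checks UTF-8 validity and nothing more, so "maybe-utf8" is a guess, not a proof.

```python
def classify(name: bytes) -> str:
    """Guess the encoding family of a file name from its bytes alone."""
    if name.isascii():
        return "ascii"        # identical in UTF-8 and every ISO-8859-? codeset
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"     # some other codeset, but which one is unknown
    return "maybe-utf8"       # valid UTF-8, yet also valid ISO-8859-1 bytes

ambiguous = "é".encode("utf-8")        # b'\xc3\xa9'
print(classify(ambiguous))             # valid as UTF-8 ...
print(ambiguous.decode("iso8859-1"))   # ... and equally valid as two Latin-1 chars
print(classify("é".encode("iso8859-1")))
```

Every byte sequence is valid ISO-8859-1, so the check can only ever rule UTF-8 out; it can never confirm which single-byte codeset was intended.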
> > 
> > I was not thinking about other encodings, only about ISO-8859-1, as it is
> > the most popular single-byte encoding. I have not yet thought the idea
> > through completely. But there is another idea to (mostly) deal with the
> > problem. See below....
> > 
> > 
> > 
> >>PS. BTW, I think we have about three (or perhaps more) file name length
> >>restrictions or constraints, and various problems stem from
> >>the possible combinations of the three:
> >>
> >>- Different locales/codesets use different numbers of bytes to
> >>   represent the same characters.
> >>- Multiple user land side max filename length definitions.
> >>- Multiple per file system max filename length definitions.
> > 
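The first item in the list above is easy to demonstrate. A minimal Python sketch, with the sample characters and codesets chosen only for illustration:

```python
def byte_lengths(char: str, codesets: tuple) -> dict:
    """Return how many bytes one character occupies in each codeset."""
    return {cs: len(char.encode(cs)) for cs in codesets}

umlaut = byte_lengths("ä", ("iso8859-1", "utf-8"))
katakana = byte_lengths("カ", ("euc_jp", "shift_jis", "utf-8"))

print("ä :", umlaut)
print("カ:", katakana)
```

A limit expressed in bytes therefore allows a different number of characters depending on the locale in which the name was created.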
> > 
> > To avoid unnecessary problems, I recommend changing MAXNAMELEN in
> > usr/src/uts/common/fs/lookup.c (and maybe a few other files) to 1024.
> > This alone would allow hsfs to be used with Joliet without limitations.
> > If you would like to test, use the undocumented mount option "jolietlong"
> > and a Joliet CD with very long file names. If you do not change lookup.c,
> > you will be able to see long file names (up to 330 bytes, i.e. 110 UCS-2
> > chars) but not to stat/open the files. If you change MAXNAMELEN, you may
> > also use them.
> > 
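The 330-byte figure follows if the names are presented as UTF-8, which is an assumption on my part and is not stated in the message. A short Python sketch of the arithmetic, with a hypothetical helper name:

```python
def worst_case_utf8_bytes(ucs2_chars: int) -> int:
    """Upper bound on UTF-8 bytes for a name of the given UCS-2 length.

    UCS-2 only covers the Basic Multilingual Plane, and any code point
    there needs at most 3 bytes in UTF-8.
    """
    return ucs2_chars * 3

print(worst_case_utf8_bytes(110))  # 330

# Cross-check with a real 3-byte character repeated 110 times.
print(len(("カ" * 110).encode("utf-8")))  # 330
```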
> > 
> >>And I think we already have this problem of "mismatch" between
> >>the number of bytes and the number of characters allowed in file names at
> >>various levels, vertically and horizontally.
> >>
> >>While people usually don't see the problem very often (since not that many
> >>people create and use really lengthy file names daily), I think the problem
> >>does exist today, and not just on traditional file systems
> >>such as UFS but also on rather new Unicode file systems such as NTFS,
> >>HFS+, UDF, and so on. For instance, NTFS allows 255 16-bit units for
> >>a filename, which can translate into 255 UTF-16 characters (one unit each)
> >>or 127 UTF-16 characters (surrogate pairs, two units each). UDF is similar:
> >>it could be 127 or 254 characters depending on which compression ID is used.
> > 
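The 255-versus-127 split can be checked by counting UTF-16 code units rather than characters. A minimal Python sketch; the 255-unit limit is the figure quoted above, and the helper names are hypothetical:

```python
NTFS_MAX_UNITS = 255  # limit in 16-bit units, as quoted in the message above

def utf16_units(name: str) -> int:
    """Number of 16-bit code units the name occupies in UTF-16."""
    return len(name.encode("utf-16-le")) // 2

def fits(name: str) -> bool:
    return utf16_units(name) <= NTFS_MAX_UNITS

bmp = "a" * 255            # one unit per character
clef = "\U0001D11E" * 127  # outside the BMP: two units per character

print(utf16_units(bmp), fits(bmp))
print(utf16_units(clef), fits(clef))
print(fits("\U0001D11E" * 128))
```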
> > 
> > If NTFS allows 255 UCS-2 chars, you need to set MAXPATHNAME to at least 765.
> > 
> > I did not check the ZFS on-disk structures, but on UFS MAXPATHNAME could be
> > raised to 503 to allow longer Unicode names. If MAXPATHNAME is 503, then
> > we could allow 251 ISO-8859-1 chars from the 8-bit range or 167 katakana
> > characters.
> > 
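The 251 and 167 figures match a 503-byte budget if the names are stored as UTF-8, which is my reading of the message rather than something it states. A small Python check, with a hypothetical helper:

```python
def chars_that_fit(limit_bytes: int, sample_char: str, codeset: str = "utf-8") -> int:
    """How many copies of sample_char fit in limit_bytes in the given codeset."""
    return limit_bytes // len(sample_char.encode(codeset))

print(chars_that_fit(503, "ä"))   # 251: upper-half ISO-8859-1 chars take 2 bytes
print(chars_that_fit(503, "カ"))  # 167: katakana take 3 bytes
```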
> > 
> > Jörg
> 
> _______________________________________________
> opensolaris-discuss mailing list
> opensolaris-discuss@opensolaris.org
> 
-- 
Erast

