On Mon, 08 Oct 2007 15:44:48 +1000, Nicholas Miell <[EMAIL PROTECTED]> wrote:

On Mon, 2007-10-08 at 15:07 +1000, Barry Naujok wrote:
On Sat, 06 Oct 2007 04:52:18 +1000, Nicholas Miell <[EMAIL PROTECTED]>
wrote:

> On Fri, 2007-10-05 at 16:44 +0100, Christoph Hellwig wrote:
>> [Adding -fsdevel because some of the things touched here might be of
>>  broader interest and Urban because his name is on nls_utf8.c]
>>
>> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>> >
>> > On it's own, linux only provides case conversion for old-style
>> > character sets - 8 bit sequences only. A lot of distos are
>> > now defaulting to UTF-8 and Linux NLS stuff does not support
>> > case conversion for any unicode sets.
>>
>> The lack of case tables in nls_utf8.c defintively seems odd to me.
>> Urban, is there a reason for that?  The only thing that comes to
>> mind is that these tables might be quite large.
>>
>
> Case conversion in Unicode is locale dependent. The legacy 8-bit
> character encodings don't code for enough characters to run into the
> ambiguities, so they can get away with fixed case conversion tables.
> Unicode can't.

Based on http://www.unicode.org/reports/tr21/tr21-5.html and
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt

Doing case comparison using that table should cater for most
circumstances except a few exeptions. It should be enough
to satisfy a locale independant case-insensitive filesystem
(ie. the C + F case folding option).

Is normalization required after case-folding? What I read
implies it is not necessary for this purpose (and would
slow things down and bloat the code more).

Now I suppose, it's just a question of a fixed table in the
kernel driver (HFS+ style), or data stored in a special
inode on-disk (NTFS style, shared refcounted in memory
when the same). With the on-disk, the table can be generated
 from mkfs.xfs.

You also have to decide whether to screw over people who speak Turkic
languages and expect an 'I' to 'ı' mapping or everybody else who expect
an 'I' to 'i' mapping.

I have had a thought about this. If the case table is stored on-disk like
NTFS, then mkfs.xfs can specify whether to use Turkic I's or not.

That guarantees consistent case folding for the filesystem. mkfs.xfs can
default to a Turkic case table if the user's locale is tr/az and the
"default case table" if not. mkfs.xfs will have to highlight this setting
if the user specifies the generic case-insensitive option. mkfs.xfs
should also allow the user to specify which of the case tables to use.

Barry.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to