Very interesting read; it opens a whole new can of worms: how should we behave when we actually read file names from the filesystem?

As for the path literal, the newest revision of S32-setting-library should make most people happy, as the default is OS-independent and abstract. More strictness can be set with use flags or a more verbose syntax, which should also make it easier to write portable programmes in Perl 6. So far I'm quite happy with the current result, way to go people :)

But what we should do when reading paths from the filesystem is still a problem. We can go the old Perl 5 way of treating filenames as binary by default and then trying to convert them based on the local encoding settings. But this just means any sane program will have to do an explicit decoding to a Unicode path or string, like we do in Perl 5:

    use Encode qw(decode);

    my $file = readdir $dir;
    # "UTF-8" is the strict decoder ("utf8" is Perl's lax variant);
    # LEAVE_SRC keeps $file intact if decoding fails.
    my $decoded_file = eval {
        decode("UTF-8", $file, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    if ($@) {
        # Try something else, as this was clearly not UTF-8.
    }
    else {
        $file = $decoded_file;
    }

But then again, is this reasonable? On both Windows and Mac OS X we know exactly what we get, as the filesystem will tell us. Even FAT has an encoding attribute telling us what encoding the filesystem is in. And given that the OS actually refuses to write file names that are not valid, it would be a safe bet that a Path can be decoded with that encoding.

So the problem of knowing the encoding really only exists on Unix/Linux. This is mainly because POSIX does not care about encoding, and most filesystems seem to follow suit. But who knows whether future filesystems will still be so lax with input; the current trend of putting more database features into the filesystem might also bring more input validation, and in the future we might not have to deal with the insanity of multiple encodings. Apparently JFS today has the option of limiting file name encoding.
http://lwn.net/Articles/71472/

Even without a filesystem restriction, on Linux/Unix we have a default encoding specified in the locale that most software will respect, so when I name a file "ÆØÅ" on my Ubuntu box all my programs will show it as such and not give me a garbled string. So even if we have no guarantee that file names are encoded in what the locale is set to, it's the best information we have.

One could always argue that even if the filesystem restricts file name input, one still has the option of ignoring this, as a string of bytes written in one encoding will be valid under the rules of another encoding, just with another meaning. But this file name will be wrong in all other programs, so why should it be correct or unspecified (as in just a stream of bytes) in Perl 6?

My idea of working with file names would be that we default to locale or filesystem settings, but give the option of working with paths/file names as binary or in a specific encoding.

    my $file = readdir $dir;         # Default to locale settings, e.g. utf8

This will return a Path decoded as UTF-8; if decoding fails, no decoding will be done and we return a binary Path.

    my $file = readdir $dir, :utf8;  # Decodes as utf8
    my $file = readdir $dir, :bin;   # No decoding is done

The whole reason for this is that paths and filenames should not be special; they are just another form of user input, where we should have some sane default so it does what we expect.

More reading on the topic:

Python 3 problems: http://bugs.python.org/issue4006
Unicode handling in Linux: http://hektor.umcs.lublin.pl/~mikosmul/computing/articles/linux-unicode.html

Regards
Troels.

On Wed, Aug 19, 2009 at 03:17, Timothy S. Nelson<wayl...@wayland.id.au> wrote:
> See this link.
>
> http://archive.netbsd.se/?ml=perl6-language&a=2008-11&t=9170058
>
> In particular, I thought Tom Christiansen's long message had some
> relevant info about filename literals.
>
> :)
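
As a rough illustration of the default proposed in this message (try the locale's encoding first, fall back to raw bytes), here is a minimal Perl 5 sketch. The helper name `decode_name` and its `encoding` and `bin` options are made up for this example and are not part of any Perl 5 or Perl 6 API; it also assumes that `langinfo(CODESET)` reports an encoding name that Encode recognises, and simply returns the bytes when it does not.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: decode a raw file name, falling back to bytes.
# Returns (name, is_text); is_text is 0 when the raw bytes are returned.
sub decode_name {
    my ($bytes, %opt) = @_;
    return ($bytes, 0) if $opt{bin};    # analogous to the proposed :bin

    # Explicit encoding (analogous to :utf8), else whatever the locale says.
    my $enc = $opt{encoding} || langinfo(CODESET);
    return ($bytes, 0) unless $enc && find_encoding($enc);

    # LEAVE_SRC keeps $bytes untouched so it can be returned on failure.
    my $text = eval {
        decode($enc, $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $text ? ($text, 1) : ($bytes, 0);
}

# Usage: classify every entry of the current directory.
opendir(my $dir, '.') or die "opendir: $!";
while (defined(my $raw = readdir $dir)) {
    my ($name, $is_text) = decode_name($raw);
    printf "%s %d\n", ($is_text ? 'text' : 'binary'), length $name;
}
closedir $dir;
```

Calling `decode_name($raw, encoding => 'UTF-8')` corresponds to the `:utf8` form above, and `decode_name($raw, bin => 1)` to `:bin`. Returning a flag alongside the name is just one way to let the caller tell a decoded name from a binary one; a real Path type would presumably carry that information itself.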