On Tue, Aug 18, 2009 at 15:20, Carl Mäsak<cma...@gmail.com> wrote:
> Leon (>):
>> Reading this discussion, I'm getting the feeling that filename
>> literals are increasingly getting magical, something that I don't
>> think is a good development. The only sane way to deal with filenames
>> is treating them as opaque binary strings, making any more assumptions
>> is bound to get you into trouble. I don't want to deal with Windows'
>> strange restrictions on characters when I'm working on Linux. I don't
>> want to deal with any other platform's particularities either.
>> Portability should be positive, not negative IMNSHO.

The whole reason filenames/paths are a mess to code is that they are treated
as binary strings in most cases. This is also why we have modules like
File::Spec and a bunch more on CPAN all trying to do the same thing. And today,
if I want to code something that works on all platforms, I have to use those
instead. How can this be positive?

For me a path literal is a way to get rid of all these bandages so we don't have
to bother with the strange restrictions later when we get a bug report from a
CPAN user. And there is nothing magical about it, no more so than when I ask for
the length of a UTF8 string and expect to get back the number of characters, not
the number of bytes.
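
To make the characters-versus-bytes point concrete, here is a minimal sketch.
It is in Python rather than Perl purely for illustration, and the file name is
a made-up example:

```python
# Illustration only: the file name is a made-up example, written with
# escapes so the code points are unambiguous (å, æ, ø).
name = "bl\u00e5b\u00e6rgr\u00f8d.txt"

# Length in characters (Unicode code points).
char_count = len(name)

# Length in bytes once the same name is encoded as UTF-8;
# each of the three non-ASCII letters takes two bytes.
byte_count = len(name.encode("utf-8"))

print(char_count, byte_count)  # prints "14 17"
```

Asking for the length of the string gives the first number; only the raw
byte view gives the second.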

A path is a well-defined entity on all platforms and should be treated as such.
The main problem is that POSIX never really covered this part too well. But
today we have Unicode and UTF8, and this is the de facto default on most
modern Unixes, as most libraries and tools will write filenames in this format
if so defined in the locale.

Just writing binary data to a filename is bound to get you into trouble, and you
will quickly find that many of the common C libraries will fail if locale and
filename do not match.

So even on Linux/Unix a path is really not just any number of bytes with / as
delimiter. It depends on the locale and the encoding set for the file system,
and not caring about that will get you into trouble.
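
A rough sketch of the kind of mismatch meant here, again in Python for
illustration only. The file name and the helper decode_name are hypothetical;
the scenario assumed is a name written under a Latin-1 locale and later read
under a UTF8 one:

```python
# Hypothetical file name written by a tool running under a Latin-1 locale.
raw_name = "caf\u00e9.txt".encode("latin-1")  # the bytes b'caf\xe9.txt'

def decode_name(raw: bytes, encoding: str):
    """Return the decoded name, or None if the bytes are not valid
    in the given encoding. (Hypothetical helper for this sketch.)"""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Read back under a UTF8 locale: the lone 0xE9 byte is not valid UTF-8.
as_utf8 = decode_name(raw_name, "utf-8")

# Read back under the locale it was written in: decodes fine.
as_latin1 = decode_name(raw_name, "latin-1")

print(as_utf8, as_latin1)
```

The same bytes are a perfectly good name under one encoding and garbage under
the other, which is the trouble with treating the name as "just bytes".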

But then again, you always have the option of using p:unix{}; it's also a clear
way to signal that you really don't care about portability and that this will
only work on Unix. Or you could even use Q{}, as this will pretty much allow you
to do anything.

>>
>> As for comparing paths: reimplementing logic that belongs to the
>> filesystem sounds like really Bad Idea™ to me. Two paths can't be
>> reliably compared without choosing to make some explicit assumptions,
>> and I don't think Perl should make such choices for the programmer.

Getting any kind of path from user input will require you to reimplement that
logic if you care about validating data before throwing it at the file system.

If you buy that paths are well-defined types, then comparing paths should not
require making any assumptions. We can compare Unicode strings without making
assumptions.
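
One way to read the Unicode comparison point: Unicode defines canonical
equivalence, so two strings can be compared after normalizing both. A minimal
sketch in Python, for illustration only; the helper same_name and the choice
of NFC are assumptions of this sketch, not something settled in this thread:

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC.
    (Hypothetical helper; NFC is this sketch's choice.)"""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "caf\u00e9"     # 'é' as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

raw_equal = composed == decomposed               # code point by code point
normalized_equal = same_name(composed, decomposed)

print(raw_equal, normalized_equal)  # prints "False True"
```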

>
> Very nicely put. We can't predict the future, but in creating
> something that'll at least persist through the next decade, let's not
> do elaborate things with lots of moving parts.
>
> Let's make a solid ground to stand on; something so stable that it
> works uphill and underwater. People with expertise and tuits will
> write the facilitating modules.
>
> <PerlJam> To quote Kernighan and Pike:  Simplicity. Clarity. Generality.
> <moritz_> I agree.
> <Matt-W> magic can always be added with module goodness
>

I completely agree we can't predict the future, but we do have to make some sane
choices about how the default should work. Who knows if UTF8 will still be the
hot new thing in 10 years, but that's still the default assumption for much of
Perl 6 if nothing else is known about the input we get.

And I totally agree path literals should not be magical; they should be well
defined, and you should not suffer when using them because platform X or Y has
strange restrictions. But when finding the sane default we have to make
restrictions, and POSIX's "a path is binary data" is simply too lax.

My idea about using the lowest common denominator for modern Unix and Windows
was that we could get as much of Unicode in path names as possible without
breaking on modern platforms, and as a way to get Simplicity, Clarity and
Generality into paths.

Because this will never be simple, clear or general:

  File::Spec->catfile(qw(.. ext Sys Syslog macros.all));

or any of the other examples that we can find:

http://www.google.com/codesearch?hl=en&start=10&sa=N&q=FIle::Spec-%3Ecatfile
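
For comparison, a sketch of the same join done with a path type instead of a
string-joining helper. Python's pathlib stands in here only because it can be
run today; it is not the Perl 6 path literal being proposed, and nothing about
its API is meant as the design:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same components as the catfile call above.
parts = ("..", "ext", "Sys", "Syslog", "macros.all")

# The path type knows its platform's separator; the caller does not.
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # ../ext/Sys/Syslog/macros.all
print(windows_path)  # ..\ext\Sys\Syslog\macros.all
```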

Regards Troels
