* Mark Overmeer <[EMAIL PROTECTED]> [2008-12-04 16:50]:
> * Aristotle Pagaltzis ([EMAIL PROTECTED]) [081204 14:38]:
> > Furthermore, from the point of view of the OS, even treating file
> > names as opaque binary blobs is actually fine! Programs don’t
> > care after all. In fact, no problem shows up until the point
> > where you try to show filenames to a user; that is when the
> > headaches start, not any sooner.
>
> So, they start when
>   - you have users pick filenames (with Tk) for graphical
>     applications. You have to know the right codeset to be able
>     to display them correctly.

Yes, but you can afford imperfection because presumably you know
which displayed filename corresponds to which stored octet
sequence, so even if the name displays incorrectly, you still
operate on the right file if the user picks it.
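A minimal Python sketch of that idea. It assumes UTF-8 is a good enough guess for *display*, and the function name `list_for_display` is hypothetical. Each name is read as raw bytes, a possibly lossy display string is derived from it, and the original octets are kept alongside so that any file operation uses the real name.

```python
import os

def list_for_display(names):
    """Pair each raw filename with a display string.

    The display form may be wrong (undecodable bytes become U+FFFD),
    but the raw bytes are kept untouched for any actual file operation.
    Assumption: UTF-8 is used only as a display guess.
    """
    return [(raw.decode("utf-8", errors="replace"), raw) for raw in names]

if __name__ == "__main__":
    # os.listdir() with a bytes argument returns the names as unaltered bytes.
    for shown, raw in list_for_display(os.listdir(b".")):
        print(shown)  # what the user sees
        # When the user picks `shown`, operate on `raw`, e.g. open(raw, "rb").
```

An incorrectly displayed name is cosmetic here; the program never reconstructs the filename from what it showed the user.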

>   - you have XML-files with meta-data on files which are
>     being distributed. (I have a lot of those)

Use URI encoding unless you like a world of pain.
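A minimal Python sketch of this, assuming filenames are handled as raw bytes (the helper names `name_to_uri` and `uri_to_name` are hypothetical). Percent-encoding turns any octet sequence into plain ASCII that is safe in an XML attribute, and decoding returns the exact original bytes, whatever charset they were in.

```python
from urllib.parse import quote, unquote_to_bytes
import xml.etree.ElementTree as ET

def name_to_uri(raw: bytes) -> str:
    # Percent-encode every byte outside the unreserved set; the result is ASCII.
    return quote(raw, safe="")

def uri_to_name(encoded: str) -> bytes:
    # Reverse the encoding back to the exact original octets.
    return unquote_to_bytes(encoded)

if __name__ == "__main__":
    raw = b"r\xe9sum\xe9.txt"  # Latin-1 bytes, not valid UTF-8
    meta = ET.Element("file", name=name_to_uri(raw))
    print(ET.tostring(meta).decode("ascii"))
    assert uri_to_name(meta.get("name")) == raw
```

No charset decision is ever needed to store or restore the name; one is only needed if the metadata is shown to a human.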

>   - when you start doing path manipulation on (UTF-16) "blob"s,
> and so forth. I have been fighting these problems for a long
> time, and they worry me more and more because we see Unicode being
> introduced at the OS level. The mess is growing by the day.

And all we can do is avoid making it even bigger, because the
only ones in control here are the OS vendors, and they aren’t
solving it, only compounding it. The only thing *we* can do is
not erect obstacles that users will have to work around when
our abstractions invariably leak.

I am unconvinced that this problem actually yields to
abstraction. All the really hard problems in computing are the
ones that intersect with human culture – text in any form, and
dates and times. When computers deal with mathematical entities,
few problems are even hard, let alone insurmountable; you only
need to work at them long enough. Human concepts are not like
that: they are messy and inconsistent.

> > For that, the right solution is simply not to roundtrip filenames
> > through the user interface; instead, keep both the original octet
> > sequence as well as the decoded version, and use the decoded
> > version in UI but refer back to the pristine original when the
> > user elects, via UI, to operate on that file.
>
> But now you simply say "decode it". To be able to decode
> it, you must know which charset it is in in the first place.
> So: where do we start guessing? An educated guess at OS level,
> or on each user program again?

I am not advocating educated guesses. The mechanism would be
whatever interfaces the system provides. Unix does not have any,
so you can indeed only ever guess, but if the system can give
you something better, that should be used.
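Where the system offers nothing, as on Unix, a program can at least guess without destroying information. A sketch in Python, assuming UTF-8 as the guess (the function names are hypothetical): the `surrogateescape` error handler from PEP 383 maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, and encoding with the same handler restores the exact original octets. On POSIX, Python's own `os.fsdecode`/`os.fsencode` apply this handler with the filesystem encoding.

```python
def guess_decode(raw: bytes, encoding: str = "utf-8") -> str:
    # Undecodable bytes become lone surrogates instead of being lost.
    return raw.decode(encoding, errors="surrogateescape")

def restore(text: str, encoding: str = "utf-8") -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode(encoding, errors="surrogateescape")

if __name__ == "__main__":
    raw = b"caf\xc3\xa9-\xff.txt"  # valid UTF-8 followed by a stray 0xFF byte
    text = guess_decode(raw)
    assert restore(text) == raw      # lossless round trip
    print(ascii(text))               # 'caf\xe9-\udcff.txt'
```

The guess can still be wrong for display, but it never corrupts the name the program later hands back to the OS.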

NTFS seems to say it’s all Unicode, and names come back either as
UTF-16 or in the system’s ANSI code page (CP1252 on Western
European setups) depending on which API you use, so I guess you
could auto-decode those. But FAT is codepage-dependent, and I
don’t know whether Windows has a good way of telling which you
are getting. So Windows seems marginally more consistent than
Unix, though perhaps only superficially. (What happens if you zip
a file with random binary garbage for a name on Unix and then
unzip it on Windows?)

I have no idea what other systems do.

But there is no common denominator, so pretending there is one is
not going to help.

> > The higher-level problems like sorting names in a
> > locale-aware fashion will be solved by the CPAN collective
> > much better than any boil-the-ocean abstract interface design
> > that the Perl 6 cabal would produce – if indeed these are
> > real problems at all in practice.
>
> Why? Are CPAN programmers smarter than Perl6 Cabal people?

Of course! There are many more CPAN programmers than cabalists;
some of them are bound to have much greater expertise in some
relevant area of this problem than anyone in the cabal. Even
those who aren’t that smart will have direct access to, and
specific knowledge of, the systems they are dealing with:
knowledge the cabal may never even hear about.

> What I would like to be designed is an object model for the OS,
> processes, directories, and files. We will not be able to solve
> all problems for each OS. Maybe people need to install
> additional CPAN modules to get smarter behavior. But I would
> really welcome it if platform-independent coding is the default
> behavior, without need for File::Spec, Path::Class and such.

Ugh. I understand the desire, but it is very easy to get into
architecture astronautics. I think we should follow the DBI
approach and not try to provide a unified interface to system-
specific things like permissions and ownership: unify the most
general notions of filesystems but leave all the specifics to be
dealt with by user code in the concrete. That is the only place
where the amount of acceptable abstraction can be decided. Cf.
writing apps that run on all of PostgreSQL, MySQL and Oracle vs
those that take advantage of specific DBMS features: this is a
decision that the programmer has to make; it is not one we can
make on his behalf.
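A rough Python sketch of that split, assuming "the most general notions" are just names and directory-ness (`Entry` and `scan` are hypothetical names, not an existing API). As with DBI's driver-specific attributes, the portable core stays tiny, and system-specific metadata is passed through uninterpreted rather than unified.

```python
import os
import stat
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Entry:
    raw_name: bytes          # the name exactly as the OS returned it
    is_dir: bool             # a notion every filesystem shares
    native: os.stat_result   # platform specifics, deliberately not abstracted

def scan(path: bytes) -> List[Entry]:
    """Portable core: names and 'is it a directory'. Permissions,
    owners, etc. stay in `native` for user code to interpret."""
    with os.scandir(path) as it:
        # follow_symlinks=False so dangling symlinks do not raise.
        return [Entry(e.name, e.is_dir(follow_symlinks=False),
                      e.stat(follow_symlinks=False)) for e in it]

if __name__ == "__main__":
    for entry in scan(b"."):
        # Code that chooses to depend on POSIX-style modes says so explicitly.
        print(entry.raw_name, entry.is_dir, stat.filemode(entry.native.st_mode))
```

Whether to rely on `native` is left to the application, just as using Oracle-only SQL is left to the DBI user.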

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
