On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai <m...@chromium.org> wrote:

> I understand your problem.  You're saying "I have user-supplied data
> that I want to build a filename from," and "I have this pathname that
> I want to display back to the user."  I agree that it would be good to
> have a way to handle these cases in base.  I don't know if FilePath
> proper is the right place to do it.  If we do it in FilePath, it still
> won't really be right.


OK, so it sounds like you're telling me not to use FilePath to represent
file paths from a disk for my purposes because they can't ever be converted
reliably to a particular encoding on Linux (which is a requirement for me,
because of the third party libraries that require a particular encoding).

That's fine, but what do I do instead?  Roll my own FilePath clone that has
some encoding assumptions?  I can do that, but it has the same issues as the
ones you're worried about with FilePath, so it seems better to solve the
issue in one place rather than have two versions that are both insufficient.
Man, it would be better if FilePath could reliably know its encoding!  (I
realize that Linux makes this impossible; it just seems like it would be
better that way. :-)

Since Linux is the only platform where the encoding is unclear, what if we
did the best we could on Linux:

When constructing a FilePath from a char* string on Linux:
- Test the input string for values > 127 to determine if it's really just
ASCII (and if so, we're out of the woods).
- Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
encoding that we can support, and note the encoding for later if we are
requested to do a conversion.
- If we run into an invalid sequence during a conversion, or an encoding we
can't convert from, then use a CHECK to crash.
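
The three steps above can be sketched roughly as follows. This is Python
rather than Chromium's C++, purely for brevity; `decode_path_bytes` is a
made-up name, and raising an exception stands in for the CHECK.  It assumes
the process locale reflects LANG / LC_CTYPE / LC_ALL.

```python
import codecs
import locale
from typing import Optional


def decode_path_bytes(raw: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode a filename given as raw bytes."""
    # Step 1: no byte above 127 means plain ASCII, so no guessing is needed.
    if all(b <= 127 for b in raw):
        return raw.decode("ascii")

    # Step 2: fall back to the current locale's encoding (on Linux this is
    # derived from LC_ALL / LC_CTYPE / LANG).
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    codecs.lookup(encoding)  # raises LookupError for an unsupported encoding

    # Step 3: strict decoding raises UnicodeDecodeError on an invalid
    # sequence instead of silently producing garbage.
    return raw.decode(encoding, errors="strict")
```

Passing `encoding` explicitly is only there to make the sketch testable
without depending on the environment.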

This should work on most filenames, in almost all situations -- I'll bet
most filenames are ASCII, even on foreign systems, and systems with
non-ASCII filenames have LANG set to something in /etc/profile, so all
filenames created by any app running on that machine should match that
encoding.

Where they don't do that correctly, they're already getting garbage (and
should expect garbage) from any application they use, not just Chrome, since
there is no way *any* app can decode a path with multiple encodings in it,
or where the encoding is different than LANG (or LC_*) says it is.

Chrome already crashes like this when it encounters situations where it's
just impossible to know what's right, so it's consistent with Chrome's
behavior in other areas.


> it should be the caller's responsibility to only deal with user-created
> names with
> this interface.


What do you mean here?  Isn't that the case now with FilePath?  (It's the
file_util routines that actually read the filesystem and make FilePaths out
of them, after all).  As for your suggestion to only deal with path
components, how would you propose to parse user-supplied paths into one of
these?


> > 2) I'd like to make it possible to instantiate a POSIX FilePath object on
> > Windows and a Windows FilePath on POSIX platforms.  This is because some
> > libraries (e.g. the zip library, or tar files), use POSIX semantics for
> > their paths even on Windows (I haven't seen a use case for Windows paths
> > on POSIX yet, actually).  This would make it possible to use the nice API
> > that FilePath has to manipulate paths appropriately for these other
> > libraries.  This could be easily accomplished by having POSIX and Windows
> > versions of FilePath, and then typedef'ing FilePath differently on
> > different platforms to one of these versions.
>
> Sounds pretty Pythonic.
>
> FilePath already sort of has some support for this - it does a bunch
> of things based on feature macros, mostly so that as I was writing it,
> I could test the Windows semantics without having to (shudder) resort
> to running on Windows.  These could probably be adapted to do what
> you're asking.


Cool.
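
For comparison, Python's standard library already has this kind of split:
`pathlib` ships a POSIX-flavoured and a Windows-flavoured pure path class,
and either can be instantiated on any host.  A small illustration (the file
names are made up):

```python
from pathlib import PurePosixPath, PureWindowsPath

# A zip or tar member name uses POSIX separators regardless of the host OS.
member = PurePosixPath("docs") / "img" / "logo.png"
member_text = str(member)          # "docs/img/logo.png"

# A Windows-style path can be built and taken apart the same way on Linux
# or Mac, without touching the disk.
archive = PureWindowsPath(r"C:\Users\greg") / "out.zip"
archive_text = str(archive)        # r"C:\Users\greg\out.zip"
archive_drive = archive.drive      # "C:"
```

The typedef idea maps onto this directly: the platform's default path type
is just an alias for one of the two flavours.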


> > 3) It would be helpful to have real path normalization for each of the
> > platforms (although I know what a testing nightmare that can be).  I
> > might try and tackle this if people think it would be beneficial.
>
> It's also a specification and implementation nightmare.  Everyone has
> a different idea of what "normalization" means.  What's your idea?


Yes, I know it's a nightmare all around, but I think it would be useful to
have something that addresses this.  My idea would be the same as Python's
os.path.normpath, mainly because it's a well-tested, seasoned example with
test cases.  Windows also has a routine for this (PathCanonicalize) that
could be used (but I know it doesn't work for UNC paths).
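
To make "same as os.path.normpath" concrete, here is what Python's two
flavours of it do to a few inputs.  `posixpath` and `ntpath` are the
platform-specific modules behind `os.path`, and both are importable on any
OS; the sample paths are made up.

```python
import ntpath
import posixpath

# Collapse "." components, resolve ".." lexically, squeeze repeated slashes.
posix_clean = posixpath.normpath("a/./b/../c//d")    # "a/c/d"

# ".." cannot climb above the root of an absolute path.
root_clamped = posixpath.normpath("/a/../../b")      # "/b"

# The Windows flavour also converts forward slashes to backslashes.
windows_clean = ntpath.normpath("C:/a/./b/../c")     # r"C:\a\c"
```

This is purely lexical and never touches the disk, so it can change the
meaning of a path that goes through a symlink.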

> > 4) Make sure we handle case sensitivity vs case preservation correctly.
> > It's unclear to me that FilePath does this correctly on the Mac -- Mac
> > file names are case preserving, but case insensitive, Unix filenames are
> > both (and windows filenames are neither :-).
>
> Again with the normalization.  What do you want this stuff for?
> What's your idea of how this should work?


Probably the same as os.path.normcase in Python.  I want this stuff so that
I can make sure that I can at least semi-reliably compare/manipulate
FilePaths to do things like absolute->relative path conversion, or store
FilePaths in a set or map and be sure I don't have multiple entries pointing
to the same file.  Without these kinds of operations, doing these things is
pretty much impossible.
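
A minimal sketch of the two uses mentioned above, in Python: a comparison key
built from normcase plus normpath, and an absolute->relative conversion.
`windows_key` is a hypothetical helper name and the paths are made up.

```python
import ntpath
import posixpath


def windows_key(path: str) -> str:
    """Hypothetical helper: lexical comparison key for a Windows-style path."""
    # normpath cleans up separators and "." / ".."; normcase lowercases.
    return ntpath.normcase(ntpath.normpath(path))


# Two spellings of the same file collapse to a single set entry.
seen = {windows_key(p) for p in (r"C:\Foo\.\Bar.TXT", "c:/foo/bar.txt")}

# Absolute -> relative conversion between two normalized POSIX paths.
relative = posixpath.relpath("/a/b/c", start="/a/d")   # "../b/c"
```

Since none of this touches the disk, it only catches paths that differ in
spelling; two names that reach the same file through a symlink still produce
different keys.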


> Remember: FilePath is specified to be light and to never touch the
> disk.  If you've got a disk-touching operation, it probably doesn't
> belong in FilePath proper.


I'm OK with that -- it makes sense to keep the file system ops and FilePath
separate.

-Greg.

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
