On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai <m...@chromium.org> wrote:
> I understand your problem. You're saying "I have user-supplied data
> that I want to build a filename from," and "I have this pathname that
> I want to display back to the user." I agree that it would be good to
> have a way to handle these cases in base. I don't know if FilePath
> proper is the right place to do it. If we do it in FilePath, it still
> won't really be right.

OK, so it sounds like you're telling me not to use FilePath to represent file paths from a disk for my purposes, because they can't ever be converted reliably to a particular encoding on Linux (which is a requirement for me, because of the third-party libraries that require a particular encoding). That's fine, but what do I do instead? Roll my own FilePath clone that has some encoding assumptions? I can do that, but it has the same issues as the ones you're worried about with FilePath, so it seems better to solve the issue in one place rather than have two versions that are both insufficient.

Man, it would be better if FilePath could reliably know its encoding! (I realize that Linux makes this impossible; it just seems like it would be better that way. :-)

Since Linux is the only platform where the encoding is unclear, what if we did the best we could on Linux? When constructing a FilePath from a char* string on Linux:

- Test the input string for byte values > 127 to determine if it's really just ASCII (and if so, we're out of the woods).
- Then check LANG, LC_CTYPE, and LC_ALL (through the appropriate Linux APIs) for an encoding that we can support, and note the encoding for later in case we are asked to do a conversion.
- If we run into an invalid sequence during a conversion, or an encoding we can't convert from, use a CHECK to crash.
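The three steps above could look roughly like the following C++ sketch, assuming a Linux/glibc environment. The helper names (IsPureAscii, LocaleCodeset, NativePathToUtf8) are hypothetical and not part of FilePath; UTF-8 is assumed as the target encoding; and std::abort() stands in for Chromium's CHECK. setlocale(LC_CTYPE, "") followed by nl_langinfo(CODESET) is one standard way to honor LC_ALL, LC_CTYPE, and LANG in their usual order of precedence, and iconv does the actual conversion.

```cpp
#include <clocale>
#include <cstdlib>
#include <iconv.h>
#include <langinfo.h>
#include <string>

// Step 1: a path with no byte above 127 is plain ASCII, which every
// charset we care about agrees on, so no conversion is needed.
bool IsPureAscii(const std::string& native) {
  for (unsigned char c : native) {
    if (c > 127) return false;
  }
  return true;
}

// Step 2: ask the C library which charset the locale environment selects.
// setlocale(LC_CTYPE, "") applies the POSIX precedence LC_ALL, then
// LC_CTYPE, then LANG. Note that it changes process-wide locale state.
std::string LocaleCodeset() {
  std::setlocale(LC_CTYPE, "");
  return nl_langinfo(CODESET);
}

// Step 3: convert the native bytes to UTF-8, crashing on anything we
// cannot decode. std::abort() is a stand-in for CHECK in this sketch.
std::string NativePathToUtf8(const std::string& native) {
  if (IsPureAscii(native)) return native;  // out of the woods

  iconv_t cd = iconv_open("UTF-8", LocaleCodeset().c_str());
  if (cd == reinterpret_cast<iconv_t>(-1)) std::abort();  // unsupported charset

  std::string in = native;  // iconv wants a mutable input buffer
  char* in_ptr = &in[0];
  size_t in_left = in.size();

  // Generous output buffer: for the usual locale charsets each input byte
  // yields at most one character, and UTF-8 needs at most 4 bytes each.
  std::string out(in.size() * 4, '\0');
  char* out_ptr = &out[0];
  size_t out_left = out.size();

  size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  iconv_close(cd);
  if (rc == static_cast<size_t>(-1)) std::abort();  // e.g. EILSEQ: invalid sequence

  out.resize(out.size() - out_left);
  return out;
}
```

Because setlocale() mutates process-wide state, a real implementation would more likely read the codeset once at startup than on every conversion.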
This approach should work for most filenames in almost all situations. I'll bet most filenames are ASCII, even on foreign systems, and the systems where they aren't ASCII have LANG set to something in /etc/profile, so all filenames created by any app running on that machine should match that encoding. Where that isn't set up correctly, users are already getting garbage (and should expect garbage) from any application they use, not just Chrome, since there is no way *any* app can decode a path with multiple encodings in it, or one whose encoding differs from what LANG (or LC_*) says it is. Chrome already crashes like this when it encounters situations where it's just impossible to know what's right, so this is consistent with Chrome's behavior in other areas.

> it should be the caller's responsibility to only deal with user-created
> names with this interface.

What do you mean here? Isn't that the case now with FilePath? (It's the file_util routines that actually read the filesystem and make FilePaths out of them, after all.) As for your suggestion to only deal with path components, how would you propose to parse user-supplied paths into one of these?

> > 2) I'd like to make it possible to instantiate a POSIX FilePath object on
> > Windows and a Windows FilePath on POSIX platforms. This is because some
> > libraries (e.g. the zip library, or tar files) use POSIX semantics for
> > their paths even on Windows (I haven't seen a use case for Windows paths
> > on POSIX yet, actually). This would make it possible to use the nice API
> > that FilePath has to manipulate paths appropriately for these other
> > libraries. This could be easily accomplished by having POSIX and Windows
> > versions of FilePath, and then typedef'ing FilePath differently on
> > different platforms to one of these versions.
>
> Sounds pretty Pythonic.
>
> FilePath already sort of has some support for this - it does a bunch
> of things based on feature macros, mostly so that as I was writing it,
> I could test the Windows semantics without having to (shudder) resort
> to running on Windows. These could probably be adapted to do what
> you're asking.

Cool.

> > 3) It would be helpful to have real path normalization for each of the
> > platforms (although I know what a testing nightmare that can be). I might
> > try and tackle this if people think it would be beneficial.
>
> It's also a specification and implementation nightmare. Everyone has
> a different idea of what "normalization" means. What's your idea?

Yes, I know it's a nightmare all around, but I think it would be useful to have something that addresses this. My idea would be the same as Python's os.path.normpath, mainly because it's a well-tested, seasoned example with test cases. Windows also has a routine for this (PathCanonicalize) that could be used (but I know it doesn't work for UNC paths).

> > 4) Make sure we handle case sensitivity vs. case preservation correctly.
> > It's unclear to me that FilePath does this correctly on the Mac -- Mac
> > filenames are case preserving but case insensitive; Unix filenames are
> > both (and Windows filenames are neither :-).
>
> Again with the normalization. What do you want this stuff for?
> What's your idea of how this should work?

Probably the same as os.path.normcase in Python. I want this stuff so that I can at least semi-reliably compare and manipulate FilePaths to do things like absolute-to-relative path conversion, or store FilePaths in a set or map and be sure I don't have multiple entries pointing to the same file. Without these kinds of operations, doing these things is pretty much impossible.

> Remember: FilePath is specified to be light and to never touch the
> disk. If you've got a disk-touching operation, it probably doesn't
> belong in FilePath proper.
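The os.path.normpath behavior Greg points to for item 3 is purely lexical, so it is compatible with the "never touch the disk" rule. A minimal C++ sketch of the POSIX variant, modeled on Python's posixpath.normpath, might look like this; the function name NormalizePosixPath is hypothetical, and Windows rules (drive letters, UNC prefixes, backslashes) are deliberately left out.

```cpp
#include <string>
#include <vector>

// Lexical POSIX path normalization modeled on Python's posixpath.normpath.
// It never consults the filesystem.
std::string NormalizePosixPath(const std::string& path) {
  if (path.empty()) return ".";

  // POSIX allows exactly two leading slashes to be implementation-defined,
  // so "//" is kept, while one or three-plus slashes collapse to "/".
  size_t leading = 0;
  while (leading < path.size() && path[leading] == '/') ++leading;
  std::string prefix = leading == 0 ? "" : (leading == 2 ? "//" : "/");

  std::vector<std::string> parts;
  size_t pos = 0;
  while (pos <= path.size()) {
    size_t next = path.find('/', pos);
    if (next == std::string::npos) next = path.size();
    std::string comp = path.substr(pos, next - pos);
    pos = next + 1;

    if (comp.empty() || comp == ".") continue;  // "a//b" and "a/./b" -> "a/b"
    if (comp == "..") {
      if (!parts.empty() && parts.back() != "..") {
        parts.pop_back();  // "a/.." cancels out
        continue;
      }
      if (!prefix.empty()) continue;  // "/.." is just "/"
    }
    parts.push_back(comp);  // leading ".." survives in relative paths
  }

  std::string result = prefix;
  for (size_t i = 0; i < parts.size(); ++i) {
    if (i > 0) result += '/';
    result += parts[i];
  }
  return result.empty() ? "." : result;
}
```

Because it never looks at the disk, "a/link/.." becomes "a" even if link is a symlink pointing elsewhere; that is one of the trade-offs that make "normalization" hard to specify.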
I'm OK with that -- it makes sense to keep the file system ops and FilePath separate.

-Greg.

Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev