[chromium-dev] Re: Changes to FilePath?

2009-05-14 Thread Greg Spencer
On Wed, May 13, 2009 at 7:24 PM, Brett Wilson  wrote:

> You can't actually canonicalize a filename on Windows, so I think it's
> dangerous to write a component that claims to do it.


You can do it under controlled conditions, and especially if the file exists
on the disk already and is accessible.  For instance, if you don't try to
handle (non-deterministic) 8.3 names of files that don't exist yet/anymore
and NTFS mount points, I think you can fairly safely apply the "regular"
rules to canonicalize paths (and even if you applied the rules to those,
most of the time they would still work).  I would make sure that the class
only claims to canonicalize paths that it really knows it can do, of course.

Look, I know there are tough problems here, but why not TRY to solve them as
well as possible?  FilePath is fine for simple manipulations, and is a good,
lightweight container if you're not planning on doing anything complex with
the file names.  If you actually need to do more interesting things with
them, like display the names, convert to relative paths, compare them for
equality or pass them off to a third party in a particular encoding, it's
not sufficient.

I could write a half-assed implementation that kinda works if you don't
throw anything wonky at it.  I've got that now.  I want something more
bulletproof.  It can't be perfect because file paths are non-deterministic
on all three systems in not so obvious ways, but why should everyone who
needs more than FilePath have to climb that learning curve?  And we can only
give out information that is as good as we get from the OS -- if the OS
isn't able to present a filesystem that makes sense, we can only provide the
best gibberish we can get our hands on.

-Greg.

--~--~-~--~~~---~--~~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev
-~--~~~~--~~--~--~---



[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
I mean... there's a registry setting or something that can be set to disable
it.
-darin

On Wed, May 13, 2009 at 8:40 PM, Darin Fisher  wrote:

> FYI:  Don't use GetShortPathName.  It isn't supported on some Windows
> systems.  We had a significant number of users that could not use Firefox
> until we stopped using it.
> -Darin
>
>
> On Wed, May 13, 2009 at 7:29 PM, Brett Wilson  wrote:
>
>>
>> On Wed, May 13, 2009 at 7:24 PM, Brett Wilson 
>> wrote:
>> > On Wed, May 13, 2009 at 6:12 PM, Greg Spencer 
>> wrote:
>> >> On Wed, May 13, 2009 at 4:07 PM, Brett Wilson 
>> wrote:
>> >>>
>> >>> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker 
>> >>> wrote:
>> >>> >
>> >>> > Perhaps what we need is a companion to FilePath.  For example:
>> >>> >
>> >>> > FilePath: much as it is now, lightweight, "alternative to string
>> >>> > manipulation".
>> >>> > FileReference: heavierweight, can talk to the file system and have
>> >>> > carnal knowledge of platform specifics for things like resolving /
>> >>> > canonicalizing pathnames, determining whether or not they refer to the
>> >>> > same files, generating C strings that can be passed to 3rd party
>> >>> > libraries, etc.
>> >>>
>> >>> I think this is very dangerous.
>> >>>
>> >>> I think Greg should not be talking to the filesystem when inserting
>> >>> filenames into a set. We don't allow filesystem access from the UI
>> >>> thread of Chrome, and I think other parts of our system should also
>> >>> not do filesystem access on their critical threads, especially if they
>> >>> want to be more part of Chrome in the future.
>> >>
>> >> Well, so the use I have for this in O3D at the moment is in our importer,
>> >> which currently is a separate command-line tool that reads Collada files
>> >> and writes out our wire format for geometry.  So it isn't meant to be
>> >> occurring in a UI thread, but I could see times when it might be useful
>> >> to know for sure if two files reference the same file in the UI thread
>> >> (dragging and dropping a file onto a drop zone, for instance).
>> >> I do need to know if I have the same file more than once in a set because
>> >> the COLLADA file might reference the same texture multiple times, or
>> >> (more dangerous) it might reference a file that is one file on Windows,
>> >> but (incorrectly) maps to two different files in the (Unix-path-format)
>> >> .tgz files.  To detect that, I need canonicalization.
>> >
>> > You can't actually canonicalize a filename on Windows, so I think it's
>> > dangerous to write a component that claims to do it.
>>
>> I guess you could call GetShortPathName every time you see a name. But
>> I think that's a crazy solution. I still think you should do my
>> suggestion below.
>>
>>
>> > I think you just need to come up with some simple rules that makes it
>> > work most of the time. Personally I would do ASCII lowercasing and
>> > stop worrying about it. If you use ICU to lower-case "correctly,"
>> > Windows won't necessarily agree and you won't be able to use that
>> > file.
>>
>> >>
>>
>




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
FYI:  Don't use GetShortPathName.  It isn't supported on some Windows
systems.  We had a significant number of users that could not use Firefox
until we stopped using it.
-Darin


On Wed, May 13, 2009 at 7:29 PM, Brett Wilson  wrote:

>
> On Wed, May 13, 2009 at 7:24 PM, Brett Wilson  wrote:
> > On Wed, May 13, 2009 at 6:12 PM, Greg Spencer 
> wrote:
> >> On Wed, May 13, 2009 at 4:07 PM, Brett Wilson 
> wrote:
> >>>
> >>> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker 
> >>> wrote:
> >>> >
> >>> > Perhaps what we need is a companion to FilePath.  For example:
> >>> >
> >>> > FilePath: much as it is now, lightweight, "alternative to string
> >>> > manipulation".
> >>> > FileReference: heavierweight, can talk to the file system and have
> >>> > carnal knowledge of platform specifics for things like resolving /
> >>> > canonicalizing pathnames, determining whether or not they refer to the
> >>> > same files, generating C strings that can be passed to 3rd party
> >>> > libraries, etc.
> >>>
> >>> I think this is very dangerous.
> >>>
> >>> I think Greg should not be talking to the filesystem when inserting
> >>> filenames into a set. We don't allow filesystem access from the UI
> >>> thread of Chrome, and I think other parts of our system should also
> >>> not do filesystem access on their critical threads, especially if they
> >>> want to be more part of Chrome in the future.
> >>
> >> Well, so the use I have for this in O3D at the moment is in our importer,
> >> which currently is a separate command-line tool that reads Collada files
> >> and writes out our wire format for geometry.  So it isn't meant to be
> >> occurring in a UI thread, but I could see times when it might be useful to
> >> know for sure if two files reference the same file in the UI thread
> >> (dragging and dropping a file onto a drop zone, for instance).
> >> I do need to know if I have the same file more than once in a set because
> >> the COLLADA file might reference the same texture multiple times, or (more
> >> dangerous) it might reference a file that is one file on Windows,
> >> but (incorrectly) maps to two different files in the (Unix-path-format)
> >> .tgz files.  To detect that, I need canonicalization.
> >
> > You can't actually canonicalize a filename on Windows, so I think it's
> > dangerous to write a component that claims to do it.
>
> I guess you could call GetShortPathName every time you see a name. But
> I think that's a crazy solution. I still think you should do my
> suggestion below.
>
>
> > I think you just need to come up with some simple rules that makes it
> > work most of the time. Personally I would do ASCII lowercasing and
> > stop worrying about it. If you use ICU to lower-case "correctly,"
> > Windows won't necessarily agree and you won't be able to use that
> > file.
>
> >
>




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 7:24 PM, Brett Wilson  wrote:
> On Wed, May 13, 2009 at 6:12 PM, Greg Spencer  wrote:
>> On Wed, May 13, 2009 at 4:07 PM, Brett Wilson  wrote:
>>>
>>> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker 
>>> wrote:
>>> >
>>> > Perhaps what we need is a companion to FilePath.  For example:
>>> >
>>> > FilePath: much as it is now, lightweight, "alternative to string
>>> > manipulation".
>>> > FileReference: heavierweight, can talk to the file system and have
>>> > carnal knowledge of platform specifics for things like resolving /
>>> > canonicalizing pathnames, determining whether or not they refer to the
>>> > same files, generating C strings that can be passed to 3rd party
>>> > libraries, etc.
>>>
>>> I think this is very dangerous.
>>>
>>> I think Greg should not be talking to the filesystem when inserting
>>> filenames into a set. We don't allow filesystem access from the UI
>>> thread of Chrome, and I think other parts of our system should also
>>> not do filesystem access on their critical threads, especially if they
>>> want to be more part of Chrome in the future.
>>
>> Well, so the use I have for this in O3D at the moment is in our importer,
>> which currently is a separate command-line tool that reads Collada files and
>> writes out our wire format for geometry.  So it isn't meant to be occurring
>> in a UI thread, but I could see times when it might be useful to know for
>> sure if two files reference the same file in the UI thread (dragging and
>> dropping a file onto a drop zone, for instance).
>> I do need to know if I have the same file more than once in a set because
>> the COLLADA file might reference the same texture multiple times, or (more
>> dangerous) it might reference a file that is one file on Windows,
>> but (incorrectly) maps to two different files in the (Unix-path-format) .tgz
>> files.  To detect that, I need canonicalization.
>
> You can't actually canonicalize a filename on Windows, so I think it's
> dangerous to write a component that claims to do it.

I guess you could call GetShortPathName every time you see a name. But
I think that's a crazy solution. I still think you should do my
suggestion below.


> I think you just need to come up with some simple rules that makes it
> work most of the time. Personally I would do ASCII lowercasing and
> stop worrying about it. If you use ICU to lower-case "correctly,"
> Windows won't necessarily agree and you won't be able to use that
> file.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 6:12 PM, Greg Spencer  wrote:
> On Wed, May 13, 2009 at 4:07 PM, Brett Wilson  wrote:
>>
>> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker 
>> wrote:
>> >
>> > Perhaps what we need is a companion to FilePath.  For example:
>> >
>> > FilePath: much as it is now, lightweight, "alternative to string
>> > manipulation".
>> > FileReference: heavierweight, can talk to the file system and have
>> > carnal knowledge of platform specifics for things like resolving /
>> > canonicalizing pathnames, determining whether or not they refer to the
>> > same files, generating C strings that can be passed to 3rd party
>> > libraries, etc.
>>
>> I think this is very dangerous.
>>
>> I think Greg should not be talking to the filesystem when inserting
>> filenames into a set. We don't allow filesystem access from the UI
>> thread of Chrome, and I think other parts of our system should also
>> not do filesystem access on their critical threads, especially if they
>> want to be more part of Chrome in the future.
>
> Well, so the use I have for this in O3D at the moment is in our importer,
> which currently is a separate command-line tool that reads Collada files and
> writes out our wire format for geometry.  So it isn't meant to be occurring
> in a UI thread, but I could see times when it might be useful to know for
> sure if two files reference the same file in the UI thread (dragging and
> dropping a file onto a drop zone, for instance).
> I do need to know if I have the same file more than once in a set because
> the COLLADA file might reference the same texture multiple times, or (more
> dangerous) it might reference a file that is one file on Windows,
> but (incorrectly) maps to two different files in the (Unix-path-format) .tgz
> files.  To detect that, I need canonicalization.

You can't actually canonicalize a filename on Windows, so I think it's
dangerous to write a component that claims to do it.

I think you just need to come up with some simple rules that makes it
work most of the time. Personally I would do ASCII lowercasing and
stop worrying about it. If you use ICU to lower-case "correctly,"
Windows won't necessarily agree and you won't be able to use that
file.
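
For illustration, here is a minimal sketch of the ASCII-only lowercasing
rule in Python.  The helper name ascii_lower_key is made up for this
example.  It folds only A-Z and leaves every non-ASCII character alone, so
the result never depends on Unicode case tables that Windows might disagree
with.

```python
import string

# Translation table that maps only 'A'-'Z' to 'a'-'z'; nothing else changes.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower_key(path: str) -> str:
    """Return a comparison key with ASCII letters lowercased (hypothetical helper)."""
    return path.translate(_ASCII_LOWER)


# Two spellings of the same texture collapse to one key; the non-ASCII
# name keeps its capital letter because it is outside the A-Z range.
seen = set()
for name in ["Textures/Wood.PNG", "textures/wood.png", "textures/É.png"]:
    seen.add(ascii_lower_key(name))
```

The key is only meant for comparison.  The original spelling should still be
the one used to open the file.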

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 4:07 PM, Brett Wilson  wrote:

> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker 
> wrote:
> >
> > Perhaps what we need is a companion to FilePath.  For example:
> >
> > FilePath: much as it is now, lightweight, "alternative to string
> manipulation".
> > FileReference: heavierweight, can talk to the file system and have
> > carnal knowledge of platform specifics for things like resolving /
> > canonicalizing pathnames, determining whether or not they refer to the
> > same files, generating C strings that can be passed to 3rd party
> > libraries, etc.
>
> I think this is very dangerous.
>
> I think Greg should not be talking to the filesystem when inserting
> filenames into a set. We don't allow filesystem access from the UI
> thread of Chrome, and I think other parts of our system should also
> not do filesystem access on their critical threads, especially if they
> want to be more part of Chrome in the future.


Well, so the use I have for this in O3D at the moment is in our importer,
which currently is a separate command-line tool that reads Collada files and
writes out our wire format for geometry.  So it isn't meant to be occurring
in a UI thread, but I could see times when it might be useful to know for
sure if two files reference the same file in the UI thread (dragging and
dropping a file onto a drop zone, for instance).

I do need to know if I have the same file more than once in a set because
the COLLADA file might reference the same texture multiple times, or (more
dangerous) it might reference a file that is one file on Windows,
but (incorrectly) maps to two different files in the (Unix-path-format) .tgz
files.  To detect that, I need canonicalization.
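
When the files already exist on disk, one way to spot duplicates is to
compare the identity the OS reports instead of the spelling of the path.
What follows is only a sketch in Python, assuming a filesystem where
os.stat() returns meaningful st_dev and st_ino values.  The helper names
file_identity and unique_files are invented for this example.  Note that it
does hit the disk, which is exactly the cost being argued about in this
thread.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode) for an existing file (hypothetical helper)."""
    st = os.stat(path)  # follows symlinks; raises OSError if the file is missing
    return (st.st_dev, st.st_ino)


def unique_files(paths):
    """Keep the first spelling seen for each distinct file on disk."""
    seen = {}
    for p in paths:
        seen.setdefault(file_identity(p), p)
    return list(seen.values())


with tempfile.TemporaryDirectory() as d:
    tex = os.path.join(d, "wood.png")
    open(tex, "wb").close()
    os.mkdir(os.path.join(d, "sub"))
    # Same file, reached through a different spelling of the path.
    alias = os.path.join(d, "sub", "..", "wood.png")
    result = unique_files([tex, alias])
```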

I also need to convert paths in the Collada file to relative paths in our
tgz files.  In order to do that, I need to be able to normalize the path to
the Collada file so I can normalize the paths to the referenced texture
files and strip off common base directories.
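
A rough sketch of that conversion in Python, using purely lexical
normalization (no filesystem access).  The function name make_relative and
the example paths are assumptions for illustration only, and lexical
normalization can give the wrong answer when symbolic links are involved.

```python
import posixpath


def make_relative(collada_path, texture_paths):
    """Rewrite texture paths relative to the directory of the Collada file."""
    # Normalize the Collada path first so its directory is a clean base.
    base = posixpath.dirname(posixpath.normpath(collada_path))
    # Normalize each texture path the same way, then strip the common base.
    return [posixpath.relpath(posixpath.normpath(t), start=base)
            for t in texture_paths]


rel = make_relative(
    "/assets/models/car/car.dae",
    ["/assets/models/car/tex/body.png",
     "/assets/models/shared/../shared/chrome.png"],
)
# rel == ["tex/body.png", "../shared/chrome.png"]
```

posixpath is used here instead of os.path so the output always has the
Unix-style separators that a .tgz archive expects.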

I'd really like to avoid the filesystem access too -- it's a real pain in
the ass to do, which is why it hasn't been done yet.  Currently, the user
has to tell me the string to strip off of the pathnames to make them
relative, and if files collide or split, then the output is just 2x bigger,
or just doesn't work.  I'd like to fix those things, but to do it right, I
need a better set of tools, and it seemed to me that if I was needing these
tools, then someone else could use them too.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 4:35 PM, Darin Fisher  wrote:

> The "solution" is to not convert to UTF-16 unless you are trying to
> generate a string to display to the user.  Then you should use the LANG
> information to determine how best to render the text for display to the
> user.
>

Yeah, that would be nice, and I agree, but the reason I need it is that some
third party APIs (probably wrongly) take UTF16 to represent an input file in
their API.  So in order for the third party API to load the file properly, I
need a UTF16 version of the file path.  Also, in all of the O3D code, we
assume that strings are encoded in UTF8 (which is fine and correct for any
string except for filenames on Linux), so any string that might come from
the user would come in as UTF8, and I'd have to translate it into a FilePath
(somehow).


> I know this doesn't really help.  I think it is reasonable to have a
> utility somewhere to perform a conversion to UTF-16 (or UTF-8), but it
> should come with a stern warning, and I kind of prefer it not being a method
> on FilePath since I would prefer people not be tempted to overuse it.
>

Yeah, I think we've beat that to death: it won't be in FilePath.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 4:34 PM, Amanda Walker  wrote:
> On Wed, May 13, 2009 at 7:07 PM, Brett Wilson  wrote:
>> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker  wrote:
>>>
>>> Perhaps what we need is a companion to FilePath.  For example:
>>>
>>> FilePath: much as it is now, lightweight, "alternative to string 
>>> manipulation".
>>> FileReference: heavierweight, can talk to the file system and have
>>> carnal knowledge of platform specifics for things like resolving /
>>> canonicalizing pathnames, determining whether or not they refer to the
>>> same files, generating C strings that can be passed to 3rd party
>>> libraries, etc.
>>
>> I think this is very dangerous.
>>
>> I think Greg should not be talking to the filesystem when inserting
>> filenames into a set. We don't allow filesystem access from the UI
>> thread of Chrome, and I think other parts of our system should also
>> not do filesystem access on their critical threads, especially if they
>> want to be more part of Chrome in the future.
>
> But in context, he's passing these things to 3rd party libraries that
> will be doing plenty of file system access (importing and exporting
> data, for example).  That's why I was suggesting something separate
> from FilePath for such use.

Then he doesn't need canonicalization at all. He needs to know how the
third party library is going to use the string for filesystem access
and then do the corresponding transformations. That does not involve
filesystem access.

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
On Wed, May 13, 2009 at 2:20 PM, Greg Spencer  wrote:

> On Wed, May 13, 2009 at 2:05 PM, Darin Fisher  wrote:
>>
>> That conversion is not defined.  If you are on Linux, the contents of the
>> file path is just an array of bytes.  It might be UTF-8, in which case you
>> can convert to UTF-16.  However, it may also be some crazy encoding or it
>> may not match any encoding.  This OS does not require it to match an
>> encoding.
>>
>> When we need to convert a FilePath to Unicode, we use the
>> SysWideToNativeMB and SysNativeMBToWide functions from base.  This works by
>> inspecting what the system thinks the current multi-byte encoding is.  On
>> Mac that is UTF-8.  On Linux, it depends on the value of $LANG.  Each time
>> we do such a conversion, we are introducing a potential bug in the product
>> (on Linux at least), so we try hard to avoid them.
>>
>
> Yes, I know that this is how it works (see earlier messages in this
> thread), but can you tell me if there are any Linux apps that manage to do
> this correctly (e.g. without having this bug), and how they do it?
>
> I can't see how any Linux app can do any better than looking at LANG and
> LC_CTYPE and hoping that they're set correctly.  Certainly there's no way to
> decode a pathname that includes multiple encodings, and I have no idea what
> happens with NFS mounts between machines with different settings.
>
> I'm just saying why not just do as well as can be done by the best app out
> there, and punt after that?
>
> -Greg.
>


Sorry to repeat information.  This is a long thread!

The "solution" is to not convert to UTF-16 unless you are trying to generate
a string to display to the user.  Then you should use the LANG information
to determine how best to render the text for display to the user.

The program should try its best to preserve the file path in the original
form and not try to convert to UTF-16 and back again since that conversion
may be lossy.
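
To make the lossy round trip concrete, here is a small Python sketch.  The
file name is a made-up example of a Latin-1 name on a system that otherwise
expects UTF-8.  The surrogateescape error handler is Python's own mechanism
for carrying undecodable bytes through a string; it is shown only as one
possible way to keep the original bytes, not as something proposed in this
thread.

```python
# A Linux file name is just bytes; this one is Latin-1, not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding fails outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "replace" decoding succeeds but is lossy: the original byte is gone,
# so converting back produces a different name.
lossy = raw.decode("utf-8", errors="replace").encode("utf-8")

# Carrying the bytes through with surrogateescape round-trips exactly.
kept = (raw.decode("utf-8", errors="surrogateescape")
           .encode("utf-8", errors="surrogateescape"))
```

Here lossy differs from raw, so a path converted that way can no longer be
used to open the file, while kept is byte-for-byte identical to raw.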

I know this doesn't really help.  I think it is reasonable to have a utility
somewhere to perform a conversion to UTF-16 (or UTF-8), but it should come
with a stern warning, and I kind of prefer it not being a method on FilePath
since I would prefer people not be tempted to overuse it.

-Darin




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Amanda Walker

On Wed, May 13, 2009 at 7:07 PM, Brett Wilson  wrote:
> On Wed, May 13, 2009 at 3:51 PM, Amanda Walker  wrote:
>>
>> Perhaps what we need is a companion to FilePath.  For example:
>>
>> FilePath: much as it is now, lightweight, "alternative to string 
>> manipulation".
>> FileReference: heavierweight, can talk to the file system and have
>> carnal knowledge of platform specifics for things like resolving /
>> canonicalizing pathnames, determining whether or not they refer to the
>> same files, generating C strings that can be passed to 3rd party
>> libraries, etc.
>
> I think this is very dangerous.
>
> I think Greg should not be talking to the filesystem when inserting
> filenames into a set. We don't allow filesystem access from the UI
> thread of Chrome, and I think other parts of our system should also
> not do filesystem access on their critical threads, especially if they
> want to be more part of Chrome in the future.

But in context, he's passing these things to 3rd party libraries that
will be doing plenty of file system access (importing and exporting
data, for example).  That's why I was suggesting something separate
from FilePath for such use.

--Amanda




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Brett Wilson

On Wed, May 13, 2009 at 3:51 PM, Amanda Walker  wrote:
>
> Perhaps what we need is a companion to FilePath.  For example:
>
> FilePath: much as it is now, lightweight, "alternative to string 
> manipulation".
> FileReference: heavierweight, can talk to the file system and have
> carnal knowledge of platform specifics for things like resolving /
> canonicalizing pathnames, determining whether or not they refer to the
> same files, generating C strings that can be passed to 3rd party
> libraries, etc.

I think this is very dangerous.

I think Greg should not be talking to the filesystem when inserting
filenames into a set. We don't allow filesystem access from the UI
thread of Chrome, and I think other parts of our system should also
not do filesystem access on their critical threads, especially if they
want to be more part of Chrome in the future.

Brett




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Amanda Walker

Perhaps what we need is a companion to FilePath.  For example:

FilePath: much as it is now, lightweight, "alternative to string manipulation".
FileReference: heavierweight, can talk to the file system and have
carnal knowledge of platform specifics for things like resolving /
canonicalizing pathnames, determining whether or not they refer to the
same files, generating C strings that can be passed to 3rd party
libraries, etc.

--Amanda


On Wed, May 13, 2009 at 5:22 PM, Greg Spencer  wrote:
> On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai  wrote:
>>
>> If you've got to take an arbitrary FilePath and convert it for display
>> to the user, or take an arbitrary string in a known encoding and
>> re-encode it for the filesystem, then we don't have anything in
>> FilePath for this.  I believe that if we do add something, it should
>> strictly operate only on single pathname components at a time, and not
>> entire pathnames.  We could add it to FilePath or we could add it
>> somewhere else, because it is sort of distinct from what FilePath is
>> really supposed to be, which is just a container for ferrying around
>> native paths.
>
>
> OK, I can see the allure of dealing in terms of lists of encoded strings so
> that you can encode them separately.  For my purposes, I need to get a string
> encoded as UTF16 (on Windows) or UTF8 (on other platforms) that represents a
> filename so that I can pass it to third party APIs, so it has to include the
> path separators.  But that can be done as a "join" operation when I get the
> string out.
>>
>> >> It's also a specification and implementation nightmare.  Everyone has
>> >> a different idea of what "normalization" means.  What's your idea?
>> >
>> > Yes, I know it's a nightmare all around, but I think it would be useful to
>> > have something that addresses this.  My idea would be the same as Python's
>> > os.path.normpath, mainly because it's a well-tested, seasoned example with
>> > test cases.  Windows also has a routine for this (PathCanonicalize) that
>> > could be used (but I know it doesn't work for UNC paths).
>>
>> Why would it be useful?  Do you want to compare paths for equality?
>
> Yes, for instance to be able to place them into a map or set and be sure I
> only have one entry for a particular file.  And I want to be able to do
> absolute to relative path conversions (as far as possible, anyhow).  And yes,
> I know that those are *really hard* to do properly, which argues even more
> for implementing one in a common library so that individual developers don't
> roll their own all the time, thinking that it is easy (and consequently
> producing buggy implementations).
>
>>
>> Then we should have an API that compares paths for equality.  It would
>> have to hit the disk to do so.  You might need general-purpose
>> canonization to implement that on some systems.  Great, you need to
>> hit the disk to do that too.  It's fine if you want these things, but
>> we can't put them into FilePath.  It's important that FilePath remain
>> lightweight and not make any system calls, because system calls can
>> block and FilePath is just a data carrier.
>
> Which is why I proposed in my last message not putting them into FilePath,
> since I can see that it is not your intention that it support anything that
> hits the filesystem (and I can see why you would want that).
>>
>> os.path.normpath is known to be buggy.  It might be well-tested and
>> seasoned, but only within the confines of its known limitations.
>> Watch this. [...]
>
> Yes, I'm aware that you can create situations (especially with symbolic
> links) where the same path conversions will succeed or fail depending on the
> filesystem contents.  This is why the class would have to have access to the
> filesystem.
>
>>
>> Again, it sounds like what you really want is a pathname comparator
>> that hits the disk.  You really can't do this stuff correctly on most
>> systems without talking to the filesystem.  You can't even do
>> general-purpose canonization without talking to the filesystem.
>
> Yep.  Totally agreed. (and normcase is probably not the behavior I'm looking
> for, you're right).
>
>>
>> Let me make clear: I'm not trying to shoot down the idea of needing to
>> be able to compare paths or even necessarily canonize them.  I'm
>> arguing primarily against doing it in FilePath, but I'm also
>> trying to illustrate that doing proper comparisons and canonization is
>> harder than it seems, that even "seasoned and well-tested" APIs are
>> limited in ways that developers don't necessarily expect, and that the
>> semantics and expectations need to be well-defined.
>
> Very well illustrated, and I assure you that I'm well aware that it's a
> bitch to do right.
> -Greg.


[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Scott Hess

This post made me think that we should have infrastructure so that
certain unit tests can opt to run in a restricted environment to
enforce that someone doesn't come along and add filesystem-access code
or other known-bad synchronous APIs.

I realize that that is probably hard, and that patches would be
welcome.  Just throwing it out there in hopes that someone says "Hey,
I know how to do that" and someone else says "Hey, do that".
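
As a very rough idea of what such a test environment could look like, here
is a sketch in Python rather than in Chromium's own test framework.  The
class name NoFilesystemTestCase and the choice of which calls to block are
assumptions made for this example.  It only covers a couple of entry points
(the built-in open and os.listdir), so it is nowhere near a real sandbox.

```python
import builtins
import os
import unittest
from unittest import mock


def forbid(name):
    """Build a stand-in that fails loudly when a blocked call is used."""
    def _blocked(*args, **kwargs):
        raise AssertionError(f"filesystem access via {name} is not allowed here")
    return _blocked


class NoFilesystemTestCase(unittest.TestCase):
    """Hypothetical base class: tests fail if the code under test touches the disk."""

    def setUp(self):
        for target, name in [(builtins, "open"), (os, "listdir")]:
            patcher = mock.patch.object(target, name, forbid(name))
            patcher.start()
            self.addCleanup(patcher.stop)  # restore the real call afterwards


def join_components(parts):
    """Pure string manipulation, the kind of work a path container does."""
    return "/".join(parts)


class JoinTest(NoFilesystemTestCase):
    def test_join_is_pure(self):
        self.assertEqual(join_components(["a", "b"]), "a/b")

    def test_open_is_blocked(self):
        with self.assertRaises(AssertionError):
            open("anything.txt")
```

Any test class that opts in by deriving from NoFilesystemTestCase would then
fail as soon as someone adds a blocked call to the code it exercises.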

-scott

[It could also be a rathole that only seems like a good idea until you
actually try it, like getting const-ness propagation thoroughly
correct.]


On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai  wrote:
>
> If you've got a file that begins its life as something on-disk, and
> you just need to carry the path to it around, then that's fine, it
> should live its life as a FilePath.
>
> If you've got to create a file using some name where the name is some
> constant in code, use FilePath with ASCII constants.  AppendASCII
> exists to stick new ASCII components onto existing FilePaths.  This is
> fine and is considered safe because ASCII is a subset of any rational
> filesystem encoding.
>
> If you've got to take an arbitrary FilePath and convert it for display
> to the user, or take an arbitrary string in a known encoding and
> re-encode it for the filesystem, then we don't have anything in
> FilePath for this.  I believe that if we do add something, it should
> strictly operate only on single pathname components at a time, and not
> entire pathnames.  We could add it to FilePath or we could add it
> somewhere else, because it is sort of distinct from what FilePath is
> really supposed to be, which is just a container for ferrying around
> native paths.
>
>>> It's also a specification and implementation nightmare.  Everyone has
>>> a different idea of what "normalization" means.  What's your idea?
>>
>> Yes, I know it's a nightmare all around, but I think it would be useful to
>> have something that addresses this.  My idea would be the same as Python's
>> os.path.normpath, mainly because it's a well-tested, seasoned example with
>> test cases.  Windows also has a routine for this (PathCanonicalize) that
>> could be used (but I know it doesn't work for UNC paths).
>
> Why would it be useful?  Do you want to compare paths for equality?
> Then we should have an API that compares paths for equality.  It would
> have to hit the disk to do so.  You might need general-purpose
> canonization to implement that on some systems.  Great, you need to
> hit the disk to do that too.  It's fine if you want these things, but
> we can't put them into FilePath.  It's important that FilePath remain
> lightweight and not make any system calls, because system calls can
> block and FilePath is just a data carrier.
>
> os.path.normpath is known to be buggy.  It might be well-tested and
> seasoned, but only within the confines of its known limitations.
> Watch this.
>
> m...@anodizer bash$ ls -l a/b/../c
> -rw-r--r--  1 mark  staff  0 May 13 15:47 a/b/../c
> m...@anodizer bash$ python
> Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os.path
> >>> os.path.normpath('a/b/../c')
> 'a/c'
> >>> ^D
> m...@anodizer bash$ ls -l a/c
> ls: a/c: No such file or directory
>
>> Probably the same as os.path.normcase in Python.  I want this stuff so that
>> I can make sure that I can at least semi-reliably compare/manipulate
>> FilePaths to do things like absolute->relative path conversion, or store
>> FilePaths in a set or map and be sure I don't have multiple entries pointing
>> to the same file.  Without these kinds of operations, doing these things is
>> pretty much impossible.
>
> I don't think os.path.normcase does what you're asking for either.
>
> m...@anodizer bash$ ls -lid /System/Library
> 81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /System/Library
> m...@anodizer bash$ ls -lid /system/LIBRARY
> 81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /system/LIBRARY
> m...@anodizer bash$ python
> Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.platform
> 'darwin'
> >>> import os.path
> >>> os.path.normcase('/System/Library')
> '/System/Library'
> >>> os.path.normcase('/system/LIBRARY')
> '/system/LIBRARY'
> >>> ^D
>
> Even os.path.realpath returns the same results.
>
> Again, it sounds like what you really want is a pathname comparator
> that hits the disk.  You really can't do this stuff correctly on most
> systems without talking to the filesystem.  You can't even do
> general-purpose canonization without talking to the filesystem.
>
> Let me make clear: I'm not trying to shoot down the idea of needing to
> be able to compare paths or even necessarily canonize them.  I'm
> arguing primarily against doing it in FilePath, but I'm als

[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 2:05 PM, Darin Fisher  wrote:
>
> That conversion is not defined.  If you are on Linux, the contents of the
> file path is just an array of bytes.  It might be UTF-8, in which case you
> can convert to UTF-16.  However, it may also be some crazy encoding or it
> may not match any encoding.  This OS does not require it to match an
> encoding.
>
> When we need to convert a FilePath to Unicode, we use the SysWideToNativeMB
> and SysNativeMBToWide functions from base.  This works by inspecting what
> the system thinks the current multi-byte encoding is.  On Mac that is UTF-8.
>  On Linux, it depends on the value of $LANG.  Each time we do such a
> conversion, we are introducing a potential bug in the product (on Linux at
> least), so we try hard to avoid them.
>

Yes, I know that this is how it works (see earlier messages in this thread),
but can you tell me if there are any Linux apps that manage to do this
correctly (e.g. without having this bug), and how they do it?

I can't see how any Linux app can do any better than looking at LANG and
LC_CTYPE and hoping that they're set correctly.  Certainly there's no way to
decode a pathname that includes multiple encodings, and I have no idea what
happens with NFS mounts between machines with different settings.

I'm just saying why not just do as well as can be done by the best app out
there, and punt after that?
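
As one data point on "how they do it": Python 3 sidesteps the problem on POSIX
rather than solving it.  Its os.fsdecode/os.fsencode pair decodes with the
filesystem encoding but maps undecodable bytes to lone surrogates, so the
original bytes can always be recovered.  This is Python's approach, not
anything in base, and the sample bytes below are made up for illustration:

```python
import os
import sys

# Hypothetical file name in Latin-1; the byte 0xE9 is not valid UTF-8.
raw = b"caf\xe9.txt"

text = os.fsdecode(raw)        # undecodable bytes become lone surrogates
restored = os.fsencode(text)   # which are turned back into the original bytes

print(sys.getfilesystemencoding())
print(restored == raw)
```

The decoded string is still not safe to display or to hand to a library that
expects strict UTF-8, so this only preserves the path; it does not discover
the real encoding.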

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
On Wed, May 13, 2009 at 1:03 PM, Mark Mentovai  wrote:

> If you've got to take an arbitrary FilePath and convert it for display
> to the user, or take an arbitrary string in a known encoding and
> re-encode it for the filesystem, then we don't have anything in
> FilePath for this.  I believe that if we do add something, it should
> strictly operate only on single pathname components at a time, and not
> entire pathnames.  We could add it to FilePath or we could add it
> somewhere else, because it is sort of distinct from what FilePath is
> really supposed to be, which is just a container for ferrying around
> native paths.


OK, I can see the allure of dealing in terms of lists of encoded strings so
that you can encode them separately.  For my purposes, I need to get a string
encoded as UTF16 (on Windows) or UTF8 (on other platforms) that represents a
filename so that I can pass it to third party APIs, so it has to include the
path separators.  But that can be done as a "join" operation when I get the
string out.

> >> It's also a specification and implementation nightmare.  Everyone has
> >> a different idea of what "normalization" means.  What's your idea?
> >
> > Yes, I know it's a nightmare all around, but I think it would be useful to
> > have something that addresses this.  My idea would be the same as Python's
> > os.path.normpath, mainly because it's a well-tested, seasoned example with
> > test cases.  Windows also has a routine for this (PathCanonicalize) that
> > could be used (but I know it doesn't work for UNC paths).
>
> Why would it be useful?  Do you want to compare paths for equality?


Yes, for instance to be able to place them into a map or set and be sure I
only have one entry for a particular file.  And I want to be able to do
absolute to relative path conversions (as far as possible, anyhow).  And yes,
I know that those are *really hard* to do properly, which argues even more for
implementing one in a common library so that individual developers don't roll
their own all the time, thinking that it is easy (and consequently producing
buggy implementations).


> Then we should have an API that compares paths for equality.  It would
> have to hit the disk to do so.  You might need general-purpose
> canonization to implement that on some systems.  Great, you need to
> hit the disk to do that too.  It's fine if you want these things, but
> we can't put them into FilePath.  It's important that FilePath remain
> lightweight and not make any system calls, because system calls can
> block and FilePath is just a data carrier.


Which is why I proposed in my last message not putting them into FilePath,
since I can see that it is not your intention that it support anything that
hits the filesystem (and I can see why you would want that).

> os.path.normpath is known to be buggy.  It might be well-tested and
> seasoned, but only within the confines of its known limitations.
> Watch this. [...]


Yes, I'm aware that you can create situations (especially with symbolic
links) where the same path conversions will succeed or fail depending on the
filesystem contents.  This is why the class would have to have access to the
filesystem.


> Again, it sounds like what you really want is a pathname comparator
> that hits the disk.  You really can't do this stuff correctly on most
> systems without talking to the filesystem.  You can't even do
> general-purpose canonization without talking to the filesystem.


Yep.  Totally agreed (and normcase is probably not the behavior I'm looking
for, you're right).


> Let me make clear: I'm not trying to shoot down the idea of needing to
> be able to compare paths or even necessarily canonize them.  I'm
> arguing primarily against doing it in FilePath, but I'm also
> trying to illustrate that doing proper comparisons and canonization is
> harder than it seems, that even "seasoned and well-tested" APIs are
> limited in ways that developers don't necessarily expect, and that the
> semantics and expectations need to be well-defined.


Very well illustrated, and I assure you that I'm well aware that it's a
bitch to do right.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Darin Fisher
On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer  wrote:

> On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker wrote:
>
>>
>> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer wrote:
>> > 1) I'd like to add some explicit routines for converting to/from UTF8 and
>> > UTF16.  While it's nice (and important) that FilePath uses the platform's
>> > native string, we've found that many third party libraries have made other
>> > assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
>> > regardless of platform, and converting a FilePath to and from those forms is
>> > a platform-dependent exercise which should be centralized into the class
>> > (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
>> > constructors that take each type).
>>
>> One thing many of us have found, across multiple projects, is that
>> wchar_t is fraught with complication as soon as more than one platform
>> is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
>> bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
>> Chrome started with more or less what you are suggesting, and we moved
>> off of it after much pain.
>
>
> I understand those issues quite well (but I probably should call the
> conversion method ToUTF16, now that you mention it).  And char* isn't
> necessarily UTF8 on all platforms either.
>
> OK, so what's the currently recommended path for converting to UTF16 or
> UTF8 from a FilePath?
>


That conversion is not defined.  If you are on Linux, the contents of the
file path is just an array of bytes.  It might be UTF-8, in which case you
can convert to UTF-16.  However, it may also be some crazy encoding or it
may not match any encoding.  This OS does not require it to match an
encoding.

When we need to convert a FilePath to Unicode, we use the SysWideToNativeMB
and SysNativeMBToWide functions from base.  This works by inspecting what
the system thinks the current multi-byte encoding is.  On Mac that is UTF-8.
 On Linux, it depends on the value of $LANG.  Each time we do such a
conversion, we are introducing a potential bug in the product (on Linux at
least), so we try hard to avoid them.

-Darin




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Mark Mentovai

If you've got a file that begins its life as something on-disk, and
you just need to carry the path to it around, then that's fine, it
should live its life as a FilePath.

If you've got to create a file using some name where the name is some
constant in code, use FilePath with ASCII constants.  AppendASCII
exists to stick new ASCII components onto existing FilePaths.  This is
fine and is considered safe because ASCII is a subset of any rational
filesystem encoding.

If you've got to take an arbitrary FilePath and convert it for display
to the user, or take an arbitrary string in a known encoding and
re-encode it for the filesystem, then we don't have anything in
FilePath for this.  I believe that if we do add something, it should
strictly operate only on single pathname components at a time, and not
entire pathnames.  We could add it to FilePath or we could add it
somewhere else, because it is sort of distinct from what FilePath is
really supposed to be, which is just a container for ferrying around
native paths.

>> It's also a specification and implementation nightmare.  Everyone has
>> a different idea of what "normalization" means.  What's your idea?
>
> Yes, I know it's a nightmare all around, but I think it would be useful to
> have something that addresses this.  My idea would be the same as Python's
> os.path.normpath, mainly because it's a well-tested, seasoned example with
> test cases.  Windows also has a routine for this (PathCanonicalize) that
> could be used (but I know it doesn't work for UNC paths).

Why would it be useful?  Do you want to compare paths for equality?
Then we should have an API that compares paths for equality.  It would
have to hit the disk to do so.  You might need general-purpose
canonization to implement that on some systems.  Great, you need to
hit the disk to do that too.  It's fine if you want these things, but
we can't put them into FilePath.  It's important that FilePath remain
lightweight and not make any system calls, because system calls can
block and FilePath is just a data carrier.

os.path.normpath is known to be buggy.  It might be well-tested and
seasoned, but only within the confines of its known limitations.
Watch this.

m...@anodizer bash$ ls -l a/b/../c
-rw-r--r--  1 mark  staff  0 May 13 15:47 a/b/../c
m...@anodizer bash$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.normpath('a/b/../c')
'a/c'
>>> ^D
m...@anodizer bash$ ls -l a/c
ls: a/c: No such file or directory
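
The transcript above does not show how the a/b/../c tree was set up.  A
minimal sketch that reproduces the same kind of mismatch, assuming a/b is a
symbolic link into another directory (the directory names here are made up,
and os.symlink is assumed to be available, as on POSIX systems):

```python
import os
import tempfile

def demo_normpath_vs_disk():
    """Build a tree where a/b is a symlink, then resolve a/b/../c two ways."""
    root = os.path.realpath(tempfile.mkdtemp())
    os.makedirs(os.path.join(root, "a"))
    os.makedirs(os.path.join(root, "elsewhere", "sub"))
    # a/b points into a different directory, so a/b/.. is "elsewhere", not "a".
    os.symlink(os.path.join(root, "elsewhere", "sub"),
               os.path.join(root, "a", "b"))
    open(os.path.join(root, "elsewhere", "c"), "w").close()

    path = os.path.join(root, "a", "b", "..", "c")
    lexical = os.path.normpath(path)    # pure string manipulation
    resolved = os.path.realpath(path)   # consults the filesystem
    return root, path, lexical, resolved

root, path, lexical, resolved = demo_normpath_vs_disk()
print(os.path.exists(path))      # the original path names a real file
print(os.path.exists(lexical))   # the lexically normalized path does not
print(resolved)                  # ends in elsewhere/c
```

The lexical result collapses "b/.." without knowing that b is a link, which
is exactly why the collapsed name can point at nothing.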

> Probably the same as os.path.normcase in Python.  I want this stuff so that
> I can make sure that I can at least semi-reliably compare/manipulate
> FilePaths to do things like absolute->relative path conversion, or store
> FilePaths in a set or map and be sure I don't have multiple entries pointing
> to the same file.  Without these kinds of operations, doing these things is
> pretty much impossible.

I don't think os.path.normcase does what you're asking for either.

m...@anodizer bash$ ls -lid /System/Library
81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /System/Library
m...@anodizer bash$ ls -lid /system/LIBRARY
81 drwxr-xr-x  64 root  wheel  2176 May 12 18:37 /system/LIBRARY
m...@anodizer bash$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.platform
'darwin'
>>> import os.path
>>> os.path.normcase('/System/Library')
'/System/Library'
>>> os.path.normcase('/system/LIBRARY')
'/system/LIBRARY'
>>> ^D

Even os.path.realpath returns the same results.

Again, it sounds like what you really want is a pathname comparator
that hits the disk.  You really can't do this stuff correctly on most
systems without talking to the filesystem.  You can't even do
general-purpose canonization without talking to the filesystem.
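
In Python terms, a comparator that hits the disk could be sketched roughly as
below.  The helper name same_file and the file names are hypothetical; the
only real API used for the comparison is os.path.samefile, which stats both
paths and compares the underlying file identity:

```python
import os
import tempfile

def same_file(path_a, path_b):
    """Return True if both paths name the same on-disk object (hits the disk)."""
    try:
        return os.path.samefile(path_a, path_b)
    except OSError:
        # At least one path is missing or inaccessible, so there is
        # nothing on disk to compare.
        return False

workdir = os.path.realpath(tempfile.mkdtemp())
target = os.path.join(workdir, "data.txt")
alias = os.path.join(workdir, "alias.txt")
open(target, "w").close()
os.symlink(target, alias)   # assumes symlinks are available (POSIX)

print(target == alias)             # string comparison
print(same_file(target, alias))    # filesystem comparison
print(same_file(target, os.path.join(workdir, "missing")))
```

Note that this blocks on the filesystem, so it is the kind of operation that
belongs outside a pure data carrier.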

Let me make clear: I'm not trying to shoot down the idea of needing to
be able to compare paths or even necessarily canonize them.  I'm
arguing primarily against doing it in FilePath, but I'm also
trying to illustrate that doing proper comparisons and canonization is
harder than it seems, that even "seasoned and well-tested" APIs are
limited in ways that developers don't necessarily expect, and that the
semantics and expectations need to be well-defined.

Mark




[chromium-dev] Re: Changes to FilePath?

2009-05-13 Thread Greg Spencer
(ping)
So, I had another idea.  How about a separate file path manipulation class
that has a well-defined character encoding, so that we can do filename
manipulations like with FilePath (and a few more)?  It could convert from a
FilePath if given an encoding, and convert back to a FilePath with the
platform's default encoding (using LC_*/LANG on Linux, falling back to
ASCII), or a given encoding.  It could touch the filesystem so that it could
know what encoding methods and manipulations were valid for the
platform/drive combination.

Since it seems like this is not really something that Chromium needs or
wants right now (and it doesn't belong in base anyhow because of needing to
touch the filesystem), I think I'll work on this for O3D, and later you can
see if you want to use it for Chromium.

-Greg.

On Wed, Apr 29, 2009 at 3:58 PM, Greg Spencer  wrote:

> On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai  wrote:
>
>> I understand your problem.  You're saying "I have user-supplied data
>> that I want to build a filename from," and "I have this pathname that
>> I want to display back to the user."  I agree that it would be good to
>> have a way to handle these cases in base.  I don't know if FilePath
>> proper is the right place to do it.  If we do it in FilePath, it still
>> won't really be right.
>
>
> OK, so it sounds like you're telling me not to use FilePath to represent
> file paths from a disk for my purposes because they can't ever be converted
> reliably to a particular encoding on Linux (which is a requirement for me,
> because of the third party libraries that require a particular encoding).
>
> That's fine, but what do I do instead?  Roll my own FilePath clone that has
> some encoding assumptions?  I can do that, but it has the same issues as the
> ones you're worried about with FilePath, so it seems better to solve the
> issue in one place rather than have two versions that are both insufficient.
>  Man, it would be better if FilePath could reliably know its encoding!  (I
> realize that Linux makes this impossible, it just seems like it would be
> better that way. :-)
>
> Since Linux is the only platform where the encoding is unclear, what if we
> did the best we could on Linux:
>
> When constructing a FilePath from a char* string on Linux:
> - Test the input string for values > 127 to determine if it's really just
> ASCII (and if so, we're out of the woods).
> - Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
> encoding that we can support, and note the encoding for later if we are
> requested to do a conversion.
> - If we run into an invalid sequence during a conversion, or an encoding we
> can't convert from, then use a CHECK to crash.
>
> This should work on most filenames, in almost all situations -- I'll bet
> most filenames are ASCII, even on foreign systems, and the ones that aren't
> ASCII have set LANG to something in /etc/profile, so all filenames created
> by any app running on that machine should match that encoding.
>
> Where they don't do that correctly, they're already getting garbage (and
> should expect garbage) from any application they use, not just Chrome, since
> there is no way *any *app can decode a path with multiple encodings in it,
> or where the encoding is different than LANG (or LC_*) says it is.
>
> Chrome already crashes like this when it encounters situations where it's
> just impossible to know what's right, so it's consistent with Chrome's
> behavior in other areas.
>
>
>> it should be the caller's responsibility to only deal with user-created
>> names with this interface.
>
>
> What do you mean here?  Isn't that the case now with FilePath?  (It's the
> file_util routines that actually read the filesystem and make FilePaths out
> of them, after all).  As for your suggestion to only deal with path
> components, how would you propose to parse user-supplied paths into one of
> these?
>
>
>> > 2) I'd like to make it possible to instantiate a POSIX FilePath object on
>> > Windows and a Windows FilePath on POSIX platforms.  This is because some
>> > libraries (e.g. the zip library, or tar files), use POSIX semantics for
>> > their paths even on Windows (I haven't seen a use case for Windows paths on
>> > POSIX yet, actually).   This would make it possible to use the nice API that
>> > FilePath has to manipulate paths appropriately for these other libraries.
>> > This could be easily accomplished by having POSIX and Windows versions of
>> > FilePath, and then typedef'ing FilePath differently on different platforms
>> > to one of these versions.
>>
>> Sounds pretty Pythonic.
>>
>> FilePath already sort of has some support for this - it does a bunch
>> of things based on feature macros, mostly so that as I was writing it,
>> I could test the Windows semantics without having to (shudder) resort
>> to running on Windows.  These could probably be adapted to do what
>> you're asking.
>
>
> Cool.
>
>
>> > 3) It would be helpful 

[chromium-dev] Re: Changes to FilePath?

2009-04-29 Thread Greg Spencer
On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai  wrote:

> I understand your problem.  You're saying "I have user-supplied data
> that I want to build a filename from," and "I have this pathname that
> I want to display back to the user."  I agree that it would be good to
> have a way to handle these cases in base.  I don't know if FilePath
> proper is the right place to do it.  If we do it in FilePath, it still
> won't really be right.


OK, so it sounds like you're telling me not to use FilePath to represent
file paths from a disk for my purposes because they can't ever be converted
reliably to a particular encoding on Linux (which is a requirement for me,
because of the third party libraries that require a particular encoding).

That's fine, but what do I do instead?  Roll my own FilePath clone that has
some encoding assumptions?  I can do that, but it has the same issues as the
ones you're worried about with FilePath, so it seems better to solve the
issue in one place rather than have two versions that are both insufficient.
 Man, it would be better if FilePath could reliably know its encoding!  (I
realize that Linux makes this impossible, it just seems like it would be
better that way. :-)

Since Linux is the only platform where the encoding is unclear, what if we
did the best we could on Linux:

When constructing a FilePath from a char* string on Linux:
- Test the input string for values > 127 to determine if it's really just
ASCII (and if so, we're out of the woods).
- Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
encoding that we can support, and note the encoding for later if we are
requested to do a conversion.
- If we run into an invalid sequence during a conversion, or an encoding we
can't convert from, then use a CHECK to crash.
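
To make the three steps concrete, here is a rough sketch in Python rather
than C++.  The function name decode_native_path is hypothetical, the locale
lookup uses locale.getpreferredencoding as a stand-in for "appropriate Linux
APIs", and a raised UnicodeDecodeError (or LookupError for an unknown
encoding) stands in for the CHECK:

```python
import locale
from typing import Optional

def decode_native_path(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode a native byte path following the three steps listed above."""
    # Step 1: pure ASCII needs no knowledge of the locale at all.
    if all(byte < 128 for byte in raw):
        return raw.decode("ascii")
    # Step 2: otherwise trust what the locale claims (LANG / LC_CTYPE / LC_ALL),
    # unless the caller already knows the encoding.
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Step 3: strict decoding raises on an invalid sequence instead of guessing.
    return raw.decode(encoding, errors="strict")

print(decode_native_path(b"/tmp/report.txt"))
print(decode_native_path("/tmp/caf\u00e9".encode("utf-8"), "utf-8"))
try:
    decode_native_path(b"/tmp/caf\xe9", "utf-8")   # Latin-1 bytes, wrong claim
except UnicodeDecodeError as error:
    print("would CHECK here:", error.reason)
```

The third call shows the failure mode: the bytes are fine for the filesystem,
but the claimed encoding is wrong, so the conversion refuses to proceed.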

This should work on most filenames, in almost all situations -- I'll bet
most filenames are ASCII, even on foreign systems, and the ones that aren't
ASCII have set LANG to something in /etc/profile, so all filenames created
by any app running on that machine should match that encoding.

Where they don't do that correctly, they're already getting garbage (and
should expect garbage) from any application they use, not just Chrome, since
there is no way *any *app can decode a path with multiple encodings in it,
or where the encoding is different than LANG (or LC_*) says it is.

Chrome already crashes like this when it encounters situations where it's
just impossible to know what's right, so it's consistent with Chrome's
behavior in other areas.


> it should be the caller's responsibility to only deal with user-created
> names with this interface.


What do you mean here?  Isn't that the case now with FilePath?  (It's the
file_util routines that actually read the filesystem and make FilePaths out
of them, after all).  As for your suggestion to only deal with path
components, how would you propose to parse user-supplied paths into one of
these?


> > 2) I'd like to make it possible to instantiate a POSIX FilePath object on
> > Windows and a Windows FilePath on POSIX platforms.  This is because some
> > libraries (e.g. the zip library, or tar files), use POSIX semantics for
> > their paths even on Windows (I haven't seen a use case for Windows paths on
> > POSIX yet, actually).   This would make it possible to use the nice API that
> > FilePath has to manipulate paths appropriately for these other libraries.
> > This could be easily accomplished by having POSIX and Windows versions of
> > FilePath, and then typedef'ing FilePath differently on different platforms
> > to one of these versions.
>
> Sounds pretty Pythonic.
>
> FilePath already sort of has some support for this - it does a bunch
> of things based on feature macros, mostly so that as I was writing it,
> I could test the Windows semantics without having to (shudder) resort
> to running on Windows.  These could probably be adapted to do what
> you're asking.


Cool.


> > 3) It would be helpful to have real path normalization for each of the
> > platforms (although I know what a testing nightmare that can be).  I might
> > try and tackle this if people think it would be beneficial.
>
> It's also a specification and implementation nightmare.  Everyone has
> a different idea of what "normalization" means.  What's your idea?


Yes, I know it's a nightmare all around, but I think it would be useful to
have something that addresses this.  My idea would be the same as Python's
os.path.normpath, mainly because it's a well-tested, seasoned example with
test cases.  Windows also has a routine for this (PathCanonicalize) that
could be used (but I know it doesn't work for UNC paths).

> > 4) Make sure we handle case sensitivity vs case preservation correctly.
> > It's unclear to me that FilePath does this correctly on the Mac -- Mac file
> > names are case preserving, but case insensitive, Unix filenames are both
> > (and windows filenames are neither :-).
>
> Again with the normalization. 

[chromium-dev] Re: Changes to FilePath?

2009-04-29 Thread Mark Mentovai

Greg Spencer wrote:
> So there's currently no right way to do the conversion, but I still think
> that the FilePath constructor is probably in the best position to inspect
> LC_ALL, etc. and do as close to the right thing as possible.  I doubt most
> Linux developers even think about this, and so the chances that they will
> implement anything other than assuming that it's ASCII are slim -- this
> would allow us to at least implement a baseline for them.

Not doing the conversion is kinda the point.  Well, it's exactly the point.

(Hi, I'm the author of FilePath.)

If you've got an arbitrary path, it might be encoded in some scheme,
and it might not, and it might contain a mix of encodings.  The point
of FilePath is "we know it's a path and we don't necessarily know
anything else."  Chromium didn't used to have FilePath.  Everything
was a wstring which implied UTF-16/32, and the conversions implied
UTF-8 because we couldn't do anything smarter, and there was all sorts
of potential for messing things up.  Not a pretty story.  When
FilePath was born, the *Hack methods showed up to give us a way to
transition the old-style wstring APIs to new-style FilePath APIs at
reasonable cut points, instead of having to do everything all at once.

I understand your problem.  You're saying "I have user-supplied data
that I want to build a filename from," and "I have this pathname that
I want to display back to the user."  I agree that it would be good to
have a way to handle these cases in base.  I don't know if FilePath
proper is the right place to do it.  If we do it in FilePath, it still
won't really be right.  If we had something, it should probably be
made to operate only on single pathname components, and it should be
the caller's responsibility to only deal with user-created names with
this interface.
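
A sketch of what "only on single pathname components" could look like, in
Python for brevity.  The function display_components and the sample path are
made up for illustration, and the result is meant for display only, since
undecodable bytes are replaced and cannot be recovered:

```python
def display_components(raw_path: bytes, encoding: str = "utf-8") -> list:
    """Decode each component of a POSIX byte path separately, for display only."""
    shown = []
    for component in raw_path.split(b"/"):
        if not component:
            continue  # skip empty pieces around leading or doubled separators
        # errors="replace" marks undecodable bytes with U+FFFD, so this is lossy.
        shown.append(component.decode(encoding, errors="replace"))
    return shown

# One component is UTF-8 and the next is Latin-1: a mixed-encoding path.
mixed = b"/home/" + "jos\u00e9".encode("utf-8") + b"/" + "caf\u00e9".encode("latin-1")
print(display_components(mixed))
```

Working per component means one badly encoded name only damages its own
piece of the display string instead of the whole path.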

> 2) I'd like to make it possible to instantiate a POSIX FilePath object on
> Windows and a Windows FilePath on POSIX platforms.  This is because some
> libraries (e.g. the zip library, or tar files), use POSIX semantics for
> their paths even on Windows (I haven't seen a use case for Windows paths on
> POSIX yet, actually).   This would make it possible to use the nice API that
> FilePath has to manipulate paths appropriately for these other libraries.
> This could be easily accomplished by having POSIX and Windows versions of
> FilePath, and then typedef'ing FilePath differently on different platforms
> to one of these versions.

Sounds pretty Pythonic.

FilePath already sort of has some support for this - it does a bunch
of things based on feature macros, mostly so that as I was writing it,
I could test the Windows semantics without having to (shudder) resort
to running on Windows.  These could probably be adapted to do what
you're asking.
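
For reference, the Pythonic version of this is that both flavours are always
importable regardless of the host platform: os.path is just an alias for
posixpath or ntpath.  A short illustration (the archive member name is made
up):

```python
import ntpath
import posixpath

# Hypothetical member name, as it might appear inside a zip or tar file.
member = "docs/../images/logo.png"

# The same purely lexical operations, under each platform's rules.
print(posixpath.normpath(member))
print(ntpath.normpath(member))
print(posixpath.join("archive", "a.txt"))
print(ntpath.join("archive", "a.txt"))
```

Neither module touches the disk here, which matches the "light, never hits
the filesystem" constraint on FilePath.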

> 3) It would be helpful to have real path normalization for each of the
> platforms (although I know what a testing nightmare that can be).  I might
> try and tackle this if people think it would be beneficial.

It's also a specification and implementation nightmare.  Everyone has
a different idea of what "normalization" means.  What's your idea?

> 4) Make sure we handle case sensitivity vs case preservation correctly.
> It's unclear to me that FilePath does this correctly on the Mac -- Mac file
> names are case preserving, but case insensitive, Unix filenames are both
> (and windows filenames are neither :-).

Again with the normalization.  What do you want this stuff for?
What's your idea of how this should work?

Remember: FilePath is specified to be light and to never touch the
disk.  If you've got a disk-touching operation, it probably doesn't
belong in FilePath proper.

Mark




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 3:26 PM, Erik Kay  wrote:

> On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer  wrote:
>
>> But that's exactly the point.  FilePath is the class that created the path
>> to begin with.  So it can know what the LC_*/LANG variables were when it
>> was created, and do the right conversion when you ask the FilePath to
>> convert to UTF16.  Also, if the developer calls something called
>> FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
>> remember that.
>>
>
> If you created it yourself, that's fine.  FilePaths aren't always created
> manually by users.  They often are populated from system APIs where you
> can't know.  See file_util* for some examples.  So the problem is that if
> you add this API, people will mistakenly use the conversion functions when
> they can't be safe.  I agree it sucks.  I just don't know of a reasonable
> solution.
>

So there's currently no right way to do the conversion, but I still think
that the FilePath constructor is probably in the best position to inspect
LC_ALL, etc. and do as close to the right thing as possible.  I doubt most
Linux developers even think about this, and so the chances that they will
implement anything other than assuming that it's ASCII are slim -- this
would allow us to at least implement a baseline for them.  Or would that
just screw things up worse?

Doesn't this mean that it's possible that the path manipulation routines
fail for sufficiently odd encodings? (jis or something where an encoded char
might include a "/"?)
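
A minimal sketch of that concern, assuming Shift_JIS input. There the second
byte of a two-byte character can be 0x5C (the ASCII backslash), while 0x2F
("/") does not occur as a trail byte, so the Windows separator is the one at
risk. FindLastSeparatorNaive and FindLastSeparatorSjis are hypothetical
helpers for illustration, not FilePath functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: byte-wise search, as an encoding-unaware routine does.
std::size_t FindLastSeparatorNaive(const std::string& path, char sep) {
  return path.rfind(sep);
}

// True if b is the first byte of a two-byte Shift_JIS character.
bool IsSjisLeadByte(unsigned char b) {
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: walks whole characters, so a trail byte that happens
// to equal the separator is never reported as one.
std::size_t FindLastSeparatorSjis(const std::string& path, char sep) {
  std::size_t last = std::string::npos;
  for (std::size_t i = 0; i < path.size(); ++i) {
    unsigned char b = static_cast<unsigned char>(path[i]);
    if (IsSjisLeadByte(b) && i + 1 < path.size()) {
      ++i;  // Skip the trail byte of this two-byte character.
      continue;
    }
    if (path[i] == sep) last = i;
  }
  return last;
}
```

For the bytes "dir\" + 0x95 0x5C + ".txt", the naive search lands inside the
two-byte character, while the encoding-aware one returns the real separator.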

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Evan Martin

On Tue, Apr 28, 2009 at 3:26 PM, Erik Kay  wrote:
>> But that's exactly the point.  FilePath is the class that created the path
>> to begin with.  So it can know what the LC_*/LANG variables were when it
>> was created, and do the right conversion when you ask the FilePath to
>> convert to UTF16.  Also, if the developer calls something called
>> FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
>> remember that.
>
>
> If you created it yourself, that's fine.  FilePaths aren't always created
> manually by users.  They often are populated from system APIs where you
> can't know.  See file_util* for some examples.  So the problem is that if
> you add this API, people will mistakenly use the conversion functions when
> they can't be safe.  I agree it sucks.  I just don't know of a reasonable
> solution.

We have this problem already, when FilePaths need to work with
wstring-based APIs like the win32 one.
What we've done so far is use a function with an awkward name
(ToWStringHack, FromWStringHack) to try to create bias against them.

On the other hand, the codebase now has 309 lines containing
"WStringHack" so I don't know that it's been too successful.

It might be worth figuring out a name that does what Greg needs that
is similarly awkward but doesn't involve "Hack" for circumstances
where you really just need to do the conversion.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Erik Kay
On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer  wrote:

> On Tue, Apr 28, 2009 at 3:11 PM, Erik Kay  wrote:
>
>> The biggest problem with this change is that it's not possible to do this
>> conversion on Linux in a safe way.  In Linux, there is no charset defined by
>> the filesystem.  Each filename is just a blob of bytes.  Apps are supposed
>> to respect an environment variable, but since this environment variable
>> could change over time and be different from user to user, there's no
>> reliable way to know what the charset is, so you can't convert from a
>> FilePath on Linux to UTF8 or UTF16 unless you were the one who created the
>> path to begin with.
>>
>
> But that's exactly the point.  FilePath is the class that created the path
> to begin with.  So it can know what the LC_*/LANG variables were when it
> was created, and do the right conversion when you ask the FilePath to
> convert to UTF16.  Also, if the developer calls something called
> FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
> remember that.
>


If you created it yourself, that's fine.  FilePaths aren't always created
manually by users.  They often are populated from system APIs where you
can't know.  See file_util* for some examples.  So the problem is that if
you add this API, people will mistakenly use the conversion functions when
they can't be safe.  I agree it sucks.  I just don't know of a reasonable
solution.

Erik




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 3:19 PM, Greg Spencer  wrote:

> On Tue, Apr 28, 2009 at 3:11 PM, Erik Kay  wrote:
>
>> The biggest problem with this change is that it's not possible to do this
>> conversion on Linux in a safe way.
>>
>
And besides -- this problem isn't introduced by this change: it exists
already because currently there's no safe way to convert, regardless of the
API (since a consumer of a FilePath doesn't know what encoding it contains).

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 3:11 PM, Erik Kay  wrote:

> The biggest problem with this change is that it's not possible to do this
> conversion on Linux in a safe way.  In Linux, there is no charset defined by
> the filesystem.  Each filename is just a blob of bytes.  Apps are supposed
> to respect an environment variable, but since this environment variable
> could change over time and be different from user to user, there's no
> reliable way to know what the charset is, so you can't convert from a
> FilePath on Linux to UTF8 or UTF16 unless you were the one who created the
> path to begin with.
>

But that's exactly the point.  FilePath is the class that created the path
to begin with.  So it can know what the LC_*/LANG variables were when it
was created, and do the right conversion when you ask the FilePath to
convert to UTF16.  Also, if the developer calls something called
FilePath::CreateFromUTF8, then it can know it was supposed to be UTF8 and
remember that.
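
A rough sketch of the "remember the encoding" idea, under stated assumptions:
TaggedPath, CreateFromSystem, PathEncoding and Utf8ToUtf16 are hypothetical
names for illustration and are not part of FilePath or base/. Only
CreateFromUTF8 is a name proposed above. The conversion refuses to guess when
the encoding was never recorded.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

enum class PathEncoding { kUnknown, kUTF8 };

// Converts UTF-8 to UTF-16. Returns false on malformed input (bad lead or
// trail byte, truncation, overlong form, surrogate, or value > U+10FFFF).
bool Utf8ToUtf16(const std::string& in, std::u16string* out) {
  static const std::uint32_t kMin[] = {0, 0x80, 0x800, 0x10000};
  std::u16string result;
  for (std::size_t i = 0; i < in.size();) {
    unsigned char b = static_cast<unsigned char>(in[i]);
    int extra;
    std::uint32_t cp;
    if (b < 0x80) { extra = 0; cp = b; }
    else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
    else return false;
    if (i + extra >= in.size()) return false;  // Truncated sequence.
    for (int k = 1; k <= extra; ++k) {
      unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < kMin[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    if (cp < 0x10000) {
      result.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // Encode as a surrogate pair.
      result.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      result.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += extra + 1;
  }
  *out = result;
  return true;
}

// Hypothetical wrapper that records how its bytes were produced.
class TaggedPath {
 public:
  static TaggedPath CreateFromUTF8(std::string utf8) {
    return TaggedPath(std::move(utf8), PathEncoding::kUTF8);
  }
  // For bytes handed back by the OS, where the encoding is not known.
  static TaggedPath CreateFromSystem(std::string bytes) {
    return TaggedPath(std::move(bytes), PathEncoding::kUnknown);
  }
  // Fails rather than guessing when the encoding was never recorded.
  bool ToUTF16(std::u16string* out) const {
    if (encoding_ != PathEncoding::kUTF8) return false;
    return Utf8ToUtf16(bytes_, out);
  }

 private:
  TaggedPath(std::string bytes, PathEncoding encoding)
      : bytes_(std::move(bytes)), encoding_(encoding) {}
  std::string bytes_;
  PathEncoding encoding_;
};
```

With this shape, a path built through CreateFromUTF8 converts, while one
built through CreateFromSystem reports failure instead of producing a
plausible-looking wrong string.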

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Erik Kay
(resend - arg)

On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer  wrote:

> On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker wrote:
>
>>
>> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer 
>> wrote:
>> > 1) I'd like to add some explicit routines for converting to/from UTF8
>> and
>> > UTF16.  While it's nice (and important) that FilePath uses the
>> platform's
>> > native string, we've found that many third party libraries have made
>> other
>> > assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t)
>> paths
>> > regardless of platform, and converting a FilePath to and from those
>> forms is
>> > a platform-dependent exercise which should be centralized into the class
>> > (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
>> > constructors that take each type).
>>
>> One thing many of us have found, across multiple projects, is that
>> wchar_t is fraught with complication as soon as more than one platform
>> is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
>> bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
>> Chrome started with more or less what you are suggesting, and we moved
>> off of it after much pain.
>
>
> I understand those issues quite well (but I probably should call the
> conversion method ToUTF16, now that you mention it).  And char* isn't
> necessarily UTF8 on all platforms either.
>
> OK, so what's the currently recommended path for converting to UTF16 or
> UTF8 from a FilePath?
>

The biggest problem with this change is that it's not possible to do this
conversion on Linux in a safe way.  In Linux, there is no charset defined by
the filesystem.  Each filename is just a blob of bytes.  Apps are supposed
to respect an environment variable, but since this environment variable
could change over time and be different from user to user, there's no
reliable way to know what the charset is, so you can't convert from a
FilePath on Linux to UTF8 or UTF16 unless you were the one who created the
path to begin with.
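
For illustration, a small sketch of how the environment-variable lookup is
usually resolved, assuming the POSIX precedence LC_ALL, then LC_CTYPE, then
LANG, and the locale-name shape language[_territory][.codeset][@modifier].
EffectiveCtypeLocale and CodesetFromLocaleName are hypothetical helpers. A
real program would more likely call setlocale(LC_CTYPE, "") followed by
nl_langinfo(CODESET). Either way the answer describes the current process
only, not whoever wrote the filename to disk.

```cpp
#include <string>

// Returns the first non-empty value among LC_ALL, LC_CTYPE and LANG, or "C"
// if none is set. Callers would pass std::getenv("LC_ALL") and so on.
std::string EffectiveCtypeLocale(const char* lc_all, const char* lc_ctype,
                                 const char* lang) {
  const char* candidates[] = {lc_all, lc_ctype, lang};
  for (const char* c : candidates) {
    if (c != nullptr && *c != '\0') return c;
  }
  return "C";
}

// Extracts the codeset part of a locale name, or "" if the name has none.
std::string CodesetFromLocaleName(const std::string& name) {
  std::string::size_type dot = name.find('.');
  if (dot == std::string::npos) return std::string();
  std::string::size_type at = name.find('@', dot);
  if (at == std::string::npos) return name.substr(dot + 1);
  return name.substr(dot + 1, at - dot - 1);
}
```

Two users with LANG=ja_JP.eucJP and LANG=en_US.UTF-8 get different answers
for the same bytes on disk, which is the unreliability described above.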

Erik




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 2:51 PM, Peter Kasting  wrote:

> On Tue, Apr 28, 2009 at 2:48 PM, Greg Spencer  wrote:
>
>> So, I was unable to find the conversion utilities in base that do the
>> conversion to/from UTF8.  What are they called?  If I missed them (and I
>> looked for a while before I gave up), then maybe they need to be more
>> prominent?
>>
>
> See base/string_util.h, UTF8ToUTF16() etc.
>

Yes, but those are generic string conversions, and so to convert a FilePath
to UTF16 on all platforms, my code has to look something like:

--
FilePath path(FILE_PATH_LITERAL("Foo.bar"));
collada::fstring collada_path; // a UTF16 path.
#if defined(OS_WIN)
  collada_path = path.value();
#elif defined(OS_MACOSX)
  collada_path = UTF8ToUTF16(path.value());
#elif defined(OS_LINUX)
  // (or whatever this linux flavor uses for a filename encoding.)
  collada_path = Latin1ToUTF16(path.value());
#endif
--

This seems like code that belongs in FilePath because it knows exactly what
the filename encoding would be on each platform.

> Yes, partly because including dedicated helpers like this makes it sound as
> if the class is somehow special-cased or fastpathed to deal better with
> these than a generic converter would be.
>

But it can.  For instance, on the Mac, we know that filenames are UTF8
encoded.  We have no such guarantee on Linux, even though they both use a
char* format in FilePath.  If FilePath were doing the conversion, then it
could be very picky about doing the conversion properly on each platform,
because converting a Latin-1 string to a wide char using a UTF8 codec may
end up with some strange results.

> The other argument is simply that converting utf8 to utf16 is a generic sort
> of functionality that belongs in base/ or another similar general-purpose
> location, rather than specifically in FilePath.
>

And the implementation in FilePath would be using those generic functions,
but it would be using them (or not) as applied to the specific platform it
is compiled on, whereas the conversion routines don't know anything about
FilePath's platform specific semantics.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Peter Kasting
On Tue, Apr 28, 2009 at 2:48 PM, Greg Spencer  wrote:

> So, I was unable to find the conversion utilities in base that do the
> conversion to/from UTF8.  What are they called?  If I missed them (and I
> looked for a while before I gave up), then maybe they need to be more
> prominent?
>

See base/string_util.h, UTF8ToUTF16() etc.

> What is the danger here of being lazy?  Is it that developers will
> unwittingly do expensive conversions?
>

Yes, partly because including dedicated helpers like this makes it sound as
if the class is somehow special-cased or fastpathed to deal better with
these than a generic converter would be.

The other argument is simply that converting utf8 to utf16 is a generic sort
of functionality that belongs in base/ or another similar general-purpose
location, rather than specifically in FilePath.

PK




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 2:31 PM, Peter Kasting  wrote:

> On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer  wrote:
>
>> 1) I'd like to add some explicit routines for converting to/from UTF8 and
>> UTF16.  While it's nice (and important) that FilePath uses the platform's
>> native string, we've found that many third party libraries have made other
>> assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
>> regardless of platform, and converting a FilePath to and from those forms is
>> a platform-dependent exercise which should be centralized into the class
>> (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
>> constructors that take each type).
>
>
> I'm pretty strongly against this for the same reasons as Evan.  I think
> consumers who need to convert should be doing the conversion using their own
> routines (e.g. Chrome uses ones in our base/ module).
>

So, I was unable to find the conversion utilities in base that do the
conversion to/from UTF8.  What are they called?  If I missed them (and I
looked for a while before I gave up), then maybe they need to be more
prominent?

What is the danger here of being lazy?  Is it that developers will
unwittingly do expensive conversions?  If so, I would expect that a member
function called "ToUTF8" would be just as much of a performance warning as a
helper function called "FilePathToUTF8", but be a heck of a lot more
convenient (since it would not require the developer to create a local
variable for use as a return value from the helper, and can be used as an
argument to another library's functions).  I can see the argument for not
having a casting constructor that isn't from the platform native form, but
in that case, a factory method called "CreateFromUTF8" should be a
sufficient warning to the developer that it might be expensive.

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker  wrote:

>
> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer  wrote:
> > 1) I'd like to add some explicit routines for converting to/from UTF8 and
> > UTF16.  While it's nice (and important) that FilePath uses the platform's
> > native string, we've found that many third party libraries have made
> other
> > assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t)
> paths
> > regardless of platform, and converting a FilePath to and from those forms
> is
> > a platform-dependent exercise which should be centralized into the class
> > (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
> > constructors that take each type).
>
> One thing many of us have found, across multiple projects, is that
> wchar_t is fraught with complication as soon as more than one platform
> is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
> bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
> Chrome started with more or less what you are suggesting, and we moved
> off of it after much pain.


I understand those issues quite well (but I probably should call the
conversion method ToUTF16, now that you mention it).  And char* isn't
necessarily UTF8 on all platforms either.

OK, so what's the currently recommended path for converting to UTF16 or UTF8
from a FilePath?

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Amanda Walker

On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer  wrote:
> 1) I'd like to add some explicit routines for converting to/from UTF8 and
> UTF16.  While it's nice (and important) that FilePath uses the platform's
> native string, we've found that many third party libraries have made other
> assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
> regardless of platform, and converting a FilePath to and from those forms is
> a platform-dependent exercise which should be centralized into the class
> (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
> constructors that take each type).

One thing many of us have found, across multiple projects, is that
wchar_t is fraught with complication as soon as more than one platform
is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
Chrome started with more or less what you are suggesting, and we moved
off of it after much pain.
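
A small sketch of why the width matters, assuming the usual layouts (2-byte
wchar_t holding UTF-16 on Windows, 4-byte wchar_t holding one code point per
unit with gcc on Linux). EncodeUtf16 and WideUnitsFor are hypothetical
helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodes one Unicode scalar value as UTF-16: a single unit for the BMP, a
// surrogate pair for anything above U+FFFF.
std::u16string EncodeUtf16(char32_t cp) {
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::u16string out;
  if (cp < 0x10000) {
    out.push_back(static_cast<char16_t>(cp));
  } else {
    cp -= 0x10000;
    out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
    out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
  }
  return out;
}

// Number of wchar_t units needed for one code point on this platform: one
// when wchar_t is at least 4 bytes wide, otherwise the UTF-16 unit count.
std::size_t WideUnitsFor(char32_t cp) {
  return sizeof(wchar_t) >= 4 ? 1 : EncodeUtf16(cp).size();
}
```

The same wide string literal therefore has a different length, and different
unit values, depending on where it is compiled, which is why code that treats
wchar_t as UTF-16 only holds on Windows.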

--Amanda




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Peter Kasting
On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer  wrote:

> 1) I'd like to add some explicit routines for converting to/from UTF8 and
> UTF16.  While it's nice (and important) that FilePath uses the platform's
> native string, we've found that many third party libraries have made other
> assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
> regardless of platform, and converting a FilePath to and from those forms is
> a platform-dependent exercise which should be centralized into the class
> (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
> constructors that take each type).


I'm pretty strongly against this for the same reasons as Evan.  I think
consumers who need to convert should be doing the conversion using their own
routines (e.g. Chrome uses ones in our base/ module).

PK




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Greg Spencer
On Tue, Apr 28, 2009 at 1:57 PM, Thomas Van Lenten wrote:

> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer  wrote:
>
>> 4) Make sure we handle case sensitivity vs case preservation correctly.
>> It's unclear to me that FilePath does this correctly on the Mac -- Mac file
>> names are case preserving, but case insensitive, Unix filenames are both
>> (and windows filenames are neither :-).
>
>
> FYI - it's a drive format time option on the Mac, so they can be case
> preserving and case sensitive.
>

Thanks for pointing that out. In fact, NTFS is actually case sensitive,
where FAT32 is not (see http://support.microsoft.com/kb/100625).  So we have
issues there as well.  The real issue would be dealing with relative paths
that don't exist yet -- there would be no way to inspect the file location
to find out what mode it was in.  I think I would just punt and go with the
widely-used defaults (the ones I mentioned above), since most apps seem to
assume those limitations.  An alternative would be to have an API to specify
the desired mode, and default to the common case on each platform.
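
One possible shape for that alternative, as a hedged sketch only: CaseMode,
kDefaultCaseMode and PathsEqual are hypothetical names, the per-platform
default is a compile-time guess at the common case rather than a property of
the actual volume, and the folding is ASCII-only.

```cpp
#include <cstddef>
#include <string>

enum class CaseMode { kSensitive, kInsensitive };

// Assumed defaults: case-insensitive on Windows and Mac, sensitive elsewhere.
// A given volume may be formatted differently from its platform's default.
#if defined(_WIN32) || defined(__APPLE__)
const CaseMode kDefaultCaseMode = CaseMode::kInsensitive;
#else
const CaseMode kDefaultCaseMode = CaseMode::kSensitive;
#endif

// ASCII-only lowercasing; real filesystems fold far more than A-Z.
char AsciiToLower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Compares two path strings byte by byte under the requested case mode.
bool PathsEqual(const std::string& a, const std::string& b,
                CaseMode mode = kDefaultCaseMode) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    char x = a[i];
    char y = b[i];
    if (mode == CaseMode::kInsensitive) {
      x = AsciiToLower(x);
      y = AsciiToLower(y);
    }
    if (x != y) return false;
  }
  return true;
}
```

A caller that knows the target volume's behavior passes the mode explicitly;
everyone else gets the platform default.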

-Greg.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Evan Martin

On Tue, Apr 28, 2009 at 1:39 PM, Greg Spencer  wrote:
> 1) I'd like to add some explicit routines for converting to/from UTF8 and
> UTF16.  While it's nice (and important) that FilePath uses the platform's
> native string, we've found that many third party libraries have made other
> assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
> regardless of platform, and converting a FilePath to and from those forms is
> a platform-dependent exercise which should be centralized into the class
> (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
> constructors that take each type).

Can you give some examples of where this is needed?  We've
historically fought against this pretty hard, and as soon as accessors
are available users will get lazy about it.




[chromium-dev] Re: Changes to FilePath?

2009-04-28 Thread Thomas Van Lenten
On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer  wrote:

> Hi Chromium Developers,
>
> I'm working on Google's O3D (http://code.google.com/p/o3d), and we
> (naturally) share some of Chrome's base classes for our code, including the
> very useful class FilePath.
>
> However, in using FilePath in the last few months, I've seen that it needs
> some refinement.  I'd like to augment the FilePath class with some things
> that would make it more generally useful -- it's very nicely set up, but
> it's missing a few things that make it harder to work with than it needs to
> be:
>
> 1) I'd like to add some explicit routines for converting to/from UTF8 and
> UTF16.  While it's nice (and important) that FilePath uses the platform's
> native string, we've found that many third party libraries have made other
> assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
> regardless of platform, and converting a FilePath to and from those forms is
> a platform-dependent exercise which should be centralized into the class
> (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
> constructors that take each type).
>
> 2) I'd like to make it possible to instantiate a POSIX FilePath object on
> Windows and a Windows FilePath on POSIX platforms.  This is because some
> libraries (e.g. the zip library, or tar files), use POSIX semantics for
> their paths even on Windows (I haven't seen a use case for Windows paths on
> POSIX yet, actually).   This would make it possible to use the nice API that
> FilePath has to manipulate paths appropriately for these other libraries.
> This could be easily accomplished by having POSIX and Windows versions of
> FilePath, and then typedef'ing FilePath differently on different platforms
> to one of these versions.
>
> 3) It would be helpful to have real path normalization for each of the
> platforms (although I know what a testing nightmare that can be).  I might
> try and tackle this if people think it would be beneficial.
>
> 4) Make sure we handle case sensitivity vs case preservation correctly.
> It's unclear to me that FilePath does this correctly on the Mac -- Mac file
> names are case preserving, but case insensitive, Unix filenames are both
> (and windows filenames are neither :-).


FYI - it's a drive format time option on the Mac, so they can be case
preserving and case sensitive.

TVL


>
>
> So, is there any resistance to any of the above?  Do you have other
> suggestions that I might take into account?  Am I violating any design
> assumptions of FilePath?  For #2, is speed/size enough of a concern to avoid
> a virtual base class (I wouldn't think so, but you never know..)?
>
> -Greg.
>
> >
>
