Beman,

Thanks for the clarifications - I thought I had followed the whole thread, but I had misunderstood and thought the thrust was to create a new path representation.

on 1/6/03 1:51 PM, Beman Dawes at [EMAIL PROTECTED] wrote:

> The syntax, semantics, and about everything else about paths that operating
> system functions traffic in was defined years ago for each operating
> system, standardized or not. Those native path formats aren't something we
> can change.

At this level the solution becomes very platform specific. Some platforms have adopted a UTF or UCS mapping; some always assume "current locale" without conversion. Any simple mappings that I've seen mentioned here will fail to give an expected name, even if they do round-trip correctly.

My recommendation would be to provide facilities to map to an appropriate space (perhaps providing some common ones such as UTF-8, UTF-16, and UCS-2, although I'm not sure what you do for characters outside UCS-2). The only safe thing I can imagine for characters that can't be represented in the target space is to map them to UTF-7 (not UTF-8; UTF-8 on many double-byte systems will give you no end of headaches). UTF-7 is a good lowest-common-denominator form.

> For boost::filesystem::path, any other path handling facility, the need
> arises to convert a path between narrow and wide character strings. For
> example, the operating system may use narrow character paths but the
> program traffics in wstrings.

A program traffics in strings in some encoding - whether they are wstrings or strings of Shift-JIS characters doesn't really matter. Wide vs. narrow isn't the issue (UTF-8 poses as many problems as UTF-16 or UTF-32 do; even my e-mail reader gives me three options for Western European encodings). So long as you have a path from the program's encoding to UTF and then to the platform file system encoding, you're doing about as well as can be expected. The toughest part is making sure you can recognize any escaped characters and that they are unlikely to appear accidentally.

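To make the escaping concern concrete, here is a minimal sketch of one possible scheme. It assumes a percent-style escape over raw bytes, which is my choice for illustration only and not something proposed in this thread or provided by boost::filesystem; the names escape_name, unescape_name, and is_portable are hypothetical. The point it shows is that the escape character is itself always escaped, so an escape sequence can never appear accidentally and decoding is unambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not part of boost::filesystem.
// Assumes an ASCII-compatible execution character set.

// A deliberately conservative set of bytes that are passed through as-is.
inline bool is_portable(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
}

inline char hex_digit(unsigned v) {
    assert(v < 16);
    return "0123456789ABCDEF"[v];
}

// Returns 0..15 for an uppercase hex digit, -1 otherwise. Lowercase is
// rejected so that every name has exactly one escaped form.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Writes every non-portable byte as %XX. Because '%' is not in the portable
// set, a literal '%' becomes "%25" and cannot be mistaken for an escape.
std::string escape_name(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (is_portable(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex_digit(c >> 4);
            out += hex_digit(c & 0x0F);
        }
    }
    return out;
}

// Reverses escape_name. Returns false on a truncated or malformed escape.
bool unescape_name(const std::string& escaped, std::string& out) {
    out.clear();
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        if (escaped[i] != '%') {
            out += escaped[i];
            continue;
        }
        if (i + 2 >= escaped.size()) return false;  // truncated escape
        int hi = hex_value(escaped[i + 1]);
        int lo = hex_value(escaped[i + 2]);
        if (hi < 0 || lo < 0) return false;         // not two hex digits
        out += static_cast<char>(hi * 16 + lo);
        i += 2;
    }
    return true;
}
```

With this sketch, escape_name("a/b") yields "a%2Fb" and escape_name("100%") yields "100%25", and unescape_name recovers the original bytes in both cases. It says nothing about which bytes a given platform actually forbids; that set would have to come from the platform-specific layer.
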
You may also need to escape characters that are in the character set but not allowed in a file name on a particular platform (null characters, line breaks, etc.).

The other part of interest is meta-information that gets encoded into a path name. Path separators are just one example. Another is a rule like "if the first character is a '.' then it is invisible" on one platform versus "if the first character is a '.' then it is a driver" on another. File extensions denoting a file type are yet another example. I haven't looked at how the filesystem library deals with this level of meta-information.

> That causes a need for conversions, and if I
> understand correctly, there are a number of ways (all conforming to one
> standard or another) to do that conversion, and it is really messy because
> of locale issues. PJP is well aware of those standards; indeed he wrote
> some of them, and IIRC has been to Japan and other Asian countries more
> than twenty times dealing with internationalization issues.

Whatever the solution, it is going to have to be somewhat platform/filesystem specific. I'd recommend a good system for providing platform solutions with a reasonable fallback mechanism.

--
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
[EMAIL PROTECTED]