Hello all,

I've been scanning some file API documentation and wondering what we could do that would translate across platforms reliably. I've been thinking of a set of concentric circles of operations, where the inner circles can easily be supported in a cross-platform way, and the outer ones require more and more hackery. What do you think of the following?

Group 1: Treat pathnames as opaque objects that come from outside APIs and can only be used by passing them to APIs. We can support these in a way that will be compatible everywhere.
Operations: open file, close file, stat file. In order to be useful, we might also provide a "command-line-argument->file" operation, but probably no reverse operation.

Group 2: Treat pathnames as vectors of opaque path components.
Operations: list items in a directory.

Group 3: Now we need to care about encoding.
Operations: string->path, path->string. This will be much harder than groups 1 and 2.

I think group 1 by itself would allow for most command-line programs that people want to write. If you add group 2, you could write find, ls, cat, and probably others. You need group 3 to write grep and a web server.

My thought right now is that group 3 is going to have a complex API if we really want to get encodings right. Our goal should be that this complexity doesn't affect group 1 and group 2, which really should have very simple APIs.

Now, some thoughts on group 3: Mark is right that paths are basically just strings, even though occasionally they're not. I sort of like the idea of the PEP-383 encoding (making paths strings that can potentially contain unused codepoints, which represent non-character bytes), but would that make path strings break under some Guile string operations?

Also, when we convert strings to paths, we need to know what encoding the local filesystem uses. That will usually be UTF-8, but potentially might not be, correct? If we can auto-discover the correct encoding, we might be able to keep all of that in the background and just pretend that we can convert Guile strings to file system paths in a clean way.

Noah

On Wed, May 4, 2011 at 5:24 AM, Ludovic Courtès <l...@gnu.org> wrote:
> Hi Noah,
>
> Noah Lavine <noah.b.lav...@gmail.com> writes:
>
>> The reason this strangeness enters is that path strings are actually
>> lists (or vectors) encoded as strings. Conceptually, the path
>> ~/Desktop/Getting\ a\ Job is the list ("~" "Desktop" "Getting a Job").
>> In this representation, there are no escapes and no separators. It
>> always seemed cleaner to me to think about it that way.
>
> Agreed.
>
> However, POSIX procedures deal with strings, so you still need to
> convert to a string at some point. So I think there are few places
> where you could really use anything other than strings to represent file
> names—unless all of libguile is changed to deal with that, which seems
> unreasonable to me.
>
> MIT Scheme’s API goes this route, but that’s heavyweight and can hardly
> be retrofitted in a file-name-as-strings implementation, I think:
> <http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Pathnames.html>.
>
>> I said this is similar to the (web) module because of all of the
>> discussion there of how HTTP encodes data types in text, and how it's
>> better to think of a URI as URI type rather than a special string,
>> etc.
>
> Yes.
>
> Thanks,
> Ludo’.
>
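
For reference on the PEP-383 idea mentioned under group 3, here is a minimal sketch in Python (where that scheme originated) of what the string->path and path->string round trip looks like. The helper names `path_bytes_to_string`, `string_to_path_bytes` and `is_clean_text` and the sample file name are made up for illustration, and UTF-8 is assumed as the filesystem encoding. It says nothing about how Guile strings would behave; it only shows the behaviour we would be trying to match.

```python
# Sketch of PEP 383 ("surrogateescape") using only the standard codecs.
# Helper names are hypothetical; UTF-8 is an assumed filesystem encoding.

def path_bytes_to_string(raw: bytes, encoding: str = "utf-8") -> str:
    """path->string: undecodable bytes 0x80..0xFF become lone surrogates
    U+DC80..U+DCFF instead of raising an error."""
    return raw.decode(encoding, errors="surrogateescape")


def string_to_path_bytes(name: str, encoding: str = "utf-8") -> bytes:
    """string->path: lone surrogates are turned back into the original bytes."""
    return name.encode(encoding, errors="surrogateescape")


def is_clean_text(name: str, encoding: str = "utf-8") -> bool:
    """True if the string can be encoded strictly, i.e. it contains no
    escaped non-character bytes."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A made-up file name stored as Latin-1 bytes, which is not valid UTF-8.
raw = b"caf\xe9.txt"
name = path_bytes_to_string(raw)

# The stray byte 0xE9 is carried as the lone surrogate U+DCE9.
assert name == "caf\udce9.txt"

# The round trip restores the exact original bytes.
assert string_to_path_bytes(name) == raw

# Ordinary string operations still work on the escaped string...
assert name.split(".") == ["caf\udce9", "txt"]

# ...but anything that needs real text (strict encoding) rejects it.
assert not is_clean_text(name)
```

So in Python the answer to "would path strings break under some string operations?" is: most operations are fine, but strict encoding (and therefore printing or sending the name over the network) fails until the string is converted back with the same error handler. Whether Guile's string operations would tolerate such code points is still the open question.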