> I would say that all paths are relative to something, whether it's the
> Unix root, or the current directory, or whatever. Therefore I would call
> this something like PathStart, and add:
>
>     | CurrentDirectory
>     | CurrentDirectoryOfWindowsDrive Char
>     | RootOfCurrentWindowsDrive

This is true in a sense, but I think making the distinction explicit is
helpful for a number of the operations we want to do. For example, what
is the parent of the relative path "."? The answer is "..". What is the
parent of "/." on Unix? The answer is "/.". I would also argue that it
only makes sense to append a relative path on the right (i.e., we can't
append "/tmp/foo" onto "/usr/local", but we can append "tmp/foo").
Relative paths can refer to different things in the filesystem depending
on process-local state, whereas absolute paths will always refer to the
same thing (until the filesystem changes, or you do something esoteric
like "chroot"). Relative paths are really "path fragments."

> On Unix, there are two nodes we can name directly, the "root" and the
> "current directory". On Windows, there are 26 roots and 26 current
> directories which we can name directly; additionally we can name the
> root or current directory of the current drive, which is one of those
> 26, and there are an arbitrary number of network share roots, and \\.\,
> and perhaps some other stuff I don't know about.

There are a few others. I took a look at MSDN earlier and was astounded.

> Whether we're talking about the final node or the final edge depends on
> the OS call; this is the usual pointer-vs-pointee confusion that's also
> found in most programming languages outside the ML family. Probably we
> can ignore it, with the exception of the "/foo" vs "/foo/" distinction,
> which we must preserve.

I've solved that as you suggested: "foo/" becomes "foo/.".

> > class (Show p) => Path p where
> Okay, I'm not convinced that a Path class is the right approach.

I'm not convinced either, but it feels natural to me.

> I'm tentatively opposed to (B), since I think that the only interesting
> difference between Win32 and Posix paths is in the set of starting
> points you can name. (The path separator isn't very interesting.)
> But maybe it does make sense to have separate starting-point ADTs for
> each operating system. Then of course there's the issue that Win32 edge
> labels are Unicode, while Posix edge labels are [Word8]. Hmm.

I think these differences make separate implementations worthwhile. The
question then is whether to abstract them via a type class, or with a
datatype like:

    data FilePath = POSIXFilePath POSIXPath
                  | WinFilePath WinPath

The disadvantage here is that the datatype is closed. The advantage is
that pattern matching tells you what kind of path you have statically.

> > pathCleanup :: p -> p -- remove .. and suchlike
>
> This can't be done safely except in a few special cases (e.g. "/.." ->
> "/"). I'm not sure it should be here.

More than you would think, if you follow the conventions of modern Unix
shells. For example, "foo/.." is always equal to ".", "foo/bar/../../.."
is equal to "..", and "foo///bar" is equal to "foo/bar". This is the
behavior that "cd" gives on modern POSIX shells (rather than doing a
chdir on the ".." hardlink, which does strange things in the presence of
symlinks). The operation is sufficiently useful that I think it should
be included. It lets us know, for example, that "/bar/../foo/tmp" and
"/foo/tmp" refer to the same file, without resorting to any IO
operations.

> > hasExtension :: p -> String -> Bool
> This is really an operation on a single component of the path. I think
> it would make more sense to make it an ordinary function with type
> String -> String -> Bool and use the basename method to get the
> appropriate path component.

The problem is that String doesn't faithfully capture the representation
of path edges. For POSIX it is a sequence of Word8 (except for 0x2F,
i.e. '/'). In my implementation of UnixPaths, each path carries along an
encoding component, which (theoretically) tells you how to do
[Word8] <-> [Char] translations. Eventually we will get a real IO layer
complete with character encodings and this will be meaningful.
The comparison needs to be done with encodings in mind.

> > pathToForeign :: p -> IO (Ptr CChar)
> > pathFromForeign :: Ptr CChar -> IO p
>
> This interface is problematic. Is the pointer returned by pathToForeign
> a heap pointer which the caller is supposed to free? If so, a Ptr CChar
> instance would have to copy the pathname every time. And I don't
> understand exactly what pathFromForeign is supposed to do.

Agreed, I like the withCPath interface better. pathFromForeign takes a
path representation directly from C land, without going through String
first (again with encoding issues in mind). Although perhaps it should
instead be:

    pathFromForeign :: Ptr () -> IO p

since the representation might use wide chars.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
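
To make the shell-style pathCleanup convention discussed above concrete,
here is a minimal sketch of a purely lexical cleanup. It is not the
implementation from this thread: SimplePath, parsePath, renderPath and
cleanup are hypothetical names invented for illustration, edge labels
are plain Strings rather than encoded [Word8], and the "foo/" versus
"foo" distinction is deliberately not preserved. Because it never
touches the filesystem, its answers can differ from what the kernel
resolves when symlinks are involved.

```haskell
import Data.List (intercalate)

-- Hypothetical, simplified path type (an assumption, not the thread's
-- representation): an absolute/relative flag plus the edge labels.
data SimplePath = SimplePath
  { isAbsolute :: Bool
  , components :: [String]
  } deriving (Eq, Show)

-- Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse a Unix-style path. Empty components are dropped, so runs of
-- slashes ("foo///bar") collapse to a single separator.
parsePath :: String -> SimplePath
parsePath s =
  SimplePath (take 1 s == "/") (filter (not . null) (splitOn '/' s))

-- Lexical cleanup: drop "." and cancel "name/.." pairs without any IO.
pathCleanup :: SimplePath -> SimplePath
pathCleanup (SimplePath absolute cs) =
  SimplePath absolute (reverse (foldl step [] cs))
  where
    -- The accumulator holds the components kept so far, most recent first.
    step acc "."  = acc
    step acc ".." = case acc of
      []
        | absolute  -> []          -- "/.." stays at the root
        | otherwise -> [".."]      -- a relative path may climb upward
      ".." : _ -> ".." : acc       -- leading ".." components accumulate
      _ : rest -> rest             -- "name/.." cancels out
    step acc c    = c : acc

-- Render back to a string; an empty relative path is written as ".".
renderPath :: SimplePath -> String
renderPath (SimplePath True cs)  = '/' : intercalate "/" cs
renderPath (SimplePath False []) = "."
renderPath (SimplePath False cs) = intercalate "/" cs

-- Convenience wrapper working directly on strings.
cleanup :: String -> String
cleanup = renderPath . pathCleanup . parsePath
```

Loaded into GHCi, cleanup "foo/.." gives ".", cleanup "foo/bar/../../.."
gives "..", cleanup "foo///bar" gives "foo/bar", and both
cleanup "/bar/../foo/tmp" and cleanup "/foo/tmp" give "/foo/tmp", so the
two can be compared for equality without any IO.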