Re: [Haskell] Reading a directory tree
On Mon, Jun 28, 2004 at 11:08:38AM -0400, Isaac Jones wrote: > My basic requirement is that they should be functions useful for > system administration. > > We've sucked in some functions in from GHC and from Cabal[1]. It would > be nice if we could just have a dumping ground for related functions > that people have written. Functions that recursively iterate directories should _never_ follow symlinks as if they were directories (possibly optionally). It is rarely what is wanted. To fix this, use getSymbolicLinkStatus instead of getFileStatus. For directory iteration, the called function will usually need to know at least some information about each file. Since you have to get a FileStatus for each file anyway, you might as well pass it to each child. Also, it is important to handle errors that happen during iteration. If you try statting, or reading something incorrect, you can cause exceptions, which will need to be handled. SlurpDirectory has some useful code (in darcs), but is fairly specific to darcs. It is actually fairly challenging to come up with useful directory iteration that is generic. What about filtering? Dave Brown ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
[Followups to [EMAIL PROTECTED] This thread makes me want to mention that I'm starting to put together a little library of useful path-based functions (I started it based on OS.Path in Python). Not sure how to organize it, exactly, since some things that might make sense here are already in System.Directory (and some things in there don't make sense in there) and some things may require Posix, and so should be split off into a separate module. My basic requirement is that they should be functions useful for system administration. We've sucked in some functions in from GHC and from Cabal[1]. It would be nice if we could just have a dumping ground for related functions that people have written. For now, if you have darcs[2], you can say: darcs get http://www.syntaxpolice.org/darcs_repos/OS.Path/ (And if you don't have darcs, you can browse around inside there with a web browser to get an idea of how it's shaping up.) That'll download the tree. You can make changes and additions, then use "darcs send" to email the changes to me. I think we should just dump together a bunch of these functions and worry about organizing it later (wiki-style). Here's a function[3] I wrote that's similar to what's mentioned here. It builds a tree out of a file directory. I'd like a function like "walk"[4] that walks this tree and executes an IO action (like printing out all the .txt files or something). It doesn't have to be based on Tree. peace, isaac [1] http://www.haskell.org/cabal [2] http://abridgegame.org/darcs/ [3] -- |Create a tree out of the given starting point. If the starting -- point is a directory, we recurse down and give the entire -- sub-structure, otherwise, we return a single node. -- I think it only requires Posix to test whether this is a directory, -- maybe we can do this with System.Directory? makeDirectoryTree :: FilePath -- ^Starting point (file or directory) -> IO (Tree (FilePath, Bool)) makeDirectoryTree path' = makeDirTree' Nothing path' where makeDirTree' :: Maybe FilePath -- ^Parent Dir -> FilePath -- ^Starting point -> IO (Tree (FilePath, Bool)) makeDirTree' parentIn pathIn = do let fullPath = case parentIn of Just pi' -> (dropLast pi' pathSeparator) ++ [pathSeparator] ++ pathIn Nothing -> pathIn isDir <- pathIsDirectory fullPath contents <- if isDir then getDirectoryContents fullPath else return [] subForest <- mapM (makeDirTree' (Just fullPath)) [c | c <- contents, c /= ".", c /= ['.', pathSeparator] , c /= ['.', '.', pathSeparator], c /= ".."] return $ Node (pathIn, isDir) subForest -- FIX: probably better not to use "error" here, but rather let exception occur. pathIsDirectory :: FilePath -> IO Bool pathIsDirectory p = do existsP <- doesFileExist p existsP2 <- doesDirectoryExist p when (not (existsP || existsP2)) (error $ "File does not exist: " ++ show p) status <- getFileStatus p return $ isDirectory status dropLast [] _ = [] dropLast (h:[]) toDrop | h == toDrop = [] | otherwise = [h] dropLast (h:t) toDrop = h:(dropLast t toDrop) [4] -- |Apply the given function in each directory starting at the -- root. use 'makeDirectoryTree'? FIX: how do we handle symlinks? walk :: FilePath -- ^root -> (FilePath -- current directory -> [FilePath] -- list of files in the directory -> IO a) -- ^visit function (current dir, list of files) -> IO (Tree a) ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
Duncan Coutts wrote: > > > BTW, one other caveat (which applies to all of the examples so far): > > > doesDirectoryExist doesn't distinguish between directories and > > > symlinks to directories. Consequently, any directory-traversal > > > algorithm which uses doesDirectoryExist to identify directories will > > > behave incorrectly in the presence of symlinks. In the worst case, you > > > can get into an infinite loop. > > > > symlinks aren't necessary to give an infinite loop: you can have > > upwards hard links as well (at least on *nix). You have to keep a > > list of inodes you've already visited (per filesystem, of course). > > I believe hard linked directories are banned to avoid this problem. Most modern Unices disallow the creation of additional hard links to directories, so the only "upward" links are the "." entry and the ".." entries for each subdirectory (which I specifically filtered in my example[1]). [1] Actually, I filtered anything beginning with a "."; I should have pointed that out. To change that, replace: let names' = filter ((/= '.') . head) names with: let names' = filter (`notElem` [".", ".."]) names Unices which do allow the creation of hard links to directories only allow this for root, and programs which perform directory traversal frequently fall down on such cases; tracking visited directories tends to be the exception rather than the norm. The usual solution is to assume that root knows what they're doing (and if they don't, tough) and ignore the issue. OTOH, directory-traversal code which follows symlinks will fall down far more readily, as symlinks to directories can be created on all platforms which allow symlinks, and require no special privileges. [On early Linux distributions, it was quite common for /etc/inet to be a symlink to /etc, as programs often disagreed as to whether certain configuration files belonged in /etc or /etc/inet. Attempting to back-up the Linux filesystem via Samba with a Windows backup tool would result in an "end of tape" error somewhere around /etc/inet/inet/inet/inet/] > This is true on Linux at least, I don't know what POSIX specifies. I don't know about POSIX, but Unix98 says: [http://www.opengroup.org/onlinepubs/009695399/functions/link.html] DESCRIPTION ... If path1 names a directory, link() shall fail unless the process has appropriate privileges and the implementation supports using link() on directories. ... ERRORS ... [EPERM] The file named by path1 is a directory and either the calling process does not have appropriate privileges or the implementation prohibits using link() on directories. ... RATIONALE Linking to a directory is restricted to the superuser in most historical implementations because this capability may produce loops in the file hierarchy or otherwise corrupt the file system. This volume of IEEE Std 1003.1-2001 continues that philosophy by prohibiting link() and unlink() from doing this. Other functions could do it if the implementor designed such an extension. There isn't a macro or sysconf/pathconf option to query whether the platform allows links to directories. -- Glynn Clements <[EMAIL PROTECTED]> ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
On Tue, 2004-06-22 at 21:02, Keith Wansbrough wrote: > > BTW, one other caveat (which applies to all of the examples so far): > > doesDirectoryExist doesn't distinguish between directories and > > symlinks to directories. Consequently, any directory-traversal > > algorithm which uses doesDirectoryExist to identify directories will > > behave incorrectly in the presence of symlinks. In the worst case, you > > can get into an infinite loop. > > symlinks aren't necessary to give an infinite loop: you can have > upwards hard links as well (at least on *nix). You have to keep a > list of inodes you've already visited (per filesystem, of course). I believe hard linked directories are banned to avoid this problem. This is true on Linux at least, I don't know what POSIX specifies. Duncan ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
> BTW, one other caveat (which applies to all of the examples so far): > doesDirectoryExist doesn't distinguish between directories and > symlinks to directories. Consequently, any directory-traversal > algorithm which uses doesDirectoryExist to identify directories will > behave incorrectly in the presence of symlinks. In the worst case, you > can get into an infinite loop. symlinks aren't necessary to give an infinite loop: you can have upwards hard links as well (at least on *nix). You have to keep a list of inodes you've already visited (per filesystem, of course). --KW 8-) -- Keith Wansbrough <[EMAIL PROTECTED]> http://www.cl.cam.ac.uk/users/kw217/ University of Cambridge Computer Laboratory. ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
Tom Hofte wrote: > I'm looking for a way to iteratively read all the files in a directory > and > its subdirectories, given the filepath of the top-level dir. > For example, I want to find a file, corresponding to a given filename, > in a directory and its subdirectories. > > Is there a way to implement this in Haskell? This is somewhat simpler than the other examples which have been given. > import Monad > import Directory > > scanDir :: FilePath -> IO [FilePath] > scanDir dir = do > names <- getDirectoryContents dir > let names' = filter ((/= '.') . head) names > let paths = map ((dir ++ "/") ++) names' > dirs <- filterM doesDirectoryExist paths > files <- filterM doesFileExist paths > rest <- mapM scanDir dirs > return $ files ++ concat rest One caveat: (scanDir "/") isn't handled correctly; you will get an extra slash, i.e. "//usr", "//bin" etc. Not that I'd recommend using this code for scanning an entire filesystem, due to performance issues. -- Glynn Clements <[EMAIL PROTECTED]> ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
Tom Hofte wrote: > I'm looking for a way to iteratively read all the files in a directory > and > its subdirectories, given the filepath of the top-level dir. > For example, I want to find a file, corresponding to a given filename, > in a directory and its subdirectories. > > Is there a way to implement this in Haskell? BTW, one other caveat (which applies to all of the examples so far): doesDirectoryExist doesn't distinguish between directories and symlinks to directories. Consequently, any directory-traversal algorithm which uses doesDirectoryExist to identify directories will behave incorrectly in the presence of symlinks. In the worst case, you can get into an infinite loop. The only alternative is to use the functions from the Posix library (getSymbolicLinkStatus and isDirectory) instead, e.g.: > import Posix > > doesDirectoryReallyExist :: FilePath -> IO Bool > doesDirectoryReallyExist path = do > stat <- getSymbolicLinkStatus > return $ isDirectory stat -- Glynn Clements <[EMAIL PROTECTED]> ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
Hi Tom, Attached is a haskell file I wrote when I was learning how to use the directory functions. It builds a tree structure corresponding to the directory tree and finds the files that end in ".txt". It then sorted the files in order of modification time. As you can guess from the program, it was for managing weblog entries in Blosxom. It's beginner-ish code, but you can probably adapt it to your needs. Note that the top level directory name "dir" is hardcoded into the program. Best Wishes, Greg On Jun 22, 2004, at 5:20 AM, Tom Hofte wrote: Hi, I'm looking for a way to iteratively read all the files in a directory and its subdirectories, given the filepath of the top-level dir. For example, I want to find a file, corresponding to a given filename, in a directory and its subdirectories. Is there a way to implement this in Haskell? Kind regards, Tom Hofte ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell dirtree.hs: -- -- walk a directory tree and find all of the files ending in .txt -- module Main (main) where import Monad import Directory import Text.Regex import List -- -- dirTree returns a tree with nodes containing file -- information or directory information and a subdirectory. -- data DTree = FileNode String | DirNode String [DTree] addPrefix d str = do return (d ++ str) addSuffix d str = do return (str ++ d) scanDir d = do dentries <- getDirectoryContents d files <- mapM (addPrefix d) dentries ffiles <- filterM doesFileExist files -- directory names require extra processing: we must delete the "." and ".." entries -- and add a trailing "/" dirs<- filterM (\x -> do return (x /= "." && x /= "..")) dentries ddirs <- mapM (addPrefix d) dirs dddirs <- mapM (addSuffix "/") ddirs subdirs <- filterM doesDirectoryExist dddirs subDTrees <- mapM scanDir subdirs return (DirNode d ((map (\f -> FileNode f) ffiles) ++ subDTrees)) -- walk a directory tree, printing the contents showDTree (FileNode fname) = do print fname showDTree (DirNode dname ds) = do print dname mapM showDTree ds return () -- given a directory tree, find the files that end in ".txt" findTxt (DirNode dname ds) = concat (map findTxt ds) findTxt (FileNode fname) = if (isTextFile fname) then [fname] else [] -- given a file name, see if it ends in ".txt" isTextFile f = (matchRegex regexp f) /= Nothing where regexp = mkRegex ".*txt" printIfMatched f = if (isTextFile f) then do print f else do return () dir = "/Users/gwright/Desktop/Blosxom/docs/" timeSortedFiles d = do directoryTree <- scanDir d showDTree directoryTree textFiles <- do return (findTxt directoryTree) modTimes <- mapM getModificationTime textFiles fs <- do return (sortBy (\(f1,t1) (f2,t2) -> compare t1 t2) (zip textFiles modTimes)) return (map fst fs) main = do fs <- timeSortedFiles dir fs' <- do return (take 1 fs) mapM (\f -> do c <- readFile f; putStr c) fs' return () ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
> Hi, > > I'm looking for a way to iteratively read all the files in a directory and > its subdirectories, given the filepath of the top-level dir. > For example, I want to find a file, corresponding to a given filename, in > a directory and its subdirectories. > > Is there a way to implement this in Haskell? > Just a supplement to my previous message: you can find better documentation of the Directory library here: http://www.haskell.org/ghc/docs/latest/html/libraries/base/System.Directory.html John Hughes ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
Re: [Haskell] Reading a directory tree
> Hi, > > I'm looking for a way to iteratively read all the files in a directory and > its subdirectories, given the filepath of the top-level dir. > For example, I want to find a file, corresponding to a given filename, in > a directory and its subdirectories. > getDirectoryContents is your friend. It's a function in the standard library Directory, documented here: http://haskell.org/onlinereport/directory.html getDirectoryContents :: FilePath -> IO [FilePath] A FilePath is just a String. John Hughes ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
[Haskell] Reading a directory tree
Hi, I'm looking for a way to iteratively read all the files in a directory and its subdirectories, given the filepath of the top-level dir. For example, I want to find a file, corresponding to a given filename, in a directory and its subdirectories. Is there a way to implement this in Haskell? Kind regards, Tom Hofte ___ Haskell mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell