Re: [Haskell] Reading a directory tree

2004-06-29 Thread David Brown
On Mon, Jun 28, 2004 at 11:08:38AM -0400, Isaac Jones wrote:

> My basic requirement is that they should be functions useful for
> system administration.
> 
> We've sucked in some functions in from GHC and from Cabal[1]. It would
> be nice if we could just have a dumping ground for related functions
> that people have written.

Functions that recursively iterate directories should _never_ follow
symlinks as if they were directories (possibly optionally).  It is
rarely what is wanted.

To fix this, use getSymbolicLinkStatus instead of getFileStatus.

For directory iteration, the called function will usually need to know
at least some information about each file.  Since you have to get a
FileStatus for each file anyway, you might as well pass it to each
child.

Also, it is important to handle errors that happen during iteration.  If
you try statting, or reading something incorrect, you can cause
exceptions, which will need to be handled.

SlurpDirectory has some useful code (in darcs), but is fairly specific
to darcs.  It is actually fairly challenging to come up with useful
directory iteration that is generic.  What about filtering?

Dave Brown
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-28 Thread Isaac Jones
[Followups to [EMAIL PROTECTED]

This thread makes me want to mention that I'm starting to put together
a little library of useful path-based functions (I started it based on
OS.Path in Python).  Not sure how to organize it, exactly, since some
things that might make sense here are already in System.Directory (and
some things in there don't make sense in there) and some things may
require Posix, and so should be split off into a separate module.

My basic requirement is that they should be functions useful for
system administration.

We've sucked in some functions in from GHC and from Cabal[1]. It would
be nice if we could just have a dumping ground for related functions
that people have written.

For now, if you have darcs[2], you can say:
darcs get http://www.syntaxpolice.org/darcs_repos/OS.Path/

(And if you don't have darcs, you can browse around inside there with
a web browser to get an idea of how it's shaping up.)

That'll download the tree.  You can make changes and additions, then
use "darcs send" to email the changes to me.  I think we should just
dump together a bunch of these functions and worry about organizing it
later (wiki-style).

Here's a function[3] I wrote that's similar to what's mentioned here.
It builds a tree out of a file directory.  I'd like a function like
"walk"[4] that walks this tree and executes an IO action (like
printing out all the .txt files or something).  It doesn't have to be
based on Tree.

peace,

  isaac



[1] http://www.haskell.org/cabal

[2] http://abridgegame.org/darcs/

[3]
-- |Create a tree out of the given starting point.  If the starting
-- point is a directory, we recurse down and give the entire
-- sub-structure, otherwise, we return a single node.
-- I think it only requires Posix to test whether this is a directory,
-- maybe we can do this with System.Directory?

makeDirectoryTree :: FilePath -- ^Starting point (file or directory)
  -> IO (Tree (FilePath, Bool))
makeDirectoryTree path'
= makeDirTree' Nothing path'
  where
  makeDirTree' :: Maybe FilePath -- ^Parent Dir
   -> FilePath -- ^Starting point
   -> IO (Tree (FilePath, Bool))
  makeDirTree' parentIn pathIn
  = do let fullPath = case parentIn of
   Just pi' -> (dropLast pi' pathSeparator) ++ 
[pathSeparator] ++ pathIn
   Nothing -> pathIn
   isDir <- pathIsDirectory fullPath
   contents <- if isDir
then getDirectoryContents fullPath
else return []
   subForest <- mapM (makeDirTree' (Just fullPath)) [c | c <- contents,
   c /= ".",
   c /= ['.', 
pathSeparator] ,
   c /= ['.', '.', 
pathSeparator],
   c /= ".."]
   return $ Node (pathIn, isDir) subForest

  -- FIX: probably better not to use "error" here, but rather let exception occur.
  pathIsDirectory :: FilePath -> IO Bool
  pathIsDirectory p = do existsP <- doesFileExist p
 existsP2 <- doesDirectoryExist p
 when (not (existsP || existsP2))
  (error $ "File does not exist: " ++ show p)
 status <- getFileStatus p
 return $ isDirectory status

  dropLast [] _ = []
  dropLast (h:[]) toDrop | h == toDrop = []
 | otherwise = [h]
  dropLast (h:t) toDrop = h:(dropLast t toDrop)

[4]
-- |Apply the given function in each directory starting at the
-- root. use 'makeDirectoryTree'? FIX: how do we handle symlinks?
walk :: FilePath -- ^root
 -> (FilePath  -- current directory
 -> [FilePath] -- list of files in the directory
 -> IO a) -- ^visit function (current dir, list of files)
 -> IO (Tree a)

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Glynn Clements

Duncan Coutts wrote:

> > > BTW, one other caveat (which applies to all of the examples so far):
> > > doesDirectoryExist doesn't distinguish between directories and
> > > symlinks to directories. Consequently, any directory-traversal
> > > algorithm which uses doesDirectoryExist to identify directories will
> > > behave incorrectly in the presence of symlinks. In the worst case, you
> > > can get into an infinite loop.
> > 
> > symlinks aren't necessary to give an infinite loop: you can have
> > upwards hard links as well (at least on *nix).  You have to keep a
> > list of inodes you've already visited (per filesystem, of course).
> 
> I believe hard linked directories are banned to avoid this problem.

Most modern Unices disallow the creation of additional hard links to
directories, so the only "upward" links are the "." entry and the ".." 
entries for each subdirectory (which I specifically filtered in my
example[1]).

[1] Actually, I filtered anything beginning with a "."; I should have
pointed that out. To change that, replace:
let names' = filter ((/= '.') . head) names
with:
let names' = filter (`notElem` [".", ".."]) names

Unices which do allow the creation of hard links to directories only
allow this for root, and programs which perform directory traversal
frequently fall down on such cases; tracking visited directories tends
to be the exception rather than the norm. The usual solution is to
assume that root knows what they're doing (and if they don't, tough)
and ignore the issue.

OTOH, directory-traversal code which follows symlinks will fall down
far more readily, as symlinks to directories can be created on all
platforms which allow symlinks, and require no special privileges.

[On early Linux distributions, it was quite common for /etc/inet to be
a symlink to /etc, as programs often disagreed as to whether certain
configuration files belonged in /etc or /etc/inet. Attempting to
back-up the Linux filesystem via Samba with a Windows backup tool
would result in an "end of tape" error somewhere around
/etc/inet/inet/inet/inet/]

> This is true on Linux at least, I don't know what POSIX specifies.

I don't know about POSIX, but Unix98 says:

[http://www.opengroup.org/onlinepubs/009695399/functions/link.html]

  DESCRIPTION
  ...
If path1 names a directory, link() shall fail unless the process has
appropriate privileges and the implementation supports using link()
on directories.
  ...
  ERRORS
  ...
[EPERM]
The file named by path1 is a directory and either the calling
process does not have appropriate privileges or the
implementation prohibits using link() on directories.
  ...
  RATIONALE

Linking to a directory is restricted to the superuser in most
historical implementations because this capability may produce
loops in the file hierarchy or otherwise corrupt the file system. 
This volume of IEEE Std 1003.1-2001 continues that philosophy by
prohibiting link() and unlink() from doing this. Other functions
could do it if the implementor designed such an extension.

There isn't a macro or sysconf/pathconf option to query whether the
platform allows links to directories.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Duncan Coutts
On Tue, 2004-06-22 at 21:02, Keith Wansbrough wrote:
> > BTW, one other caveat (which applies to all of the examples so far):
> > doesDirectoryExist doesn't distinguish between directories and
> > symlinks to directories. Consequently, any directory-traversal
> > algorithm which uses doesDirectoryExist to identify directories will
> > behave incorrectly in the presence of symlinks. In the worst case, you
> > can get into an infinite loop.
> 
> symlinks aren't necessary to give an infinite loop: you can have
> upwards hard links as well (at least on *nix).  You have to keep a
> list of inodes you've already visited (per filesystem, of course).

I believe hard linked directories are banned to avoid this problem. This
is true on Linux at least, I don't know what POSIX specifies.

Duncan

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Keith Wansbrough

> BTW, one other caveat (which applies to all of the examples so far):
> doesDirectoryExist doesn't distinguish between directories and
> symlinks to directories. Consequently, any directory-traversal
> algorithm which uses doesDirectoryExist to identify directories will
> behave incorrectly in the presence of symlinks. In the worst case, you
> can get into an infinite loop.

symlinks aren't necessary to give an infinite loop: you can have upwards hard links as 
well (at least on *nix).  You have to keep a list of inodes you've already visited 
(per filesystem, of course).

--KW 8-)
-- 
Keith Wansbrough <[EMAIL PROTECTED]>
http://www.cl.cam.ac.uk/users/kw217/
University of Cambridge Computer Laboratory.

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Glynn Clements

Tom Hofte wrote:

> I'm looking for a way to iteratively read all the files in a directory
> and
> its subdirectories, given the filepath of the top-level dir.
> For example, I want to find a file, corresponding to a given filename,
> in a directory and its subdirectories.
> 
> Is there a way to implement this in Haskell?

This is somewhat simpler than the other examples which have been
given.

> import Monad
> import Directory
> 
> scanDir :: FilePath -> IO [FilePath]
> scanDir dir = do
>   names <- getDirectoryContents dir
>   let names' = filter ((/= '.') . head) names
>   let paths = map ((dir ++ "/") ++) names'
>   dirs <- filterM doesDirectoryExist paths
>   files <- filterM doesFileExist paths
>   rest <- mapM scanDir dirs
>   return $ files ++ concat rest

One caveat: (scanDir "/") isn't handled correctly; you will get an
extra slash, i.e. "//usr", "//bin" etc. Not that I'd recommend using
this code for scanning an entire filesystem, due to performance
issues.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Glynn Clements

Tom Hofte wrote:

> I'm looking for a way to iteratively read all the files in a directory
> and
> its subdirectories, given the filepath of the top-level dir.
> For example, I want to find a file, corresponding to a given filename,
> in a directory and its subdirectories.
> 
> Is there a way to implement this in Haskell?

BTW, one other caveat (which applies to all of the examples so far):
doesDirectoryExist doesn't distinguish between directories and
symlinks to directories. Consequently, any directory-traversal
algorithm which uses doesDirectoryExist to identify directories will
behave incorrectly in the presence of symlinks. In the worst case, you
can get into an infinite loop.

The only alternative is to use the functions from the Posix library
(getSymbolicLinkStatus and isDirectory) instead, e.g.:

> import Posix
> 
> doesDirectoryReallyExist :: FilePath -> IO Bool
> doesDirectoryReallyExist path = do
>   stat <- getSymbolicLinkStatus
>   return $ isDirectory stat

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread Gregory Wright

Hi Tom,

Attached is a haskell file I wrote when I was learning how to use the
directory functions. It builds a tree structure corresponding to the directory
tree and finds the files that end in ".txt". It then sorted the files in order
of modification time. As you can guess from the program, it was for managing
weblog entries in Blosxom.

It's beginner-ish code, but you can probably adapt it to your needs.
Note that the top level directory name "dir" is hardcoded into the program.


Best Wishes,
Greg


On Jun 22, 2004, at 5:20 AM, Tom Hofte wrote:

Hi,
 
I'm looking for a way to iteratively read all the files in a directory and
its subdirectories, given the filepath of the top-level dir.
For example, I want to find a file, corresponding to a given filename, in a directory and its subdirectories.
 
Is there a way to implement this in Haskell?
 
Kind regards,
 
Tom Hofte
 
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


dirtree.hs:

--
-- walk a directory tree and find all of the files ending in .txt
--

module Main (main) where

import Monad
import Directory
import Text.Regex
import List

--
-- dirTree returns a tree with nodes containing file
-- information or directory information and a subdirectory.
--

data DTree = FileNode String | DirNode String [DTree]


addPrefix d str = do return (d ++ str)
addSuffix d str = do return (str ++ d)

scanDir d = do
dentries <- getDirectoryContents d

files  <- mapM (addPrefix d) dentries
ffiles <- filterM doesFileExist files

-- directory names require extra processing: we must delete the "." and ".." entries
-- and add a trailing "/"

dirs<- filterM (\x -> do return (x /= "." && x /= "..")) dentries
ddirs   <- mapM (addPrefix d) dirs
dddirs  <- mapM (addSuffix "/") ddirs
subdirs <- filterM doesDirectoryExist dddirs

subDTrees <- mapM scanDir subdirs

return (DirNode d ((map (\f -> FileNode f) ffiles) ++ subDTrees))


-- walk a directory tree, printing the contents

showDTree (FileNode fname) = do print fname
showDTree (DirNode  dname ds) = do
print dname
mapM showDTree ds
return ()

-- given a directory tree, find the files that end in ".txt"

findTxt (DirNode dname ds) = concat (map findTxt ds)
findTxt (FileNode fname)   = if (isTextFile fname) then [fname] else []

-- given a file name, see if it ends in ".txt"

isTextFile f = (matchRegex regexp f) /= Nothing where regexp = mkRegex ".*txt"
printIfMatched f = if (isTextFile f) then do print f else do return ()


dir = "/Users/gwright/Desktop/Blosxom/docs/"

timeSortedFiles d = do
directoryTree <- scanDir d
showDTree directoryTree
textFiles <- do return (findTxt directoryTree)
modTimes  <- mapM getModificationTime textFiles

fs  <- do return (sortBy (\(f1,t1) (f2,t2) -> compare t1 t2) (zip textFiles modTimes))
return (map fst fs)


main = do
fs  <- timeSortedFiles dir
fs' <- do return (take 1 fs)
mapM (\f -> do c <- readFile f; putStr c) fs'
return ()


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread John Hughes
> Hi,
>
> I'm looking for a way to iteratively read all the files in a directory and
> its subdirectories, given the filepath of the top-level dir.
> For example, I want to find a file, corresponding to a given filename, in
> a directory and its subdirectories.
>
> Is there a way to implement this in Haskell?
>

Just a supplement to my previous message: you can find better
documentation of the Directory library here:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/System.Directory.html

John Hughes
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: [Haskell] Reading a directory tree

2004-06-22 Thread John Hughes
> Hi,
>
> I'm looking for a way to iteratively read all the files in a directory and
> its subdirectories, given the filepath of the top-level dir.
> For example, I want to find a file, corresponding to a given filename, in
> a directory and its subdirectories.
>

getDirectoryContents is your friend. It's a function in the standard
library Directory, documented here:
http://haskell.org/onlinereport/directory.html

getDirectoryContents  :: FilePath -> IO [FilePath]

A FilePath is just a String.

John Hughes
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


[Haskell] Reading a directory tree

2004-06-22 Thread Tom Hofte



Hi,
 
I'm looking for a way to iteratively read all the 
files in a directory and
its subdirectories, given the filepath of the 
top-level dir.
For example, I want to find a file, corresponding 
to a given filename, in a directory and its 
subdirectories.
 
Is there a way to implement this in 
Haskell?
 
Kind regards,
 
Tom Hofte
 
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell