Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
> > [trigger garbage collection when open runs out of free file
> > descriptors, then try again]
> >
> > so, instead of documenting limitations and workarounds, this issue
> > should be fixed in GHC as well.
>
> This may help in some cases but it cannot be relied upon. Finalizers
> are always run in a separate thread (must be, see
> http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html). Thus, even
> if you force a GC when handles are exhausted, as hugs seems to do,
> there is no guarantee that by the time the GC is done the finalizers
> have freed any handles (assuming that the GC run really detects any
> handles to be garbage).

useful reference to collect!-) but even that mentions giving back os
resources such as file descriptors as one of the simpler cases. running
the GC/finalizers sequence repeatedly until nothing more changes might
be worth thinking about, as are possible race conditions. here is the
thread the paper is referring to as one of its origins:

    http://gcc.gnu.org/ml/java/2001-12/msg00113.html
    http://gcc.gnu.org/ml/java/2001-12/msg00390.html

i also like the idea mentioned as one of the alternatives in 3.1, where
the finalizer does not notify the object that is to become garbage, but
a different manager object. in this case, one might notify the i/o
handler, and that could take care of avoiding trouble.

in my opinion, if my code or my finalizers hold on to resources i'd like
to see freed, then i'm responsible, even if i might need language help
to remedy the situation. but if i take care to avoid such references,
and the system still runs out of resources just because it can't be
bothered to check right now whether it has some left to free, there is
nothing i can do about it (apart from complaining, that is!-).

of course, this isn't new. see, for instance, this thread view:

    http://groups.google.com/group/fa.haskell/browse_thread/thread/2f1f855c8ba33a5/74d32070dbcc92fc?lnk=st&q=hugs+openFile+file+descriptor+garbage+collection&rnum=1#74d32070dbcc92fc

where Remi Turk points out System.Mem.performGC, and Simon Marlow agrees
that GHC should do more to free file descriptors, but also mentions that
performGC doesn't run finalizers.

actually, if i have readFile-based code that immediately processes the
file contents before the next readFile, as in Matthew's test code, my
ghci (on windows) doesn't seem to run out of file descriptors easily,
but if i force a descriptor leak by leaving unreferenced contents
unprocessed, then performGC does seem to help (not that this is ideal in
general, as discussed in the thread above):

    import System.Environment
    import System.Mem
    import System.IO

    main = do
      n:f:_ <- getArgs
      (sequence (repeat (openFile f ReadMode)) >> return ())
        `catch` (\_->return ())
      test1 (take (read n) $ repeat f)

    test1 files = mapM_ doStuff files
      where doStuff f = {- performGC >> -} readFile f >>= print.map length.take 10.lines

interestingly, if i do that, even Hugs seems to need the performGC?

claus

ps. one could even try to go further, and have virtual file
    descriptors, like virtual memory. but that is something for
    the os, i guess.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
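The "collect garbage, then try again" idea discussed above could be sketched roughly as follows. This is only an illustration, not what GHC or Hugs actually do: openFileRetry is a hypothetical helper, and the sketch assumes that running out of descriptors surfaces as an IOError for which System.IO.Error.isFullError holds. As the quoted reply warns, nothing guarantees that the finalizers have released any handle by the time of the second attempt.

```haskell
import Control.Concurrent (yield)
import Control.Exception (catch, throwIO)
import System.IO
import System.IO.Error (isFullError)
import System.Mem (performGC)

-- Hypothetical helper (not from the thread): open a file and, if the
-- attempt fails with a resource-exhaustion error, force a garbage
-- collection and retry exactly once.
openFileRetry :: FilePath -> IOMode -> IO Handle
openFileRetry path mode =
  openFile path mode `catch` \e ->
    if isFullError e
      then do
        performGC  -- unreachable handles may now be queued for finalization
        yield      -- let finalizer threads run; there is no guarantee they do
        openFile path mode
      else throwIO e
```

The retry is deliberately limited to one attempt; repeating the GC/finalizer sequence until nothing changes, as suggested above, would need a loop and some care about race conditions.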
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier wrote:
> Bryan O'Sullivan <[EMAIL PROTECTED]> writes:
>
>> Pete Kazmier wrote:
>>
>>> I understand the intent of this code, but I am having a hard time
>>> understanding the implementation, specifically the combination of
>>> 'fix', 'flip', and 'iterate'. I looked up 'fix' and I'm unsure how
>>> one can call 'flip' on a function that takes one argument.
>
>> As to why it's okay to call "flip" on "fix" at all, look at the types
>> involved.
>>
>>   fix :: (a -> a) -> a
>>   flip :: (a -> b -> c) -> b -> a -> c
>>
>> By substitution:
>>
>>   flip fix :: a -> ((a -> b) -> a -> b) -> b
>
> Sadly, I'm still confused. I understand how 'flip' works in the case
> where its argument is a function that takes two arguments. I've
> started to use this in my own code lately. But my brain refuses to
> understand how 'flip' is applied to 'fix', a function that takes one
> argument only, which happens to be a function itself. What is 'flip'
> flipping when the function passed to it only takes one argument?

    fix :: (a -> a) -> a

In this case, we know something about 'a': it is a function (b -> c).
Substitute:

    fix :: ((b -> c) -> (b -> c)) -> (b -> c)

Take advantage of the right-associativity of (->):

    fix :: ((b -> c) -> b -> c) -> b -> c

Now it looks like a function of two arguments, because the return value
(normally ordinary data) can in fact, in this case, take arguments.
Here's another example of that:

    data Box a = Box a
    get (Box a) = a

    -- get (Box 1) :: Int
    -- get (Box (\a -> a)) :: Int -> Int
    -- (get (Box (\a -> a))) 1 :: Int
    -- function application is left-associative:
    -- get (Box (\a -> a)) 1 :: Int
    -- flip get 1 (Box (\a -> a)) :: Int

Yes, it sometimes confuses me too.

Isaac
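To make the "return value can take arguments" point concrete, here is a small sketch in which fix is used at exactly the two-argument-looking type from the message above. The names step, factorial and fiveFactorial are made up for this illustration.

```haskell
import Data.Function (fix)

-- One step of factorial; 'self' stands for the recursive call.
step :: (Integer -> Integer) -> Integer -> Integer
step self n = if n <= 1 then 1 else n * self (n - 1)

-- fix used at type ((b -> c) -> b -> c) -> b -> c, with b = c = Integer.
-- Its result is itself a function.
factorial :: Integer -> Integer
factorial = fix step

-- Because the result is a function, fix can be handed a "second
-- argument" directly, just like get (Box (\a -> a)) 1.
fiveFactorial :: Integer
fiveFactorial = fix step 5
```

Here fix step 5 parses as (fix step) 5, which is why fix can be treated as a function of two arguments at this type.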
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Here's what happens:

fix has type (x->x)->x, and that has to match the first argument to
flip, namely 'a->b->c'. The only chance of that is if x is actually a
function type. Pick x=b->c; now fix has type

    ((b->c)->b->c)->b->c

and it matches a->b->c if a=(b->c)->b->c.

flip returns b->a->c, and if we substitute we get

    b->((b->c)->b->c)->c

If you rename the variables you get the suggested type.

  -- Lennart

On Mar 19, 2007, at 20:35, Pete Kazmier wrote:

> Bryan O'Sullivan <[EMAIL PROTECTED]> writes:
>
>> Pete Kazmier wrote:
>>
>>> I understand the intent of this code, but I am having a hard time
>>> understanding the implementation, specifically the combination of
>>> 'fix', 'flip', and 'iterate'. I looked up 'fix' and I'm unsure how
>>> one can call 'flip' on a function that takes one argument.
>>
>> As to why it's okay to call "flip" on "fix" at all, look at the types
>> involved.
>>
>>   fix :: (a -> a) -> a
>>   flip :: (a -> b -> c) -> b -> a -> c
>>
>> By substitution:
>>
>>   flip fix :: a -> ((a -> b) -> a -> b) -> b
>
> Sadly, I'm still confused. I understand how 'flip' works in the case
> where its argument is a function that takes two arguments. I've
> started to use this in my own code lately. But my brain refuses to
> understand how 'flip' is applied to 'fix', a function that takes one
> argument only, which happens to be a function itself. What is 'flip'
> flipping when the function passed to it only takes one argument?
>
> Thanks,
> Pete
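The substitution above can be handed to the type checker, one annotated step at a time. This is a sketch only; fixFn, flipFix and countdown are names invented for the illustration.

```haskell
import Data.Function (fix)

-- Step 1: fix :: (x -> x) -> x with x instantiated to (b -> c).
fixFn :: ((b -> c) -> b -> c) -> b -> c
fixFn = fix

-- Step 2: flip swaps the two arguments, giving b -> a -> c
-- with a = (b -> c) -> b -> c.
flipFix :: b -> ((b -> c) -> b -> c) -> c
flipFix = flip fixFn

-- A small use: the starting value comes first, the recursive body second.
countdown :: Int -> [Int]
countdown n = flipFix n (\self k -> if k < 0 then [] else k : self (k - 1))
```

If either signature were wrong, the module would be rejected at compile time, so the fact that it loads is itself a check of the derivation.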
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Pete Kazmier wrote:

> I understand the intent of this code, but I am having a hard time
> understanding the implementation, specifically the combination of
> 'fix', 'flip', and 'iterate'. I looked up 'fix' and I'm unsure how
> one can call 'flip' on a function that takes one argument.

If you look at the code, that's not really what's happening. See the
embedded anonymous function below?

>   flip fix accum $
>      \iterate accum -> do
>         ...

It's a function of two arguments. All "flip" is doing is switching the
order of the arguments to "fix", in this case for readability.

If you were to get rid of the "flip", you'd need to remove the "accum"
after "fix" and move it after the lambda expression, which would make
the expression much uglier to write and read. So all the "flip" is
doing here is tidying up the code.

(If you're still confused, look at the difference between forM and
mapM. The only reason forM exists is readability when you have - in
terms of the amount of screen space they consume - a big function and a
small piece of data, just as here.)

As to why it's okay to call "flip" on "fix" at all, look at the types
involved.

    fix :: (a -> a) -> a
    flip :: (a -> b -> c) -> b -> a -> c

By substitution:

    flip fix :: a -> ((a -> b) -> a -> b) -> b

In the case above, accum has type a, and the lambda has type
(a -> IO a) -> a -> IO a, and these fit nicely into the type expected by
"flip fix".
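A minimal, self-contained sketch of the same "flip fix accum $ \iterate accum -> do ..." shape, next to the version without flip. The function names and the summing loop are invented for the illustration and are not the code from the original thread.

```haskell
import Data.Function (fix)

-- Sum the integers 1..limit in IO, threading an accumulator through an
-- anonymous recursive loop. The initial state (0, 1) appears up front.
sumTo :: Int -> IO Int
sumTo limit =
  flip fix (0, 1) $ \loop (acc, i) ->
    if i > limit
      then return acc
      else loop (acc + i, i + 1)

-- The same loop without flip: the initial state trails the lambda.
sumTo' :: Int -> IO Int
sumTo' limit =
  fix (\loop (acc, i) ->
         if i > limit
           then return acc
           else loop (acc + i, i + 1))
      (0, 1)
```

Both definitions behave identically; the first simply keeps the small piece of data ahead of the large function body, in the same way forM does relative to mapM.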
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
Quoth Pete Kazmier, nevermore,

> the same error regarding max open files. Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files.

I've tried the approach you appear to be using and it can be tricky to
predict how the laziness will interact with the list of actions. For
example, I tried to download a temporary file, read a bit of data out of
it and then download another one. I thought I would save thinking and
use the same file name for each download: /tmp/feed.xml. What happened
was that it downloaded them all in rapid succession, over-writing each
one with the next and not actually reading the data until the end. So I
ended up parsing N identical copies of the final file, instead of one of
each.

You need to refactor how you map the functions so that fewer whole lists
are passed around. I'd guess that (1) is being executed in its entirety
before being passed to (2), but it's not until (2) that the file data is
actually used.

> main =
>   getArgs >>=
>   mapM fileContentsOfDirectory >>=                        -- (1)
>   mapM_ print . threadEmails . map parseEmail . concat    -- (2)

This means there are a lot of files sitting open doing nothing. I've had
a lot of success by recreating this as:

> main =
>   getArgs >>=
>   mapM_ readAndPrint
>   where readAndPrint dir = fileContentsOfDirectory dir >>= print -- etc.

It may seem semantically identical but it sometimes makes a difference
when things actually happen.

-- 
Dougal Stanton
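The difference between the two shapes can be sketched with plain readFile. The functions below are hypothetical stand-ins, not Pete's fileContentsOfDirectory or parseEmail. The first opens every file before any contents are consumed; the second forces each file's contents before moving on, so its lazy handle reaches end-of-file and is closed before the next file is opened.

```haskell
-- Shape (1)+(2): mapM readFile opens a handle per file up front, and
-- nothing is consumed until the later pure stage runs.
countAllAtOnce :: [FilePath] -> IO [Int]
countAllAtOnce paths = do
  contents <- mapM readFile paths
  return (map (length . lines) contents)

-- Per-file shape: read one file and force the result before returning,
-- so at most one lazily-read handle is open at a time.
countLines :: FilePath -> IO Int
countLines path = do
  contents <- readFile path
  let n = length (lines contents)
  n `seq` return n

countOneAtATime :: [FilePath] -> IO [Int]
countOneAtATime = mapM countLines
```

Both return the same results on a small input; only the number of simultaneously open descriptors differs, which is what matters once the file list is longer than the descriptor limit.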
Re: [Haskell-cafe] Re: Lazy IO and closing of file handles
pete-expires-20070513:
> [EMAIL PROTECTED] (Donald Bruce Stewart) writes:
>
> > pete-expires-20070513:
> >> When using readFile to process a large number of files, I am exceeding
> >> the resource limits for the maximum number of open file descriptors on
> >> my system. How can I enhance my program to deal with this situation
> >> without making significant changes?
> >
> > Read in data strictly, and there are two obvious ways to do that:
> >
> >   -- Via strings:
> >
> >   readFileStrict f = do
> >       s <- readFile f
> >       length s `seq` return s
> >
> >   -- Via ByteStrings
> >   readFileStrict = Data.ByteString.readFile
> >   readFileStrictString = liftM Data.ByteString.Char8.unpack
> >                        . Data.ByteString.readFile
> >
> > If you're reading more than say, 100k of data, I'd use strict
> > ByteStrings without hesitation. More than 10M, and I'd use lazy
> > bytestrings.
>
> Correct me if I'm wrong, but isn't this exactly what I wanted to
> avoid? Reading the entire file into memory? In my previous email, I
> was trying to state that I wanted to lazily read the file because some
> of the files are quite large and there is no reason to read beyond the
> small set of headers. If I read the entire file into memory, this
> design goal is no longer met.
>
> Nevertheless, I was benchmarking with ByteStrings (both lazy and
> strict), and in both cases, the ByteString versions of readFile yield
> the same error regarding max open files. Incidentally, the lazy
> bytestring version of my program was by far the fastest and used the
> least amount of memory, but it still crapped out regarding max open
> files.
>
> So I'm back to square one. Any other ideas?

Hmm. Ok. So we need to have more hClose's happen somehow. Can you
process files one at a time?

-- Don
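One way to get "more hClose's" without reading whole files might look like the sketch below. It is an assumption-laden illustration, not code from the thread: readHeaders is a hypothetical helper, and it assumes the headers are the lines up to the first empty line, as in an email message. withFile closes the handle as soon as the action finishes, and hGetLine reads strictly, so only the header lines are ever read.

```haskell
import System.IO

-- Hypothetical helper: return the lines before the first blank line.
-- The handle is closed by withFile when 'go' returns, regardless of how
-- much of the file remains unread.
readHeaders :: FilePath -> IO [String]
readHeaders path = withFile path ReadMode go
  where
    go h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          line <- hGetLine h
          if null line
            then return []                 -- blank line: end of headers
            else fmap (line :) (go h)      -- keep reading header lines
```

Used as mapM readHeaders over the file list, this keeps at most one descriptor open at a time while never touching the body of a large file.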