Send Beginners mailing list submissions to
	beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
	beginners-requ...@haskell.org

You can reach the person managing the list at
	beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1.  space leak processing multiple compressed files (Ian Knopke)
   2.  monads / do syntax (Christopher Howard)
   3. Re:  monads / do syntax (Ozgur Akgun)
   4. Re:  space leak processing multiple compressed files (Lorenzo Bolla)
   5. Re:  space leak processing multiple compressed files (Ian Knopke)
   6. Re:  space leak processing multiple compressed files (Benjamin Edwards)
   7. Re:  space leak processing multiple compressed files (Michael Orlitzky)


----------------------------------------------------------------------

Message: 1
Date: Tue, 4 Sep 2012 11:00:48 +0100
From: Ian Knopke <ian.kno...@gmail.com>
Subject: [Haskell-beginners] space leak processing multiple compressed files
To: beginners@haskell.org
Message-ID: <CAC+f4w=PL_8CbaqjGtPy_bEHOUnCRfLUCDzC6a=6+0tzdtk...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi everyone,

I have a collection of bzipped files. Each file has a different number
of items per line, with a separator between them. What I want to do is
count the items in each file. I'm trying to read the files lazily but
I seem to be running out of memory. I'm assuming I'm holding onto
resources longer than I need to. Does anyone have any advice on how to
improve this?

Here's the basic program, slightly sanitized:

main = do

    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- process each compressed file
    files <- mapM (\x -> do
                            thisfile <- B.readFile x
                            return (Z.decompress thisfile)
                    ) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- processEntries is defined elsewhere, but basically does some string
-- processing per line, counts the number of resulting elements and
-- sums them per file
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs

-- display a field that returns a number
display :: Int -> IO ()
display = putStrLn . show


------------------------------

Message: 2
Date: Tue, 04 Sep 2012 04:03:28 -0800
From: Christopher Howard <christopher.how...@frigidcode.com>
Subject: [Haskell-beginners] monads / do syntax
To: Haskell Beginners <beginners@haskell.org>
Message-ID: <5045ee10.3040...@frigidcode.com>
Content-Type: text/plain; charset="iso-8859-1"

What does the following do expression translate into? (I.e., using >>=
operator and lambda functions.)

code:
--------
h = do a <- (return 1 :: IO Integer)
       b <- (return 2)
       return (a + b + 1)
--------

-- 
frigidcode.com
indicium.us

------------------------------

Message: 3
Date: Tue, 4 Sep 2012 13:10:18 +0100
From: Ozgur Akgun <ozgurak...@gmail.com>
Subject: Re: [Haskell-beginners] monads / do syntax
To: Christopher Howard <christopher.how...@frigidcode.com>
Cc: Haskell Beginners <beginners@haskell.org>
Message-ID: <calzazpdcx_cgg1xkoezprbppb9xf7vhh5-qsu9w0owdueki...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi,

On 4 September 2012 13:03, Christopher Howard <christopher.how...@frigidcode.com> wrote:
> h = do a <- (return 1 :: IO Integer)
>        b <- (return 2)
>        return (a + b + 1)
>

h = (return 1 :: IO Integer) >>= \ a ->
    (return 2)               >>= \ b ->
    return (a + b + 1)

HTH,
Ozgur

------------------------------

Message: 4
Date: Tue, 4 Sep 2012 13:55:38 +0100
From: Lorenzo Bolla <lbo...@gmail.com>
Subject: Re: [Haskell-beginners] space leak processing multiple compressed files
To: Ian Knopke <ian.kno...@gmail.com>
Cc: beginners@haskell.org
Message-ID: <cadjgtry+rh+nkon4jwnmtdt_b5nmypw-kqpkuwfdxscerxn...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Tue, Sep 4, 2012 at 11:00 AM, Ian Knopke <ian.kno...@gmail.com> wrote:
> processEntries :: [B.ByteString] -> Int
> processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs

The problem seems to be your `processEntries` function: it is
recursively defined, and as far as I understand, it's never going to
end because "y" (inside the lambda function) is always going to be the
full list of files (xs).

Probably, `processEntries` should be something like:

processEntries = foldl' (\acc fileContent -> acc + processFileContent fileContent) 0

processFileContent :: B.ByteString -> Int
processFileContent = -- count what you have to, in a file

In fact, processEntries could be rewritten without using foldl':
processEntries = sum . map processFileContent

hth,
L.

------------------------------

Message: 5
Date: Tue, 4 Sep 2012 14:34:13 +0100
From: Ian Knopke <ian.kno...@gmail.com>
Subject: Re: [Haskell-beginners] space leak processing multiple compressed files
To: Lorenzo Bolla <lbo...@gmail.com>
Cc: beginners@haskell.org
Message-ID: <CAC+f4wnnE6f0iUbbYekjZAtdvtV8uWipfC-J1D_4fNX=t8c...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi Lorenzo,

You're correct. Well spotted! I must have created that doing some copy
and paste. The program is basically as you suggested it. Here's a
corrected version:

main = do

    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- process each compressed file
    files <- mapM (\x -> do
                            thisfile <- B.readFile x
                            return (Z.decompress thisfile)
                    ) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- processEntries is defined elsewhere, but basically does some string
-- processing per line, counts the number of resulting elements and
-- sums them per file
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs

I'm still running into memory issues though. I think it's the mapM
loop above and that each file is not being released after reading
through it. Does that seem reasonable, and is there any way to write
this better?

Ian

...
and countItems uses foldl'

------------------------------

Message: 6
Date: Tue, 4 Sep 2012 14:38:52 +0100
From: Benjamin Edwards <edwards.b...@gmail.com>
Subject: Re: [Haskell-beginners] space leak processing multiple compressed files
To: Ian Knopke <ian.kno...@gmail.com>
Cc: haskellbeginners <beginners@haskell.org>
Message-ID: <CAN6k4nh-L7xQovAXmj0MbW=mtr1gtxunqfkxgcdje83375b...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

You might want to look at conduits if you need deterministic and prompt
finalisation. I would sketch out a solution but I have only my phone.
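
Short of a full conduit pipeline, the idea of finishing with each file before touching the next one can be sketched with plain lazy ByteStrings. This is only a sketch under assumptions the thread does not confirm: `countItems` here is a hypothetical stand-in (the real one is never shown), the separator argument is invented for illustration, and `decode` is a parameter standing in for `Z.decompress` so that the example does not depend on a particular compression library. Whether this shape removes the leak in the original program is not established in the thread.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Control.Exception (evaluate)
import Control.Monad (foldM)
import Data.List (foldl')

-- Hypothetical stand-in for the thread's countItems: counts the
-- separator-delimited fields on every line with a strict left fold.
countItems :: Char -> [BL.ByteString] -> Int
countItems sep = foldl' (\acc line -> acc + length (BL.split sep line)) 0

-- Read one file, decode it, and force its count before returning.
-- 'decode' is a parameter (an assumption of this sketch); the thread's
-- program would pass Z.decompress here.
countFile :: (BL.ByteString -> BL.ByteString) -> Char -> FilePath -> IO Int
countFile decode sep path = do
    contents <- BL.readFile path
    evaluate (countItems sep (BL.lines (decode contents)))

-- Visit the files one at a time, carrying only a running Int total
-- from one file to the next.
countAll :: (BL.ByteString -> BL.ByteString) -> Char -> [FilePath] -> IO Int
countAll decode sep = foldM step 0
  where
    step acc path = do
        n <- countFile decode sep path
        return $! acc + n
```

In GHCi, `countAll id ',' ["a.txt", "b.txt"]` would count comma-separated items in two uncompressed files; for bzipped input, the `Z.decompress` of the original program would be passed in place of `id`. Because each count is forced with `evaluate` before the next `BL.readFile`, nothing but an `Int` needs to survive from one file to the next, which is the property the `mapM` version does not make explicit.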

------------------------------

Message: 7
Date: Tue, 04 Sep 2012 10:20:34 -0400
From: Michael Orlitzky <mich...@orlitzky.com>
Subject: Re: [Haskell-beginners] space leak processing multiple compressed files
To: beginners@haskell.org
Message-ID: <50460e32.1030...@orlitzky.com>
Content-Type: text/plain; charset=ISO-8859-1

On 09/04/2012 06:00 AM, Ian Knopke wrote:
> -- display a field that returns a number
> display :: Int -> IO ()
> display = putStrLn . show

This is just 'print', specialized to Int:

  Prelude> :t print
  print :: Show a => a -> IO ()

------------------------------

End of Beginners Digest, Vol 51, Issue 6
****************************************