Send Beginners mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. space leak processing multiple compressed files (Ian Knopke)
2. monads / do syntax (Christopher Howard)
3. Re: monads / do syntax (Ozgur Akgun)
4. Re: space leak processing multiple compressed files
(Lorenzo Bolla)
5. Re: space leak processing multiple compressed files (Ian Knopke)
6. Re: space leak processing multiple compressed files
(Benjamin Edwards)
7. Re: space leak processing multiple compressed files
(Michael Orlitzky)
----------------------------------------------------------------------
Message: 1
Date: Tue, 4 Sep 2012 11:00:48 +0100
From: Ian Knopke <[email protected]>
Subject: [Haskell-beginners] space leak processing multiple compressed
files
To: [email protected]
Message-ID:
<CAC+f4w=PL_8CbaqjGtPy_bEHOUnCRfLUCDzC6a=6+0tzdtk...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi everyone,
I have a collection of bzipped files. Each file has a different number
of items per line, with a separator between them. What I want to do is
count the items in each file. I'm trying to read the files lazily but
I seem to be running out of memory. I'm assuming I'm holding onto
resources longer than I need to. Does anyone have any advice on how to
improve this?
Here's the basic program, slightly sanitized:
main = do
    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- process each compressed file
    files <- mapM (\x -> do
                     thisfile <- B.readFile x
                     return (Z.decompress thisfile)
                  ) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- processEntries is defined elsewhere, but basically does some string
-- processing per line,
-- counts the number of resulting elements and sums them per file
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs

-- display a field that returns a number
display :: Int -> IO ()
display = putStrLn . show
------------------------------
Message: 2
Date: Tue, 04 Sep 2012 04:03:28 -0800
From: Christopher Howard <[email protected]>
Subject: [Haskell-beginners] monads / do syntax
To: Haskell Beginners <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"
What does the following do expression translate into? (I.e., using >>=
operator and lambda functions.)
code:
--------
h = do a <- (return 1 :: IO Integer)
       b <- (return 2)
       return (a + b + 1)
--------
--
frigidcode.com
indicium.us
------------------------------
Message: 3
Date: Tue, 4 Sep 2012 13:10:18 +0100
From: Ozgur Akgun <[email protected]>
Subject: Re: [Haskell-beginners] monads / do syntax
To: Christopher Howard <[email protected]>
Cc: Haskell Beginners <[email protected]>
Message-ID:
<calzazpdcx_cgg1xkoezprbppb9xf7vhh5-qsu9w0owdueki...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
On 4 September 2012 13:03, Christopher Howard <
[email protected]> wrote:
> h = do a <- (return 1 :: IO Integer)
>        b <- (return 2)
>        return (a + b + 1)
>
h =
  (return 1 :: IO Integer) >>= \ a ->
  (return 2) >>= \ b ->
  return (a + b + 1)
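As a quick check that the two forms agree, here is a minimal sketch that defines both side by side. The names hDo and hBind are made up for this illustration and are not from the original question.

```haskell
-- The do-block from the question, under a hypothetical name.
hDo :: IO Integer
hDo = do
  a <- (return 1 :: IO Integer)
  b <- return 2
  return (a + b + 1)

-- The same computation written with >>= and lambdas.
-- Each lambda body extends as far to the right as possible,
-- so 'a' is still in scope in the final expression.
hBind :: IO Integer
hBind =
  (return 1 :: IO Integer) >>= \a ->
  return 2 >>= \b ->
  return (a + b + 1)
```

Running either action in GHCi should give the same result, since the do-block is only syntactic sugar for the >>= chain.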
HTH,
Ozgur
------------------------------
Message: 4
Date: Tue, 4 Sep 2012 13:55:38 +0100
From: Lorenzo Bolla <[email protected]>
Subject: Re: [Haskell-beginners] space leak processing multiple
compressed files
To: Ian Knopke <[email protected]>
Cc: [email protected]
Message-ID:
<cadjgtry+rh+nkon4jwnmtdt_b5nmypw-kqpkuwfdxscerxn...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
On Tue, Sep 4, 2012 at 11:00 AM, Ian Knopke <[email protected]> wrote:
> main = do
>
>     -- get a list of file names
>     filelist <- getFileList "testsetdir"
>
>     -- process each compressed file
>     files <- mapM (\x -> do
>                      thisfile <- B.readFile x
>                      return (Z.decompress thisfile)
>                   ) filelist
>
>     display $ processEntries files
>
>     putStrLn "finished"
>
> -- processEntries
> -- processEntries is defined elsewhere, but basically does some string
> -- processing per line,
> -- counts the number of resulting elements and sums them per file
> processEntries :: [B.ByteString] -> Int
> processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs
The problem seems to be your `processEntries` function: it is
recursively defined, and as far as I understand, it's never going to
end. The lambda calls `processEntries` again on `B.lines y`, and once
"y" is a single non-empty line, `B.lines y` is just that same line, so
the recursion never reaches a base case.
Probably, `processEntries` should be something like:
processEntries = foldl' (\acc fileContent -> acc + processFileContent fileContent) 0

processFileContent :: B.ByteString -> Int
processFileContent = undefined  -- count what you have to, in a file
In fact, processEntries could be rewritten without using foldl':
processEntries = sum . map processFileContent
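To make the suggestion concrete, here is a minimal, compilable sketch. It assumes that B is Data.ByteString.Lazy.Char8 and that items on a line are separated by commas; the original post does not say what the separator is, so the body of processFileContent is only a placeholder for the real per-line processing.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (foldl')

-- Hypothetical per-file counter: assumes ',' is the item separator.
-- Counts the items on each line and sums them with a strict fold.
processFileContent :: B.ByteString -> Int
processFileContent =
  foldl' (\acc line -> acc + length (B.split ',' line)) 0 . B.lines

-- Non-recursive version: sums the per-file counts over all files.
processEntries :: [B.ByteString] -> Int
processEntries =
  foldl' (\acc fileContent -> acc + processFileContent fileContent) 0
```

The foldl' version keeps the running total evaluated at each step; the sum . map version is shorter, but whether it builds up thunks can depend on the compiler version and optimisation level.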
hth,
L.
------------------------------
Message: 5
Date: Tue, 4 Sep 2012 14:34:13 +0100
From: Ian Knopke <[email protected]>
Subject: Re: [Haskell-beginners] space leak processing multiple
compressed files
To: Lorenzo Bolla <[email protected]>
Cc: [email protected]
Message-ID:
<CAC+f4wnnE6f0iUbbYekjZAtdvtV8uWipfC-J1D_4fNX=t8c...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi Lorenzo,
You're correct. Well spotted! I must have created that doing some copy
and paste. The program is basically as you suggested it. Here's a
corrected version:
main = do
    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- process each compressed file
    files <- mapM (\x -> do
                     thisfile <- B.readFile x
                     return (Z.decompress thisfile)
                  ) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- processEntries is defined elsewhere, but basically does some string
-- processing per line, counts the number of resulting elements and
-- sums them per file
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs
I'm still running into memory issues though. I think it's the mapM
loop above and that each file is not being released after reading
through it. Does that seem reasonable, and is there any way to write
this better?
Ian
... and countItems uses foldl'
------------------------------
Message: 6
Date: Tue, 4 Sep 2012 14:38:52 +0100
From: Benjamin Edwards <[email protected]>
Subject: Re: [Haskell-beginners] space leak processing multiple
compressed files
To: Ian Knopke <[email protected]>
Cc: haskellbeginners <[email protected]>
Message-ID:
<CAN6k4nh-L7xQovAXmj0MbW=mtr1gtxunqfkxgcdje83375b...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
You might want to look at conduits if you need deterministic and prompt
finalisation. I would sketch out a solution but I have only my phone.
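Short of conduits, one simpler thing to try is to fully process each file before opening the next one, instead of collecting all the lazy contents with mapM first. The following is only a sketch of that idea and is not a conduit solution. The decompression function is passed in as a parameter (in the original program it would be Z.decompress), and countItems is a hypothetical stand-in that assumes comma-separated items.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad (foldM)
import Data.List (foldl')

-- Hypothetical stand-in for the poster's countItems:
-- assumes ',' separates the items on each line.
countItems :: [B.ByteString] -> Int
countItems = foldl' (\n line -> n + length (B.split ',' line)) 0

-- Read, decompress and count one file at a time.
-- '$!' forces the running total, which forces the count for the
-- current file, so its contents are consumed before the next
-- file is opened.
countFiles :: (B.ByteString -> B.ByteString) -> [FilePath] -> IO Int
countFiles decompress = foldM step 0
  where
    step acc path = do
      contents <- B.readFile path
      let n = countItems (B.lines (decompress contents))
      return $! acc + n
```

With this shape, main would call something like countFiles Z.decompress filelist and print the result. Whether this removes the leak depends on what countItems really does, so profiling the actual program is still worthwhile.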
------------------------------
Message: 7
Date: Tue, 04 Sep 2012 10:20:34 -0400
From: Michael Orlitzky <[email protected]>
Subject: Re: [Haskell-beginners] space leak processing multiple
compressed files
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1
On 09/04/2012 06:00 AM, Ian Knopke wrote:
> -- display a field that returns a number
> display :: Int -> IO ()
> display = putStrLn . show
This is just 'print', specialized to Int:
Prelude> :t print
print :: Show a => a -> IO ()
------------------------------
_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners
End of Beginners Digest, Vol 51, Issue 6
****************************************