Send Beginners mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."
Today's Topics:
1. Re: Space leak while reading from a file? (Bob Ippolito)
2. Re: Space leak while reading from a file? (Christopher Allen)
----------------------------------------------------------------------
Message: 1
Date: Mon, 12 May 2014 11:32:54 -0700
From: Bob Ippolito <[email protected]>
To: The Haskell-Beginners Mailing List - Discussion of primarily
beginner-level topics related to Haskell <[email protected]>
Subject: Re: [Haskell-beginners] Space leak while reading from a file?
Message-ID:
<CACwMPm-dz62Ri5h7Qr=X4MQett65AHT_ZRT_J=ar6cjr7dc...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I don't know why I ended up looking at an old version of Data.Text but the
copy function has been around for a year, since 0.11.3.0.
http://hackage.haskell.org/package/text-1.1.1.2/docs/Data-Text.html#copy ?
you'll want to use it for this use case.
On Mon, May 12, 2014 at 9:54 AM, Bob Ippolito <[email protected]> wrote:
> I haven't looked closely but I suspect if you use foldM to build your Map
> rather than untilM it might consume less memory since you won't be
> allocating this big list.
>
> BUT? the real reason why the memory usage is so much higher than you
> expect is because slicing a Data.Text is an O(1) operation that shares the
> underlying buffer between the original and the slice. Calling T.words will
> ensure that the full original line stays around. You can see this in the
> implementation:
>
> http://hackage.haskell.org/package/text-0.11.2.0/docs/src/Data-Text.html#words
>
> This behavior is documented somewhat near there in the docs, but I think
> it should really be a top-level thing:
> http://hackage.haskell.org/package/text-0.11.2.0/docs/Data-Text.html#g:18
>
> I don't know what function to use to force the array to be copied,
> hopefully there is one! Erlang's binaries work similarly and there is a
> copy function for them for exactly this purpose:
> http://www.erlang.org/doc/man/binary.html
>
> -bob
>
>
>
> On Sun, May 11, 2014 at 4:24 AM, Jan Snajder <[email protected]> wrote:
>
>> Dear all,
>>
>> I'm trying to implement a simple file-based database. I apparently have
>> a space leak, but I have no clue where it comes from.
>>
>> Here's the file-based database implementation:
>> http://pastebin.com/QqiqcXFw
>>
>> The idea to have a database table in a single textual file. One line
>> equals one table row. The fields within a row are whitespace separated.
>> The first field is the key. Because I'd like to work with large files, I
>> don't want to load the whole file into memory. Instead, I'd like to be
>> able to fetch the rows on demand, by keys. Thus I first create an index
>> that links keys to file seeks. I use the readerT to add the index to the
>> IO monad.
>>
>> For testing, I use a dummy table produced as follows:
>>
>> import System.IO
>> import Text.Printf
>> import Control.Monad
>>
>> row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]
>>
>> main = do
>> forM_ [1..250000] $ \i ->
>> putStrLn $ printf "row%06d %s" (i::Int) row
>>
>> This generates a 2.1G textual file, which I store on my disk.
>>
>> The testing code:
>>
>> import FileDB
>> import qualified Data.Text as T
>> import Text.Printf
>> import Control.Applicative
>> import Control.Monad
>> import Control.Monad.Trans
>> import System.IO
>> import System.Environment
>>
>> main = do
>> (f:_) <- getArgs
>> t <- openTable f
>> runDB t $ do
>> ks <- getKeys
>> liftIO $ do
>> putStrLn . printf "%d keys read" $ length ks
>> putStrLn "Press any key to continue..."
>> getChar
>> forM_ ks $ \k -> do
>> Just r <- getRow k
>> liftIO . putStrLn $ printf "Row \"%s\" has %d fields"
>> (T.unpack k) (length r)
>>
>> When I run the test on the 2.1GB file, the whole program consumes 10GB.
>>
>> 6GB seem to be allocated after the index is built (just before entering
>> the forM_ function). The remaining 4GB are allocated while fetching all
>> the rows.
>>
>> I find both things difficult to explain.
>>
>> 6GB seems too much for the index. Each key is 9 characters (stored as
>> Data.Text), and I have 250K such keys in a Data.Map. Should this really
>> add up to 6GB?
>>
>> Also, I have no idea why fetching all the rows, one by one, should
>> consume any additional memory. Each row is fetched and its length is
>> computed and printed out. I see no reason for the rows to be retained in
>> the memory.
>>
>> Here's the memory allocation summary:
>>
>> > 1,093,931,338,632 bytes allocated in the heap
>> > 2,225,144,704 bytes copied during GC
>> > 4,533,898,000 bytes maximum residency (26 sample(s))
>> > 3,080,926,336 bytes maximum slop
>> > 10004 MB total memory in use (0 MB lost due to fragmentation)
>> >
>> > Tot time (elapsed) Avg pause Max
>> pause
>> > Gen 0 2171739 colls, 0 par 45.29s 45.26s 0.0000s
>> 0.0030s
>> > Gen 1 26 colls, 0 par 1.50s 1.53s 0.0589s
>> 0.7087s
>> >
>> > INIT time 0.00s ( 0.00s elapsed)
>> > MUT time 279.92s (284.85s elapsed)
>> > GC time 46.80s ( 46.79s elapsed)
>> > EXIT time 0.68s ( 0.71s elapsed)
>> > Total time 327.40s (332.35s elapsed)
>> >
>> > %GC time 14.3% (14.1% elapsed)
>> >
>> > Alloc rate 3,908,073,170 bytes per MUT second
>> >
>> > Productivity 85.7% of total user, 84.4% of total elapsed
>>
>>
>> Btw., I don't get the "bytes allocated in the heap" figure, which is
>> approx. 1000 GB (?).
>>
>> I'm obviously doing something wrong here. I'd be thankful for any help.
>>
>> Best,
>> Jan
>> _______________________________________________
>> Beginners mailing list
>> [email protected]
>> http://www.haskell.org/mailman/listinfo/beginners
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://www.haskell.org/pipermail/beginners/attachments/20140512/4c0bb747/attachment-0001.html>
------------------------------
Message: 2
Date: Mon, 12 May 2014 13:34:49 -0500
From: Christopher Allen <[email protected]>
To: The Haskell-Beginners Mailing List - Discussion of primarily
beginner-level topics related to Haskell <[email protected]>
Subject: Re: [Haskell-beginners] Space leak while reading from a file?
Message-ID:
<CADnndOpfq7-VLM=jmxku9uqm6lyps3u_qsc7gvypg1oqypy...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
This Chrome plugin helps avoid the Hackage version problem btw:
https://chrome.google.com/webstore/detail/hackage-fu/dnpldbohleinhdgfnhlkofpgkdcfcfmf?hl=en-US
On Mon, May 12, 2014 at 1:32 PM, Bob Ippolito <[email protected]> wrote:
> I don't know why I ended up looking at an old version of Data.Text but the
> copy function has been around for a year, since 0.11.3.0.
> http://hackage.haskell.org/package/text-1.1.1.2/docs/Data-Text.html#copy?
> you'll want to use it for this use case.
>
>
> On Mon, May 12, 2014 at 9:54 AM, Bob Ippolito <[email protected]> wrote:
>
>> I haven't looked closely but I suspect if you use foldM to build your Map
>> rather than untilM it might consume less memory since you won't be
>> allocating this big list.
>>
>> BUT? the real reason why the memory usage is so much higher than you
>> expect is because slicing a Data.Text is an O(1) operation that shares the
>> underlying buffer between the original and the slice. Calling T.words will
>> ensure that the full original line stays around. You can see this in the
>> implementation:
>>
>> http://hackage.haskell.org/package/text-0.11.2.0/docs/src/Data-Text.html#words
>>
>> This behavior is documented somewhat near there in the docs, but I think
>> it should really be a top-level thing:
>> http://hackage.haskell.org/package/text-0.11.2.0/docs/Data-Text.html#g:18
>>
>> I don't know what function to use to force the array to be copied,
>> hopefully there is one! Erlang's binaries work similarly and there is a
>> copy function for them for exactly this purpose:
>> http://www.erlang.org/doc/man/binary.html
>>
>> -bob
>>
>>
>>
>> On Sun, May 11, 2014 at 4:24 AM, Jan Snajder <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> I'm trying to implement a simple file-based database. I apparently have
>>> a space leak, but I have no clue where it comes from.
>>>
>>> Here's the file-based database implementation:
>>> http://pastebin.com/QqiqcXFw
>>>
>>> The idea to have a database table in a single textual file. One line
>>> equals one table row. The fields within a row are whitespace separated.
>>> The first field is the key. Because I'd like to work with large files, I
>>> don't want to load the whole file into memory. Instead, I'd like to be
>>> able to fetch the rows on demand, by keys. Thus I first create an index
>>> that links keys to file seeks. I use the readerT to add the index to the
>>> IO monad.
>>>
>>> For testing, I use a dummy table produced as follows:
>>>
>>> import System.IO
>>> import Text.Printf
>>> import Control.Monad
>>>
>>> row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]
>>>
>>> main = do
>>> forM_ [1..250000] $ \i ->
>>> putStrLn $ printf "row%06d %s" (i::Int) row
>>>
>>> This generates a 2.1G textual file, which I store on my disk.
>>>
>>> The testing code:
>>>
>>> import FileDB
>>> import qualified Data.Text as T
>>> import Text.Printf
>>> import Control.Applicative
>>> import Control.Monad
>>> import Control.Monad.Trans
>>> import System.IO
>>> import System.Environment
>>>
>>> main = do
>>> (f:_) <- getArgs
>>> t <- openTable f
>>> runDB t $ do
>>> ks <- getKeys
>>> liftIO $ do
>>> putStrLn . printf "%d keys read" $ length ks
>>> putStrLn "Press any key to continue..."
>>> getChar
>>> forM_ ks $ \k -> do
>>> Just r <- getRow k
>>> liftIO . putStrLn $ printf "Row \"%s\" has %d fields"
>>> (T.unpack k) (length r)
>>>
>>> When I run the test on the 2.1GB file, the whole program consumes 10GB.
>>>
>>> 6GB seem to be allocated after the index is built (just before entering
>>> the forM_ function). The remaining 4GB are allocated while fetching all
>>> the rows.
>>>
>>> I find both things difficult to explain.
>>>
>>> 6GB seems too much for the index. Each key is 9 characters (stored as
>>> Data.Text), and I have 250K such keys in a Data.Map. Should this really
>>> add up to 6GB?
>>>
>>> Also, I have no idea why fetching all the rows, one by one, should
>>> consume any additional memory. Each row is fetched and its length is
>>> computed and printed out. I see no reason for the rows to be retained in
>>> the memory.
>>>
>>> Here's the memory allocation summary:
>>>
>>> > 1,093,931,338,632 bytes allocated in the heap
>>> > 2,225,144,704 bytes copied during GC
>>> > 4,533,898,000 bytes maximum residency (26 sample(s))
>>> > 3,080,926,336 bytes maximum slop
>>> > 10004 MB total memory in use (0 MB lost due to
>>> fragmentation)
>>> >
>>> > Tot time (elapsed) Avg pause Max
>>> pause
>>> > Gen 0 2171739 colls, 0 par 45.29s 45.26s 0.0000s
>>> 0.0030s
>>> > Gen 1 26 colls, 0 par 1.50s 1.53s 0.0589s
>>> 0.7087s
>>> >
>>> > INIT time 0.00s ( 0.00s elapsed)
>>> > MUT time 279.92s (284.85s elapsed)
>>> > GC time 46.80s ( 46.79s elapsed)
>>> > EXIT time 0.68s ( 0.71s elapsed)
>>> > Total time 327.40s (332.35s elapsed)
>>> >
>>> > %GC time 14.3% (14.1% elapsed)
>>> >
>>> > Alloc rate 3,908,073,170 bytes per MUT second
>>> >
>>> > Productivity 85.7% of total user, 84.4% of total elapsed
>>>
>>>
>>> Btw., I don't get the "bytes allocated in the heap" figure, which is
>>> approx. 1000 GB (?).
>>>
>>> I'm obviously doing something wrong here. I'd be thankful for any help.
>>>
>>> Best,
>>> Jan
>>> _______________________________________________
>>> Beginners mailing list
>>> [email protected]
>>> http://www.haskell.org/mailman/listinfo/beginners
>>>
>>
>>
>
> _______________________________________________
> Beginners mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/beginners
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://www.haskell.org/pipermail/beginners/attachments/20140512/38f0e751/attachment.html>
------------------------------
Subject: Digest Footer
_______________________________________________
Beginners mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/beginners
------------------------------
End of Beginners Digest, Vol 71, Issue 15
*****************************************