Hi Tim, Sorry I can't tell you more about slop (I know less than you at this point), but I do see the problem. You're reading each line from a Handle as a String (bad), then creating ByteStrings from that string with BS.pack (really bad). You want to read a ByteString (or Data.Text, or other compact representation) directly from the handle without going through an intervening string format. Also, you'll be better off using a real parser instead of "read", which is very difficult to use robustly.
John L. > From: Tim Docker <t...@dockerz.net> > Subject: memory slop (was: Using the GHC heap profiler) > To: glasgow-haskell-users@haskell.org > Message-ID: <4d895bb0.1080...@dockerz.net> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > On Mon, Mar 21, 2011 at 9:59 AM, I wrote: > > > > My question on the ghc heap profiler on stack overflow: > > > > > http://stackoverflow.com/questions/5306717/how-should-i-interpret-the-output-of-the-ghc-heap-profiler > > > > remains unanswered :-( Perhaps that's not the best forum. Is there > someone > > here prepared to explain how the memory usage in the heap profiler > relates > > to the "Live Bytes" count shown in the garbage collection statistics? > > I've made a little progress on this. I've simplified my program down to > a simple executable that loads a bunch of data into an in-memory map, > and then writes it out again. I've added calls to `seq` to ensure that > laziness is not causing excessing memory consumption. When I run this on > my sample data set, it takes ~7 cpu seconds, and uses ~120 MB of vm An > equivalent python script, takes ~2 secs and ~19MB of vm :-(. > > The code is below. I'm mostly concerned with the memory usage rather > than performance at this stage. What is interesting, is that when I turn > on garbage collection statistics (+RTS -s), I see this: > > 10,089,324,996 bytes allocated in the heap > 201,018,116 bytes copied during GC > 12,153,592 bytes maximum residency (8 sample(s)) > 59,325,408 bytes maximum slop > 114 MB total memory in use (1 MB lost due to fragmentation) > > Generation 0: 19226 collections, 0 parallel, 1.59s, 1.64selapsed > Generation 1: 8 collections, 0 parallel, 0.04s, 0.04selapsed > > INIT time 0.00s ( 0.00s elapsed) > MUT time 5.84s ( 5.96s elapsed) > GC time 1.63s ( 1.68s elapsed) > EXIT time 0.00s ( 0.00s elapsed) > Total time 7.47s ( 7.64s elapsed) > > %GC time 21.8% (22.0% elapsed) > > Alloc rate 1,726,702,840 bytes per MUT second > > Productivity 78.2% of total user, 76.5% of total elapsed > > This seems strange. The maximum residency of 12MB sounds about correct > for my data. But what's with the 59MB of "slop"? According to the ghc docs: > > | The "bytes maximum slop" tells you the most space that is ever wasted > | due to the way GHC allocates memory in blocks. Slop is memory at the > | end of a block that was wasted. There's no way to control this; we > | just like to see how much memory is being lost this way. > > There's this page also: > > http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop > > but it doesn't really make things clearer for me. > > Is the slop number above likely to be a significant contribution to net > memory usage? Are there any obvious reasons why the code below could be > generating so much? The data file in question has 61k lines, and is <6MB > in total. > > Thanks, > > Tim > > -------- Map2.hs -------------------------------------------- > > module Main where > > import qualified Data.Map as Map > import qualified Data.ByteString.Char8 as BS > import System.Environment > import System.IO > > type MyMap = Map.Map BS.ByteString BS.ByteString > > foldLines :: (a -> String -> a) -> a -> Handle -> IO a > foldLines f a h = do > eof <- hIsEOF h > if eof > then (return a) > else do > l <- hGetLine h > let a' = f a l > a' `seq` foldLines f a' h > > undumpFile :: FilePath -> IO MyMap > undumpFile path = do > h <- openFile path ReadMode > m <- foldLines addv Map.empty h > hClose h > return m > where > addv m "" = m > addv m s = let (k,v) = readKV s > in k `seq` v `seq` Map.insert k v m > > readKV s = let (ks,vs) = read s in (BS.pack ks, BS.pack vs) > > dump :: [(BS.ByteString,BS.ByteString)] -> IO () > dump vs = mapM_ putV vs > where > putV (k,v) = putStrLn (show (BS.unpack k, BS.unpack v)) > > main :: IO () > main = do > args <- getArgs > case args of > [path] -> do > v <- undumpFile path > dump (Map.toList v) > return () >
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users