Jefferson Heard wrote:
> I thought this was fairly straightforward, but where the marked line
> finishes in 0.31 seconds on my machine, the actual transpose takes
> more than 5 minutes.  I know it must be possible to read data in
> haskell faster than this.

I took a look into this, writing a similar, but simpler, program.  Half
of the runtime, and 2/3 of the allocation, is spent in ByteString's
split function.  The remaining portions are spent in transposing the list.

COST CENTRE   %time %alloc  ticks     bytes
split          66.7   65.1     56 120130000
xpose          31.0   32.8     26  60618031
read            1.2    2.0      1   3640229
lines           1.2    0.1      1    260002

I've attached two programs to demonstrate the problem.  One creates a
sample speadsheet; the other transposes it.

I spent a little time trying to find a faster replacement for
ByteString.split, but no luck before I had to return to other things.

        <b
import Data.List (foldl', transpose)
import qualified Data.ByteString.Char8 as C
import qualified Data.Map as M
import System.Environment (getArgs)

xpose name = do
    sheet <- (transpose
           .      {-# SCC "split" #-} map (C.split ',')
           .      {-# SCC "lines" #-} C.lines)
           `fmap` {-# SCC "read" #-}  C.readFile name
    let m = foldl' go M.empty sheet
    print (M.size m)
  where go m (x:xs) = {-# SCC "go" #-} M.insert x xs m

main = getArgs >>= mapM_ xpose
import Data.List
import System.IO
import System.Random

rint = show `fmap` (randomRIO (0,100) :: IO Int)

dump cols rows name = do
  h <- openFile name WriteMode
  sequence_ . take rows . repeat $ do
    cs <- sequence . take cols . repeat $ rint
    hPutStrLn h . concat . intersperse "," $ cs
  hClose h

main = dump 1000 10000 "dump.csv"
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to