Jefferson Heard wrote: > I thought this was fairly straightforward, but where the marked line > finishes in 0.31 seconds on my machine, the actual transpose takes > more than 5 minutes. I know it must be possible to read data in > haskell faster than this.
I took a look into this, writing a similar, but simpler, program. Half of the runtime, and 2/3 of the allocation, is spent in ByteString's split function. The remaining portions are spent in transposing the list. COST CENTRE %time %alloc ticks bytes split 66.7 65.1 56 120130000 xpose 31.0 32.8 26 60618031 read 1.2 2.0 1 3640229 lines 1.2 0.1 1 260002 I've attached two programs to demonstrate the problem. One creates a sample speadsheet; the other transposes it. I spent a little time trying to find a faster replacement for ByteString.split, but no luck before I had to return to other things. <b
import Data.List (foldl', transpose) import qualified Data.ByteString.Char8 as C import qualified Data.Map as M import System.Environment (getArgs) xpose name = do sheet <- (transpose . {-# SCC "split" #-} map (C.split ',') . {-# SCC "lines" #-} C.lines) `fmap` {-# SCC "read" #-} C.readFile name let m = foldl' go M.empty sheet print (M.size m) where go m (x:xs) = {-# SCC "go" #-} M.insert x xs m main = getArgs >>= mapM_ xpose
import Data.List import System.IO import System.Random rint = show `fmap` (randomRIO (0,100) :: IO Int) dump cols rows name = do h <- openFile name WriteMode sequence_ . take rows . repeat $ do cs <- sequence . take cols . repeat $ rint hPutStrLn h . concat . intersperse "," $ cs hClose h main = dump 1000 10000 "dump.csv"
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe