If the strings are relatively short, there can be a bottleneck in the current Ord instance for Bytestrings. When lots of them are in a map, the ffi calls to memcmp dominate.
I've a fix for this (do it all in Haskell for small strings), and can push it if someone complains some more. jefferson.r.heard: > I thought this was fairly straightforward, but where the marked line > finishes in 0.31 seconds on my machine, the actual transpose takes > more than 5 minutes. I know it must be possible to read data in > haskell faster than this. I'm trying to read a 100MB comma delimited > file. I've tried both CSV modules, and these take even longer to read > the file. Is there some general best-practice for reading and parsing > large amounts of data that I'm not aware of? > > I have tried, by the way, a couple of things, including putting a bang > (!) before row in transposeRow and using foldr instead of foldl', but > that didn't change anything other than force me to increase the stack > size to 100M on the command line. > > I'm running in the profiler now, and I'll update this, but I thought I > would check and see if my head was on remotely straight to begin with. > > -- Jeff > > --- > module ColumnMajorCSV where > > import qualified Data.ByteString.Char8 as S > import qualified Data.Map as M > import qualified Data.List as L > > transposeRow cols row = zipWith (:) row cols > > transposeCSV :: [[S.ByteString]] -> M.Map String [S.ByteString] > transposeCSV (header:rows) = M.fromList (zip (map S.unpack header) > spreadsheet) > where spreadsheet = L.foldl' transposeRow emptySheet rows > emptySheet = take (length header) $ repeat [] > > dataFromFile :: String -> IO (M.Map String [S.ByteString]) > dataFromFile filename = do > f <- S.readFile filename > print . length . map (S.split ',' $!) . S.lines $ f > -- finishes in 0.31 seconds > return . transposeCSV . map (S.split ',' $!) . S.lines $ f -- > this takes 5 minutes and 6 seconds > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe