Re: [Haskell-cafe] Slow IO?
On 02/09/2009, at 2:26 PM, Eugene Kirpichov wrote: I've got a Centrino Duo 2000 (I'm on a notebook), Ubuntu 9.04 and ghc 6.10.2. However, we have not set up on what exact input file we're using :) I'm using one where it is written "1000 3" and then 1000 lines of "9" follow. Also, I wonder what one'd get if one compiled this program with jhc, but I don't know whether jhc is able to compile Data.ByteString. It couldn't last time I tried - choked on some INLINE pragmas. Might not be a massive job, but there aren't enough hours in the day... mark ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Wed, 2009-09-02 at 11:55 +0800, Steve wrote: > On Tue, 2009-09-01 at 08:45 +0400, Eugene Kirpichov wrote: > > Hm, on my machine Don's code has exactly the same performance my code above. > That's strange. > > > Also, replacing the 'test' and 'parse' functions with this one > > > > add :: Int -> Int -> S.ByteString -> Int > > add k i s = fst $ S.foldl' f (i, 0) s > > where f (!i, !n) '\n' | n`divisibleBy`k = (i+1, 0) > > | otherwise = (i, 0) > > f (!i, !n) w = (i, 10*n+ord w-ord '0') > > > > increases performance by another 15% (0.675s vs 0.790s) > > On my system I get a 50% slowdown using this add function! > > I guess is just shows that benchmarking code on one single > CPU/memory/OS/ghc combination does not give results that apply widely. > I'm using: > AMD Athlon X2 4800 > 2GB memory > Linux (Fedora 11, 64-bit version) > ghc 6.10.3 I should have also said that the test method and test data is important too. This is what I have been using: $ time ./0450 < 0450.input.data and looking at the 'real' value. The file 0450.input.data is generated with a Python script: #!/usr/bin/env python ''' generate a data file for problem 0450 ''' from __future__ import division# new in 2.2, redundant in 3.0 from __future__ import absolute_import # new in 2.5, redundant in 2.7/3.0 from __future__ import print_function # new in 2.6, redundant in 3.0 import io import random inFile = '0450.input.data' #n, k, tiMax = 10**6, 3, 10**9 n, k, tiMax = 10**7, 3, 10**9 with io.open(inFile, 'wb') as f: f.write('%d %d\n' % (n, k)) for i in xrange(n): ti = random.randint(1, tiMax) f.write('%d\n' % (ti,)) Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
2009/9/2 Steve : > On Tue, 2009-09-01 at 08:45 +0400, Eugene Kirpichov wrote: >> Hm, on my machine Don's code has exactly the same performance my code above. > That's strange. > >> Also, replacing the 'test' and 'parse' functions with this one >> >> add :: Int -> Int -> S.ByteString -> Int >> add k i s = fst $ S.foldl' f (i, 0) s >> where f (!i, !n) '\n' | n`divisibleBy`k = (i+1, 0) >> | otherwise = (i, 0) >> f (!i, !n) w = (i, 10*n+ord w-ord '0') >> >> increases performance by another 15% (0.675s vs 0.790s) > > On my system I get a 50% slowdown using this add function! > > I guess is just shows that benchmarking code on one single > CPU/memory/OS/ghc combination does not give results that apply widely. > I'm using: > AMD Athlon X2 4800 > 2GB memory > Linux (Fedora 11, 64-bit version) > ghc 6.10.3 > I've got a Centrino Duo 2000 (I'm on a notebook), Ubuntu 9.04 and ghc 6.10.2. However, we have not set up on what exact input file we're using :) I'm using one where it is written "1000 3" and then 1000 lines of "9" follow. Also, I wonder what one'd get if one compiled this program with jhc, but I don't know whether jhc is able to compile Data.ByteString. > Steve > > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Tue, 2009-09-01 at 08:45 +0400, Eugene Kirpichov wrote: > Hm, on my machine Don's code has exactly the same performance my code above. That's strange. > Also, replacing the 'test' and 'parse' functions with this one > > add :: Int -> Int -> S.ByteString -> Int > add k i s = fst $ S.foldl' f (i, 0) s > where f (!i, !n) '\n' | n`divisibleBy`k = (i+1, 0) > | otherwise = (i, 0) > f (!i, !n) w = (i, 10*n+ord w-ord '0') > > increases performance by another 15% (0.675s vs 0.790s) On my system I get a 50% slowdown using this add function! I guess is just shows that benchmarking code on one single CPU/memory/OS/ghc combination does not give results that apply widely. I'm using: AMD Athlon X2 4800 2GB memory Linux (Fedora 11, 64-bit version) ghc 6.10.3 Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
I've uploaded a new version of bytestring-nums that, while still slower than the fast/custom codes, allows Eugene's earlier program to a little more than 20% faster than it did before. It no longer handles spurious characters in the input by skipping over them (this is probably not a common requirement, anyways). http://hackage.haskell.org/package/bytestring-nums-0.3.0 I suspect that splitting the string into pieces and then mapping the parser over the pieces will never be faster than an all-in-one parser/tester/incrementer like the fast programs have. -- Jason Dusek ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
2009/08/31 Don Stewart : > If you can abstract out a common function for lexing ints out of > bytestrings, we could add it to the bytestring-lexing package. All the really performant implementations operate on strings with multiple ints in them; I suspect this reduces memory traffic -- and indeed, Eugene's code using my libs allocates about twice as much memory as Don's code. I've tried a few different things with strictness annotations to no avail. I'm having some trouble understanding the meaning of "entries" in the profiler's output. I have a file with 5 million random integers in it, totalling 26210408 bytes (21210408 bytes of which are not newlines). The relevant part is here: COST CENTRE MODULE entries MAIN MAIN 0 mainMain 1 bint Main 501 lazy_int Data.ByteString.Nums.Careless.Int 41211385 digitize Data.ByteString.Nums.Careless.Int 21210408 The number of "entries" to `lazy_int` is puzzling. Eugene's `bint` is called for each line of the file -- once for the header and then 5 million times for each of the integers. (There are two numbers on the first line but Eugene's program only uses `k` so `bint` is only actually entered once.) However, `bint` just calls my `int` and `int` calls `lazy_int` so why are there 41 million plus "entries" of `lazy_int`? -- Jason Dusek spoj-eugene-prof-opt-bang-acc-scc-dfold.prof Description: Binary data ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
Good work guys. If you can abstract out a common function for lexing ints out of bytestrings, we could add it to the bytestring-lexing package. ekirpichov: > Hm, on my machine Don's code has exactly the same performance my code above. > > Also, replacing the 'test' and 'parse' functions with this one > > add :: Int -> Int -> S.ByteString -> Int > add k i s = fst $ S.foldl' f (i, 0) s > where f (!i, !n) '\n' | n`divisibleBy`k = (i+1, 0) > | otherwise = (i, 0) > f (!i, !n) w = (i, 10*n+ord w-ord '0') > > increases performance by another 15% (0.675s vs 0.790s) > > 2009/9/1 Jason Dusek : > > I've updated Don Stewart's solution to compile with the modern > > ByteString libs. I'll be looking at ways to improve the > > performance of the `bytestring-nums` package. > > > > -- > > Jason Dusek > > > > > > http://github.com/jsnx/bytestring-nums/blob/d7de9db83e44ade9958fb3bfad0b29ede065b5dd/SPOJDons.hs > > ___ > > Haskell-Cafe mailing list > > Haskell-Cafe@haskell.org > > http://www.haskell.org/mailman/listinfo/haskell-cafe > > > > > > -- > Eugene Kirpichov > Web IR developer, market.yandex.ru > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
Hm, on my machine Don's code has exactly the same performance my code above. Also, replacing the 'test' and 'parse' functions with this one add :: Int -> Int -> S.ByteString -> Int add k i s = fst $ S.foldl' f (i, 0) s where f (!i, !n) '\n' | n`divisibleBy`k = (i+1, 0) | otherwise = (i, 0) f (!i, !n) w = (i, 10*n+ord w-ord '0') increases performance by another 15% (0.675s vs 0.790s) 2009/9/1 Jason Dusek : > I've updated Don Stewart's solution to compile with the modern > ByteString libs. I'll be looking at ways to improve the > performance of the `bytestring-nums` package. > > -- > Jason Dusek > > > http://github.com/jsnx/bytestring-nums/blob/d7de9db83e44ade9958fb3bfad0b29ede065b5dd/SPOJDons.hs > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
I've updated Don Stewart's solution to compile with the modern ByteString libs. I'll be looking at ways to improve the performance of the `bytestring-nums` package. -- Jason Dusek http://github.com/jsnx/bytestring-nums/blob/d7de9db83e44ade9958fb3bfad0b29ede065b5dd/SPOJDons.hs ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Sun, 2009-08-30 at 18:50 +0400, Eugene Kirpichov wrote: > Thanks :) I wonder why SPOJ didn't accept the same thing from me. > > I think that in order to obtain even higher performance we need to > resort to low-level IO: raw reading into a byte buffer and parsing the > very buffer to avoid memcpy'ing. > Or, better, to use Oleg's iteratees with a file handle enumerator. > I'll probably give it a try when I have time, but there's a 70% chance > that I won't, so someone please try it, it should work :) I just discovered that the SPOJ question regarding the problem http://www.spoj.pl/problems/INTEST/ had already been asked about 2 years ago. http://groups.google.com/group/fa.haskell/browse_thread/thread/4133fa71ce97eb0e/fef34d1c3943bbe0#fef34d1c3943bbe0 Donald Stewart gave a solution - long, complex and highly optimised using knowledge of Data.ByteString internals. But fast - 2 or 3 times as fast as your method. Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
2009/08/30 Eugene Kirpichov : > Here's my version that works in 0.7s for me for a file with 10^7 > "9"'s but for some reason gets a 'wrong answer' at SPOJ :) Maybe it gets a wrong answer because it reads all the input, regardless of `n`. -- Jason Dusek ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
Bang patterns are easy: they make a part of the pattern strict by seq'ing it. f (x, !y) = ... ~ f (x, y) = y `seq` ... 2009/8/30 Steve : > On Sun, 2009-08-30 at 16:34 +0400, Eugene Kirpichov wrote: >> Here's my version that works in 0.7s for me for a file with 10^7 >> "9"'s but for some reason gets a 'wrong answer' at SPOJ :) >> >> {-# LANGUAGE BangPatterns #-} >> module Main where >> >> import qualified Data.ByteString.Lazy as B >> import Data.Word >> >> answer :: Int -> B.ByteString -> Int >> answer k = fst . B.foldl' f (0, 0) >> where f :: (Int,Int) -> Word8 -> (Int,Int) >> f (!countSoFar, !x) 10 >> | x`mod`k==0 = (countSoFar+1, 0) >> | otherwise = (countSoFar, 0) >> f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) >> >> readInt :: B.ByteString -> Int >> readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 >> >> main = do >> (line, rest) <- B.break (==10) `fmap` B.getContents >> let [n, k] = map readInt . B.split 32 $ line >> putStrLn . show $ answer k rest - 1 > > Eugene, > I ran your code on one of my test files and it gave the same answer as > my code. So I submitted it and it was accepted. Its fast - twice as fast > as my solution, using much less memory. Overall its the 4th fastest > Haskell solution. (but its still 10 * slower than C/C++) > I'll have to read up on BangPatterns to try to understand what its > doing! > > I submitted it as: > > {-# LANGUAGE BangPatterns #-} > module Main where > > import qualified Data.ByteString.Lazy as B > import qualified Data.Word as DW > > answer :: Int -> B.ByteString -> Int > answer k = fst . B.foldl' f (0, 0) > where f :: (Int,Int) -> DW.Word8 -> (Int,Int) > f (!countSoFar, !x) 10 > | x`mod`k==0 = (countSoFar+1, 0) > | otherwise = (countSoFar, 0) > f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) > > readInt :: B.ByteString -> Int > readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 > > main :: IO () > main = do > (line, rest) <- B.break (==10) `fmap` B.getContents > let [_, k] = map readInt . B.split 32 $ line > putStrLn . show $ answer k rest - 1 > > > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
Thanks :) I wonder why SPOJ didn't accept the same thing from me. I think that in order to obtain even higher performance we need to resort to low-level IO: raw reading into a byte buffer and parsing the very buffer to avoid memcpy'ing. Or, better, to use Oleg's iteratees with a file handle enumerator. I'll probably give it a try when I have time, but there's a 70% chance that I won't, so someone please try it, it should work :) 2009/8/30 Steve : > On Sun, 2009-08-30 at 16:34 +0400, Eugene Kirpichov wrote: >> Here's my version that works in 0.7s for me for a file with 10^7 >> "9"'s but for some reason gets a 'wrong answer' at SPOJ :) >> >> {-# LANGUAGE BangPatterns #-} >> module Main where >> >> import qualified Data.ByteString.Lazy as B >> import Data.Word >> >> answer :: Int -> B.ByteString -> Int >> answer k = fst . B.foldl' f (0, 0) >> where f :: (Int,Int) -> Word8 -> (Int,Int) >> f (!countSoFar, !x) 10 >> | x`mod`k==0 = (countSoFar+1, 0) >> | otherwise = (countSoFar, 0) >> f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) >> >> readInt :: B.ByteString -> Int >> readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 >> >> main = do >> (line, rest) <- B.break (==10) `fmap` B.getContents >> let [n, k] = map readInt . B.split 32 $ line >> putStrLn . show $ answer k rest - 1 > > Eugene, > I ran your code on one of my test files and it gave the same answer as > my code. So I submitted it and it was accepted. Its fast - twice as fast > as my solution, using much less memory. Overall its the 4th fastest > Haskell solution. (but its still 10 * slower than C/C++) > I'll have to read up on BangPatterns to try to understand what its > doing! > > I submitted it as: > > {-# LANGUAGE BangPatterns #-} > module Main where > > import qualified Data.ByteString.Lazy as B > import qualified Data.Word as DW > > answer :: Int -> B.ByteString -> Int > answer k = fst . B.foldl' f (0, 0) > where f :: (Int,Int) -> DW.Word8 -> (Int,Int) > f (!countSoFar, !x) 10 > | x`mod`k==0 = (countSoFar+1, 0) > | otherwise = (countSoFar, 0) > f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) > > readInt :: B.ByteString -> Int > readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 > > main :: IO () > main = do > (line, rest) <- B.break (==10) `fmap` B.getContents > let [_, k] = map readInt . B.split 32 $ line > putStrLn . show $ answer k rest - 1 > > > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Sun, 2009-08-30 at 16:34 +0400, Eugene Kirpichov wrote: > Here's my version that works in 0.7s for me for a file with 10^7 > "9"'s but for some reason gets a 'wrong answer' at SPOJ :) > > {-# LANGUAGE BangPatterns #-} > module Main where > > import qualified Data.ByteString.Lazy as B > import Data.Word > > answer :: Int -> B.ByteString -> Int > answer k = fst . B.foldl' f (0, 0) > where f :: (Int,Int) -> Word8 -> (Int,Int) > f (!countSoFar, !x) 10 > | x`mod`k==0 = (countSoFar+1, 0) > | otherwise = (countSoFar, 0) > f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) > > readInt :: B.ByteString -> Int > readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 > > main = do > (line, rest) <- B.break (==10) `fmap` B.getContents > let [n, k] = map readInt . B.split 32 $ line > putStrLn . show $ answer k rest - 1 Eugene, I ran your code on one of my test files and it gave the same answer as my code. So I submitted it and it was accepted. Its fast - twice as fast as my solution, using much less memory. Overall its the 4th fastest Haskell solution. (but its still 10 * slower than C/C++) I'll have to read up on BangPatterns to try to understand what its doing! I submitted it as: {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.ByteString.Lazy as B import qualified Data.Wordas DW answer :: Int -> B.ByteString -> Int answer k = fst . B.foldl' f (0, 0) where f :: (Int,Int) -> DW.Word8 -> (Int,Int) f (!countSoFar, !x) 10 | x`mod`k==0 = (countSoFar+1, 0) | otherwise = (countSoFar, 0) f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) readInt :: B.ByteString -> Int readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 main :: IO () main = do (line, rest) <- B.break (==10) `fmap` B.getContents let [_, k] = map readInt . B.split 32 $ line putStrLn . show $ answer k rest - 1 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Re[4]: [Haskell-cafe] Slow IO?
On Sun, Aug 30, 2009 at 2:51 PM, Bulat Ziganshin wrote: > these all are different things, and talking about ByteString IO speed > is the same as talking of speed of red cars > Okay, but statistically, most red cars are very fast Ferrari's no? :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[4]: [Haskell-cafe] Slow IO?
Hello Peter, Sunday, August 30, 2009, 4:36:55 PM, you wrote: >> I compared the top 10 C/C++ results against the top 10 Haskell results: > So to me it seems he's not talking about his code. well, he talks about 20 programs > Anyway, I thought Haskell's ByteString IO should not be that much slower > anyway. what you mean by ByteString IO speed? speed of reading 100 gb file? speed of reading 1 byte? speed of readInt? speed of those 10 programs? these all are different things, and talking about ByteString IO speed is the same as talking of speed of red cars -- Best regards, Bulatmailto:bulat.zigans...@gmail.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Re[2]: [Haskell-cafe] Slow IO?
On Sun, 2009-08-30 at 16:15 +0400, Bulat Ziganshin wrote: > Hello Steve, > > Sunday, August 30, 2009, 3:54:53 PM, you wrote: > > > So it looks like Haskell is ~13 slower for IO than C/C++, even (I > > assume) when using Data.ByteString or other speed-up tricks. > > it means that *your* program is 13x slower than C one and nothing > more. in particular, your program may be constrained by readInt > speed > No, not at all. I did not count my program when comparing C/C++ to Haskell. I was counting the the *top 10* programs (submitted by everybody) in the "Best Solutions" list. So its a general survey of the best haskell solutions against the best C/C++ solutions. Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Sun, 2009-08-30 at 14:40 +0400, Eugene Kirpichov wrote: Thanks, that works nicely too. However, I believe its not a standard package, so I don't think it can be used for Sphere Online problems. I timed a test run on a 10MB file and its a little slower than my solution with the ByteString readInt improvement. Steve > module Main where > > import qualified Data.ByteString.Lazy as B > import Data.ByteString.Nums.Careless -- from bytestring-nums package > > bint :: B.ByteString -> Int > bint = int > > main = do > line : rest <- B.split 10 `fmap` B.getContents > let [n, k] = map int . B.split 32 $ line > putStrLn . show . length . tail . filter ((==0).(`mod`k).bint) $ rest > > This does a 100MB file in 2.7s (probably because the file is cached by > the filesystem). > > 2009/8/30 Steve : > > Hi, > > I'm tackling a Sphere Online Judge tutorial question where it tests how > > fast you can process input data. You need to achieve at least 2.5MB of > > input data per second at runtime (on an old machine running ghc 6.6.1). > > This is probably close to the limit of Haskell's ability. > > > > https://www.spoj.pl/problems/INTEST/ > > > > I can see that 24 haskell programmers have solved it, but most are very > > close to the 8 secs limit (and 6/24 are even over the limit!). > > > > Here's my code. It fails with a "time limit exceeded" error. (I think it > > would calculate the correct result, eventually). > > > > module Main where > > > > import qualified Data.List as DLi > > import qualified System.IO as SIO > > > > main :: IO () > > main = do > > line1 <- SIO.hGetLine SIO.stdin > > let k = read $ words line1 !! 1 > > s <- SIO.hGetContents SIO.stdin > > print $ count s k > > > > count :: String -> Int -> Int > > count s k = DLi.foldl' foldFunc 0 (map read $ words s) > > where > >foldFunc :: Int -> Int -> Int > >foldFunc a b > > | mod b k == 0 = a+1 > > | otherwise = a > > > > > > I tried using Data.ByteString but then found that 'read' needs a String, > > not a ByteString. > > I tried using buffered IO, but it did not make any difference. > > > > Any suggestions on how to speed it up? > > > > Regards, > > Steve > > > > ___ > > Haskell-Cafe mailing list > > Haskell-Cafe@haskell.org > > http://www.haskell.org/mailman/listinfo/haskell-cafe > > > > > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Re[2]: [Haskell-cafe] Slow IO?
Well, Steve wrote: > I compared the top 10 C/C++ results against the top 10 Haskell results: So to me it seems he's not talking about his code. Anyway, I thought Haskell's ByteString IO should not be that much slower anyway. Not sure how lazy ByteString IO is implemented, but if it performs async (aka overlapped) IO, it could be very very fast (faster than C), since the reading of the next buffer from (or writing of the previous buffer to) the file is then completely parallel with the computation (when done inplace you even don't need a memcpy, although these days the overhead of copying 64KB of memory might be very tiny, it used to be different in the old days :-) At least that's how I did it in the past in C++ with templates, which was faster than the C approach. On Sun, Aug 30, 2009 at 2:15 PM, Bulat Ziganshin wrote: > Hello Steve, > > Sunday, August 30, 2009, 3:54:53 PM, you wrote: > > > So it looks like Haskell is ~13 slower for IO than C/C++, even (I > > assume) when using Data.ByteString or other speed-up tricks. > > it means that *your* program is 13x slower than C one and nothing > more. in particular, your program may be constrained by readInt > speed > > > -- > Best regards, > Bulatmailto:bulat.zigans...@gmail.com > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
Here's my version that works in 0.7s for me for a file with 10^7 "9"'s but for some reason gets a 'wrong answer' at SPOJ :) {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.ByteString.Lazy as B import Data.Word answer :: Int -> B.ByteString -> Int answer k = fst . B.foldl' f (0, 0) where f :: (Int,Int) -> Word8 -> (Int,Int) f (!countSoFar, !x) 10 | x`mod`k==0 = (countSoFar+1, 0) | otherwise = (countSoFar, 0) f (!countSoFar, !x) c = (countSoFar, 10*x+(fromIntegral c)-48) readInt :: B.ByteString -> Int readInt = B.foldl' (\x c -> 10*x+fromIntegral c-48) 0 main = do (line, rest) <- B.break (==10) `fmap` B.getContents let [n, k] = map readInt . B.split 32 $ line putStrLn . show $ answer k rest - 1 2009/8/30 Steve : > On Sun, 2009-08-30 at 06:30 -0400, Gwern Branwen wrote: >> On Sun, Aug 30, 2009 at 6:14 AM, Steve wrote: >> > Hi, >> > I'm tackling a Sphere Online Judge tutorial question where it tests how >> > fast you can process input data. You need to achieve at least 2.5MB of >> > input data per second at runtime (on an old machine running ghc 6.6.1). >> > This is probably close to the limit of Haskell's ability. >> > >> > https://www.spoj.pl/problems/INTEST/ >> > >> > I can see that 24 haskell programmers have solved it, but most are very >> > close to the 8 secs limit (and 6/24 are even over the limit!). >> > >> > Here's my code. It fails with a "time limit exceeded" error. (I think it >> > would calculate the correct result, eventually). >> > >> > module Main where >> > >> > import qualified Data.List as DLi >> > import qualified System.IO as SIO >> > >> > main :: IO () >> > main = do >> > line1 <- SIO.hGetLine SIO.stdin >> > let k = read $ words line1 !! 1 >> > s <- SIO.hGetContents SIO.stdin >> > print $ count s k >> > >> > count :: String -> Int -> Int >> > count s k = DLi.foldl' foldFunc 0 (map read $ words s) >> > where >> > foldFunc :: Int -> Int -> Int >> > foldFunc a b >> > | mod b k == 0 = a+1 >> > | otherwise = a >> > >> > >> > I tried using Data.ByteString but then found that 'read' needs a String, >> > not a ByteString. >> > I tried using buffered IO, but it did not make any difference. >> > >> > Any suggestions on how to speed it up? >> > >> > Regards, >> > Steve >> >> Did you try readInt? >> http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Char8.html#23 >> > > Thanks. I didn't see readInt. It allows me to use ByteString and produce > results about 10 times faster than System.IO hGetContents. It makes me > wonder why the System.IO functions have not been replaced by > Data.ByteString. > > My program runs in 8.56 seconds (its over the 8 secs limit but it was > accepted). > > I compared the top 10 C/C++ results against the top 10 Haskell results: > C/C++ ~0.4 secs > Haskell ~5.0 secs > So it looks like Haskell is ~13 slower for IO than C/C++, even (I > assume) when using Data.ByteString or other speed-up tricks. > > Steve > > > > > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Slow IO?
Hello Steve, Sunday, August 30, 2009, 3:54:53 PM, you wrote: > So it looks like Haskell is ~13 slower for IO than C/C++, even (I > assume) when using Data.ByteString or other speed-up tricks. it means that *your* program is 13x slower than C one and nothing more. in particular, your program may be constrained by readInt speed -- Best regards, Bulatmailto:bulat.zigans...@gmail.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Sun, 2009-08-30 at 06:30 -0400, Gwern Branwen wrote: > On Sun, Aug 30, 2009 at 6:14 AM, Steve wrote: > > Hi, > > I'm tackling a Sphere Online Judge tutorial question where it tests how > > fast you can process input data. You need to achieve at least 2.5MB of > > input data per second at runtime (on an old machine running ghc 6.6.1). > > This is probably close to the limit of Haskell's ability. > > > > https://www.spoj.pl/problems/INTEST/ > > > > I can see that 24 haskell programmers have solved it, but most are very > > close to the 8 secs limit (and 6/24 are even over the limit!). > > > > Here's my code. It fails with a "time limit exceeded" error. (I think it > > would calculate the correct result, eventually). > > > > module Main where > > > > import qualified Data.List as DLi > > import qualified System.IO as SIO > > > > main :: IO () > > main = do > > line1 <- SIO.hGetLine SIO.stdin > > let k = read $ words line1 !! 1 > > s <- SIO.hGetContents SIO.stdin > > print $ count s k > > > > count :: String -> Int -> Int > > count s k = DLi.foldl' foldFunc 0 (map read $ words s) > > where > >foldFunc :: Int -> Int -> Int > >foldFunc a b > > | mod b k == 0 = a+1 > > | otherwise = a > > > > > > I tried using Data.ByteString but then found that 'read' needs a String, > > not a ByteString. > > I tried using buffered IO, but it did not make any difference. > > > > Any suggestions on how to speed it up? > > > > Regards, > > Steve > > Did you try readInt? > http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Char8.html#23 > Thanks. I didn't see readInt. It allows me to use ByteString and produce results about 10 times faster than System.IO hGetContents. It makes me wonder why the System.IO functions have not been replaced by Data.ByteString. My program runs in 8.56 seconds (its over the 8 secs limit but it was accepted). I compared the top 10 C/C++ results against the top 10 Haskell results: C/C++ ~0.4 secs Haskell ~5.0 secs So it looks like Haskell is ~13 slower for IO than C/C++, even (I assume) when using Data.ByteString or other speed-up tricks. Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
module Main where import qualified Data.ByteString.Lazy as B import Data.ByteString.Nums.Careless -- from bytestring-nums package bint :: B.ByteString -> Int bint = int main = do line : rest <- B.split 10 `fmap` B.getContents let [n, k] = map int . B.split 32 $ line putStrLn . show . length . tail . filter ((==0).(`mod`k).bint) $ rest This does a 100MB file in 2.7s (probably because the file is cached by the filesystem). 2009/8/30 Steve : > Hi, > I'm tackling a Sphere Online Judge tutorial question where it tests how > fast you can process input data. You need to achieve at least 2.5MB of > input data per second at runtime (on an old machine running ghc 6.6.1). > This is probably close to the limit of Haskell's ability. > > https://www.spoj.pl/problems/INTEST/ > > I can see that 24 haskell programmers have solved it, but most are very > close to the 8 secs limit (and 6/24 are even over the limit!). > > Here's my code. It fails with a "time limit exceeded" error. (I think it > would calculate the correct result, eventually). > > module Main where > > import qualified Data.List as DLi > import qualified System.IO as SIO > > main :: IO () > main = do > line1 <- SIO.hGetLine SIO.stdin > let k = read $ words line1 !! 1 > s <- SIO.hGetContents SIO.stdin > print $ count s k > > count :: String -> Int -> Int > count s k = DLi.foldl' foldFunc 0 (map read $ words s) > where > foldFunc :: Int -> Int -> Int > foldFunc a b > | mod b k == 0 = a+1 > | otherwise = a > > > I tried using Data.ByteString but then found that 'read' needs a String, > not a ByteString. > I tried using buffered IO, but it did not make any difference. > > Any suggestions on how to speed it up? > > Regards, > Steve > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > -- Eugene Kirpichov Web IR developer, market.yandex.ru ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO?
On Sun, Aug 30, 2009 at 6:14 AM, Steve wrote: > Hi, > I'm tackling a Sphere Online Judge tutorial question where it tests how > fast you can process input data. You need to achieve at least 2.5MB of > input data per second at runtime (on an old machine running ghc 6.6.1). > This is probably close to the limit of Haskell's ability. > > https://www.spoj.pl/problems/INTEST/ > > I can see that 24 haskell programmers have solved it, but most are very > close to the 8 secs limit (and 6/24 are even over the limit!). > > Here's my code. It fails with a "time limit exceeded" error. (I think it > would calculate the correct result, eventually). > > module Main where > > import qualified Data.List as DLi > import qualified System.IO as SIO > > main :: IO () > main = do > line1 <- SIO.hGetLine SIO.stdin > let k = read $ words line1 !! 1 > s <- SIO.hGetContents SIO.stdin > print $ count s k > > count :: String -> Int -> Int > count s k = DLi.foldl' foldFunc 0 (map read $ words s) > where > foldFunc :: Int -> Int -> Int > foldFunc a b > | mod b k == 0 = a+1 > | otherwise = a > > > I tried using Data.ByteString but then found that 'read' needs a String, > not a ByteString. > I tried using buffered IO, but it did not make any difference. > > Any suggestions on how to speed it up? > > Regards, > Steve Did you try readInt? http://hackage.haskell.org/packages/archive/bytestring/0.9.1.4/doc/html/Data-ByteString-Char8.html#23 -- gwern ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Slow IO?
Hi, I'm tackling a Sphere Online Judge tutorial question where it tests how fast you can process input data. You need to achieve at least 2.5MB of input data per second at runtime (on an old machine running ghc 6.6.1). This is probably close to the limit of Haskell's ability. https://www.spoj.pl/problems/INTEST/ I can see that 24 haskell programmers have solved it, but most are very close to the 8 secs limit (and 6/24 are even over the limit!). Here's my code. It fails with a "time limit exceeded" error. (I think it would calculate the correct result, eventually). module Main where import qualified Data.List as DLi import qualified System.IO as SIO main :: IO () main = do line1 <- SIO.hGetLine SIO.stdin let k = read $ words line1 !! 1 s <- SIO.hGetContents SIO.stdin print $ count s k count :: String -> Int -> Int count s k = DLi.foldl' foldFunc 0 (map read $ words s) where foldFunc :: Int -> Int -> Int foldFunc a b | mod b k == 0 = a+1 | otherwise = a I tried using Data.ByteString but then found that 'read' needs a String, not a ByteString. I tried using buffered IO, but it did not make any difference. Any suggestions on how to speed it up? Regards, Steve ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
On 8/9/07, Marc Weber <[EMAIL PROTECTED]> wrote: > > I wrote a code, but seems to give "Time limit exceeded"! > ?? > Your code writes > 15 to stdout which is correct (with the example given on the page).. > You have to explain what you mean by >>seems to give "Time limit exceeded"<< > > > loop t function > Does already exist. > sequence $ replicate 10 function > is a much shorter way :-) > > oor perhaps mapM_ [ function | i <- [1..10] ] ) > > prod, to_int: > You can both implement using higher order functions > > prod = sum . zipWith (*) > to_int = map read > Thanks :) Yes, I see no reason why the code should be rejected by the judge (Time limit exceeded) just because I had defined all the functions. I had done this on many other occasions, and they all had worked well. I think that it has a lot to do with the (read) function and how it is implemented. So parsing takes quite a bit of time, and eventually most of the time gets used up in processing input. But the new functions are wonderful :-) I had a hunch that these functions should have been defined, but I learnt a lot in the process of writing those functions again! -- Vimal ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
On 8/9/07, Brent Yorgey <[EMAIL PROTECTED]> wrote: > On 8/9/07, Chaddaï Fouché <[EMAIL PROTECTED]> wrote: > > I get "Wrong answer" with the following code for the same problem... > > Is there something strange in this code : > > This problem description is not worded very well. You have to figure out > the matching that maximizes the sum of hotnesses; you don't necessarily just > do a sum . zipWith (*). > Exactly. The description says: "Company XYZ has done the work for you, and now do xxx". This confused me a lot :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
On 8/9/07, Chaddaï Fouché <[EMAIL PROTECTED]> wrote: > > I get "Wrong answer" with the following code for the same problem... > Is there something strange in this code : This problem description is not worded very well. You have to figure out the matching that maximizes the sum of hotnesses; you don't necessarily just do a sum . zipWith (*). -Brent ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
Note that this code isn't more successful, clearly I have misunderstood one requirement : import qualified Data.ByteString.Char8 as B import Data.List (unfoldr) main = B.interact $ hot hot = B.unlines . map (B.pack . show) . processList . tail . unfoldr readInt1 readInt1 cs = do (n, cs') <- B.readInt cs return (n, B.tail cs') processList [] = [] processList (x:xs) = (sum $ zipWith (*) men women) : processList rest where (men,r1) = splitAt x xs (women,rest) = splitAt x r1 -- Jedaï ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
I get "Wrong answer" with the following code for the same problem... Is there something strange in this code : module Main where import qualified Data.ByteString.Char8 as B main = B.getLine >>= sequence_ . flip replicate hot . maybe 0 fst . B.readInt hot = do B.getLine men <- B.getLine women <- B.getLine print $ sum $ zipWith (*) (map (maybe 0 fst . B.readInt) $ B.words men) (map (maybe 0 fst . B.readInt) $ B.words women) ??? I get the expected results with my tests. -- Jedaï ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
On 8/9/07, Marc Weber <[EMAIL PROTECTED]> wrote: > > > I wrote a code, but seems to give "Time limit exceeded"! > ?? > Your code writes > 15 to stdout which is correct (with the example given on the page).. > You have to explain what you mean by >>seems to give "Time limit > exceeded"<< > I think Vimal is referring to a message from SPOJ rather than the compiler. I.e. the program runs too slowly so it is rejected by the judging software. -Brent ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
> I wrote a code, but seems to give "Time limit exceeded"! ?? Your code writes 15 to stdout which is correct (with the example given on the page).. You have to explain what you mean by >>seems to give "Time limit exceeded"<< > loop t function Does already exist. sequence $ replicate 10 function is a much shorter way :-) oor perhaps mapM_ [ function | i <- [1..10] ] ) prod, to_int: You can both implement using higher order functions prod = sum . zipWith (*) to_int = map read Marc ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
@Donald: Thanks for the link. > prod = sum . zipWith (*) > > This is the slow part. Prelude.read ist really slow. > > Futhermore use the recusion pattern again: > to_int = map read > > What is n used for? @Lutz: Those are some nice tricks... Thanks! Now, the 'n' is for getting the number of numbers in the list. Which I don't need, since I had a way around it. I just had to skip that line... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
* Vimal wrote: >>> Beginning of CODE > loop t function > | t == 1 = do function > | otherwise = do { function; loop (t - 1) function } > > prod [] [] = 0 > prod (a:as) (b:bs) = a*b + prod as bs prod = sum . zipWith (*) > to_int :: [String] -> [Integer] > to_int [] = [] > to_int (x:xs) = (read x) : to_int xs This is the slow part. Prelude.read ist really slow. Futhermore use the recusion pattern again: to_int = map read > doit = do > n <- getLine > a <- getLine > b <- getLine > let la = to_int (words a); > lb = to_int (words b); in > print (prod la lb) What is n used for? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO or bad code?
j.vimal: > Hi > I am practicing writing code in haskell, by solving problems at this > site. http://spoj.pl. > The problem http://spoj.pl/problems/FASHION , is pretty simple. > > 1. Given two lists A,B , of N numbers, sort them and take sum of products. > i.e. Sum ai * bi > > I wrote a code, but seems to give "Time limit exceeded"! We have a page for these SPOJ problems: http://haskell.org/haskellwiki/SPOJ#Techniques_for_dealing_with_problems_efficiently With bytestring IO, you can aim to be around the speed of OCaml or C++, according to the existing bytestring entries. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Slow IO or bad code?
Hi I am practicing writing code in haskell, by solving problems at this site. http://spoj.pl. The problem http://spoj.pl/problems/FASHION , is pretty simple. 1. Given two lists A,B , of N numbers, sort them and take sum of products. i.e. Sum ai * bi I wrote a code, but seems to give "Time limit exceeded"! >> Beginning of CODE loop t function | t == 1 = do function | otherwise = do { function; loop (t - 1) function } prod [] [] = 0 prod (a:as) (b:bs) = a*b + prod as bs to_int :: [String] -> [Integer] to_int [] = [] to_int (x:xs) = (read x) : to_int xs doit = do n <- getLine a <- getLine b <- getLine let la = to_int (words a); lb = to_int (words b); in print (prod la lb) main = do t <- getLine loop (read t) doit << END OF CODE I would love to see if there is any improvement that can be done, to the code ... Thanks! Vimal ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Daniel Fischer <[EMAIL PROTECTED]> writes: > Yes, I did it wrong, but I didn't keep anything (but the first and last Char > of each line) in memory on purpose. I hoped for the lines to be read one > after the other, head and last extracted [...] > Profiling (hy,hc) shows that the IO part of the programme holds on to tons of > lists - that couldn't be anything but parts of the file-contents, I believe. Chances are that you don't evaluate these characters right away, so what you are retaining is just unevaluated thunks that refer to the lines themselves. So the fix is to make the extraction of the characters stricter. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Am Mittwoch, 13. September 2006 11:07 schrieben Sie: > Daniel Fischer wrote: > > > Most certainly not. I'm pretty sure this is to a bug in your code. > > > Something retains a data structure which is actually unneeded. > > > Probably > > > > Apparently. And my money is on a load of lines from the file (of which I > > need only the first and last Char). > > Then you're doing it wrong[TM]. You shouldn't need to keep any part of Yes, I did it wrong, but I didn't keep anything (but the first and last Char of each line) in memory on purpose. I hoped for the lines to be read one after the other, head and last extracted - possibly immediately passed to accumArray, but I wouldn't necessarily expect that - and the already used lines thrown in the dustbin on next GC. Maybe the compiler couldn't figure out that it wouldn't access these lines anymore. > the input in memory. Whatever it is, nobody can tell you without seeing > the code. Try heap profiling, should you have no idea where to look for > leaks. Profiling (hy,hc) shows that the IO part of the programme holds on to tons of lists - that couldn't be anything but parts of the file-contents, I believe. > > > How could I solve the problem without representing the graph in some way? > > By using an advanced tool called "brains". Sorry for not being more > specific, but that's actually the fun part of the challenge and I'm not > going to spoil it for you. ;-) > > > Forgive the stupid question, but where if not RAM would the chunk > > currently processed reside? > > Oh, I overlooked "chunk". Well, yes, the "chunk" currently processed > needs to fit into RAM. But how much of a problem could a single Char > pose? Well, if it's the last straw... But not much, I presume and even though it might be that we must have a few thousand Chars inmemory, that shouldn't do much harm either. > > Donald Bruce Stewart wrote: > > I agree. Some problems simply require you to hold large strings in > > memory. And for those, [Char] conks out around 5-10M (try reversing a > > 10M [Char]). > > Sure, this one just isn't of that kind. Yes, but I didn't tell the compiler :-( > > > Udo. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Slow IO
Hello Ketil, Wednesday, September 13, 2006, 10:41:13 AM, you wrote: > But a String is something like 8 or 12 bytes per character, a > ByteString gets you down to 1. 12-16. Char itself, pointer to the next list element, and two boxes around them - this count for 16 bytes on 32-bit CPU. but cells with small Char are preallocated at program startup, so it will be 12 bytes for ascii-only strings but that is not the whole story :) copying GC makes program's memory usage 3 times larger, on average, than it really allocates while compacting GC has only 2x overhead ByteStrings are allocated in _unmovable_ part of GHC heap, so they don't suffer from this problem. of course, it is not free - program that creates and destroys large number of ByteStrings will suffer from memory holes, which is right the problem solved by ghc's GC so, for program that only allocates the difference may be 24/36 times on average while for create-use-destroy-loop scenarios i can't make any detailed prognoses also, traversing those lengthy lists on GCs is very time-consuming -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Ketil Malde wrote: > Daniel Fischer <[EMAIL PROTECTED]> writes: > > > Maybe I've misused the word segfault. > > I think so. A segfault is the operating-system complaining about an > illegal memory access. If you get them from Haskell, it is likely a > bug in the compiler or run-time system (or you were using unsafeAt, or > FFI). Far simpler: This is really a segfault, and it's because of a misfeature of Linux called "memory overcommitment". When physical memory runs out, Linux happily hands out more to applications requesting it, in the vain hope that at least some of it is never accessed. Therefore, malloc() is always successful, but when the memory is finally accessed, it suddenly turns out that there isn't anything to access, which results in a segfault. No amount of error checking can prevent that and it could have hit any process allocating memory when it ran out. Sane people turn overcommitment off. Sane people wouldn't have implemented it in the first place, either. Udo. -- The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. -- George Bernard Shaw signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Daniel Fischer wrote: > > Most certainly not. I'm pretty sure this is to a bug in your code. > > Something retains a data structure which is actually unneeded. Probably > > Apparently. And my money is on a load of lines from the file (of which I need > only the first and last Char). Then you're doing it wrong[TM]. You shouldn't need to keep any part of the input in memory. Whatever it is, nobody can tell you without seeing the code. Try heap profiling, should you have no idea where to look for leaks. > How could I solve the problem without representing the graph in some way? By using an advanced tool called "brains". Sorry for not being more specific, but that's actually the fun part of the challenge and I'm not going to spoil it for you. ;-) > Forgive the stupid question, but where if not RAM would the chunk currently > processed reside? Oh, I overlooked "chunk". Well, yes, the "chunk" currently processed needs to fit into RAM. But how much of a problem could a single Char pose? Donald Bruce Stewart wrote: > I agree. Some problems simply require you to hold large strings in > memory. And for those, [Char] conks out around 5-10M (try reversing a > 10M [Char]). Sure, this one just isn't of that kind. Udo. -- "Irrationality is the square root of all evil" -- Douglas Hofstadter signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Daniel Fischer <[EMAIL PROTECTED]> writes: > Maybe I've misused the word segfault. I think so. A segfault is the operating-system complaining about an illegal memory access. If you get them from Haskell, it is likely a bug in the compiler or run-time system (or you were using unsafeAt, or FFI). > The programme consumed more and more memory (according to top), > kswapd started to have a higher CPU-percentage than my programme, > programme died, system yelling 'Speicherzugriffsfehler', top displays > 'kswapd'. I find that swapping the GHC heap is not productive, so I tend to limit memory to available RAM. You can do that either by limiting available memory to process in the shell (ulimit or limit), or by specifying RTS options to the haskell program (typically +RTS -Mxxx, where xxx is 80-90% of physical RAM). >From the GHC docs (6.4.2): -Msize [Default: unlimited] Set the maximum heap size to size bytes. The I thought the default was set according to limits? heap normally grows and shrinks according to the memory requirements of the program. The only reason for having this option is to stop the heap growing without bound and filling up all the available swap space, which at the least will result in the program being summarily killed by the operating system. In my experience, a program which thrashes severely without heap limits can run fine if you just limit the heap. So it's not as much an issue of 'filling up swap' vs. 'killed by the OS', but 'takes forever, making the system unresponsive in the process' vs. 'tries hard to complete in a timely fashion, or halts with an error'. I much prefer the latter. > However the problem might be too little lazyness, because if I explicitly > read > the file line by line, memory usage remains low enough -- but ByteString is > *much* faster anyway. Right. You should still try to consume the file lazily, and make sure the data can get discarded and GC'ed when you have no further use for it. But a String is something like 8 or 12 bytes per character, a ByteString gets you down to 1. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
daniel.is.fischer: > Am Dienstag, 12. September 2006 22:26 schrieben Sie: > > Daniel Fischer wrote: > > > The programme consumed more and more memory (according to top), > > > kswapd started to have a higher CPU-percentage than my programme, > > > programme died, system yelling 'Speicherzugriffsfehler', top displays > > > 'kswapd'. > > > I believe that means my programme demanded more memory than I have > > > available (only 256MB RAM + 800MB swap). Is that a segfault or what is > > > the correct term? > > > > > > That is probably due to (apart from the stupidity of my IO-code) the > > > large overhead of Haskell lists. > > > > Most certainly not. I'm pretty sure this is to a bug in your code. > > Something retains a data structure which is actually unneeded. Probably > > Apparently. And my money is on a load of lines from the file (of which I need > only the first and last Char). > > > a case of "foldl" where "foldl'" should be used or a "try" in Parsec > > code where it should be left out or a lot of "updateWiths" to a Map, > > etc. Or it could be a bad choice of data structure. I bet, it's the > > map you're using to represent the graph (which you don't even need to > > represent at all, btw). > > No foldl nor parsec around. I represent the graph as a > > UArray (Char,Char) Int > > (I've switched to Int for the index type, too, when tuning the code), so that > shouldn't use much memory (array size is 676). > The array is built via accumArray, I hope that's sufficiently efficient > (though now I use unsafeAccumArrayUArray, that's faster). > > How could I solve the problem without representing the graph in some way? > Possibly that could be done more efficiently than I do it, but I can't > imagine > how to do it without representing the graph in some data structure. > > > > > So the chunk of the file which easily fits into my > > > RAM in ByteString form is too large as a list of ordinary Strings. > > > > The chunk of file should never need to fit into RAM. If that's a > > problem, you also forgot to prime a crucial "foldl". > > > > Forgive the stupid question, but where if not RAM would the chunk currently > processed reside? I agree. Some problems simply require you to hold large strings in memory. And for those, [Char] conks out around 5-10M (try reversing a 10M [Char]). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Am Dienstag, 12. September 2006 22:26 schrieben Sie: > Daniel Fischer wrote: > > The programme consumed more and more memory (according to top), > > kswapd started to have a higher CPU-percentage than my programme, > > programme died, system yelling 'Speicherzugriffsfehler', top displays > > 'kswapd'. > > I believe that means my programme demanded more memory than I have > > available (only 256MB RAM + 800MB swap). Is that a segfault or what is > > the correct term? > > > > That is probably due to (apart from the stupidity of my IO-code) the > > large overhead of Haskell lists. > > Most certainly not. I'm pretty sure this is to a bug in your code. > Something retains a data structure which is actually unneeded. Probably Apparently. And my money is on a load of lines from the file (of which I need only the first and last Char). > a case of "foldl" where "foldl'" should be used or a "try" in Parsec > code where it should be left out or a lot of "updateWiths" to a Map, > etc. Or it could be a bad choice of data structure. I bet, it's the > map you're using to represent the graph (which you don't even need to > represent at all, btw). No foldl nor parsec around. I represent the graph as a UArray (Char,Char) Int (I've switched to Int for the index type, too, when tuning the code), so that shouldn't use much memory (array size is 676). The array is built via accumArray, I hope that's sufficiently efficient (though now I use unsafeAccumArrayUArray, that's faster). How could I solve the problem without representing the graph in some way? Possibly that could be done more efficiently than I do it, but I can't imagine how to do it without representing the graph in some data structure. > > > So the chunk of the file which easily fits into my > > RAM in ByteString form is too large as a list of ordinary Strings. > > The chunk of file should never need to fit into RAM. If that's a > problem, you also forgot to prime a crucial "foldl". > Forgive the stupid question, but where if not RAM would the chunk currently processed reside? > > Udo. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Daniel Fischer wrote: > The programme consumed more and more memory (according to top), > kswapd started to have a higher CPU-percentage than my programme, > programme died, system yelling 'Speicherzugriffsfehler', top displays > 'kswapd'. > I believe that means my programme demanded more memory than I have available > (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct > term? > > That is probably due to (apart from the stupidity of my IO-code) the large > overhead of Haskell lists. Most certainly not. I'm pretty sure this is to a bug in your code. Something retains a data structure which is actually unneeded. Probably a case of "foldl" where "foldl'" should be used or a "try" in Parsec code where it should be left out or a lot of "updateWiths" to a Map, etc. Or it could be a bad choice of data structure. I bet, it's the map you're using to represent the graph (which you don't even need to represent at all, btw). > So the chunk of the file which easily fits into my > RAM in ByteString form is too large as a list of ordinary Strings. The chunk of file should never need to fit into RAM. If that's a problem, you also forgot to prime a crucial "foldl". Udo. -- "Proof by analogy is fraud." -- Bjarne Stroustrup signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Maybe I've misused the word segfault. What happened is: The programme consumed more and more memory (according to top), kswapd started to have a higher CPU-percentage than my programme, programme died, system yelling 'Speicherzugriffsfehler', top displays 'kswapd'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct term? I thought a segfault was when a program reads or writes to protected memory (for example, out of bounds on an array, or dereferencing a null pointer, etc. things mostly avoided in Haskell). Luckily there are lots of fun ways to crash computers! I've heard your "heap overflow" called "exploding" because of how the process rapdily expands its memory usage, filling all available space, causing system collapse. Jared. -- http://www.updike.org/~jared/ reverse ")-:" ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Am Montag, 11. September 2006 18:22 schrieben Sie: > On 9/11/06, Daniel Fischer <[EMAIL PROTECTED]> wrote: > > The problem spec states that the input file contains about 500 test > > cases, each given by between 1 and 100,000 lines, each line containing a > > single word of between 2 and 1000 letters. > > So the file should be about 12.5G on average. > > I don't think that that necessarily follows. Although I've never seen Not necessarily of course. But as the problem contains a warning about large input/output data, I assumed the worst reasonable case (uniform distribution). > the input file, of course, I imagine that many cases are fairly small, > but designed to test the accuracy of your algorithm. A few are large > (in one way or another) to test the extremes of your algorithm. But > the overall size of the input file is probably much, much smaller than > that estimate. (Maybe 1MB? Maybe 10MB?) That'd be peanuts with ByteString, 1MB even without. > > > A time limit of 7s is given. > > That's CPU time, and thus not including I/O, right? I couldn't find Doesn't IO use the CPU? But seriously, I would have thought that's what 'time' lists under user and that includes IO-time as far as I can tell. > the answer on the SPOJ site. Their FAQ is a forum, but I don't see it > there: > > > http://www.spoj.pl/forum/viewforum.php?f=6&sid=6c8fb9c3216c3abd1e720f8b4b56 >82b3 > > In any case, the problem can't be too large; the top twenty programs > all finished in under 0.35 (CPU seconds?). Even if yours is a tenth as > fast as the C and C++ programs, that would be 3.5s -- twice as fast as > it needs to be. > > http://www.spoj.pl/ranks/WORDS1/ > > Of course, you don't have access to these other programs for > comparison; but I hope that this gives you a better idea of the size > (and manageability) of the task. > > Good luck with it! > > --Tom Phoenix -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Am Montag, 11. September 2006 17:44 schrieben Sie: > Daniel Fischer wrote: > > > Try Don Stewart's ByteString library > > > > -- and some data that made the programme segfault before now run in a > > couple of seconds. > > But that shouldn't happen! Segfaults aren't performance problems, but > errors. So your program contained a bug (lazyness where it isn't > useful, I suspect), and ByString is hiding it. Maybe I've misused the word segfault. What happened is: The programme consumed more and more memory (according to top), kswapd started to have a higher CPU-percentage than my programme, programme died, system yelling 'Speicherzugriffsfehler', top displays 'kswapd'. I believe that means my programme demanded more memory than I have available (only 256MB RAM + 800MB swap). Is that a segfault or what is the correct term? That is probably due to (apart from the stupidity of my IO-code) the large overhead of Haskell lists. So the chunk of the file which easily fits into my RAM in ByteString form is too large as a list of ordinary Strings. However the problem might be too little lazyness, because if I explicitly read the file line by line, memory usage remains low enough -- but ByteString is *much* faster anyway. > > > So even if we just counted newlines, we would have to scan 1,700 million > > (1.7*10^9) chars per second. > > Could any ordinary computer of today do that, using whatever language? > > That rate should be no problem for a 2GHz machine. However, a 200MHz 64 > bit wide bus won't deliver the data fast enough and it is 50x as much as > the best hard disks could deliver in optimal circumstances. I guess, > most of the test cases are a lot smaller than your guesstimate. > I suppose so, too, but not knowing the test data, I assumed a bad case (uniform distribution of line-lengths and test-case-size in the specified range). > > Udo. Bulat: > are you mean arithmetic or geometric average? ;) I meant 'expected value'. If X_i are independent random variables uniformly distributed on [0 .. k], Y is a random variable (independent from the X_i) uniformly distributed on [1 .. n] and Z is the sum of the first Y of the X_i, then the expected value of Z is (n+1)*k/4. So we might call that a weighted arithmetic average, I suppose. Cheers, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Re: [Haskell-cafe] Slow IO
On 9/11/06, Daniel Fischer <[EMAIL PROTECTED]> wrote: The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average. I don't think that that necessarily follows. Although I've never seen the input file, of course, I imagine that many cases are fairly small, but designed to test the accuracy of your algorithm. A few are large (in one way or another) to test the extremes of your algorithm. But the overall size of the input file is probably much, much smaller than that estimate. (Maybe 1MB? Maybe 10MB?) A time limit of 7s is given. That's CPU time, and thus not including I/O, right? I couldn't find the answer on the SPOJ site. Their FAQ is a forum, but I don't see it there: http://www.spoj.pl/forum/viewforum.php?f=6&sid=6c8fb9c3216c3abd1e720f8b4b5682b3 In any case, the problem can't be too large; the top twenty programs all finished in under 0.35 (CPU seconds?). Even if yours is a tenth as fast as the C and C++ programs, that would be 3.5s -- twice as fast as it needs to be. http://www.spoj.pl/ranks/WORDS1/ Of course, you don't have access to these other programs for comparison; but I hope that this gives you a better idea of the size (and manageability) of the task. Good luck with it! --Tom Phoenix ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Slow IO
Hello Daniel, Monday, September 11, 2006, 6:05:38 PM, you wrote: > The problem spec states that the input file contains about 500 test cases, > each given by between 1 and 100,000 lines, each line containing a single word > of between 2 and 1000 letters. > So the file should be about 12.5G on average. are you mean arithmetic or geometric average? ;) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Am Sonntag, 10. September 2006 02:29 schrieben Sie: > Hello, > > Try Don Stewart's ByteString library > (http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than > the standard Haskell IO and now has lazy. > > -Jeff Yay, that's really an improvement! Depending on the size of the file/graph data, using ByteString.Lazy.Char8 reduces the run time by ranging from 'just a little' to a factor of over 30 -- and some data that made the programme segfault before now run in a couple of seconds. Thanks Bryan, Simon, David, Don & Duncan. However, the programme still spends the overwhelming part of the time doing IO, and I've started thinking about it. The problem spec states that the input file contains about 500 test cases, each given by between 1 and 100,000 lines, each line containing a single word of between 2 and 1000 letters. So the file should be about 12.5G on average. A time limit of 7s is given. So even if we just counted newlines, we would have to scan 1,700 million (1.7*10^9) chars per second. Could any ordinary computer of today do that, using whatever language? Cheers, Daniel > > On 9/9/06, Daniel Fischer <[EMAIL PROTECTED]> wrote: > > Hello all, > > Now I have an IO-problem, too. > > SPOJ problem 41 asks basically to determine whether a directed graph > > allows a path that uses every edge exactly once. The data from which the > > graphs are to be constructed are contained in a (huge) file, every item > > (number of test cases, size of test case, edges) on a single line. I've > > created my own test case file (much smaller, but already 217 MB, 2.2 > > million lines) to check my programme's performance. Now profiling reveals > > that over 90% of time and allocation are consumed by reading the file > > (line by line, reading the whole file as a lazy String and then chopping > > that to pieces is rather worse). > > > > So how do I quickly > > 1. read an Integer n from the file, > > 2. read the next n lines and > > 3. pass the first and last letter of those to the function that builds > > and checks the graph? > > > > When that is achieved, I could see whether my algorithm is good. > > > > Thanks for any help, > > Daniel > > -- > > -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Slow IO
Hello, Try Don Stewart's ByteString library (http://www.cse.unsw.edu.au/~dons/fps.html). It is much faster than the standard Haskell IO and now has lazy. -Jeff On 9/9/06, Daniel Fischer <[EMAIL PROTECTED]> wrote: Hello all, Now I have an IO-problem, too. SPOJ problem 41 asks basically to determine whether a directed graph allows a path that uses every edge exactly once. The data from which the graphs are to be constructed are contained in a (huge) file, every item (number of test cases, size of test case, edges) on a single line. I've created my own test case file (much smaller, but already 217 MB, 2.2 million lines) to check my programme's performance. Now profiling reveals that over 90% of time and allocation are consumed by reading the file (line by line, reading the whole file as a lazy String and then chopping that to pieces is rather worse). So how do I quickly 1. read an Integer n from the file, 2. read the next n lines and 3. pass the first and last letter of those to the function that builds and checks the graph? When that is achieved, I could see whether my algorithm is good. Thanks for any help, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Slow IO
Hello all, Now I have an IO-problem, too. SPOJ problem 41 asks basically to determine whether a directed graph allows a path that uses every edge exactly once. The data from which the graphs are to be constructed are contained in a (huge) file, every item (number of test cases, size of test case, edges) on a single line. I've created my own test case file (much smaller, but already 217 MB, 2.2 million lines) to check my programme's performance. Now profiling reveals that over 90% of time and allocation are consumed by reading the file (line by line, reading the whole file as a lazy String and then chopping that to pieces is rather worse). So how do I quickly 1. read an Integer n from the file, 2. read the next n lines and 3. pass the first and last letter of those to the function that builds and checks the graph? When that is achieved, I could see whether my algorithm is good. Thanks for any help, Daniel -- "In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt." -- Blair P. Houghton ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe