Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
Stefan O'Rear wrote: Mr. C++ apparently isn't a very good C++ programmer, since his best effort absolutely *pales* in comparison to Julian Seward's BWT: [EMAIL PROTECTED]:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | (time bzip2 -vvv) /dev/null (stdin): block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000 0 work, 135000 block, ratio 0.00 135000 in block, 107256 after MTF 1-2 coding, 61+2 syms in use initial group 6, [0 .. 0], has 20930 syms (19.5%) initial group 5, [1 .. 1], has 4949 syms ( 4.6%) initial group 4, [2 .. 2], has 20579 syms (19.2%) initial group 3, [3 .. 4], has 17301 syms (16.1%) initial group 2, [5 .. 10], has 24247 syms (22.6%) initial group 1, [11 .. 62], has 19250 syms (17.9%) pass 1: size is 127140, grp uses are 339 550 192 440 12 613 pass 2: size is 51693, grp uses are 321 440 288 316 139 642 pass 3: size is 51358, grp uses are 329 387 376 304 122 628 pass 4: size is 51302, grp uses are 298 421 397 304 125 601 bytes: mapping 21, selectors 433, code lengths 110, codes 51297 final combined CRC = 0x25a18961 2.602:1, 3.075 bits/byte, 61.57% saved, 135000 in, 51887 out. real0m0.165s user0m0.044s sys 0m0.012s Yup, does slightly more work (huffman coding) in 1/200 the time :) (Note, on my system .Lazy BWT3 takes 5.3s on the same input) ...OK...so how do I make Haskell go faster still? Presumably by transforming the code into an ugly mess that nobody can read any more...? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
...OK...so how do I make Haskell go faster still? Presumably by transforming the code into an ugly mess that nobody can read any more...? http://haskell.org/haskellwiki/Performance -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
Bulat Ziganshin wrote: Hello Andrew, bwt transformation is very good researched area, so probably you will not get decent performance (megabytes per second) without lot of work. Hey, I'm just glad I managed to get within striking distance of Mr C++. So much for Haskell being inherently less performant. :-P and of course, no Haskell at all. take look at http://darchiver.narod.ru/dark/Archon3fs.zip Mmm, ok... ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
Hello Andrew, Saturday, June 23, 2007, 2:45:01 PM, you wrote: Hey, I'm just glad I managed to get within striking distance of Mr C++. So much for Haskell being inherently less performant. :-P my little analysis says that it's probably due to different sort() implementations, so this says nothing about general case -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
Bulat Ziganshin wrote: Hello Andrew, Saturday, June 23, 2007, 2:45:01 PM, you wrote: Hey, I'm just glad I managed to get within striking distance of Mr C++. So much for Haskell being inherently less performant. :-P my little analysis says that it's probably due to different sort() implementations, so this says nothing about general case The point being that lots of people look at Haskell and go oh, that's very cute for writing trivial example code, but it can never be fast; for that you must use C or C++. Well, I altered the code, and it's *still* very short and very readable, and it's just as fast as the (3 pages long) C++ version. :-D ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
On Saturday 23 June 2007 13:02:54 Andrew Coppin wrote: Well, I altered the code, and it's *still* very short and very readable, and it's just as fast as the (3 pages long) C++ version. :-D Indeed. The performance of modern functional programming languages never ceases to amaze me. INRIA typically claim performance within 2x of C for OCaml but there are very few programs where OCaml isn't within 50% now, at least on AMDs. Most of the pedagogical examples of functional programming (e.g. n-queens) are too academic for general consumption but there are plenty of excellent examples. Burrows-Wheeler is a great one. Ray tracing is another. Sudoku solving is fairy good, albeit unusually well suited to arrays. I also like to compare the performance of term-level interpreters or rewriters implemented in different languages. I think it would be interesting to compare Scheme/Mathematica interpreters written in OCaml, SML (MLton) and Haskell, for example. I've benchmarked some interpreters in OCaml and SML (MLton) and MLton's whole-program optimizations are a huge benefit here, making it several times faster than OCaml. I'd like to know how well Haskell would do. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. The OCaml Journal http://www.ffconsultancy.com/products/ocaml_journal/?e ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
Hello Andrew, Saturday, June 23, 2007, 4:02:54 PM, you wrote: The point being that lots of people look at Haskell and go oh, that's very cute for writing trivial example code, but it can never be fast; for that you must use C or C++. and that's true :) as i said, your C++ code is very far from perfect. by comparing with bad code you can find that even Logo is faster than asm -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
OK, so I *started* writing a request for help, but... well I answered my own question! See the bottom... I was reading Wikipedia, and I found this: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform I decided to sit down and see what that looks like in Haskell. I came up with this: module BWT where import Data.List rotate1 :: [x] - [x] rotate1 [] = [] rotate1 xs = last xs : init xs rotate :: [x] - [[x]] rotate xs = take (length xs) (iterate rotate1 xs) bwt :: (Ord x) = [x] - [x] bwt = map last . sort . rotate step :: (Ord x) = [x] - [[x]] - [[x]] step xs = zipWith (:) xs . sort inv_bwt :: (Ord x) = x - [x] - [x] inv_bwt mark xs = head . filter (\xs - head xs == mark) . head . drop ((length xs) - 1) . iterate (step xs) . map (\x - [x]) $ xs My my, isn't that SO much shorter? I love Haskell! :-D Unfortunately, the resident C++ expert fails to grasp the concept of example code, and insists on comparing the efficiency of this program to the C one on the website. Fact is, he's translated the presented C into C++, and it can apparently transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code above, however, took about 11 seconds to transform 4 KB of text, and that required about 60 MB of RAM. (I tried larger, but the OS killed the process for comsuming too much RAM.) Well anyway, the code was written for simplicity, not efficiency. I've tried to explain this, but apparently that is beyond his comprehension. So anyway, it looks like we have a race on. :-D The first thing I did was the optimisation mentioned on Wikipedia: you don't *need* to build a list of lists. You can just throw pointers around. So I arrived at this: module BWT2 (bwt) where import Data.List rotate :: Int - [x] - Int - [x] rotate l xs n = (drop (l-n) xs) ++ (take (l-n) xs) bwt xs = let l = length xs ys = rotate l xs in map (last . rotate l xs) $ sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)] This is indeed *much* faster. With this, I can transform 52 KB of text in 9 minutes + 60 MB RAM. The previous version seemed to have quadratic memory usage, whereas this one seems to be linear. 52 KB would have taken many months with the first version! Still, 9 minutes (for a file 3 times smaller) is nowhere near 8 seconds. So we must try harder... For my next trick, ByteStrings! (Never used them before BTW... this is my first try!) module BWT3 (bwt) where import Data.List import qualified Data.ByteString as Raw rotate :: Int - Raw.ByteString - Int - Raw.ByteString rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs) bwt xs = let l = Raw.length xs ys = rotate l xs in Raw.pack $ map (Raw.last . rotate l xs) $ sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)] Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near C++, but a big improvement none the less. Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast speed increases... Jeepers, I can transform 52 KB so fast I can't even get to Task Manager fast enough to *check* the RAM usage! Blimey... OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43 MB RAM. Ouch. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote: OK, so I *started* writing a request for help, but... well I answered my own question! See the bottom... ... Unfortunately, the resident C++ expert fails to grasp the concept of example code, and insists on comparing the efficiency of this program to the C one on the website. ... Fact is, he's translated the presented C into C++, and it can apparently transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code ... Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast speed increases... Jeepers, I can transform 52 KB so fast I can't even get to Task Manager fast enough to *check* the RAM usage! Blimey... OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43 MB RAM. Ouch. Mr. C++ apparently isn't a very good C++ programmer, since his best effort absolutely *pales* in comparison to Julian Seward's BWT: [EMAIL PROTECTED]:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | (time bzip2 -vvv) /dev/null (stdin): block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000 0 work, 135000 block, ratio 0.00 135000 in block, 107256 after MTF 1-2 coding, 61+2 syms in use initial group 6, [0 .. 0], has 20930 syms (19.5%) initial group 5, [1 .. 1], has 4949 syms ( 4.6%) initial group 4, [2 .. 2], has 20579 syms (19.2%) initial group 3, [3 .. 4], has 17301 syms (16.1%) initial group 2, [5 .. 10], has 24247 syms (22.6%) initial group 1, [11 .. 62], has 19250 syms (17.9%) pass 1: size is 127140, grp uses are 339 550 192 440 12 613 pass 2: size is 51693, grp uses are 321 440 288 316 139 642 pass 3: size is 51358, grp uses are 329 387 376 304 122 628 pass 4: size is 51302, grp uses are 298 421 397 304 125 601 bytes: mapping 21, selectors 433, code lengths 110, codes 51297 final combined CRC = 0x25a18961 2.602:1, 3.075 bits/byte, 61.57% saved, 135000 in, 51887 out. real0m0.165s user0m0.044s sys 0m0.012s Yup, does slightly more work (huffman coding) in 1/200 the time :) (Note, on my system .Lazy BWT3 takes 5.3s on the same input) Stefan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote: OK, so I *started* writing a request for help, but... well I answered my own question! See the bottom... ... module BWT3 (bwt) where import Data.List import qualified Data.ByteString as Raw rotate :: Int - Raw.ByteString - Int - Raw.ByteString rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs) bwt xs = let l = Raw.length xs ys = rotate l xs in Raw.pack $ map (Raw.last . rotate l xs) $ sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)] Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near C++, but a big improvement none the less. The trouble is that Raw.append is an O(N) operation, making the computation O(N^2) where it ought to be O(NlogN). Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast speed increases... Jeepers, I can transform 52 KB so fast I can't even get to Task Manager fast enough to *check* the RAM usage! Blimey... OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43 MB RAM. Ouch. In this case append is an O(1) operation. But you're still getting killed on prefactors, because you're generating a list of size N and then sorting it. Lists are just not nice data structures to sort, nor are they nice to have for large N. To get better speed and memory use, I think you'd want to avoid the intermediate list in favor of some sort of strict array, but that'd be ugly. -- David Roundy Department of Physics Oregon State University ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform
On Fri, 22 Jun 2007, Andrew Coppin wrote: Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast speed increases... Jeepers, I can transform 52 KB so fast I can't even get to Task Manager fast enough to *check* the RAM usage! Blimey... OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43 MB RAM. Ouch. A note re RAM usage: it behaves differently in a GCed environment, you might want to see if it runs with a smaller max heap size. Obviously you'll spend more time GCing. -- [EMAIL PROTECTED] My religion says so explains your beliefs. But it doesn't explain why I should hold them as well, let alone be restricted by them. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe