Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Andrew Coppin

Stefan O'Rear wrote:

Mr. C++ apparently isn't a very good C++ programmer, since his best
effort absolutely *pales* in comparison to Julian Seward's BWT:

[EMAIL PROTECTED]:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | 
(time bzip2 -vvv)  /dev/null
  (stdin): 
block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000

  0 work, 135000 block, ratio  0.00
  135000 in block, 107256 after MTF  1-2 coding, 61+2 syms in use
  initial group 6, [0 .. 0], has 20930 syms (19.5%)
  initial group 5, [1 .. 1], has 4949 syms ( 4.6%)
  initial group 4, [2 .. 2], has 20579 syms (19.2%)
  initial group 3, [3 .. 4], has 17301 syms (16.1%)
  initial group 2, [5 .. 10], has 24247 syms (22.6%)
  initial group 1, [11 .. 62], has 19250 syms (17.9%)
  pass 1: size is 127140, grp uses are 339 550 192 440 12 613 
  pass 2: size is 51693, grp uses are 321 440 288 316 139 642 
  pass 3: size is 51358, grp uses are 329 387 376 304 122 628 
  pass 4: size is 51302, grp uses are 298 421 397 304 125 601 
  bytes: mapping 21, selectors 433, code lengths 110, codes 51297

final combined CRC = 0x25a18961
2.602:1,  3.075 bits/byte, 61.57% saved, 135000 in, 51887 out.

real0m0.165s
user0m0.044s
sys 0m0.012s


Yup, does slightly more work (huffman coding) in 1/200 the time :)

(Note, on my system .Lazy BWT3 takes 5.3s on the same input)
  


...OK...so how do I make Haskell go faster still?

Presumably by transforming the code into an ugly mess that nobody can 
read any more...?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Donald Bruce Stewart
 ...OK...so how do I make Haskell go faster still?
 
 Presumably by transforming the code into an ugly mess that nobody can 
 read any more...?
 

http://haskell.org/haskellwiki/Performance

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Andrew Coppin

Bulat Ziganshin wrote:

Hello Andrew,

bwt transformation is very good researched area, so probably you will
not get decent performance (megabytes per second) without lot of work.
  


Hey, I'm just glad I managed to get within striking distance of Mr C++. 
So much for Haskell being inherently less performant. :-P



and of course, no Haskell at all. take look at
http://darchiver.narod.ru/dark/Archon3fs.zip
  


Mmm, ok...

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Bulat Ziganshin
Hello Andrew,

Saturday, June 23, 2007, 2:45:01 PM, you wrote:

 Hey, I'm just glad I managed to get within striking distance of Mr C++.
 So much for Haskell being inherently less performant. :-P

my little analysis says that it's probably due to different sort()
implementations, so this says nothing about general case

-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Andrew Coppin

Bulat Ziganshin wrote:

Hello Andrew,

Saturday, June 23, 2007, 2:45:01 PM, you wrote:

  

Hey, I'm just glad I managed to get within striking distance of Mr C++.
So much for Haskell being inherently less performant. :-P



my little analysis says that it's probably due to different sort()
implementations, so this says nothing about general case
  


The point being that lots of people look at Haskell and go oh, that's 
very cute for writing trivial example code, but it can never be fast; 
for that you must use C or C++.


Well, I altered the code, and it's *still* very short and very readable, 
and it's just as fast as the (3 pages long) C++ version. :-D


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Jon Harrop
On Saturday 23 June 2007 13:02:54 Andrew Coppin wrote:
 Well, I altered the code, and it's *still* very short and very readable,
 and it's just as fast as the (3 pages long) C++ version. :-D

Indeed. The performance of modern functional programming languages never 
ceases to amaze me. INRIA typically claim performance within 2x of C for 
OCaml but there are very few programs where OCaml isn't within 50% now, at 
least on AMDs.

Most of the pedagogical examples of functional programming (e.g. n-queens) are 
too academic for general consumption but there are plenty of excellent 
examples. Burrows-Wheeler is a great one. Ray tracing is another. Sudoku 
solving is fairy good, albeit unusually well suited to arrays.

I also like to compare the performance of term-level interpreters or rewriters 
implemented in different languages. I think it would be interesting to 
compare Scheme/Mathematica interpreters written in OCaml, SML (MLton) and 
Haskell, for example. I've benchmarked some interpreters in OCaml and SML 
(MLton) and MLton's whole-program optimizations are a huge benefit here, 
making it several times faster than OCaml. I'd like to know how well Haskell 
would do.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?e
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re[2]: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-23 Thread Bulat Ziganshin
Hello Andrew,

Saturday, June 23, 2007, 4:02:54 PM, you wrote:
 The point being that lots of people look at Haskell and go oh, that's
 very cute for writing trivial example code, but it can never be fast; 
 for that you must use C or C++.

and that's true :)  as i said, your C++ code is very far from perfect.
by comparing with bad code you can find that even Logo is faster than
asm


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-22 Thread Andrew Coppin
OK, so I *started* writing a request for help, but... well I answered my 
own question! See the bottom...





I was reading Wikipedia, and I found this:

 http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

I decided to sit down and see what that looks like in Haskell. I came up 
with this:




module BWT where

import Data.List

rotate1 :: [x] - [x]
rotate1 [] = []
rotate1 xs = last xs : init xs

rotate :: [x] - [[x]]
rotate xs = take (length xs) (iterate rotate1 xs)

bwt :: (Ord x) = [x] - [x]
bwt = map last . sort . rotate


step :: (Ord x) = [x] - [[x]] - [[x]]
step xs = zipWith (:) xs . sort

inv_bwt :: (Ord x) = x - [x] - [x]
inv_bwt mark xs =
 head . filter (\xs - head xs == mark) .
 head . drop ((length xs) - 1) . iterate (step xs) .
 map (\x - [x]) $ xs



My my, isn't that SO much shorter? I love Haskell! :-D

Unfortunately, the resident C++ expert fails to grasp the concept of 
example code, and insists on comparing the efficiency of this program 
to the C one on the website.


Fact is, he's translated the presented C into C++, and it can apparently 
transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code 
above, however, took about 11 seconds to transform 4 KB of text, and 
that required about 60 MB of RAM. (I tried larger, but the OS killed the 
process for comsuming too much RAM.)


Well anyway, the code was written for simplicity, not efficiency. I've 
tried to explain this, but apparently that is beyond his comprehension. 
So anyway, it looks like we have a race on. :-D


The first thing I did was the optimisation mentioned on Wikipedia: you 
don't *need* to build a list of lists. You can just throw pointers 
around. So I arrived at this:




module BWT2 (bwt) where

import Data.List

rotate :: Int - [x] - Int - [x]
rotate l xs n = (drop (l-n) xs) ++ (take (l-n) xs)

bwt xs =
 let l  = length xs
 ys = rotate l xs
 in  map (last . rotate l xs) $
 sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)]


This is indeed *much* faster. With this, I can transform 52 KB of text 
in 9 minutes + 60 MB RAM. The previous version seemed to have quadratic 
memory usage, whereas this one seems to be linear. 52 KB would have 
taken many months with the first version!


Still, 9 minutes (for a file 3 times smaller) is nowhere near 8 seconds. 
So we must try harder... For my next trick, ByteStrings! (Never used 
them before BTW... this is my first try!)



module BWT3 (bwt) where

import Data.List
import qualified Data.ByteString as Raw

rotate :: Int - Raw.ByteString - Int - Raw.ByteString
rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)

bwt xs =
 let l  = Raw.length xs
 ys = rotate l xs
 in  Raw.pack $
 map (Raw.last . rotate l xs) $
 sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)]


Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near 
C++, but a big improvement none the less.


Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! 
Vast speed increases... Jeepers, I can transform 52 KB so fast I can't 
even get to Task Manager fast enough to *check* the RAM usage! Blimey...


OK, just tried the 145 KB test file that Mr C++ used. That took 2 
seconds + 43 MB RAM. Ouch.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-22 Thread Stefan O'Rear
On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
 OK, so I *started* writing a request for help, but... well I answered my 
 own question! See the bottom...
...
 Unfortunately, the resident C++ expert fails to grasp the concept of 
 example code, and insists on comparing the efficiency of this program 
 to the C one on the website.
...
 Fact is, he's translated the presented C into C++, and it can apparently 
 transform a 145 KB file in 8 seconds using only 3 MB of RAM. The code 
...
 Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! 
 Vast speed increases... Jeepers, I can transform 52 KB so fast I can't 
 even get to Task Manager fast enough to *check* the RAM usage! Blimey...
 
 OK, just tried the 145 KB test file that Mr C++ used. That took 2 
 seconds + 43 MB RAM. Ouch.

Mr. C++ apparently isn't a very good C++ programmer, since his best
effort absolutely *pales* in comparison to Julian Seward's BWT:

[EMAIL PROTECTED]:/usr/local/src/hpaste$ head -c 135000 /usr/share/dict/words | 
(time bzip2 -vvv)  /dev/null
  (stdin): 
block 1: crc = 0x25a18961, combined CRC = 0x25a18961, size = 135000
  0 work, 135000 block, ratio  0.00
  135000 in block, 107256 after MTF  1-2 coding, 61+2 syms in use
  initial group 6, [0 .. 0], has 20930 syms (19.5%)
  initial group 5, [1 .. 1], has 4949 syms ( 4.6%)
  initial group 4, [2 .. 2], has 20579 syms (19.2%)
  initial group 3, [3 .. 4], has 17301 syms (16.1%)
  initial group 2, [5 .. 10], has 24247 syms (22.6%)
  initial group 1, [11 .. 62], has 19250 syms (17.9%)
  pass 1: size is 127140, grp uses are 339 550 192 440 12 613 
  pass 2: size is 51693, grp uses are 321 440 288 316 139 642 
  pass 3: size is 51358, grp uses are 329 387 376 304 122 628 
  pass 4: size is 51302, grp uses are 298 421 397 304 125 601 
  bytes: mapping 21, selectors 433, code lengths 110, codes 51297
final combined CRC = 0x25a18961
2.602:1,  3.075 bits/byte, 61.57% saved, 135000 in, 51887 out.

real0m0.165s
user0m0.044s
sys 0m0.012s


Yup, does slightly more work (huffman coding) in 1/200 the time :)

(Note, on my system .Lazy BWT3 takes 5.3s on the same input)

Stefan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-22 Thread David Roundy
On Fri, Jun 22, 2007 at 09:26:40PM +0100, Andrew Coppin wrote:
 OK, so I *started* writing a request for help, but... well I answered my 
 own question! See the bottom...

...

 module BWT3 (bwt) where
 
 import Data.List
 import qualified Data.ByteString as Raw
 
 rotate :: Int - Raw.ByteString - Int - Raw.ByteString
 rotate l xs n = (Raw.drop (l-n) xs) `Raw.append` (Raw.take (l-n) xs)
 
 bwt xs =
  let l  = Raw.length xs
  ys = rotate l xs
  in  Raw.pack $
  map (Raw.last . rotate l xs) $
  sortBy (\n m - compare (ys n) (ys m)) [0..(l-1)]
 
 
 Now I can transform 52 KB in 54 seconds + 30 MB RAM. Still nowhere near 
 C++, but a big improvement none the less.

The trouble is that Raw.append is an O(N) operation, making the computation
O(N^2) where it ought to be O(NlogN).

 Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! 
 Vast speed increases... Jeepers, I can transform 52 KB so fast I can't 
 even get to Task Manager fast enough to *check* the RAM usage! Blimey...
 
 OK, just tried the 145 KB test file that Mr C++ used. That took 2 
 seconds + 43 MB RAM. Ouch.

In this case append is an O(1) operation.  But you're still getting killed
on prefactors, because you're generating a list of size N and then sorting
it.  Lists are just not nice data structures to sort, nor are they nice to
have for large N.

To get better speed and memory use, I think you'd want to avoid the
intermediate list in favor of some sort of strict array, but that'd be
ugly.
-- 
David Roundy
Department of Physics
Oregon State University
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Need for speed: the Burrows-Wheeler Transform

2007-06-22 Thread Philippa Cowderoy
On Fri, 22 Jun 2007, Andrew Coppin wrote:

 Woah... What the hell? I just switched to Data.ByteString.Lazy and WHAM! Vast
 speed increases... Jeepers, I can transform 52 KB so fast I can't even get to
 Task Manager fast enough to *check* the RAM usage! Blimey...
 
 OK, just tried the 145 KB test file that Mr C++ used. That took 2 seconds + 43
 MB RAM. Ouch.
 

A note re RAM usage: it behaves differently in a GCed environment, you 
might want to see if it runs with a smaller max heap size. Obviously 
you'll spend more time GCing.

-- 
[EMAIL PROTECTED]

My religion says so explains your beliefs. But it doesn't explain
why I should hold them as well, let alone be restricted by them.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe