Re: [Haskell-cafe] Performance help

2007-11-13 Thread Derek Elkins
On Tue, 2007-11-13 at 14:21 -0800, Ryan Ingram wrote:
> On 11/13/07, Ryan Ingram <[EMAIL PROTECTED]> wrote: 
> Also, what stops getRule from going off the end of the array?
> I didn't see anything that prevented that in the code, and
> you're using unsafeAt, which seems like a potential bug.
>  
> Never mind, I realized this is a ring buffer with `mod` s.   That's
> another slow operation when you're doing code as tight as this.  If
> you can guarantee the ring is a power of 2 in size you can use a mask
> instead, or use my original suggestion of deriving rules from the
> previous rule and the new bit; the initial state is determined by the
> last bits in the buffer and you never wrap.


rem is faster than mod and equivalent if everything is positive.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Justin Bailey
I implement bit shifting to get the next rule, as you suggested, and
that cut my run time by 75%. It went from 200 seconds to do 100 rules
on 100 CAs to 50 seconds. Amazing.

Justin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Justin Bailey
On Nov 13, 2007 2:49 PM, Stefan O'Rear <[EMAIL PROTECTED]> wrote:
> About how wide are your rules usually?

7 bits (3 neighbors on each side plus the current cell).

Justin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Ryan Ingram
Sure, if the ring size is a power of two, and a is greater than or equal
to 0, then

a `mod` ringSize == a .&. (ringSize - 1)

that is:

a `mod` 8 == a .&. 7
a `mod` 256 == a .&. 255

etc.

On 11/13/07, Justin Bailey <[EMAIL PROTECTED]> wrote:
>
> On Nov 13, 2007 2:21 PM, Ryan Ingram <[EMAIL PROTECTED]> wrote:
> > Never mind, I realized this is a ring buffer with `mod` s.   That's
> another
> > slow operation when you're doing code as tight as this.  If you can
> > guarantee the ring is a power of 2 in size you can use a mask instead,
> or
> > use my original suggestion of deriving rules from the previous rule and
> the
> > new bit; the initial state is determined by the last bits in the buffer
> and
> > you never wrap.
>
> I can't guarantee the ring is a power of 2 but do you feel like
> explaining the mask suggestion anyways?
>
> Thanks for the bits suggestion - I'll see if that helps performance at
> all. It looks like you have to be very careful in which concrete type
> you choose or you'll get a lot of  conversion going on.
>
> Justin
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Stefan O'Rear
On Tue, Nov 13, 2007 at 02:45:33PM -0800, Justin Bailey wrote:
> On Nov 13, 2007 2:21 PM, Ryan Ingram <[EMAIL PROTECTED]> wrote:
> > Never mind, I realized this is a ring buffer with `mod` s.   That's another
> > slow operation when you're doing code as tight as this.  If you can
> > guarantee the ring is a power of 2 in size you can use a mask instead, or
> > use my original suggestion of deriving rules from the previous rule and the
> > new bit; the initial state is determined by the last bits in the buffer and
> > you never wrap.
> 
> I can't guarantee the ring is a power of 2 but do you feel like
> explaining the mask suggestion anyways?
> 
> Thanks for the bits suggestion - I'll see if that helps performance at
> all. It looks like you have to be very careful in which concrete type
> you choose or you'll get a lot of  conversion going on.

About how wide are your rules usually?

Stefan


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Justin Bailey
On Nov 13, 2007 2:21 PM, Ryan Ingram <[EMAIL PROTECTED]> wrote:
> Never mind, I realized this is a ring buffer with `mod` s.   That's another
> slow operation when you're doing code as tight as this.  If you can
> guarantee the ring is a power of 2 in size you can use a mask instead, or
> use my original suggestion of deriving rules from the previous rule and the
> new bit; the initial state is determined by the last bits in the buffer and
> you never wrap.

I can't guarantee the ring is a power of 2 but do you feel like
explaining the mask suggestion anyways?

Thanks for the bits suggestion - I'll see if that helps performance at
all. It looks like you have to be very careful in which concrete type
you choose or you'll get a lot of  conversion going on.

Justin
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Ryan Ingram
On 11/13/07, Ryan Ingram <[EMAIL PROTECTED]> wrote:
>
> Also, what stops getRule from going off the end of the array?  I didn't
> see anything that prevented that in the code, and you're using unsafeAt,
> which seems like a potential bug.
>

Never mind, I realized this is a ring buffer with `mod` s.   That's another
slow operation when you're doing code as tight as this.  If you can
guarantee the ring is a power of 2 in size you can use a mask instead, or
use my original suggestion of deriving rules from the previous rule and the
new bit; the initial state is determined by the last bits in the buffer and
you never wrap.

  -- ryan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance help

2007-11-13 Thread Ryan Ingram
One observation is that you're doing a lot of redundant Bool -> Int
conversion.

As you iterate across the array in fillArray, the rule you are using for the
next cell is almost entirely determined by the rule you are using for the
current cell; lop off the top bit, shift left by one, and add the new bit.
Instead, you're recalculating the entire rule at that point.

Sadly, (at least as of GHC 6.6.1) the performance of the operations in
Data.Bits was horrendous, and any time I wanted to use them for
performance, I ended up going through the FFI and writing C code.  I haven't
had a chance to test this on GHC 6.8.

In this case, that might not be so bad, however.  You can probably write a
10-20 line C function that implements "fill" and call it via the FFI.

Alternatively, you could create a new rule table which maps a rule and a
bool into a new rule index.  This would only be twice the size of the rule
table, and you can index that way.

Also, what stops getRule from going off the end of the array?  I didn't see
anything that prevented that in the code, and you're using unsafeAt, which
seems like a potential bug.

  -- ryan


On 11/13/07, Justin Bailey <[EMAIL PROTECTED]> wrote:
>
> I've been working on a program over the last few days to evolve
> cellular automata rules using a genetic algorithm. Luckily, this email
> has nothing to do with CAs but everything to do with Haskell
> performance.
>
> For those who don't know, a CA is represented as a row of cells, where
> each can be either black (on/1) or white (off/0). A CA is "run" by
> generating a new row from the previous row according to some rule.
> Each cell is examined in turn and based on the state of it's neighbors
> and itself, a new cell in the next row is generated that is either
> black or white.
>
> The function below is my "step" function that generates this new row.
> It's at the heart of my program and where all  the execution time is
> spent. In this scenario it's executed around 800 million times. On my
> relatively modest desktop using GHC 6.8.1, the program takes about 30
> seconds to run. Here's the function, with some of the type
> declarations:
>
> data Rule = Rule { ruleWidth :: Int, rules :: UArray Int Bool }
> data Ring = Ring { currIdx :: !Int, vals :: (UArray Int Bool), lower,
> upper, size:: !Int }
> type CA = Ring
>
> currR :: Int -> Ring -> Bool
> currR amt r@(Ring curr arr l u s) = unsafeAt arr ((curr + amt) `mod` s)
>
> stepWithUArray :: Int -> Rule -> CA -> CA
> stepWithUArray rowLen rule@(Rule width rules) row =
> let upper = rowLen - 1
>  getRule currIdx = pattern' start 0
>where
>  start = currIdx - width
>  end = currIdx + width
>  pattern' cnt !result
>| cnt > end = result
>| otherwise = if (currR cnt row)
>then pattern' (cnt + 1) (result * 2 + 1)
>else pattern' (cnt + 1) (result * 2)
>  makeNewRow :: ST s (ST.STUArray s Int Bool)
>  makeNewRow =
>do
>  arr <- ST.newArray_ (0, upper)
>  let fill idx
>| idx > upper = return ()
>| otherwise = do
>unsafeWrite arr idx (unsafeAt rules (getRule idx))
>fill (idx + 1)
>  fill 0
>  return $! arr
> in fromUArray (ST.runSTUArray makeNewRow)
>
> fromUArray produces a new Ring (i.e. CA) from an array. 'makeNewRow'
> iterates over every cell in the current row using getRule to get the
> new value for each cell and returns the new row as an array. getRule
> essentially treats the neighbors of the current cell as bits, with the
> most significant to the left. An index into the rules array is
> constructed based on the values around the cell being examined (which
> wrap on the ends, thus the Ring construct). That index is used to get
> the value of the new cell from the rules array.
>
> Profiling shows that the following lines take up the most execution
> and allocation:
>
> makeNewRow = ... -- 20.5% execution,  26.7% allocation
> if (currR cnt row) ... -- 14.7% execution, 26.6% allocation,  in pattern'
> currR ... -- 14.7% execution, 0% allocation
>
> Any suggestions for improving this code? Thanks in advance!
>
> Justin
>
> p.s. The entire program is attached. Compile with ghc -O2
> -funbox-strict-fields -fbang-patterns --make GA-CA.hs. It runs 25
> rules on each of 25 initial CAs for 2 generations.
> p.p.s. On the bright side, this program has excellent memory
> performance. It uses a constant 2 - 7 MB depending on the initial
> parameters for the entire run. Beautiful!
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Performance help

2007-11-13 Thread Justin Bailey
I've been working on a program over the last few days to evolve
cellular automata rules using a genetic algorithm. Luckily, this email
has nothing to do with CAs but everything to do with Haskell
performance.

For those who don't know, a CA is represented as a row of cells, where
each can be either black (on/1) or white (off/0). A CA is "run" by
generating a new row from the previous row according to some rule.
Each cell is examined in turn and based on the state of it's neighbors
and itself, a new cell in the next row is generated that is either
black or white.

The function below is my "step" function that generates this new row.
It's at the heart of my program and where all  the execution time is
spent. In this scenario it's executed around 800 million times. On my
relatively modest desktop using GHC 6.8.1, the program takes about 30
seconds to run. Here's the function, with some of the type
declarations:

data Rule = Rule { ruleWidth :: Int, rules :: UArray Int Bool }
data Ring = Ring { currIdx :: !Int, vals :: (UArray Int Bool), lower,
upper, size:: !Int }
type CA = Ring

currR :: Int -> Ring -> Bool
currR amt r@(Ring curr arr l u s) = unsafeAt arr ((curr + amt) `mod` s)

stepWithUArray :: Int -> Rule -> CA -> CA
stepWithUArray rowLen rule@(Rule width rules) row =
  let upper = rowLen - 1
  getRule currIdx = pattern' start 0
where
  start = currIdx - width
  end = currIdx + width
  pattern' cnt !result
| cnt > end = result
| otherwise = if (currR cnt row)
then pattern' (cnt + 1) (result * 2 + 1)
else pattern' (cnt + 1) (result * 2)
  makeNewRow :: ST s (ST.STUArray s Int Bool)
  makeNewRow =
do
  arr <- ST.newArray_ (0, upper)
  let fill idx
| idx > upper = return ()
| otherwise = do
unsafeWrite arr idx (unsafeAt rules (getRule idx))
fill (idx + 1)
  fill 0
  return $! arr
  in fromUArray (ST.runSTUArray makeNewRow)

fromUArray produces a new Ring (i.e. CA) from an array. 'makeNewRow'
iterates over every cell in the current row using getRule to get the
new value for each cell and returns the new row as an array. getRule
essentially treats the neighbors of the current cell as bits, with the
most significant to the left. An index into the rules array is
constructed based on the values around the cell being examined (which
wrap on the ends, thus the Ring construct). That index is used to get
the value of the new cell from the rules array.

Profiling shows that the following lines take up the most execution
and allocation:

  makeNewRow = ... -- 20.5% execution,  26.7% allocation
  if (currR cnt row) ... -- 14.7% execution, 26.6% allocation,  in pattern'
  currR ... -- 14.7% execution, 0% allocation

Any suggestions for improving this code? Thanks in advance!

Justin

p.s. The entire program is attached. Compile with ghc -O2
-funbox-strict-fields -fbang-patterns --make GA-CA.hs. It runs 25
rules on each of 25 initial CAs for 2 generations.
p.p.s. On the bright side, this program has excellent memory
performance. It uses a constant 2 - 7 MB depending on the initial
parameters for the entire run. Beautiful!


ca.zip.safe
Description: Binary data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-24 Thread Ian Lynagh
On Sat, Mar 24, 2007 at 01:46:33PM +, Dominic Steinitz wrote:
> 
> Thanks. I'm trying to build just SHA1 but I am getting the following linker 
> errors. Do you know what option I should be adding?
> 
> [EMAIL PROTECTED]:~/sha11> ghc -o perfTest 
> perfTest.hs -iIgloo/darcs-unstable/src --make -lz
> Linking perfTest ...
> Igloo/darcs-unstable/src/FastPackedString.o: In function `r4Lk_info':
> (.text+0x34c): undefined reference to `utf8_to_ints'

You need to link with fpstring.o (which in turn you get by compiling
fpstring.c).


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-24 Thread Dominic Steinitz
On Friday 16 March 2007 21:24, Ian Lynagh wrote:
> On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
> > I have re-written the sha1 code so that it is (hopefully) easy to see
> > that it faithfully implements the algorithm (see
> > http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the
> > space leak, I have been trying to improve performance.
> >
> > Currently, the haskell code is 2 orders of magnitude slower than the
> > sha1sum that ships with my linux.
>
> I don't know if this is useful to you, but darcs has some SHA1 code that
> IIRC is much closer to C's performance. It currently uses darcs' own
> FastPackedString library, but porting it to ByteString should be fairly
> easy.
>
> See SHA1.lhs in http://www.abridgegame.org/repos/darcs-unstable
>
> It might even be able to be made faster still by calling lower-level
> functions than {shift,rotate}{L,R} directly.
>
>
> Thanks
> Ian
Ian,

Thanks. I'm trying to build just SHA1 but I am getting the following linker 
errors. Do you know what option I should be adding?

Dominic.

[EMAIL PROTECTED]:~/sha11> ghc -o perfTest 
perfTest.hs -iIgloo/darcs-unstable/src --make -lz
Linking perfTest ...
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4Lk_info':
(.text+0x34c): undefined reference to `utf8_to_ints'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4Lm_info':
(.text+0x370): undefined reference to `first_nonwhite'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4Lo_info':
(.text+0x430): undefined reference to `first_white'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4Lq_info':
(.text+0x4f0): undefined reference to `has_funky_char'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4LE_info':
(.text+0x848): undefined reference to `my_mmap'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4LM_info':
(.text+0x8dc): undefined reference to `conv_to_hex'
Igloo/darcs-unstable/src/FastPackedString.o: In function `r4LO_info':
(.text+0x904): undefined reference to `conv_from_hex'
collect2: ld returned 1 exit status

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-19 Thread Dominic Steinitz
On Friday 16 March 2007 21:29, David Brown wrote:
> Ian Lynagh wrote:
> > On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
> >> I have re-written the sha1 code so that it is (hopefully) easy to see
>
> that it
>
> >> faithfully implements the algorithm (see
> >> http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the
>
> space
>
> >> leak, I have been trying to improve performance.
> >>
> >> Currently, the haskell code is 2 orders of magnitude slower than the
>
> sha1sum
>
> >> that ships with my linux.
> >
> > I don't know if this is useful to you, but darcs has some SHA1 code that
> > IIRC is much closer to C's performance. It currently uses darcs' own
> > FastPackedString library, but porting it to ByteString should be fairly
> > easy.
> >
> > See SHA1.lhs in http://www.abridgegame.org/repos/darcs-unstable
> >
> > It might even be able to be made faster still by calling lower-level
> > functions than {shift,rotate}{L,R} directly.
>
> I ended up deciding to call SHA1 routines out of openssl.  For
> applications where this is possible, it does very well, I got about
> 2.5 times the speed out of it, compared to ordinary C implementations.
>
> But, since harchive spends most of its CPU computing SHA1 hashes (and
> almost all of the rest in zlib), it is worth a complex binding there.
>
> Dave
Ian, Dave,

Thanks. My goal is to have natural haskell code that's reasonably efficient. I 
don't have a problem to solve that needs an efficient implementation of SHA1.

I'm going to try apfelmus' suggestions next and then (if I ever get yhc to 
build) start looking at what gets generated.

Dominic.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-16 Thread David Brown
Ian Lynagh wrote:
> On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
>> I have re-written the sha1 code so that it is (hopefully) easy to see
that it
>> faithfully implements the algorithm (see
>> http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the
space
>> leak, I have been trying to improve performance.
>>
>> Currently, the haskell code is 2 orders of magnitude slower than the
sha1sum
>> that ships with my linux.
>
> I don't know if this is useful to you, but darcs has some SHA1 code that
> IIRC is much closer to C's performance. It currently uses darcs' own
> FastPackedString library, but porting it to ByteString should be fairly
> easy.
>
> See SHA1.lhs in http://www.abridgegame.org/repos/darcs-unstable
>
> It might even be able to be made faster still by calling lower-level
> functions than {shift,rotate}{L,R} directly.

I ended up deciding to call SHA1 routines out of openssl.  For
applications where this is possible, it does very well, I got about
2.5 times the speed out of it, compared to ordinary C implementations.

But, since harchive spends most of its CPU computing SHA1 hashes (and
almost all of the rest in zlib), it is worth a complex binding there.

Dave

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-16 Thread Ian Lynagh
On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
> I have re-written the sha1 code so that it is (hopefully) easy to see that it 
> faithfully implements the algorithm (see 
> http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the space 
> leak, I have been trying to improve performance.
> 
> Currently, the haskell code is 2 orders of magnitude slower than the sha1sum 
> that ships with my linux.

I don't know if this is useful to you, but darcs has some SHA1 code that
IIRC is much closer to C's performance. It currently uses darcs' own
FastPackedString library, but porting it to ByteString should be fairly
easy.

See SHA1.lhs in http://www.abridgegame.org/repos/darcs-unstable

It might even be able to be made faster still by calling lower-level
functions than {shift,rotate}{L,R} directly.


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-12 Thread Neil Mitchell

Hi


I notice that there's not much user-accessible documentation of what you
can expect GHC (or some other Haskell implementation) to do and not do
with a given piece of code.


Yhc/nhc/Hugs - nothing

GHC - inlining, simplification, fusion if you use the build in
functions in a specified way, strictness analysis, maybe some
unboxing, let movement, some other tricks

JHC - everything

The problem is that things like strictness are a static analysis which
is undecidable in general. Static analysis is probably too complex for
the user to figure out what is going on, its much better to look at
the Core generated and see what actually happened.

Thanks

Neil
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-11 Thread Bryan O'Sullivan

Dominic Steinitz wrote:


Any help would be appreciated.


I notice that there's not much user-accessible documentation of what you 
can expect GHC (or some other Haskell implementation) to do and not do 
with a given piece of code.  For example, you have a lot of little 
definitions that clearly traverse the same lists many times.  I've no 
idea where I would look, except for the compiler source, to get a sense 
for when, if ever, the compiler might apply CSE, fusion, or any other 
techniques that come to mind.  So transmitting folk wisdom on what "the 
compiler" might do with any given piece of code counts as another half 
chapter in the "Practical Haskell" book that ought to get written :-)


http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-11 Thread Dominic Steinitz
On Sunday 11 March 2007 20:46, Stefan O'Rear wrote:
> On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
> > >   Word128 ai bi ci di ei = ss
>
> 128 is not divisible by 5.  You should probably rename that type :)
>
> Stefan
I must have been thinking of MD5. Yes Word160 would be better. Dominic.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Performance Help

2007-03-11 Thread Stefan O'Rear
On Sun, Mar 11, 2007 at 08:18:44PM +, Dominic Steinitz wrote:
> >   Word128 ai bi ci di ei = ss 

128 is not divisible by 5.  You should probably rename that type :)

Stefan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Performance Help

2007-03-11 Thread Dominic Steinitz
I have re-written the sha1 code so that it is (hopefully) easy to see that it 
faithfully implements the algorithm (see 
http://www.itl.nist.gov/fipspubs/fip180-1.htm). Having got rid of the space 
leak, I have been trying to improve performance.

Currently, the haskell code is 2 orders of magnitude slower than the sha1sum 
that ships with my linux.

> [EMAIL PROTECTED]:~/sha1/testdist/sha1> time ./perfTest perfTest
> c7eae62ddabb653bb9ce4eb18fa8b94264f92a76
> Success
>
> real0m2.152s
> user0m2.112s
> sys 0m0.028s
> [EMAIL PROTECTED]:~/sha1/testdist/sha1> time sha1sum perfTest
> c7eae62ddabb653bb9ce4eb18fa8b94264f92a76  perfTest
>
> real0m0.057s
> user0m0.008s
> sys 0m0.004s

I've played around with profiling and doubled the performance of the haskell 
code but I'm nowhere near the C performance.

> Sun Mar 11 19:32 2007 Time and Allocation Profiling Report  (Final)
>
>perfTest +RTS -p -RTS eg
>
> total time  =6.75 secs   (135 ticks @ 50 ms)
> total alloc = 1,483,413,752 bytes  (excludes profiling overheads)
>
> COST CENTREMODULE   %time %alloc
>
> oneBlock   Data.Digest.SHA1  39.3   40.1
> $& Data.Digest.SHA1  20.7   21.6
> f  Data.Digest.SHA1  13.36.2
> getWord32s Data.Digest.SHA1   7.46.6
> test2  Main   5.98.7
> blockWord8sIn32Data.Digest.SHA1   5.25.3
> blockWord8sIn512   Data.Digest.SHA1   3.04.4
> padData.Digest.SHA1   1.53.5
> k  Data.Digest.SHA1   1.50.0
> fromBytes  Data.Digest.SHA1   0.03.5

Here's the code that is taking the majority of the time.

> ($&) :: [Word32] -> [Word32] -> [Word32]
> a $& b = zipWith (+) a b
> 
> -- Word128 -> Word512 -> Word128
> oneBlock ss xs = Word128 (as!!80) (bs!!80) (cs!!80) (ds!!80) (es!!80)
>where
>   ws = xs ++ map (rotL 1) (zipWith4 xxxor wm3s wm8s wm14s ws)
>  where
> xxxor a b c d = a `xor` b `xor` c `xor` d
> wm3s  = drop (16-3)  ws
> wm8s  = drop (16-8)  ws
> wm14s = drop (16-14) ws
>   as = ai:ts
>   bs = bi:as
>   cs = ci:(map (rotL 30) bs)
>   ds = di:cs
>   es = ei:ds
>   ts = (map (rotL 5) as) $& (zipWith4 f [0..] bs cs ds) $& es $& (map k
> [0..]) $& ws 
>   Word128 ai bi ci di ei = ss 

Any help would be appreciated.

I've put a copy of a working system here if anyone wants to experiment 
(http://www.haskell.org/crypto/downloads/sha1.tar.gz).

Thanks, Dominic.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe