RE: Working character by character in Haskell

2001-10-19 Thread Simon Marlow

 Hmm... I agree with both of you, Albert and Tom. I rewrote it from
 scratch, using map and no longer using reverse. But the C program is
 still 7x faster than the Haskell one. Here is the code of the
 Haskell program:
 
 main :: IO ()
 main = do
  bmFile <- openFileEx "in.txt" (BinaryMode ReadMode)
  bmString <- hGetContents bmFile
  writeFile "out.txt" (map inc bmString)
  hClose bmFile
 
 inc :: Char -> Char
 inc a = toEnum ((fromEnum a) + 1)

Well, in Haskell each character of the string takes 20 bytes: 12 bytes
for the list cell, and 8 bytes for the character itself - the memory
used by the character will be recovered at GC time though, as long as
the character is < chr 256.  The map operation also allocates a further
28 bytes per character: list cell + thunk(8) + character, assuming the
inc operation is suitably optimised not to do any extra allocation.
That's a total of 48 bytes per character.

The C code, by comparison, doesn't do any dynamic allocation at all.

To really match the C program, you need to use IOExts.hGetBuf and
IOExts.hPutBuf, and do the operations on raw characters in memory.
Using a UArray of Word8 would be better, but there aren't any operations
to do IO to/from a UArray yet (actually I've written these, but they
aren't in the tree yet).
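For a rough idea of the shape of such a loop: the sketch below copies a file through a fixed buffer, incrementing each byte in place. It is written against today's module names (System.IO and Foreign, where hGetBuf and hPutBuf eventually landed, rather than IOExts); the 4k buffer size and file handling are illustrative, not a tested GHC-5.x recipe.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO

bufSize :: Int
bufSize = 4096

-- Copy src to dst, incrementing every byte in a fixed buffer:
-- no list cells, no thunks, one allocation for the whole run.
copyInc :: FilePath -> FilePath -> IO ()
copyInc src dst =
  withBinaryFile src ReadMode  $ \hIn  ->
  withBinaryFile dst WriteMode $ \hOut ->
  allocaBytes bufSize $ \buf ->
    let loop = do
          n <- hGetBuf hIn buf bufSize
          if n == 0
            then return ()
            else do
              mapM_ (bump buf) [0 .. n - 1]
              hPutBuf hOut buf n
              loop
    in loop

-- Increment the byte at offset i.
bump :: Ptr Word8 -> Int -> IO ()
bump buf i = do
  b <- peekByteOff buf i
  pokeByteOff buf i (b + 1 :: Word8)
```

The inner loop touches only the one buffer, so the GC never sees the file's contents at all.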

You should find that the IO library in GHC 5.02 is slightly faster than
the one in 5.00.2.

Anyway, I hope all this helps to explain why the Haskell version is so
slow.

Cheers,
Simon

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Working character by character in Haskell

2001-10-19 Thread Ketil Malde

Simon Marlow [EMAIL PROTECTED] writes:

 Well, in Haskell each character of the string takes 20 bytes: 12 bytes
 for the list cell, and 8 bytes for the character itself 

Why does a list cell consume as much as 12 bytes?  Two pointers (data
and next) and a 32-bit tag field, perhaps?  And a 64-bit minimum
allocation quantity, accounting for the 8-byte character?

Isn't it possible to optimize this, e.g. by embedding small data
directly in the cons cell?  21 bits for a Unicode character should
leave enough bits for tagging, shouldn't it?

(Since I'm halfway planning to use Haskell next Spring to process long
lists of data with a small set of values (for instance:

data Base = A | C | G | T

) I'm curious about the performance.)
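One way to dodge the per-cons overhead for a type like Base, assuming the unboxed-array interface that was taking shape at the time (Data.Array.Unboxed in today's module layout): pack the values into a UArray of Word8, one byte per element, and unpack back to a list when list processing is needed. A hypothetical sketch:

```haskell
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Word (Word8)

data Base = A | C | G | T deriving (Show, Eq, Enum)

-- One byte per element instead of a ~20-byte cons cell per element.
packBases :: [Base] -> UArray Int Word8
packBases bs =
  listArray (0, length bs - 1) (map (fromIntegral . fromEnum) bs)

-- Lazily rebuild the list view when it is needed.
unpackBases :: UArray Int Word8 -> [Base]
unpackBases = map (toEnum . fromIntegral) . elems
```

The packed form trades pointer-chasing for an indexable, GC-friendly flat block; whether that wins depends on how much of the processing can be done on the array directly.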

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants




RE: Working character by character in Haskell

2001-10-19 Thread Jan-Willem Maessen

Simon Marlow [EMAIL PROTECTED] writes:
 To really match the C program, you need to use IOExts.hGetBuf and
 IOExts.hPutBuf, and do the operations on raw characters in memory.
 Using a UArray of Word8 would be better, but there aren't any
 operations to do IO to/from a UArray yet (actually I've written
 these, but they aren't in the tree yet).

So why don't getContents / putStr / etc. deforest cleanly to calls to
hGetBuf and hPutBuf?  I'm genuinely curious; my own experience in this
direction is "the engineering is challenging".  These functions are so
commonly used, though---and they're vastly easier to use, actually
portable, etc.  The effort would surely be repaid.

Plus, I'm curious to hear about any challenges which may be involved
in pulling this off. :-)  If it's genuinely tricky, the tricks are
likely to be generally useful for other stream-ish things.

-Jan-Willem Maessen
Eager Haskell Project
[EMAIL PROTECTED]




Working character by character in Haskell

2001-10-18 Thread Andre W B Furtado

I'm trying to create a fast program in Haskell that reads the content of a
file, does some stuff with its characters (one by one) and then saves the
final result (the modified characters) in another file. The fastest program
I've developed so far is:

main :: IO ()
main = do
 bmFile <- openFileEx "in.txt" (BinaryMode ReadMode)
 bmString <- hGetContents bmFile
 str <- copyFile bmString []
 writeFile "out.txt" str
 hClose bmFile

copyFile :: String -> String -> IO String
copyFile [] s = return (reverse s)
copyFile (a:as) s = copyFile as ((doSomeStuffWith a):s)

What I can't understand is why a similar program, written in C, is much
faster than the Haskell program. Here you have the C program:

int main(void)
{
 FILE *in, *out;
 int ch;  /* int, not char, so that EOF is representable */

 in = fopen("in.txt", "rb");
 out = fopen("out.txt", "wb");

 while ((ch = getc(in)) != EOF)
  putc(doSomeStuffWith(ch), out);
 fclose(out);
 fclose(in);
 return 0;
}

For example, suppose function doSomeStuffWith returns its own parameter.
Using a 1.5MB file in this case, the Haskell program ends in almost 5
seconds, while the C program ends in less than 0.5 seconds... Is my
Haskell program badly implemented? (I'm using GHC 4.08-1 under a
Windows 98 platform.)

Thanks,
-- Andre





Re: Working character by character in Haskell

2001-10-18 Thread Albert Lai

Andre W B Furtado [EMAIL PROTECTED] writes:

[...]

 For example, suppose function doSomeStuffWith returns its own parameter.
 Using a 1.5MB file in this case, the Haskell program ends in almost 5
 seconds, while the C program ends in less than 0.5 seconds... Is my Haskell
 program too bad implemented? (I'm using GHC 4.08-1 under a Windows 98
 platform.)

I indeed think that your Haskell program is inefficient and is not a
translation of your C program.  It reverses a huge string, which takes
not only execution time but also garbage collection time and memory,
which entails even more time for OS overhead and swapping out other
things in the physical memory.

(Many programmers complain "my linear-time algorithm takes just 1
second on a 100M list, so why is it taking 5 seconds on a 200M list?"
They forget that because of issues such as cache locality and virtual
memory, their computers do not scale.)

I am wondering: if doSomeStuffWith is purely functional, why are you
writing and using copyFile instead of just using map?  I mean:

main :: IO ()
main = do
 bmFile <- openFileEx "in.txt" (BinaryMode ReadMode)
 bmString <- hGetContents bmFile
 writeFile "out.txt" (map doSomeStuffWith bmString)
 hClose bmFile

Because both hGetContents and map are lazy and use O(1) memory, this
program behaves exactly as your C program plus occasional garbage
collections that don't hurt too much.  In particular, the OS will see
no difference (although the cache does).
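For what it's worth, the same streaming behaviour is what interact gives you for free when the files are redirected on the command line; a sketch, with succ standing in for doSomeStuffWith:

```haskell
-- Reads stdin lazily and writes stdout incrementally: map consumes
-- the input one cons cell at a time, so memory use stays flat.
transform :: String -> String
transform = map succ   -- succ on Char is toEnum (fromEnum c + 1)

main :: IO ()
main = interact transform
```

Run as `./prog < in.txt > out.txt` to reproduce the file-to-file behaviour.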




Re: Working character by character in Haskell

2001-10-18 Thread Andre W B Furtado

Hmm... I agree with both of you, Albert and Tom. I rewrote it from
scratch, using map and no longer using reverse. But the C program is
still 7x faster than the Haskell one. Here is the code of the Haskell
program:

main :: IO ()
main = do
 bmFile <- openFileEx "in.txt" (BinaryMode ReadMode)
 bmString <- hGetContents bmFile
 writeFile "out.txt" (map inc bmString)
 hClose bmFile

inc :: Char -> Char
inc a = toEnum ((fromEnum a) + 1)

And now the C program:

#include <stdio.h>

int main(void)
{
 FILE *in, *out;
 int ch;  /* int, not char, so that EOF is representable */

 in = fopen("in.txt", "rb");
 out = fopen("out.txt", "wb");

 while ((ch = getc(in)) != EOF)
  putc(ch + 1, out);
 fclose(out);
 fclose(in);
 return 0;
}

As you can see, the main idea here is to increment each character by one.
Is the Haskell program now a faithful translation of the C program? If it
is (I think so...), why is the C program still so much faster?

Thanks again,
-- Andre

- Original Message -
From: Albert Lai [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, October 18, 2001 9:42 PM
Subject: Re: Working character by character in Haskell



