Re: pinned byte arrays

2006-10-19 Thread Duncan Coutts
Pinning the arrays gives the GC much less flexibility. Especially if
your objects are small. It means the GC can't move things to compact
the heap, and you'll end up with lots of holes in between heap
objects: space from things that were collected but that couldn't be
re-used because other objects didn't happen to fit in it.

You end up with a very fragmented heap. You're essentially preventing
the compacting GC from doing any compacting at all: you've told it
that it isn't allowed to move anything!

Honestly, I don't see why you think pinning all these little arrays
should make things any better. It just means they can't be moved
around; it doesn't mean that magically they never have to be
considered or freed by the GC.

You only want to pin if you must for some external reason, or if the
object is very big and therefore takes a significant amount of time to
move.

Duncan

On Wed, 2006-10-18 at 19:50 +0400, Bulat Ziganshin wrote:
> Hello glasgow-haskell-users,
> 
> i have a program that holds a lot of strings (filenames) in memory
> and uses my own packed-string library to represent them. this
> library uses newByteArray# to allocate strings. in my benchmark run
> (with 300,000 filenames in memory) the program prints the following
> statistics:
> 
>  37,502,956 bytes maximum residency (16 sample(s))
> 
>        2928 collections in generation 0 (  6.90s)
>          16 collections in generation 1 ( 12.20s)
> 
>  65 Mb total memory in use
> 
> most of this memory is occupied by filenames. note that 28 mb of
> memory is used just by the GC procedure (i use the compacting GC)
> 
> then i thought that using pinned byte arrays should significantly
> improve memory usage: these arrays can't be moved around, so, i
> thought, they would not be involved in GC. the amount of data to be
> compacted by the GC would decrease, and the amount of memory wasted
> by the GC should also decrease
> 
> so in my packed-string lib i replaced newByteArray# with
> newPinnedByteArray# and saw the following:
> 
>  53,629,344 bytes maximum residency (14 sample(s))
> 
>        2904 collections in generation 0 (  6.97s)
>          14 collections in generation 1 ( 13.16s)
> 
> 100 Mb total memory in use
> 
> why? why did memory usage increase? why are another 47 megs just
> wasted? how does the GC work with pinned byte arrays? and why were
> GC times almost the same - i thought GC for pinned data should be
> very different from GC for unpinned byte arrays
> 

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Re[2]: Benchmarking GHC

2006-10-19 Thread Duncan Coutts
On Thu, 2006-10-19 at 21:10 +0400, Bulat Ziganshin wrote:

> btw, while writing this message i thought that a
> -fconvert-strings-to-ByteStrings option would give a significant
> boost to many programs without rewriting them :)

This kind of data refinement has a side condition on the strictness of
the function. You need to know that your list function is strict in the
spine and elements before it can be swapped to a version that operates
on packed strings. If it's merely strict in the spine then one could
switch to lazy arrays. There's also the possibility to use lists that
are strict in the elements.

It'd be an interesting research topic to see whether this strictness
analysis and transformation could be done automatically, and indeed
whether it is properly meaning-preserving.
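
The side condition Duncan describes can be made concrete with a small
sketch (the function names here are illustrative). `length` is strict
in the spine but lazy in the elements, so the String -> ByteString
refinement changes its meaning: packing forces every character.

```haskell
module Main where

import qualified Data.ByteString.Char8 as B
import Control.Exception (SomeException, evaluate, try)

-- Strict in the spine, lazy in the elements: NOT a valid candidate
-- for the packed-string refinement.
spineOnly :: String -> Int
spineOnly = length

main :: IO ()
main = do
  -- With lazy lists, length never looks at the characters:
  print (spineOnly ['a', undefined, 'c'])   -- fine: the elements stay unforced
  -- Packing the same list forces every element, so it now fails:
  r <- try (evaluate (B.length (B.pack ['a', undefined, 'c'])))
         :: IO (Either SomeException Int)
  putStrLn (either (const "pack forced the undefined element") show r)
```

A function that is already strict in both spine and elements (say, a
character-counting fold that inspects every element) would observe no
such difference, which is why the transformation needs the strictness
analysis Duncan mentions.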

Duncan



Re[2]: Benchmarking GHC

2006-10-19 Thread Bulat Ziganshin
Hello Simon,

Thursday, October 19, 2006, 6:40:54 PM, you wrote:

> These days -O2, which invokes the SpecConstr pass, can have a big
> effect, but only on some programs.

it also enables -optc-O2. so, answering Neil's question:

-O2 -funbox-strict-fields

(sidenote to SPJ: -funbox-simple-strict-fields may be a good way to
get a _safe_ optimization)

the RTS -A10m option may also be helpful (even with 6.6), so you may
want to run each program twice - with and without this option - and
select the best run

btw, while writing this message i thought that a
-fconvert-strings-to-ByteStrings option would give a significant boost
to many programs without rewriting them :)


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



recursive commentaries :)

2006-10-19 Thread Bulat Ziganshin
Hello glasgow-haskell-users,

http://hackage.haskell.org/trac/ghc/wiki/Commentary
contains a link 'The old GHC Commentary' which points to the
http://www.cse.unsw.edu.au/~chak/haskell/ghc/comm/ page.
guess what this page contains? :)

it would be great to restore the link to the old commentary

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



RE: Benchmarking GHC

2006-10-19 Thread Simon Peyton-Jones
These days -O2, which invokes the SpecConstr pass, can have a big
effect, but only on some programs.

Simon


| -----Original Message-----
| From: [EMAIL PROTECTED]
| [mailto:[EMAIL PROTECTED]
| On Behalf Of Neil Mitchell
| Sent: 19 October 2006 11:22
| To: Donald Bruce Stewart
| Cc: GHC Users Mailing List
| Subject: Re: Benchmarking GHC
| 
| Hi
| 
| > > One thing that IME makes a difference is -funbox-strict-fields.  It's
| > > probably better to use pragmas for this, though.  Another thing to
| > > consider is garbage collection RTS flags, those can sometimes make a
| > > big difference.
| >
| 
| I _don't_ want to speed up a particular program by modifying it, I
| want to take a set of existing programs which are treated as black
| boxes, and compile them all with the same flags. I don't want to
| experiment to see which flags give the best particular result on a per
| program basis, or even for the benchmark as a whole, I just want to
| know what the "standard recommendation" is for people who want fast
| code but not to understand anything.
| 
| > All this and more on the under-publicised Performance wiki,
| > http://haskell.org/haskellwiki/Performance
| 
| It's a very good resource, and I've read it before :)
| 
| Another way to treat my question is, the wiki says "Of course, if a
| GHC compiled program runs slower than the same program compiled with
| another Haskell compiler, then it's definitely a bug" - in this
| sentence what does the command line look like in the GHC compiled
| case?
| 
| Thanks
| 
| Neil


Re[2]: Benchmarking GHC

2006-10-19 Thread Bulat Ziganshin
Hello Ketil,

Thursday, October 19, 2006, 11:05:48 AM, you wrote:

> One thing that IME makes a difference is -funbox-strict-fields.  It's
> probably better to use pragmas for this, though.  Another thing to
> consider is garbage collection RTS flags, those can sometimes make a
> big difference.

yes, it's better to unbox individual fields. i had a program where
this flag led to a significant memory-usage increase. something like this:

data T1 = T1 ... -- many fields

data T2 = T2 !T1 !T1 !T1

make t1 = T2 t1 t1 t1

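A hedged reconstruction of the problem (the field names are
illustrative, not Bulat's real code): with a global
-funbox-strict-fields, every !T1 field of T2 is flattened into T2, so
the three occurrences of t1 become three full copies instead of three
pointers to one shared T1 - hence the memory increase. Unpacking only
selected cheap fields by hand with the UNPACK pragma avoids this.

```haskell
module Main where

-- T1 stands in for Bulat's "many fields" record.
data T1 = T1 { f1 :: !Int, f2 :: !Int }

-- Left boxed, the three fields are three pointers to one shared T1.
-- Under -funbox-strict-fields they become three inline copies of T1.
data T2 = T2 !T1 !T1 !T1

-- The per-field alternative: unbox only where duplication is cheap.
data T3 = T3 {-# UNPACK #-} !Int !T1

mkT2 :: T1 -> T2
mkT2 t1 = T2 t1 t1 t1   -- sharing is lost if T1 is unboxed wholesale

main :: IO ()
main = case mkT2 (T1 1 2) of
         T2 a _ _ -> print (f1 a + f2 a)
```

This is why Ketil's suggestion to prefer pragmas over the global flag
is the safer default.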


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re: Benchmarking GHC

2006-10-19 Thread Bulat Ziganshin
Hello Neil,

Wednesday, October 18, 2006, 10:49:37 PM, you wrote:

> * At the moment, -O2 is unlikely to produce better code than -O.

the ghc manual is full of text that was written 10 or more years ago :)

> * When we want to go for broke, we tend to use -O2 -fvia-C

> From this I guess the answer is "-O2 -fvia-C"? I just wanted to check

just "-O2" does the same


-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re: [Haskell] Expecting more inlining for bit shifting

2006-10-19 Thread John Meacham
On Wed, Oct 18, 2006 at 07:00:18AM -0400, [EMAIL PROTECTED] wrote:
> I'm not sure this approach is best.  In my case the ... needs to be the 
> entire body of the shift code.  It would be ridiculous to have two copies 
> of the same code.  What would be better is a hint pragma that says, 
> ``inline me if the following set of parameters are literals''.
> 

not at all:

> {-# RULES "shift/const-inline"
>     forall x y# . shift x y# = inline shift x y# #-}

of course, you would still need to make sure the body of shift was
available. (perhaps occurrences of inline in rules should force the
argument to appear in the .hi file in full)

a hint pragma would be trickier to implement and not as flexible, I
would think. for instance, what if you want to inline only when one
argument is an arbitrary constant, but the other arg is a certain
value?
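
The rule John sketches can be written out as a compilable example.
This is a hedged sketch: `myShift` is an illustrative stand-in for the
library's shift, not the actual Data.Bits code, and the rule only has
an effect when compiled with optimization.

```haskell
module Main where

import Data.Bits (shiftL)
import GHC.Exts (inline)

-- Stand-in for the library shift; NOINLINE until phase 1 so the
-- rewrite rule gets a chance to fire before ordinary inlining.
myShift :: Int -> Int -> Int
myShift x n = x `shiftL` n
{-# NOINLINE [1] myShift #-}

-- Rewrite a saturated call into an explicitly inlined copy of the
-- body; with a literal argument the simplifier can then constant-fold
-- the shift. The rule cannot fire on its own right-hand side, because
-- there the head of the application is `inline`, not `myShift`.
{-# RULES
"myShift/inline"  forall x n.  myShift x n = inline myShift x n
  #-}

main :: IO ()
main = print (myShift 3 4)
```

Unlike the hint pragma John argues against, the rule's left-hand side
can be made as specific as needed, e.g. matching only a particular
second argument.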

John
-- 
John Meacham - ⑆repetae.net⑆john⑈


pinned byte arrays

2006-10-19 Thread Bulat Ziganshin
Hello glasgow-haskell-users,

i have a program that holds a lot of strings (filenames) in memory
and uses my own packed-string library to represent them. this
library uses newByteArray# to allocate strings. in my benchmark run
(with 300,000 filenames in memory) the program prints the following
statistics:

 37,502,956 bytes maximum residency (16 sample(s))

   2928 collections in generation 0 (  6.90s)
 16 collections in generation 1 ( 12.20s)

 65 Mb total memory in use

most of this memory is occupied by filenames. note that 28 mb of
memory is used just by the GC procedure (i use the compacting GC)

then i thought that using pinned byte arrays should significantly
improve memory usage: these arrays can't be moved around, so, i
thought, they would not be involved in GC. the amount of data to be
compacted by the GC would decrease, and the amount of memory wasted
by the GC should also decrease

so in my packed-string lib i replaced newByteArray# with
newPinnedByteArray# and saw the following:

 53,629,344 bytes maximum residency (14 sample(s))

   2904 collections in generation 0 (  6.97s)
 14 collections in generation 1 ( 13.16s)

100 Mb total memory in use

why? why did memory usage increase? why are another 47 megs just
wasted? how does the GC work with pinned byte arrays? and why were GC
times almost the same - i thought GC for pinned data should be very
different from GC for unpinned byte arrays

-- 
Best regards,
 Bulat  mailto:[EMAIL PROTECTED]



Re: Benchmarking GHC

2006-10-19 Thread Neil Mitchell

Hi


> One thing that IME makes a difference is -funbox-strict-fields.  It's
> probably better to use pragmas for this, though.  Another thing to
> consider is garbage collection RTS flags, those can sometimes make a
> big difference.



I _don't_ want to speed up a particular program by modifying it, I
want to take a set of existing programs which are treated as black
boxes, and compile them all with the same flags. I don't want to
experiment to see which flags give the best particular result on a per
program basis, or even for the benchmark as a whole, I just want to
know what the "standard recommendation" is for people who want fast
code but not to understand anything.


> All this and more on the under-publicised Performance wiki,
> http://haskell.org/haskellwiki/Performance


It's a very good resource, and I've read it before :)

Another way to treat my question is, the wiki says "Of course, if a
GHC compiled program runs slower than the same program compiled with
another Haskell compiler, then it's definitely a bug" - in this
sentence what does the command line look like in the GHC compiled
case?

Thanks

Neil


Re: Benchmarking GHC

2006-10-19 Thread Donald Bruce Stewart
ketil+haskell:
> "Neil Mitchell" <[EMAIL PROTECTED]> writes:
> 
> > I want to benchmark GHC vs some other Haskell compilers, what flags
> > should I use?
> 
> > [...] I guess the answer is "-O2 -fvia-C"?
> 
> I tend to use -O2, but haven't really tested it against plain -O.
> From what I've seen -fvia-C is sometimes faster, sometimes slower, I
> tend to cross my fingers and hope the compiler uses sensible defaults
> on the current architecture.
> 
> One thing that IME makes a difference is -funbox-strict-fields.  It's
> probably better to use pragmas for this, though.  Another thing to
> consider is garbage collection RTS flags, those can sometimes make a
> big difference.

All this and more on the under-publicised Performance wiki,
http://haskell.org/haskellwiki/Performance

-- Don


Re: Benchmarking GHC

2006-10-19 Thread Ketil Malde
"Neil Mitchell" <[EMAIL PROTECTED]> writes:

> I want to benchmark GHC vs some other Haskell compilers, what flags
> should I use?

> [...] I guess the answer is "-O2 -fvia-C"?

I tend to use -O2, but haven't really tested it against plain -O.
From what I've seen -fvia-C is sometimes faster, sometimes slower, I
tend to cross my fingers and hope the compiler uses sensible defaults
on the current architecture.

One thing that IME makes a difference is -funbox-strict-fields.  It's
probably better to use pragmas for this, though.  Another thing to
consider is garbage collection RTS flags, those can sometimes make a
big difference.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
