Peter Tanski wrote:
I will look at these more closely later; if you are curious about any
of them I will send the logs.
On a side-note, I will make an effort to test ghc-6.4.3 and ghc-6.6 in
Parallel (with PAR defined). Even though I have a uni-processor I have
pvm version 3 installed and have used pvm in C programs.
I don't expect PAR to work in any branch of GHC, except in sources you get
directly from the GPH folk. The PAR code in released versions of GHC isn't
tested, certainly doesn't work, and probably doesn't compile either.
ghc/rts/Schedule.c:2022-2032
#if defined(RTS_SUPPORTS_THREADS)
    // Allocating a new condition for each thread is expensive, so we
    // cache one. This is a pretty feeble hack, but it helps speed up
    // consecutive call-ins quite a bit.
    if (bound_cond_cache_full) {
        m->bound_thread_cond = bound_cond_cache;
        bound_cond_cache_full = 0;
    } else {
        initCondition(&m->bound_thread_cond);
    }
#endif
That hack may become worrisome, should threads obtain separate mutexes
(I did not check the code for GranSim).
I don't think there's a problem here - these condition variables are "free" in
the sense that they should be unused by the rest of the system when they are
recycled, so there's no chance that we could be confused about which mutex is
associated with a condition variable.
I haven't used ZLAs before, so that was partly my misconception and
partly an underlying disagreement I have with flexible array members in
general. I always thought of a flexible array member as a pointer; I now
understand that structures containing a ZLA-flexible array member are
treated as if the ZLA-member does not exist, especially for sizeof(),
but their incomplete type must reserve space (sizeof (void), which is
also an incomplete type, is 1).
I don't understand that last comment. Why should the incomplete type have a
size that is 1 byte larger than a normal sizeof()?
The actual implementation does include
the offset to the flexible array member, so sizeof() should account for
that.
Do you mean that the size of the type should include any padding required before
the flexible array, but no size for the array itself? (that I agree with).
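
A minimal sketch of that reading, in C99: sizeof() covers the header and any padding before the flexible array, and the array elements are added explicitly at allocation time. The names struct packet, packet_bytes and packet_new are hypothetical and are not GHC types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure (not a GHC type): a one-byte header followed
   by a C99 flexible array member. */
struct packet {
    unsigned char tag;
    int payload[];   /* contributes no elements to sizeof(struct packet) */
};

/* Bytes needed for a packet that carries n payload elements. */
size_t packet_bytes(size_t n)
{
    return sizeof(struct packet) + n * sizeof(int);
}

/* Returns 1 if sizeof() covers at least the padding in front of the array. */
int header_covers_offset(void)
{
    return sizeof(struct packet) >= offsetof(struct packet, payload);
}

/* Allocate a packet with room for n payload elements; the caller frees it. */
struct packet *packet_new(size_t n)
{
    struct packet *p;

    assert(header_covers_offset());
    p = malloc(packet_bytes(n));
    if (p != NULL)
        p->tag = 0;
    return p;
}
```

On the compilers I would expect to matter here, sizeof(struct packet) is at least offsetof(struct packet, payload), so the array itself costs nothing until elements are requested.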
Bug 25805 [4.0 regression], at
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25805>, is a good example
of a worrisome error:
The following program fails to initialise d1.x[] using Apple's gcc
4.0.1 (build 5363), even when -fno-zero-initialized-in-bss is defined.
(Note: -fzero-initialized-in-bss is defined by default.) This program
does not fail when using a version of gcc 4.2.0 I built.
Of course, this bug only affects initialisation.
We never use initializers for flexible arrays, as far as I'm aware. The
CostCentreStack and CostCentre types don't have flexible members.
Finally, if there are alignment issues, wouldn't that be better
controlled explicitly through pragmas?
Could you elaborate a bit? Where do you want to use alignment pragmas,
and what would they buy us?
Hopefully they would buy speed and maybe even some space, at least on
some RISC architectures. If ZLAs are used and there is any padding on
the end of the structure (in place of the ZLA member or after it), the
alignment of the subsequent structures may be off.
All of the closure structures are carefully written to be a multiple of StgWord
in size, and are always aligned to an StgWord boundary (no more, no less),
because that is how the heap/stack works in GHC. Info tables are also a
multiple of StgWord, but there it is less clear: we might like the code to be
more aligned than StgWord, but to do that we would probably have to pad *before*
the info table.
You might like to take a look at
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage
in particular the sections on heap layout and slop.
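
To illustrate the size and alignment invariant, here is a sketch using stand-in definitions. word_t and struct closure are hypothetical simplifications (word_t is assumed to be a pointer-sized unsigned integer), not the real StgWord or StgClosure declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for StgWord; assumed here to be pointer-sized. */
typedef uintptr_t word_t;

/* Hypothetical closure layout: a header word followed by payload words. */
struct closure {
    const void *info;    /* header: pointer to the info table */
    word_t payload[];    /* payload, counted in words */
};

/* The invariant described above: the structure is a whole number of
   words, so no padding is ever needed between heap objects. */
_Static_assert(sizeof(struct closure) % sizeof(word_t) == 0,
               "closure size must be a multiple of the word size");

/* Size, in words, of a closure with n_payload payload words. */
size_t closure_words(size_t n_payload)
{
    return sizeof(struct closure) / sizeof(word_t) + n_payload;
}

/* Returns 1 if p lies on a word boundary. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(word_t)) == 0;
}
```

With the invariant checked at compile time, an allocator that only ever hands out whole words keeps every object word-aligned without any extra alignment directives.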
For the PowerPC, integers are optimally aligned on byte boundaries
according to their
size (i.e., a 4-byte int32_t would only have 'good' performance if it
was aligned at 2 and would have 'optimal' performance at 4); similarly
for floats (8-byte doubles = 8-byte aligned optimal, etc.). There is a
space (not performance) penalty for aligning members on greater-size
boundaries.
The C99 Standard (TC2), in Section 6.7.2.1 (Flexible Array Members),
states that:
In most situations, the flexible array member is ignored. In
particular, the size of the structure is as if the flexible array
member were omitted except that it may have more trailing padding
than the omission would imply.
Right. This isn't an issue for us (no padding is ever required).
Gcc (4.x) seems to align structures with flexible array members (ZLAs)
at 2:
for the above example program:
_d1:
.space 4
.globl _d2
.align 2
(Note: it is an error to use the __alignof__ keyword to find this out
because ZLAs have incomplete type so you have to look in the assembly
output.)
It might be interesting to view the performance difference between
using the Darwin pragma:
#pragma options align=4
Remember that most of the code that manipulates these structures is not compiled
C code; it is the C-- code generated by GHC. Even when compiling via C, the
generated C code is not using these structure definitions, it is just
manipulating StgWords. Adding alignment constraints here would have no effect.
Cheers,
Simon