Peter Tanski wrote:

I will look at these more closely later; if you are curious about any of them I will send the logs.

On a side-note, I will make an effort to test ghc-6.4.3 and ghc-6.6 in Parallel (with PAR defined). Even though I have a uni-processor I have pvm version 3 installed and have used pvm in C programs.

I don't expect PAR to work in any branch of GHC, except in sources you get directly from the GPH folk. The PAR code in released versions of GHC isn't tested, certainly doesn't work, and probably doesn't compile either.


ghc/rts/Schedule.c:2022-2032

#if defined(RTS_SUPPORTS_THREADS)
    // Allocating a new condition for each thread is expensive, so we
    // cache one.  This is a pretty feeble hack, but it helps speed up
    // consecutive call-ins quite a bit.
    if (bound_cond_cache_full) {
        m->bound_thread_cond = bound_cond_cache;
        bound_cond_cache_full = 0;
    } else {
        initCondition(&m->bound_thread_cond);
    }
#endif

That hack may become worrisome, should threads obtain separate mutexes (I did not check the code for GranSim).

I don't think there's a problem here - these condition variables are "free" in the sense that they should be unused by the rest of the system when they are recycled, so there's no chance that we could be confused about which mutex is associated with a condition variable.
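
For readers who want to see the recycling idea in isolation, here is a minimal sketch of a one-slot condition-variable cache using POSIX threads. It is not the code from Schedule.c: the names cond_acquire and cond_release are made up for illustration, the cache holds a pointer rather than a copy of the condition variable, and the caller is assumed to hold whatever lock protects the cache.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One-slot cache of an initialised but currently unused condition
 * variable. Hypothetical names; the caller is assumed to hold the
 * lock that protects this slot. */
static pthread_cond_t *cached_cond = NULL;

/* Hand out the cached condition variable if there is one,
 * otherwise allocate and initialise a fresh one. */
pthread_cond_t *cond_acquire(void)
{
    pthread_cond_t *c = cached_cond;
    if (c != NULL) {
        cached_cond = NULL;          /* the slot is now empty */
        return c;
    }
    c = malloc(sizeof *c);
    if (c != NULL && pthread_cond_init(c, NULL) != 0) {
        free(c);
        c = NULL;
    }
    return c;
}

/* Give a condition variable back. It must have no waiters, which is
 * what makes it safe to pair it with a different mutex next time. */
void cond_release(pthread_cond_t *c)
{
    assert(c != NULL);
    if (cached_cond == NULL) {
        cached_cond = c;             /* keep it for the next caller */
    } else {
        pthread_cond_destroy(c);     /* slot already full: discard */
        free(c);
    }
}
```

A released condition variable has no waiters, so nothing ties it to the mutex it was last used with; that is the sense in which it is "free" when it is handed out again.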

I haven't used ZLAs before, so that was partly my misconception and partly an underlying disagreement I have with flexible array members in general. I always thought of a flexible array member as a pointer; I now understand that structures containing a ZLA or flexible array member are treated as if that member does not exist, especially for sizeof(), but their incomplete type must reserve space (sizeof(void), which is also an incomplete type, is 1 under GCC's extension).

I don't understand that last comment. Why should the incomplete type have a size that is 1 byte larger than a normal sizeof()?

The actual implementation does include the offset to the flexible array member, so sizeof() should account for that.

Do you mean that the size of the type should include any padding required before the flexible array, but no size for the array itself? (that I agree with).
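
As a small illustration of the point about padding, the sketch below uses a hypothetical struct (not a GHC type) whose flexible array member starts before the end of the padded structure. The helper names record_bytes and record_new are invented for the example; the only claim made is the one C99 guarantees, namely that sizeof covers the offset of the flexible member plus possible trailing padding, and nothing for the array itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record with a C99 flexible array member. */
struct record {
    double key;
    char   tag;
    char   tail[];   /* contributes no elements to sizeof */
};

/* Bytes to allocate for a record with n tail bytes. On common ABIs
 * sizeof(struct record) exceeds offsetof(struct record, tail)
 * because of trailing padding, so never allocate less than sizeof. */
size_t record_bytes(size_t n)
{
    size_t need = offsetof(struct record, tail) + n;
    return need < sizeof(struct record) ? sizeof(struct record) : need;
}

/* Allocate a zeroed record with room for n tail bytes. */
struct record *record_new(size_t n)
{
    size_t bytes = record_bytes(n);
    struct record *r = malloc(bytes);
    if (r != NULL)
        memset(r, 0, bytes);
    return r;
}
```

Comparing sizeof(struct record) with offsetof(struct record, tail) on a given compiler shows how much trailing padding, if any, that compiler adds.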

Bug 25805 [4.0 regression], at <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25805>, is a good example of a worrisome error:

The following program fails to initialise d1.x[] using Apple's gcc 4.0.1 (build 5363), even when -fno-zero-initialized-in-bss is specified. (Note: -fzero-initialized-in-bss is enabled by default.) This program does not fail when using a version of gcc 4.2.0 I built.

Of course, this bug only affects initialisation.

We never use initializers for flexible arrays, as far as I'm aware. The CostCentreStack and CostCentre types don't have flexible members.

Finally, if there are alignment issues, wouldn't that be better controlled explicitly through pragmas?

Could you elaborate a bit? Where do you want to use alignment pragmas, and what would they buy us?

Hopefully they would buy speed and maybe even some space, at least on some RISC architectures. If ZLAs are used and there is any padding at the end of the structure (in place of the ZLA member or after it), the alignment of the subsequent structures may be off.

All of the closure structures are carefully written to be a multiple of StgWord in size, and are always aligned to an StgWord boundary (no more, no less), because that is how the heap/stack works in GHC. Info tables are also a multiple of StgWord, but there it is less clear: we might like the code to be more aligned than StgWord, but to do that we would probably have to pad *before* the info table.
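
To make the word-multiple rule concrete, here is a rough sketch with stand-in types. word_t, demo_header and demo_closure are hypothetical and are not the definitions from the GHC headers; they only mimic the shape of a header followed by a payload of words. The static assertions fail at compile time if the structure ever stops being a whole number of words or gains padding before its payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;   /* stand-in for StgWord (assumption) */

/* Hypothetical header: one info pointer. */
struct demo_header {
    const void *info;
};

/* Hypothetical closure: header followed by payload words. */
struct demo_closure {
    struct demo_header header;
    word_t payload[];
};

/* The fixed part must be a whole number of words ... */
_Static_assert(sizeof(struct demo_closure) % sizeof(word_t) == 0,
               "closure is not a multiple of the word size");
/* ... and the payload must start right after it, with no padding. */
_Static_assert(offsetof(struct demo_closure, payload)
                   == sizeof(struct demo_closure),
               "unexpected padding before payload");

/* Size in words of a closure with n payload words. */
size_t demo_closure_words(size_t n)
{
    return sizeof(struct demo_closure) / sizeof(word_t) + n;
}

/* Non-zero if p sits on a word boundary. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(word_t)) == 0;
}
```

With these two properties in place, objects laid end to end in a word-aligned heap stay word-aligned without any per-object padding.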

You might like to take a look at

  http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage

in particular the sections on heap layout and slop.

On the PowerPC, integers are optimally aligned on byte boundaries according to their size (i.e., a 4-byte int32_t would only have 'good' performance if it was aligned at 2 and would have 'optimal' performance at 4); similarly for floats (8-byte doubles = 8-byte aligned optimal, etc.). There is a space (not performance) penalty for aligning members on greater-size boundaries.

The C99 Standard (TC2), in Section 6.7.2.1 (Flexible Array Members), states that:

In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.

Right.  This isn't an issue for us (no padding is ever required).


Gcc (4.x) seems to align structures with flexible array members (ZLAs) at 2:

for the above example program:

_d1:
    .space 4
    .globl _d2
    .align 2

(Note: it is an error to use the __alignof__ keyword to find this out because ZLAs have incomplete type so you have to look in the assembly output.)

It might be interesting to view the performance difference between using the Darwin pragma:
#pragma options align=4

Remember that most of the code that manipulates these structures is not compiled C code; it is the C-- code generated by GHC. Even when compiling via C, the generated C code is not using these structure definitions, it is just manipulating StgWords. Adding alignment constraints here would have no effect.

Cheers,
        Simon
_______________________________________________
Cvs-ghc mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/cvs-ghc
