Re: A few Questions on OS X ghc-6.4.3 fix

Simon Marlow Mon, 23 Oct 2006 04:18:11 -0700

Peter Tanski wrote:

I will look at these more closely later; if you are curious about any of them I will send the logs.
On a side-note, I will make an effort to test ghc-6.4.3 and ghc-6.6 in Parallel (with PAR defined). Even though I have a uni-processor I have pvm version 3 installed and have used pvm in C programs.

I don't expect PAR to work in any branch of GHC, except in sources you get directly from the GPH folk. The PAR code in released versions of GHC isn't tested, certainly doesn't work, and probably doesn't compile either.

ghc/rts/Schedule.c:2022-2032

#if defined(RTS_SUPPORTS_THREADS)
    // Allocating a new condition for each thread is expensive, so we
    // cache one.  This is a pretty feeble hack, but it helps speed up
    // consecutive call-ins quite a bit.
    if (bound_cond_cache_full) {
        m->bound_thread_cond = bound_cond_cache;
        bound_cond_cache_full = 0;
    } else {
        initCondition(&m->bound_thread_cond);
    }
#endif

That hack may become worrisome, should threads obtain separate mutexes (I did not check the code for GranSim).

I don't think there's a problem here - these condition variables are "free" in the sense that they should be unused by the rest of the system when they are recycled, so there's no chance that we could be confused about which mutex is associated with a condition variable.

I haven't used ZLA's before, so that was partly my misconception and partly an underlying disagreement I have with flexible array members in general. I always thought of flexible array member as a pointer; I now understand that structures containing a ZLA-flexible array member are treated as if the ZLA-member does not exist, especially for sizeof(), but their incomplete type must reserve space (sizeof (void), which is also an incomplete type, is 1).

I don't understand that last comment. Why should the incomplete type have a size that is 1 byte larger than a normal sizeof()?

The actual implementation does include the offset to the flexible array member, so sizeof() should account for that.

Do you mean that the size of the type should include any padding required before the flexible array, but no size for the array itself? (that I agree with).
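To make the sizeof() question concrete, here is a minimal sketch using a made-up structure (struct record and record_new are hypothetical names, not anything in the GHC sources). It assumes a C99 compiler and shows the usual idiom: sizeof covers the members before the flexible array plus any padding, and the array elements are added separately at allocation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record, not a GHC type: one header field followed by a
 * C99 flexible array member. */
struct record {
    char   tag;
    double payload[];   /* contributes no elements to sizeof */
};

/* On the compilers I am aware of, sizeof already includes the padding
 * needed before payload[], so it is never smaller than the member's
 * offset. The standard only promises "as if omitted, possibly with more
 * trailing padding", so treat this as an observation, not a guarantee. */
void record_check_layout(void)
{
    assert(sizeof(struct record) >= offsetof(struct record, payload));
}

/* Allocate a record with room for n payload elements, zero-initialised.
 * Returns NULL if the allocation fails. */
struct record *record_new(size_t n)
{
    struct record *r = malloc(sizeof *r + n * sizeof r->payload[0]);
    if (r != NULL) {
        r->tag = 0;
        for (size_t i = 0; i < n; i++) {
            r->payload[i] = 0.0;
        }
    }
    return r;
}
```

A caller would use record_new(4) to get a record whose payload[0] through payload[3] are valid, and release it with free().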

Bug 25805 [4.0 regression], at <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25805> is a good example of a worrisome error:
The following program fails to initialise d1.x[] using Apple's gcc 4.0.1 (build 5363), even when -fno-zero-initialized-in-bss is defined. (Note: -fzero-initialized-in-bss is defined by default.) This program does not fail when using a version of gcc 4.2.0 I built.
Of course, this bug only affects initialisation.

We never use initializers for flexible arrays, as far as I'm aware. The CostCentreStack and CostCentre types don't have flexible members.

Finally, if there are alignment issues, wouldn't that be better
controlled explicitly through pragmas?
Could you elaborate a bit?  Where do you want to use alignment pragmas,
and what would they buy us?
Hopefully they would buy speed and maybe even some space, at least on some RISC architectures. If ZLA's are used and there is any padding on the end of the structure (in place of the ZLA member or after it), the alignment of the subsequent structures may be off.

All of the closure structures are carefully written to be a multiple of StgWord in size, and are always aligned to an StgWord boundary (no more, no less), because that is how the heap/stack works in GHC. Info tables are also a multiple of StgWord, but there it is less clear: we might like the code to be more aligned than StgWord, but to do that we would probably have to pad *before* the info table.
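As a rough illustration of the "multiple of StgWord" invariant, here is a sketch with stand-in types. Word and struct DemoClosure are assumptions made up for this example (they are not the real StgWord or any real closure definition), and it assumes a platform where a pointer is one word wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for StgWord (assumption: one machine word, pointer-sized). */
typedef uintptr_t Word;

/* Hypothetical closure: a one-word header followed by a word payload. */
struct DemoClosure {
    const void *info;     /* header: pointer to an info table */
    Word        payload[];
};

/* The header must occupy a whole number of words, so that objects laid
 * out back to back on a word-aligned heap stay word-aligned. */
void demo_closure_check(void)
{
    assert(sizeof(struct DemoClosure) % sizeof(Word) == 0);
}

/* Size in words of a closure carrying n_payload payload words. */
size_t demo_closure_words(size_t n_payload)
{
    return sizeof(struct DemoClosure) / sizeof(Word) + n_payload;
}

/* Returns 1 if p sits on a word boundary, 0 otherwise. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % sizeof(Word)) == 0;
}
```

Because every size is counted in whole words, no trailing padding is ever needed between consecutive objects, which is the property the paragraph above relies on.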


You might like to take a look at

  http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage

in particular the sections on heap layout and slop.

For the PowerPC, integers are optimally aligned on byte boundaries according to their size (i.e., a 4-byte int32_t would only have 'good' performance if it was aligned at 2 and would have 'optimal' performance at 4); similarly for floats (8-byte doubles = 8 byte aligned optimal, etc.). There is a space (not performance) penalty for aligning members on greater-size boundaries.
The C99 Standard (TC2), in Section 6.7.2.1 (Flexible Array Members) states that:
In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.


Right.  This isn't an issue for us (no padding is ever required).
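For readers who want to see member alignment and padding on their own compiler, the following sketch may help. struct sample and the two helper functions are hypothetical, and _Alignof needs a C11 compiler (GCC's __alignof__ is the older spelling); the numbers it yields are ABI-dependent, so none are quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not a GHC structure: members of increasing size,
 * so the compiler has to insert padding to align each one. */
struct sample {
    char    c;
    int32_t i;
    double  d;
};

/* Each member is placed at an offset that satisfies its own alignment
 * requirement, and the struct size is a multiple of the struct's
 * alignment so that arrays of it stay aligned. */
void sample_check_layout(void)
{
    assert(offsetof(struct sample, i) % _Alignof(int32_t) == 0);
    assert(offsetof(struct sample, d) % _Alignof(double) == 0);
    assert(sizeof(struct sample) % _Alignof(struct sample) == 0);
}

/* Total bytes of padding (between members and trailing) in struct sample. */
size_t sample_padding(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(int32_t) + sizeof(double));
}
```

A structure made only of word-sized fields, as the closure structures are described to be, would report zero padding under the same measurement.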

Gcc (4.x) seems to align structures with flexible array members (ZLAs) at 2:
for the above example program:

_d1:
    .space 4
    .globl _d2
    .align 2
(Note: it is an error to use the __alignof__ keyword to find this out because ZLAs have incomplete type so you have to look in the assembly output.)
It might be interesting to view the performance difference between using the Darwin pragma:
#pragma options align=4

Remember that most of the code that manipulates these structures is not compiled C code; it is the C-- code generated by GHC. Even when compiling via C, the generated C code is not using these structure definitions, it is just manipulating StgWords. Adding alignment constraints here would have no effect.


Cheers,
        Simon
