I am no longer subscribed to this list so please be sure to include my email address in any replies.

I have been working on a set of class templates. Currently my example is rather large and cumbersome.

I get the same results using g++ 4.0.1 on the Mac and g++ 4.0.2 on AIX.

The templates implement "expression templates". There are two points that may or may not be of interest.

1) I put the typical NoCopy as a base class to hide the copy constructor and copy-assignment operator. When I do that, one class no longer compiles. If I add a copy constructor, the class compiles, but the copy has been optimized away by the time the final output is generated, even without any -O flags. I understand that all this is normal and o.k., but it made it hard for me to figure out what, if anything, I could change in my code to remove the need for the copy. I am still looking for a way to find out where and why the extra copy is needed. (Note that this will be a copy of a temporary.)
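
A minimal sketch of one way the extra copy could be tracked down, assuming nothing about the real templates: all names here (NoCopy's layout, Locked, TracedNode, readByValue, readThroughReference) are hypothetical stand-ins. The idea is to temporarily swap the hidden copy constructor for one that logs and counts, so each copy that actually runs shows up at run time. Note that under the C++98/03 rules, binding a temporary to a const reference requires an accessible copy constructor even when the copy itself is elided, which may be why the class stops compiling although no copy appears in the output; that is a guess, not something verified against the real code.

```cpp
#include <cstdio>

// Hypothetical NoCopy base: copy operations are declared private and
// never defined, so any use of them in a derived class is rejected.
class NoCopy {
protected:
    NoCopy() {}
    ~NoCopy() {}
private:
    NoCopy(const NoCopy&);
    NoCopy& operator=(const NoCopy&);
};

// Hypothetical class using the idiom; it can be constructed but not copied.
class Locked : private NoCopy {
public:
    int payload;
    explicit Locked(int v) : payload(v) {}
};

// Hypothetical stand-in for one expression node. The copy constructor
// is made visible on purpose so that every copy that really executes
// is counted and reported.
struct TracedNode {
    static int copies;
    int payload;
    explicit TracedNode(int v) : payload(v) {}
    TracedNode(const TracedNode& other) : payload(other.payload) {
        ++copies;
        std::fprintf(stderr, "TracedNode copied (payload=%d)\n", payload);
    }
};
int TracedNode::copies = 0;

// Taking the node by const reference does not copy an existing object.
int readThroughReference(const TracedNode& n) { return n.payload; }

// Taking the node by value copies when the argument is an lvalue.
int readByValue(TracedNode n) { return n.payload; }
```

Setting a breakpoint inside the logging copy constructor, or reading the compiler's error message once the constructor is hidden again, should point at the expression that asks for the copy.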

2) The class templates, as mentioned, implement what the books call "expression templates". As an expression is parsed, a tree of C++ class objects that mirrors the parse tree is produced. Then the tree is evaluated, and it frequently compiles down to just constants, which are folded into very efficient code. One of these expressions may look like this:

    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;

(This assigns the right-hand value to a particular field of bits in a particular hardware register in memory-mapped I/O.)
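
For readers unfamiliar with the technique, here is a much smaller sketch of the same shape of expression. It is not the real code: RegisterIndex, BitField, RegisterNode, and FieldNode are hypothetical names, and the register layout is an assumption. Each ->* builds one small node of the tree, and the work happens only in the final assignment, which does a read-modify-write on the selected bits.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical descriptors: which 32-bit register, and which bits in it.
struct RegisterIndex { std::size_t index; };
struct BitField { unsigned shift; unsigned width; };

// First tree node: a base address combined with a register selector.
struct RegisterNode {
    volatile std::uint32_t* base;
    RegisterIndex reg;
};

// Second tree node: a register combined with a bit field. Assigning to
// it evaluates the tree as a read-modify-write of the selected bits.
struct FieldNode {
    RegisterNode parent;
    BitField field;

    void operator=(std::uint32_t value) {
        volatile std::uint32_t* word = parent.base + parent.reg.index;
        // Avoid shifting by 32, which would be undefined behaviour.
        std::uint32_t ones =
            field.width >= 32 ? 0xFFFFFFFFu : ((1u << field.width) - 1u);
        std::uint32_t mask = ones << field.shift;
        std::uint32_t old = *word;
        *word = (old & ~mask) | ((value << field.shift) & mask);
    }
};

// p ->* reg builds the first node; nothing is read or written yet.
inline RegisterNode operator->*(volatile std::uint32_t* base, RegisterIndex r) {
    RegisterNode n = { base, r };
    return n;
}

// node ->* field builds the second node; still no memory access.
inline FieldNode operator->*(RegisterNode n, BitField f) {
    FieldNode fn = { n, f };
    return fn;
}
```

Because ->* binds tighter than = and groups left to right, a statement such as p->*reg->*field = value; builds both nodes as temporaries and then assigns through the last one. With everything inline and the descriptors constant, the compiler can in principle fold the whole statement down to a single masked store.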

If I put these back to back like this:

    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;

the code produced is fantastic -- optimum code. Everything is inlined. Just wonderful code! (And I can alter the value of the [2] and other things and it's all very, very nice code.)

If I break up the basic block by calling out like this:

    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    nothing(1);
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    nothing(1);
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    nothing(1);
    p->*gr.portSlotStatusArray[2]->*pss.allSlotStatus = 0x12345678;
    nothing(1);

(nothing is a function in an external file that does nothing)

The compiler produces code that creates part (but not all) of the expression tree on the stack but never references it anywhere that I can see. The other odd thing is that these trees are in separate places in memory (so, in the case above, four separate copies of the tree are created in four unique locations). As a result, the space consumed by the temporary trees is not reclaimed until the function returns.

My question is whether this is of interest to any of the developers. If it is, I will package up a nice test program and submit it via bugzilla.

Perry Smith
Ease Software, Inc.
[EMAIL PROTECTED]
http://www.easesoftware.com

Low cost SATA Products for IBMs p5, pSeries, and RS/6000 AIX systems


