john_w_gilm...@msn.com (john gilmore) writes:
> These optimizations are also devised by groups whose full-time job is
> to optimize code skeletons that are used stereotypically in
> compiler-generated code; and these groups inevitably come to have a
> vested interest in cleverness, i.e., non-standard, less than obvious
> ways of doing things.

more recent state-of-the-art ... is to build a model of the hardware &
instruction operation ... and have code that selects instruction
combinations based on specified cycle counts (or some other criteria) ...
that code is now what finds the non-standard, possibly non-obvious,
instruction sequences (it also makes it easier to do a large number of
backends across a variety of different machine architectures). Part of
the machine model (for execution) includes things like out-of-order
execution dependencies.
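
as a rough illustration of the idea (not how GCC or any particular
compiler actually does it), here is a minimal sketch in C: a made-up
machine model gives a latency per opcode, and a selector picks the
cheapest of several candidate sequences that compute the same result.
The opcodes, latencies and names (machine_model, candidate,
select_cheapest) are all hypothetical, and the cost is a plain sum of
latencies, so it ignores the out-of-order dependencies mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine model: one latency (in cycles) per opcode.
   The opcodes and the numbers are invented for illustration only. */
enum opcode { OP_MUL, OP_SHIFT, OP_ADD, OP_COUNT };

struct machine_model {
    const char *name;
    int latency[OP_COUNT];
};

/* One candidate instruction sequence; all candidates passed to the
   selector are assumed to compute the same result. */
struct candidate {
    const char *description;
    enum opcode ops[4];
    size_t n_ops;
};

/* Cost = sum of latencies, i.e. every instruction is assumed to wait
   for the previous one (no overlap, no out-of-order execution). */
static int sequence_cost(const struct machine_model *m,
                         const struct candidate *c)
{
    int total = 0;
    for (size_t i = 0; i < c->n_ops; i++)
        total += m->latency[c->ops[i]];
    return total;
}

/* Return the index of the cheapest candidate under the given model.
   Ties go to the earliest candidate. */
size_t select_cheapest(const struct machine_model *m,
                       const struct candidate *cands, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (sequence_cost(m, &cands[i]) < sequence_cost(m, &cands[best]))
            best = i;
    return best;
}
```

with two candidates for "multiply by 10" (a single multiply, or two
shifts and an add), a model with a slow multiply picks the shift/add
sequence and a model with a fast multiply picks the multiply ... same
selector code, different machine description.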

long ago and far away, early cp67 (virtual machine precursor to vm370,
ran on 360/67) shipped with virtual address tables initialized to point
to a special "zeros" page on disk. I changed that to just flag the page
as a "zeros" page and clear the storage to zeros (rather than reading it
from disk). The common way to clear storage in that period was (multiple)
overlapping MVC. I did an implementation that saved registers, zeroed ten
registers and did a BXLE/STM loop storing those ten registers
(significantly faster than overlapping MVC ... on 360/67).
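
the shape of that loop can be sketched in C (only an analogy, the
original was 360 assembler): each pass of the outer loop stores ten
zero words, standing in for one STM of ten zeroed registers, with the
loop control playing the part of BXLE. The 4096-byte page size matches
the 360/67; how the original handled the leftover words (1024 words is
not a multiple of ten) isn't stated above, so the tail loop here is an
assumption, as is the name clear_page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES      4096u
#define PAGE_WORDS      (PAGE_BYTES / sizeof(uint32_t))  /* 1024 words */
#define WORDS_PER_STORE 10u  /* ten registers per STM in the description */

/* Clear one page, ten words at a time. */
void clear_page(uint32_t *page)
{
    assert(page != NULL);
    size_t i = 0;

    /* Main loop: one pass stands in for one STM of ten zeroed registers. */
    for (; i + WORDS_PER_STORE <= PAGE_WORDS; i += WORDS_PER_STORE) {
        for (size_t r = 0; r < WORDS_PER_STORE; r++)
            page[i + r] = 0;
    }

    /* Tail (assumption): 1024 = 102 * 10 + 4, so four words remain. */
    for (; i < PAGE_WORDS; i++)
        page[i] = 0;
}
```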

GCC 4.6
http://gcc.gnu.org/gcc-4.6/changes.html

from above:

S/390, zSeries and System z9/z10, IBM zEnterprise z196

Support for the zEnterprise z196 processor has been added. When using
the -march=z196 option, the compiler will generate code making use of
the following instruction facilities:

        Conditional load/store
        Distinct-operands
        Floating-point-extension
        Interlocked-access
        Population-count
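
(aside, not part of the GCC notes: a minimal C sketch of the kind of
source that can benefit from the population-count facility. GCC's
__builtin_popcountll is a real built-in; whether a given GCC version
lowers it to the hardware facility under -march=z196 is something to
check in the generated assembler, not something this sketch
guarantees. The function names are hypothetical.)

```c
#include <assert.h>
#include <stdint.h>

/* Portable reference: clears the lowest set bit on each iteration,
   so the loop runs once per set bit. */
unsigned popcount_portable(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* GCC built-in; the compiler is free to use a hardware
   population-count facility where the target provides one. */
unsigned popcount_builtin(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}

/* Debug-build cross-check of the built-in against the reference. */
unsigned popcount_checked(uint64_t x)
{
    unsigned n = popcount_builtin(x);
    assert(n == popcount_portable(x));
    return n;
}
```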

The -mtune=z196 option avoids the compare and branch instructions as
well as the load address instruction with an index register as much as
possible and performs instruction scheduling appropriate for the new
out-of-order pipeline architecture.

When using the -m31 -mzarch options the generated code still conforms to
the 32-bit ABI but uses the general purpose registers as 64-bit
registers internally. This requires a Linux kernel saving the whole
64-bit registers when doing a context switch. Kernels providing that
feature indicate that by the 'highgprs' string in /proc/cpuinfo.
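
(aside, not part of the GCC notes: a minimal C sketch of checking for
that string before running code built with -m31 -mzarch. The function
names are hypothetical, and the only thing assumed about the layout of
/proc/cpuinfo is that 'highgprs' shows up as a whitespace-delimited
word on some line.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `feature` appears in `text` as a whole word, i.e.
   delimited by whitespace (or ':' on the left), else 0. */
int has_feature(const char *text, const char *feature)
{
    assert(text != NULL && feature != NULL);
    size_t len = strlen(feature);
    if (len == 0)
        return 0;
    for (const char *p = strstr(text, feature); p != NULL;
         p = strstr(p + 1, feature)) {
        int starts = (p == text) || strchr(" \t\n:", p[-1]) != NULL;
        int ends = p[len] == '\0' || strchr(" \t\n", p[len]) != NULL;
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Scan a cpuinfo-style file line by line.
   Returns 1 if 'highgprs' is listed, 0 if not, -1 if the file cannot
   be opened. Lines longer than the buffer are read in pieces, so a
   word split across two pieces would be missed. */
int kernel_saves_high_gprs(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char line[1024];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = has_feature(line, "highgprs");
    fclose(f);
    return found;
}
```

typical use would be kernel_saves_high_gprs("/proc/cpuinfo") at
startup, treating anything other than 1 as "don't rely on the upper
halves of the registers surviving a context switch".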

The SSA loop prefetching pass is enabled when using -O3.

... snip ...

-- 
virtualization experience starting Jan1968, online at home since Mar1970
