john_w_gilm...@msn.com (john gilmore) writes:
> These optimizations are also devised by groups whose full-time job is
> to optimize code skeletons that are used stereotypically in
> compiler-generated code; and these groups inevitably come to have a
> vested interest in cleverness, i.e., non-standard, less than obvious
> ways of doing things.

more recent state-of-the-art ... is to build a model of the hardware & instruction operation ... and have code that selects instruction combinations based on specified cycles (or some other criteria) ... that code is now finding non-standard, possibly non-obvious, instruction sequences (it also makes it easier to do a large number of backends across a variety of different machine architectures). Part of the machine model (for execution) includes things like out-of-order execution dependencies.

long ago and far away, early cp67 (virtual machine precursor to vm370, ran on 360/67) shipped with virtual address tables initialized pointing to a special "zeros" page on disk. I changed that so the tables just indicate a "zeros" page and the storage is simply cleared to zeros. The common way of doing that in the period was (multiple) overlapping MVC. I did an implementation that saved the registers, zeroed ten registers and did a BXLE/STM loop storing those ten registers (significantly faster than overlapping MVC ... on 360/67).

GCC 4.6
http://gcc.gnu.org/gcc-4.6/changes.html

from above:

S/390, zSeries and System z9/z10, IBM zEnterprise z196

Support for the zEnterprise z196 processor has been added. When using the -march=z196 option, the compiler will generate code making use of the following instruction facilities:

    Conditional load/store
    Distinct-operands
    Floating-point-extension
    Interlocked-access
    Population-count

The -mtune=z196 option avoids the compare and branch instructions as well as the load address instruction with an index register as much as possible and performs instruction scheduling appropriate for the new out-of-order pipeline architecture.

When using the -m31 -mzarch options the generated code still conforms to the 32-bit ABI but uses the general purpose registers as 64-bit registers internally. This requires a Linux kernel saving the whole 64-bit registers when doing a context switch. Kernels providing that feature indicate that by the 'highgprs' string in /proc/cpuinfo.

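
(an aside on the 'highgprs' check: the following is only a rough sketch in Python, not part of the GCC notes. The helper names has_highgprs and kernel_saves_high_gprs are hypothetical, and it assumes nothing about /proc/cpuinfo beyond the string appearing as a whole word somewhere in the file.)

```python
def has_highgprs(cpuinfo_text: str) -> bool:
    """Return True if 'highgprs' appears as a whole word in the given text."""
    for line in cpuinfo_text.splitlines():
        # Treat ':' as a separator so "name: a b c" style lines split cleanly.
        tokens = line.replace(":", " ").split()
        if "highgprs" in tokens:
            return True
    return False


def kernel_saves_high_gprs(path: str = "/proc/cpuinfo") -> bool:
    """Read the given file (default /proc/cpuinfo) and look for 'highgprs'.

    Returns False if the file cannot be read (e.g. not running on Linux).
    """
    try:
        with open(path, "r", encoding="ascii", errors="replace") as f:
            return has_highgprs(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    # Only meaningful on a Linux system; elsewhere this prints False.
    print(kernel_saves_high_gprs())
```

(a build script could run something like this before choosing -m31 -mzarch, and fall back to plain -m31 when it returns False.)
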
The SSA loop prefetching pass is enabled when using -O3.

... snip ...

-- 
virtualization experience starting Jan1968, online at home since Mar1970