CG = Chris Gray <[EMAIL PROTECTED]> wrote:
PR = Patrik Reali <[EMAIL PROTECTED]> wrote:
PR> * class layout (do we really need this? I guess the fields are
PR>   allocated in a row, in the order they are declared)

CG> Some VMs may group all reference fields together, or try to
CG> "pack" fields smaller than 16 bits (boolean, byte, short, char).

My apologies for this kind of off-topic post, but in case anyone is
interested in the subject, please find below a few references to
background reading; a rough sketch of the field packing Chris mentions
follows in a P.S. at the end. There surely exist plenty more papers in
this area.

------------------------------------------

[Chilimbi et al., 1999] Trishul M. Chilimbi, Bob Davidson, and
James R. Larus. Cache-Conscious Structure Definition. In Proceedings
of the ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI '99), May 1999, pp. 13--24.

(a) Class Splitting separates the fields of eligible classes into a
frequently and a rarely accessed part, based on instrumentation data.
The optimization is applicable to classes whose size roughly equals a
cache block (common for Java programs), provided there is enough
variation in field access frequency. Because of fewer cache misses,
the performance of five Java programs improved by 6--18%.

(b) Field Reordering could further improve the performance of
Microsoft SQL Server by 3%, despite previous cache-conscious C
programming. This illustrates that structure layout is better left to
the compiler.

------------------------------------------

[Dolby and Chien, 2000] Julian Dolby and Andrew A. Chien. An Automatic
Object Inlining Optimization and its Evaluation. In Proceedings of the
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI 2000), Vancouver BC, Canada, June 2000,
pp. 345--357.

Object Inlining transforms heap data structures by fusing parent and
child objects together. It can speed up a program by reducing pointer
dereferencing and by improving memory access behavior. In the
benchmark programs, 30% of objects could be inlined, leading to 12%
fewer loads, 25% fewer L1 cache misses, and 25% fewer read stalls.

------------------------------------------

[Kistler and Franz, 2000] Thomas Kistler and Michael Franz. Automated
Data-Member Layout of Heap Objects to Improve Memory-Hierarchy
Performance. ACM Transactions on Programming Languages and Systems
(TOPLAS) 22, No. 3, May 2000, pp. 490--505.

Instrumentation data is used to build a weighted graph whose edges
represent temporal dependencies between fields. To assign fields to
cache lines, the graph is partitioned; a second step orders the
fields within each partition. This field reordering improved the
performance of six Oberon programs by 3 to 50%.

------------------------------------------

[Shuf et al., 2001] Yefim Shuf, Mauricio J. Serrano, Manish Gupta,
and Jaswinder Pal Singh. Characterizing the Memory Behavior of Java
Workloads: A Structured View and Opportunities for Optimizations. In
Proceedings of SIGMETRICS 2001/Performance 2001, Cambridge MA, USA,
June 2001.

Draws conclusions from detailed instrumentation of the SPEC JVM98 and
JBB2000 benchmark suites, running on a modified version of the
Jalapeño VM. The L1 data cache is less effective than for C/C++
desktop workloads (4% misses, compared to 1%). Object Inlining could
partially fix the problem caused by "pointer chasing". Field
re-ordering is unlikely to increase L1 cache performance for Java,
because most "hot" objects fit into a 32-byte cache line.
While Data Prefetching could mitigate the L1 cache situation, TLB
misses are frequent as well, and current hardware ignores prefetching
instructions if the fetched address is not in the TLB. To increase
the TLB hit rate, large VM pages should be used, and class data
should be co-located (because virtual method tables contribute
noticeably to TLB misses).

-- Sascha

Sascha Brawer, [EMAIL PROTECTED], http://www.dandelis.ch/people/brawer/
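P.S. Since the question that started this thread was about class
layout, here is a very rough Java sketch of the packing policy Chris
alludes to at the top: put the reference fields first (so the
collector can scan one contiguous block), then sort the primitive
fields by decreasing size so that boolean/byte/short/char members
share words instead of occupying one word each. Everything here (the
class name FieldPacker, the field sizes, the assumed 8-byte header)
is made up for the example; it is not how Classpath or any particular
VM actually lays out objects.

import java.util.*;

// Toy illustration (not Classpath code) of the layout policy quoted at
// the top: group reference fields together and sort primitives by
// decreasing size, so that boolean/byte/short/char fields pack tightly
// instead of each taking up a full word.
public class FieldPacker {

    // Field kinds with assumed sizes in bytes (32-bit references here;
    // a 64-bit VM would use 8-byte references).
    enum Kind {
        REF(4), LONG(8), DOUBLE(8), INT(4), FLOAT(4),
        CHAR(2), SHORT(2), BYTE(1), BOOLEAN(1);

        final int size;
        Kind(int size) { this.size = size; }
    }

    record Field(String name, Kind kind) {}

    // Assigns an offset to every field; returns field name -> offset.
    static Map<String, Integer> layout(List<Field> fields, int headerSize) {
        List<Field> ordered = new ArrayList<>(fields);

        // References first (a contiguous block is easy for the GC to
        // scan), then primitives from largest to smallest so that small
        // fields fill the holes instead of creating padding.
        ordered.sort(Comparator
            .<Field>comparingInt(f -> f.kind() == Kind.REF ? 0 : 1)
            .thenComparingInt(f -> -f.kind().size));

        Map<String, Integer> offsets = new LinkedHashMap<>();
        int offset = headerSize;
        for (Field f : ordered) {
            int size = f.kind().size;
            offset = (offset + size - 1) / size * size;  // align to size
            offsets.put(f.name(), offset);
            offset += size;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Fields in declaration order, small and large kinds mixed.
        List<Field> declared = List.of(
            new Field("flag",  Kind.BOOLEAN),
            new Field("next",  Kind.REF),
            new Field("tag",   Kind.BYTE),
            new Field("count", Kind.LONG),
            new Field("id",    Kind.SHORT),
            new Field("owner", Kind.REF));

        // Assume an 8-byte object header for the example.
        layout(declared, 8).forEach((name, off) ->
            System.out.println(name + " -> offset " + off));
    }
}

A real VM would of course append to the superclass layout instead of
starting right after the header, and would keep 8-byte fields
naturally aligned, but the sorting step is the essence of the packing
idea.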

