I've recently run into this effect in some performance tests on FLEX-ES based systems. It turns out that FLEX-ES exhibits processor cache behavior very similar to that of the zSeries systems, being strongly affected by programs that do a lot of "store into/near code" operations. We have an advantage in that we have a command that will display a number of processor cache statistics, including one that indicates the level of this store-into-code behavior. Apparently, when the zSeries first came out, a number of SAS product users encountered very significant performance degradation because their products were structured so as to store data "close" to the code. SAS has since changed most of its products to eliminate this characteristic. Other products exhibit this behavior as well, including some IBM products; for example, IBM's DL/1 for VSE does some of this store-into-code.
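To make the "store into/near code" pattern concrete, here is a hypothetical sketch in S/390 assembly (GNU as syntax). All labels and values are invented for illustration; this is what the problematic layout looks like in principle, not code from any of the products mentioned. (On Linux the .text section is normally mapped read-only, so the "bad" variant is illustrative of legacy mainframe code layout rather than something you could run as-is.)

```
# Hypothetical sketch of the "store into/near code" pattern.
# The counter lives in the same section, a few bytes past the
# executing instructions, so every store hits a cache line the
# processor is also fetching instructions from.
        .text
bad_loop:
        larl    %r2,counter        # address of the nearby counter
        l       %r1,0(%r2)         # load current value
        ahi     %r1,1              # increment
        st      %r1,0(%r2)         # store lands within a cache line of code
        # ...
counter:
        .long   0                  # mutable data interleaved with code

# The remedy the post describes: keep mutable data in separately
# placed (or dynamically acquired) storage, away from code lines.
        .data
counter_ok:
        .long   0                  # same counter, in a data-only section
```

The fix SAS and others applied amounts to the second variant: moving writable data out of the code's cache lines, or acquiring it dynamically at run time.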
Mike C. M. (Mike) Hammock
zSeries Enablement
Cornerstone Systems
(404) 643-3258
[EMAIL PROTECTED]

Peter Vander Woude <[EMAIL PROTECTED]ter.com>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
08/14/2003 08:14 AM
To: [EMAIL PROTECTED]
cc:
Subject: Re: [LINUX-390] zSeries performance heavily dependent on working set size
Please respond to Linux on 390 Port

Ulrich,

Yes, on the zSeries machines the separation of code and data can matter hugely. If the data being updated is within 256 bytes of the instruction that is updating it, there is a huge performance impact. Moving the data outside that range (or having the code "get" the storage for the data) has been shown on the zSeries to reduce CPU usage by a factor of at least 5 (that is, the first case uses at least 5 times more CPU than the second). On other IBM mainframes (i.e. 9675 boxes) there is also an impact, but the L1 cache is smaller and the CPU impact is lower.

Peter I. Vander Woude
Sr. Mainframe Engineer
Harris Teeter, Inc.

>>> [EMAIL PROTECTED] 08/13/2003 8:31:47 PM >>>
Dave Rivers wrote:
> On a per-function basis - but not within functions; because
> gcc points R13 at the literal pool; which can be quite large
> (and different from the code location in sufficiently large
> functions.)

Separating the code and the literal pool would appear likely to be a net win on machines with separate i-cache and d-cache (i.e. all zSeries machines). I don't have specific measurements to prove that point, though.

Bye,
Ulrich

--
Dr. Ulrich Weigand
[EMAIL PROTECTED]
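For readers unfamiliar with the literal pool Dave and Ulrich are discussing, here is a hypothetical sketch (GNU as, S/390, 31-bit-era gcc style) of the layout: gcc materializes constants from a per-function pool addressed via %r13, and that pool is emitted adjacent to the function's own instructions. The function name, labels, and values are invented for illustration.

```
# Hypothetical sketch of a gcc-style per-function literal pool.
foo:
        stm     %r6,%r15,24(%r15)   # prologue: save registers
        bras    %r13,.Lpool_end     # set %r13 -> .Lpool, branch over the pool
.Lpool:
        .long   123456              # numeric literal
        .long   some_global         # address literal (hypothetical symbol)
.Lpool_end:
        l       %r2,0(%r13)         # d-cache read from a line adjacent to code
        # ...
```

Note that, unlike the store-into-code case above, the pool is only read, so the cost here is the same cache lines being pulled into both the i-cache and the d-cache. Moving the pool away from the instructions, as Ulrich suggests, would keep code lines purely in the i-cache.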