I've recently run into this effect in some performance tests on FLEX-ES
based systems.  It turns out that FLEX-ES exhibits processor cache
behavior very similar to that of the zSeries systems, being heavily
affected by programs that do a lot of "store into/near code" operations.
We have an advantage in that we have a command that will display a
number of processor cache statistics, including one that indicates the
level of this store-into-code behavior.  Apparently, when the zSeries
first came out, a number of SAS product users encountered very
significant performance degradation because their products were
structured so as to store data 'close' to the code.  SAS has since
changed most of their products to eliminate this characteristic.  Other
products exhibit this behavior as well, including some IBM products; for
example, IBM's DL/1 for VSE does some of this store-into-code.
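For anyone who hasn't seen the pattern, here is a minimal assembler
sketch of what "store into/near code" looks like; the routine and label
names are hypothetical, invented purely for illustration:

BADPAT   CSECT
         STM   14,12,12(13)       save caller's registers
         BALR  12,0               establish addressability
         USING *,12
         L     1,COUNT            load the counter
         LA    1,1(,1)            bump it
         ST    1,COUNT            store lands a few bytes past this code
         LM    14,12,12(13)       restore registers
         BR    14                 return
COUNT    DC    F'0'               data defined right next to the code
         END   BADPAT

The ST updates storage sitting on (or right next to) cache lines the
processor has already fetched as instructions, which is exactly the
behavior those cache statistics flag.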

Mike
C. M. (Mike) Hammock
zSeries Enablement
Cornerstone Systems
(404) 643-3258
[EMAIL PROTECTED]



Peter Vander Woude <[EMAIL PROTECTED]>
Sent by: Linux on 390 Port <[EMAIL PROTECTED]>
08/14/2003 08:14 AM
To: [EMAIL PROTECTED]
Subject: Re: [LINUX-390] zSeries performance heavily dependent on working set size
Please respond to: Linux on 390 Port

Ulrich,

  Yes, on the zSeries machines the effect of separating code and data
can be huge.  If the data being updated is within 256 bytes of the
instruction that is updating it, there is a huge performance impact.
Moving the data outside that range (or having the code "get" the
storage for the data dynamically) has been shown on the zSeries to
reduce CPU usage by a factor of at least 5 (that is, the first case
uses at least 5 times more CPU than the second).
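As a sketch of that second option (assuming an MVS-style GETMAIN; under
VSE the equivalent would be GETVIS), the data simply moves to
dynamically obtained storage.  In real code the storage would be
obtained once at initialization and its address passed around; the
point here is only where the store lands:

GOODPAT  CSECT
         STM   14,12,12(13)       save caller's registers
         BALR  12,0               establish addressability
         USING *,12
         GETMAIN RU,LV=4          counter now lives far from the code
         L     3,0(,1)            R1 -> acquired storage; load counter
         LA    3,1(,3)            bump it
         ST    3,0(,1)            store now hits a pure data area
         LM    14,12,12(13)       restore registers
         BR    14                 return
         END   GOODPAT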

  On other IBM mainframes (e.g. the 9672 boxes), there is also an
impact, but the L1 cache is smaller and the CPU impact is lower.



Peter I. Vander Woude

Sr. Mainframe Engineer
Harris Teeter, Inc.



>>> [EMAIL PROTECTED] 08/13/2003 8:31:47 PM >>>
Dave Rivers wrote:

> On a per-function basis - but not within functions; because
> gcc points R13 at the literal pool; which can be quite large
> (and different from the code location in sufficiently large
> functions.)

Separating code and literal pool would appear likely to be a net win
on machines with separate i-cache and d-cache (i.e. all zSeries
machines).  I don't have specific measurements to prove that point,
though.
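In hand-written assembler the same placement idea is expressed with
LTORG: putting it after the last instruction keeps the literal pool,
which is referenced as data, off the cache lines that hold code.  A
hypothetical sketch:

DOWORK   CSECT
         STM   14,12,12(13)       save caller's registers
         BALR  12,0               establish addressability
         USING *,12
         L     3,=F'4096'         literal fetched from the pool as data
         LM    14,12,12(13)       restore registers
         BR    14                 return
* The LTORG below makes the literal pool start after the code, so the
* pool gets its own cache lines instead of sharing them with the
* instructions above.
         LTORG
         END   DOWORK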

Bye,
Ulrich

--
  Dr. Ulrich Weigand
  [EMAIL PROTECTED]
