RE: Mersenne: Timing(?) errors

1999-09-20 Thread Willmore, David

> On Mon, Sep 20, 1999 at 09:49:51AM -0500, Willmore, David wrote:
> >Not really much you can do.  The way Windows hands out memory almost
> >guarantees TLB and L2 cache thrashing.
> 
> Yeah, but some of the same problems are present when it comes to Linux...
> Perhaps I should go back to ReCache... Perhaps as a cron job? `Flush the
> darn caches every hour -- this ain't no fileserver' :-)
> 
Since it's a cache-read problem, there's no real way to 'flush' it.
Normally, 'flushing' means writing dirty data back to whatever backing store
exists, not 'invalidate everything'.  Even if you did, it wouldn't solve the
problem.

> >Unless the code can look up the physical address where its data
> >is stored and ensure the correct alignment, there's not much that can be
> >done--in a controlled fashion.
> 
> I didn't think the alignment was a problem?
> 
Anyone correct me if I'm wrong, but this sounds like an L2 cache aliasing
problem--aliases thrashing the cache.  Let me see if I can explain it.

Okay, as quickly as I can, here's how caches work.  A cache is composed of
'lines'.  In the Intel arch, a line is normally 32 bytes.  This is the
minimum size of data reads and writes on the main-memory side of the cache.
A 256K cache would be composed of 8K lines of 32 bytes each.  Call the cache
C[x], where x is the number of the line.  If this is a 'direct mapped'
cache, each C[x] can store the contents of one main-memory line (32 bytes in
this case).  A table called the 'tag' is used to store *which* place the
data came from.  Call this T[x].

Here's how a simple 256K direct-mapped cache works.  The address 'a' coming
out of the processor is broken down into two pieces.  Call them T_loc(a)
and L(a).  T_loc is the tag value and L is the line index--for a particular
address 'a' in main memory.  In this case, T_loc(a) = a / 256K and
L(a) = (a / 32) mod 8K (8K being the number of lines).  To check whether
something is in the cache, we calculate L(a) to find which index into the
tag array to look at.  If T[L(a)] = T_loc(a), then cache line C[L(a)]
contains our data (the 32-byte line holding address a).
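
If it helps, here's the same arithmetic as a little C sketch (a toy model
of the lookup, not anything out of a real chip; the names are mine, and the
constants are the 256K/32-byte numbers from above):

#include <stdint.h>

#define LINE_SIZE   32u                       /* bytes per line     */
#define CACHE_SIZE  (256u * 1024)             /* 256K direct mapped */
#define NUM_LINES   (CACHE_SIZE / LINE_SIZE)  /* 8K lines           */

/* Split an address into line index L(a) and tag T_loc(a). */
static uint32_t line_of(uint32_t a) { return (a / LINE_SIZE) % NUM_LINES; }
static uint32_t tag_of(uint32_t a)  { return a / CACHE_SIZE; }

/* Hit if the tag stored for that line matches the address's tag. */
static int is_hit(const uint32_t T[NUM_LINES], uint32_t a)
{
    return T[line_of(a)] == tag_of(a);
}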

The nice property of this is that division by powers of two is free.
Basically there is *no* calculation involved here, just running wires about,
and this is a Good Thing(tm) as it's very quick.  The bad side is that a and
a+256K both have the same L(a) but are not part of the same line.  When the
circuitry detects this, it means that your data is not in the cache, and the
cache line, if dirty--written to, but not yet updated in main memory--must
be written back before the new line is read in.  This process takes a lot of
time and is exactly what the cache is there to avoid.  If you repeatedly
access data in a manner which causes the cache to load/unload lines like
this, you 'thrash' the cache--you beat it up.
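
To see the aliasing concretely, a throwaway demo (same toy constants as the
sketch above; the starting address is arbitrary):

#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE   32u
#define CACHE_SIZE  (256u * 1024)
#define NUM_LINES   (CACHE_SIZE / LINE_SIZE)

int main(void)
{
    uint32_t a = 0x12345;         /* some address        */
    uint32_t b = a + CACHE_SIZE;  /* 256K further along  */
    /* Same line index, different tags: in a direct-mapped cache
       the two locations evict each other on every access.      */
    printf("L(a)=%u L(b)=%u\n",
           (a / LINE_SIZE) % NUM_LINES, (b / LINE_SIZE) % NUM_LINES);
    printf("T(a)=%u T(b)=%u\n", a / CACHE_SIZE, b / CACHE_SIZE);
    return 0;
}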

On to associativity: the L2 of a PII/PIII or Celeron is '4-way set
associative'.  This means that C[] and T[] are two-dimensional.  This makes
things nicer.  For an access, you check whether T[L(a),z] = T_loc(a) for any
z in {0,1,2,3} (with four ways, L(a) now indexes one of 2K sets rather than
8K lines).  If it matches in any of them, you look up your data in the
appropriate 'way' of C[].  If not, then you have missed the cache and must
unload a line/load the new one.  Since you have four places to put a line,
you can access four locations in memory which map to the same place in the
cache and *not* thrash it.  This is good.  Of course, if you access a
*fifth* location in that set, you start thrashing.
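
Here's the earlier sketch extended to a 4-way set-associative lookup (again
a toy model under my own naming, nothing vendor-specific):

#include <stdint.h>

#define LINE_SIZE  32u
#define CACHE_SIZE (256u * 1024)
#define NUM_WAYS   4u
#define NUM_SETS   (CACHE_SIZE / LINE_SIZE / NUM_WAYS)  /* 2K sets */

static uint32_t set_of(uint32_t a) { return (a / LINE_SIZE) % NUM_SETS; }
static uint32_t tag_of(uint32_t a) { return a / (LINE_SIZE * NUM_SETS); }

/* Hit if the address's tag matches any of the four ways of its set. */
static int is_hit(const uint32_t T[NUM_SETS][NUM_WAYS], uint32_t a)
{
    uint32_t s = set_of(a), t = tag_of(a);
    for (uint32_t z = 0; z < NUM_WAYS; z++)  /* z in {0,1,2,3} */
        if (T[s][z] == t)
            return 1;
    return 0;
}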

What does this have to do with Prime95 and mprime?  Well, depending on
where the OS allocates your data, you may or may not have thrashing
problems.  There is no clean way to ensure this as a user.  Linux has some
cache 'coloring' considerations built into the memory allocator: each page
gets a 'color' according to which part of the cache it maps onto, and the
memory manager tries not to hand you two pages of the same color, since
those would compete for the same cache lines.  Of course, if you ask for
more memory than the L2 is big, it has no choice.  Once you ask for
something that big, it just comes down to chance how the pages are
allocated.
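
Roughly, the 'color' of a page is just which slice of the cache it lands
on.  A sketch of the idea (my own toy formula, not actual kernel code;
assumes 4K pages and the 256K 4-way cache from above):

#include <stdint.h>

#define PAGE_SIZE   4096u
#define CACHE_SIZE  (256u * 1024)
#define NUM_WAYS    4u
/* One way covers 64K, so with 4K pages there are 16 colors. */
#define NUM_COLORS  (CACHE_SIZE / NUM_WAYS / PAGE_SIZE)

/* Pages with the same color compete for the same cache sets. */
static uint32_t page_color(uint32_t phys_addr)
{
    return (phys_addr / PAGE_SIZE) % NUM_COLORS;
}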

> And would looking up the real, physical address be any easier in Linux?
> (The code could run with root access, if necessary...) But perhaps you
> don't know anything about Linux at all, and I'm just throwing out
> questions
> in the wild... Guess a cc to [EMAIL PROTECTED] is in order. (Thanks to all
> you repliers, BTW.)
> 
Yes, it would probably be easier in Linux, but it might not do you any good.


Cheers,
David
_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Re: Mersenne: Timing(?) errors

1999-09-20 Thread Steinar H. Gunderson

On Mon, Sep 20, 1999 at 09:49:51AM -0500, Willmore, David wrote:
>Not really much you can do.  The way Windows hands out memory almost
>guarantees TLB and L2 cache thrashing.

Yeah, but some of the same problems are present when it comes to Linux...
Perhaps I should go back to ReCache... Perhaps as a cron job? `Flush the
darn caches every hour -- this ain't no fileserver' :-)

>Unless the code can look up the physical address where its data
>is stored and ensure the correct alignment, there's not much that can be
>done--in a controlled fashion.

I didn't think the alignment was a problem?

And would looking up the real, physical address be any easier in Linux?
(The code could run with root access, if necessary...) But perhaps you
don't know anything about Linux at all, and I'm just throwing out questions
in the wild... Guess a cc to [EMAIL PROTECTED] is in order. (Thanks to all
you repliers, BTW.)

>Good weekend?

If you're asking whether I had a good weekend, the answer is yes, I did.
Thank you very much :-) (Now only one more week, and we'll have a week's
vacation.  Sometimes, going to school is not that stupid...)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/
_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



Mersenne: Timing(?) errors

1999-09-17 Thread Steinar H. Gunderson

All,

I'm running Prime95 v18 on a Dell XPS P60 (no, I don't want to hear `switch
to factoring' or anything -- it's running double-checks). When activity
happens (in this case, a Word document being opened and looked at), the log
sometimes shows stuff like:

[lots of 0.814 sec iteration times]
Iteration: 3077000 / 3644xxx. Clocks: [48.8 million] = 0.814 sec.
Iteration: 3078000 / 3644xxx. Clocks: [56.1 million] = 0.936 sec. <-- Word
Iteration: 3079000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.
Iteration: 3080000 / 3644xxx. Clocks: [47.0 million] = 0.784 sec.
Iteration: 3081000 / 3644xxx. Clocks: [46.7 million] = 0.779 sec.
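
(For what it's worth, the 'sec' column is just clocks divided by the clock
rate: 48.8 million clocks / 60 MHz ~= 0.81 sec on the P60, which matches the
log -- so the conversion itself looks right, assuming the clock count is.)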

Anybody have an idea why it's actually faster now?
My best guess would be some kind of timing mistake, but that still doesn't
sound right...

(The computer has been left totally untouched after the Word usage. The Word
usage shows itself in the 0.936 timing. The machine should have enough mem --
40 MB. Running Win95, no evil CPU-hogging programs.)

/* Steinar */
-- 
Homepage: http://members.xoom.com/sneeze/
_
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers