Mersenne: Re: Mersenne Digest V1 #557

Steinar H. Gunderson Sat, 15 May 1999 13:37:49 -0700
On Sat, May 15, 1999 at 12:52:56AM -0700, Mersenne Digest wrote:
>Why not just write a piece of code that (during installation of Prime95)
>removes the screensaver start-up line in the ini (windows) files.

Well, as Prime95 is only installed once, and the users are adding screen
savers all the time, this will help little.

--- snip ---

>My solution was to put a different copy of Prime95 on the home
>directory of each workstation on the server they connect to.

On our system, the workstations don't have their own home
directories, and there's nothing I can do about it :-(

>Of course it does waste about 4 MB of disk space for each
>workstation * 275 workstations = 1.1 GB which is nothing in server
>space these days.

Well, considering that our server is only a bit over 10 GB, and we're
already hard pushed ;-)

--- snip ---

>I now have this 486 machine doing the factoring that was 
>being done by the P120. Generally, it is taking almost two weeks per 
>exponent.

The machines at school are estimated to take about three months each
(not 24/7) :-) Just have patience.

>Therefore, my question is even with this two week time, is the 486 
>machine doing "useful work" for GIMPS, or is it merely heating up the 
>CPU? I ask this because in a couple or three weeks I may have access 
>to a quantity (30 to 75) of 486DX-50 machines.  If these machines can 
>contribute useful work to GIMPS, I will happily give them each a mouthful 
>of exponents to factor :-)

I asked George about this once. His answer (not 100% accurate, but the
wording was similar): `Every little bit helps!' Yes, they will be doing
useful work. 

--- snip ---

>On a related note, I've found that for LL testing, the speed of a 
>Pentium MMX and a Pentium II is about the same adjusted for clock speed,
>but for factoring, the P-II seems to finish in about half the time.

Curious -- I'm just now talking to George about improving the LL code
for P6 (PPro/PII/PIII). Some volunteer already improved the factoring
code. Look at the whatsnew.txt file from Prime95:

===
New features in Version 14.3 of prime95.exe
-------------------------------------------

1)  The Pentium Pro factoring code is nearly twice as fast compared
    to version 14.2.
===

In fact, 486 code is often better for P6 than Pentium code. If you don't
believe me, look at what GNU did for glibc2 (the now-standard Linux C
library): At the bottom, you have C code. On top of that, there are some
special 386-optimized routines. The 486 `inherits' these routines and
adds some more 486-specific ones. Pentium inherits the 386 and 486
routines. BUT... Pentium Pro inherits _only_ 386 and 486 code, not
Pentium code. (Pipelining FPU code the same way for P6 as for Pentium
generally doesn't help, since the P6 already does out-of-order
execution. The FXCHs are therefore not useful and just eat up
time.)
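
To make the inheritance idea concrete, here is a toy model in Python. It
is only a sketch of the fallback order described above: the routine names
and the PROVIDES table are hypothetical placeholders, not glibc's actual
file list.

```python
# Toy model of the fallback order described above. The routine names and
# the PROVIDES table are hypothetical, not glibc's real contents.
FALLBACK = {
    "pentium": ["pentium", "486", "386", "c"],
    "ppro":    ["ppro", "486", "386", "c"],   # note: no "pentium" level
}

PROVIDES = {
    "c":       {"memcpy", "strlen", "add_n"},  # generic C versions
    "386":     {"memcpy", "strlen"},
    "486":     {"strlen"},
    "pentium": {"add_n"},
    "ppro":    set(),
}


def pick(cpu, routine):
    """Return the first level in cpu's fallback chain that has the routine."""
    for level in FALLBACK[cpu]:
        if routine in PROVIDES[level]:
            return level
    raise KeyError(routine)


for cpu in ("pentium", "ppro"):
    print(cpu, {r: pick(cpu, r) for r in ("memcpy", "strlen", "add_n")})
```

In this model the Pentium-tuned routine is picked up on a Pentium, while
the Pentium Pro skips it and falls through to the generic C version.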

The problem is that I don't have access to masm :-( George has promised
to look into PII optimization for v19.

>This indicates to me that there's room for more improvement, though how
>that would work in details isn't clear, since I couldn't find the source
>for the factoring part of the program when I looked.

It's in factor64.asm (at least part of it).

---

>I'm not in favour of "forcing" this solution on to users, it sounds a
>bit draconian to me. Also, I've been known to criticise vehemently
>software vendors whose setup programs trample on users' personalizations.

The problem is that our `users' (pupils) generally know nothing about
PCs; most of them can barely surf the web. Trying to explain to them
that a maths program doesn't like screen savers will not be very
constructive. (Not many of us know that Prime95 is running anyway; I've
used `No Icon' so people won't tamper with it.)

>I would much prefer a programme of user education - either convince them
>that animated screensavers are a waste of resources which could be used
>more profitably, or at least get them to change the priority of Prime95
>to 4 so that it's guaranteed a reasonable chance of getting CPU cycles.

For `normal' users, yes. For those, sorry, no :-)

>BTW experiments indicate that screensavers usually don't consume more than
>25% of the available CPU cycles anyway. I'd rather have a user who feels
>they really need their animated screensaver run Prime95 at 75% of its
>potential than not run Prime95 at all.

What about letting users keep their screen savers all day, and just reset
them the first time they're rebooted in the morning?
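
A minimal sketch of such a reset, in Python. It assumes the screen saver
is registered as a SCRNSAVE.EXE= entry in the [boot] section of
system.ini; the function name and the sample text are made up for
illustration, and how it would be hooked into start-up is left open.

```python
def strip_screensaver(ini_text):
    """Return ini_text without the SCRNSAVE.EXE entry of the [boot] section."""
    out = []
    section = None
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1].lower()   # entering a new section
        key = stripped.split("=", 1)[0].strip().lower()
        if section == "boot" and "=" in stripped and key == "scrnsave.exe":
            continue                           # drop the screen saver entry
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample; the [386Enh] entry only shows that other sections
# are left alone.
sample = "\n".join([
    "[boot]",
    "shell=Explorer.exe",
    "SCRNSAVE.EXE=C:\\WINDOWS\\FLYING~1.SCR",
    "[386Enh]",
    "SCRNSAVE.EXE=keep-me",
]) + "\n"

cleaned = strip_screensaver(sample)
print(cleaned)
```

Users could then pick a screen saver again during the day, and the entry
would simply be removed at the next morning's boot.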

--- snip ---

>Perhaps because the LL test is critical on the FPU speed, whereas the
>factoring code is critical on the integer part of the CPU, in particular
>the efficiency of the (I)MUL instruction.

There are two factoring versions: FPU (for Pentium/P6) and integer (all
others).

>This would appear to adequately explain the performance difference
>between P5 and P6, and also give an explanation as to why the 486 is
>apparently so much less efficient even after correction for clock
>speed.

P6 is a totally different concept than P5 in general. P6 is actually
a RISC processor. A decoder unit converts x86 CISC instructions to
what Intel refers to as `micro-ops'. The Pentium II/Pro can schedule up
to five micro-ops per cycle, but a much more typical rate is three
micro-ops per cycle. (This data is loosely based on Intel documents.)

Clearly, the aim must be to use as few micro-ops as possible for a
given task, AND to make sure the decoder can supply enough micro-ops
to the computation core (this is done by arranging more complex and
less complex instructions in a particular pattern). (Again based on
Intel info.)

The differences arise because the cycle count on the P5 is not always
the same as the micro-op count on the P6. To take a very typical
example:

FADD QWORD PTR mem      (mem is a memory reference)
FXCH ST(1)
FADD QWORD PTR mem      (mem is the same memory reference)
FXCH ST(1)

On the P5, this will take only two cycles to complete. The FADD is scheduled
in the U-pipe (correct me if I'm wrong... perhaps the U-pipe is an integer-
only pipe), and the FXCH is scheduled in the V-pipe. However, on the P6,
this is sub-optimal code. The FADD translates into two micro-ops (one to
load `mem' from memory, and one to do the add), and the FXCH translates
into one. Consequently, this will take 6 micro-ops to execute. However,

FLD   mem
FADD  ST(1),ST
FADDP ST(2),ST

will do exactly the same (I hope...), and just take three micro-ops. One
cycle saved.
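
To check the "exactly the same" claim, here is a small Python sketch that
simulates the x87 register stack for both sequences. It is a toy model
only: the micro-op counts are the ones stated above (not measured), the
starting values are arbitrary, and rounding, exceptions and stack
overflow are ignored.

```python
# Toy x87 stack: index 0 is ST(0), index 1 is ST(1), and so on.
# Micro-op counts are taken from the text above, not from measurement.
UOPS = {"FADD_MEM": 2, "FXCH": 1, "FLD_MEM": 1, "FADD_REG": 1, "FADDP": 1}


def run(program, stack, mem):
    """Run program on a copy of stack; return (final stack, micro-op total)."""
    st = list(stack)
    uops = 0
    for op, *args in program:
        uops += UOPS[op]
        if op == "FADD_MEM":      # FADD QWORD PTR mem : ST(0) += mem
            st[0] += mem
        elif op == "FXCH":        # FXCH ST(i) : swap ST(0) and ST(i)
            i = args[0]
            st[0], st[i] = st[i], st[0]
        elif op == "FLD_MEM":     # FLD mem : push mem onto the stack
            st.insert(0, mem)
        elif op == "FADD_REG":    # FADD ST(i),ST : ST(i) += ST(0)
            st[args[0]] += st[0]
        elif op == "FADDP":       # FADDP ST(i),ST : ST(i) += ST(0), then pop
            st[args[0]] += st[0]
            st.pop(0)
    return st, uops


P5_STYLE = [("FADD_MEM",), ("FXCH", 1), ("FADD_MEM",), ("FXCH", 1)]
P6_STYLE = [("FLD_MEM",), ("FADD_REG", 1), ("FADDP", 2)]

start, mem = [1.5, 2.5], 10.0     # arbitrary test values
p5_stack, p5_uops = run(P5_STYLE, start, mem)
p6_stack, p6_uops = run(P6_STYLE, start, mem)

TYPICAL_UOPS_PER_CYCLE = 3        # the "typical" rate quoted above
print(p5_stack, p5_uops, p5_uops / TYPICAL_UOPS_PER_CYCLE)
print(p6_stack, p6_uops, p6_uops / TYPICAL_UOPS_PER_CYCLE)
```

Both sequences leave `mem' added to ST(0) and to ST(1). At the typical
three micro-ops per cycle, six micro-ops come to about two cycles and
three micro-ops to about one, which is where the saved cycle comes from.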

Sorry for bothering you all -- once I start writing, I can't stop.

--- snip ---

>It might be worth trying at the end of the
>first pass then exiting ReCache immediately, or at the end of the
>first pass & starting the second pass - you could then abort the
>second pass at the halfway mark, the idea being that ReCache should
>continue thrashing long enough for Prime95/mprime to get itself
>initialized as far as allocating its work vectors.

I'll try... But why should ReCache still be thrashing when Prime95 boots
up?

>I did find that ReCache "worked" in the sense of at least making things
>no worse on a wide selection of systems running Win 9x & NT.

Yes, it makes things more `stable', as one user pointed out.

>If you find ReCache doesn't work for you - even on a Windoze machine -
>then I'm sorry, but you do have the option not to use it!

Hmmmm, good thing -- I don't always have that option :-(

/* Steinar */
________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm