Mersenne Digest       Tuesday, November 28 2000       Volume 01 : Number 795




----------------------------------------------------------------------

Date: Sun, 26 Nov 2000 22:45:41 +0100
From: "Hoogendoorn, Sander" <[EMAIL PROTECTED]>
Subject: Mersenne: OT: Home Primes

A bit off topic, but I finally had time to put some data I have on Home
Primes (i.e. repeated factorization of the concatenated prime factors)
online.  In case you're interested: http://www.geocities.com/home_primes
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.exu.ilstu.edu/mersenne/faq-mers.txt

------------------------------

Date: Sun, 26 Nov 2000 15:06:40 -0800
From: "Stephan T. Lavavej" <[EMAIL PROTECTED]>
Subject: Mersenne: Compressing Prime95

I discovered something interesting.  The most recent Prime95 executable is
very large, 1212928 bytes long.  However, if I use UPX (Ultimate Packer for
eXecutables, upx.tsx.org) with the --best option, I can compress Prime95
down to 238542 bytes!  That's about 20% of its original size.  The
compression process, happily, is invisible to the end user, and incurs no
time or memory cost; the compressed Prime95 behaves just as the original
did.  Saving space is always a good idea.  This compression is actually
better than ZIP compression, so when the distribution zipfile includes the
smaller executable, it too becomes smaller: from 405953 bytes to 299756
bytes.  This makes the download go quicker for everyone, which especially
matters to modem users, but is nice even for people with fast connections.

Awesome.
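
As a quick check of the figures above, here is a minimal Python sketch that
recomputes the ratios from the byte counts quoted in this message.  The
variable and function names are illustrative only; nothing here is part of
UPX or Prime95.

```python
# Sizes in bytes, taken from the message above.
EXE_ORIGINAL = 1_212_928
EXE_PACKED = 238_542
ZIP_ORIGINAL = 405_953
ZIP_PACKED = 299_756


def ratio(new_size: int, old_size: int) -> float:
    """Return new_size as a fraction of old_size."""
    return new_size / old_size


exe_ratio = ratio(EXE_PACKED, EXE_ORIGINAL)
zip_ratio = ratio(ZIP_PACKED, ZIP_ORIGINAL)
zip_saving = ZIP_ORIGINAL - ZIP_PACKED

print(f"packed executable is {exe_ratio:.1%} of the original")
print(f"new zipfile is {zip_ratio:.1%} of the old one, saving {zip_saving} bytes")
```

The executable shrinks to roughly a fifth of its size, while the zipfile,
which was already compressed, shrinks by about a quarter.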

Stephan T. Lavavej


------------------------------

Date: Sun, 26 Nov 2000 23:25:09 +0100 (MET)
From: [EMAIL PROTECTED]
Subject: Re:  Mersenne: P4

> Hi all,
> 
> The mailing list has been quiet.  I hope everyone enjoyed
> a happy Thanksgiving (or at least a good weekend for non-U.S. readers).

    The focus has been on double-checking, by a different mechanism
than that which made the original count, alas in a non-Mersenne context.

>       I've received 2 queries about the recently released Pentium 4
> and prime95.  I have no timings at this point, but I figured some folks
> would like to know how the architecture helps or hurts our cause.  I've
> downloaded the manuals and have the following observations:

     (stuff deleted)

> 2)  The P4 introduces SSE2 instructions.  Intel hopes new programs
> stop using the old FPU instructions and start using these new instructions.
> The SSE2 instructions work on 2 floating point values at the same time!
> An ADD takes 4 clocks, but can only issue every other clock cycle.  A
> MUL takes 6 clocks and also can be issued every other clock cycle.
> 
> The theoretical maximum throughput for SSE2 is one ADD *AND* one
> MUL every clock cycle.  The average latency is 2 for an ADD and 3 for
> a MUL.
> 
> Summary:  If a program can be effectively recoded to use SSE2,
> then it can have greater throughput than even the Athlon.  Of course,
> months ago I had hoped that the P4 would be able to get a throughput
> of 2 ADDs and 2 MULs per clock cycle.  Maybe in a few years, a
> future P4 or AMD chip will do this.

     I understand that the SSE2 instructions operate only on
64-bit (and 32-bit) floating point data, whereas the 
FPU registers support 80-bit intermediate results.
How will the loss of precision affect the FFT length?

    Vector processors such as the Cray typically support both

              vector op vector -> vector
              vector op scalar -> vector

opcodes, so one can (for example) form all b[i]^2 - 4.0*a[i]*c[i]
when solving several quadratic equations.
[We need two vector*vector multiplications, 
one vector*scalar multiplication, 
one vector-vector subtraction.]
I find it strange that the MMX and XMM and SSE2 instruction sets
lack vector*scalar operations and also lack a way to make multiple
copies of the constant 4.0, other than to store multiple copies
in memory.  While data replication in memory 
(or adding a[i]*c[i] to itself twice) may be acceptable here, 
we don't want multiple copies of the table of roots of unity,
for example.
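
The discriminant example can be sketched in plain Python to show where the
vector*scalar step sits.  This is only an illustration of the operation
counts listed above, not SSE2 or Cray code; the helper names vec_mul,
vec_scale and vec_sub are made up for this sketch.

```python
def vec_mul(x, y):
    """vector * vector -> vector"""
    return [xi * yi for xi, yi in zip(x, y)]


def vec_scale(s, x):
    """scalar * vector -> vector; the constant s is stored only once."""
    return [s * xi for xi in x]


def vec_sub(x, y):
    """vector - vector -> vector"""
    return [xi - yi for xi, yi in zip(x, y)]


def discriminants(a, b, c):
    """Form b[i]^2 - 4.0*a[i]*c[i] for several quadratics at once."""
    b2 = vec_mul(b, b)              # first vector*vector multiplication
    ac = vec_mul(a, c)              # second vector*vector multiplication
    four_ac = vec_scale(4.0, ac)    # the one vector*scalar multiplication
    return vec_sub(b2, four_ac)     # the one vector-vector subtraction


a = [1.0, 1.0, 2.0]
b = [2.0, -3.0, 4.0]
c = [1.0, 2.0, -6.0]
print(discriminants(a, b, c))  # [0.0, 1.0, 64.0]
```

Without a vector*scalar opcode, the call to vec_scale would need a vector
filled with copies of 4.0, which is the replication complained about above.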

        Peter Montgomery



------------------------------

Date: Mon, 27 Nov 2000 10:40:52 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Compressing Prime95

On 26 Nov 00, at 15:06, Stephan T. Lavavej wrote:

> Saving space is always a good idea.  This compression is actually
> better than ZIP compression, so when the distribution zipfile includes the
> smaller executable, it too becomes smaller: from 405953 bytes to 299756
> bytes.  This makes the download go quicker for everyone, which especially
> matters to modem users, but is nice even for people with fast connections.
> 
> Awesome.

True, but...

a) I prefer to download files in a form which can be unpacked by 
"standard" software - which includes zip - rather than relying on 
inbuilt executable code. This is more secure, and makes the download 
process less platform dependent.

b) I download several megabytes across a modem line every week. The 
suggestion would save about 20 seconds each time I download Prime95. 
Say once every six months.

I think the benefits of staying with the current method exceed the 
benefits of switching to a slightly more effective compression 
method.
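
The "about 20 seconds" estimate can be reproduced with a rough calculation.
The sketch below assumes an effective modem throughput of 5,000 bytes per
second; that figure is an assumption chosen for illustration, not something
stated in the thread.

```python
# Bytes saved per download, from the zipfile sizes quoted above.
BYTES_SAVED = 405_953 - 299_756

# Assumed effective modem throughput in bytes per second (hypothetical;
# real rates depend on the line and on how compressible the data is).
ASSUMED_BYTES_PER_SECOND = 5_000


def seconds_saved(bytes_saved: int, bytes_per_second: float) -> float:
    """Time saved by transferring bytes_saved fewer bytes."""
    return bytes_saved / bytes_per_second


saved = seconds_saved(BYTES_SAVED, ASSUMED_BYTES_PER_SECOND)
print(f"about {saved:.0f} seconds saved per download")
```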


Regards
Brian Beesley

------------------------------

Date: Mon, 27 Nov 2000 07:58:46 -0800
From: "Stephan T. Lavavej" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Compressing Prime95

> True, but...
>
> a) I prefer to download files in a form which can be unpacked by
> "standard" software - which includes zip - rather than relying on
> inbuilt executable code. This is more secure, and makes the download
> process less platform dependent.

Well, any platform that can run Prime95's code (Win32/PE) will run a packed
Prime95, so I don't see platform dependence issues here.  The compression is
completely transparent, and I don't understand your objection.  The
executable itself is still standalone, and for distribution must be ZIPped
up with the other ancillary files.

Stephan T. Lavavej


------------------------------

Date: Mon, 27 Nov 2000 21:23:30 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re:  Mersenne: P4

On 26 Nov 00, at 23:25, [EMAIL PROTECTED] wrote:

[... snip ...]
>      I understand that the SSE2 instructions operate only on
> 64-bit (and 32-bit) floating point data, whereas the 
> FPU registers support 80-bit intermediate results.
> How will the loss of precision affect the FFT length?

We have good experimental data from several other processor 
architectures which use only IEEE 64-bit floating point.  For example, 
the FFT size breaks in Mlucas are a bit smaller than those in Prime95:

FFT size       Mlucas       Prime95
256K        5,150,000     5,250,000
320K        6,380,000     6,515,000
384K        7,580,000     7,730,000
448K        8,840,000     9,020,000
512K       10,110,000    10,320,000
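
One way to read these size breaks is as an average number of bits of the
exponent carried per FFT element.  The sketch below assumes that "K" means
1024 elements and that dividing the maximum exponent by the FFT length gives
that average; both are assumptions made for illustration, and the names are
hypothetical.

```python
# (FFT length in K, Mlucas limit, Prime95 limit), from the figures above.
SIZE_BREAKS = [
    (256, 5_150_000, 5_250_000),
    (320, 6_380_000, 6_515_000),
    (384, 7_580_000, 7_730_000),
    (448, 8_840_000, 9_020_000),
    (512, 10_110_000, 10_320_000),
]


def bits_per_element(max_exponent: int, fft_k: int) -> float:
    """Average bits per FFT element, assuming fft_k * 1024 elements."""
    return max_exponent / (fft_k * 1024)


for fft_k, mlucas, prime95 in SIZE_BREAKS:
    m = bits_per_element(mlucas, fft_k)
    p = bits_per_element(prime95, fft_k)
    print(f"{fft_k}K: Mlucas {m:.2f}  Prime95 {p:.2f}  difference {p - m:.2f} bits")
```

Under these assumptions the two programs differ by well under half a bit per
element at every FFT size listed.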

Mlucas also checks for excess rounding error every iteration, so it 
is obvious if an over-aggressive run length is being used.
My experience so far on Alpha 21164 is that the Mlucas size breaks 
are OK, though I'm aware that other people have run into problems. 
Note that the math libraries supplied externally to Mlucas can vary 
in accuracy as well as any effect caused by different compilers or 
detailed differences in FPU hardware implementation.
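
A per-iteration rounding check of the kind mentioned above can be sketched as
follows.  This is not Mlucas code: it only illustrates the general idea that
the outputs of a floating-point convolution should lie very close to
integers, and the 0.4 threshold and function names are assumptions.

```python
def max_roundoff_error(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def check_iteration(values, threshold=0.4):
    """Return the worst round-off error, or raise if it is too large.

    The threshold is illustrative; a real program chooses its own limit.
    """
    err = max_roundoff_error(values)
    if err > threshold:
        raise ArithmeticError(f"round-off error {err:.3f} exceeds {threshold}")
    return err


# Made-up convolution outputs that are all close to integers.
outputs = [3.0000001, 41.999998, 7.02]
print(f"max error this iteration: {check_iteration(outputs):.3f}")
```

An error approaching 0.5 means a value could be rounded to the wrong integer,
which is the sign that the FFT length is too small for the exponent.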

I suggest most strongly that we don't want to reduce the size breaks in
Prime95 to suit the reduced accuracy inevitable with the implementation of
SSE2. 

I think the best way to proceed is to keep Prime95 using x86 floating-
point architecture (which _will_ run on a P4, subject to a probable loss
of efficiency) and have a new program with a different name using SSE2
floating-point throughout (which will obviously not run except on a P4).

Obviously the look, feel & PrimeNet interface of the new program 
could & should be the same! This implies that the extra cost of 
producing a new program should be negligible in comparison with the 
enormous effort involved in the creation, debugging & optimization of the
SSE2 FFT code.

This approach also facilitates arranging double-checking so that 
residues computed using x86 could be cross-checked using SSE2 and 
vice versa, should this be thought desirable.

>     Vector processors such as the Cray typically support both
> 
>               vector op vector -> vector
>               vector op scalar -> vector
> 
> opcodes, so one can (for example) form all b[i]^2 - 4.0*a[i]*c[i]
> when solving several quadratic equations.
> [We need two vector*vector multiplications, 
> one vector*scalar multiplication, 
> one vector-vector subtraction.]
> I find it strange that the MMX and XMM and SSE2 instruction sets
> lack vector*scalar operations and also lack a way to make multiple
> copies of the constant 4.0, other than to store multiple copies
> in memory.  While data replication in memory 
> (or adding a[i]*c[i] to itself twice) may be acceptable here, 
> we don't want multiple copies of the table of roots of unity,
> for example.

The omissions are indeed obvious and regrettable. I guess that the 
Intel engineers found them impractical to implement for some reason.
Either that, or they're saving up something for the Pentium 5 ...?


Regards
Brian Beesley

------------------------------

Date: Mon, 27 Nov 2000 17:19:43 -0600
From: "Jeramy Ross" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Compressing Prime95

This 'wonderful' compression technology may be "awesome"; however, my main
objection, or perhaps philosophy, towards all of this is that Prime95 is not
a large piece of code.  It takes a relatively small amount of time to
download over a modem compared to other software that we modem users may
download in a week's period.  If Prime95 were, say, a 20MB download, then
perhaps shaking things up to save some download time would be a good idea,
but as things stand now we are only talking about saving 20 or so seconds.
Perhaps that is the reason people may find it 'objectionable'.  Maybe it's
just not worth the hassle at the moment for this particular application.

- - Jeramy A. Ross


> Well, any platform that can run Prime95's code (Win32/PE) will run a
> packed Prime95, so I don't see platform dependence issues here.  The
> compression is completely transparent, and I don't understand your
> objection.  The executable itself is still standalone, and for
> distribution must be ZIPped up with the other ancillary files.
>
> Stephan T. Lavavej


------------------------------

Date: Mon, 27 Nov 2000 18:27:17 -0500
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re:  Mersenne: P4

> >      I understand that the SSE2 instructions operate only on
> > 64-bit (and 32-bit) floating point data, whereas the
> > FPU registers support 80-bit intermediate results.

I know this is a little off-topic, but how good is the P4 at integer operations?


------------------------------

Date: Mon, 27 Nov 2000 18:58:41 -0500
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Compressing Prime95

Jeramy Ross wrote:
> 
> This 'wonderful' compression technology may be "awesome"; however, my main
> objection, or perhaps philosophy, towards all of this is that Prime95 is not
> a large piece of code.  It takes a relatively small amount of time to
> download over a modem compared to other software that we modem users may
> download in a week's period.  If Prime95 were, say, a 20MB download, then
> perhaps shaking things up to save some download time would be a good idea,
> but as things stand now we are only talking about saving 20 or so seconds.
> Perhaps that is the reason people may find it 'objectionable'.  Maybe it's
> just not worth the hassle at the moment for this particular application.

I'd tend to agree, as someone who uses a modem at home.  If we were
talking about something like the Sun JDK, sure, better compression might
be in order.  But in the case of a 1-meg program, it isn't.  

At school, of course, I have no real concerns - I've downloaded a CD
image in twenty minutes!  

Nathan

P.S. The website for the compression program never resolved for me; I'd
like to take a look at it, if someone would send me the IP.

------------------------------

Date: Mon, 27 Nov 2000 16:50:47 -0800
From: "John R Pierce" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4

> > >      I understand that the SSE2 instructions operate only on
> > > 64-bit (and 32-bit) floating point data, whereas the
> > > FPU registers support 80-bit intermediate results.
>
> I know this is a little off-topic, but how good is the P4 at integer
> operations?

Not that off topic at all.  Integer multiplies can be more efficient than FP
for this sort of thing, IF they are as fast.  If a processor could pipeline
64-bit integer multiplies at one per clock, with parallel adds at the same
time, it would be as fast as or faster than using the 80-bit FP format...  It
will be interesting to see what IA64 brings to the table when it finally
gets up to speed...

- -jrp



------------------------------

Date: Mon, 27 Nov 2000 23:05:29 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: P4 - a correction

Hi again,

         One correction to my previous post.  I said that the latency to
access the L1 data cache was 2 clocks.  This is correct for integer
instructions only.  For floating point and SSE2 instructions the latency
is 6 clocks!  Interestingly, the L2 cache latency is 7 clocks for both
integer and floating point instructions.

At 11:25 PM 11/26/00 +0100, [EMAIL PROTECTED] wrote:
>      I understand that the SSE2 instructions operate only on
>64-bit (and 32-bit) floating point data, whereas the
>FPU registers support 80-bit intermediate results.
>How will the loss of precision affect the FFT length?

Brian's post is correct.  This will have a minor effect on the maximum
exponent each FFT size can handle.   The reason the effect is minor
is that prime95 already converts FFT data to 64-bit format whenever it
stores it in memory.

>     Vector processors such as the Cray typically support both
>               vector op vector -> vector
>               vector op scalar -> vector
>I find it strange that the MMX and XMM and SSE2 instruction sets
>lack vector*scalar operations and also lack a way to make multiple
>copies of the constant 4.0, other than to store multiple copies
>in memory.  While data replication in memory
>may be acceptable here,
>we don't want multiple copies of the table of roots of unity,
>for example.

There may be no alternative to multiple copies of the roots of unity.

A question for readers.  Prime95 currently uses about 8MB (exponent
around 11 million).  How would you feel if the P4 optimized version
used 13MB?   23MB?   33MB?

I hate to code up 3 versions of the P4 code  (small, medium, large),
but it might be necessary.

Regards,
George


------------------------------

Date: Tue, 28 Nov 2000 00:05:18 -0500
From: Jud McCranie <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

At 11:05 PM 11/27/2000 -0500, George Woltman wrote:
>A question for readers.  Prime95 currently uses about 8MB (exponent
>around 11 million).  How would you feel if the P4 optimized version
>used 13MB?   23MB?   33MB?


Larger memory use would be OK with me, since I have 320MB.

+---------------------------------------------------------+
|     Jud McCranie                                        |
|                                                         |
| Programming Achieved with Structure, Clarity, And Logic |
+---------------------------------------------------------+


_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Tue, 28 Nov 2000 00:13:22 -0500
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

George Woltman wrote:

(big snip)

> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?
> 
> I hate to code up 3 versions of the P4 code  (small, medium, large),
> but it might be necessary.

I'd say it would very much depend on what the typical mass-market P4 box
ends up coming with.  Any modern, graphical web browser requires 5-6
megs of memory, more in typical operation.  Five years ago, I would not
have dreamed that a single program would require that much, but now I
use them routinely.  For that matter, my family's 286, which was our
primary machine into the 1990s, ran quite happily with a total of 640 k
of RAM.  

If the typical P4 has 256 or 384 megs of memory, I don't think many
users running the now-typical programs would complain at Prime95 using
33 megs.  OTOH, if the amount of memory is in the range now typical for
P3/Athlon systems, that might be a different story.  

I should probably also note that most people using high-end systems now
are gamers or are using those systems as servers.  In both cases, large
amounts of memory are necessary, and if people typically get systems
with memory in convenient powers of two, they may have some that they do
not use in day-to-day operation.  

Nathan

P.S. My P3-600 with 128 megs of memory is set to make 70 megs available
to P-1 and the performance does not seem to greatly suffer.

------------------------------

Date: Tue, 28 Nov 2000 00:15:47 -0600
From: "Jeramy Ross" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

*SNIP*
> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?

33MB shouldn't be too unreasonable.  I, like Nathan, have 128MB and 70MB of
that is set to be available in Prime95 and have not seen any loss of
performance.

- - Jeramy


------------------------------

Date: Mon, 27 Nov 2000 22:31:33 -0800
From: "xqrpa" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

The P4 at 1.5 GHz is just the beginning of a sequence of ever faster
processors which I think will dominate the market for a long time
to come, and, yes, with RDRAM giving much improved memory
performance, which should give Prime95 a real kick.

A review of the chip from this perspective can be found at:

http://www.gamepc.com/reviews/hardware_review.asp?review=pentium4&page=1&mscssid=&tp=

So, make a P4 version as fast as possible, using 33MB if that would be
enough.

My 2 kopecks, anyway!

Best Wishes,
Stefanovic

> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?
>
> I hate to code up 3 versions of the P4 code  (small, medium, large),
> but it might be necessary.
>
> Regards,
> George


------------------------------

Date: Mon, 27 Nov 2000 22:21:35 -0800
From: "Osher Doctorow" <[EMAIL PROTECTED]>
Subject: Mersenne: An unusual paper related to number theory - Doctorow

From: Osher Doctorow, Ph.D. [EMAIL PROTECTED], Mon. Nov. 27, 2000 10:17PM

As I mentioned on primes-L, my recent paper on logic-based probability (LBP)
was published in the volume Quantum Gravity, Generalized Theory of
Gravitation, and Superstring Theory-Based Unification, Editors B. N.
Kursunoglu, S. L. Mintz, and A. Perlmutter, Kluwer Academic/Plenum: New York
2000.   The paper has the title "Magnetic monopoles, massive neutrinos and
gravitation via logical-experimental unification theory (LEUT) and Kursunoglu
and is on pages 89-97 of the book.   Mathematical physics is only one small
field to which LBP applies.  I am especially interested in its number theory
applications, which now include quadratic fields and a whole array of primes
problems.  Since the paper cited above, I have expanded the theory of LBP
considerably.  Some of the power of LBP derives from its generalization of
maximum entropy and its slight modification of elementary operations.  For
example, it replaces division in Bayesian probability/statistics with
subtraction (plus the addition of the constant 1).  Like Non-Euclidean
geometry, it provides a whole different world of results, but it also
provides powerful ordering principles somewhat analogous to the Principle of
Equivalence in general relativity and beyond.  One remarkable result is that
it indicates that linear and quadratic/conic (including cross product terms
like cxy in the unrotated conic) expressions rather than higher degree
polynomials are key in number theory (actually across mathematics).  Simple
exponentials of form exp(kx) are also considered by LBP to be very important
across mathematics, where k may be complex, but composite functions and
inverse functions are much less important (in fact, composite functions are
among the worst, which is interesting in view of the high degree of
composition involved in iterated fractal/chaos expressions).

Osher Doctorow
Doctorow Consultants, West Los Angeles College, etc.


------------------------------

Date: Mon, 27 Nov 2000 23:27:58 -0800
From: "John R Pierce" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?

My machines have anywhere from 128MB to 512MB, so a 32MB footprint would be
little problem.




------------------------------

Date: Tue, 28 Nov 2000 07:45:25 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

On 27 Nov 00, at 23:05, George Woltman wrote:

>          One correction to my previous post.  I said that the latency to
> access the L1 data cache was 2 clocks.  This is correct for integer
> instructions only.  For floating point and SSE2 instructions the latency
> is 6 clocks!  Interestingly, the L2 cache latency is 7 clocks for both
> integer and floating point instructions.

Surely the latency is much less relevant than the throughput, 
provided that there are sufficient registers and pipeline entries.
The idea is to keep the execution units busy ...
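
To make the latency-versus-throughput point concrete: if an instruction has a
given latency but a new one can be issued at a shorter interval, the execution
unit stays busy only when enough independent operations are in flight.  The
sketch below uses the SSE2 ADD and MUL figures quoted earlier in this digest
(4 and 6 clocks, each issuing every other clock); the function name is
hypothetical and the model ignores everything else in the pipeline.

```python
import math


def chains_needed(latency_clocks: int, issue_interval_clocks: int) -> int:
    """Independent operations that must be in flight to hide the latency."""
    return math.ceil(latency_clocks / issue_interval_clocks)


# (latency, issue interval) in clocks, as quoted earlier in the digest.
SSE2_ADD = (4, 2)
SSE2_MUL = (6, 2)

print("independent ADD chains needed:", chains_needed(*SSE2_ADD))  # 2
print("independent MUL chains needed:", chains_needed(*SSE2_MUL))  # 3
```

Each chain ties up registers for its operands and result, which is why
register count matters as much as the raw latency.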

The Pentium 4 architecture has a deeper pipeline than the old PPro 
architecture used up to Pentium III, which should help reduce 
mispredicted branches. It also has a new feature, the "Execution 
Trace cache", which stores up to 12K instructions in a ready-decoded 
format, meaning they can be re-used without calling on the main 
decoder. These features should help throughput - though whether this 
is sufficient to offset the smaller L1 cache (especially in 
comparison with Athlon) is arguable, and may well be application 
dependent.

Presumably we get these performance enhancements, whether we choose 
to run SSE2 or stick with x86 FPU code.

Bear in mind that there may be a severe penalty for switching between 
SSE2 and x86 FPU formats. To avoid this, 100% conversion to SSE2 is 
indicated. The assembler code is transparent, but there may be some 
floating-point code generated by the C compiler which would need to 
be attended to.

> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?
> 
> I hate to code up 3 versions of the P4 code  (small, medium, large),
> but it might be necessary.

I may not be fully up to speed on Intel's marketing policy, but I 
gathered that Pentium 4 was targeted at high-end systems - which we 
could expect to have "adequate" memory installed. The idea was that 
Pentium 3 was to be retained for the lower end of the consumer 
market. Bearing in mind that the existing "small memory footprint" 
PPro code will execute on a Pentium 4 (albeit with a probably 
significant performance penalty) I personally doubt that it's worth 
worrying overmuch about the memory footprint of the P4 code - 
especially if my suggestion is adopted that the P4 version be made 
available as a separate program, rather than being implemented as a 
parallel instruction stream in Prime95.

The problem with this approach is that the 850 chipset currently 
required to support P4 only supports Rambus (RDRAM). Ignoring the 
performance argument, RDRAM is still very expensive compared with 
SDRAM, therefore it is possible that P4 systems will initially be 
delivered with rather less memory than would be expected in a top-end 
system.

I feel we should also bear in mind that the people who read this list 
are probably the more enthusiastic users - those who run Prime95 as a 
background activity are less likely to put up with a memory hog than 
those of us who run number-crunchers which double as general-purpose 
PCs!


Regards
Brian Beesley

------------------------------

Date: Tue, 28 Nov 2000 10:26:53 +0100 (CET)
From: Martijn <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

I do not have a problem with prime95 / mprime taking any amount of memory, as 
long as thrashing is adequately dealt with. I am currently trying to get 
assignments as small as possible for memory reasons, with (from prime95 point 
of view) enough memory present 95% of the day, the other 5% being at "random" 
times. It does not harm if prime95 takes 100 MB but it would be nice if 
prime95 / mprime sleeps some time if a higher priority job needs the memory.
For instance if prime memory gets swapped out its process could sleep for 5-10 
seconds.
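
The back-off idea can be sketched as a work loop that pauses whenever some
check reports that memory is tight.  This is not how prime95 / mprime is
implemented; the memory_is_tight callback is hypothetical, and how to detect
swapping is platform-specific and deliberately left out.

```python
import time


def run_with_backoff(do_iteration, memory_is_tight, iterations,
                     pause_seconds=5.0, sleep=time.sleep):
    """Run `iterations` units of work, pausing while memory is tight.

    memory_is_tight is a caller-supplied (hypothetical) check; the sleep
    function is injectable so the loop can be exercised without waiting.
    Returns the number of pauses taken.
    """
    pauses = 0
    done = 0
    while done < iterations:
        if memory_is_tight():
            sleep(pause_seconds)   # yield to the higher-priority job
            pauses += 1
            continue
        do_iteration()
        done += 1
    return pauses


# Demo with a fake memory check that reports pressure twice, then clears.
states = iter([True, True, False])
pauses = run_with_backoff(lambda: None, lambda: next(states, False),
                          iterations=3, sleep=lambda seconds: None)
print("pauses taken:", pauses)  # 2
```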

Kind Regards, Martijn

------------------------------

Date: Tue, 28 Nov 2000 15:05:34 +0100
From: "Jean Flinois" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

Hello all, here are my 2 euros ...

Why not treat memory the way you treat the CPU?
I mean: if the machine needs it, swap the data to disk, give the memory
back, and keep monitoring the machine's needs until you can grab the memory
again.  It's not as real-time as what can be done to steal idle CPU cycles,
but it should lead to a reasonable situation.

It would fit my usage of big machines, since I need all the memory sometimes,
the same way I need all the CPU sometimes, and that's why I bought that
amount of memory for it... but it is idle very often, and then you
can have all the memory if you want.

Rgds all,

Jean Flinois <[EMAIL PROTECTED]>
V-Technologies, Savennières
tél +33 (0)2 4172 1077


- ----- Original message (truncated) -----
From: "George Woltman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, 28 November 2000 05:05
Subject: Mersenne: P4 - a correction



> A question for readers.  Prime95 currently uses about 8MB (exponent
> around 11 million).  How would you feel if the P4 optimized version
> used 13MB?   23MB?   33MB?
>
> I hate to code up 3 versions of the P4 code  (small, medium, large),
> but it might be necessary.
>
> Regards,
> George



------------------------------

Date: Tue, 28 Nov 2000 10:25:31 -0600
From: Shane & Amy Sanford <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

>A question for readers.  Prime95 currently uses about 8MB (exponent
>around 11 million).  How would you feel if the P4 optimized version
>used 13MB?   23MB?   33MB?
>
>I hate to code up 3 versions of the P4 code  (small, medium, large),
>but it might be necessary.


Whether 33MB is important depends on when in the P4 timeline you are talking 
about.  IMO there are 3 distinct periods for the P4 in terms of likely 
memory configurations.

1.  next 2 months:
During this period most systems will be sold to enthusiasts but often 
shipping with only 128MB.  33MB would likely be a problem memory-wise.  
However that is probably a moot point, since the SSE2 rewrite & 
validation are not likely to be done by then....

2.  from 2 months to 6-9 months:
Either few systems will be sold & memory configurations remain on the tight
side, OR RDRAM prices will drop significantly so that an average system will
have 256MB, so 33MB wouldn't be a big factor.

3.  6-9 months from now:
In this final time frame the P4 will go through the 0.13 micron die shrink 
& will be equipped with DDR motherboards.  This is the point that the P4 
will likely become a processor for the masses & most systems will come 
equipped with 256MB+.

So 33MB shouldn't be a problem in the foreseeable future, & until that point
the Prime95 code will work, albeit at a penalty.

Shane



------------------------------

Date: Tue, 28 Nov 2000 18:15:52 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

On 28 Nov 00, at 0:15, Jeramy Ross wrote:

> 33MB shouldn't be too unreasonable.  I, like Nathan, have 128MB and 70MB
> of that is set to be available in Prime95 and have not seen any loss of
> performance.

Well, you won't be _using_ 70 MBytes, except whilst running P-1 stage 
2 on a large exponent, or possibly if running ECM. What George is 
talking about is the memory used during "normal" LL testing (or 
double checking); the memory usage during P-1 stage 1 is similar, 
whilst that required during trial factoring is minimal.

FYI I ran P-1 stage 2 on an exponent around 40 million on a 128 MByte 
system running Windows 2000 Professional with Prime95 memory set to 
96 MBytes. I would not recommend this setup; there was some paging 
evident, even with no other active applications. You _might_ get away 
with 96/128 on a lightly loaded system with a less greedy OS.


Regards
Brian Beesley

------------------------------

Date: Tue, 28 Nov 2000 18:15:52 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

On 28 Nov 00, at 10:26, Martijn wrote:

> I do not have a problem with prime95 / mprime taking any amount of memory,
> as long as thrashing is adequately dealt with. I am currently trying to
> get assignments as small as possible for memory reasons, with (from
> prime95 point of view) enough memory present 95% of the day, the other 5%
> being at "random" times. It does not harm if prime95 takes 100 MB but it
> would be nice if prime95 / mprime sleeps some time if a higher priority
> job needs the memory. For instance if prime memory gets swapped out its
> process could sleep for 5-10 seconds.

Nice idea - is this facility in any OS kernel? I'm pretty sure it 
isn't in windoze ... the point is that the application is unaware 
that it is running up page faults; the kernel doesn't bother to 
inform it. There could be a significant performance penalty involved 
in continuously reading the kernel memory usage table in order to 
determine whether memory stress is evident, as well as complications 
due to memory usage being apparently high due to the presence of 
large amounts of disposable I/O buffers, or the existence of pages 
written to the page file but not required due to the processes which 
own them sleeping.
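
The buffer complication above can be illustrated with a minimal Python
sketch. It assumes a Linux /proc/meminfo that reports MemTotal, MemFree,
Buffers and Cached in kB; treating Buffers + Cached as entirely
disposable is a simplification made for this example, and the function
names are hypothetical, not part of Prime95 or mprime.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Name:   12345 kB" style lines; skip anything else.
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_pressure(info):
    """Return (naive, adjusted) used fractions of physical memory.

    naive counts everything that is not free as used; adjusted also
    discounts I/O buffers and page cache (assumed disposable here).
    """
    total = info["MemTotal"]
    free = info["MemFree"]
    disposable = info.get("Buffers", 0) + info.get("Cached", 0)
    naive = (total - free) / total
    adjusted = (total - free - disposable) / total
    return naive, adjusted


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live memory table (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

With made-up figures for a 128 MByte machine (8 MBytes free, 64 MBytes
of buffers and cache), the naive figure is about 94% used while the
adjusted figure is about 44%, which is why a raw "memory used" reading
alone is a poor stress indicator.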

The "usual" task switching mechanism is that, if a job gets CPU 
cycles, it gets access to memory too. The variant of this of which 
I'm aware was VAX VMS: if a process couldn't get access to the CPU 
for an extended period (because other higher-priority processes were 
using all available cycles) and there was pressure on memory, the 
whole process would be swapped out (written to the page/swap file). 
When this happened, the process would not be eligible to get reloaded 
for some time - I seem to remember that 15 secs was the default.

The problem with VMS was that, once swapping started, pressure on the 
I/O channels usually caused a more or less total loss of system 
performance. (It was not accidental that the VMS manual's primary 
recommendation for _any_ system tuning problem was "add more 
memory"!)

One of the nicer things about the windoze implementation of mprime is 
the facility to manually stop & restart the program from the main 
menu. It gives up its main workspace whilst suspended in this way, 
though obviously retaining the relatively small resources belonging 
to its process header block, console window, task bar icon etc. 
Perhaps this could be exported to mprime by provision of a version to 
run as an X client? So long as I don't _have_ to run X in order to 
run mprime!!!


Regards
Brian Beesley

------------------------------

Date: Tue, 28 Nov 2000 10:50:41 -0800
From: "John R Pierce" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

> One of the nicer things about the windoze implementation of mprime is 
> the facility to manually stop & restart the program from the main 
> menu. It gives up its main workspace whilst suspended in this way, 
> though obviously retaining the relatively small resources belonging 
> to its process header block, console window, task bar icon etc. 
> Perhaps this could be exported to mprime by provision of a version to 
> run as an X client? So long as I don't _have_ to run X in order to 
> run mprime!!!

a few USR signals would suffice.  `killall -USR2 mprime` or something...
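
A minimal Python sketch of what such signal handling could look like
inside a long-running worker. This is an illustration only: mprime is
not known to respond to these signals, the choice of SIGUSR1 for pause
and SIGUSR2 for resume is an assumption, and PauseGate and run_work are
hypothetical names.

```python
import signal
import time


class PauseGate:
    """Holds a paused flag that is flipped by two user signals."""

    def __init__(self):
        self.paused = False

    def install(self, pause_sig=signal.SIGUSR1, resume_sig=signal.SIGUSR2):
        """Register the handlers (must be called from the main thread)."""
        signal.signal(pause_sig, self._on_pause)
        signal.signal(resume_sig, self._on_resume)

    def _on_pause(self, signum, frame):
        # A real program would also release its large work buffers here
        # or at its next safe point.
        self.paused = True

    def _on_resume(self, signum, frame):
        self.paused = False


def run_work(gate, work_units, poll_seconds=1.0):
    """Run callables one at a time, idling between units while paused."""
    done = 0
    for unit in work_units:
        while gate.paused:
            time.sleep(poll_seconds)
        unit()
        done += 1
    return done
```

With handlers like these installed, `killall -USR1 <name>` would pause
the worker at the next unit boundary and `killall -USR2 <name>` would
let it continue.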

-jrp



------------------------------

Date: Tue, 28 Nov 2000 20:52:01 +0100
From: Martijn <[EMAIL PROTECTED]>
Subject: Re: Mersenne: P4 - a correction

John R Pierce wrote:

> > One of the nicer things about the windoze implementation of mprime is
> > the facility to manually stop & restart the program from the main
> > menu. It gives up its main workspace whilst suspended in this way,
> > though obviously retaining the relatively small resources belonging
> > to its process header block, console window, task bar icon etc.
> > Perhaps this could be exported to mprime by provision of a version to
> > run as an X client? So long as I don't _have_ to run X in order to
> > run mprime!!!
>
> a few USR signals would suffice.  `killall -USR2 mprime` or something...
>
> -jrp
>

It is even easier: no changes to mprime are needed (on Linux at least)!

killall -STOP mprime
and
killall -CONT mprime

Parse /proc/stat and watch the swap numbers go up; if they rise too
drastically, kill mprime with the -STOP signal. After some seconds, watch
the numbers again, and if they are no longer going up too dramatically,
kill mprime with the -CONT signal. (Write a Tcl/Tk frontend and do the same,
write a server script and do the same, ad infinitum.)
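
The loop described above could be sketched in Python roughly as
follows. The assumptions are mine: swap counters are read either from
the pswpin/pswpout lines of /proc/vmstat (current kernels) or from a
"swap <in> <out>" line in /proc/stat (older kernels), and the threshold
of 100 pages per interval and the 5 second interval are arbitrary
illustrative values. The function names are hypothetical.

```python
import subprocess
import time


def swap_counters(text):
    """Return (pages_in, pages_out) from /proc/vmstat or /proc/stat text."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            fields[parts[0]] = parts[1:]
    if "pswpin" in fields and "pswpout" in fields:      # /proc/vmstat style
        return int(fields["pswpin"][0]), int(fields["pswpout"][0])
    if "swap" in fields and len(fields["swap"]) >= 2:   # older /proc/stat style
        return int(fields["swap"][0]), int(fields["swap"][1])
    raise ValueError("no swap counters found")


def next_action(previous, current, stopped, threshold=100):
    """Pick "STOP", "CONT" or None from two successive counter samples."""
    delta = (current[0] - previous[0]) + (current[1] - previous[1])
    if not stopped and delta > threshold:
        return "STOP"
    if stopped and delta <= threshold:
        return "CONT"
    return None


def watch(path="/proc/vmstat", process="mprime", interval=5.0, threshold=100):
    """Poll forever, stopping and continuing the process via killall."""
    stopped = False
    with open(path) as handle:
        previous = swap_counters(handle.read())
    while True:
        time.sleep(interval)
        with open(path) as handle:
            current = swap_counters(handle.read())
        action = next_action(previous, current, stopped, threshold)
        if action is not None:
            subprocess.run(["killall", "-" + action, process], check=False)
            stopped = action == "STOP"
        previous = current
```

Calling watch() starts the polling loop. One caveat with this simple
rule: when mprime is continued, its own pages being swapped back in may
push the counters over the threshold again, so a real script would
probably want a grace period after each -CONT.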

Can something similar be done with the Windows service?




------------------------------

End of Mersenne Digest V1 #795
******************************
