RE: Mersenne: P4 - a correction

2000-11-28 Thread Willmore, David (VS Central)

George wrote:
  One correction to my previous post.  I said that the latency to
 access the L1 data cache was 2 clocks.  This is correct for integer
 instructions only.  For floating point and SSE2 instructions the latency
 is 6 clocks!  Interestingly, the L2 cache latency is 7 clocks for both
 integer and floating point instructions.
 
Look at the coupling that the FPU has to the cache, for one reason.  I would
expect that the FPU(s) have more ports on the L1 than the integer units do.
Also, if you look at the sensitivity of different types of code to load
latency, integer code is by far more sensitive than floating point.  Think
about the length of the floating point pipeline: it's pretty long to start
with, so you're going to *have* to unroll your code to take advantage of the
pipeline, so you might as well cover the additional load-to-use latency the
same way.  With enough rename registers, it's all good. :)
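As a rough illustration of that argument, Little's law says a pipelined unit stays busy only if about latency x issue rate independent operations are in flight. The Python sketch below applies it to the 2-clock integer and 6-clock FP load latencies from George's post. The one-load-per-clock issue rate and the 7-clock FMUL latency are assumptions for illustration, not measured P4 figures.

```python
import math

def ops_in_flight(latency_cycles: float, issue_per_cycle: float) -> int:
    """Little's law: independent operations needed to hide `latency_cycles`
    while starting `issue_per_cycle` new operations every cycle."""
    return math.ceil(latency_cycles * issue_per_cycle)

# Load-to-use latencies quoted from George's post (P4 L1 data cache).
INT_LOAD_LATENCY = 2
FP_LOAD_LATENCY = 6

# Assumption: one load issued per clock.
LOADS_PER_CLOCK = 1.0

int_loads = ops_in_flight(INT_LOAD_LATENCY, LOADS_PER_CLOCK)
fp_loads = ops_in_flight(FP_LOAD_LATENCY, LOADS_PER_CLOCK)

# Hypothetical FP multiply latency: a loop unrolled enough to keep this many
# multiply chains busy already has enough independent loads to cover 6 clocks.
HYPOTHETICAL_FMUL_LATENCY = 7
fmul_chains = ops_in_flight(HYPOTHETICAL_FMUL_LATENCY, 1.0)

print(f"integer loads in flight: {int_loads}")
print(f"FP loads in flight:      {fp_loads}")
print(f"FMUL chains in flight:   {fmul_chains}")
```

Under these assumptions the FP loop already needs more independent work in flight than the longer load latency demands, which is the point about unrolling.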

Cheers,
David
_
Unsubscribe  list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ  -- http://www.tasam.com/~lrwiman/FAQ-mers



RE: Mersenne: experimental x86-64

2000-10-06 Thread Willmore, David (VS Central)

For those interested, I have a machine capable of running the simulator if
you need something tested or measured.  Contact me by personal email.

 -Original Message-
 From: Henk Stokhorst [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, October 06, 2000 1:22 PM
 To:   [EMAIL PROTECTED]
 Subject:  Mersenne: experimental x86-64
 
 L.S.,
 
 Is anyone already working on a 64-bit version of Prime for AMD's
 Hammer family?
 
 YotN,
 
 Henk Stokhorst
 
 http://www.x86-64.org/
 



Mersenne: Athlons on DDR MBs

2000-08-11 Thread Willmore, David (VS Central)

Hello, all.

I'm curious if anyone has any preliminary performance figures for Athlon
processors on the new crop of DDR MBs.  If no one has any data, I guess I'll
have to go out and buy one. :)

Cheers,
David



RE: Mersenne: Re: Mersenne Digest V1 #757

2000-07-13 Thread Willmore, David (VS Central)

All benchmarks tell you is "It's good for *this* application".  I've heard
that it speeds up Mersenne calculations on some MBs.  Now, that's nice, but
does it speed them up enough to justify its cost?  That's the question.

I'm just going to hold my breath for DDR-SDRAM.  Cheaper, faster, better
supported.  Dual Athlon/interleaved DDR memory.  Hold on to your hats!

 -Original Message-
 From: [EMAIL PROTECTED] [SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, July 12, 2000 11:57 PM
 To:   [EMAIL PROTECTED]
 Subject:  Mersenne: Re: Mersenne Digest V1 #757
 
  Direct RDRAM 
 
 <HOLY WAR>Everyone knows (rather, should know) that RDRAM memory provides
 a minor speed boost compared to SDRAM, and the much higher cost of RDRAM is
 completely unjustified.  I've heard that Dell is switching back to SDRAM in
 its computers now, which is a Good Thing(TM).</HOLY WAR>
 
 :-D
 
 Stephan Lavavej



RE: Mersenne: What's happening?

2000-06-09 Thread Willmore, David (VS Central)

I only DC with my Alphas--heck, I only have *two* machines (out of 8) doing
first time LL. :)  Don't get me wrong, I'd like to find a new Mersenne
prime, but I feel the DCs are underrated in importance.  And since this
project is 'vote with your CPU', I am. :)

Thanks for the suggestion.  Ernst said the same thing.

 -Original Message-
 From: Nathan Russell [SMTP:[EMAIL PROTECTED]]
 Sent: Thursday, June 08, 2000 4:29 PM
 To:   [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Subject:  RE: Mersenne: What's happening?
 
 You can email George about testing one of the exponents on his manual 
 testing page (www.mersenne.org/range2.htm).  You should be aware that those
 exponents are rather larger than those currently being given by PrimeNet.
 
 This means a longer runtime (by about 8-10 percent) and reduced chance of 
 finding a prime.



RE: Mersenne: What's happening?

2000-06-09 Thread Willmore, David (VS Central)

If we only had some great P-1 code for the Alpha... Ernst? :)

 -Original Message-
 From: Stefan Struiker [SMTP:[EMAIL PROTECTED]]
 Sent: Friday, June 09, 2000 1:43 PM
 To:   Willmore, David (VS Central)
 Cc:   'Nathan Russell'; [EMAIL PROTECTED]
 Subject:  Re: Mersenne: What's happening?
 Verily DCing is the place to be with a mean machine, with fast turnaround
 to spice up doldrum summer daze, and -- yes! -- a P-1 result 67 bits deep
 just when all hope seems gone.  Love it!
 
 Best Regards,
 Stefanovic



RE: Mersenne: What's happening?

2000-06-08 Thread Willmore, David (VS Central)

Yeah, the manual pages are apparently offline.  Will they be back online
soon?  I have a machine about to 'go dry' (500 MHz Alpha EV6).

Help.

 -Original Message-
 From: Barry Stokes [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, June 06, 2000 4:03 PM
 To:   [EMAIL PROTECTED]
 Subject:  Mersenne: What's happening?
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 
 http://mersenne.org/primenet/ftops.txt contains this:
 
 
 Mersenne PrimeNet Server 4.0 (Build 4.0.031)
 Top Factoring Producers Report 06 Jun 2000 20:04 (Jun  6 2000  1:04PM Pacific)
 
 This report is updated every 60 minutes
 
 Rank   Account ID   LL P90*   Exponents   Fact.P90   Exponents   P90 CPU
                     CPU yrs   LL Tested   CPU yrs*   w/ Factor   hrs/day
 ----   ----------   -------   ---------   --------   ---------   -------
 DB-Library: DBPROCESS is dead or not enabled.
 DB-Library: DBPROCESS is dead or not enabled.
 DB-Library: DBPROCESS is dead or not enabled.
 
 
 I assume that this is a problem? I also cannot access my personal
 stats. Anyone else experiencing this?
 
 -BEGIN PGP SIGNATURE-
 Version: PGP 6.0.2i
 
 iQA/AwUBOT1m+TmWaVGyUt3iEQJyPQCeJV5nDD7jOCpHz6LazGL1qyON1iAAnRqF
 TuH56235MdxtMVKAJG42Is0i
 =GSJY
 -END PGP SIGNATURE-
 
 



RE: Mersenne: The recent popularity of Factoring

2000-03-27 Thread Willmore, David

Brian J. Beesley wrote:
 A more general and more secure method of preventing the type of problem
 exposed by this incident would be to have the server enforce a quota 
 for the maximum number of assignments issued to any user/computer id 
 combo in a particular time interval e.g. 20 per day. Yes, this could 
 still be got round by anyone determined to cause mischief by changing 
 the computer id and grabbing another bunch of assignments, but it 
 would be effective against accidents.
 
I would hope there would be a way to bypass such a limit as I have a dual
PII/333 system where one processor does first time LL and the other does
factoring--I was experiencing severe slowdown with two LL running.  I don't
connect to the net all that often with that machine, so I have a Work to Get
value of 90 days.  For some reason, I'm not allowed to get more than about a
month of factoring work.  It's starting to cut into my efficiency.  Is there
a way around this?
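A minimal sketch of the kind of quota Brian describes, a sliding-window count of assignments per (user id, computer id) pair, is below. It is hypothetical, not PrimeNet's actual server logic: the class name, the 20-per-day default, and the exempt set (one possible bypass for a machine that legitimately asks for 90 days of work) are all illustrative assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (user id, computer id)

class AssignmentQuota:
    """Allow at most `limit` assignments per (user, computer) in any window."""

    def __init__(self, limit: int = 20, window_s: float = 86_400.0,
                 exempt: Optional[Set[Key]] = None) -> None:
        self.limit = limit
        self.window_s = window_s
        self.exempt: Set[Key] = set(exempt or ())
        self._issued: Dict[Key, Deque[float]] = defaultdict(deque)

    def try_issue(self, user: str, computer: str, now: float) -> bool:
        """Record and allow an assignment at time `now`, or refuse it."""
        key = (user, computer)
        if key in self.exempt:          # hypothetical manual override
            return True
        issued = self._issued[key]
        # Drop timestamps that have slid out of the window.
        while issued and now - issued[0] >= self.window_s:
            issued.popleft()
        if len(issued) >= self.limit:
            return False
        issued.append(now)
        return True
```

On one instance with the defaults, repeated try_issue calls for the same pair at the same time succeed 20 times and are refused after that until a day has passed; a different computer id gets its own count, which is why this only guards against accidents.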

Cheers,
David



RE: Mersenne: The recent popularity of Factoring

2000-03-27 Thread Willmore, David

  From: Willmore, David [mailto:[EMAIL PROTECTED]]
  value of 90 days.  For some reason, I'm not allowed to get more than about
  a month of factoring work.  It's starting to cut into my efficiency.  Is
  there a way around this?
 
Paul Leyland wrote:
 Try joining the people at the bleeding edge.  I've been factoring an
 exponent just below 60M.  It's taking several weeks on a PII-366.  Ken
 Kriesel is co-ordinating these folk.
 
Wonderful Idea(tm)!  I hadn't thought of that.  Funny, considering I did
some factoring as part of the v19 QA effort.  I should have thought of it.

Ken!  Sign me up! :)  I'd be glad to leave these smaller numbers to the
486s.

Cheers,
David



RE: Mersenne: (no subject)

2000-03-24 Thread Willmore, David

 There have previously been several posts discussing the performance penalty
 one suffers when running multiple LL tests on a multiprocessor system
 with a single shared system bus. It would be interesting to see whether this
 penalty could be alleviated in a reasonably cost-effective fashion through
 use of larger L2 caches.
 
This is a fundamental problem in processor/system design.  The quick answer
is no.

 Here's the idea: if the L2 caches are much smaller than the dataset size,
 the system bus will be heavily used by each process, leading to memory
 delays. If each processor has an L2 cache large enough to hold the full
 dataset (between 4 and 5MB for an LL test of an exponent ~10M), there
 will be essentially no memory traffic and no shared-bus penalty. Such
 large caches are expensive. But it seems that between these two extremes
 (very small and very large caches) there should be some kind of optimal
 compromise, where the cache size is just large enough that the resulting
 reduction in memory traffic allows each process to access the main memory
 at basically the same speed it would if there were no other jobs competing
 for bus bandwidth, i.e. where the bus traffic just begins to saturate.
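The 4 to 5MB figure can be reproduced with a back-of-the-envelope estimate: the LL residue is held as one double-precision word per FFT element, each word carrying some number of bits of the number. The sketch assumes about 19.5 bits per word and power-of-two FFT lengths only (real clients also use other lengths), and it counts only the residue, not twiddle tables or other buffers.

```python
import math

BYTES_PER_DOUBLE = 8

def fft_length(exponent: int, bits_per_word: float) -> int:
    """Smallest power-of-two FFT length whose words can hold `exponent` bits."""
    words = math.ceil(exponent / bits_per_word)
    return 1 << (words - 1).bit_length()

def residue_bytes(exponent: int, bits_per_word: float = 19.5) -> int:
    """Memory for the LL residue alone (one double per FFT word)."""
    return fft_length(exponent, bits_per_word) * BYTES_PER_DOUBLE

n = fft_length(10_000_000, 19.5)
print(n, residue_bytes(10_000_000) / 2**20, "MiB")   # 524288 4.0 MiB
```

For an exponent near 10M this gives a 512K-element FFT and 4 MiB for the residue; auxiliary tables would plausibly make up the rest of the quoted range.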
 
The loading on the memory bus by the other processor may only use half of
the available bandwidth (for a 2 CPU system) but the latency due to bus
contention will decrease performance, still.  The curve of performance loss
vs foreign bus utilization is not very sharp.  With modern processors
supporting many outstanding loads from main memory, it's not getting any
sharper.
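A toy M/M/1 queueing model makes the same point: mean latency is the unloaded service time divided by (1 - rho), where rho is bus utilization, so it rises smoothly with load instead of staying flat until saturation. A real memory bus is not an M/M/1 queue, and the 60 ns service time is made up; the sketch only shows the shape of the curve.

```python
def mm1_latency(service_ns: float, utilization: float) -> float:
    """Mean time in an M/M/1 queue (waiting plus service) at utilization rho."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ns / (1.0 - utilization)

SERVICE_NS = 60.0  # hypothetical unloaded memory access time

for rho in (0.0, 0.25, 0.5, 0.75):
    print(f"bus {rho:.0%} busy -> {mm1_latency(SERVICE_NS, rho):.0f} ns")
# 60, 80, 120 and 240 ns respectively
```

Even at half utilization the toy model doubles the average wait, which matches the observation that a second CPU using only half the bus still costs performance.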

 I suspect for LL tests in the ~10M range, this happy medium may be as
 'small' as 1-2MB. Are PC systems with L2 caches in this size range
 available? If so, how much of a premium does one pay for the extra cache?
 
The cheap answer these days is a faster shared bus.  Next to that is a
switched memory fabric (throw out the shared bus idea).  Only after that
comes increasing the caches on package or on chip.  External caches are
getting less and less popular, for reasons similar to what I've mentioned
above.  The fundamental problem of caches (and layers of them) is that
smaller and 'closer' (in terms of latency) competes with larger and
'farther'.  Off chip and off package is just *so* far away these days that
it doesn't buy you much unless you put a ton of fast memory there.  At that
point, you have to worry that it's just using bandwidth that could be
carrying data from main memory.  So, you put it on a dedicated bus, but now
you have to ask 'would it have just been better to make the pipe to main
memory wider with these pins?'

*stops for breath*
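The 'smaller and closer versus larger and farther' tradeoff is usually quantified with average memory access time, AMAT = hit time + miss rate x miss penalty, applied level by level. In the sketch below every latency and miss rate is invented for illustration; with these particular numbers the larger but slower L2 loses, though other numbers could easily flip the result.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CacheLevel:
    hit_ns: float     # access time when this level hits
    miss_rate: float  # fraction of accesses reaching this level that miss

def amat(levels: List[CacheLevel], memory_ns: float) -> float:
    """Average memory access time, working outward from main memory."""
    total = memory_ns
    for level in reversed(levels):
        total = level.hit_ns + level.miss_rate * total
    return total

# All numbers below are made up for illustration.
L1 = CacheLevel(hit_ns=2.0, miss_rate=0.10)
near_small_l2 = CacheLevel(hit_ns=10.0, miss_rate=0.50)
far_big_l2 = CacheLevel(hit_ns=40.0, miss_rate=0.40)
DRAM_NS = 100.0

print(f"small, close L2: {amat([L1, near_small_l2], DRAM_NS):.1f} ns")  # 8.0 ns
print(f"big, far L2:     {amat([L1, far_big_l2], DRAM_NS):.1f} ns")     # 10.0 ns
```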

Okay, the quick answer is that, for PC hardware, the Intel Xeon systems are
the only ones with more than 512K of L2 cache.  Those guys only run in
expensive motherboards.  So, the simple solution is to not run on SMP
systems where you run into this problem.  The cheapest way is to use
uniprocessor systems with a reasonable memory architecture.

But, for those of us with SMP systems--I have one because it was cheap and
I've always wanted one--we just have to keep in mind what kind of demands
our applications put on the shared memory bus.  My dual PII/333 system with
a 66MHz 64-bit shared bus does *not* like to run two copies of LL testing at
the same time.  One LL and one factoring is just fine, though, as factoring
stays on-chip.

That's just something that I need to live with.  As numbers get bigger, that
dependency on main memory BW will get worse and worse as the cache becomes
less and less effective.  At some point, I'll probably retire the machine to
some less memory BW demanding task--like building kernels or (may I die
before this happens) RC5 cracking.

Cheers,
David



RE: Mersenne: L2 Cache size

2000-03-24 Thread Willmore, David

 And don't the K6-III and Athlon support an L3 design, using slower memory of
 course, but dedicated to each CPU so eliminating bus contention?  Of course,
 the K6-III doesn't do SMP, but the Athlon supports it, doesn't it?  Are
 there any SMP motherboards out there yet for the Athlon?
 
The K6's and the Cyrix 6x86's could do SMP just like the Pentium.  The only
problem was that they used the OpenPIC standard and no one ever built a
chipset that implemented it, so no SMP systems.

The Athlon not only supports SMP, but it does it the, IMHO, Right Way(tm).
It uses a point-to-point bus between each processor and the core logic.
Hence, an SMP Athlon system has no shared bus.  Each processor gets a pipe to
the core logic, and the core logic has however many pipes to memory/IO that
it wants.  This is what you have to do to scale SMP very high, anyway.  I
think the Intel chips force this for more than 4 CPUs, too.  Maybe it's more
than 2--I can't remember right now.

Cheers,
David



RE: Mersenne: Re: Quiet list?

1999-11-24 Thread Willmore, David

No, no, no, you guys have it all wrong, you take a pencil and a big sheet of
paper and write:
(brute force spoiler)


    power 1:       3    -1    -2

                   9    -3    -6
                        -3     1     2
                              -6     2     4

    power 2:       9    -6   -11     4     4

                  81   -54   -99    36    36
                       -54    36    66   -24   -24
                             -99    66   121   -44   -44
                                    36   -24   -44    16    16
                                          36   -24   -44    16    16

    power 4:      81  -108  -162   204   145  -136   -72    32    16

                 729  -972 -1458  1836  1305 -1224  -648   288   144
                      -486   648   972 -1224  -870   816   432  -192   -96
                            -891  1188  1782 -2244 -1595  1496   792  -352  -176
                                   324  -432  -648   816   580  -544  -288   128    64
                                         324  -432  -648   816   580  -544  -288   128    64

    power 6:     729 -1458 -1701  4320  1755 -5418 -1259  3612   780 -1280  -336   192    64

    even exp sum: 32

You may need to widen your windows for this one. :)
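For anyone who would rather let a computer hold the pencil, the same brute-force expansion takes a few lines of Python. The last lines also check the shortcut the puzzle presumably has in mind: the even-power coefficients of f sum to (f(1) + f(-1))/2, which here is (0 + 64)/2.

```python
from typing import List

def poly_mul(a: List[int], b: List[int]) -> List[int]:
    """Multiply coefficient lists (highest power first), as in the table above."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

p1 = [3, -1, -2]          # 3x^2 - x - 2
p2 = poly_mul(p1, p1)     # [9, -6, -11, 4, 4]
p4 = poly_mul(p2, p2)
p6 = poly_mul(p4, p2)     # 13 coefficients, x^12 down to x^0

# The top degree (12) is even, so even powers sit at even list indices.
even_sum = sum(p6[0::2])

def f(x: int) -> int:
    return (3 * x * x - x - 2) ** 6

print(p6)
print(even_sum, (f(1) + f(-1)) // 2)   # 32 32
```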

Cheers,
David



RE: Mersenne: Re: Quiet list?

1999-11-22 Thread Willmore, David

Is it 584, and if so, what's so intuitive? :)

 -Original Message-
 From: Steinar H. Gunderson [SMTP:[EMAIL PROTECTED]]
 Sent: Saturday, November 20, 1999 6:32 AM
 To:   [EMAIL PROTECTED]
 Subject:  Mersenne: Re: Quiet list?
 
 On Fri, Nov 19, 1999 at 08:27:12PM -0800, Spike Jones wrote:
 The total number of posts
 I receive each day is far more predictable than the number
 of posts on GIMPS.  Predictably.
 
 Funnily enough, when I receive nothing from GIMPS, I receive nothing
 from other people either! (No, this is not a mail burp. I still receive
 _some_ mail.)
 
 To get you something to do, here's an interesting problem. The beauty
 of it is that you could do it by brute force, or you could simply
 look at it, or see a solution and solve it in under a minute. Pen and
 paper allowed only -- no calculators, computers etc.
 
 First, perhaps I should explain some notation :-) a_11 is the letter `a',
 followed by 11 in subscript. x^2 is the letter `x', followed by the digit
 2 in superscript (i.e. `x^2' would be mathematically the same as `x*x').
 OK, here goes:
 
 If (3x^2 - x - 2)^6 = (a_12)x^12 + (a_11)x^11 + ... + (a_1)x + a_0, what
 is a_0 + a_2 + a_4 + ... + a_12?
 
 The answer is an integer from 0 to 999, inclusive.
 
 (This is just one of the problems from something called the `Abel
 contest', a voluntary maths contest open to all pupils in Norway (under 21,
 but in general nobody under 18 enters). They have a tradition of
 making problems requiring very little actual mathematical knowledge
 (generally if you know (a+b)^2 = a^2 + 2ab + b^2 and Pythagoras, you
 have 90% of what you need), but more requiring the right way of thinking.
 Also, only pen and paper are used, to prevent some ways of solving the
 problems... This question was number 9 (of 10), from the second round,
 where the contestants (the 10% best from round one) are given 100
 minutes to try to solve the 10 questions.)
 
 Hope most of you see the quick solution :-) Please don't post the answer
 to the list quite yet, give people their time...
 
 /* Steinar */
 -- 
 Homepage: http://members.xoom.com/sneeze/



RE: Mersenne: What is Windows doing?

1999-10-04 Thread Willmore, David

One more idea--virus scanners.



RE: Mersenne: Re: Timing(?) errors

1999-09-21 Thread Willmore, David

 On Mon, Sep 20, 1999 at 05:45:17PM -0500, Willmore, David wrote:
 Since it's a cache reading problem there's no real way to 'flush' it.
 Normally, that means to write back dirty data to whatever backing store
 exists, not 'invalidate everything'.  Even if you did, it wouldn't solve
 the problem.
 
 How does swap space come into this? Linux isn't forced to swap the data
 in exactly where it used to be, is it?
 
Correct, it does not.  Normally, though, when you're swapping, proper L2
cache coloring is the least of your performance problems.

 Yes, it would probably be easier in Linux, but it might not do you any
 good.
 
 Perhaps I could do a manual restart if it was a problem? (I can think of
 several crazy ways to do this... Perhaps fill all the buffers with
 some random number, and find them in /proc/kcore? ;-) )
 
A syscall that walked the page tables to find the translation for an address
would probably be the easiest.
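On current Linux kernels no new syscall is needed: /proc/<pid>/pagemap exposes one 64-bit entry per virtual page, with bit 63 meaning 'present' and bits 0-54 holding the physical frame number. The sketch below reads it and derives a page color, using number of colors = cache size / (ways x page size), which assumes a physically indexed cache. The cache parameters in the usage note are hypothetical, and unprivileged processes see a frame number of 0 on recent kernels.

```python
import mmap
import struct
from typing import Tuple

PAGE_SIZE = mmap.PAGESIZE
PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry

def decode_pagemap_entry(entry: int) -> Tuple[bool, int]:
    """Split a 64-bit /proc/<pid>/pagemap entry into (present, pfn)."""
    present = bool((entry >> 63) & 1)  # bit 63: page is resident in RAM
    return present, entry & PFN_MASK

def pfn_of(vaddr: int) -> int:
    """Physical page frame backing virtual address `vaddr` in this process.

    Linux only.  Unprivileged readers get a PFN of 0 on modern kernels."""
    with open("/proc/self/pagemap", "rb") as fh:
        fh.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        (entry,) = struct.unpack("=Q", fh.read(8))
    present, pfn = decode_pagemap_entry(entry)
    if not present:
        raise LookupError(f"virtual address {vaddr:#x} is not resident")
    return pfn

def page_color(pfn: int, cache_bytes: int, ways: int, page_bytes: int) -> int:
    """Index of the group of cache sets that a physical page maps onto."""
    colors = cache_bytes // (ways * page_bytes)
    return pfn % colors
```

For instance, with a hypothetical 512KB, 4-way L2 and 4KB pages, page_color(pfn, 512 * 1024, 4, 4096) yields one of 32 colors.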

Cheers,
David



RE: Mersenne: Re: Re: Timing(?) errors

1999-09-21 Thread Willmore, David

Random chance.  I wouldn't count on it.

 -Original Message-
 From: Steinar H. Gunderson [SMTP:[EMAIL PROTECTED]]
 Sent: Tuesday, September 21, 1999 4:12 PM
 To:   [EMAIL PROTECTED]
 Subject:  Mersenne: Re: Re: Timing(?) errors
 
 On Tue, Sep 21, 1999 at 02:03:54PM -0500, Willmore, David wrote:
 Correct, it does not.  Normally, though, when you're swapping, proper L2
 cache coloring is the least of your performance problems.
 
 Yes, but if you _force_ swap-out-swap-in, like ReCache does?
 
 /* Steinar */
 -- 
 Homepage: http://members.xoom.com/sneeze/



RE: Mersenne: K7 vs. x86

1999-08-23 Thread Willmore, David

 From: Brian J. Beesley [SMTP:[EMAIL PROTECTED]]
 This is _still_ remarkable, since the "consumer" Athlons starting to 
 trickle onto the market have 64-bit 100 MHz FSB and 512KB L2 cache, 
 like PII / PIII / Xeon, but run their L2 cache at only 1/3 clock
 speed (cf. full clock speed for Xeon and 1/2 clock speed for PII/PIII)
 
Well, they do run the point-to-point bus clock at 100MHz, but they
send/receive data on each clock transition, so it's 200M-transfers/sec.
And, I thought the production Athlons were using 1/2 speed cache--it was
only the pre-production (evaluation) processors which were using the 1/3
speed.  Of course, I could easily be wrong.

 The high performance Athlons with 128 bit FSB @ 200 MHz and larger,
 relatively faster L2 cache should be really impressive. Maybe 
 starting to approach what Alpha has been doing for a while ;-) (The 
 critical difference here is that Athlon does run native IA32 code!)
 
Yes, 400 M-transfers/second at 16 bytes each is a nice 6.4GB/s. :)  Run the
8M L2 at 1:1 with the processor at 800MHz and get twice that. :)  I can't
find a line in the document stating the width of the L2 bus, but I would be
surprised if it's narrower than 128 bits.  256 for a server version would be
nice: 12.8 GB/s to 25.6 GB/s, geez.
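The bandwidth figures in this exchange are just transfers per second times bus width in bytes. The sketch below recomputes them. Reading Brian's '128 bit FSB @ 200 MHz' as a double-pumped 400 MT/s bus, and the 256-bit server L2, are assumptions taken from the discussion, not published specifications.

```python
def bandwidth_gb_s(mega_transfers_per_s: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s): transfers/s times bytes/transfer."""
    return mega_transfers_per_s * 1e6 * (bus_bits / 8) / 1e9

# (MT/s, bus width in bits); the labels restate the assumptions in the thread.
cases = {
    "100 MHz EV6 bus, both edges, 64-bit": (200, 64),
    "200 MHz bus, both edges, 128-bit": (400, 128),
    "L2 at 800 MHz 1:1, 128-bit": (800, 128),
    "L2 at 800 MHz 1:1, 256-bit (hypothetical)": (800, 256),
}
for name, (mts, bits) in cases.items():
    print(f"{name}: {bandwidth_gb_s(mts, bits):.1f} GB/s")
# 1.6, 6.4, 12.8 and 25.6 GB/s
```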

  If you were pipelining fmuls, the Athlon could spit them out in 1 clock
  cycle (after the 4 cycle latency), compared to 2 cycles (after the 5 cycle
  latency) on the PIII, so it would be REAL important to get lots of yummy
  pipelined fmuls to the Athlon to really let it strut its stuff.
 
 Am I missing something here? I thought that the _throughput_ was 1 
 FMUL per clock, but there was a 4 clock period between the 
 instruction entering the execution unit (meaning that it has already 
 had to be prefetched, decoded and the operands made available) and 
 the result of the operation becoming available. So, provided there is 
 no delay in fetching instructions, there is capacity in the decoders 
 and there is no stall due to operands being unavailable, you _should_ 
 get a throughput of 1 FMUL per clock (assuming that you aren't also 
 scheduling other instructions which block the multiplier execution 
 unit, or use its pipeline).
 
I believe what he's saying is that there is a bit of 'granularity' to the
pipe in the PII.  It sounds like an FMUL can only enter the pipe every other
cycle--to emerge five cycles later.  If so, that's what used to be called
'superpipelined'.  Hmmm, no, that would be backwards.  Maybe it's
'subpipelined'.  Think of it as a 2.5 cycle long pipe running at half core
speed.  This can result from a not fully pipelined stage in the middle of
the pipe.  Say single precision goes:

stageA, stageB, stageC,

but double precision goes:

stageA, stageB, stageB, stageC, stageC

That way, stage B is used for two successive cycles by the same operand.
This will force stage A to stall waiting for the following stage to clear.
This isn't all that unlikely when the stages are normally sized for single
precision.
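The stage-occupancy picture above can be checked with a tiny in-order pipeline model: each stage holds one operation at a time for a fixed number of cycles, and an operation that has finished a stage waits in it until the next stage is free. The model and the stage breakdowns are illustrative assumptions, not a description of the real P6 or K7 FPU.

```python
from typing import List, Optional

def completion_times(occupancy: List[int], n_ops: int) -> List[int]:
    """Cycle at which each of n_ops back-to-back operations leaves an in-order,
    unbuffered pipeline.  occupancy[j] = cycles an op spends in stage j."""
    last = len(occupancy) - 1

    def leaves(enter: List[int], j: int) -> int:
        # An op leaves stage j when it enters stage j + 1; from the final
        # stage it leaves once its occupancy there has elapsed.
        return enter[j + 1] if j < last else enter[last] + occupancy[last]

    prev: Optional[List[int]] = None  # stage-entry cycles of the op ahead
    done: List[int] = []
    for _ in range(n_ops):
        enter = [0] * len(occupancy)
        # Stage 0 accepts a new op once the previous op has moved on.
        enter[0] = 0 if prev is None else leaves(prev, 0)
        for j in range(1, len(occupancy)):
            finished = enter[j - 1] + occupancy[j - 1]        # upstream work done
            vacated = 0 if prev is None else leaves(prev, j)  # stage j is free
            enter[j] = max(finished, vacated)                 # else stall in j-1
        done.append(leaves(enter, last))
        prev = enter
    return done

print(completion_times([1, 1, 1], 4))   # single precision: [3, 4, 5, 6]
print(completion_times([1, 2, 2], 4))   # double precision: [5, 7, 9, 11]
```

With [1, 1, 1] results arrive every cycle after a 3-cycle latency; with the double-precision [1, 2, 2] the latency is 5 cycles and results arrive only every other cycle, because each following operation waits in stage A. A fully pipelined four-stage unit, [1, 1, 1, 1], gives the Athlon-like behavior quoted above: 4-cycle latency, one result per clock.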

Cheers,
David





RE: Merced (was Re: Mersenne: Re: Alpha DS20 timings.)

1999-08-20 Thread Willmore, David

 IA64 is really VLIW (very long instruction word), which is quite different
 than traditional sequential RISC.  It requires the compiler to do a LOT of
 massively parallel pipeline scheduling to achieve optimal results.  HP has a
 leg up on this compiler technology as IA64 is based on their existing
 PA-RISC, and is sharing their compiler backend optimization technology with
 Intel and Microsoft.
 
Intel has some experience with some degree of parallelism dating back to the
i860--which had visible pipelines. 

 Having once programmed a VLIW machine in 'assembler', I would not wish that
 task on ANYONE.  The machine I worked on had 8 parallel asymmetrical
 execution units, and a 288 bit wide opcode which launched 8 parallel
 different instructions in every cycle.  The assembler (micro?) coder had to
 keep track of which parts of what execution unit would take how long to do
 each instruction, and not rely on results before they were ready.  To keep
 the machine actually humming along at even close to half its theoretical
 performance levels bordered on nightmarish.
 
Yes, it's not fun at all.  I've programmed on the new TI VLIW DSPs and
they're certainly a trip.  The thing that makes this more practical is that
IA64 doesn't have exposed pipelines--*you* don't have to code in pipeline
delays to ensure correct behavior, just decent performance.

You are quite right, though: getting anywhere near even half of theoretical
performance on general purpose VLIW machines for general purpose code is a
monstrous task.  DSP is a bit easier, as that class of algorithms has more
self-similarity.  *But* this is where we're on the good side of this debate.
An FFT is one of those algorithms--it has been extensively researched WRT
implementations like this.  Intel, I believe, will be providing some
'machine speed' FFT code--if the recent press release is to be believed.
Maybe LL testing code for IA64 will be easier? :)
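To show the regularity being described, here is a textbook iterative radix-2 Cooley-Tukey FFT in Python. It is a plain complex FFT for illustration only; LL-testing programs use much more elaborate irrational-base weighted transforms on real data, and nothing here reflects Intel's announced code. The point is the structure: log2(n) passes, and within a pass every butterfly is independent of the others, which is what a compiler or hand scheduler for a wide machine wants.

```python
import cmath
from typing import List, Sequence

def fft(values: Sequence[complex]) -> List[complex]:
    """Iterative radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) passes; inside a pass every butterfly is independent.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```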

Cheers,
David



RE: Merced (was Re: Mersenne: Re: Alpha DS20 timings.)

1999-08-19 Thread Willmore, David

 From: Simon Burge [SMTP:[EMAIL PROTECTED]]
 From what I understand of Merced, compiler technology is going to be the
 problem.  It's probably not unreasonable to expect large performance
 increases as the intelligence of compilers (especially the "free"
 compilers like gcc and egcs) catches up to the theoretical performance
 of the CPU.
 
Assembly! :)



RE: Mersenne: Alpha DS20 timings.

1999-08-18 Thread Willmore, David

We ran the DC for M38 on a DS20/500.  Using Ernst's code we were getting 0.18
s/iteration, but I don't remember the FFT size.  I want to say 384K?  Ernst?
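For scale: an LL test of 2^p - 1 runs p - 2 squarings modulo 2^p - 1, so total time is roughly (p - 2) x seconds per iteration. The sketch below includes a minimal big-integer LL test (fine for tiny exponents, hopeless for real ones) and the runtime estimate, using M38's exponent 6,972,593 and the 0.18 s/iteration figure above; it ignores any overhead beyond the iterations themselves.

```python
SECONDS_PER_DAY = 86_400

def lucas_lehmer(p: int) -> bool:
    """True iff 2**p - 1 is prime, for an odd prime p: p - 2 squarings."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def ll_runtime_days(p: int, seconds_per_iteration: float) -> float:
    """Wall-clock days for the p - 2 iterations, ignoring all other overhead."""
    return (p - 2) * seconds_per_iteration / SECONDS_PER_DAY

print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31]

M38_EXPONENT = 6_972_593          # M38 = 2**6972593 - 1
print(f"{ll_runtime_days(M38_EXPONENT, 0.18):.1f} days")   # about 14.5 days
```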

Cheers,
David



RE: Mersenne: The sound of number searching

1999-07-22 Thread Willmore, David

All,

There are several things in a computer whose operational parameters vary
with CPU activity.  The most likely ones are: the audio subsystem picking up
noise coupled either directly (magnetically) into its signal lines or via
its power feed; or the load on the power regulation unit itself.  Yes, coils
can make a buzzing sound.  The name of the phenomenon is 'magnetostriction'
(who wants to bet I spelled that one wrong?), which is the property of a
material to change its physical dimensions under varying magnetic
fields--it's the reason large power transformers 'hum' and how some
high-power sonar transducers generate their signals.

If I remember correctly, this is an IFAQ, but it could go in the FAQ so that
the-word-which-will-go-unmentioned can get spelled correctly once and for
all.

Cheers,
David



RE: Mersenne: Re: Self-test (was: Prime 95 Error Messages/ Misc)

1999-06-09 Thread Willmore, David

For a few years.  Most PII class MBs have them and some late Socket7 ones
do, too.  They're not standard as far as how they interface both at the
hardware and software levels.  You need MB specific drivers.  Linux has a
project called lm_sensors (named after one of the early chips used to
perform this function, the National LM78).
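On a present-day Linux kernel the answer to Pierre's /proc question is the hwmon sysfs interface: each sensor driver exposes files such as temp1_input, in millidegrees Celsius, under /sys/class/hwmon/hwmon*/. The sketch below reads them. It assumes that modern layout (it will not work with the 1999-era lm_sensors /proc interface), and which sensors appear depends entirely on the board and the loaded drivers.

```python
import glob
import os
from typing import Dict

def parse_millidegrees(text: str) -> float:
    """hwmon temp*_input files hold an integer in millidegrees Celsius."""
    return int(text.strip()) / 1000.0

def read_temperatures(root: str = "/sys/class/hwmon") -> Dict[str, float]:
    """Map 'chip/tempN' to degrees C for every hwmon temperature input."""
    readings: Dict[str, float] = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        hwmon_dir = os.path.dirname(path)
        try:
            with open(os.path.join(hwmon_dir, "name")) as fh:
                chip = fh.read().strip()
        except OSError:
            chip = os.path.basename(hwmon_dir)
        sensor = os.path.basename(path)[: -len("_input")]
        try:
            with open(path) as fh:
                readings[f"{chip}/{sensor}"] = parse_millidegrees(fh.read())
        except (OSError, ValueError):
            continue  # sensor missing or unreadable
    return readings

if __name__ == "__main__":
    for label, celsius in read_temperatures().items():
        print(f"{label}: {celsius:.1f} C")
```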

Cheers,
David

 --
 From: David L. Nicol[SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, June 09, 1999 11:51 AM
 To:   Pierre Abbat; Yngvwe Mersenne
 Subject:  Re: Mersenne: Re: Self-test (was: Prime 95 Error Messages/
 Misc)
 
 Pierre Abbat wrote:
  
   Most modern motherboards contain case and/or CPU temperature
   sensors which can be read by software.
  
  Is there a file in /proc that will tell me this?
  
  phma
 
 This is the first I've heard of such sensors being a standard
 item.  How long have they been a standard item?
 
 
 
   David Nicol 816.235.1187 UMKC Network Operations [EMAIL PROTECTED]
 "The radix is always 10." -- Paul Leyland
 



RE: Mersenne: EFF and 10,000,000 digits

1999-06-09 Thread Willmore, David

1) see my other email.  Yes, they did.

2) Yes, it was Compaq.  Intel bought foundry technology that is used on the
StrongARM as well as the StrongARM architecture itself.

Cheers,
David

 --
 From: Joth Tupper[SMTP:[EMAIL PROTECTED]]
 Sent: Wednesday, June 09, 1999 1:13 PM
 To:   GIMPS
 Subject:  Re: Mersenne: EFF and 10,000,000 digits
 
 Two things:
 
 1) I do seem to recall a 1GHz Alpha announcement.
 
 2) Was it Intel that bought the Alpha rights?  It might have been IBM but
 was NOT Compaq.
 
 Joth
 
 - Original Message -
 From: David L. Nicol [EMAIL PROTECTED]
 To: Aaron Blosser [EMAIL PROTECTED]
 Cc: Mersenne@Base. Com [EMAIL PROTECTED]
 Sent: Wednesday, June 09, 1999 9:28 AM
 Subject: Re: Mersenne: EFF and 10,000,000 digits
 
 
  Aaron Blosser wrote:
  
   I have heard some insider news that Intel *could* hit the 1 GigaHertz
   mark by year's end if they had a reason to
 
  Did DEC not demonstrate a gigahertz Alpha chip shortly before Compaq
  purchased them?
 
  
David Nicol 816.235.1187 UMKC Network Operations [EMAIL PROTECTED]
"unpersuasive and dubious"
  