Richard Walsh wrote:
> Here you are arguing for an ASIC for each typical HPC kernel ... ala
> the GRAPE processor. I will buy that ... but a commodity multi-core
> CPU is not HPC-special-purpose or low power compared to an FPGA.

FPGA power is good, several Watts in most cases. When you don't have to
power extra cruft, things are good. The latest quad cores from AMD/Intel
are in the 20 W/core region (30 for the current Intel, 20 for the new
generation). It would not surprise me to see this get to 10 W/core and
below.

>> purpose CPUs (PowerPCs), DSPs (ADSP21020), and FPGAs for some signal
>> processing applications. At that time, the DSP could do the FFTs,
>> etc., for the least joules and least time. Since then, however, the
>> FPGAs have pulled ahead, at least for spaceflight applications. But
>> that's not because of architectural superiority in a given process ...
>> it's that the FPGAs are benefiting from improvements in process
>> (higher density) and nobody is designing space qualified DSPs using
>> those processes (so they are stuck with the old processes).

> Better process is good, but I think I hear you arguing for
> HPC-specific ASICs again like the GRAPE ... if they can be made
> cheaply, then you are right ... take the bit stream from the FPGA CFD
> code I have written and tuned, and produce 1000 ASICs for my special
> purpose CFD-only cluster.

This sounds like D. E. Shaw's work (though I think they are doing it in
FPGA).

> I can run it at higher clock rates, but I may need a new chip every
> time I change my code.

You need a new bitfile every time you change FPGAs or FPGA boards, which
means that FPGA bitfiles are largely immobile. Of course the process to
change the bitfile is a rebuild and ...

>> Heck, the latest SPARC V8 core from ESA (LEON 3) is often implemented
>> in an FPGA, although there are a couple of space qualified ASIC
>> implementations (from Atmel and Aeroflex).
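For a rough sense of scale, the power figures above can be put side by side. This is only a back-of-envelope sketch using the ballpark numbers from this thread (an FPGA at "several Watts", a quad-core socket at ~20 W/core); none of these are measured values:

```python
# Back-of-envelope power comparison using the rough figures from the
# discussion above; all numbers are ballpark, not measurements.
fpga_watts = 5.0           # "several Watts in most cases"
cpu_watts_per_core = 20.0  # ~20 W/core for the newer quad-core parts
cores_per_socket = 4

cpu_socket_watts = cpu_watts_per_core * cores_per_socket
print(f"CPU socket: {cpu_socket_watts:.0f} W, FPGA: {fpga_watts:.0f} W")
print(f"Power ratio (socket/FPGA): {cpu_socket_watts / fpga_watts:.0f}x")
```

Even if the CPU drops to 10 W/core as speculated, the socket-level gap to a few-watt FPGA stays large; whether it matters depends entirely on delivered performance per watt, which this calculation says nothing about.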
>> In a high volume consumer application, where cost is everything, the
>> ASIC is always going to win over the FPGA. For more specialized
>> scientific computing, the trade is a bit more even ... But even so,
>> the beowulf concept of combining large numbers of commodity computers
>> leverages the consumer volume for the specialized application, giving
>> up some theoretical performance in exchange for dollars.

> Right, otherwise we would all be using our own version of GRAPE,

Some things can be specialized and made fast. GPUs, for example.

> but we are all looking for the "New, New Thing" ... a new
> price-performance regime to take us up to the next level. Is it going
> to be FPGAs, GPGPUs, commodity multi-core, PIM, or novel 80-processor
> Intel chips? I think we are in for a period of extended HPC market
> fragmentation,

I am not convinced it is going to be fragmented for long. Take
everything more expensive than $5000 US and call it DOA unless it can
easily drop right in and hit 10-100x node performance. Node pricing is
dropping rapidly: a 5+ TF cluster quoted several months ago using
previous generation technology came in around a few million dollars;
one quoted recently came in well under $1M.

> but in any case I think two features of FPGA processing, the
> programmable core and data flow programming model, have
> intrinsic/theoretical appeal. These forces may be completely
> overwhelmed by other forces in the market place of course ...

Unless GPUs just won't work, they may be a safe bet as one of the
emerging winners. Cell should be in there as well. We demoed a little
FPGA board (disclosure: we work with the company that builds it, and we
do sell it) attached to a USB2 port that ran HMMer faster than an
8-core cluster. The cost and power difference there is huge, but
hopefully we will be able to run p7Viterbi fast on GPUs as well.
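Since p7Viterbi comes up above: the kernel being accelerated is a Viterbi dynamic-programming recurrence over a profile HMM. Below is a toy log-space sketch of that recurrence, not HMMer's actual p7Viterbi; the two-state model and every probability in it are invented purely for illustration:

```python
# Toy log-space Viterbi decoder -- a sketch of the dynamic-programming
# recurrence that profile-HMM kernels like HMMer's p7Viterbi implement
# (the real thing uses the Plan7 state architecture, not this toy model).
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for an observation sequence."""
    # V[t][s] = best log-probability of any path ending in state s at time t
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the best predecessor state for s at time t
            prev, score = max(
                ((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda x: x[1])
            V[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

def lg(p):
    return math.log(p)

# Tiny two-state (match/insert-like) example with made-up parameters:
states = ["M", "I"]
log_start = {"M": lg(0.8), "I": lg(0.2)}
log_trans = {"M": {"M": lg(0.7), "I": lg(0.3)},
             "I": {"M": lg(0.4), "I": lg(0.6)}}
log_emit = {"M": {"A": lg(0.9), "C": lg(0.1)},
            "I": {"A": lg(0.25), "C": lg(0.75)}}

print(viterbi("ACA", states, log_start, log_trans, log_emit))
# -> ['M', 'I', 'M']
```

The inner max over predecessor states is the part that maps well onto FPGAs and GPUs: it is a regular, data-parallel sweep with no data-dependent control flow.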
Then economies of scale may be able to drive some of this onto
motherboards, though most MB makers are reluctant to add anything that
increases the cost of their product, even if it is better and makes
their product stand out. Graphics cards are in *everything*, so you
should pretty much expect them to be one of the winners, if they can get
the codes to run on them. Cell-BEs are going into millions of PS3s, and
while it might be a stretch, it is possible that some places may deploy
clusters of these (PSC deploying a PS3 cluster? :) ).

What is pretty clear right now is that anyone with an excessively high
price per unit or per SDK is pretty much knocking themselves out of the
market. Anyone who cannot build and create large volumes of these things
is pretty much in trouble in this space. The other thing that is pretty
clear is that as the multi-cores go even more multi, chips that
hyperspecialize in one area may become marginalized.

There is some data I am not sure I can talk about, so I'll talk about
the other data that I can. The Intel quad core units can do something
like 35 GF/socket (rough calc; I am sure some Intel person can correct
me, so please do). This is good, though it puts pressure on the
hyperspecialized chips.

Joe

> Regards,
>
> rbw

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
