On 03/05/2014 10:55 AM, Douglas Eadline wrote:

On 05/03/14 13:52, Joe Landman wrote:
The one everyone thinks about first is power, but the other one she
touched on was reliability and uptime.

Indeed, the fact that these issues were not even mentioned
means to me the project is not very well thought out.
At exascale (using current tech) failure recovery must be built
into any design, in software, hardware, or both.

Although, as stated, I think this guy's full of it, he did touch on both of them in the call. Very high level (fuel cells for power, modular enclosures for failure tolerance, etc.), but they were "mentioned." Some supposed NDA with Intel prevented him from talking about a chunk of it.

Also, failure may or may not be a huge issue for the application described (genetic algorithm-based HFT applications). It's not tightly coupled. Lose a node (or a rack) and you just lose some fraction of your population. GAs (like real biology) are by design intended to deal with losses like that.
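
To make that concrete, here is a rough sketch of what I mean, assuming an island-style GA with one sub-population per node. Everything in it is made up for illustration (the toy bit-counting fitness, the names `evolve` and `run`, the node counts); it is not the application from the call. Killing two of eight nodes partway through just shrinks the population, and the surviving nodes keep evolving with no restart.

```python
import random


def fitness(genome):
    # Toy objective (an assumption for illustration): number of 1 bits.
    return sum(genome)


def evolve(population, rng):
    # One generation: tournament selection, one-point crossover,
    # and an occasional single bit-flip mutation.
    next_gen = []
    for _ in range(len(population)):
        a = max(rng.sample(population, 2), key=fitness)
        b = max(rng.sample(population, 2), key=fitness)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        if rng.random() < 0.1:
            child[rng.randrange(len(child))] ^= 1
        next_gen.append(child)
    return next_gen


def run(num_nodes=8, per_node=20, genome_len=32, generations=30,
        fail_at=10, failed_nodes=(2, 5), seed=0):
    rng = random.Random(seed)
    # Each "node" holds an independent sub-population (hypothetical layout).
    islands = {
        n: [[rng.randint(0, 1) for _ in range(genome_len)]
            for _ in range(per_node)]
        for n in range(num_nodes)
    }
    for gen in range(generations):
        if gen == fail_at:
            # Simulated node failure: those individuals are simply gone.
            for n in failed_nodes:
                islands.pop(n, None)
        for n in islands:
            islands[n] = evolve(islands[n], rng)
    survivors = [g for pop in islands.values() for g in pop]
    return len(islands), len(survivors), max(fitness(g) for g in survivors)


if __name__ == "__main__":
    nodes, size, best = run()
    print(f"{nodes} nodes alive, {size} individuals, best fitness {best}")
```

There is no checkpoint or recovery step anywhere in that loop, which is the point: the run degrades instead of dying. A tightly coupled job has no equivalent of quietly dropping a rank.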

Making HPL run through successfully is a whole 'nother beast.

Best,

ellis

--
Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
www.ellisv3.com
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
