On 08/03/16 00:32, John Hearns wrote:

> Us old style guys are going to have our lunch money stolen by young
> upstarts. Or is that startups?

This presumes that everyone is going to be running massive clusters at
huge scale with completely new codes. That might be true for a few
large labs, but I suspect a lot of other sites are going to be running
older, smaller systems with existing codes that will never get
completely rewritten and someone will have to keep them running.

> Seriously - these guys know how to keep things running at scale and how
> to tolerate failures.

As I mentioned in another thread the Slurm folks are already working on
that issue through their nonstop plugin which is intended to let jobs
bargain with the scheduler on how to react to failure.

http://slurm.schedmd.com/nonstop.html

Of course the user codes have to know what to do when something breaks
too (and I don't mean SEGV)...

All the best,
Chris
-- 
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected] Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/      http://twitter.com/vlsci
