> had some of all of this. The cool thing about IaaS providers is they
> can (not always) get a hold of more recent kit, and if you are smart
> about how you use it you can see huge benefits from all of this, even
> down to simple changes to CPU spec at a simple level, to much larger
> wins if you fix all of the bad/old/shabby/poor stuff.
color me skeptical. I don't see why a computing service provider has any
greater access to new kit than the service user (except in the
small-user niche), unless there's some externality that gives them, say,
a tax advantage, and allows them to depreciate HW at a higher rate. if
Amazon installs a new haswell/80GiB/Phi cluster just because I ask for
it, they're going to charge me the cost + operating + margin. sure, they
can pool it with other users if my duty cycle is low, but in a macroecon
sense, the only thing that happens by renting rather than owning is that
they make a profit.

> PUE is starting to be a big issue. We built the MGHPCC as both a
> responsible response to sustainable energy costs (hydro), and the fact
> that on campus machine rooms were heading towards the 3.0 PUE scale,
> which is just plain irresponsible.

forget the moral outrage: it's just incompetent. AFAICT, institutional
IT people cultivate an inscrutable BOFHishness (in the guise of security
or reliability or some other laudable metric). I always say that HPC is
an entirely different culture than institutional IT. that's certainly
the case on my campus, where the central IT organization is all excited
about embarking on a "5 year" >$50M ERP project that will, of course,
_do_everything_, and do it integrated. they have the waterfall diagrams
to prove it. I consider the web2.0 world to be part of the HPC culture,
since companies like Google/FB/etc are set up to avoid BOFH-think and
readily embrace traditional research/hacker virtues like
modularity/iteration/experimentation.

> software. Smart software costs a whole lot less than fixed
> infrastructure.

that seems to be the central claim, for which I see no justification.
any HPC organization does devops, whether they call it that or not (just
as HPC is normally PaaS, though not often by that name.)
why do you think "smart" software somehow saves money in such a way that
a large, compute-intensive organization couldn't do the same by buying
rather than renting?

> Re: "special sauce" - there is a fair amount of it at scale. I've seen
> inside the source for what Cycle does, and have also run 25K+ proc
> environments, we all need some sauce to make any of this large scale
> stuff even slightly palatable!

color me skeptical again. admittedly my organization's largest cluster
is ~8500 cores, but such clusters are pretty trivial, devops-wise. I
doubt that some crazy nonlinearity kicks in between 8k and 25k.

>> the one place where cloud/utility outsourcing makes the most sense is at
>> small scale. if you don't have enough work to keep tens of racks busy,
>> then there are some scaling and granularity effects. you probably can't
>> hire 3% of a sysadmin, and some of your nodes will be idle at times...
>
> Agree - I've seen significant win at both the small end and the large
> end of things. Utility being a phrase that identifies how one can
> turn on and off resource at the drop of a hat. If you are running

but that's totally ho-hum. it takes less than one hat-fall-time to ssh
to our clusters and submit a job. if we provisioned VMs, rather than ran
resource-dedicated jobs on shared machines (IaaS vs PaaS), there would
be some modestly larger latency for starting the job (I guess 10s to
boot a VM rather than 10ms to start a job).

> nodes 100% 365d/y on prem is still a serious win. If you have the odd

afaict, you're merely saying "users with sparse duty cycle benefit from
pooling load with other, similar users". that's true, but doesn't help
me understand why pooling should be able to justify both Amazon's margin
and *yours*. sorry if that's too pointed.
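the rent-vs-own argument above can be put in rough numbers. here's a
minimal sketch, using illustrative figures only (a ~$4k dual-socket
server amortized over 5 years, and two m1.xlarge-class spot instances at
$0.052/hour each standing in for one server, per the equivalence
guessed at elsewhere in this thread); operating costs are ignored on
both sides:

```python
# Break-even duty cycle for renting vs owning a compute node.
# All figures are illustrative assumptions, not measured values.

SERVER_PRICE = 4000.0       # assumed list price of a 2-socket server ($)
LIFETIME_YEARS = 5          # assumed depreciation period
HOURS_PER_YEAR = 8760

# assumed spot rate for one m1.xlarge-class instance; two of them
# are taken as roughly equal to one server
SPOT_RATE = 0.052           # $/instance-hour
RENT_RATE = 2 * SPOT_RATE   # $/server-equivalent-hour

own_cost_per_year = SERVER_PRICE / LIFETIME_YEARS        # $800/year, fixed
break_even_hours = own_cost_per_year / RENT_RATE         # ~7692 hours/year
break_even_duty = break_even_hours / HOURS_PER_YEAR      # ~88%

print(f"owning costs ${own_cost_per_year:.0f}/yr; renting ${RENT_RATE:.3f}/hr")
print(f"break-even duty cycle: {break_even_duty:.0%}")
```

below that ~88% utilization renting is cheaper; above it, owning wins
even before provider margin or ops costs enter — which is exactly the
"100% 365d/y on prem is still a serious win" point.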
I'm guessing the answer is that your market is mainly people who are too
self-important (anything health- or finance-related) to do it themselves
(or join a coop/consortium), or else people just getting their toes wet.

> my, if I'd have used that for a whole year we would have gone broke,
> but it is totally awesome to use for a few hours, and then go park it
> again,

you're merely making the pitch for low-duty-cycle pooling. to me, this
doesn't really explain how multiple layers of profit margin will be
supported...

>> I'm a little surprised there aren't more cloud cooperatives, where smaller
>> companies pool their resources to form a non-profit entity to get past these
>> dis-economies of very small scale. fundamentally, I think it's just that
>> almost anyone thrust into a management position is phobic about risk.
>> I certainly see that in the organization where I work (essentially an
>> academic HPC coop.)
>
> You guys proved this out:
>
> http://webdocs.cs.ualberta.ca/~jonathan/PREVIOUS/Grad/Papers/ciss.pdf

I've still got the t-shirt from CISS. we went along with it primarily
for PR reasons, since the project's axe-to-grind was pushing the
cycle-stealing grid concept. you'll notice hardly anyone talks about
grid anymore, although it's really a Something-aaS (usually P in those
days, since hardware VM wasn't around much). grid people seemed to love
tilting at the geographic/latency windmill, which I think we all know
now was, ah, quixotic.

>> people really like EC2. that's great. but they shouldn't be deluded into
>> thinking it's efficient: Amazon is making a KILLING on EC2.
>
> It's as efficient as "the bench" shows you, no more no less.

you're talking about a single purchasing choice; I'm trying to
understand the market/macro. if you're suggesting that running on a VMed
cluster is more efficient than a bare-metal cluster (with comparably
competent ops), well, let me just call that paradoxical.
>>> systems than some of us have internally, are we are starting to see
>>> overhead issues of vanish due to massive scale, certainly at cost? I know
>>
>> eh? numbers please. I see significant overheads only on quite small
>> systems.
>
> You are right numbers were missing. Here's an example of a recent Cycle run:
>
> http://blog.cyclecomputing.com/2013/02/built-to-scale-10600-instance-cyclecloud-cluster-39-core-years-of-science-4362.html

Amazon's spot price for m1.xlarge instances is indeed surprisingly low.
I guess my main question is: once you succeed in driving more big pharma
work into the EC2 spot market, wouldn't you expect the spot prices to
approach on-demand prices? m1.xlarge instances sound like they are
between half and a quarter of a modern 2s 16c E5 server, which costs
$4-5k. that makes your sqft and price numbers off by a factor of two
(so perhaps you credit the instances with more speed - I've never tried
to measure them.)

> So ignoring any obvious self promotion here, 40yrs of compute for
> $4.3K is a pretty awesome number, least to me. Probably compares with

your spot instances were around $0.052/hour, and if you had bought
servers at list price, you'd pay about $0.046 (that's assuming you run
them for 5 years, but *not* counting operating costs.)

> some academic rates, at least it is getting close, comparing a single
> server:
>
> http://www.rhpcs.mcmaster.ca/current-rates

heh, sure. I'm not really talking about small-scale hosting, though.

> I do have to say - I absolutely LOVE that you guys put in real FTE
> support for these servers - this is a very cool idea, I never did
> implement charge back in my old gig, what you guys are doing here is
> awesome!

actually, RHPCS is providing something more like traditional IT support
there - my org, Sharcnet, does unlimited free research PaaS HPC (as does
the rest of ComputeCanada).
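for what it's worth, the spot-vs-list comparison above checks out as
quick arithmetic. this is only a sketch: the $4000 server price, the
5-year lifetime, and the two-m1.xlarge-per-server equivalence are the
thread's estimates, not measurements:

```python
# Sanity-check the $/hour figures quoted in this thread.
# Assumptions: a ~$4000 2-socket server run for 5 years is roughly two
# m1.xlarge instances (4 vCPUs each); operating costs ignored throughout.

SERVER_PRICE = 4000.0
LIFETIME_HOURS = 5 * 8760                  # 43,800 hours over 5 years
owned_per_instance_hr = SERVER_PRICE / 2 / LIFETIME_HOURS   # ~$0.046

SPOT_PER_INSTANCE_HR = 0.052               # quoted m1.xlarge spot rate
spot_per_core_hr = SPOT_PER_INSTANCE_HR / 4                 # ~$0.013

# the Cycle run linked above: ~39 core-years of compute for $4362
cycle_per_core_hr = 4362 / (39 * 8760)                      # ~$0.0128

print(f"owned hardware: ${owned_per_instance_hr:.3f}/instance-hour")
print(f"spot market:    ${spot_per_core_hr:.3f}/core-hour")
print(f"Cycle run:      ${cycle_per_core_hr:.4f}/core-hour")
```

the spot rate and the bare amortized hardware cost land within ~15% of
each other, which is the point: spot pricing is already close to the
cost of owning the metal, before anyone's operating costs or margin.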
ComputeCanada divisions have made some attempts at cost recovery in the
past 10ish years, and pretty much failed each time...

> My point was more around the crazy "elastic scale" one can now pull
> off with the appropriate software and engineering available from top
> tier IaaS and PaaS providers.

puzzled. is it any different from logging into a large cluster and
submitting a 40k-cpu job? (of course, in the example you gave, it was
really 10k 4-cpu jobs, which is certainly a lot less crazy...)

regards, mark hahn.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
