Hi everyone,

thanks for the interesting discussion so far. Personally, I fully agree with the burst-computing argument: if a robust pipeline needs to scale up for a short time and local HPC resources are fully booked, the cloud is an essential resource.

However, with projects like [1] or [2], I don't buy the argument that HPC is prohibitive for reproducibility of scientific results. I know that many HPC installations are very conservative about containerized execution (as in the cloud) and are slow to adopt modern technologies, but relying on containerized execution just to pin a fixed set of dependencies can also be seen as a sign of poor software quality. To me, this in turn is a result of our academic incentive system, i.e. published results are valued more highly than the tools that produced them (which makes people invest less in infrastructure). That often leads to brittle build systems and a lack of tests. It's interesting (if not paradoxical) to me that people spend money on compute hours in the cloud to mitigate this.

Cheers,
Peter

[1] http://www.nersc.gov/research-and-development/user-defined-images/
[2] http://singularity.lbl.gov/


On 06/13/2017 08:12 PM, C. Titus Brown wrote:
Hi all,

we have done varying amounts of cloud computing, but it tends not to be
price competitive when developing/debugging analysis pipelines for large
sets of data (vertebrate GWAS, etc.) because of the disk space needs.

The UCSC Genome Center folk are relying increasingly on cloud computing
because it is so flexible and burst-scalable - also see Dockstore.org
for something that they are doing across cancer centers.

With regard to Alex Savio's comment on clinical data -  I don't know where in
the world you are, Alex, but at least in the US there are several portions of
AWS that are HIPAA-compliant.  The entire UC system can use AWS for clinical
data now, for example.  I can seek out details if anyone is interested.

Personally I think HPCs are a problem for reproducibility (see
blogs.nature.com/naturejobs/2017/06/01/techblog-c-titus-brown-predicting-the-paper-of-the-future
for a small bit of context) and am a big fan of
computing *like* you're in the cloud (VMs or docker or singularity) so as to
manage dependencies. But while that is something that quite a few experienced
computational folk seem to agree with, I'm not sure how many people I will be
able to convince of that in the broader world ;).

best,
--titus

On Tue, Jun 13, 2017 at 05:54:36PM +0000, alexsa...@gmail.com wrote:
Hi Peter,

I wouldn't be able to use such services with clinical data. It's totally not
an option for me.
That said, I've seen some talks, and the performance seems quite competitive
since scalability is easy. It's true that uploading a large quantity of data
can take considerable time and bandwidth; some labs use the weekends for
data uploading. One problem may be convincing university fund managers to
pay for external computing services when they already provide HPC services.

My five cents...

On Tue, 13 Jun 2017, 13:38 Peter Steinbach, <steinb...@scionics.de> wrote:

Dear both,

as a side note (and my apologies for digressing), I was wondering how
popular cloud computing for data processing at scale in an academic
context is in the US or elsewhere?

Here in Europe, many universities run their own HPC centers where people
can sign up to process larger amounts of data or do larger simulations
or whatnot ... mostly people here are concerned about efficiency (data
connections into the cloud are typically poor, VM overhead is
considerable) and security/confidentiality when putting scientific
workflows into the cloud.
What is your take on this?

Best,
Peter


PS. I love the "serverless" metaphor. Gets rid of all the problems of
computers. ;)

On 06/12/2017 06:02 PM, Marianne Corvellec wrote:
Hi Justin,

Thank you so much for the quick reply!

I'm going to give this new package a try.

Best,
Marianne

On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes <jkit...@berkeley.edu> wrote:
Hi Marianne,

PyWren by Eric Jonas sounds like it's pretty similar to what you're
looking for -

http://pywren.io/

It's a relatively new package that's still in active development, but
Eric is very interested in expanding it (and has some support from the
RISELab at UC Berkeley to do so). I know that he's also actively looking
for use cases, so I'd definitely suggest getting in touch with him if
you're interested.

Best,

Justin

--
Justin Kitzes
Energy and Resources Group
Berkeley Institute for Data Science
University of California, Berkeley

On Jun 9, 2017, at 6:51 AM, Marianne Corvellec <marianne.corvel...@gmail.com> wrote:

Dear community,

I'm curious as to whether some of you might have worked on or used
solutions such as AWS Lambda in the context of your scientific
research.

If so, have you documented it in a blog post that you could share?
Thanks in advance!

Without even considering workflows or full-fledged projects, wouldn't
we want to be able to make a standard API call to, say, fit a
polynomial to some data?  Is anyone aware of any effort in this
direction?

A friend of mine just drew my attention to this general issue, which
touches on open science and reproducible research...  In the meantime,
I'll encourage him to join this mailing list!

Thank you,
Marianne
_______________________________________________
Discuss mailing list
Discuss@lists.software-carpentry.org
http://lists.software-carpentry.org/listinfo/discuss


--

Sent from my phone, sorry for brevity or typos.

