Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-14 Thread Henry Neeman

A couple of thoughts:

(1) Depending on specific libraries can be
an unintended but unavoidable side effect of
the programming language chosen.

For example, we've seen plenty of examples of
Python code that's quite brittle regarding
Python version (and perhaps versions of
various packages).

(MATLAB sometimes shows similar effects, but
typically gentler than Python, and Perl does
too, but I've encountered very few research
software developers who develop their workhorse
codes primarily in Perl.)
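(A generic sketch, not something proposed in the thread: one cheap
mitigation for version brittleness is to check the interpreter
version up front, so the dependency is explicit and the failure is
immediate and legible instead of a cryptic error mid-run.)

```python
import sys

def require_python(minimum, found=None):
    """Fail fast if the running interpreter is older than `minimum`.

    Illustrative helper, not a real library API; `found` exists only
    so the check is testable without swapping interpreters.
    """
    if found is None:
        found = sys.version_info[:2]
    if tuple(found) < tuple(minimum):
        raise RuntimeError(
            "this code needs Python %s or newer, found %s"
            % (minimum, found)
        )
    return True

require_python((3, 0))  # e.g. refuse to run under Python 2
```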

(2) For floating point calculations, there's
a reasonable argument that, if the code's
replicability is dependent on the order in
which various operations are executed, then
that algorithm is too fragile to be relied on
for meaningful research results.

The problem is that floating point
representation is approximate.

For example, a double precision floating point
number has 64 bits, meaning 2^64 possible
values.

But there are infinitely many real numbers,
and in fact infinitely many real numbers
between any two real numbers.

So almost every value you try to represent has
some error in its representation.

On top of that, every calculation you do
introduces additional error due to rounding.
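Both effects are easy to see from a Python prompt (a quick
illustration, not tied to any particular code discussed here):

```python
from decimal import Decimal

# Representation error: 0.1 has no exact binary representation.
# Decimal(0.1) prints the value the hardware actually stores.
print(Decimal(0.1))      # 0.1000000000000000055511151231257827...

# Rounding error: the stored approximations surface in arithmetic.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```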

As an example, consider this:

1.23 * 4.56 * 7.89 = 44.253432

That's 44.3 when rounded to 3 significant
figures.

Now let's recalculate, but this time rounding
to 3 significant figures after each operation,
which is analogous to what happens in real
floating point arithmetic:

(1.23 * 4.56) * 7.89:
  1.23 * 4.56 = 5.6088 ~= 5.61
  5.61 * 7.89 = 44.2629 ~= 44.3

(1.23 * 7.89) * 4.56:
  1.23 * 7.89 = 9.7047 ~= 9.70
  9.70 * 4.56 = 44.2320 ~= 44.2

(4.56 * 7.89) * 1.23:
  4.56 * 7.89 = 35.9784 ~= 36.0
  36.0 * 1.23 = 44.280 ~= 44.3
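The three orderings above can be reproduced in a few lines of
Python (a throwaway sketch that rounds to 3 significant decimal
figures after each multiply, standing in for the binary rounding
that real floating point hardware performs on every operation):

```python
from math import floor, log10

def round_sig(x, sig=3):
    """Round x to `sig` significant decimal figures."""
    return round(x, sig - 1 - floor(log10(abs(x))))

a, b, c = 1.23, 4.56, 7.89

ab_first = round_sig(round_sig(a * b) * c)  # (1.23 * 4.56) * 7.89
ac_first = round_sig(round_sig(a * c) * b)  # (1.23 * 7.89) * 4.56
bc_first = round_sig(round_sig(b * c) * a)  # (4.56 * 7.89) * 1.23

print(ab_first, ac_first, bc_first)  # 44.3 44.2 44.3
```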

Which one of these is the "right" order?

You can't possibly know the answer when
you write the code, because the "right"
order will depend on the values you're
calculating with.

Which you can't know while you're writing
the code, or you wouldn't have had to
write the code in the first place.

Now imagine doing zillions of these
calculations to get your result -- not just
multiplies but adds, subtracts, divides,
exponentiations, logarithms, cosines, you
name it.

You can imagine that numerical error is
going to build up pretty quickly.

In a "good" numerical method, that error
will accumulate in a random direction with
each operation, so the aggregate error
won't be too bad.

But in a "bad" numerical method, that error
may bias in a specific direction, so the
aggregate error may be terrible.
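A standard small illustration of the difference (Python's own
documentation uses it): naive left-to-right summation lets the
per-addition rounding errors pile up, while a compensated method
like math.fsum tracks and corrects the lost low-order bits.

```python
import math

values = [0.1] * 10          # ten copies of an inexactly stored value

naive = sum(values)          # left-to-right adds accumulate error
careful = math.fsum(values)  # compensated (error-tracking) summation

print(naive)    # 0.9999999999999999
print(careful)  # 1.0
```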

In either case, the result is essentially
guaranteed to be "wrong," in the sense that
it's approximate, which was guaranteed from
the start, because floating point
representation is approximate, as above.

If your definition of "replicable" is
"gets the exact same result bit-by-bit,"
and you achieve that aim, there's a decent
chance that it'll be the exact same,
scientifically unacceptable, amount of wrong.

Which isn't a win.

So the notion that bit-by-bit replicability
is the same as scientific replicability is
debatable.

The advantage of packages like BLAS is that
they've been designed by people who know an
awful lot about these issues, and so BLAS is
pretty robust with respect to floating point
issues.

But BLAS isn't designed or intended for
bit-by-bit replicability, because in large
scale numerical calculations, bit-by-bit
replicability isn't necessarily valuable from
a scientific perspective.

Henry

--

On Wed, 14 Jun 2017, Bennet Fauber wrote:

>Peter brings up an interesting point about code quality and its role
>in replicability.  It may be that too strong a reliance on particular
>underlying libraries is really an indication of unstable code or
>unstable methods.
>
>Good numerical code should largely survive recompilation.  A good
>example of this is the code in R, I think.  The R maintainers have
>warnings about using MKL or other optimized libraries, and they
>provide robust source code for the basic BLAS functions R needs for
>those who aren't interested in or able to evaluate whether the
>differences shown between the tests at compilation and the baseline
>are significant for their research or not.
>
>Regarding containers and HPC, Singularity is making rapid inroads into
>HPC centers, I think, and it will only become more prevalent now that
>the Singularity people have largely got it so non-root users can
>create and maintain containers.  That's been a huge issue for HPC
>centers like ours with Docker, which wants to run as root.
>Singularity containers run entirely as the invoking user and in
>unprivileged space, which makes them far less controversial.
>
>Cloud providers, like Amazon, or 'cloud' cluster providers like
>Penguin, do offer something else that is increasingly desirable
>to researchers at big universities, and that is independence
>and portability.
>
>If a junior faculty member or graduate student -- or undergraduate --
>builds something in AWS and moves to a different university, there is
>no interruption to the research program:  AWS doesn't have to move.
>Similarly 

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-14 Thread C. Titus Brown
Hi Peter,

Nature had a piece on containers recently --

https://www.nature.com/news/software-simplified-1.22059

which has Lorena Barba making exactly the same points as you about software
robustness!  So at least the opinions are getting out there...

In my experience, however, it's difficult to get scientists to understand that
they should immediately stop working with their 15-year-old stack of software
and pipelines and reengineer it from scratch so as to be robust ;).  So
bandaids are sadly the soup du decade.
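The most common such bandaid (a generic example, not quoted from
anyone here) is freezing the exact dependency versions a pipeline
happened to work with:

```
# requirements.txt -- hypothetical pinned stack
numpy==1.13.0
scipy==0.19.0
pandas==0.20.2
```

which at least makes the fragility explicit and reinstallable via
pip install -r requirements.txt.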

Less tongue-in-cheek, for many non-technical reasons, we've settled for an
incredibly fragile software infrastructure.  I don't see us working our
way out of that anytime soon. A few thoughts and links here:
http://ivory.idyll.org/blog/2017-pof-software-archivability.html

best,
--titus

On Wed, Jun 14, 2017 at 09:11:47AM +0200, Peter Steinbach wrote:
> Hi everyone,
>
> thanks for the interesting discussion so far. From my personal point of  
> view, I'd fully agree with the computational burst based argument. If a  
> robust pipeline needs to scale for a short amount of time and local HPC  
> resources are blocked, the cloud is an essential resource.
>
> However, with projects like [1] or [2] I don't buy into the argument  
> that using HPC is forbidding due to reproducibility of scientific  
> results. I know that many HPC installations are very conservative when  
> it comes to containerized execution (like in the cloud) and have a long  
> lag of implementing modern technologies, but containerized execution for  
> the sake of having a fixed set of dependencies can also be considered as  
> a lack of software quality. For me, this in turn is as a result of our  
> academic system of incentives, i.e. published results are valued higher  
> than the tools that produced them (which makes people invest less in  
> infrastructure). The latter often leads to brittle build systems and the  
> lack of tests. It's interesting (if not paradox) to me that people tend  
> to take money in their hands to buy compute hours in the cloud to  
> actually mitigate this.
>
> Cheers,
> Peter
>
> [1] http://www.nersc.gov/research-and-development/user-defined-images/
> [2] http://singularity.lbl.gov/
>
>
> On 06/13/2017 08:12 PM, C. Titus Brown wrote:
>> Hi all,
>>
>> we have done varying amounts of cloud computing, but it tends not to be
>> price competitive when developing/debugging analysis pipelines for large
>> sets of data (vertebrate GWAS, etc.) because of the disk space needs.
>>
>> The UCSC Genome Center folk are relying increasingly on cloud computing
>> because it is so flexible and burst-scalable - also see Dockstore.org
>> for something that they are doing across cancer centers.
>>
>> With regard to Alex Savio's comment on clinical data -  I don't know where in
>> the world you are, Alex, but at least in the US there are several portions of
>> AWS that are HIPAA-compliant.  The entire UC system can use AWS for clinical
>> data now, for example.  I can seek out details if anyone is interested.
>>
>> Personally I think HPCs are a problem for reproducibility (see
>> blogs.nature.com/naturejobs/2017/06/01/techblog-c-titus-brown-predicting-the-paper-of-the-future)
>>  for a small bit of context and am a big fan of
>> computing *like* you're in the cloud (VMs or docker or singularity) so as to
>> manage dependencies. But while that is something that quite a few experienced
>> computational folk seem to agree with, I'm not sure how many people I will be
>> able to convince of that in the broader world ;).
>>
>> best,
>> --titus
>>
>> On Tue, Jun 13, 2017 at 05:54:36PM +, alexsa...@gmail.com wrote:
>>> Hi Peter,
>>>
>>> I wouldn't be able to use such services with clinical data. It's totally not
>>> an option for me.
>>> Although I've seen some talks and the performance seems quite competitive
>>> since scalability is easy. It's true that uploading a big quantity of data
>>> can take considerable time and bandwidth; some labs use the weekends for
>>> data uploading. One problem may be to convince University fund managers to
>>> pay for external computing services when they already provide HPC services.
>>>
>>> My five cents...
>>>
>>> On Tue, 13 Jun 2017, 13:38 Peter Steinbach,  wrote:
>>>
 Dear both,

 as a side note (and my apologies for digressing), I was wondering how
 popular cloud computing for data processing at scale in an academic
 context is in the US or elsewhere?

 Here in Europe, many universities run their own HPC centers where people
 can sign up to process larger amounts of data or do larger simulations
 or whatnot ... mostly people here are concerned about efficiency (data
 connections into the cloud are typically poor, VM overhead is
 considerable) and security/confidentiality when putting scientific
 workflows into the cloud.
 What is your take on this?

 Best,
 Peter


 PS. I love the 

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-14 Thread Bennet Fauber
Peter brings up an interesting point about code quality and its role
in replicability.  It may be that too strong a reliance on particular
underlying libraries is really an indication of unstable code or
unstable methods.

Good numerical code should largely survive recompilation.  A good
example of this is the code in R, I think.  The R maintainers have
warnings about using MKL or other optimized libraries, and they
provide robust source code for the basic BLAS functions R needs for
those who aren't interested in or able to evaluate whether the
differences shown between the tests at compilation and the baseline
are significant for their research or not.

Regarding containers and HPC, Singularity is making rapid inroads into
HPC centers, I think, and it will only become more prevalent now that
the Singularity people have largely got it so non-root users can
create and maintain containers.  That's been a huge issue for HPC
centers like ours with Docker, which wants to run as root.
Singularity containers run entirely as the invoking user and in
unprivileged space, which makes them far less controversial.

Cloud providers, like Amazon, or 'cloud' cluster providers like
Penguin, do offer something else that is increasingly desirable
to researchers at big universities, and that is independence
and portability.

If a junior faculty member or graduate student -- or undergraduate --
builds something in AWS and moves to a different university, there is
no interruption to the research program:  AWS doesn't have to move.
Similarly with something like Penguin on Demand for more traditional
HPC programs (e.g., MPI-based).  Doing work in the cloud also frees
one from the local IT department, which may have strict rules about
what can and cannot be installed on the institution's computers and
how they can be used that are contrary to what is needed (or wanted)
for the workflows.

In some cases, research can also be done outside of academia or
industry, in which case in-house infrastructure probably doesn't
exist.  That might be more applicable to social sciences, but I could
imagine independent scholars doing work in field biology, ecology,
water quality, etc.  They would benefit greatly from access to
computational machinery that is not only free of encumbering licensing
but of institutional [sic] infrastructure.

Just some more thoughts for the hearth,

-- bennet




On Wed, Jun 14, 2017 at 3:11 AM, Peter Steinbach  wrote:
> Hi everyone,
>
> thanks for the interesting discussion so far. From my personal point of
> view, I'd fully agree with the computational burst based argument. If a
> robust pipeline needs to scale for a short amount of time and local HPC
> resources are blocked, the cloud is an essential resource.
>
> However, with projects like [1] or [2] I don't buy into the argument that
> using HPC is forbidding due to reproducibility of scientific results. I know
> that many HPC installations are very conservative when it comes to
> containerized execution (like in the cloud) and have a long lag of
> implementing modern technologies, but containerized execution for the sake
> of having a fixed set of dependencies can also be considered as a lack of
> software quality. For me, this in turn is as a result of our academic system
> of incentives, i.e. published results are valued higher than the tools that
> produced them (which makes people invest less in infrastructure). The latter
> often leads to brittle build systems and the lack of tests. It's interesting
> (if not paradox) to me that people tend to take money in their hands to buy
> compute hours in the cloud to actually mitigate this.
>
> Cheers,
> Peter
>
> [1] http://www.nersc.gov/research-and-development/user-defined-images/
> [2] http://singularity.lbl.gov/
>
>
> On 06/13/2017 08:12 PM, C. Titus Brown wrote:
>>
>> Hi all,
>>
>> we have done varying amounts of cloud computing, but it tends not to be
>> price competitive when developing/debugging analysis pipelines for large
>> sets of data (vertebrate GWAS, etc.) because of the disk space needs.
>>
>> The UCSC Genome Center folk are relying increasingly on cloud computing
>> because it is so flexible and burst-scalable - also see Dockstore.org
>> for something that they are doing across cancer centers.
>>
>> With regard to Alex Savio's comment on clinical data -  I don't know where
>> in
>> the world you are, Alex, but at least in the US there are several portions
>> of
>> AWS that are HIPAA-compliant.  The entire UC system can use AWS for
>> clinical
>> data now, for example.  I can seek out details if anyone is interested.
>>
>> Personally I think HPCs are a problem for reproducibility (see
>>
>> blogs.nature.com/naturejobs/2017/06/01/techblog-c-titus-brown-predicting-the-paper-of-the-future)
>> for a small bit of context and am a big fan of
>> computing *like* you're in the cloud (VMs or docker or singularity) so as
>> to
>> manage dependencies. But while that is something 

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-14 Thread Peter Steinbach

Hi everyone,

thanks for the interesting discussion so far. From my personal point of
view, I'd fully agree with the computational-burst argument: if a
robust pipeline needs to scale for a short amount of time and local HPC
resources are blocked, the cloud is an essential resource.


However, with projects like [1] or [2], I don't buy the argument that
using HPC is untenable for reproducibility of scientific results. I
know that many HPC installations are very conservative when it comes
to containerized execution (as in the cloud) and lag in adopting
modern technologies, but containerized execution purely for the sake
of having a fixed set of dependencies can also be considered a symptom
of poor software quality. For me, this in turn is a result of our
academic system of incentives, i.e. published results are valued more
highly than the tools that produced them (which makes people invest
less in infrastructure). That often leads to brittle build systems and
a lack of tests. It's interesting (if not paradoxical) to me that
people then spend money on compute hours in the cloud to mitigate this.


Cheers,
Peter

[1] http://www.nersc.gov/research-and-development/user-defined-images/
[2] http://singularity.lbl.gov/


On 06/13/2017 08:12 PM, C. Titus Brown wrote:

Hi all,

we have done varying amounts of cloud computing, but it tends not to be
price competitive when developing/debugging analysis pipelines for large
sets of data (vertebrate GWAS, etc.) because of the disk space needs.

The UCSC Genome Center folk are relying increasingly on cloud computing
because it is so flexible and burst-scalable - also see Dockstore.org
for something that they are doing across cancer centers.

With regard to Alex Savio's comment on clinical data -  I don't know where in
the world you are, Alex, but at least in the US there are several portions of
AWS that are HIPAA-compliant.  The entire UC system can use AWS for clinical
data now, for example.  I can seek out details if anyone is interested.

Personally I think HPCs are a problem for reproducibility (see
blogs.nature.com/naturejobs/2017/06/01/techblog-c-titus-brown-predicting-the-paper-of-the-future)
 for a small bit of context and am a big fan of
computing *like* you're in the cloud (VMs or docker or singularity) so as to
manage dependencies. But while that is something that quite a few experienced
computational folk seem to agree with, I'm not sure how many people I will be
able to convince of that in the broader world ;).

best,
--titus

On Tue, Jun 13, 2017 at 05:54:36PM +, alexsa...@gmail.com wrote:

Hi Peter,

I wouldn't be able to use such services with clinical data. It's totally not
an option for me.
Although I've seen some talks and the performance seems quite competitive
since scalability is easy. It's true that uploading a big quantity of data
can take considerable time and bandwidth; some labs use the weekends for
data uploading. One problem may be to convince University fund managers to
pay for external computing services when they already provide HPC services.

My five cents...

On Tue, 13 Jun 2017, 13:38 Peter Steinbach,  wrote:


Dear both,

as a side note (and my apologies for digressing), I was wondering how
popular cloud computing for data processing at scale in an academic
context is in the US or elsewhere?

Here in Europe, many universities run their own HPC centers where people
can sign up to process larger amounts of data or do larger simulations
or whatnot ... mostly people here are concerned about efficiency (data
connnections into the cloud are typically poor, VM overhead is
connections into the cloud are typically poor, VM overhead is
considerable) and security/confidentiality when putting scientific
workflows into the cloud.
What is your take on this?

Best,
Peter


PS. I love the "serverless" metaphor. Gets rid of all the problems of
computers. ;)

On 06/12/2017 06:02 PM, Marianne Corvellec wrote:

Hi Justin,

Thank you so much for the quick reply!

I'm going to give this new package a try.

Best,
Marianne

On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes 

wrote:

Hi Marianne,

PyWren by Eric Jonas sounds like it's pretty similar to what you're

looking for -


http://pywren.io/

It's a relatively new package that's still in active development, but

Eric is very interested in expanding it (and has some support from the
riselab at UC Berkeley to do so). I know that he's also actively looking
for use cases, so I'd definitely suggest getting in touch with him if
you're interested.


Best,

Justin

--
Justin Kitzes
Energy and Resources Group
Berkeley Institute for Data Science
University of California, Berkeley


On Jun 9, 2017, at 6:51 AM, Marianne Corvellec <

marianne.corvel...@gmail.com> wrote:


Dear community,

I'm curious as to whether some of you might have worked on or used
solutions such as AWS Lambda in the context of your scientific
research.

If so, have you documented it in a blog 

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-13 Thread Strong, Dena L
I have anecdotes rather than data, but around here it's getting to be more 
popular as more people try it out. 

Folks do need to take care to look through their grant terms to see if there's 
any specific language about region restrictions or cloud computing 
restrictions. (I have a friend who keeps burning through SSDs on an 
under-her-desk server who would love to have Amazon's resources available but 
her grant specifically prohibits cloud computing.)

One local example: Here at the University of Illinois we had a team spend 3 
months analyzing a set of data, only to discover at the end of the 3 months 
that there was a software version mismatch that made that run incompatible with 
the rest of their results. 

They came to talk to my department's cloud and virtualization team about what 
could be done, and discovered that there was a way to re-run their work in 3 
days at a really cost effective price point on Amazon Web Services with our 
team's help. (Our lead thinks he might be able to get that original 3-month 
process down to 1 day with a little more optimization.) Our AWS team often 
serves as an intermediary to help guide researchers through navigating the 
networking, security, and optimization issues.

-Dena Strong, Technology Services, University of Illinois

-Original Message-
From: Discuss [mailto:discuss-boun...@lists.software-carpentry.org] On Behalf 
Of Peter Steinbach
Sent: Tuesday, June 13, 2017 6:38 AM
To: discuss@lists.software-carpentry.org
Subject: Re: [Discuss] Serverless scientific computing (function as a service)

Dear both,

as a side note (and my apologies for digressing), I was wondering how popular 
cloud computing for data processing at scale in an academic context is in the 
US or elsewhere?

Here in Europe, many universities run their own HPC centers where people can 
sign up to process larger amounts of data or do larger simulations or whatnot 
... mostly people here are concerned about efficiency (data connections into 
the cloud are typically poor, VM overhead is
considerable) and security/confidentiality when putting scientific workflows 
into the cloud.
What is your take on this?

Best,
Peter


PS. I love the "serverless" metaphor. Gets rid of all the problems of 
computers. ;)

On 06/12/2017 06:02 PM, Marianne Corvellec wrote:
> Hi Justin,
>
> Thank you so much for the quick reply!
>
> I'm going to give this new package a try.
>
> Best,
> Marianne
>
> On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes <jkit...@berkeley.edu> wrote:
>> Hi Marianne,
>>
>> PyWren by Eric Jonas sounds like it's pretty similar to what you're 
>> looking for -
>>
>> http://pywren.io/
>>
>> It's a relatively new package that's still in active development, but Eric 
>> is very interested in expanding it (and has some support from the riselab at 
>> UC Berkeley to do so). I know that he's also actively looking for use cases, 
>> so I'd definitely suggest getting in touch with him if you're interested.
>>
>> Best,
>>
>> Justin
>>
>> --
>> Justin Kitzes
>> Energy and Resources Group
>> Berkeley Institute for Data Science
>> University of California, Berkeley
>>
>>> On Jun 9, 2017, at 6:51 AM, Marianne Corvellec 
>>> <marianne.corvel...@gmail.com> wrote:
>>>
>>> Dear community,
>>>
>>> I'm curious as to whether some of you might have worked on or used 
>>> solutions such as AWS Lambda in the context of your scientific 
>>> research.
>>>
>>> If so, have you documented it in a blog post that you could share?
>>> Thanks in advance!
>>>
>>> Without even considering workflows or full-fledged projects, 
>>> wouldn't we want to be able to make a standard API call to, say, fit 
>>> a polynomial to some data?  Is anyone aware of any effort in this 
>>> direction?
>>>
>>> A friend of mine just drew my attention to this general issue, which 
>>> touches on open science and reproducible research...  In the 
>>> meantime, I'll encourage him to join this mailing list!
>>>
>>> Thank you,
>>> Marianne
>>> ___
>>> Discuss mailing list
>>> Discuss@lists.software-carpentry.org
>>> http://lists.software-carpentry.org/listinfo/discuss
>>
>

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-13 Thread alexsa...@gmail.com
Hi Peter,

I wouldn't be able to use such services with clinical data. It's totally not
an option for me.
Although I've seen some talks and the performance seems quite competitive
since scalability is easy. It's true that uploading a big quantity of data
can take considerable time and bandwidth; some labs use the weekends for
data uploading. One problem may be to convince University fund managers to
pay for external computing services when they already provide HPC services.

My five cents...

On Tue, 13 Jun 2017, 13:38 Peter Steinbach,  wrote:

> Dear both,
>
> as a side note (and my apologies for digressing), I was wondering how
> popular cloud computing for data processing at scale in an academic
> context is in the US or elsewhere?
>
> Here in Europe, many universities run their own HPC centers where people
> can sign up to process larger amounts of data or do larger simulations
> or whatnot ... mostly people here are concerned about efficiency (data
> connections into the cloud are typically poor, VM overhead is
> considerable) and security/confidentiality when putting scientific
> workflows into the cloud.
> What is your take on this?
>
> Best,
> Peter
>
>
> PS. I love the "serverless" metaphor. Gets rid of all the problems of
> computers. ;)
>
> On 06/12/2017 06:02 PM, Marianne Corvellec wrote:
> > Hi Justin,
> >
> > Thank you so much for the quick reply!
> >
> > I'm going to give this new package a try.
> >
> > Best,
> > Marianne
> >
> > On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes 
> wrote:
> >> Hi Marianne,
> >>
> >> PyWren by Eric Jonas sounds like it's pretty similar to what you're
> looking for -
> >>
> >> http://pywren.io/
> >>
> >> It's a relatively new package that's still in active development, but
> Eric is very interested in expanding it (and has some support from the
> riselab at UC Berkeley to do so). I know that he's also actively looking
> for use cases, so I'd definitely suggest getting in touch with him if
> you're interested.
> >>
> >> Best,
> >>
> >> Justin
> >>
> >> --
> >> Justin Kitzes
> >> Energy and Resources Group
> >> Berkeley Institute for Data Science
> >> University of California, Berkeley
> >>
> >>> On Jun 9, 2017, at 6:51 AM, Marianne Corvellec <
> marianne.corvel...@gmail.com> wrote:
> >>>
> >>> Dear community,
> >>>
> >>> I'm curious as to whether some of you might have worked on or used
> >>> solutions such as AWS Lambda in the context of your scientific
> >>> research.
> >>>
> >>> If so, have you documented it in a blog post that you could share?
> >>> Thanks in advance!
> >>>
> >>> Without even considering workflows or full-fledged projects, wouldn't
> >>> we want to be able to make a standard API call to, say, fit a
> >>> polynomial to some data?  Is anyone aware of any effort in this
> >>> direction?
> >>>
> >>> A friend of mine just drew my attention to this general issue, which
> >>> touches on open science and reproducible research...  In the meantime,
> >>> I'll encourage him to join this mailing list!
> >>>
> >>> Thank you,
> >>> Marianne

-- 

Sent from my phone, sorry for brevity or typos.

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-13 Thread Peter Steinbach

Dear both,

as a side note (and my apologies for digressing), I was wondering how 
popular cloud computing for data processing at scale in an academic 
context is in the US or elsewhere?


Here in Europe, many universities run their own HPC centers where people 
can sign up to process larger amounts of data or do larger simulations 
or whatnot ... mostly people here are concerned about efficiency (data 
connections into the cloud are typically poor, VM overhead is 
considerable) and security/confidentiality when putting scientific 
workflows into the cloud.

What is your take on this?

Best,
Peter


PS. I love the "serverless" metaphor. Gets rid of all the problems of 
computers. ;)


On 06/12/2017 06:02 PM, Marianne Corvellec wrote:

Hi Justin,

Thank you so much for the quick reply!

I'm going to give this new package a try.

Best,
Marianne

On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes  wrote:

Hi Marianne,

PyWren by Eric Jonas sounds like it's pretty similar to what you're looking for 
-

http://pywren.io/

It's a relatively new package that's still in active development, but Eric is 
very interested in expanding it (and has some support from the riselab at UC 
Berkeley to do so). I know that he's also actively looking for use cases, so 
I'd definitely suggest getting in touch with him if you're interested.

Best,

Justin

--
Justin Kitzes
Energy and Resources Group
Berkeley Institute for Data Science
University of California, Berkeley


On Jun 9, 2017, at 6:51 AM, Marianne Corvellec  
wrote:

Dear community,

I'm curious as to whether some of you might have worked on or used
solutions such as AWS Lambda in the context of your scientific
research.

If so, have you documented it in a blog post that you could share?
Thanks in advance!

Without even considering workflows or full-fledged projects, wouldn't
we want to be able to make a standard API call to, say, fit a
polynomial to some data?  Is anyone aware of any effort in this
direction?

A friend of mine just drew my attention to this general issue, which
touches on open science and reproducible research...  In the meantime,
I'll encourage him to join this mailing list!

Thank you,
Marianne

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-12 Thread Marianne Corvellec
Hi Justin,

Thank you so much for the quick reply!

I'm going to give this new package a try.

Best,
Marianne

On Fri, Jun 9, 2017 at 11:20 AM, Justin Kitzes  wrote:
> Hi Marianne,
>
> PyWren by Eric Jonas sounds like it's pretty similar to what you're looking 
> for -
>
> http://pywren.io/
>
> It's a relatively new package that's still in active development, but Eric is 
> very interested in expanding it (and has some support from the riselab at UC 
> Berkeley to do so). I know that he's also actively looking for use cases, so 
> I'd definitely suggest getting in touch with him if you're interested.
>
> Best,
>
> Justin
>
> --
> Justin Kitzes
> Energy and Resources Group
> Berkeley Institute for Data Science
> University of California, Berkeley
>
>> On Jun 9, 2017, at 6:51 AM, Marianne Corvellec 
>>  wrote:
>>
>> Dear community,
>>
>> I'm curious as to whether some of you might have worked on or used
>> solutions such as AWS Lambda in the context of your scientific
>> research.
>>
>> If so, have you documented it in a blog post that you could share?
>> Thanks in advance!
>>
>> Without even considering workflows or full-fledged projects, wouldn't
>> we want to be able to make a standard API call to, say, fit a
>> polynomial to some data?  Is anyone aware of any effort in this
>> direction?
>>
>> A friend of mine just drew my attention to this general issue, which
>> touches on open science and reproducible research...  In the meantime,
>> I'll encourage him to join this mailing list!
>>
>> Thank you,
>> Marianne

Re: [Discuss] Serverless scientific computing (function as a service)

2017-06-09 Thread Justin Kitzes
Hi Marianne,

PyWren by Eric Jonas sounds like it's pretty similar to what you're looking for 
-

http://pywren.io/

It's a relatively new package that's still in active development, but Eric is 
very interested in expanding it (and has some support from the riselab at UC 
Berkeley to do so). I know that he's also actively looking for use cases, so 
I'd definitely suggest getting in touch with him if you're interested.
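Marianne's original question (making a standard API call to fit a
polynomial to some data) maps directly onto this function-as-a-service
model. Here's a minimal hypothetical sketch in pure Python with an
AWS-Lambda-style handler signature (illustrative only; a real
deployment would be packaged for the provider, or dispatched through
something like PyWren):

```python
import json

def handler(event, context=None):
    """Lambda-style entry point: least-squares straight-line fit
    (degree-1 polynomial) to the points in the request."""
    xs, ys = event["x"], event["y"]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return {"statusCode": 200,
            "body": json.dumps({"slope": slope, "intercept": intercept})}

# Locally this is just a function call; in the cloud, the same call
# would arrive as an HTTP request to the service endpoint.
response = handler({"x": [0, 1, 2], "y": [1, 3, 5]})
```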

Best,

Justin

-- 
Justin Kitzes
Energy and Resources Group
Berkeley Institute for Data Science
University of California, Berkeley

> On Jun 9, 2017, at 6:51 AM, Marianne Corvellec  
> wrote:
> 
> Dear community,
> 
> I'm curious as to whether some of you might have worked on or used
> solutions such as AWS Lambda in the context of your scientific
> research.
> 
> If so, have you documented it in a blog post that you could share?
> Thanks in advance!
> 
> Without even considering workflows or full-fledged projects, wouldn't
> we want to be able to make a standard API call to, say, fit a
> polynomial to some data?  Is anyone aware of any effort in this
> direction?
> 
> A friend of mine just drew my attention to this general issue, which
> touches on open science and reproducible research...  In the meantime,
> I'll encourage him to join this mailing list!
> 
> Thank you,
> Marianne