On 12/7/18, 8:46 AM, "Beowulf on behalf of Michael Di Domenico"
<[email protected] on behalf of [email protected]> wrote:
On Fri, Dec 7, 2018 at 11:35 AM John Hanks <[email protected]> wrote:
>
> But, putting it in a container wouldn't make my life any easier and
would, in fact, just add yet another layer of something to keep up to date.
i think the theory behind this is that containers allow the sysadmins
to kick the can down the road and put the onus of updates on the
container developer. but then you get into a circle-of-trust issue,
whereby now you have to trust that the container developers are doing
something sane, and in a timely manner.
a perfect example that we pitched to our security team (this was a
few years ago, mind you): what happens when someone embeds openssl
libraries in the container? who's responsible for updating them?
what happens when that container gets abandoned by the dev? and those
containers are running with some sort of docker/root privilege
menagerie. this was back when openssl had bugs coming up left and
right. yeah, that conversation stopped dead in its tracks and we put
a moratorium on docker.
but i don't think the theory lines up with the practice, and that's
why devs shouldn't be doing ops
This is a generic problem in areas other than HPC. Over the past few years, a
fair amount of the software I've been working with has been targeted at
spacecraft platforms, and we had an interesting exercise over the past couple
of years. I was
porting a standard orbit propagation package (SGP4, see
http://www.celestrak.com/ for the Pascal version from 2000), which is available
in many different languages. I happened to be implementing the C version in
RTEMS running on a SPARC V8 processor (the LEON2 and LEON3, as it happens).
The software itself is quite compact, has no dependencies other than math.h,
stdio.h, and stdlib.h, and derives from an original Fortran version. RTEMS is a
real-time operating system that exposes a POSIX API, so it's easy to work with.
What we did was create a wrapper for SGP that matches a standardized set of APIs
for software radios (Space Telecommunications Radio System, STRS).
But here's the problem: there are really four different target hardware
platforms, all theoretically the same, but not. In the space flight software
business, you choose a toolchain and development environment at the beginning
of the project (Phase A - Formulation) and you stay with it for the life of
the mission, unless there's a compelling reason to change. In the course of
the last 10 years, we've gone through 5 versions of RTEMS (4.8, 4.10, 4.11,
4.12, 5.0), 3 different source management tools (cvs, svn, git), an
IDE that came and went (Eclipse), not to mention a variety of versions of the
gcc toolchain. Each mission has its own set of all of this. And, a bunch of
homegrown makefiles and related build processes. And, of course, it's a
hodgepodge of CentOS, Scientific Linux, Ubuntu, Debian, and RH, depending on
what was the "most supported distro" at the time the mission picked it (which
might depend on who the SysAdmin on the project was).
10 years is *forever* in the software development world. I've not yet worked
with a developer who was born after the first version of the flight software
they're working on was created - but I know that other people at JPL have (when
it takes 7 years to get to where you're going, and the mission lasts 10-15
years after that...). And this is perfectly reasonable - SGP4, for instance,
basically implements the laws of physics as a numerical model - it worked fine
in 2000, it works fine now, it's going to work just fine in 2030, with no
significant changes. "The SGP4 and SDP4 models were published along with sample
code in FORTRAN IV in 1988 with refinements over the original model to handle
the larger number of objects in orbit since" (Wikipedia article on SGP)
So, "inheriting" the SGP4 propagator from one project into another is not just
a matter of moving the source code for SGP. You have to compile it with all the
other stuff, and there are myriad hidden dependencies: does this platform have
hardware floating point or software-emulated floating point, and if the latter,
which of several flavors? Where in the source tree (for that project) does it
sit? What's the permissions strategy? Where do you add it in the build process?
And then contemplate propagating a bug fix across all those platforms. You
might decide to propagate a change to some, but not all, platforms - maybe
the spacecraft you're contemplating is getting toward the end of its life, and
you'll never again use the function you developed 4 years ago. Do you put the
bug fix that addresses the incorrect gravitational parameter at Mars into the
systems that are orbiting Earth?
Yes - folks have said "put it in containers" and in the last few years, folks
have started spinning up VMs to manage this. Historically, we keep "systems
under glass" - once you've got the build PCs working, you preserve them for the
project forever. The problem is that PCs fail eventually. But whether it is
keeping half a dozen PCs on a shelf running, or half a dozen VMs running, it's
really the same administrative burden - they all need to have annual security
audits, perhaps have patches applied (if it's "on the network"). And you've
really not addressed the underlying problem of needing to support a remarkable
variety of heterogeneous platforms. You've basically saved the physical space
on a shelf for all those PCs.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf