[MirageOS-devel] Questions from potential new MirageOS Ocaml user

Luther Flippen Mon, 30 Apr 2018 06:14:24 -0700

Dear MirageOS developers,

I DID NOT KNOW WHAT FORUM TO POST THIS ON SO I AM SENDING IT TO YOU FIRST. 
SUGGESTIONS AND ANSWERS ARE WELCOME.

I have questions regarding my tentative choice of language/ecosystem, but felt
the need to frame them in a rather detailed context. I apologize ahead of time
for the verbosity. It is in fact long enough to be a blog post.

MY BACKGROUND:

As a retired engineering research worker and former professor, I want to
continue research on my own but I need a (free) programming language ecosystem
to work in. I used to use Matlab, Mathematica, Fortran, C at times, etc. when
implementing the algorithm-development aspects of my work, and have done
engineering and scientific computing since the punch card days. I have already
decided on a strongly typed functional language from the ML family after a LOT
of research and reading, so please keep this in mind. I AM very much interested
in deployment, not just in using my software myself, and this includes the
cloud for example. I have not as yet learned/tried any of the following (or any
other functional) programming language. At my age I want to take the time to
get it right the first time before investing heavily in a new
language/ecosystem.

SOME FURTHER CONTEXT:

Currently we are seeing the evolution from traditional-OS-centric to
cloud/hypervisor/container-OS/unikernel-centric computing well under way and
pervasive. The client/web side of things is now dominated by ARM
devices/phones as well. This paradigm shift over the last two decades has been
dramatic. As the next step, I believe the current cloud will gradually evolve
more from centralized large clusters of servers to massively
geographically-distributed micro-cloud/fog systems and edge computing,
complementing the growing IoT trend. A few years back (2013) some papers on the
Beowulf and on the Iridis-Pi high performance clusters, on the Bolzano
Raspberry Pi Cloud, and on the Glasgow Raspberry Pi Cloud
demonstrated/prototyped micro-cloud/fog and HP-cluster systems based on the
Raspberry Pi version available at that time. These boards had no hardware
support for virtualization and had too little memory (256/512 MB) for Xen, as
well as comparatively slow Ethernet/USB (100 Mbps/USB 2.0) for inter-Pi
communication. Now
however, as an example at the time of this post, one can get (from Pine64 for
$45) the rPi form factor ROCK64 board with a Rockchip RK3328 Quad-Core ARM
Cortex A53 64-Bit (up to 1.5 GHz) Processor, 4GB 1600MHz LPDDR3 memory, true
Gigabit Ethernet, USB 3.0, eMMC module socket and microSD card slot for
persistent storage/booting, and ARM Mali-450MP2 Dual-core GPU (4K60P capable).
The Cortex-A53 is an ARMv8-A core with Virtualization Extensions hardware
support, which, together with the 4GB of RAM, makes this hardware capable of
fully running the Xen hypervisor commonly in use on the cloud, for example. The
Gigabit Ethernet makes inter-board and external communication fast. The only
downside to this particular board/chip is that its Mali-450 GPU does not
support OpenCL. Otherwise, this board can potentially function as the
building block of a real micro-cloud/fog system, or of a high performance
distributed-cluster system. It should be noted, however, that there are other
ARM Mali GPUs that do support OpenCL, so GPGPU programming on such GPUs,
alongside the ARM CPU cores, is possible. The hardware has caught
up.

Ignoring the above in my decision would be folly.

*****

THE OTHER (SECONDARY) CANDIDATE LANGUAGES:

along with my MAIN objections to them (there may be more, not listed), in no
particular order:

(NOTE: The JVM was more of a negative to me than the positive it may be for
many enterprise-type programmers, so please no suggestions regarding languages
from that family. This includes Scala. Also, on a different note, my
understanding is that Erlang (and its derivatives) is meant mainly for IO-bound
concurrency, as opposed to CPU-bound parallelism, so I did not look into it
very far - correct me if I am wrong. I would like BOTH capabilities. I have
also looked at Julia and others, and do not care to debate how I narrowed
my list to those in this post or to re-hash those issues.)

1) Haskell: lazy by default, but I want strict, with opt-in laziness, not
laziness by default. I understand laziness allows the separation of data
producers from data consumers, with its subsequent modularity, and the
convenience of not having to keep track of computation order, etc. but I
believe this is best offered on an opt-in basis, as in Clojure for example.
Haskell is clearly the best lazy by default language in my view, if one wants
that. Please do not try to change my mind on lazy by default, wasting effort
and time. Also, it is not always clear which of Haskell's multiple libraries,
when built for the same or similar purpose, is best-practice to use.

2) Standard ML: less actively used outside of academic circles than the others,
to my understanding, and how active/large is its ecosystem? It does have MLton
though.

3) F*: maturity and related issues bother me, but the Ocaml and F# ecosystems
are apparently available to the extent that it targets those. The
refinement/dependent type system is its most attractive feature. Will it
survive/grow beyond its current niche use though?

4) Idris: gaining in maturity, but still a small ecosystem (and no visible use
in production?). It can leverage its target language's ecosystems perhaps, and
it may be the best dependent typed language for practical regular (non-proof)
programming. However, how long will it be maintained (as it is a research
project): will the main author/developer move on one day for other research?
This is even pointed out by its developer as a reason to not use it for
production. I have partially read its associated book and found it very nice.

5) F#: is strongly tied to Microsoft so it has compromises/constraints from
being in the CLI family of languages and .NET framework. In addition, Microsoft
now treats it and .NET as second (third?) class citizens apparently. This
latter point was even made by a strong proponent of F# (see Jon Harrop's
comments at
https://www.quora.com/Why-did-the-ML-style-languages-SML-OCaml-F-fail-to-gain-any-traction)

6) ATS: There seem to be very few people using it, so the user base would be
small regarding asking for help/advice, and hence there would also be fewer
examples to go by. This is compounded by a long, steep learning curve, to my
understanding, during which such help is most needed. I have similar concerns
regarding its ecosystem being small. This is all a shame since it may otherwise
be a promising dependent-typed language (with linear types too) for scientific
use.
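
As an illustration of the "strict, with opt-in laziness" preference stated
under Haskell in item 1 above, here is a minimal sketch using the Lazy module
of the OCaml standard library. The names expensive and calls are hypothetical
and exist only to make the evaluation order observable; this is a sketch, not
a recommendation of a particular style.

```ocaml
(* Counter used only to observe how many times the computation runs. *)
let calls = ref 0

(* A hypothetical "expensive" computation. *)
let expensive () =
  incr calls;
  List.fold_left ( + ) 0 [1; 2; 3; 4]

(* Strict by default: the right-hand side is evaluated immediately. *)
let strict_value = expensive ()

(* Opt-in laziness: [lazy] builds a suspended computation; nothing runs yet. *)
let suspended : int Lazy.t = lazy (expensive ())

(* Only the strict call has run at this point. *)
let calls_before_force = !calls

(* Forcing runs the computation once and caches the result. *)
let first = Lazy.force suspended

(* The second force is served from the cache, so [expensive] is not re-run. *)
let second = Lazy.force suspended

let calls_after_force = !calls
```

Evaluation stays strict everywhere except where lazy is written explicitly,
and a forced value is computed at most once.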

*****

CURRENT MAIN CONTENDER:

OCAML

before the questions/concerns,
SOME PLUSES FIRST:

1) MirageOS (this is very big in my view, pushing Ocaml over the edge as the
leader): This looks to be a development platform for the future: for cloud,
ARM, IoT, and micro-cloud/fog/HP-cluster computing in addition to Unix/Linux/
Windows and containers on PC/servers/VMs, etc. One (MirageOS) development
environment => wide open type-safe (and compiler
whole-appliance/unikernel-optimized) deployment options.

2) Fast single-thread/processor/core performance, at least for an ML language

3) Though it is not multicore, there are still many distributed parallel
computing options: JoCaml, CIEL, Opis, Functory, BSML (and related), async
parallel from Jane Street, MPI, Parmap, SKLML, Ocamlnet, and of course forking
in Unix,... and perhaps more I missed.

4) F* targets Ocaml, for refinement/dependent type capability, so Ocaml can
leverage this to gain those capabilities.

5) Coq (dependently typed) extracts to Ocaml (for verification/proofs, etc), so
this capability of Coq can be leveraged by Ocaml where needed/desired as well.

6) SPOC library for GPGPU programming at a high, type-safe level using Ocaml
(compatible with both CUDA and OpenCL)

7) Ocsigen libraries

8) Jane Street libraries

9) Its excellent module system of course

....and more
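
To make item 9 concrete, here is a minimal sketch of the module system: a
signature, two implementations of it, and a functor written once against the
signature. All names (MONOID, Int_sum, String_concat, Fold) are hypothetical
and chosen only for illustration. Note that the implementation to use is
passed explicitly when the functor is applied; the compiler does not pick it.

```ocaml
(* A signature describes what an implementation must provide. *)
module type MONOID = sig
  type t
  val empty : t
  val combine : t -> t -> t
end

(* Two implementations of the same signature. *)
module Int_sum : MONOID with type t = int = struct
  type t = int
  let empty = 0
  let combine = ( + )
end

module String_concat : MONOID with type t = string = struct
  type t = string
  let empty = ""
  let combine = ( ^ )
end

(* A functor: generic code parameterised over any MONOID. *)
module Fold (M : MONOID) = struct
  let concat (xs : M.t list) : M.t = List.fold_left M.combine M.empty xs
end

(* The implementation is selected explicitly at functor application. *)
module Sum_fold = Fold (Int_sum)
module String_fold = Fold (String_concat)

let total = Sum_fold.concat [1; 2; 3]
let joined = String_fold.concat ["a"; "b"; "c"]
```

The same Fold code serves both element types, and type errors such as passing
a string list to Sum_fold.concat are rejected at compile time.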

*****

FINALLY,
CONCERNS/QUESTIONS:

1) THIS IS A WELL-KNOWN/WELL-DISCUSSED CRITICISM. Ocaml has a GIL (no shared
memory multicore) and no type classes. Apparently there has been a promise of
multicore development in progress that has been a long time coming to fruition,
and that has bred skepticism in at least some. Does multicore (versus
parallel-distributed) become less relevant as the number of cores per node
grows? My understanding, and I have heard it argued, is that multicore behavior
approaches distributed behavior as the core count increases. Is this true?
Modular implicits are supposed to be on the way too (which are supposed to be
better than type classes?), but how far off are they for mature
production-quality use?

2) This, I think, is a big missing piece for MirageOS itself (especially in
light of Ocaml's GIL and the "no-forking" nature of unikernels): If I use
MirageOS as my development platform for Ocaml, which is what I would prefer,
what is the parallel distributed computing implementation that MirageOS will
use? (Note that this is different from Jitsu producing a swarm of application
copies for IO demand-response, from what I understand of it.) I read that they
will base it on the join calculus, but there seems to be some question as to
what lies beyond that specifically. I have seen JoCaml, CIEL, and Opis mentioned as
possibilities. I assume this will involve communicating unikernels, spread over
multiple cores when on a single machine, and possibly scaling up to multiple
machines beyond that, for a given particular application running in parallel.
As a scientific programmer I might often want parallel computing capability for
any given application I deploy. Obviously I will be limited mostly to
coarse-grained parallelism, this environment being distributed.

3) This is another potentially big missing piece from MirageOS: What are the
GPGPU programming capabilities available in the MirageOS ecosystem? Is the SPOC
library available and usable through MirageOS on hypervisors on which MirageOS
can run? If so, can it be used at the same time as the distributed computing
solution discussed in the previous question? In other
words, can a unikernel in a pool of parallel-distributed communicating
unikernels access GPGPU programming resources (via SPOC, clMAGMA, or some
alternative) on the node on which it resides?

4) To what extent can the above two capabilities, GPGPU and
parallel-distributed programming, if present, mitigate the lack of multicore
capability, especially regarding MirageOS? In this context, what are my options
for fast unboxed linear algebra computations, especially running on MirageOS?
Would LAPACK routines be viable (which are usually Fortran/C/C++)? Obviously
shared in-place memory manipulation of unboxed arrays is very efficient in this
context, as in multicore, but can new GPGPU capability compensate? For example,
the OpenCL BLAS and clMAGMA libraries for OpenCL, and the cuBLAS and NVBLAS
libraries for CUDA, come to mind, or something similarly able to do linear
algebra on the GPU. More generally, has the Xen/MirageOS community looked into
support for the scientific computing community? (I do not mean that they would
necessarily need/want to compete with the more-niche sub-community of
professional HPC for speed.) I am not just talking about Big Data input and
then visualization/exploring/manipulating the data by the way. Some might want
to run large simulations (physical, biological, etc) of some sort for example.

5) Not a deal-breaker, but MirageOS needs a real user guide/manual, preferably
available in PDF, or better yet a book, and not just the blog posts it mostly
has now. I would like more documentation beyond concept-introduction papers too,
emphasizing the knowledge developers need to actually use MirageOS day to day.
Correct me if I am wrong by pointing out such references. Ultimately a book
that also addressed the above 4 issues would be fantastic.
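
For reference on the "unboxed linear algebra" part of question 4, here is a
minimal sketch of what unboxed, in-place array computation looks like in plain
OCaml, using the standard library's Bigarray module. The helper names
make_matrix and matvec are hypothetical. This only shows the data
representation on an ordinary OCaml installation; it does not answer whether
LAPACK or GPU libraries are usable from a MirageOS unikernel, which remains
the open question.

```ocaml
open Bigarray

(* Unboxed 64-bit floats stored contiguously in C (row-major) layout.
   Bigarray also offers fortran_layout for column-major storage. *)
type matrix = (float, float64_elt, c_layout) Array2.t
type vector = (float, float64_elt, c_layout) Array1.t

(* Build an n x m matrix whose entry (i, j) is [f i j]. *)
let make_matrix n m f : matrix =
  let a = Array2.create float64 c_layout n m in
  for i = 0 to n - 1 do
    for j = 0 to m - 1 do
      a.{i, j} <- f i j
    done
  done;
  a

(* y <- a * x, written in place into a preallocated vector y. *)
let matvec (a : matrix) (x : vector) (y : vector) =
  let n = Array2.dim1 a and m = Array2.dim2 a in
  for i = 0 to n - 1 do
    let acc = ref 0.0 in
    for j = 0 to m - 1 do
      acc := !acc +. a.{i, j} *. x.{j}
    done;
    y.{i} <- !acc
  done

(* Example: a = [[1 2 3]; [4 5 6]] and x = [1; 0; -1]. *)
let a = make_matrix 2 3 (fun i j -> float_of_int (i * 3 + j + 1))
let x : vector = Array1.of_array float64 c_layout [| 1.0; 0.0; -1.0 |]
let y : vector = Array1.create float64 c_layout 2
let () = matvec a x y
```

Because the data lives outside the OCaml heap in a C- or Fortran-compatible
layout, such arrays are the usual way to hand numeric data to external
routines without copying.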

If the above 5 have good answers forthcoming, at least in the works or in the
near future, then that future could look very attractive and bright for
Ocaml/MirageOS, at least to me, and probably to many.

Sincerely,
Luther Flippen


