Dear MirageOS developers,

I DID NOT KNOW WHAT FORUM TO POST THIS ON SO I AM SENDING IT TO YOU FIRST. 
SUGGESTIONS AND ANSWERS ARE WELCOME.

I have questions regarding my tentative choice of language/ecosystem, but felt 
the need to frame them in a rather detailed context. I apologize ahead of time 
for the verbosity. It is, in fact, long enough to be a blog post.

MY BACKGROUND:

As a retired engineering research worker and former professor, I want to 
continue research on my own but I need a (free) programming language ecosystem 
to work in. I used to use Matlab, Mathematica, Fortran, C at times, etc. when 
implementing the algorithm-development aspects of my work, and have done 
engineering and scientific computing since the punch card days.  I have already 
decided on a strongly typed functional language from the ML family after a LOT 
of research and reading, so please keep this in mind. I AM very much interested 
in deployment, not just in using my software myself, and this includes the 
cloud for example. I have not as yet learned/tried any of the following (or any 
other functional) programming languages. At my age I want to take the time to 
get it right the first time before investing heavily in a new 
language/ecosystem.

SOME FURTHER CONTEXT:

Currently we are seeing the evolution from traditional-OS-centric to 
cloud/hypervisor/container-OS/unikernel-centric computing well under way and 
pervasive. The client/web side of things are now dominated by ARM 
devices/phones as well. This paradigm shift over the last two decades has been 
dramatic. As the next step, I believe the current cloud will gradually evolve 
more from centralized large clusters of servers to massively 
geographically-distributed micro-cloud/fog systems and edge computing, 
complementing the growing IoT trend. A few years back (2013) some papers on the 
Beowulf and on the Iridis-Pi high performance clusters, on the Bolzano 
Raspberry Pi Cloud, and on the Glasgow Raspberry Pi Cloud 
demonstrated/prototyped micro-cloud/fog and HP-cluster systems based on the 
Raspberry Pi version as of that time. These boards had no hardware support for 
virtualization and had too little memory (256/512 MB) for Xen, as well as 
comparatively slow Ethernet/USB (100 Mbps/USB 2.0) for inter-Pi communication. Now 
however, as an example at the time of this post, one can get (from Pine64 for 
$45) the rPi form factor ROCK64 board with a Rockchip RK3328 Quad-Core ARM 
Cortex A53 64-Bit (up to 1.5 GHz) Processor, 4GB 1600MHz LPDDR3 memory, true 
Gigabit Ethernet, USB 3.0, eMMC module socket and microSD card slot for 
persistent storage/booting, and ARM Mali-450MP2 Dual-core GPU (4K60P capable). 
The Cortex-A53 is an ARMv8-A core with hardware Virtualization Extensions, 
which, together with the 4GB of RAM, makes this current hardware capable of 
fully running the Xen hypervisor commonly in use on the cloud, for example. The 
Gigabit Ethernet makes inter-board and external communication fast. The only 
downside to this particular board/chip is that its Mali-450 GPU does not 
support OpenCL. Otherwise, boards like this can potentially function as the 
building blocks of a real micro-cloud/fog system, or a high performance 
distributed-cluster system. It should be noted, however, that there are other 
ARM Mali GPUs that do support OpenCL, so GPGPU programming on such GPUs, 
alongside the ARM CPU cores, is possible. The hardware has caught 
up.

Ignoring the above in my decision would be folly.


*****

THE OTHER (SECONDARY) CANDIDATE LANGUAGES:

along with my MAIN objections to them (there may be more, not listed); the 
ordering is not significant:

(NOTE: The JVM was more of a negative to me than the positive it may be for 
many enterprise-type programmers, so please no suggestions regarding languages 
from that family. This includes Scala. Also, on a different note, my 
understanding is that Erlang (and its derivatives) is meant mainly for IO-bound 
concurrency, as opposed to CPU-bound parallelism, so I did not look into it 
very far - correct me if I am wrong. I would like BOTH capabilities. I have 
also looked at Julia and others, and do not care to debate how I narrowed 
my list to those in this post or to rehash those issues.)

1) Haskell: lazy by default, but I want strict evaluation with opt-in laziness. 
I understand that laziness allows the separation of data producers from data 
consumers, with the resulting modularity and the convenience of not having to 
keep track of computation order, etc., but I believe this is best offered on an 
opt-in basis, as in Clojure for example. Haskell is clearly the best 
lazy-by-default language in my view, if one wants that. Please do not try to 
change my mind on lazy by default; it would waste effort and time. Also, it is 
not always clear which of Haskell's multiple libraries, when built for the same 
or similar purpose, is best practice to use.

2) Standard ML: to my understanding, less actively used outside of academic 
circles than the others, and how active/large is its ecosystem? It does have 
MLton though.

3) F*: maturity and related issues bother me, but the OCaml and F# ecosystems 
are apparently available to the extent that it targets those. The 
refinement/dependent type system is its most attractive feature. Will it 
survive/grow beyond its current niche use though?

4) Idris: gaining in maturity, but still a small ecosystem (and no visible use 
in production?). It can leverage its target language's ecosystems perhaps, and 
it may be the best dependent typed language for practical regular (non-proof) 
programming. However, how long will it be maintained (as it is a research 
project): will the main author/developer move on one day for other research? 
This is even pointed out by its developer as a reason to not use it for 
production. I have partially read its associated book and found it very nice.

5) F#: is strongly tied to Microsoft so it has compromises/constraints from 
being in the CLI family of languages and .NET framework. In addition, Microsoft 
now treats it and .NET as second (third?) class citizens apparently. This 
latter point was even made by a strong proponent of F# (see Jon Harrop's 
comments at
https://www.quora.com/Why-did-the-ML-style-languages-SML-OCaml-F-fail-to-gain-any-traction)

6) ATS: There seem to be very few people using it, so the user base would be 
small when asking for help/advice, and hence there would also be fewer 
examples to go by. This is compounded by a long, steep learning curve, to my 
understanding, during which such help is needed. I have similar concerns 
regarding its ecosystem being small. This is all a shame since it may be a 
promising dependent-typed language (with linear types too) for scientific use 
otherwise.
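
To make the "strict with opt-in laziness" preference in item 1 concrete, here 
is a minimal sketch of how it looks in OCaml, using only the standard `lazy` 
keyword and `Lazy.force`. The names `expensive`, `stream`, `from` and `take` 
are made up for illustration and are not from any library. The second half 
shows the producer/consumer separation mentioned above: the producer describes 
an unbounded sequence, and the consumer decides how much of it is computed.

```ocaml
(* Strict by default: a function body runs when the function is called. *)
let counter = ref 0

let expensive () =
  incr counter;   (* record that the computation actually ran *)
  42

(* Opt-in laziness: [lazy] suspends the computation until it is forced. *)
let suspended = lazy (expensive ())

let () =
  assert (!counter = 0);              (* nothing has run yet *)
  let a = Lazy.force suspended in     (* runs [expensive] once *)
  let b = Lazy.force suspended in     (* reuses the cached result *)
  assert (a = 42 && b = 42);
  assert (!counter = 1)

(* Producer/consumer separation with an explicitly lazy stream
   (hypothetical type, defined here only for the example). *)
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* Producer: the unbounded sequence n, n+1, n+2, ... *)
let rec from n = Cons (n, lazy (from (n + 1)))

(* Consumer: take the first k elements, forcing only what it needs. *)
let rec take k (Cons (x, rest)) =
  if k = 0 then [] else x :: take (k - 1) (Lazy.force rest)

let first_three = take 3 (from 1)
```

Everything outside the `lazy (...)` expressions is evaluated strictly, so 
laziness appears only where it is asked for.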


*****

CURRENT MAIN CONTENDER:

OCAML

before the questions/concerns,
SOME PLUSES FIRST:

1) MirageOS (this is very big in my view, pushing OCaml over the edge as the 
leader): This looks to be a development platform for the future: for cloud, 
ARM, IoT, and micro-cloud/fog/HP-cluster computing in addition to Unix/Linux/ 
Windows and containers on PC/servers/VMs, etc. One (MirageOS) development 
environment => wide open type-safe (and compiler 
whole-appliance/unikernel-optimized) deployment options.

2) Fast single-thread/processor/core performance, at least for an ML language

3) Though it is not multicore, there are still many distributed parallel 
computing options: JoCaml, CIEL, Opis, Functory, BSML (and related), async 
parallel from Jane Street, MPI, Parmap, Sklml, Ocamlnet, and of course forking 
in Unix,... and perhaps more I missed.

4) F* targets OCaml, for refinement/dependent type capability, so OCaml can 
leverage this to gain those capabilities.

5) Coq (dependently typed) extracts to OCaml (for verification/proofs, etc), so 
this capability of Coq can be leveraged by OCaml where needed/desired as well.

6) SPOC library for GPGPU programming at a high, type-safe level using OCaml 
(compatible with both CUDA and OpenCL)

7) Ocsigen libraries

8) Jane Street libraries

9) Its excellent module system of course

....and more


*****

FINALLY,
CONCERNS/QUESTIONS:

1) THIS IS A WELL-KNOWN/WELL-DISCUSSED CRITICISM. OCaml has a GIL (no shared 
memory multicore) and no type classes. Apparently multicore support has long 
been promised as in progress but has been slow in coming to fruition, 
and that has bred skepticism in at least some. Does multicore (versus 
parallel-distributed) become less relevant as the number of cores per node 
grows? My understanding, and I have heard it argued, is that multicore behavior 
approaches distributed behavior with increasing core number. Is this true? 
Modular implicits are supposed to be on the way too (which are supposed to be 
better than type classes?), but how far off are they for mature 
production-quality use?

2) This, I think, is a big missing piece for MirageOS itself (especially in 
light of OCaml's GIL and the "no-forking" nature of unikernels): If I use 
MirageOS as my development platform for OCaml, which is what I would prefer, 
what is the parallel distributed computing implementation that MirageOS will 
use? (Note that this is different from Jitsu producing a swarm of application 
copies for IO demand-response, from what I understand of it.) I have read that 
it will be based on the join calculus, but there seems to be some question as 
to what lies beyond that specifically. I have seen JoCaml, CIEL, and Opis 
mentioned as possibilities. I assume this will involve communicating 
unikernels, spread over multiple cores when on a single machine, and possibly 
scaling up to multiple machines beyond that, for a given application running in 
parallel. As a scientific programmer I might often want parallel computing 
capability for any given application I deploy. Obviously I will be limited 
mostly to coarse-grained parallelism, this environment being distributed.

3) This is another potentially big missing piece from MirageOS: What are the 
GPGPU programming capabilities available in the MirageOS ecosystem? Is the SPOC 
library available and usable through MirageOS on hypervisors on which MirageOS 
can run? If so can it be utilized in conjunction with the distributed computing 
solution discussed in the previous question above at the same time? In other 
words, can a unikernel in a pool of parallel-distributed communicating 
unikernels access GPGPU programming resources (via SPOC, clMAGMA, or some 
alternative) on the node on which it resides?

4) To what extent can the above two capabilities, if present (GPGPU and 
parallel-distributed programming), mitigate the lack of multicore capability, 
especially regarding MirageOS? In this context, what are my options for fast 
unboxed linear algebra computations, especially running on MirageOS? Would 
LAPACK routines (which are usually Fortran/C/C++) be viable? Obviously shared 
in-place memory manipulation of unboxed arrays is very efficient in this 
context, as in multicore, but can new GPGPU capability compensate? For example, 
the OpenCL BLAS and clMAGMA libraries for OpenCL, and the cuBLAS and NVBLAS 
libraries for CUDA, come to mind, or something similarly able to do linear 
algebra on the GPU. More generally, has the Xen/MirageOS community looked into 
support for the scientific computing community? (I do not mean that they would 
necessarily need/want to compete with the more-niche sub-community of 
professional HPC for speed.) I am not just talking about Big Data input and 
then visualization/exploring/manipulating the data by the way. Some might want 
to run large simulations (physical, biological, etc) of some sort for example.

5) Not a deal-breaker, but MirageOS needs a real user guide/manual, preferably 
available as a PDF, or better yet a book, and not just the blog posts that 
mostly exist now. I would also like more documentation beyond 
concept-introduction papers, emphasizing the knowledge a developer needs to 
actually use MirageOS day to day. Correct me if I am wrong by pointing out such 
references. Ultimately, a book that also addressed the above 4 issues would be 
fantastic.
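
As an illustration of what "in-place memory manipulation of unboxed arrays" in 
item 4 can look like in plain OCaml, here is a minimal sketch using the 
standard Bigarray module (assumed: OCaml 4.07 or later, where Bigarray is part 
of the standard library). The function name `axpy` is borrowed from BLAS 
terminology but is defined here by hand; it is not a binding to BLAS or 
LAPACK, and whether this or a real LAPACK binding is usable inside a MirageOS 
unikernel is exactly the open question above.

```ocaml
open Bigarray

(* In-place update y <- a*x + y over unboxed 64-bit float arrays.
   No intermediate array is allocated; [y] is overwritten. *)
let axpy a (x : (float, float64_elt, c_layout) Array1.t)
           (y : (float, float64_elt, c_layout) Array1.t) =
  let n = Array1.dim x in
  if Array1.dim y <> n then invalid_arg "axpy: dimension mismatch";
  for i = 0 to n - 1 do
    y.{i} <- a *. x.{i} +. y.{i}
  done

(* Small example vectors, stored outside the OCaml heap as raw doubles. *)
let x = Array1.of_array float64 c_layout [| 1.0; 2.0; 3.0 |]
let y = Array1.of_array float64 c_layout [| 10.0; 20.0; 30.0 |]

let () = axpy 2.0 x y   (* y now holds 12, 24, 36 *)
```

Because Bigarray data lives in a flat buffer with a C or Fortran layout, it is 
also the usual way OCaml code passes arrays to foreign numerical routines 
without copying.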

If the above 5 have good answers forthcoming, or at least in the works for the 
near future, then that future could look very attractive and bright for 
OCaml/MirageOS, at least to me, and probably to many.

Sincerely,
Luther Flippen


_______________________________________________
MirageOS-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/mirageos-devel