I've just been learning OCaml for the first time; I've written OCaml
programs before, but I didn't really learn enough of the language to
do anything useful.  I thought I would take some time to reflect on
the experience of learning the language, since (barring brain damage)
I won't have the opportunity to learn OCaml a second time.

First, the positives.

All Kinds Of Good Stuff
=======================

OCaml has many advantages.  Even the bytecode interpreter, running
under PowerPC emulation on this Intel Mac, is much faster than
Python's interpreter or any JavaScript interpreter.  You can define
new infix operations easily, or replace old ones (although you can't
override the old ones for a particular type).  Pattern matching makes
all kinds of interpreter-like things a cinch.  It's easy to use infix
operations as values (merely by parenthesizing them).  Implicit
currying often makes programs briefer and sometimes even clearer.
User-defined constructors display nicely by default, and just in
general, tagged unions are nice to use.

Pattern-matching is especially nice; consider this:
     let rec toyaplshow = ... function ...
       | Parenthesized e -> "(" ^ toyaplshow e ^ ")" 
       | Binary (Atom _ as e1, f, e2) | Binary (Parenthesized _ as e1, f, e2) 
-> 
           toyaplshow e1 ^ " " ^ f ^ " " ^ toyaplshow e2
       | Binary (e, f, e2) -> 
             toyaplshow (Parenthesized e) ^ " " ^ f ^ " " ^ toyaplshow e2 ;;

Binary, Atom, and Parenthesized are three of the constructors for a
certain abstract type; toyaplshow is a recursive function for
displaying it as an expression; and "^" is string concatenation.  In
this case, I want to parenthesize the leftmost operand of a binary
operation if it's not one of Atom or Parenthesized.  Doing this in an
object-oriented style, the code is about twice as long.

    class Expr:
        def needs_parentheses_sometimes(self): return True
    class Binary(Expr):
        def toyaplshow(self):
            left = self.left
            if left.needs_parentheses_sometimes(): left = Parenthesized(left)
            return "%s %s %s" % (left.toyaplshow(), self.op, 
                self.right.toyaplshow())
    class Atom(Expr):
        def needs_parentheses_sometimes(self): return False
    class Parenthesized(Expr):
        def needs_parentheses_sometimes(self): return False
        def toyaplshow(self): return "(%s)" % (self.expr,)

In this case, there also happen to be more kinds of values than there
are operations on them, so dividing the operations along class lines
instead of along functional lines results in stronger coupling.

Things can get far hairier;  consider this example from the OCaml tutorial:
    let balance = function 
          Black, z, Node (Red, y, Node (Red, x, a, b), c), d 
        | Black, z, Node (Red, x, a, Node (Red, y, b, c)), d 
        | Black, x, a, Node (Red, z, Node (Red, y, b, c), d) 
        | Black, x, a, Node (Red, y, b, Node (Red, z, c, d)) -> 
            Node (Red, y, Node (Black, x, a, b), Node (Black, z, c, d)) 
        | a, b, c, d -> 
            Node (a, b, c, d) 

In theory, you can get access to all of the Perl and Python libraries
as well through FFIs to Perl and Python, but those FFIs don't seem to
have been included in the OCaml package I got.

As with Python and Perl, you get fairly full access to the POSIX API,
but with some improvements; for example, select() takes lists of
file-descriptor objects, gettimeofday() returns a float, and most of
the system calls raise exceptions when they fail instead of returning
an error code.  I'm not sure all the changes are necessarily
improvements; I might really prefer bitvectors for select() for
efficiency, for example, and I'm not sure I can create a
Unix.file_descr from an integer.  But mostly they are.

And there are optional and named arguments (which seem to mostly
work), an object system that supports fully-static duck typing (at
least as far as what methods your objects define), a profiler, a
debugger, a native-code compiler that is reputed to generate good
code, records with named fields, recursion, tail-call elimination,
etc.  No first-class continuations, but exceptions.  And the compiler
tells you if you left out a case in your pattern-matching case
analysis.

Some Bad Stuff Too
==================

I wrote most of this while working through Jason Hickey's tutorial;
these are my notes learning OCaml as an experienced programmer who
doesn't know much OCaml.  As programming language tutorials go, it's
not too bad, although it's less than practical-minded; he doesn't
mention I/O at all until chapter 8, or explain it in full until
chapter 9.

I'm not that impressed with the OCaml experience for newbies.
Learning a new programming language is always somewhat frustrating; it
usually takes several days of learning before you can do anything
useful.  But I've found OCaml particularly frustrating, despite its
many advantages, and I thought I would write a bit about the
frustrations in hopes that it might enable me to correct them later
on.

Some of these frustrations are deeply rooted in the nature of OCaml,
and are not likely to go away without a completely new language, and
will probably continue to bother me even when I know the language;
others, such as badly-worded error messages, are quite superficial.

I hope this doesn't attract flames; it's not intended to deprecate the
OCaml language, which I am very impressed with.  I'm a little worried
that comparing OCaml unfavorably to déclassé languages like Perl,
Python, and JavaScript will lead people to discount my opinion.  I
assure you, I know other programming languages well (C being my
favorite low-level language and one I know well), but I find Python
and JavaScript more practical than C most of the time.

Insufficiently Helpful Error Messages
-------------------------------------

Here's a typical newbie mistake.  I mistakenly wrote the pattern 

"(a, b) :: rest" as "a, b :: rest", thinking that would parse as I
intended (since, of course, [a, b; c, d] parses as [(a, b); (c, d)]).
The error message from the ocaml toplevel said:

    This pattern matches values of type 'a * 'b list
    but is here used to match values of type 'c list

Now, if you read the error message correctly, it says that I'm trying
to match a list (which ocaml knew because another pattern-matching
case was []) against a pattern for a tuple containing a list as a
second member.  But the type "'a * 'b list" is structurally ambiguous
in the same way that the pattern is; I parsed it as ('a * 'b) list,
which was what I intended, not 'a * ('b list), which is what the
compiler meant.

In short, I was communicating ambiguously to the compiler, which
misinterpreted what I meant as something that didn't make sense; in
order to explain how it interpreted my statement, it used an
explanation that was ambiguous in exactly the same way, so that it
appeared to me that the compiler had understood my intent, but for
some mysterious reason did not think it was a good idea to carry out.

Of course, anyone who knows the least little bit of OCaml can parse
the type, and furthermore is not likely to seek explanations in
"mysterious reasons".  But as I was just beginning to learn OCaml,
everything was somewhat mysterious to start with.

Another example:

      function ([], []) -> 3 | (a1::as, b1::bs) -> 4 | _ -> 5 ;;
                               ^
    Syntax error: ')' expected, the highlighted '(' might be unmatched

This pattern-match doesn't parse.  Why?  I couldn't tell.  The error
message doesn't help.  (It turns out that "as" is a reserved word.)

Even "assert", which you put in your program so as to give you a
useful error message, gives you crap like this:

    Exception: Assert_failure ("toyapl.ml", 62, 4).

It does tell you which assertion failed (eventually you learn that
that means "line 62, column 4") but it would be nice if it included
the string of the actual expression that failed.

Most of my assertions are unit tests, usually assertions of equality
between two things, an expected value and an actual value.  I can
compare these two with the "=" operator, but I cannot figure out how
to declare an exception to tell me what they were:

      exception Test_failure of int * 'a * 'a ;;
                                      ^^
    Unbound type parameter 'a

I'd be satisfied with string representations; somehow the toplevel
produces string representations of objects of arbitrary types, but I
haven't yet learned how to do that.

Another example that's not quite so egregious:

      let y = List.map ((+)2) up_to 2000000 ;;
              ^^^^^^^^
    This function is applied to too many arguments,
    maybe you forgot a `;'

In my experience so far, every time I have gotten this message, it
hasn't been because I forgot a ';'; it's been because I neglected to
include some parentheses.

Sometimes Completely Enigmatic Error Messages
---------------------------------------------

    This expression has type toyaplexpr but is here used with type toyaplexpr

This can happen when you redefine a type in the interactive toplevel,
but some references to the old type remain.  The enigmatic nature of
this error message is such a well-known bug that it is described in
many introductions to OCaml.

    # Unix.getpid () ;;
    Reference to undefined global `Unix'

This means that it can't access the "Unix" library, although it
doesn't bother to say why.  It does not mean that the "Unix" library
doesn't exist, or that you misspelled its name.  It means you needed
to launch the ocaml interpreter with "ocaml unix.cma" instead of
"ocaml".

Insufficiently Helpful Exceptions
---------------------------------

    # List.assoc "hi" ["ho", 3] ;;
    Exception: Not_found.

Good thing "hi" was a constant --- it would be hard to tell what was
"Not_found", because the Not_found exception doesn't take any
arguments.

Similarly:

    # int_of_string "31 ";;
    Exception: Failure "int_of_string". 

It would be helpful if the exception told what string it failed on.  I
think it may be impossible to add this in a backward-compatible way to
OCaml today.

Along the same lines, these exceptions fail to acquire any stack-trace
information; you get the same error message of two or three words if
the exception happened deep down the stack, in the middle of a big
program.

The default Python stack trace at least shows you each line of code in
the exception stack, and I usually use cgitb, which also shows you a
few lines of context and the values of all the variables on the line
of code in question.  Usually this is sufficient to fix the problem
without further experimentation.

The lack of a stack trace on exceptions is hardly a disadvantage when
you're comparing to C (no exceptions) or C++ (no stack traces either),
but it's a disadvantage comparing to Java, Lisp, or Python; even Perl
records the source file and line number of origin by default, and a
full stack trace if you use "croak" instead of just "die".

Insufficient Introspection
--------------------------

So what does List.assoc do, anyway?

    # List.assoc ;;
    - : 'a -> ('a * 'b) list -> 'b = <fun>

Well, that's nice --- it tells you what the arguments and the return
type are.  Maybe from those you can figure out what the function is
for.  (In OCaml, that's not as far-fetched as it sounds; I can't think
of any other useful functions that have the type signature above.  But
it takes some thought.)  But if you type an analogous expression in R,
you get the entire source code of the function.  In Python, the
default response is similarly unenlightening:

    >>> map     
    <built-in function map>

But look, you can do this:

    >>> help(map)
    Help on built-in function map:

    map(...)
        map(function, sequence[, sequence, ...]) -> list

        Return a list of the results of applying the function to the
        items of the argument sequence(s).  If more than one sequence
        is given, the function is called with an argument list
        consisting of the corresponding item of each sequence,
        substituting None for missing values when not all sequences
        have the same length.  If the function is None, return a list
        of the items of the sequence (or a list of tuples if more than
        one sequence).

How about the functions available in the List library?  The List
library isn't even a value that you can type as an expression:

    # List ;;
    Characters 0-4:
      List ;;
      ^^^^
    Unbound constructor List

Compare Python:
    >>> import string
    >>> dir(string)
    ['__builtins__', ... 'upper', 'uppercase', 'whitespace', 'zfill']
    >>> dir()
    ['__builtins__', '__doc__', '__name__', 'string']
    >>> dir(__builtins__)
    ['ArithmeticError', 'AssertionError', 'AttributeError', ...
    'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'unichr',
    'unicode', 'vars', 'xrange', 'zip']

Python even has features like rlcompleter2 and IPython which allow you
to type "string."  and hit Tab to see what's in the "string" module.
(What Python calls "modules", OCaml calls "libraries".)

This would be a smaller problem if the OCaml documentation were
complete and up-to-date, but see the section "Missing Documentation".

Too Much Precedence
-------------------

OCaml has a lot of syntax.  Another typical newbie mistake:

    let simpleenv = ["+", fun x -> x; "-", (~-)] ;;

Commonly, because "," binds tighter than ";", you can conveniently
write lists of pairs conveniently like [1, "one"; 2, "two"].  In this
case, though, the "fun" gobbles up everything to the right, as
explained by the type the interpreter spits out:

    (string * ('a -> string * (int -> int))) list

Which is to say, a list of pairs of strings and functions, where each
function takes a value of some arbitrary type, discards it, and
returns a pair of a string and a function from ints to ints.

Somewhat Low-Level
------------------

The negative side of Ocaml's higher speed is that it's somewhat
low-level; the intrinsic data structures are things like linked lists,
fixed-size arrays, strings, and several different struct-like
structures (tuples, constructors, and records).  There are libraries
that provide things like auto-growing string buffers, queues, sets,
hash tables, balanced binary trees (in case you're a masochist),
(Perl-incompatible) regular expressions, random number generation,
large multidimensional numerical arrays, type-unsafe data structure
marshaling, arbitrary-precision numbers, MD5, command-line parsing,
and so on.

However, it appears that because Ocaml's "module", functor, and
object-orientation facilities are not widely enough used, the
information about whether you're using, say, a hash table or a
balanced binary tree, will be spread through all the code that
accesses it.

Lame Libraries
--------------

It's not that the regexp library (in the Str module) is gratuitously
Perl-incompatible, or that it gratuitously uses global state; given
OCaml's age, that may be forgivable.  It's just that the library is
small.

Some people see this as an advantage:

    It's not that there are an extraordinary number of tools in the
    toolbox.  (No; in fact, the toolbox is much smaller than the usual
    toolboxes, the ones used by your friends that contain everything
    but the sink.)  It's that the toolbox was carefully and very
    thoughtfully assembled by some very bright toolsmiths, distilling
    their many decades of experience, and designed, as all good
    toolkits are, for a very specific purpose: building fast, safe and
    solid programs that are oriented around separate compilation of
    functions that primarily do recursive manipulation of very complex
    data structures.

    Programs, like, say ... compilers.

    - Dwight VandenBerghe <[EMAIL PROTECTED]>, 1998-07-28, "Why
      ML/OCaml are good for writing compilers", posted to
      comp.compilers, currently online at
      http://flint.cs.yale.edu/cs421/case-for-ml.html

I don't see it as an advantage, myself.  I'm glad OCaml isn't multiple
gigabytes like the Apple Developer Tools CD, but here I have listed
some of the things I hoped were there, but that I can't find in the
standard library shipped with the language.  Most of the things listed
below have Python versions preinstalled on this Mac in
/System/Library/Frameworks/Python.framework (66MB), Perl versions in
/System/Library/Perl (45MB), and Java versions in
/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Classes/classes.jar
(21MB, and that includes something like seven independent
implementations of the DOM.)  ocaml-3.09.2.dmg is only 13MB; maybe
there's an unmet need for an "OCaml distribution" that includes
standard libraries that cover most of these things and weighs in at
25-50MB.

- a more sophisticated unit-test framework than "assert"
- basic algorithmic stuff:
  - priority queues (there's one on p.25 of the user's manual but none in 
    /usr/local/lib/ocaml)
  - binary search
  - auto-resizing arrays (like OrderedCollection or Python's lists; the
    Buffer library has an implementation for strings)
  - CRC
  - random-shuffle and random-choice
- parsing and production of common data formats:
  - SGML or HTML or XML
    - any parser at all would be a big improvement, but if it had some of the
      horrible standard interfaces like DOM, SAX, XPath, or even XSLT, I could
      pick it up and use it more easily.  Amusingly, one of the best XQuery
      implementations is written in OCaml.
  - URLs
  - CSV
  - RFC-822 parsing
  - MIME
  - RSS
  - mbox
  - UTF-8 (!)
  - other character encodings
  - .ZIP files
- networking things:
  - a socket interface that lets me open a TCP connection to a DNS name
    with a single function call
  - an HTTP client library (server would be nice too, but its absence is
    less surprising; but, hey, guys, I hear this WWW thing might catch on)
  - other things are also missing but that's less surprising (SSL/TLS, IMAP, 
    FTP, SMTP, SNMP, LDAP, SOAP, XML-RPC, IRC, NNTP, POP)
- signal-processing things:
  - the ability to read and write common image formats (JPEG, PNG, GIF, PBM)
  - the ability to lines, text, and other graphical objects on them, with 
    alpha and antialiasing
  - the ability to read and write common audio formats (AU, AIFF)
  - other DSP stuff would be gravy but it's not surprising it's missing; the
    Bigarray library contains roughly the absolute minimum
- I'd complain about the absence of GUI stuff, but actually I think
  it's included, but doesn't work due to the absence of C development
  tools on this Mac.  (Although maybe it only works on X11.)
- type-safe marshalling and unmarshalling, at least for some subset of data 
  types (e.g. JSON, YAML)
- some kind of RPC (I'm not asking for IIOP here!)
- data compression and decompression tools
- SHA-1 and other cryptographic tools (it's nice that it has MD5, but DES, AES, 
  and RSA would be big pluses)
- termcap
- some kind of DBM or Berkeley DB interface
- some kind of SQL interface (at least SQLite!)
- some kind of date handling (although the Unix module does have
  gmtime, localtime, and mktime, it doesn't have strftime, and using
  mktime to print a calendar is kind of a pain)
- file pathname manipulation (dirname, basename, relative-path resolution)
- some kind of concurrency handling.  I'm not saying I want threads
  --- I would never say that --- but threads, or Erlang-style
  processes with easy communication, or Twisted-style reactors, or
  something!  Maybe there's something in the "vmthreads" directory,
  but it's hard to tell (see sections Missing Documentation and
  Insufficient Introspection for why.)
- some kind of logging for debugging (more than just Printf.fprintf stderr)

A lot of these things, maybe all of them, are available in other
libraries you can download from various sites on the internet.

Missing Documentation
---------------------

The MacOS X man pages for things like "ocaml" suggest reading "The
Objective Caml user's manual", but it does not seem to be included in
the disk image from which I installed OCaml.  Also, many of the .cm*
files in /usr/local/lib/ocaml lack man pages.

There are occasional errors in the documentation (for example, the
Pervasives man page, for example, says that integers are 31 bits wide
rather than 30; and the manual uses {< >} in an example a few sections
before explaining what it is) but I don't think they're as important
as the documentation that's missing altogether.

Later, when I had internet access again, I downloaded "The Objective
Caml system release 3.09 Documentation and user's manual", which I
think is the user's manual referred to in the man pages; but I find
that it refers me to the Camlp4 documentation, which I'm pretty sure I
don't have, for information about pattern-matching on streams.
Unfortunately I may not have internet access again for a while.

Arbitrary Limits
----------------

These limits seem most unfortunate to me:
    # Sys.max_string_length ;;
    - : int = 16777211
    # Sys.max_array_length ;;
    - : int = 4194303

I frequently deal with chunks of binary data bigger than 16 megabytes
in size, and arrays of more than 4 million items; I'm typing this on a
laptop with two gigabytes of RAM.

And the standard file operations (seek_in, pos_in, etc.) use 30-bit
numbers to represent file positions and sizes, with the consequence
that every use of them adds a bug to your program whenever it deals
with a file over 256MB in size.  (Unless you're on a 64-bit platform,
I think.)  The problem with this is that 256MB is now 15 cents' worth
of disk space, so I have a lot of files bigger than that.

Too Much Early Binding
----------------------

I'm sure Xavier Leroy and company, too, wish seek_in and pos_in used
64-bit ints, and that you could tell what key was Not_found in an
association list, and that you could change your program from storing
stuff in an array to storing it in a hash-table easily when the keys
turned out to be sparser than you expected.  The module system makes
the last item at least possible, but the others are not supported in a
backwards-compatible way by the type system.

Of course, a great deal of the Zen of OCaml is that lots of binding
can happen at compile-time rather than at run-time, so your code can
get type-checked and maybe run fast too.  But the binding I'm
complaining about is happening earlier, at edit-time; you have to
actually change your source code in order to accommodate the use of
Unix.LargeFile.lseek and its int64 arguments.  It isn't obvious to me
that, in principle, static typing requires these sorts of decisions to
be accidentally strewn across your entire codebase, given the
existence of type inference and parametric polymorphism in the form of
functors.

This is not a new problem --- it happens in C programs all the time,
due to the ubiquitous explicit static typing.  It's just that I had
the impression that OCaml had perhaps conquered this problem, as have
the languages I normally use day-to-day.

Here's another, smaller-scale example.  My unit tests largely consist
of code like this:

    assert ([45; 50] = eval (Atom [45; 50]));

Maybe at some point I would like to use floating-point for all my
interpreter's values instead of integers.  I could imagine a language
where I could retain compatibility with these tests by making Atom
into a function that does the necessary conversion for the old tests;
but that language is not OCaml, because in OCaml, your functions
cannot begin with capital letters.

Clumsiness Working With Source Code Files
-----------------------------------------

The unfortunate necessary duplication of code between .ml and .mli
files (if you have .mli files) is well-documented elsewhere.  But how
about if you want to interactively test some code from your module to
see why it fails a unit test?  Here's how I load that code into a
toplevel so I can test it:

    Beauty:~/devel/ocaml-tut kragen$ ocaml toyapl.ml
    Exception: Assert_failure ("toyapl.ml", 62, 4).
    Beauty:~/devel/ocaml-tut kragen$ ocaml
            Objective Caml version 3.09.2

    # Toyapl.eval ;;
    Characters 0-11:
      Toyapl.eval ;;
      ^^^^^^^^^^^
    Unbound value Toyapl.eval
    # 
    Beauty:~/devel/ocaml-tut kragen$ ocamlc -g -c toyapl.ml
    Beauty:~/devel/ocaml-tut kragen$ ls toyapl.*
    toyapl.cmi  toyapl.cmo      toyapl.ml       toyapl.ml~
    Beauty:~/devel/ocaml-tut kragen$ ocaml toyapl.cmo
    Exception: Assert_failure ("toyapl.ml", 62, 4).
    Beauty:~/devel/ocaml-tut kragen$ # I edit the file, comment out the assert
    Beauty:~/devel/ocaml-tut kragen$ ocamlc -g -c toyapl.ml
    Beauty:~/devel/ocaml-tut kragen$ ocaml toyapl.cmo
            Objective Caml version 3.09.2

    # Toyapl.eval ;;
    - : Toyapl.toyaplexpr -> Toyapl.toyaplval = <fun>

The (faked) equivalent in Python:

    >>> import toyapl
    Traceback (most recent call last):
      File "toyapl.py", line 62, in test
    AssertionError: ([45], 45)
    >>> toyapl.eval
    <function eval at 0x62bb0>

Python checks to see if there was a bytecode file for toyapl.py, and
if so whether it is up-to-date; if not, it recompiles it; then it
imports the bytecode file.  There is an exception during import (my
unit tests normally run during import) but because I'm in an
interactive toplevel, the module remains imported anyway so I can poke
at it and see what was wrong.

In a similar vein, consider this.
    Beauty:ocaml kragen$ cat > time.ml
    print_float (Unix.gettimeofday()) ;; print_string "\n"
    Beauty:ocaml kragen$ ocaml time.ml
    Reference to undefined global `Unix'
    Beauty:ocaml kragen$ (echo '#load "unix.cma" ;;'; cat time.ml) > time2.ml
    Beauty:ocaml kragen$ ocaml time2.ml
    1170635455.04
    Beauty:ocaml kragen$ ocamlc time2.ml
    File "time2.ml", line 1, characters 0-1:
    Syntax error

The #load directive makes it work in the interpreter, but causes a
syntax error in the compiler.  I think the right solution is to run
the program with "ocaml unix.cma time.ml" and compile it with "ocamlc
unix.cma time.ml", but it's a non-obvious failure mode.

This particular problem (in effect, the interpreter supports a
superset of the language supported by the compiler, and I made the
mistake of using one of the extensions) could be ameliorated by a more
helpful error message.

Syntax
------

I'm not going to belabor the point on syntax.  Everybody knows there's
a problem; I explained some aspects of the problem in some previous
sections.  This bites newbies like me particularly hard.

Out-Of-Dateness
---------------

ocamlopt generates PowerPC assembly code on this Intel MacBook, but
it's January 2007.  See also the complaints about libraries.

Possible Improvements
=====================

Probably some of these are bad ideas, as will be obvious to me after I
have more experience.  I'm not (yet) volunteering to do them.

1. A fatter distribution.  Including the manual (1MB), ocamlnet
   (800K), and the ocamlnet manual (450K) would go a long way toward
   removing my own biggest frustrations.

2. A simpler syntax with fewer levels of precedence.  Maybe writing
   OCaml in S-expressions would be going too far, or maybe not.
   Clearly this would make it effectively a different language.

3. No missing man pages.

4. No arbitrary limits on object size (other than physical address space,
   of course.)

5. Clearer wording for many error messages.

6. Reflection/introspection capabilities including the ability to
   interactively list the contents of modules.  Full reflection could
   be difficult to reconcile with static type-checking, but I think
   it's possible; but even modest reflection capabilities would help a
   lot with interactive use.

7. Self-documentation --- the ability to attach doc strings to
   functions and types, and doc strings being present throughout the
   standard library.

8. Exceptions that accumulate a full stack trace.  This is a little
   hairy to get to work in native code, but I don't think it's
   intractable.

Reply via email to