Hello, I pasted a copy of the article below for those that cannot access the site.
I would be interested to see an article on Haskell in the same light as this Ocaml article, aka a constructive criticism of Haskell. Enjoy!

__
Donnie

### Begin Article ###

Why Ocaml Sucks <http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/>

Published by Brian <http://enfranchisedmind.com/blog/author/bhurt-aw/> at 6:49 pm under Functional Languages: Ocaml, Haskell <http://enfranchisedmind.com/blog/categories/programming/functional-programming/>

One of the ways to not fall into the blub fallacy is to regularly consider those ways in which your favorite language is inferior, and could be improved- preferably radically improved. Now, it should come as a surprise to no one that my favorite language is (currently) Ocaml. So as an intellectual exercise I want to list at least some of the ways that Ocaml falls short as a language.

I will note that if you use this post as a reason to *not* use Ocaml, you are a fool and are missing the point of this post. With the *possible* exception of Haskell, no other language I know of comes close to getting all the things right that Ocaml gets right. So, with that in mind, let's begin.

1. Lack of Parallelism

In case you haven't guessed it from reading this blog, I think parallelism is going to be the big problem for the next decade. And while Ocaml does have threads, it also has a big ol' global interpreter lock, so that only one thread can be executing Ocaml code at a time. This severely limits the utility of threads, and severely limits the ability of Ocaml programmers to take advantage of multiple cores.

Now, to give the INRIA team credit, the reason they have for this is a good one- we (programmers) still don't know for sure how to write multithreaded code safely (I have my suspicions, but I have no proof), and we had even less of an idea a decade and a half ago when the fundamentals of Ocaml were being decided. And supporting multithreaded code slows down the garbage collector.
And, a decade and a half ago, multithreaded code was a lot less important, and multicore systems were rare and expensive. So this is mainly just Ocaml showing its age. But still, if I were to pick the biggest shortcoming of Ocaml, this would be it.

2. Printf

You know what? Printf was a bad idea for C- having this special domain-specific language in what looks like strings to control formatting. It's even worse in Ocaml, where you have to add special kludges to the compiler to support it (at least the way Ocaml did it- there are smarter ways <http://www.brics.dk/RS/98/12/> that don't require compiler kludges). You might think that "%3d" is a string in Ocaml, and sometimes you'd be right. But sometimes it's a (int -> '_a, '_b, '_c, '_a) format4. This horrid type (which is *not* a string) is necessary to encode the types of the arguments printf needs to be passed.

The one thing I know of that C++ did better than Ocaml was ditching printf. And even the idea of iostreams isn't that bad- except for the fact that they encouraged generations of C++ programmers to abuse operator overloading (hint to Bjarne Stroustrup: << and >> are not the I/O operators, they're the bit shift operators!), and they made the iostream classes exceptionally difficult to inherit from or extend. But a combination of C++ style iostream operators and Java style iostream classes would have worked so much better, and not required hacking the compiler. An even better idea might be some variant of functional unparsing <http://www.brics.dk/RS/98/12/>.

3. Lack of multi-file modules

Ocaml has an exceptionally powerful and flexible module and functor system, which stops rather annoyingly at the file level. Files are modules, and files (and modules) can contain other modules within them, but you can't collect several files into one big multi-file module.
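Within a single file, a shared helper module can already be kept private, because a signature can simply leave a submodule out. The sketch below uses hypothetical names (Lib, Shared, Parser, Printer) and assumes OCaml 4.04 or later for String.split_on_char. Shared is used by both Parser and Printer, but Lib's signature omits it, so code outside Lib cannot reach it.

```ocaml
(* Hypothetical single-file library. Shared is visible to Parser and
   Printer, but Lib's signature omits it, so it is hidden from outside. *)
module Lib : sig
  module Parser : sig
    val parse : string -> int list
  end
  module Printer : sig
    val print : int list -> string
  end
end = struct
  (* Common base functionality, deliberately not exported. *)
  module Shared = struct
    let sep = ','
  end

  module Parser = struct
    (* "1,2,3" -> [1; 2; 3] *)
    let parse s = List.map int_of_string (String.split_on_char Shared.sep s)
  end

  module Printer = struct
    (* [1; 2; 3] -> "1,2,3" *)
    let print l = String.concat (String.make 1 Shared.sep) (List.map string_of_int l)
  end
end

(* Referring to Lib.Shared.sep here would be a type error:
   Lib's signature has no Shared. *)
```

The limitation is that all three modules must live in this one file. Once they are split into separate compilation units, each one is visible at the top level unless the build does extra packing work.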
You especially can't say that certain files, while they may be visible to other modules within the multi-file module, aren't visible outside the multi-file module. This would be especially useful if you want to factor out common base functionality from a library of modules into a shared module, but not productize it for export to the outside world. This limits the ability of programmers to correctly structure large projects.

The -for-pack arguments help fix a lot of this, sorta- but they require smarts in the build system and generally special .mli files to control visibility. It can be done, but it's not clean- it makes the build process a lot more complex, and important information about which files are externally visible or not is contained somewhere that is not in the source code. This could certainly be done a hell of a lot better than it is.

4. Mutable data

Mutable data is both a blessing and a curse. It is a blessing when learning the language, especially when you're coming from conventional imperative programming languages. Rather than having to learn what a lazy real time catenable deque is, and why you want one, you can just bang out a quick mutable doubly linked list and get the same behavior. I would argue that if you're coming from the imperative/OO world, it's easier to learn Ocaml than Haskell for precisely this reason.

But this implies that mutable data is training wheels of a sort. And, just as training wheels prevent you from going fast, mutable data prevents Ocaml from doing a lot of interesting and useful things. Many of Haskell's most interesting features, such as STM, easy multithreading, deforestation, and scrap your boilerplate, arise out of Haskell being purely applicative. Mutable data is holding Ocaml back.

5. Strings

This is more of a minor nit, but the representation of strings as a (compact) array of chars is stupid. It made sense for pointer-arithmetic C, not so much for Ocaml.
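An alternative to the flat array-of-bytes representation is a rope, which is a tree whose leaves are ordinary strings. The rope type and its functions below are hypothetical and not part of the OCaml standard library. In this minimal sketch, concatenation allocates one node and copies no characters, and iteration walks the leaves left to right.

```ocaml
(* Hypothetical rope type, not part of the OCaml standard library. *)
type rope =
  | Leaf of string
  | Node of rope * rope * int  (* left, right, cached total length in bytes *)

let length = function
  | Leaf s -> String.length s
  | Node (_, _, n) -> n

(* O(1): allocates a single node; no characters are copied. *)
let concat a b = Node (a, b, length a + length b)

(* O(N): visits every leaf once, left to right. *)
let rec iter f = function
  | Leaf s -> String.iter f s
  | Node (l, r, _) -> iter f l; iter f r

(* Flatten back to an ordinary string, e.g. for printing. *)
let to_string r =
  let buf = Buffer.create (length r) in
  iter (Buffer.add_char buf) r;
  Buffer.contents buf
```

This sketch never rebalances the tree, so a long chain of left-nested concatenations makes iter recurse deeply. A real rope library would rebalance.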
The definition of chars as 8 bits prevents the internationalization of Ocaml beyond the European languages, and the representation is inefficient. The most important operations on strings are either iterating over all the characters in a string, or concatenating two strings. Random access and mutation are rare, *unless* you're using the string as a bit or byte array, which is encouraged by the nature of strings. Bit arrays should be first class types of themselves, with specialized internal representations, allowing strings to be optimized as strings. An unbalanced tree based structure would allow O(1) (and exceptionally fast) string concatenation while preserving the O(N) cost to iterate over all characters in a string.

Also, strings are currently mutable data, and as described above, mutable data is teh sucketh. This is doubly so for strings, which can be profitably interned (and thus should be).

6. Shift-reduce conflicts

Shift-reduce conflicts happen in parser generators, especially LALR(1) parser generators like yacc and ocamlyacc, when the parser can't tell whether the current expression is complete or can still go on. The golden example of this is the "dangling else" problem of C and other languages, where you might see code like this:

    if (test1) then
        if (test2) then
            expr1
    else
        expr2

The problem is that the parser, after it has finished parsing the expr1 expression and is looking at the else, doesn't know whether to shift it (making the else and expr2 part of the inner if) or reduce it (making the else part of the outer if).

I strongly recommend that everyone who is developing a language write the initial syntax in a LALR(1) parser, and eliminate all shift-reduce (and especially reduce-reduce) conflicts while you still can, because every shift-reduce conflict is a bug waiting to happen.
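The same trap exists in Ocaml's own if/then/else. In this minimal sketch (classify is a hypothetical function), the layout suggests the else belongs to "if t1", but the parser attaches it to the nearest "if t2". Both branches have type unit, so the type checker cannot catch the mistake either.

```ocaml
(* Returns a label for the branch that ran. The indentation suggests the
   else pairs with "if t1", but Ocaml binds it to the nearest "if t2". *)
let classify t1 t2 =
  let result = ref "none" in
  if t1 then
    if t2 then result := "both"
  else result := "not t1";
  !result
```

Calling classify true false returns "not t1" even though t1 is true. Calling classify false true returns "none", because the outer if has no else at all.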
In the dangling-else example, based on the indentation, the else is very likely meant to be part of the outer if, but the parser is going to bind it to the inner if, causing a bug. This is slightly less bad in Ocaml, where generally the bug will be a type error, but that doesn't mean it's good. And Ocaml has literally dozens of these suckers lying in wait to bite an unwary programmer. Even I get bitten by bugs generated by shift-reduce conflicts on a regular basis.

A quick side digression for those who started yelling that indentation should be significant to the compiler. Consider the output that would be produced by printing the following 3 C/Ocaml strings: "\n  \tx\n", "\n \t x\n", and "\n\t  x\n". Each one starts with a newline, ends with an x and a newline, and contains exactly two spaces and a tab in between. Now, what columns do the three x's end up in? The answer is "it depends upon which editor you're using to view the output with, and what configuration that editor has". I just spent a large chunk of this afternoon cleaning up a bunch of code written by a programmer who merrily mixed spaces and tabs, and used a different editor than I do. As a consequence, what looked neatly formatted to him looked like gibberish to me. And if two humans can't agree on whether a particular piece of code is well formatted or not, what hope does the computer have to figure out the situation? The solution is to eliminate the root cause of the problem (the shift-reduce conflicts), not add a new way to introduce subtle and hard to debug errors.

7. Not_found

One of the worst design decisions Ocaml has made is Not_found- this is the exception that's thrown whenever something isn't found. That's helpful, isn't it? It's thrown by the Set and Map modules (and all their functor applications), the List module, various other data structure libraries, various regular expression libraries, and, as a rule, just about any random piece of code.
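In the following example, two unrelated lookups fail with exactly the same exception, and the printed exception says nothing about which lookup failed or what key it wanted. The names config and describe are hypothetical. The option-returning find_opt, added to the standard library in OCaml 4.05, is shown for contrast.

```ocaml
module StringMap = Map.Make (String)

(* Hypothetical configuration table with no "port" entry. *)
let config = StringMap.add "host" "localhost" StringMap.empty

(* Run a thunk and report which exception escaped, if any. *)
let describe f =
  try ignore (f ()); "no exception"
  with e -> Printexc.to_string e

(* Two unrelated failed lookups: one in a Map, one in an association list. *)
let from_map = describe (fun () -> StringMap.find "port" config)
let from_list = describe (fun () -> List.assoc 42 [ (1, "one") ])

(* The option-returning variant turns the miss into an ordinary value. *)
let port = StringMap.find_opt "port" config
```

Both from_map and from_list come out as the same bare "Not_found", while port is simply None.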
And it contains absolutely no information as to what was not found, or what was being looked for. Or even what was doing the looking. Failure and Invalid_argument at least take a string argument, which is generally the name of the function that failed. So if you see the exception Failure("hd"), you know at least to be looking at calls to List.hd (or to other functions named "hd"). But Not_found? This is the sound of your hd hitting the desk, repeatedly.

8. Exceptions

And while I'm ragging on Not_found, let me rag on exceptions a bit. Exceptions are used way too commonly in Ocaml- OK, I'll give Failure, Invalid_argument, and Assert_failure a pass, but basically Not_found and End_of_file should never, *ever*, be thrown. Return 'a option instead.

This has non-trivial implications. For example, take the classic newbie Ocaml "mistake"- write a function that reads in a file and returns the file as a list of strings, one string per line. Should be easy. We use input_line to read a line, and write our code like:

    let read_file desc =
      let rec loop accum =
        try
          let line = input_line desc in
          loop (line :: accum)
        with
        | End_of_file -> List.rev accum
      in
      loop []
    ;;

And they can't understand why this code blows up when they try to read a file with more than 30,000 lines in it. (The recursive call to loop sits inside the try, so it is not a tail call: every line read leaves another stack frame and exception handler behind.) There are standard solutions to this- but they make the code uglier, and by far the best is to wrap the exception throwing code in an expression that turns the exception into an option, like:

    let read_file desc =
      let rec loop accum =
        let line =
          try Some (input_line desc)
          with End_of_file -> None
        in
        match line with
        | Some s -> loop (s :: accum)
        | None -> List.rev accum
      in
      loop []
    ;;

At which point you really have to raise the question of why input_line didn't just return a string option from the get-go. The other popular alternative is to use mutable data. Which is just trading off one problem set for another.
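Later Ocaml releases (4.02 onward) added exception cases to match. These give the read_file loop a true tail call without an option wrapper, because the handler covers only the scrutinee (input_line desc) and not the recursive call in the other branch. The following is a minimal sketch, assuming OCaml 4.02 or later and that desc is an in_channel.

```ocaml
(* Read every line of an input channel into a list, in order.
   The "exception" case only catches End_of_file raised by
   input_line, so the call to loop is in tail position. *)
let read_file desc =
  let rec loop accum =
    match input_line desc with
    | line -> loop (line :: accum)
    | exception End_of_file -> List.rev accum
  in
  loop []
```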
There is, by the way, an interesting idea for exception handling that's been raised recently (see here <http://dx.doi.org/10.1017/S0956796801004099> and here <http://portal.acm.org/citation.cfm?id=1022471.1022475>). The basic idea is this: the current Ocaml syntax for declaring an exception handler is

    try expr1 with
    | Exception -> expr2

where if expr1 doesn't throw an exception, the value of the whole expression is the value returned by expr1- otherwise, if it throws Exception, then the whole expression has the value expr2. This syntax is replaced by:

    try var = expr1 in expr2 with
    | Exception -> expr3

Here, if expr1 does not throw an exception, then the value of the whole expression is that of expr2 with variable var having the value of expr1, while if it throws Exception, then the value of the whole expression is expr3. Note that exceptions are only caught in expr1- it's just that if an exception is caught, then expr2 is skipped as well. But this means that calls from within expr2 can be tail calls.

Let's take a look at our read_file program, the newbie way, a third time, this time with our new exception syntax:

    let read_file desc =
      let rec loop accum =
        try line = input_line desc in
          loop (line :: accum)
        with
        | End_of_file -> List.rev accum
      in
      loop []
    ;;

Since the tail call to loop is outside where we catch exceptions, it's a true tail call, and the function works "as expected". And the fact that we're using exceptions instead of options isn't that big of a deal anymore (Not_found still sucks, due to its simple ubiquity and taciturnity). If someone with more time and/or camlp4 knowledge than I have wants to write this up as an extension, I'd be eternally grateful.

While I'm at it, would it be possible to have exceptions in the type system? I know Java tried this and everyone hated it, but Ocaml has two things that Java didn't- type inference and type variables.
The type variables are nice because they allow you to deal with "unknown" exceptions- for example you could have a function of type (int -> int throws 'a) -> int throws 'a, meaning that you don't know (or care) what exceptions the passed-in function might throw, but that you throw the same exceptions (presumably by calling the passed-in function). Type inference also means you don't have to declare most of these cases; the compiler can infer what exceptions are thrown. Changes to the type system at this level are highly unlikely, but a boy can dream, can't he?

9. Deforestation

This really should be under the "mutable data is the sucketh" section, but oh well. Haskell has introduced a very interesting and (to my knowledge) unique layer of optimization, called deforestation. Basically, what happens is that the ghc compiler recognizes and can optimize certain common sub-optimal data structure transformations. For example, it can convert map f (map g lst) into map (f . g) lst, where the period (.) is the function composition operator. There are a couple of obvious advantages to being able to do this- for example, by doing so we've eliminated an intermediate data structure, and created new opportunities for optimizing the combined f and g functions.

But there are two things of serious interest to me about this. First of all, this is the first time I've seen optimizations at this level being this easily performed. Companies like Intel and IBM have thrown literally man-millennia at this problem, and the solutions they've come up with were limited and fragile (slight changes to the code would enable or disable the optimization). And yet the Haskell people implemented it in one Simon Peyton Jones long weekend (also known as a couple of man months for mere mortals like you and me).
The reason for this is that the real difficulty is not actually implementing the transformation, or even detecting that the transformation might be applicable; it's proving that the transformation is correct, that the code after the transformation is applied behaves identically to the code before the transformation is applied. In languages with mutable data, like Fortran and C++ (and Ocaml), this is decidedly non-trivial. In Haskell, you can dash off the proof that it's always correct on the back of a cocktail napkin- proving that it's correct in the general case, for all lists and all functions. And thus the compiler doesn't even need to check- once it detects that the pattern can be applied, it can just go ahead and apply it. Haskell is doing data structure level optimizations with the ease that most other compilers do peephole instruction optimization. This is a non-trivial result.

The second important aspect of this is that it changes the concept of what optimization is, or should be. I forget which paper it was I was reading that said that optimization should really be called depessimization. The programmer wants to introduce pessimizations- the programmer could do the above deforestation himself, except that doing so would make maintenance more difficult, as it'd break module boundaries, or require knowledge of how a particular module is implemented, or simply require knowledge of widely separated pieces of code. The goal of the compiler, rather than striving to produce some "optimal" (and thus impossible to obtain) implementation, is simply to undo the pessimizations that the programmer has introduced in the name of maintainability, modularity, readability, and/or simplicity. The programmer shouldn't have to worry about creating intermediate data structures, and shouldn't have to worry about corrupting his code in the name of performance- that's the compiler's job (don't break your code, let the computer do that for you! :-).
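The following minimal Ocaml sketch of the map f (map g lst) to map (f . g) lst rewrite shows why side effects get in the way. The helpers map_ltr, f, g and run are hypothetical. map_ltr fixes a left-to-right evaluation order so that the trace is predictable. Both pipelines return the same list, but the trace shows that fusing them interleaves the calls. A compiler therefore cannot apply the rewrite blindly when f and g have effects.

```ocaml
(* A list map with an explicit left-to-right evaluation order. *)
let rec map_ltr f = function
  | [] -> []
  | x :: xs ->
      let y = f x in          (* apply f to the head first ... *)
      y :: map_ltr f xs       (* ... then map over the tail *)

(* Side-effecting functions that record the order in which they run. *)
let trace = Buffer.create 16

let g x =
  Buffer.add_string trace (Printf.sprintf "g%d " x);
  x * 2

let f x =
  Buffer.add_string trace (Printf.sprintf "f%d " x);
  x + 1

(* Run a pipeline on [1; 2] and return its result plus the call trace. *)
let run pipeline =
  Buffer.clear trace;
  let result = pipeline [1; 2] in
  (result, Buffer.contents trace)

let unfused = run (fun l -> map_ltr f (map_ltr g l))   (* two passes *)
let fused = run (fun l -> map_ltr (fun x -> f (g x)) l)  (* one pass *)
```

Both results are [3; 5]. The unfused trace is "g1 g2 f2 f4 " and the fused trace is "g1 f2 g2 f4 ". With pure f and g, the two would be indistinguishable.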
Needless to say, between the mutable data, the side effects, and the handling of exceptions, Ocaml isn't going to get deforestation any time soon.

10. Limited standard library

If writing monad tutorials is the cottage industry of Haskell programmers, then rewriting the standard library is Ocaml's cottage industry. You almost can't call yourself an Ocaml programmer if you haven't rewritten a goodly chunk of the standard library at least once. This is because every Ocaml programmer has a long list of functions that should (they think) be in the standard library but just aren't. The big ones for me are the lack of Int and Float modules ready for use by the Set and Map functors, the lack of already-instantiated Map and Set modules for Strings, Ints, and Floats, and the lack of a list to set/map function. None of these are exactly hard to do, just annoying to have to constantly redo.

Of course, the problem with this is the range of design patterns among Ocaml programmers- especially in comparison to languages like Java and C++, or even Ruby. The design patterns of a top-5% Java programmer are not all that different from those of a bottom-30%'er. There may be differences in precisely what features are supported, and what names things are given may change, but the broad stroke designs will be similar. The design patterns of Brian Hurt circa 2008 are radically different from the design patterns of Brian Hurt circa 2003. For example, were I to redo extlib now, it'd be purely applicative, heavily functorized, with lots of monads and a fair bit of lazy evaluation, as opposed to the code I wrote in 2003.

11. Slow lazy

Speaking of lazy evaluation- Ocaml's built-in lazy evaluation is wicked slow. I've actually timed it, and it's like 130+ clock cycles of overhead to force a lazy value the first time, and over 70 clock cycles of overhead to force it the second and subsequent times (when all it has to do is return the cached value).
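For readers unfamiliar with Ocaml's Lazy module, the following minimal sketch shows the two cases being timed. The first Lazy.force runs the suspended computation, and later forces return the cached result without re-running it. The counter (a hypothetical name) is there only to make the caching visible. This sketch makes no timing claims.

```ocaml
(* Counts how many times the suspended computation actually runs. *)
let evaluations = ref 0

(* A suspension: nothing is computed until it is forced. *)
let answer = lazy (incr evaluations; 6 * 7)

let before = !evaluations        (* still 0: not forced yet *)
let first = Lazy.force answer    (* runs the computation *)
let second = Lazy.force answer   (* returns the cached value *)
```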
Given that most of the good uses of lazy evaluation have it wrapping computations that are like 10 clock cycles long or so, the overhead of the lazy thunk dominates. In this era of Ruby programmers, this may not seem like much- but this overhead discourages Ocaml programmers from using lazy evaluation. Which is a shame, as anyone who has read Okasaki knows how frimping useful it is.

Well, that's my list so far- subject to revision without notice. The vast majority of them qualify as picking nits- and they are. If you can damn with faint praise, then you should also be able to praise with faint damning- and this certainly qualifies as that. Most of the problems here only became apparent (or their solutions became apparent) long after Ocaml was mature and set. For example, it's only recently that multi-threading has been a big deal. And the usefulness of monads and lazy evaluation for functional programming wasn't understood until the late nineties/early 2000s, a decade after Ocaml's formative years. So this whole post mostly amounts to whining that the Ocaml developers weren't *even more insightful and foresightful* than they were. And, compared to the major revisions Ocaml's "age-mates" Java and Perl have gone through, it's stood up well with the passing of time.

No language is perfect, including Ocaml. And to continue to improve we must understand what has gone before- both what worked, and what didn't.

### End Article ###

On Thu, May 8, 2008 at 2:27 PM, Andrew Coppin <[EMAIL PROTECTED]> wrote:
> Chad Scherrer wrote:
>
>> PS - the link now gives a "500 server error" - did our traffic overwhelm
>> it? Is the cafe the next slashdot? ;)
>
> Hell, I can't even get a TCP SYN-ACK packet out of it... :-/
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe