Re: [Caml-list] Average cost of the OCaml GC
Jianzhou Zhao jianz...@seas.upenn.edu writes:
> What is the average cost of the OCaml GC? I have a program that spends
> 57% of its total execution time in 'mark_slice' and 21% in 'sweep_slice',
> as reported by Callgrind, a profiling tool in Valgrind. These figures are
> the 'self cost' (the cost of the function itself) rather than the
> 'inclusive cost' (which includes all called functions). I guess
> 'mark_slice' and 'sweep_slice' are functions of the OCaml GC. Are these
> numbers normal?

Those numbers sound rather high to me.

> My program calls back and forth between OCaml and C, passing C data types
> in between. I also suspect I defined the interface in an inefficient way
> that slows down the GC. Are there any rules of thumb to make the GC work
> more efficiently?

You can tune some of the GC parameters to suit your use case. Do you allocate custom types from C? In caml_alloc_custom(ops, size, used, max) the 'used' and 'max' arguments influence how often the GC runs.

> Yes. The code uses caml_alloc_custom to create a lot of small objects
> (less than 8 bytes) frequently. 'used' and 'max' are set to the defaults,
> 0 and 1. The manual
> (http://caml.inria.fr/pub/docs/manual-ocaml/manual032.html#toc140) says:
>
>   "If your finalized blocks contain no pointers to out-of-heap resources,
>   or if the previous discussion made little sense to you, just take
>   used = 0 and max = 1. But if you later find that the finalization
>   functions are not called often enough, consider increasing the
>   used / max ratio."
>
> Does this mean the default used and max make the GC do finalization 'as
> slowly as possible'? That does not seem to be the case if the costs of
> 57% and 21% are this high.

I think 0/1 gives you the least amount of GC runs. If you set them wrong you might trigger the GC too often.

> In which case could they be set 'wrong'? For example, if 'used' is not
> equal to the real amount of allocated data; or is there a valid range of
> 'max' for a given 'used'?

A used = 100 would be wrong here. Your 0/1 setting looks fine to me.

> Do we have other methods to debug such problems? Is it possible to know
> when and where the GC runs, say, the number of times the GC runs after a
> particular user-defined function? If so, I was wondering whether we could
> see which function in my code behaves badly.

Only through the interface the Gc module exposes. You can make the GC quite verbose.

MfG
        Goswin

___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
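On Goswin's last suggestion: the GC's verbosity flags and counters are reachable from OCaml through the Gc module. A minimal sketch of observing GC activity around a suspect function; the count_collections wrapper and the particular flag values are illustrative, not from the thread:

```ocaml
(* Sketch: count how many minor/major collections a function triggers. *)
let count_collections f x =
  let before = Gc.quick_stat () in
  let result = f x in
  let after = Gc.quick_stat () in
  Printf.printf "minor collections: %d, major collections: %d\n"
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections);
  result

let () =
  (* Make the GC log major cycles (0x001) and minor/slice events (0x002). *)
  Gc.set { (Gc.get ()) with Gc.verbose = 0x001 lor 0x002 };
  let sum =
    count_collections
      (fun n ->
         (* Allocate many small boxed values to exercise the minor heap. *)
         let acc = ref 0 in
         for i = 1 to n do
           let pair = (i, i) in
           acc := !acc + fst pair
         done;
         !acc)
      1_000_000
  in
  Printf.printf "sum = %d\n" sum
```

Wrapping a user-defined function this way answers Jianzhou's question about how often the GC runs after a particular function, without touching the C side at all.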
[Caml-list] Option functions (or lack thereof) + operator for composition
Hi,

While writing a sample program (with lablgtk2), I found a few things annoying and thought I would ask here what you guys think.

1. Option type

It seems that there is no predefined function to test an 'a option for being specifically None or Some _. This seems to be confirmed by the very existence of: http://ocaml-lib.sourceforge.net/doc/Option.html which defines such functions (is_none and is_some). I found it weird to be forced to use match expressions in my code for doing that, e.g.:

* let curSelectedRow = ref None in
* let updateButtonsStatus () =
*   button_remove#misc#set_sensitive
*     (match !curSelectedRow with None -> false | _ -> true)
* in
* ...

I could add the OCaml library mentioned above, but I don't know how to do it (or where to find it) and, since my code is supposed to go into some other code, I'd prefer to avoid adding yet another dependency to it...

2. Operator for composition (and its precedence)

To get rid of many warnings, I wrapped some calls (the connect calls of my widgets) into ignore (f x y) statements. I've no particular grief with using ignore, but I find the parentheses *really* annoying. In Haskell, I would write ignore $ f x y, which I find much lighter weight. I'm not familiar with operators and their precedence, but I wonder: is it possible to do something similar in OCaml?

Thanks for reading.

Best regards.
--Serge
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
Hi,

> I found it weird to be forced to use match expressions in my code for
> doing that, e.g.:
> * let curSelectedRow = ref None in
> * let updateButtonsStatus () =
> *   button_remove#misc#set_sensitive
> *     (match !curSelectedRow with None -> false | _ -> true)
> * in
> * ...

You are not forced to use match expressions, you can just define:

  let is_none x = match x with None -> true | Some _ -> false
  and is_some x = not (is_none x)

and then use these functions in your code ...

> In Haskell, I would write ignore $ f x y, which I find much lighter weight.

Just define:

  let ($) f x = f x

and that should do the trick.

Thomas
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
On Tue, Nov 16, 2010 at 12:43:30PM +0100, Thomas Gazagnaire wrote:
> You are not forced to use match expressions, you can just define:
>   let is_none x = match x with None -> true | Some _ -> false
>   and is_some x = not (is_none x)
> and then use these functions in your code ...

Or even simpler:

  let is_none = (=) None
  let is_some = (!=) None

Note that you can use this directly, for instance:

  List.filter ((!=) None) l

Best,
--
Gabriel
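Gabriel's point-free definitions can be tried directly. A small sketch; here is_some uses structural (<>) instead of physical (!=) (both behave identically on the immediate constructor None, but (<>) is the more conventional spelling), and filter_some is a hypothetical helper name, not from the thread:

```ocaml
(* Point-free option predicates, as suggested in the thread. *)
let is_none = (=) None
let is_some = (<>) None

(* Hypothetical helper: keep only the non-None elements of a list of options. *)
let filter_some l = List.filter is_some l

let () =
  assert (is_none None);
  assert (is_some (Some 3));
  assert (filter_some [Some 1; None; Some 2] = [Some 1; Some 2]);
  print_endline "ok"
```

Note that, being partial applications, these definitions get weak (monomorphic) types under the value restriction; that is harmless in a single module but worth knowing if you try to export them.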
[Caml-list] Re: SMP multithreading
Hi,

On 15-11-2010, Wolfgang Draxinger wdraxinger.maill...@draxit.de wrote:
> Hi, I've just read
> http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
> in particular this paragraph:
> | What about hyperthreading? Well, I believe it's the last convulsive
> | movement of SMP's corpse :-) We'll see how it goes market-wise. At
> | any rate, the speedups announced for hyperthreading in the Pentium 4
> | are below a factor of 1.5; probably not enough to offset the overhead
> | of making the OCaml runtime system thread-safe.
> This reads just like "640k ought to be enough for everyone". Multicore
> systems are the standard today. Even the cheapest consumer machines come
> with at least two cores. One can easily get 6-core machines today. Still
> thinking SMP is a niche and is dying? Hyperthreading was never remarkable
> performance-wise and is probably not pure SMP (emulated SMP, maybe?). So,
> what are the developments regarding SMP multithreading in OCaml?

There are various developments regarding this subject (most recent first):
- Plasma (MapReduce in OCaml): http://plasma.camlcity.org/plasma/index.html
- OC4MC (OCaml for MultiCore): http://www.algo-prog.info/ocmc/web/
- ocamlp3l: http://camlp3l.inria.fr/eng.htm
- jocaml: http://jocaml.inria.fr/
- ocamlmpi: http://forge.ocamlcore.org/projects/ocamlmpi/

All these projects try to tackle the challenge of SMP from different points of view. Maybe you'll find your answer in one of them.

Regards,
Sylvain Le Gall
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
On 2010/11/16, at 20:27, Serge Le Huitouze wrote:
> 1. Option type
> It seems that there is no predefined function to test an 'a option for
> being specifically None or Some _.

In OCaml you can compare at any type, so you can just write (!curSelectedRow <> None), which explains why there is no such function in the standard library.

> 2. Operator for composition (and its precedence)
> To get rid of many warnings, I wrapped some calls (the connect calls of
> my widgets) into ignore (f x y) statements. I've no particular grief in
> using ignore, but I find the parentheses *really* annoying. In Haskell,
> I would write ignore $ f x y, which I find much lighter weight. I'm not
> familiar with operators and their precedence, but I wonder: is it
> possible to do something similar with OCaml?

You can, but unfortunately $ does not have the right associativity.

  # let ($) f x = f x;;
  val ( $ ) : ('a -> 'b) -> 'a -> 'b = <fun>
  # ignore $ 1+1;;
  - : unit = ()
  # succ $ succ 3;;
  - : int = 5
  # succ $ succ $ succ 3;;
  Error: This expression has type int -> int
         but an expression was expected of type int

As you can see, it works fine for one argument, but doesn't work if you nest them. (Actually, this is a question of point of view, since you can use it in place of parentheses to sequence multiple arguments.)

If you want compositionality, you can use something starting with @:

  # let (@@) f x = f x;;
  val ( @@ ) : ('a -> 'b) -> 'a -> 'b = <fun>
  # succ @@ succ @@ succ 3;;
  - : int = 6

Note however that there is a simpler way to circumvent the problem: use the -w s option on the command line, disabling the statement warning. All my lablgtk code uses this flag :-)

Jacques Garrigue
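Jacques' toplevel session condenses into a small compilable sketch. Both operators are defined by hand here, as in his session ((@@) was not yet part of the standard library at the time); the comments spell out how OCaml's first-symbol rule parses each expression:

```ocaml
(* '$'-operators are left-associative: fine for a single application only. *)
let ($) f x = f x

(* '@'-operators are right-associative: they nest like parentheses. *)
let (@@) f x = f x

let () =
  assert (succ $ succ 3 = 5);          (* parses as (succ $ (succ 3)) = 5 *)
  assert (succ @@ succ @@ succ 3 = 6); (* parses as succ @@ (succ @@ (succ 3)) *)
  ignore $ (1 + 1);                    (* the original use case *)
  print_endline "ok"
```

Adding a third `succ $` to the first expression would reproduce Jacques' type error, since ((succ $ succ) $ ...) tries to apply succ to a function.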
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
>> To get rid of many warnings, I wrapped some calls (the connect calls of
>> my widgets) into ignore (f x y) statements.
>
> You can, but unfortunately $ does not have the right associativity.
>   # let ($) f x = f x;;
>   val ( $ ) : ('a -> 'b) -> 'a -> 'b = <fun>
>   # ignore $ 1+1;;
>   - : unit = ()
>   # succ $ succ 3;;
>   - : int = 5
>   # succ $ succ $ succ 3;;
>
> If you want compositionality, you can use something starting with @:
>   # let (@@) f x = f x;;
>   val ( @@ ) : ('a -> 'b) -> 'a -> 'b = <fun>
>   # succ @@ succ @@ succ 3;;
>   - : int = 6

Wow, tricky! Not as tricky as Perl's naming scheme for variables, but still tricky! ;-) So I have to learn the precedence and associativity table in section 6.7 of OCaml's reference manual!!!

> Note however that there is a simpler way to circumvent the problem: use
> the -w s option on the command line, disabling the statement warning.
> All my lablgtk code uses this flag :-)

Warnings should be gotten rid of, so I don't like the idea of silencing them. But restricting the silence to the statement warning is probably OK. And I'll take your word for it, since, obviously, you have more experience with lablgtk than I do! ;-) But, on the other hand, maybe you can harmlessly use -w s, whereas it may bite me, given my poor understanding of this stuff...

Thanks.
--Serge
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
Hello.

> So I have to learn the precedence and associativity table in section 6.7
> of OCaml's reference manual!!!

I usually redefine the (&) operator:

  let ( & ) f x = f x

It's fine as long as you don't use JoCaml, which defines & as a keyword. Also take a look at the other infix operators I use, maybe you'll find them useful too: http://bitbucket.org/gds/ops/src/tip/ops.ml (comments are in Russian and in OCaml; I'll translate the Russian text to English at first request, it's not hard for me, but it seems that the OCaml code in the comments describes the usage well enough).
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
On Tue, Nov 16, 2010 at 3:26 PM, Michael Ekstrand mich...@elehack.net wrote:
> Batteries provides operators for things like this. It defines the '**>'
> operator for function application; it's an odd name, but it has the
> right associativity. As Dmitry mentioned, some override (&). Batteries
> also provides composition operators |- and -|, and a pipeline operator
> |> (opposite of **>). With that operator, you can write:
>
>   f x y |> ignore
>
> thereby putting the emphasis on f x y and relegating ignore to a cleanup
> at the end.
> - Michael

(<|) as inverse of (|>) is also available. It doesn't have the right associativity, but you can easily use (f -| g -| h <| x) instead of (f <| g <| h <| x).

Though I find the application-as-pipeline style quite readable in some cases, I think that in general it is more often superfluous than not. Besides, as mentioned recently on this list, overuse of the function composition operators (|-) and (-|) is also a call for trouble with the value restriction. All in all, I think it's reasonable to stay conservative and not advertise funky binary operators too loudly.

That said, domain-specific binary operators are certainly useful for readability in some contexts: that's what an infix operator is anyway, an unreadable-by-design symbol that only gets meaning from domain-specific conventions. Local opens, available by standard since OCaml 3.12, allow us to neatly encapsulate such domain-specific notations into OCaml modules.
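For readers without Batteries at hand, the operators discussed above can be approximated in a few lines. The definitions below are plain-OCaml stand-ins assumed to match Batteries' behavior, not Batteries' actual source:

```ocaml
(* Batteries-style operators, defined locally for illustration. *)
let (|>) x f = f x        (* pipeline: argument first, left-associative *)
let (<|) f x = f x        (* application: f <| x  =  f x *)
let (-|) f g x = f (g x)  (* composition: (f -| g) x  =  f (g x) *)

let () =
  (* Pipeline style: emphasis on the computation, cleanup at the end. *)
  (1 + 2) |> string_of_int |> print_endline;
  (* Composition binds tighter than <|, so this applies succ three times. *)
  assert (succ -| succ -| succ <| 0 = 3);
  (* The use case from the thread: discard a result without parentheses. *)
  max 1 2 |> ignore
```

The first-symbol precedence rule is doing the work here: (-|) gets the precedence of (-), which is higher than that of (<|) and (|>), so compositions group before the final application.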
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
On Tue, Nov 16, 2010 at 12:27 PM, Serge Le Huitouze serge.lehuito...@gmail.com wrote:
> It seems that there is no predefined function to test an 'a option for
> being specifically None or Some _. This seems to be confirmed by the
> very existence of: http://ocaml-lib.sourceforge.net/doc/Option.html
> which defines such functions (is_none and is_some). I found it weird to
> be forced to use match expressions in my code for doing that, e.g.:
> * let curSelectedRow = ref None in
> * let updateButtonsStatus () =
> *   button_remove#misc#set_sensitive
> *     (match !curSelectedRow with None -> false | _ -> true)
> * in
> * ...

Though useless in this case (just use ((<>) None)), there is a very nice syntax extension proposal by Richard Jones to turn any pattern into a boolean predicate: (matches p) would be equivalent to a function that returns true if the input matches the pattern. I have implemented it in camlp4 (the code may be slightly bitrotten) in case you're interested: http://bluestorm.info/camlp4/pa_matches.ml.html

> I'm not familiar with operators and their precedence, but I wonder: is
> it possible to do something similar with OCaml?

In OCaml, the associativity/precedence of an operator is determined by its first symbols. For example (++$*) has exactly the precedence of (+). You can find all precedence classes and their prefixes in the OCaml manual: http://caml.inria.fr/pub/docs/manual-ocaml/expr.ht...@manual.kwd33

Though this is less flexible than other languages that let you choose precedence and associativity on a case-per-case basis, it gives a nice homogeneity to binary operators: you don't need to look at an operator's definition site to have a (vague, unless you know the table by heart) idea of its syntactic properties.
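The first-symbol rule described above can be checked directly. A small sketch; the operator names below are invented for illustration:

```ocaml
(* An operator starting with '+' gets the precedence and associativity
   of (+); one starting with '*' gets those of ( * ). *)
let (++$*) a b = a + b
let ( *+ ) a b = a * b

let () =
  (* Parses as 1 ++$* (2 *+ 3), because '*'-operators bind tighter. *)
  assert (1 ++$* 2 *+ 3 = 7);
  print_endline "ok"
```

Despite (++$*) containing '$' and '*', only its leading '+' matters to the parser, which is exactly the homogeneity the post describes.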
Re: [Caml-list] SMP multithreading
On Monday, 15.11.2010, at 22:46 -0800, Edgar Friendly wrote:
> On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote:
>> Hi, I've just read
>> http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
>> in particular this paragraph:
>> | What about hyperthreading? Well, I believe it's the last convulsive
>> | movement of SMP's corpse :-) We'll see how it goes market-wise. At
>> | any rate, the speedups announced for hyperthreading in the Pentium 4
>> | are below a factor of 1.5; probably not enough to offset the overhead
>> | of making the OCaml runtime system thread-safe.
>> This reads just like "640k ought to be enough for everyone". Multicore
>> systems are the standard today. Even the cheapest consumer machines come
>> with at least two cores. One can easily get 6-core machines today. Still
>> thinking SMP is a niche and is dying? So, what are the developments
>> regarding SMP multithreading in OCaml?
>> Cheers Wolfgang
>
> At the risk of feeding a (possibly unintentional) troll, I'd like to share some possibly new thoughts on this ever-living topic. It looks like high-performance computing of the near future will be built out of many machines (message passing), each with many cores (SMP). One could use message passing for all communication in such a system, but a hybrid approach might be best for this architecture, with use of shared memory within each box and message passing between. Of course the best choice depends strongly on the particular task. In the long run, it'll likely be a combination of a few large, powerful cores (Intel-CPU style, with the capability to run a single thread as fast as possible) with many, many smaller compute engines (GPGPUs or the like, optimized for power and area, closely coupled with memory) that provides the highest performance density. The question of how to program such an architecture seems to be getting answered without the functional community's input. What can we contribute?

Yes, that's generally the right question.
Current hardware is a kind of experiment: vendors have only taken the multicore path because it is right now the easiest way of improving the performance potential, although it is questionable whether (non-server) applications can really benefit from it (excluding server apps here, because for these parallelization is relatively easy to get). Future hardware will probably be even more different; it is still unclear which design paths will be taken. It could be manycores (many CPUs with non-uniform RAM), or specialized compute units. Maybe we'll see again a separation of consumer and datacenter markets, the former optimizing for numeric simulation applications (i.e. games), the latter for high-throughput data paths and parallel CPU power. The problem here is that this is all speculation.

There are some things we can do to improve the situation (and some ideas are not realistic):

* A probably not-so-difficult improvement would be better message passing between independent but local processes. I've started an experiment with such a mechanism (http://projects.camlcity.org/projects/dl/ocamlnet-3.0.3/doc/html-main/Netcamlbox.html), which tries to exploit the fact that GC-managed memory has a well-known structure. With more help from the GC this could be made even better (safer, fewer corner cases).

* We need more frameworks for parallel programming. I'm currently developing Plasma, a Map/Reduce framework. Using a framework has the big advantage that the whole program is structured so that it profits from parallelization, and that it is possible to train developers who have no idea about parallelization to use it. There are probably more algorithm schemes where this is possible.

* I have a lot of doubts whether FP languages will ever run well on SMP with a bigger number of cores. The problem is the relatively high memory allocation rate; the GC has to work a lot harder than in imperative languages. The OC4MC project uses thread-local minor heaps because of this. Probably this is not enough, and one even needs thread-local major heaps (plus a third generation for values accessed by several threads). All in all you could get the same effect by instantiating the OCaml runtime several times (if this were possible), letting each runtime run in its own thread, and providing some extra functionality for passing values between threads and for sharing values. This would not be exactly the SMP model, but would allow a number of parallelization techniques, and is probably future-proof as it encourages message passing over sharing. This is certainly worth experimentation.

* One can also tackle the problem from the multi-processing side: provide better mechanisms for message passing (see above) and
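Gerd's first point, message passing between independent local processes, can be sketched with nothing more than a pipe and Marshal. Netcamlbox exists precisely to avoid the serialization and copying this naive version performs; the example below is an illustration of the general mechanism, not Netcamlbox's API:

```ocaml
(* Sketch: send an OCaml value from a child process to its parent
   over a pipe, using Marshal for serialization. *)
let () =
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: serialize a value into the pipe and exit. *)
      Unix.close r;
      let oc = Unix.out_channel_of_descr w in
      Marshal.to_channel oc [1; 2; 3] [];
      close_out oc;
      exit 0
  | _pid ->
      (* Parent: read the value back and use it. *)
      Unix.close w;
      let ic = Unix.in_channel_of_descr r in
      let (msg : int list) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      Printf.printf "received sum = %d\n" (List.fold_left ( + ) 0 msg)
```

Every message here is copied twice (marshal, then unmarshal); a shared-memory design like Netcamlbox can hand the receiver a pointer into a shared buffer instead, which is where GC cooperation becomes interesting.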
Re: [Caml-list] OCamlJit 2.0
To those of you who are lazy but still curious: I just read the report, and here are the answers to the questions I had.

1. Is this project related to Basile Starynkevitch's venerable OCamlJIT?

Yes, OCamlJIT was apparently a major inspiration for this work. The overall design is similar, and in particular the objective of absolute compatibility with the existing bytecode (even when it may hurt performance) was retained. Unfortunately, due to bitrotting and different architectures (OCamlJIT on x86 vs. OCamlJit2 on x86-64), there is no direct performance comparison, though the report seems to indicate similar to better performance. Several points were also improved (better (but still suboptimal) register usage, clever hacks to speed up the mapping between bytecode and native code addresses...).

2. Does it use LLVM?

No, it doesn't, but the choice is justified. Apparently, an early prototype showed that LLVM compilation/generation overhead was too high. A (debatable) design goal of OCamlJit2 is to be absolutely faster than ocamlrun, even on *very* short-running programs (`ocamlc -help`). Here is a direct quote from the report:

  "In a simple OCamlJit2 prototype based on LLVM, we have measured significant compilation overhead; short running programs (like ocamlc applied to a small to medium sized *.ml file) were three to four times slower than running the same program with the byte-code interpreter. Similar results were observed by other projects using LLVM for Just-In-Time compilation. For example, the Mono project [10] reported impressive speedups for long running, computationally intensive applications, but at the cost of increased compilation time and memory usage."

OCamlJit2 uses its own macros to generate x86-64 assembly directly (apparently using a dedicated library wasn't worth it). A drawback of this strategy is its inherent non-portability.

3. Does it perform well?

As explained in the original post, the speedups over bytecode are around 2-6x. That's quite honourable, and the results are very consistent: all benchmarked programs running for more than 1 second show speedups between 2.59 and 5.34. The authors seem to think other optimizations are possible, and expect significant performance improvements in the future. The goal of being on par with the OCaml native compiler is (prudently) mentioned. On the benchmarks, ocamlopt was itself 2 to 6 times faster than OCamlJit2.

4. Various remarks.

Various clever hacks are described in the report, which bring noticeable performance improvements but probably make the implementation more complex. I was also interested by the following remark, maybe a bit provocative:

  "Both JVM and CLR depend on profile guided optimizations for good performance, because their byte-code is slower than the OCaml byte-code. This is mostly because of the additional overhead that comes with exception handling, multi-threading, class loading and object synchronization. But it is also due to the fact that both the JVM byte-code [27] and the Common Intermediate Language (CIL) [10] instruction sets are not as optimized as the OCaml byte-code."

On Tue, Nov 16, 2010 at 3:52 PM, Benedikt Meurer benedikt.meu...@googlemail.com wrote:
> Hello everybody,
>
> OCamlJit 2.0 is a new Just-In-Time engine for Objective Caml 3.12.0 on desktop processors (x86/x86-64). It translates the OCaml byte-code used by the interpreter (ocamlrun and ocaml) to x86/x86-64 native code on demand and runs the generated native code instead of interpreting the byte-code. It is designed to run with minimal compilation overhead (translating only what is being executed, avoiding costly code generation and optimization techniques), while being 100% compatible with the byte-code runtime (including serialization and hashing of closures, etc.).
>
> OCamlJit 2.0 was specifically designed for desktop processors and is not really portable to anything else in its current shape, because the target audience is people using the interactive top-level and the byte-code interpreter for rapid prototyping/development (which is unlikely to happen on anything but x86/x86-64). The implementation currently requires a system that adheres to the SysV ABI, which includes Linux, BSD, and OS X, but excludes Win32/Win64 (patches/ideas are welcome). It was tested on Linux/x86 (Debian), Linux/amd64 (CentOS) and Mac OS X 10.6 (64-bit). The x86 implementation requires SSE2-capable processors (otherwise it falls back to the byte-code interpreter), so it won't speed up your OCaml programs running on 486 CPUs. :-)
>
> OCamlJit 2.0 runs most benchmarks 2-6 times faster than the byte-code interpreter. The interactive top-level benefits twice when used with the JIT engine: (a) the compiler stages are JIT-compiled and (b) the generated byte-code is JIT-compiled. A tech report describing a slightly earlier prototype and including performance measures of OCamlJit 2.0 on Mac OS X (64-bit) is available at:
Re: [Caml-list] OCamlJit 2.0
On Nov 16, 2010, at 18:07, bluestorm wrote:
> To those of you who are lazy but still curious, I just read the report,
> and here are the answers to the questions I had:

Thanks for posting these points, I should have done this in my original post...

> 1. Is this project related to Basile Starynkevitch's venerable OCamlJIT?
> Yes, OCamlJIT was apparently a major inspiration for this work. [...]
> Unfortunately, due to bitrotting and different architectures, there is
> no direct performance comparison, though the report seems to indicate
> similar to better performance.

With the x86 port being close to complete, a direct comparison with OCamlJIT will be possible (I don't know whether OCamlJIT will work with 3.12, though). I'll do appropriate comparisons once everything is in place.

> 2. Does it use LLVM?
> No, it doesn't, but the choice is justified. Apparently, an early
> prototype showed that LLVM compilation/generation overhead was too high.
> A (debatable) design goal of OCamlJit2 is to be absolutely faster than
> ocamlrun, even on *very* short-running programs (`ocamlc -help`).

This is indeed debatable, at least for ocamlc -help. But this was not the main concern with LLVM. LLVM overhead is acceptable for long-running computations, but everything else is slowed down noticeably. It should also be noted that my LLVM prototype was rather quick & dirty, so it may indeed be possible to get an LLVM-based JIT which is on par with the byte-code interpreter for common applications. But why would one want to do this? Long-running computations can be sped up very well using ocamlopt (no need to perform them using the interactive top-level).

> OCamlJit2 uses its own macros to generate x86-64 assembly directly
> (apparently using a dedicated library wasn't worth it). A drawback of
> this strategy is its inherent non-portability.

It used AsmJit before, which is also limited to x86/x86-64; the non-portability is actually a design decision (as explained in my original post, portability to non-desktop machines was not a goal for the current implementation).

HTH,
Benedikt Meurer
Re: [Caml-list] Option functions (or lack thereof) + operator for composition
On 11/16/10 03:51, Gabriel Kerneis wrote:
> On Tue, Nov 16, 2010 at 12:43:30PM +0100, Thomas Gazagnaire wrote:
>> You are not forced to use match expressions, you can just define:
>>   let is_none x = match x with None -> true | Some _ -> false
>>   and is_some x = not (is_none x)
>> and then use these functions in your code ...
> Or even simpler:
>   let is_none = (=) None
>   let is_some = (!=) None
> Note that you can use this directly, for instance: List.filter ((!=) None) l

I would like to add that I also follow this general approach of defining such trivial functions where I need them. Being trivial does not mean that they are going to be used everywhere. It makes the code easier to *read*. My 2 cents.

Martin
Re: [Caml-list] SMP multithreading
On 2010 Nov 15, at 22:46, Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be built out of many machines (message passing), each with many cores (SMP). One could use message passing for all communication in such a system, but a hybrid approach might be best for this architecture, with use of shared memory within each box and message passing between. Of course the best choice depends strongly on the particular task. In the long run, it'll likely be a combination of a few large, powerful cores (Intel-CPU style, with the capability to run a single thread as fast as possible) with many, many smaller compute engines (GPGPUs or the like, optimized for power and area, closely coupled with memory) that provides the highest performance density.

OCaml code should be able to share immutable OCaml data with other processes just as it shares libraries. See http://cap-lore.com/Software/pch.html . Some of the ideas there might be improved with hardware support.

Admission: If I had read all of the interesting pointers given in this thread, I would never have finished sending this e-mail.
Re: [Caml-list] SMP multithreading
On Tue, Nov 16, 2010 at 7:04 PM, Gerd Stolpmann i...@gerd-stolpmann.de wrote:
> * As somebody mentioned implicit parallelization: Don't expect anything
> from this. Even if a good compiler finds ways to parallelize 20% of the
> code (which would be a lot), the runtime effect would be marginal. 80% of
> the code is run at normal speed (hopefully) and dominates the runtime
> behavior. The point is that such compiler-driven code improvements are
> only local optimizations. For getting good parallelization results you
> need to restructure the design of the program - well, maybe compiler 2.0
> can do this at some time, but this is not in sight.

I think you are underestimating parallelizing compilers.

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct
Re: [Caml-list] SMP multithreading
On Tuesday, 16 Nov 2010, 22:35 +0200, Eray Ozkural wrote: [...] I think you are underestimating parallelizing compilers. I was more citing Amdahl's law, and did not want to criticize any effort in this area. It's more about the usefulness for the majority of problems. How useful is the best parallelizing compiler if only a small part of the program _can_ actually benefit from it? Think about it. If you are not working in an area where many subroutines can be sped up, you will consider this way of parallelizing a waste of time. And this is still true for the majority of problems. Also, for many problems that can be tackled, the scalability is very limited. Gerd -- Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt, Germany g...@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714
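For concreteness, the argument here is Amdahl's law: if a fraction p of the total work can be parallelized across n cores, the overall speedup is 1 / ((1 - p) + p/n). An illustrative OCaml sketch (not from the thread) shows why 20% parallelizable code yields only marginal gains:

```ocaml
(* Amdahl's law: speedup when a fraction p of the work runs on n cores. *)
let amdahl ~p ~n = 1.0 /. ((1.0 -. p) +. p /. float_of_int n)

let () =
  (* With p = 0.2 the speedup is capped at 1 / 0.8 = 1.25x,
     no matter how many cores are available. *)
  Printf.printf "p=0.20, n=8:    %.2fx\n" (amdahl ~p:0.2 ~n:8);
  Printf.printf "p=0.20, n=1000: %.2fx\n" (amdahl ~p:0.2 ~n:1000);
  (* Only when most of the work parallelizes do many cores pay off. *)
  Printf.printf "p=0.95, n=8:    %.2fx\n" (amdahl ~p:0.95 ~n:8)
```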
Re: [Caml-list] SMP multithreading
On Wed, Nov 17, 2010 at 12:13 AM, Gerd Stolpmann i...@gerd-stolpmann.de wrote: [...] I was more citing Amdahl's law, and did not want to criticize any effort in this area. It's more the usefulness for the majority of problems. How useful is the best parallelizing compiler if only a small part of the program _can_ actually benefit from it? Think about it. If you are not working in an area where many subroutines can be sped up, you consider this way of parallelizing as a waste of time. And this is still true for the majority of problems. Also, for many problems that can be tackled, the scalability is very limited. What makes you think only 20% of the code can be parallelized? I've worked in such a compiler project, and there were way too many opportunities for parallelization in ordinary C code, let alone a functional language; implicit parallelism would work wonders there. Of course you may think whatever you were thinking, but I know that a high degree of parallelism can be achieved through a functional language. I can't tell you much more though. 
If you think that only a very small portion of the code can be parallelized, you probably do not appreciate the kinds of static and dynamic analysis those compilers perform. Of course, if you are thinking of applying it to some office or e-mail application, this might not be the case, but automatic parallelization strategies work best when you apply them to a computationally intensive program. The really limiting factor for current functional languages would be their reliance on inherently sequential primitives like list processing, which may in some cases limit the compiler to pipeline parallelism only (something non-trivial to do by hand, actually). Instead they would have to take their basic forms from high-level parallel PLs (which might mean rewriting a lot of things). The programs could look like something out of a category theory textbook then. But I think even without modification you could get a lot of speedup from ocaml code by applying state-of-the-art automatic parallelization techniques. By the way, what matters is how much of the serial work is parallelized, not how long the code is. Even if the parallelism is not obvious, the analysis can find out which iterations are parallelizable, which variables have which kinds of dependencies, memory dependencies, and so on. There is a public perception that automatic parallelization does not work, which is wrong. It is true that the popular compiler vendors do not understand much in this regard; all I've seen in popular products are lame attempts to generate vector instructions. That is not to say that implicit parallelization of current code can replace parallel algorithm design (which is AI-complete). Rather, I think implicit parallelization is one of the things that will help parallel computing people: by having them work at a more abstract level, exposing more concurrency through functional forms, yet avoiding writing low-level comms code. 
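The point about sequential primitives can be made concrete with a small sketch (illustrative, not code from the thread): a left fold carries its accumulator from one iteration to the next, so at best a compiler can pipeline it, whereas the iterations of a map over an array are independent and could in principle run side by side.

```ocaml
(* A fold has a loop-carried dependency: step i needs the accumulator
   from step i-1, so its iterations cannot simply run in parallel. *)
let prefix_sums xs =
  List.rev
    (List.fold_left
       (fun acc x ->
         match acc with
         | [] -> [ x ]
         | s :: _ -> (s + x) :: acc)
       [] xs)

(* A map's iterations are independent: a parallelizing compiler (or a
   parallel runtime) could split the array across cores freely. *)
let squares a = Array.map (fun x -> x * x) a
```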
High-level explicit parallelism is also quite favorable, as there may be many situations where, say, dynamic load-balancing approaches are suitable. The approach of HPF might be relevant here, with the programmer making annotations to guide the distribution of data structures, and the rest inferred from code. So whoever says that isn't possible probably hasn't read up much in the computer architecture community. Even expert PL researchers may be misled here, as they have made the quoted remark or similar remarks about multi-core/SMP architectures. It was known for a very long time (20+ years) that clock speeds would hit a wall and we would then have to expend more area. This is true regardless of the underlying architecture/technology, actually. Mother nature is parallel. It is the sequence that is an abstraction. And I wonder what is more natural than
Re: [Caml-list] SMP multithreading
On Wed, 17 Nov 2010 01:04:54 +0200 Eray Ozkural examach...@gmail.com wrote: [readworthy text] I'd like to point out how the big competitor to OCaml deals with it. The GHC Haskell system has had SMP parallelization built in for some time, and it does it quite well. Wolfgang
Re: [Caml-list] SMP multithreading
On Wed, Nov 17, 2010 at 1:52 AM, Wolfgang Draxinger wdraxinger.maill...@draxit.de wrote: [...] The GHC Haskell system has had SMP parallelization built in for some time, and it does it quite well. I think I tested the parallel features just once in the distant past; something like that would be so useful for ocaml :) Explicit threading that is suited to functional programming, with a syntax independent of the actual thread implementation. The par combinator looks like fun to use. Wishlist item, definitely :) Cheers, -- Eray Ozkural
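For readers who haven't seen it: GHC's par sparks one computation for possible parallel evaluation while the other proceeds. A hypothetical OCaml interface in that spirit might look as follows (the name and signature are invented here for illustration, and this stand-in evaluates the thunks sequentially, since the stock OCaml runtime cannot execute OCaml code on two cores at once):

```ocaml
(* Hypothetical par-like combinator: evaluate two thunks, ideally in
   parallel. This sketch runs them one after the other; a parallel
   runtime could schedule the first thunk on another core instead. *)
let par (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
  let a = f () in
  let b = g () in
  (a, b)

(* Fork-join style divide and conquer expressed with par. *)
let rec fib n =
  if n < 2 then n
  else
    let a, b = par (fun () -> fib (n - 1)) (fun () -> fib (n - 2)) in
    a + b
```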
RE: [Caml-list] SMP multithreading
Wolfgang wrote: I'd like to point out how the big competitor to OCaml deals with it. The GHC Haskell system has had SMP parallelization built in for some time, and it does it quite well. I beg to differ. Upon trying to reproduce many of the Haskell community's results, I found that even their own parallel Haskell programs often exhibit huge slowdowns. This is because Haskell's unpredictable performance leads to unpredictable granularity and, consequently, more time can be spent administering the tiny parallel computations than is gained by doing so. The results I found here are typical: http://flyingfrogblog.blogspot.com/2010/01/naive-parallelism-with-hlvm.html Note that the absolute performance peaks at an unpredictable number of cores only in the case of Haskell. This is because the GC does not scale beyond about 4 cores for any Haskell program doing a significant amount of allocation, which is basically all Haskell programs, because allocations are everywhere in Haskell. Ultimately, running on all cores attains no speedup at all with Haskell in that case. This was branded the "last core slowdown", but the slowdown clearly started well before all 8 cores. There has been a significant development towards improving this situation, but it won't fix the granularity problem: http://hackage.haskell.org/trac/ghc/blog/new-gc-preview The paper "Regular, shape-polymorphic, parallel arrays in Haskell" cites 2.5x speedups when existing techniques were not only already getting 7x speedups but better absolute performance as well. Cache complexity is the problem, as I explained here: http://flyingfrogblog.blogspot.com/2010/06/regular-shape-polymorphic-parallel.html Probably the best solution for multicore programming is Cilk. This technique has already been adopted both in Intel's TBB and Microsoft's .NET 4 but, AFAIK, the only functional language with access to it is F#. 
There are some great papers on multicore-friendly cache-oblivious algorithms written in Cilk: http://www.fftw.org/~athena/papers/tocs08.pdf Note, in particular, that Cilk is not only much faster but also much easier to use than explicit message passing. To do something like this, threads need to be able to run in parallel and mutate the same shared heap. Although that is objectively easy (I did it in HLVM), OCaml's reliance upon very high allocation rates, efficient collection of young garbage and a ridiculous density of pointers in the heap makes it a *lot* harder. Cheers, Jon.
RE: [Caml-list] SMP multithreading
Granularity and cache complexity are the reasons why not. If you find anything and everything that can be done in parallel and parallelize it then you generally obtain only slowdowns. An essential trick is to exploit locality via mutation but, of course, purely functional programming sucks at that by design not least because it is striving to abstract that concept away. I share your dream but I doubt it will ever be realized. Cheers, Jon. From: caml-list-boun...@yquem.inria.fr [mailto:caml-list-boun...@yquem.inria.fr] On Behalf Of Eray Ozkural Sent: 16 November 2010 23:05 To: Gerd Stolpmann Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] SMP multithreading [...]
[Caml-list] Lightweight way to extend existing module. Almost there?
With the improvements to the module system in ocaml 3.12, extending a signature just got much, much lighter. Suppose I want the signature of a module which in every respect is like StdLabels, except that the String module has an extra is_prefix_of. I can write:

  module String : sig
    include module type of StdLabels.String
    val is_prefix_of : string -> string -> bool
  end
  include module type of StdLabels with module String := String

I know of no way to extend the implementation in a similarly concise way (one for which the length of the solution does not depend on the number of top-level items in StdLabels). Have I overlooked anything obvious? Till

P.S.: FYI, existing solution for the implementation:

  module String = struct
    include StringLabels
    let is_prefix_of (needle : string) (haystack : string) : bool =
      let len = length needle in
      len <= length haystack && sub haystack ~pos:0 ~len = needle
  end
  (* Haven't quite found out how to include a module hiding another one... *)
  module List = ListLabels
  module Array = ArrayLabels
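For reference, the implementation side of the message can be checked standalone (the code below is the message's String module with the comparison and conjunction operators restored where the archive mangled them, plus a few assertions):

```ocaml
(* StringLabels extended with is_prefix_of, as in the message. *)
module String = struct
  include StringLabels

  let is_prefix_of (needle : string) (haystack : string) : bool =
    let len = length needle in
    len <= length haystack && sub haystack ~pos:0 ~len = needle
end

let () =
  assert (String.is_prefix_of "foo" "foobar");
  assert (not (String.is_prefix_of "bar" "foobar"));
  (* the empty string is a prefix of everything *)
  assert (String.is_prefix_of "" "anything");
  (* a needle longer than the haystack is never a prefix *)
  assert (not (String.is_prefix_of "foobar" "foo"))
```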
Re: [Caml-list] SMP multithreading
Oh well, I'm not so surprised that the fine-grain task parallelism with (?) dynamic load-balancing strategy doesn't get much speedup. Doing HPC with Haskell is a bit like using Java for writing parallel programs; you might as well use a C-64 and Commodore BASIC. And yes, some people do use Java with MPI. Java people have benchmarks too :) But for some reason I had difficulty using Java and Haskell even with medium-size problems. On the other hand, a lot more can be achieved with a parallelizing compiler that uses profiling and static analysis. As I said, even in C good results can be achieved (I've seen that), so I know it's doable with ocaml; it's just a difficult kind of compiler. The functional features would expose more concurrency. At any rate, implicit parallelism isn't the same as a parallelizing compiler; it's better, because you would be using primitives that the compiler knows by heart. That's like combining the best of both worlds, I think, because obviously parallelizing compilers work best on the easier kinds of parallelism. They can pull more tricks than many assume, but they would still not replace a parallel algorithm designer. You don't really expect to give quicksort as input and get hypercube quicksort as output from these parallelizing compilers that apply a number of heuristic transformations to the code, but on many problems they can be made to generate pretty good code. The best part is that once you have it you can apply it to every program; it's one of the cheapest ways to get speedup, so I'd say it's worthwhile for ocaml right now. Just not the way GHC does it. On Wed, Nov 17, 2010 at 5:47 AM, Jon Harrop jonathandeanhar...@googlemail.com wrote: Granularity and cache complexity are the reasons why not. If you find anything and everything that can be done in parallel and parallelize it then you generally obtain only slowdowns. 
Re: [Caml-list] SMP multithreading
On Wed, Nov 17, 2010 at 06:27:14AM +0200, Eray Ozkural wrote: As I said even in C good results can be achieved, I've seen that, so I know it's doable with ocaml, just a difficult kind of compiler. The functional features would expose more concurrency. Could you share a pointer to a paper describing this compiler? Thanks, -- Gabriel Kerneis