Re: [Caml-list] ocamlclean : an OCaml bytecode cleaner (Was: (Announce) OCAPIC : OCaml for PIC18 microcontrollers)
Hello,

ocamlclean removes code to which there is no possible path. For instance, this program:

  let plop = List.map succ [1;2;3];;

uses module List (for map) and module Pervasives (for succ), but doesn't use many other functions of List or Pervasives (e.g., List.iter, List.fold_left, Pervasives.print_endline). So most functions of modules Pervasives and List are removed from the bytecode executable.

If one dynamically loads some bytecode, for instance if the previous program becomes:

  let plop = List.map succ [1;2;3];;
  let _ = Dynlink.loadfile "stuff.cmo";;

then stuff.cmo must not reference anything that may no longer exist, such as Pervasives.(@), which has been removed by ocamlclean. And we are not supposed to know at compile time what stuff.cmo needs from the stdlib. Hence I guess everything should be kept and ocamlclean should not be used. On the other hand, if we statically know what is in stuff.cmo, then why load it dynamically? (I guess the answer can be "just for fun", but I'm not so sure it's such a good answer :-)

Though, I'm not very familiar with Dynlink, and I'm not sure what Dynlink.allow_only really does... I haven't tested using Dynlink to load a self-sufficient module. It might work, but I don't really see how it could be useful anyway...

Cheers,
Philippe Wang

On Nov 11, 2010, at 06:52 AM, Julien Signoles wrote: Hello, Is ocamlclean compatible with dynamic loading? That is, code potentially used by some unknown dynamically-loaded code must be kept. -- Julien

___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
[Caml-list] ocamlclean : an OCaml bytecode cleaner (Was: (Announce) OCAPIC : OCaml for PIC18 microcontrollers)
Dear all,

Shortly: ocamlclean is now available as a separate package, so you don't have to get the whole OCAPIC distribution just to try ocamlclean.

More information: ocamlclean takes a bytecode executable (generally, but not necessarily, produced by the ocamlc compiler) and reduces its size by eliminating some dead code. Dead code is identified statically. (It's impossible to eliminate all dead code, but in some cases it can reduce bytecode executables tremendously.) It is meant to be compatible with standard bytecode such as that produced by ocamlc. (The DBUG section is currently not supported and is removed during the cleaning process. Other unsupported sections are left untouched.)

Web site: http://www.algo-prog.info/ocaml_for_pic/
Developer: Benoît Vaugon

-- Philippe Wang http://www-apr.lip6.fr/~pwang/
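The "no possible path" criterion from the reply above amounts to a reachability analysis. The following is a toy sketch over a hand-written call graph: the function names and their edges are made up for illustration, and the real ocamlclean works on bytecode instructions, not on named functions. It also shows why Dynlink breaks: a function only called from a .cmo loaded at run time has no edge in this graph, so it is dropped.

```ocaml
(* Toy model of reachability-based dead-code elimination. Each
   (hypothetical) function maps to the functions it may call. *)
module S = Set.Make (String)

let call_graph =
  [ ("main", [ "List.map"; "succ" ]);
    ("List.map", [ "List.map" ]);
    ("succ", []);
    ("List.iter", [ "List.iter" ]);
    ("List.fold_left", [ "List.fold_left" ]);
    ("print_endline", [ "output_string"; "flush" ]);
    ("output_string", []);
    ("flush", []) ]

let callees f = try List.assoc f call_graph with Not_found -> []

(* Depth-first traversal from the entry point: everything visited is kept. *)
let reachable entry =
  let rec visit seen f =
    if S.mem f seen then seen
    else List.fold_left visit (S.add f seen) (callees f)
  in
  visit S.empty entry

(* What a cleaner would drop: every defined function that was never reached. *)
let dead entry =
  let live = reachable entry in
  call_graph
  |> List.filter (fun (f, _) -> not (S.mem f live))
  |> List.map fst
```

Here `dead "main"` lists List.iter, List.fold_left, print_endline and its callees: they are defined but cannot be reached from main.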
Re: [Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers
On Nov 6, 2010, at 18:47 GMT+01:00, Goswin von Brederlow wrote: Philippe Wang philippe.w...@lip6.fr writes: Dear all, this is an announcement for OCAPIC, a project which brings OCaml to programming PIC microcontrollers. Some PIC18 series characteristics: - 8-bit architecture - low cost (a few US dollars), fairly widespread in the electronics world - very little volatile memory (a few bytes only, up to ~5000 bytes, depending on the model) - very little non-volatile memory (less than a KB up to 128 KB) - EEPROM: 0 to 1024 bytes. Doesn't the overhead of boxed structures, as well as losing a bit on ints, make that impractical given the extremely limited memory? MfG Goswin

Thanks for the question. Let me try to give an (indirect) answer. OCAPIC has 16-1=15-bit integers and 16-bit blocks, and the overhead is quite acceptable to us. An AI for the game Gobblet [1] was implemented and tested. (The OCaml code is included in the distribution, so anyone can check it out.) The first version of this AI was very hard to beat (for a human). Then a strategy was found to beat it, so some randomization was added to the AI to make it more interesting. Now the AI has become really very hard to beat. (We used a PIC18F4620: flash memory = 64 KiB; volatile memory = 3968 B; EEPROM = 1 KiB; speed = 10 MIPS.) Between two moves, the AI may trigger the GC about ten times or more. However, the time between two moves is less than 2 seconds, and generally well under half a second (and at the beginning of the game the time it takes is hardly noticeable). Providing a GC for programming PIC microcontrollers is a tremendous gain compared to managing everything (memory and computation) manually. Providing a high-level language makes it possible to implement algorithms that would be very hard or impossible to implement in ASM (or in most low-level languages such as C or Basic). We haven't yet experimented with real-time constrained programming (e.g., ReactiveML might take OCAPIC a step further).
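The "16-1=15-bit integers" figure follows from tagging: one bit of each 16-bit word distinguishes immediate integers from pointers to blocks. A sketch, assuming OCAPIC uses the same convention as standard OCaml (n stored as 2n+1, low bit set for integers); the helper names are hypothetical, and the 16-bit words are simulated with ordinary OCaml ints and a mask.

```ocaml
(* Assumed encoding: integer n is stored as the 16-bit word 2n+1. *)
let word_bits = 16
let int_bits = word_bits - 1                 (* one bit is spent on the tag *)
let min_int15 = -(1 lsl (int_bits - 1))      (* -16384 *)
let max_int15 = (1 lsl (int_bits - 1)) - 1   (* 16383 *)
let mask = (1 lsl word_bits) - 1             (* keep the low 16 bits *)

(* Encode n as an unsigned 16-bit word: 2n+1, truncated to 16 bits. *)
let encode n = ((n lsl 1) lor 1) land mask

(* Decode: drop the tag bit, then sign-extend from 15 bits. *)
let decode w =
  let v = w lsr 1 in
  if v >= 1 lsl (int_bits - 1) then v - (1 lsl int_bits) else v

(* Low bit 1 = immediate integer, 0 = pointer to a block. *)
let is_int w = w land 1 = 1

(* Sums wrap around modulo 2^15, just as native OCaml ints wrap at their width. *)
let add15 a b = decode (encode (a + b))
```

So `add15 max_int15 1` evaluates to `min_int15`: the usable range is -16384..16383, half of what a raw 16-bit word could hold.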
Now, maybe the direct answer to the question is: programming PICs has been impractical for most people; now all readers of this list can potentially program them without much difficulty (and without paying too high a cost in performance). :-)

[1] http://www.educationallearninggames.com/how-to-play-gobblet-game-rules.asp

Cheers,
-- Philippe Wang philippe.w...@lip6.fr http://www-apr.lip6.fr/~pwang/
Re: [Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers
PIC ASM is the first programming language Benoît learnt, a few years ago, and he has practiced it ever since. Meanwhile he learnt OCaml (among other languages). A few months ago, he suggested to me that we implement an OCaml virtual machine running on PICs, with maximum performance efficiency in mind. This is why OCAPIC's VM is implemented in ASM. The purpose is of course to program PICs with a high-level language while remaining (relatively) *very* efficient. Vincent St-Amour and Marc Feeley have a similar project (Scheme on PICs) with a much higher priority on portability: their VM is implemented in C. http://www.ccs.neu.edu/home/stamourv/picobit-ifl.pdf

A side effect of our project, which may interest many OCaml users, is that OCAPIC provides ocamlclean, a tool that takes an OCaml bytecode binary (produced by ocamlc) and reduces it by statically eliminating most of its dead code (and of course Dynlink is then broken; note that Dynlink is not relevant on PICs). This tool is independent from the rest of OCAPIC. Actually, this tool was mandatory for programs using the OO layer: without it, bytecode binaries embedding the OO layer were too big to fit on our PICs.

Cheers, Philippe

On Nov 5, 2010, at 1:35 PM, Daniel Bünzli wrote: Interesting project. Was the choice of PIC based on technical reasons or just the authors' familiarity with these chips? I would have liked to give it a try, but unfortunately I work with AVRs and avr-gcc. Best, Daniel
[Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers
Dear all,

this is an announcement for OCAPIC, a project which brings OCaml to programming PIC microcontrollers.

Some PIC18 series characteristics:
- 8-bit architecture
- low cost (a few US dollars), fairly widespread in the electronics world
- very little volatile memory (a few bytes only, up to ~5000 bytes, depending on the model)
- very little non-volatile memory (less than a KB up to 128 KB)
- EEPROM: 0 to 1024 bytes

How to program those little chips with OCaml:
- write an OCaml program, compile it, transfer it to the PIC.

Well, actually it demands a little more than just that:
- write an OCaml program, as usual, while keeping in mind that the stack is more limited than usual, and so is the heap
- compile it (with ocamlc)
- reduce the binary (with ocamlclean: a bytecode reducer which removes dead code)
- transform the (reduced or not) binary (with bc2asm: it strips useless zeros, thereby reducing the binary size)
- transfer it to the PIC along with its OCaml VM.

Indeed, an OCaml VM has been implemented in PIC18 ASM in order to run OCaml programs on a PIC! :-)

An example of a real program is in the distribution (open source, downloadable from the website): ocapic-1.3/src/tests/goblet/ (722 lines of ML code). We also provide a simulator (needs X11 (Linux/MacOSX) and GCC) so that you can run your programs written for PIC18 on a PC. The whole implementation has been fairly well tested; however, the documentation is still quite young.

Here is the website: http://www.algo-prog.info/ocaml_for_pic/

Cheers.
Benoît Vaugon (developer and initiator of the OCAPIC project)
Philippe Wang (supervisor)
Emmanuel Chailloux (supervisor)

P.S. If you are French-speaking and contact us directly, please do so in French.
Re: [Caml-list] oc4mc status?
Hi, thank you for your interest :)

A few days ago, I updated the web site. It is now minimal (almost everything on a single page) but shows the project status at the top of the page (and is easier to maintain). Currently, the last entry is:

[[ 2010 spring-summer (current work, in progress) : (ocaml-3.12-svn) “from scratch”, making the runtime library fully reentrant, first without threads preoccupation ]]

This means that, with the very little manpower we have, we are currently concentrating on making the runtime library fully reentrant (while relying on past experience). This work currently does not address parallel threads, which have become a secondary issue. I may detail the motivations later, if they aren't evident...

Cheers,
-- Philippe Wang http://www-apr.lip6.fr/~pwang/

On Jul 2, 2010, at 4:32 PM, Eray Ozkural wrote: Hi there, oc4mc looks like a cool project; I had heard of it before but never got to try it. I suppose the latest development release worked with ocaml 3.10.2. I downloaded it and want to give it a shot to see if I can get some speedups with a parallel code I'm working on. So, how is the development going? I read on their page that they are planning a release for this summer based on the new ocaml. Cheers, -- Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara http://groups.yahoo.com/group/ai-philosophy http://myspace.com/arizanesil http://myspace.com/malfunct
Re: [Caml-list] Ques from a beginner: how to access a type defined in one .ml file in another .ml file
On Wed, May 12, 2010 at 12:30 PM, Tarun Sethi tarunseth...@gmail.com wrote: Hi, I'm very new to OCaml and I am not sure if this is the right forum to ask a beginner-level question. I have tried reading tutorials and the manual, but they didn't help. Please help me with the problem below.

In a.ml a record type t is defined, and it is also defined transparently in a.mli, i.e. in the interface, so that the type definition is available to all other files. a.ml also has a function, func, which returns a list of t. Now in another file, b.ml, I'm calling func. Obviously the OCaml compiler wouldn't be able to infer the type of the objects stored in the list; for the compiler it's just a list. So in b.ml I have something like this:

  let tlist = A.func in
  let vart = List.hd tlist in
  printf "%s\n" vart.name (* name is a field in record t *)

Now here I get a compiler error saying "Unbound record field label name", which makes sense as the compiler can't infer the type of vart. My first question: how do I explicitly provide the type of vart as t here? I tried doing let vart : A.t = ... but got the same error. I also tried creating another function to fetch the first element of the list and giving its return type as A.t, but then I got "Unbound value A.t". I did this:

  let firstt = function [] -> 0 | x :: _ -> A.t x ;;

The problem is that the compiler is unable to recognize A.t (a type) in b.ml but is able to recognize the function A.func. If I remove A.t from b.ml, I don't get any compiler errors. Please help, it's urgent work. Thanks in advance! ~Tarun

I guess this is not the right place to ask such a question... There is a beginners' list. However, this should answer your question. Write instead:

  variable_name.Module_name.field_name

If variable_name has been defined in yet another module, you may write:

  YetAnotherModule.variable_name.Module_name.field_name

If you want to avoid module name prefixes, you may want to use open:

  open Module_name;;
  let foo = variable_name.field_name;;

However (from my personal point of view) open should be avoided because it often makes maintenance very tough.

About type constraints, the syntax is rather this: (variable : type_name), with parentheses most of the time.

-- Philippe Wang m...@philippewang.info
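A self-contained sketch of the answer above. Module A is a stand-in for a.ml/a.mli, written as a submodule only so that the example fits in one file; the field names, and func taking unit, are assumptions for illustration.

```ocaml
(* Stand-in for a.ml (with a.mli exposing the record type). *)
module A = struct
  type t = { name : string; size : int }

  let func () = [ { name = "first"; size = 1 }; { name = "second"; size = 2 } ]
end

(* In b.ml: qualify the field label with the module that defines it. *)
let first_name () =
  let tlist = A.func () in
  let vart = List.hd tlist in
  vart.A.name

(* A type constraint names the type as A.t, in parentheses. *)
let first_size () =
  let (vart : A.t) = List.hd (A.func ()) in
  vart.A.size

let () = Printf.printf "%s %d\n" (first_name ()) (first_size ())
```

The type annotation alone does not bring the label name into scope here; it is the `A.` prefix on the field that the original b.ml was missing.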
Re: [Caml-list] Threads Scheduling
On Tue, Apr 13, 2010 at 11:56 PM, Gregory Malecha gmale...@gmail.com wrote: Hi Jake, The documentation for Condition.wait says: "wait c m atomically unlocks the mutex m and suspends the calling process on the condition variable c. The process will restart after the condition variable c has been signalled. The mutex m is locked again before wait returns." I figured that I needed to lock and unlock the mutex in the child threads because otherwise it is possible for the condition variable to be signaled before the main thread waits, which I thought means that the signal is lost. Thanks Daniel, I'll take a look at it.

On Tue, Apr 13, 2010 at 5:04 PM, Daniel Bünzli daniel.buen...@erratique.ch wrote: You may also be interested in this thread [1]. Daniel [1] http://groups.google.com/group/fa.caml/browse_thread/thread/9606b618dab79fb5 -- gregory malecha

Hi,

Your f function *might* prevent preemption... For instance, if

  let f () = while true do () done;;

then f neither allocates nor calls any external function, so the scheduler is stuck, because scheduling is done at allocations or in *some* external functions (those which contain blocking sections, e.g., I/O operations). So when using the Thread module, it is important that, for scheduling, there is at some point an allocation, a blocking operation, or a call to Thread.yield. As most functional code allocates, this problem is not so frequent, though.

-- Philippe Wang m...@philippewang.info
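A sketch of both points in this thread, using the standard Thread, Mutex and Condition modules: a worker whose loop body neither allocates nor blocks calls Thread.yield to offer the scheduler a switching point, and the main thread waits on a predicate protected by the mutex, which is the usual way to avoid the lost-signal problem Gregory describes. Newer OCaml releases (4.14 and later) insert polling points in such loops in native code, so the explicit yield is mostly a portable way to make the switching point visible. The names below are hypothetical.

```ocaml
(* Build with the threads library, e.g.
   ocamlfind ocamlopt -package threads.posix -linkpkg yield_demo.ml *)
let m = Mutex.create ()
let c = Condition.create ()
let finished = ref false   (* the predicate, protected by m *)

(* The loop body does not allocate; Thread.yield gives the scheduler
   a point at which another thread may run. *)
let worker n result =
  let count = ref 0 in
  for _i = 1 to n do
    incr count;
    Thread.yield ()
  done;
  result := !count;
  Mutex.lock m;
  finished := true;
  Condition.signal c;
  Mutex.unlock m

(* Wait on the predicate, not on the signal itself: if the worker
   signalled before we reach Condition.wait, !finished is already true
   and we never block, so no signal is "lost". *)
let run n =
  finished := false;
  let result = ref 0 in
  let t = Thread.create (fun () -> worker n result) () in
  Mutex.lock m;
  while not !finished do Condition.wait c m done;
  Mutex.unlock m;
  Thread.join t;
  !result
```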
Re: [Caml-list] Question about ocaml threads and TLS (on linux)
Hi,

I'm not sure I understand (though I've read the whole text), but maybe this will answer your question: on Linux, OCaml threads (with the native compiler ocamlopt) are implemented with POSIX threads (in C), so when your OCaml thread runs the C stub, it's the same as if you were running the C stub in some C thread. When you are in a section declared as a blocking section, a collection can be triggered concurrently in another thread, so the heap must not be accessed, neither for reading nor for writing; that's all.

Using the recent __thread feature should also work if you manage to compile everything correctly. Notably, we use it in some places in ocaml4multicore (a patch to OCaml's runtime library to allow parallel threads). However, I don't know how __thread is handled by the compiler... I mean: is there a pointer for buf in every thread, or only in those that use it?

I hope my answer isn't useless!

Cheers,
-- Philippe Wang m...@philippewang.info

On Wed, Feb 24, 2010 at 10:00 PM, Goswin von Brederlow goswin-...@web.de wrote: Hi, I'm having a little problem with my libfuse-ocaml bindings for the threaded interface. For those that don't want to read all of the mail, my question is: will every OCaml thread have its own thread-local storage in the C stubs? I have the following calling sequence:

  User ocaml code   | Fuse C stub              | libfuse code
  ------------------+--------------------------+--------------------
  Fuse.process fs   | 'process stub'           |
                    | enter_blocking_section() |
                    | char *buf = malloc(size) |
                    | fuse_session_process()   |
                    |                          | ops->write(buf+off)
                    | 'write stub'             |
                    | leave_blocking_section() |
                    | a = caml_ba_alloc_dims() |
                    | caml_callback(...,a,...) |
  my_ref := a       |                          |
                    | enter_blocking_section() |
                    |                          | callback done
                    | 'process stub'           |
                    | free(buf)                |
                    | leave_blocking_section() |
  Fuse.process done |                          |

The 'process stub' allocates a buffer and frees it at the end, which is usually fine. Except in the case of a write callback, where the buffer is passed back to OCaml as a Bigarray.
If the Bigarray is copied, like above, then the OCaml code still has a reference to the data at the point where the 'process stub' wants to free it. To solve that problem I need the write callback to signal that the buffer was passed to OCaml and is now under GC control. The buffer must not be free()d by the 'process stub'. The libfuse API does not provide for this, so I have to somehow communicate between the 'process stub' and the 'write stub' around the libfuse code. Possible solution:

  __thread char *buf = NULL;

  value ocaml_fuse_process(...) {
    buf = malloc(size);
    fuse_session_process();
    if (buf != NULL) free(buf);
  }

  void write_callback(...) {
    a = caml_ba_alloc_dims(...);
    buf = NULL;
  }

This way ocaml_fuse_process will allocate a new buffer whenever it doesn't have one, and the write_callback will take over the buffer and give it to the GC. Now my question is: does that work? Is it safe? Will every OCaml thread have its own thread-local-storage buf? Currently I'm only interested in supporting Linux. If it is safe there, that is enough. MfG Goswin
Re: [Caml-list] vm in ocaml
On Wed, Feb 3, 2010 at 4:47 PM, Joel Reymont joe...@gmail.com wrote: I have a translator from a Pascal-like trading language written in OCaml and I need the output to run as a DLL embedded in a trading platform. I'm thinking of generating bytecode and having the user pass the path to the bytecode file to the DLL during initialization. I don't want to load source code into my runtime since I want to do a lot of error checking on it to make sure the runtime experience is smooth. I don't want to ship ocamlc, etc. since I want to have a single executable. I'm not sure if embedding OCaml (and thus a license!) is needed to generate OCaml bytecode in my scenario, so the bytecode I'm talking about is my bytecode. I understand that a bit of C will be required to wrap the OCaml runtime in a DLL. I would prefer to stay with OCaml for the whole project, which prompts my question...

I understand that:
- you want to generate some bytecode (with your own bytecode specs) from the Pascal-like language
- and interpret this bytecode with a VM written in OCaml
but actually I don't quite understand your question :-/

Has anyone used OCaml to write a virtual machine?

Some people (including some colleagues of mine, and me, actually) have used OCaml to write an OCaml virtual machine. (I've heard someone say, indirectly, that we were not the first.) It is an interesting exercise... for people who prefer writing in OCaml rather than in C. It's also interesting to run an OCaml VM in an OCaml VM... in an OCaml VM, the last one being written in OCaml compiled with ocamlopt, or in C (or in Java [Cadmium] or in JavaScript [O'Browser], though we haven't tried those), and all the previous ones being written in OCaml compiled with ocamlc.

How big is the OCaml runtime when bundled as a DLL or shared library?

Sorry, I've no idea about this question.

-- Philippe Wang m...@philippewang.info
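To make the "VM written in OCaml" part of the question concrete, here is a minimal stack-machine interpreter. The instruction set (Push, Add, Jz, ...) is invented for the example, standing in for whatever bytecode the Pascal-like translator would emit; it is not the OCaml bytecode format.

```ocaml
(* Hypothetical instruction set for a tiny stack machine. *)
type instr =
  | Push of int   (* push a constant *)
  | Add           (* pop two operands, push their sum *)
  | Sub           (* pop b then a, push a - b *)
  | Mul           (* pop two operands, push their product *)
  | Dup           (* duplicate the top of the stack *)
  | Jz of int     (* pop; if zero, jump to the absolute address *)
  | Jmp of int    (* unconditional jump *)
  | Halt          (* stop and return the stack *)

exception Vm_error of string

(* Fetch-decode-execute loop: pc is the program counter. *)
let run (code : instr array) : int list =
  let rec step pc stack =
    if pc < 0 || pc >= Array.length code then raise (Vm_error "pc out of range");
    match (code.(pc), stack) with
    | Push n, s -> step (pc + 1) (n :: s)
    | Add, b :: a :: s -> step (pc + 1) (a + b :: s)
    | Sub, b :: a :: s -> step (pc + 1) (a - b :: s)
    | Mul, b :: a :: s -> step (pc + 1) (a * b :: s)
    | Dup, a :: s -> step (pc + 1) (a :: a :: s)
    | Jz target, a :: s -> step (if a = 0 then target else pc + 1) s
    | Jmp target, s -> step target s
    | Halt, s -> s
    | (Add | Sub | Mul | Dup | Jz _), _ -> raise (Vm_error "stack underflow")
  in
  step 0 []
```

For instance, `run [| Push 2; Push 3; Add; Push 4; Mul; Halt |]` returns `[20]`. A pattern match over (instruction, stack) keeps each opcode's semantics on one line, which is one reason OCaml is pleasant for this kind of exercise.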
Re: [Caml-list] Re: value restriction
On Sat, Jan 2, 2010 at 5:46 PM, Andrej Bauer andrej.ba...@andrej.com wrote: on another note (but staying very much on the same topic), why won't the following generalize:

  # let foo = let counter = ref 0 in let bar = !counter in let baz = fun x -> bar in baz
  val foo : '_a -> int = <fun>

It's even worse:

  Objective Caml version 3.11.1
  # let _ = ref () in fun x -> x ;;
  - : '_a -> '_a = <fun>

I am sure this makes sense in France. Happy new year! Andrej

The idea is to prevent potentially wrong programs. It is bad to write (let x = ref [] in x := ["hello"]; x := [2]). So the algorithm that prevents the generalization of expressions such as (ref []) prevents the generalization of all application expressions. Actually, almost all of them, because I think there are a few exceptions, such as:

  # let f = let x = ref [] in !x ;;
  val f : 'a list = []

Making a perfect algorithm that generalizes only and always when permitted is very hard (maybe it's impossible because it's undecidable?). This example shows a program that is rejected because its type is not computable in Caml's type system:

  (fun x -> x x) (fun x -> x) (fun x -> x)

It could be a valid program (i.e. it wouldn't lead to a type crash at runtime), but it is rejected because the type system is not capable of asserting its correctness. (I am not certain I am not off topic.)

Cheers,
-- Philippe Wang m...@philippewang.info
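A small sketch of the value restriction in practice. The commented-out line is the unsafe program from the answer; eta-expansion is the usual workaround for a partial application; and the last binding is the exception mentioned above, which OCaml's relaxed value restriction accepts because 'a occurs only in a covariant position of 'a list. The same rule explains Andrej's first example: in '_a -> int the variable sits in the argument (contravariant) position, so it stays weak. The names are illustrative.

```ocaml
(* Rejected, as it should be: r gets the weak type '_a list ref, the first
   assignment fixes '_a = string, so the second one is a type error.
   let () = let r = ref [] in r := ["hello"]; r := [2] *)

(* A partial application is not a syntactic value: its type is only weakly
   polymorphic, so it can be used at a single type. *)
let dup_weak = List.map (fun x -> (x, x))

(* Eta-expansion turns it into a function, i.e. a value: fully polymorphic. *)
let dup xs = List.map (fun x -> (x, x)) xs

(* Relaxed value restriction: 'a occurs only covariantly in 'a list, so this
   generalizes even though !x is an application. *)
let empty = let x = ref [] in !x

let demo () =
  let ints = dup [ 1; 2 ] in
  let strs = dup [ "a" ] in
  let weak_once = dup_weak [ true ] in   (* fixes dup_weak's '_a to bool *)
  (ints, strs, weak_once, 1 :: empty, "z" :: empty)
```

Using `dup_weak` a second time at another type (say, on a string list) would be rejected, while `dup` and `empty` are used at two types each.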
Re: [Caml-list] How to write a CUDA kernel in ocaml?
On Thu, Dec 17, 2009 at 7:45 AM, Eray Ozkural examach...@gmail.com wrote: What I want to do is to run the ocaml bytecode interpreter on each core, and then feed the relevant bytecode to those. It can be done, I suppose? Or am I missing something crucial? :) The runtime library would have to be ported to OpenCL/CUDA as well; isn't that possible?

I don't see why it wouldn't be possible. After all, there are Java, JavaScript and OCaml implementations of that VM, so it could probably be implemented in any normal programming language (excluding those that are not Turing complete, and those such as brainfuck or sed)! But I don't quite see how it could help gain performance, at least not yet. Anyway, I'm looking forward to seeing a new esoteric implementation of that nice VM! :-)

PS: Sorry for having mailed this to you personally, I intended to post it to the mailing list.

No problem ;-)

-- Philippe Wang m...@philippewang.info
Re: [Caml-list] How to write a CUDA kernel in ocaml?
On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural examach...@gmail.com wrote: One trivial and low-performance solution that comes to mind is: make an ocaml bytecode interpreter into a CUDA kernel and then pass the bytecode to it, and then voila, at least we have some 512-way parallelism on the GT300. How does that sound? We'd be losing some performance but massive parallelism will cover up for some of that.

With parallel processors, you very quickly move the performance bottleneck from the processor(s) to memory bandwidth, such that:
- it's hell to program, because you have to manage concurrency, and that has a real cost;
- it's useful only for very specific programs that have very few memory accesses compared to processor computations (such as some compression algorithms; a more specific and very easy to write example is matrix multiplication).

Imagine you have 3000MHz of memory bandwidth, which is extremely good today (I think). And imagine you have 100 processors that share this memory bandwidth. If they all want to access memory at the same time, even if you ignore the cost of concurrency management, you get 3000MHz/100 = 30MHz per processor, which is very very very low. So think about 10 processors instead of 100 to be more realistic: it's still 300MHz per processor, which looks like what we had about a decade ago...

(IMHO) A not-too-bad-but-still-realistic way to take advantage of GPUs today, with OCaml (or any high-level language), is to write computation functions in C (possibly with some assembly), and to write composition functions in OCaml. Or (less realistic in a short amount of time) maybe to write a compiler that may do the job for you, but it's not quite easy...

Good luck,
-- Philippe Wang m...@philippewang.info
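The back-of-the-envelope arithmetic above, written out. It keeps the message's simplifying assumptions (every core hits memory at once, no caches, no synchronization cost) and simply divides the bus figure by the number of cores; the 8-core rows reuse the 1000 MHz and 2400 MHz figures from the OC4MC discussion elsewhere in this archive.

```ocaml
(* Per-core share of a memory bus that all cores saturate at once,
   ignoring caches and the cost of managing concurrency. *)
let per_core_mhz ~bus_mhz ~cores = bus_mhz /. float_of_int cores

let () =
  List.iter
    (fun (bus, cores) ->
      Printf.printf "%4.0f MHz shared by %3d cores: %5.1f MHz each\n"
        bus cores (per_core_mhz ~bus_mhz:bus ~cores))
    [ (3000., 100); (3000., 10); (1000., 8); (2400., 8) ]
```

The rows come out at 30, 300, 125 and 300 MHz per core, matching the figures quoted in the messages.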
Re: [Caml-list] threads, signals, and timeout
Considering that POSIX signals are not real-time *anyway*, using them to program specific per-thread treatments is, hmmm... say, a nightmare! Plus, I don't quite see how you could eventually get a non-broken implementation. Gerd Stolpmann emphasized this, if I understood him well. One solution would be to use state variables that are checked every once in a while. Or maybe to use fairthreads instead, but I guess that the problem is actually much more complicated than just that. Well, I thought I had more interesting things to say. I was wrong, so just my two cents. Anyway, good luck!

Cheers,
Philippe Wang

On Mon, Oct 26, 2009 at 7:08 PM, yoann padioleau pad.a...@gmail.com wrote: Hi, I would like to create different threads where each thread does some computation and is subject to a different timeout. Without threads I usually use Unix.alarm with a SIGALRM handler that just raises a Timeout exception, and everything works fine, but when I try to do something similar with threads it does not work, because apparently the Unix.alarm done in one thread overrides the Unix.alarm done in another thread. I had a look at thread.mli but was not able to find anything related to timeouts. Is there a way to have multiple timeouts and multiple threads at the same time?
Here is a program that unfortunately gets the first timeout, but not the second :(

  (* ocamlc -g -thread unix.cma threads.cma signals_and_threads.ml *)
  exception Timeout

  let mytid () =
    let t = Thread.self () in
    let i = Thread.id t in
    i

  let set_timeout () =
    Sys.set_signal Sys.sigalrm
      (Sys.Signal_handle (fun _ ->
        prerr_endline "Time is up!";
        print_string (Printf.sprintf "id: %d\n" (mytid ()));
        raise Timeout));
    ignore (Unix.alarm 1);
    ()

  let main =
    let t1 = Thread.create (fun () ->
      set_timeout ();
      print_string (Printf.sprintf "t1 id: %d\n" (mytid ()));
      let xs = [1;2;3] in
      while true do
        let _ = List.map (fun x -> x + 1) xs in ()
      done;
      ()) () in
    let t2 = Thread.create (fun () ->
      set_timeout ();
      print_string (Printf.sprintf "t2 id: %d\n" (mytid ()));
      let xs = [1;2;3] in
      while true do
        let _ = List.map (fun x -> x + 1) xs in ()
      done;
      ()) () in
    Thread.join t1;
    Thread.join t2;
    ()

Here is the output:

  Time is up!
  t2 id: 2
  t1 id: 1
  id: 1
  Thread 1 killed on uncaught exception Signals_and_threads.Timeout

The program loops, meaning the second thread never received its timeout.
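One way to follow the "state variables" suggestion: each computation carries its own wall-clock deadline and polls it, instead of relying on the single process-wide SIGALRM. This is a sketch under assumptions: with_deadline, spin and start are hypothetical helpers, a timeout only fires at the points where the code calls check, and the program needs the unix and threads libraries.

```ocaml
(* Build with threads and unix, e.g.
   ocamlfind ocamlopt -package threads.posix,unix -linkpkg deadline.ml *)
exception Timeout

(* Run [f] with its own deadline. [f] receives a [check] function it must
   call every once in a while; no signal is involved, so the deadlines of
   different threads cannot clobber each other as a shared alarm does. *)
let with_deadline seconds f =
  let deadline = Unix.gettimeofday () +. seconds in
  let check () = if Unix.gettimeofday () > deadline then raise Timeout in
  try Some (f check) with Timeout -> None

(* The endless computation from the program above, now polling. *)
let spin check =
  let xs = [ 1; 2; 3 ] in
  while true do
    ignore (List.map (fun x -> x + 1) xs);
    check ()
  done

(* Start a thread with its own timeout; its outcome lands in the ref. *)
let start seconds =
  let outcome = ref (Some ()) in
  let t = Thread.create (fun () -> outcome := with_deadline seconds spin) () in
  (t, outcome)

let demo () =
  let t1, r1 = start 0.1 in
  let t2, r2 = start 0.3 in
  Thread.join t1;
  Thread.join t2;
  (!r1, !r2)
```

Both threads stop on their own schedule, so `demo ()` returns `(None, None)` instead of looping forever. The trade-off is that long stretches of code without a call to `check` delay the timeout.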
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Sep 25, 2009, at 6:07 AM, Jacques Garrigue wrote: First, like everybody else, I'd like very much to try this out. Is there any chance it could compile on Snow Leopard :-) (I suppose it's near impossible, but still ask...)

I haven't tried that yet, mostly because I guess that it wouldn't work out of the box. However, the .asm file should be OK with OS X; what may cause trouble is the configure script's behavior and C macros. I should take a closer look at that, since SL now seems to work well.

Cheers,
-- Philippe Wang philippe.w...@lip6.fr http://www-apr.lip6.fr/~pwang/
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Fri, Sep 25, 2009 at 1:28 AM, Jon Harrop j...@ffconsultancy.com wrote: On Thursday 24 September 2009 15:38:06 Philippe Wang wrote: Very few programs that are not written with multicore in mind would not be penalized. I mean our GC is much much dumber than INRIA OCaml's one. Our goal was to show it was possible to have good performance with multicores for OCaml. Maybe someday we'll find some time to optimize the GC, but it's likely not very soon. Just to quantify this with a data point: the fastest (serial) version of my ray tracer benchmark is 10x slower with the new GC. However, this is anomalous with respect to complexity and the relative performance is much better for simpler renderings. For example, the new GC is only 1.7x slower with n=6 instead of n=9.

I just put up a version with a bug fix in the allocation of some structures (20090925). I hope it removes this anomaly.

-- Philippe Wang m...@philippewang.info
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop j...@ffconsultancy.com wrote: Following your advice, it seems to work perfectly now:

:-)

Wow! 2.6x faster on 2 cores is good. ;-)

Your machine is more generous than ours (which is Intel, not AMD) :-)

That's a really fantastic piece of work. I'll do my best to study it and write literature about it. May I ask, can you give a rough overview of the design? For example, is there a separate nursery per thread so each thread can allocate a certain amount before incurring a global pause? Do you have any ideas for libraries built on top of this, such as a task parallel library using work-stealing deques?

A few words on the GC's design (which uses the stop&copy algorithm several times):

Heaps:
- a set of pages is used to let threads allocate memory without interfering with other threads, so there is no mutex locking for local memory allocation. Each thread is born with an empty page; when it's full, the thread takes another one.
- a big heap is shared by all threads; there is a mutex on it to prevent parallel memory allocation in it.

Collection:
- when there are no pages left, a collection stops the world and copies living values (from the pages) to the shared heap;
- when the shared heap is full, a collection stops the world and copies all living values (pages + shared heap) to a new shared heap (which can grow if need be).

Special operations:
- if there is a blocking operation (e.g. a mutex lock or an I/O operation), the mechanism is roughly the same as in the original INRIA OCaml: the thread tells the GC that there is no need to stop it when stopping the world;
- if there is a thread with no allocation and no blocking operation, the behavior is the same as in INRIA OCaml.

The number of pages, the size of a page, and the size of the shared heap can be changed before running a program by setting some environment variables (cf. the last lines of the README file included in the distribution package).
-- Philippe Wang m...@philippewang.info ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
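The allocation policy described above (a private page per thread, a fresh page when it fills up, a stop-the-world collection when no page is left) can be sketched as a toy, single-threaded OCaml model. This is not OC4MC's C runtime: the names `pool`, `take_page`, `spawn` and `alloc` are hypothetical, the collection is only counted rather than performed, and the shared heap is not modelled at all.

```ocaml
(* Toy model of per-thread allocation pages (hypothetical, not OC4MC code). *)
type page = { mutable used : int }          (* words already allocated *)

type pool = {
  page_size : int;                          (* words per page *)
  total_pages : int;                        (* pages available between collections *)
  mutable free_pages : int;
  mutable collections : int;                (* stop-the-world events so far *)
}

type thread_state = { mutable current : page }

let create_pool ~page_size ~total_pages =
  { page_size; total_pages; free_pages = total_pages; collections = 0 }

(* The only step touching shared state; a real runtime must synchronise it. *)
let take_page pool =
  if pool.free_pages = 0 then begin
    (* Here OC4MC would stop the world and copy the live values of all pages
       to the shared heap. The model only counts the event and refills the
       pool; a real collector would also empty every thread's current page. *)
    pool.collections <- pool.collections + 1;
    pool.free_pages <- pool.total_pages
  end;
  pool.free_pages <- pool.free_pages - 1;
  { used = 0 }

(* A new thread starts with an empty page. *)
let spawn pool = { current = take_page pool }

(* Bump allocation inside the thread's own page: no lock is needed. *)
let alloc pool th words =
  if words > pool.page_size then invalid_arg "alloc: larger than a page";
  if th.current.used + words > pool.page_size then th.current <- take_page pool;
  th.current.used <- th.current.used + words
```

Because only `take_page` touches the pool, the common case (allocating inside a page that still has room) never needs a mutex, which is the point of giving each thread its own page.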
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Thu, Sep 24, 2009 at 3:40 PM, Rakotomandimby Mihamina miham...@gulfsat.mg wrote: 09/24/2009 03:39 PM, Stefano Zacchiroli: So, the real question is: is OC4MC going to be ported to mainline OCaml and supported in the future or not? I don't write many programs that would really require multiple cores. But I think this is such a good feature that it should be included in the main distribution...

The thing is that a runtime library that supports parallel threads costs more than one that doesn't. Programs that take advantage of multicore architectures are not easy to write, not easy to maintain, not easy to debug, ... So "it's a great feature, so it should get into mainstream" is not a good enough reason for INRIA's team. It's probably up to the community to find a great way of taking advantage of multicore architectures.

One must be aware that:
- parallel threads vs. non-parallel threads: if a program is well suited to parallel computing on multicore CPUs, then a runtime library that cannot run threads in parallel puts the performance bottleneck at the CPU. Allowing parallel threads then *moves* this bottleneck (moves, not removes): indeed, it's quite likely that the bottleneck will then be the memory (RAM) bandwidth. See, if your memory runs at 1000 MHz, having 8 cores means 125 MHz per core, which is ridiculous; even at 2400 MHz it would mean only 300 MHz per core. Imagine a 300 MHz memory bandwidth for a 3 GHz core! So it's *very* important to keep that in mind.
- for programming languages that were, from the very beginning, much slower than INRIA OCaml, it's much easier to gain performance because they come from far, sometimes from very, very far.

Well, from a quite subjective personal point of view, of course it would be really great to give parallel threads capability to mainstream INRIA OCaml, because it would mean having found a (great) acceptable solution.
-- Philippe Wang m...@philippewang.info
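The per-core arithmetic above is a simple division, reproduced below. As in the message, the MHz figure is used as a rough proxy for memory bandwidth rather than an actual bytes-per-second measurement, and the function name `per_core_share` is illustrative only.

```ocaml
(* Memory "speed" left for each core when all cores compete for it. *)
let per_core_share ~total_mhz ~cores = total_mhz /. float_of_int cores

let () =
  Printf.printf "1000 MHz / 8 cores = %.0f MHz per core\n"
    (per_core_share ~total_mhz:1000. ~cores:8);
  Printf.printf "2400 MHz / 8 cores = %.0f MHz per core\n"
    (per_core_share ~total_mhz:2400. ~cores:8)
```

This prints 125 and 300 MHz per core respectively, the two figures quoted in the message.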
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Thu, Sep 24, 2009 at 3:11 PM, Jon Harrop j...@ffconsultancy.com wrote: Are values such as float arrays copied in their entirety or are they allocated outside the shared heap and only a pointer to them is copied?

They should be in a heap (page or shared). We don't allocate many things outside the heaps.

Is the copy operation parallelized?

Nope. When the world is stopped for the collection, everything is done sequentially until the world is resumed. I don't think it's relevant to parallelize the copy operation (hell to implement & debug, and then I don't think the performance would be very interesting because we would probably need a write mutex on the destination heap).

Is there a write barrier but no read barrier? If so, what exactly does the write barrier do?

There is a lock when a thread is created because we need to update the list of existing threads and we have to give it a page. Then, each time a thread wants memory, it checks whether the world needs to be stopped. If the world needs to be stopped, it means that a necessary collection is waiting for the world to be stopped. There is a lock if a thread needs to allocate memory in the shared heap, so that two threads don't end up using the same space for different things. If two threads want to write in the same block, it's up to the programmer to prevent (or allow) such a thing with a mutex (or whatever other mechanism).

Special operations: - if there is a blocking operation (e.g. mutex lock or I/O operation), the mechanism is roughly the same as original INRIA OCaml's: it tells the GC that there is no need to stop it when stopping the world. Can users mark external calls in their bindings as blocking so the GC will treat them appropriately?

Yes, it's the same as in INRIA OCaml: the enter_blocking_section / leave_blocking_section functions. It's mandatory that, in the section between entrance and exit, the thread does not access anything allocated in a Caml heap.
If some value returned by the blocking operation needs to be written, it should be written to a C-side location (on the C stack or allocated with C malloc) and copied back to the Caml heap after exit (and then released with C free if it was malloc'ed).
-- Philippe Wang m...@philippewang.info
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
I've seen a question about 3.11 and I think I didn't answer it, so I'm answering here: we have tried to make OC4MC work with OCaml 3.11 (I don't remember the exact minor version number). Currently, it does not work properly (it's still too easy to write a program that crashes or deadlocks). Cheers, Philippe Wang
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Sep 24, 2009, at 18:02 GMT+02:00, Pascal Cuoq wrote: On Sep 24, 2009, at 5:47 PM, Philippe Wang wrote: Is the copy operation parallelized? Nope. When the world is stopped for the collection, everything is done sequentially until the world is resumed. I don't think it's relevant to parallelize the copy operation (hell to implement & debug, and then I don't think the performance would be very interesting because we would probably need a write mutex on the destination heap). Well, you could start copying to the bottom of the next heap with one thread going up and to the top of it with another going down. Assume optimistically that the two threads will not reach the same cache line at the end of the copies, and you don't need any synchronisation at all between them, except joining at the end. After checking, if they have reached the same cache line, you need to reallocate the destination heap anyway. You still get a single unfragmented free block as a result. Even better: stop the world just before there remains less than one cache line of free space and you don't need to check if the two threads have met. You still need to reallocate the destination heap sometimes though.

A concurrent copy means a significant overhead on a single core. It also means moving the bottleneck to memory bandwidth, as memory copy operations are clearly and quickly limited by this bandwidth, not by the CPU. That may hopefully become false in a few years, but hardware manufacturers don't seem to be excited by that; they seem to prefer basing their marketing on the number of cores. Look at GPUs: they have very fast graphical RAM, but they have a huge number of processing units. I don't really see the point in that (i.e. having a huge number of PUs) anyway (except marketing).

Ok, back to GC stuff. A stop&copy algorithm needs a set of roots from which to copy the living values. Each thread has its own stack, so it has its own subset of roots. Then what? Parallelize the copy from each thread?
Ok, then we have to determine the best number of threads according to the number of cores, but more importantly according to the memory bandwidth available per core (what a nightmare!). Then there are shared values (in the shared heap for instance, but what if there are lateral pointers due to mutable values?). (We are leaving the nightmare for hell! But some people have been there.) Copying a living value means that if you later encounter something pointing to its old address, you have to know the new one. This means writing at the old address. I don't see how we could make, *today*, something very interesting that runs concurrently with a stop&copy algorithm. I believe (but I'm *not* a GC expert at all) that concurrent GCs are not based on a stop&copy algorithm but rather on some mark&{do-some-stuff-such-as-sweep}.

Oh, and I meant to say, but everyone else was faster than me: well done!

Thank you, and thanks to everyone else who appreciates this work. :-) Philippe Wang
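The forwarding mechanism described above (writing the new address at the old one so that later references can be redirected) is the core of a Cheney-style stop&copy collector. Below is a minimal sequential OCaml model of that technique, assuming a heap represented as an array of objects whose fields are either integers or indices into that array. It illustrates the general algorithm only; it is not the OC4MC collector, and all names are hypothetical.

```ocaml
(* Minimal Cheney-style copying collector over a toy heap (hypothetical). *)
type value = Int of int | Ptr of int   (* Ptr i: index into the heap *)

type obj = {
  fields : value array;
  mutable forward : int option;        (* new address once the object is copied *)
}

(* Copies everything reachable from [roots] into a fresh to-space and
   returns (to_space, updated_roots). *)
let collect (from_space : obj array) (roots : value array) =
  let placeholder = { fields = [||]; forward = None } in
  let to_space = Array.make (Array.length from_space) placeholder in
  let next = ref 0 in                  (* allocation pointer in to-space *)
  let copy = function
    | Int _ as v -> v
    | Ptr i ->
      let o = from_space.(i) in
      (match o.forward with
       | Some j -> Ptr j               (* already moved: follow the forward *)
       | None ->
         let j = !next in
         incr next;
         to_space.(j) <- { fields = Array.copy o.fields; forward = None };
         o.forward <- Some j;          (* write the new address at the old spot *)
         Ptr j)
  in
  let new_roots = Array.map copy roots in
  (* Scan the copied objects, copying whatever they point to. *)
  let scan = ref 0 in
  while !scan < !next do
    let o = to_space.(!scan) in
    Array.iteri (fun k v -> o.fields.(k) <- copy v) o.fields;
    incr scan
  done;
  (Array.sub to_space 0 !next, new_roots)
```

Unreachable objects are never copied, and cycles are handled because an object that already carries a forwarding address is never copied a second time; this write to the old location is exactly what makes a concurrent version of the copy hard.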
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Sep 24, 2009, at 18:49 GMT+02:00, Richard Jones wrote: On Thu, Sep 24, 2009 at 02:09:56PM +0100, Jon Harrop wrote: Fair enough. I think this is the single most important development OCaml has seen since its inception so I would personally drop OCaml in favor of oc4mc even if it meant reverting to 3.10.2. I think 'personally' is the key word there. You forget that people are quite happily programming in very slow languages like Perl, Python, Ruby and Visual Basic, and those people vastly outnumber the ones using F#, Haskell, OCaml, SML etc. (They don't even have static safety, dammit!) Should we tell them that burning CPU for nothing (a side effect of using a slow language) has a bad effect on global warming? Could it be a wake-up call? :-p half-kidding, Philippe Wang
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
On Sep 25, 2009, at 1:28 AM, Jon Harrop wrote: On Thursday 24 September 2009 15:38:06 Philippe Wang wrote: Very few programs that are not written with multicore in mind would not be penalized. I mean our GC is much, much dumber than INRIA OCaml's. Our goal was to show it was possible to have good performance with multicores for OCaml. Maybe someday we'll find some time to optimize the GC, but likely not very soon. Just to quantify this with a data point: the fastest (serial) version of my ray tracer benchmark is 10x slower with the new GC. However, this is anomalous with respect to complexity and the relative performance is much better for simpler renderings. For example, the new GC is only 1.7x slower with n=6 instead of n=9.

Can you tell what data structures (and their sizes, if possible) you are using? Thanks for your feedback.
-- Philippe Wang philippe.w...@lip6.fr http://www-apr.lip6.fr/~pwang/
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
I've updated the download page; it should be more robust to multiple downloads now. Cheers, Philippe Wang
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
make program.nc uses the original ocamlopt; make program.th uses the newly built ocamlopt with the necessary options (lib links); then you can compare program.nc and program.th.

On Thu, Sep 24, 2009 at 2:21 AM, Jon Harrop j...@ffconsultancy.com wrote: On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote: Has anyone tested this yet? Any success stories? Well, I've used the build.sh script to build a patched OCaml 3.10.2 that identifies itself as:
$ ocamlopt -v
The Objective Caml native-code compiler, version 3.10.2+patch-ocaml4multicore-20090823
Standard library directory: /home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml
and I've built their tests:
$ cd tests
$ make matmul.nc
ocamlopt -o matmul.nc -thread unix.cmxa threads.cmxa graphics.cmxa matmul.ml
File "matmul.ml", line 25, characters 8-13:
Warning Y: unused variable count.
File "matmul.ml", line 26, characters 8-16:
Warning Y: unused variable last_col.
and run them:
$ time ./matmul.nc 1000 8
Temp de calcul: utime 38.930433, stime 0.012000, rtime 38.943138
Fatal error: exception Invalid_argument("index out of bounds")
real 0m38.974s
user 0m38.942s
sys 0m0.028s
Note the exception that (I think) should have been caught and handled silently. But I cannot get anything to run in parallel. None of the tests use more than one core and my own busy-wait-loops-on-two-threads test also runs only on one core. Any idea what I'm doing wrong? Is there a flag to enable it or something? One possible cause: I'm running in a 64-bit chroot.
-- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e
-- Philippe Wang m...@philippewang.info
Re: [Caml-list] OC4MC : OCaml for Multicore architectures
Ok... well, I guess that either
- something about your environment is too different from ours (in which case build.sh is at fault), or
- your installation has been corrupted (for instance by a bad PATH value that mixes up the original ocamlopt with the OC4MC ocamlopt).

What I suggest is to use a default PATH (without modifying it for the purpose of OC4MC), and to do these steps in a clean directory that is not included in PATH:
1) wget oc4mc-2009.tgz
2) tar xzf oc4mc-2009.tgz
3) cd oc4mc-2009
4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
5) bash build.sh ... wait
6) cd tests
7) make matmul.th
8) time ./matmul.th 1000 8

Sorry it's messy, we are thinking about something cleaner... (there's a matter of lack of time somewhere)
cheers,
-- Philippe Wang m...@philippewang.info

On Thu, Sep 24, 2009 at 2:05 AM, Jon Harrop j...@ffconsultancy.com wrote: On Thursday 24 September 2009 00:15:14 you wrote: make program.nc uses original ocamlopt make program.th uses the newly built ocamlopt with the necessary options (lib links) then you can compare program.nc and program.th Aha! Progress, but now I get errors:
$ make matmul.th
../out/bin/ocamlopt -ccopt -march=native -ccopt -mtune=native -ccopt -O4 -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/ -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o -cclib -lgc -cclib -g -thread unix.cmxa threads.cmxa graphics.cmxa -verbose -compact -rectypes -inline 100 -fno-PIC -cclib -lunix -cclib -lpthread matmul.ml -o matmul.th
File "matmul.ml", line 25, characters 8-13:
Warning Y: unused variable count.
File "matmul.ml", line 26, characters 8-16:
Warning Y: unused variable last_col.
+ as -o matmul.o /tmp/camlasm081590.s
+ as -o /tmp/camlstartupdac3e2.o /tmp/camlstartup8f7152.s
+ gcc -o 'matmul.th' -I'/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' -march=native -mtune=native -O4 '/tmp/camlstartupdac3e2.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/std_exit.o' 'matmul.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/graphics.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads/threads.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/unix.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/stdlib.a' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' '-lgraphics' '-lX11' '-lthreadsnat' '-lunix' '-lpthread' '-lunix' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o' '-lgc' '-g' '-lunix' '-lpthread' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a' -lm -ldl
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o): In function `gc_end_roots':
memory.c:(.text+0x10): multiple definition of `gc_end_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:948: first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o): In function `gc_begin_roots':
memory.c:(.text+0x12): multiple definition of `gc_begin_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:947: first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(finalise.o): In function `caml_final_do_strong_roots':
finalise.c:(.text+0x0): multiple definition of `caml_final_do_strong_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:301: first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o: In function `stop_the_world':
gci.c:(.text+0x38e): undefined reference to `caml_all_threads'
gci.c:(.text+0x403): undefined reference to `caml_all_threads'
gci.c:(.text+0x410): undefined reference to `caml_all_threads'
gci.c:(.text+0x48a): undefined reference to `caml_all_threads'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o: In function `resume_the_world':
gci.c:(.text+0x4c4): undefined reference to `caml_all_threads'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:gci.c: (.text+0x57c): more undefined references to `caml_all_threads' follow
/home/jdh30/src/ocaml
[Caml-list] OC4MC : OCaml for Multicore architectures
This is some additional noise about OCaml for Multicore architectures (or "ok with parallel threads GC"). Dear list, We have implemented an alternative runtime library for OCaml, one that allows threads to compute in parallel on different cores of now-widespread CPUs. This project will be presented at IFL 2009 (http://blogs.shu.edu/projects/IFL2009/). A testing version is available online at http://www.algo-prog.info/ocmc/ It works with OCaml 3.10.2 on Linux x86-64; we haven't met any bugs with the latest build (it doesn't *unexpectedly* crash, not yet). Hope you'll enjoy it, -- Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou, Philippe Wang
Re: [Caml-list] threads
Hi,

let x = Array.make 100 []
let update i n = x.(i) <- n :: x.(i)
let read i = x.(i)

I don't think you can obtain funny results when you don't put a mutex on these two specific functions, update and read. What is sure is that the update function is not atomic, because there is a value allocation on the right-hand side of <- (with the :: operator), and this may trigger a garbage collection and/or make the scheduler change the running thread. What you can be sure of with the current official OCaml distribution is that a (<-) operation and a (.()) operation will never happen at the exact same time. But it is actually possible, for instance, for a thread to compute while another one is simultaneously writing on a socket. So it is generally not a good idea to rely on the atomicity of some operation to decide whether or not to take a mutex lock (well, it's good for writing hard-to-debug code)...

Cheers,
Philippe Wang

On Tue, Sep 8, 2009 at 7:33 PM, ygrek ygrekhere...@gmail.com wrote: Hello,
let x = Array.make 100 []
let update i n = x.(i) <- n :: x.(i)
let read i = x.(i)
Consider the following scenario: one thread is `update`ing x, other thread(s) use only `read`. Is it safe to use these functions without locking a mutex? I.e. is Array.set atomic? What about updating references (:=)? If I understand correctly, these operations require only one CPU instruction to update one machine word and so should be atomic. Taking into account the single-CPU affinity of OCaml programs, it should be safe to write such multithreaded code. Is it true? Is it safe to assume that ocamlopt won't skip reads/writes to a globally visible memory address by using a cached value in a register?
-- ygrek http://ygrek.org.ua
-- Philippe Wang m...@philippewang.info
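A minimal sketch of the conservative option: guard both functions with a single lock, so the read-modify-write in `update` (whose `::` allocation may trigger a collection or a thread switch) cannot interleave with another access. It assumes `Mutex` is available: in OCaml 5 it is part of the standard library, while OCaml 4.x needs the threads library linked (e.g. `ocamlfind ocamlopt -package threads.posix -linkpkg`). The helper `with_lock` is hypothetical.

```ocaml
let x : int list array = Array.make 100 []
let m = Mutex.create ()

(* Run [f] with [m] held, releasing the lock even if [f] raises. *)
let with_lock f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* The whole read-modify-write is protected, not just the final store. *)
let update i n = with_lock (fun () -> x.(i) <- n :: x.(i))
let read i = with_lock (fun () -> x.(i))
```

Locking `read` as well costs little and keeps the code correct even under a runtime where threads really do run in parallel.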
Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)
On Sat, Apr 18, 2009 at 1:05 AM, fo...@x9c.fr fo...@x9c.fr wrote: I was indeed mostly worried about the runtime itself. I wanted to have a fully reentrant runtime for the OCaml-Java project (to be able to execute several programs in the very same JVM) and remember that it implied that primitives from compare.c, hash.c, intern.c and extern.c, among others, had to be written differently for this purpose.

Indeed. We have made them reentrant, but we haven't done much stress testing on that. Reentrance for those is not free (it has a cost), and the way we chose is the simplest or quickest way.

Out of curiosity: you state that your GC is of stop-the-world nature, what about finalizers? Are they executed by the GC thread when the world is stopped or concurrently with application threads? Not sure this question really matters, just curious (I mean, it is doubtful that one would write finalizers with a long execution time).

Finalizers are supposed to be called by the thread that does the garbage collection, so there is no concurrency with finalizers, as the rest of the world is meant to be stopped during garbage collection. (Our garbage collector does not try to be as smart as the original one on many, many things.)

By the way, we are late in writing the documentation for our future release... but we have just implemented a (simple) experimental growing heap.

Here is a quote from Wikipedia (http://en.wikipedia.org/wiki/Speedup): "Sometimes a speedup of more than N when using N processors is observed in parallel computing, which is called super linear speedup. Super linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be N when N processors are used." Well, we *have* observed that on a matrix multiplication :-)
-- Philippe Wang philippe.w...@lip6.fr http://www-apr.lip6.fr/~pwang/
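The quantities in the Wikipedia quote above can be written down directly: with T_serial the single-core time and T_parallel the time on N cores, the speedup is T_serial / T_parallel, and a run is super linear when that ratio exceeds N. The function names below are hypothetical. For instance, the "2.6x faster on 2 cores" figure mentioned in the OC4MC thread would be super linear, with an efficiency of 1.3.

```ocaml
(* Speedup of a parallel run relative to the single-core time. *)
let speedup ~t_serial ~t_parallel = t_serial /. t_parallel

(* Efficiency: speedup per core; above 1.0 means super linear. *)
let efficiency ~speedup ~cores = speedup /. float_of_int cores

let is_superlinear ~speedup ~cores = speedup > float_of_int cores
```

Super linear results usually come from effects outside the arithmetic, for example each core's share of the data fitting in its cache.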
Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)
On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote: A negative answer would imply that you patched the OCaml runtime to make it reentrant. To illustrate my point, I will take the example of the file byterun/compare.c. In this file, the code for the comparison of values makes use of a global variable (named caml_compare_unordered). It should be, unless bugs are still hiding under that. It is supposed to have been done for a while. We'll make one more check. If you patched the runtime to allow multiple threads to use it concurrently, I would also be interested in the strategy used: is the problem solved by additional parameters? by the use of thread-local storage? by any other means? - local variables instead of global variables in functions - some functions are parameterized by thread identifier (that is, one more parameter than before) (e.g. in amd64.S)

Well, we went back into the runtime code. This is what can be said quickly:
- compare.c contains no global variables anymore; we use local variables instead
- if a Caml-to-C or C-to-Caml interface uses caml_compare_unordered, we don't know what can happen with parallel threads
- we have many global mutex locks with small scopes
- we do use an enter/leave blocking section mechanism to prevent the GC from waiting on a blocking operation such as I/O or mutex locking, etc.
- we don't support weak values (not sure whether they don't work or they became strong; if they don't work anymore, they can be brought back in 2 minutes as strong values anyway)
- serialisation of values is a little bit tricky, though it should work
- most important: many global variables do not exist anymore because they are irrelevant in our implementation
- we do not support unofficial features of OCaml 3.10, e.g. the new features that come with 3.11 but actually have their roots in previous versions
~ it is almost sad to see all the based-on-one-thread-at-a-time optimisations removed...
+ (it looks like it works just fine)

I hope there are no hidden bad global variables. Is it fully reentrant? Hmm... maybe. We use a stop-the-world GC (which means nothing runs in parallel with the GC), which is actually like the original OCaml and comes with the same inconveniences: C calls not declared as blocking sections (which has quite a cost) may prevent other threads from running when the heap is full. The Graphics module, for instance, is not reentrant at all (anyhow, it's not part of the runtime). Same for Str. Funny thing is, we can open several windows by launching parallel threads (though only one is useful in the end). Anyway, thank you for your questions and interest; they have helped us find & fix some bugs.
-- Philippe Wang http://www-apr.lip6.fr/~pwang/

PS. We tried to switch to 3.11, but it seems to need too much time; it's far from being a piece of cake. We have tried to make it work on Leopard (actually, I failed the 1st time, halfway through; I may try again if I have time). = A free, very personal piece of advice that may save you some headaches: do not program in concurrent shared-memory style, especially when you can replace concurrent by parallel. Even if it may have a future, even if it may sound great, even if it sounds exciting, even if it helps you go faster, even if put-here-whatever-you-want, it is **awful**. Well, if you really, really don't have a choice, never mind what I said.
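The first item of the list above (replacing global state in compare.c by local state) can be illustrated with an OCaml analogue. The global `unordered` flag plays the role the messages attribute to caml_compare_unordered in byterun/compare.c: two threads comparing at the same time would overwrite each other's flag, whereas returning the information keeps each call independent. This is a sketch of the pattern only, not the runtime's C code, and all names are hypothetical.

```ocaml
(* Non-reentrant pattern: side information is left in a global. *)
let unordered = ref false

let compare_with_global (a : float) (b : float) =
  if Float.is_nan a || Float.is_nan b then begin
    unordered := true;  (* side channel shared by every caller *)
    0
  end else compare a b

(* Reentrant pattern: the same information is returned to the caller. *)
type order = Unordered | Ordered of int

let compare_reentrant (a : float) (b : float) =
  if Float.is_nan a || Float.is_nan b then Unordered
  else Ordered (compare a b)
```

The reentrant version needs no lock at all, which is why removing globals is preferable to protecting them with one of the "global mutex locks with small scopes".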
Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)
On 14 Apr 09, at 12:21, Philippe Wang wrote: The garbage collector is clearly separated from the rest of the runtime library: the GC is contained in a libgc.a and our patched ocamlopt links programs to this GC. The GC variables are known by the GC only. Well, this is not what I had in mind, but I realize that my question was not clear. A better question would have been: is your implementation still based on a global runtime lock?

No, it isn't. (And such a lock would probably prevent parallel threads too often, wouldn't it?)

A negative answer would imply that you patched the OCaml runtime to make it reentrant. To illustrate my point, I will take the example of the file byterun/compare.c. In this file, the code for the comparison of values makes use of a global variable (named caml_compare_unordered).

It should be, unless bugs are still hiding under that. It is supposed to have been done for a while. We'll make one more check.

If you patched the runtime to allow multiple threads to use it concurrently, I would also be interested in the strategy used: is the problem solved by additional parameters? by the use of thread-local storage? by any other means?

- local variables instead of global variables in functions
- some functions are parameterized by thread identifier (that is, one more parameter than before) (e.g. in amd64.S)

Philippe Wang
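The second strategy (functions taking the thread as one more parameter) amounts to replacing a global by a per-thread context passed explicitly. A minimal OCaml sketch, assuming a hypothetical `thread_ctx` record that holds a per-thread allocation pointer; it illustrates the pattern, not the amd64.S changes themselves.

```ocaml
(* Before: one global allocation pointer shared by every caller,
   which is only safe if a single thread runs at a time. *)
let global_next = ref 0

let alloc_global words =
  let p = !global_next in
  global_next := p + words;
  p

(* After: each thread owns a context, passed as one more parameter. *)
type thread_ctx = { thread_id : int; mutable next : int }

let alloc ctx words =
  let p = ctx.next in
  ctx.next <- p + words;
  p
```

Two threads calling `alloc` with their own contexts never touch the same pointer, so no lock is required on this path.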
Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)
On Apr 10, 2009, at 20:36 CEDT, fo...@x9c.fr wrote: Would it be correct to assume that the current state of the project implies that you have patched the OCaml runtime to make it fully reentrant?

Is the following partial answer relevant enough? The garbage collector is clearly separated from the rest of the runtime library: the GC is contained in a libgc.a and our patched ocamlopt links programs to this GC. The GC variables are known by the GC only.

If so, is this code/patch available for download?

Officially, not yet (and not before April 20th). We did not expect the debugging part to be so complex and hard, and to take so long. The manpower dramatically decreased in late September: the 2 Master's students went back to their Master's courses, and the 3 supervisors had to do research in parallel with teaching. Some major bug fixes were made in February/March, and a lot of major bug fixes were made in April (yes, these last 2 weeks). You know, bugs hiding other bugs... however, we are hopefully getting close to the fixpoint: today there is no known bug! :-)

Unsupported features are, of course, not considered as bugs. For instance, POSIX signals are (currently) not supported. And, as parallel computing *potentially* requires quite a lot more memory, some programs can easily end up in a blocked state when the heap becomes full: our GC (currently) uses (parameterized) fixed-size pages and heap.

In the next days, we will concentrate on running benchmarks (if you have some relevant test programs, they are welcome), and if we don't discover new bugs, we will focus on (finishing) writing documentation and a build script for the release. If we released it as is now, we would have too much support to provide because of the lack of documentation. So it's not quite a good idea...
When we have minimal-but-sufficient documentation, we will make the release :-) In parallel, we are trying to make it work with OS X Leopard 64-bit and/or OCaml 3.11 (currently we only support 3.10.2 on Linux x86_64).

Anyway, wholehearted respect for undertaking such a complex project. Good luck in your bug-chasing tasks!

Thanks.
-- Philippe Wang Philippe.Wang \at/ lip6.fr
N.B. I hope we will not discover new bugs in our amd64.S; our assembly guru is enjoying his Easter holidays (abroad, with no www)...
[Caml-list] ok with parallel threads GC (aka ocaml for multicore)
Hello list,

Mathias and Adrien have just started their internships (for their Master's degree requirements), so they have some time to spend on this project. Moreover, Mathias' internship is strongly related to this project. In short, manpower has dramatically increased.

We are currently hunting down the last remaining bugs.

Our thread library is restricted; it contains:
- Thread: create, join, yield, id, self, delay
- Mutex: full module
- Condition: full module

Our alternative garbage collector:
- uses a stop-the-world copying algorithm;
- has memory pages for threads (each thread takes a page when it is created);
- has a shared heap for shared values and for the old generation coming from the pages (i.e. memory pages are flushed to this heap);
- should not be too hard to replace.

Blocking sections such as I/O operations or mutex locks do not prevent garbage collection. We currently do *not* support POSIX signals (let's say their behaviour is unspecified).

We should make a release soon, but before that:
- some code has to be cleaned up;
- some benchmarks have to be run;
- some documentation has to be completed;
- an installation script still has to be written.
So not much is left to do before the release :-)

We are writing test programs to find the last remaining bugs but also to measure performance. So far, as long as there are not too many concurrent memory accesses, it is not too hard to go n times faster with an n-core CPU. However, intense memory access generates page faults and divides memory bandwidth by the number of concurrent accesses, and memory-intensive programs show that our GC is, of course, not as efficient as INRIA's.

Cheers,
-- 
Philippe Wang
Philippe.Wang \at/ lip6.fr

PS: Sorry for taking so much time; debugging parallel threads in a shared-memory style is hell (you can give it a try).
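The post lists the restricted thread API but shows no code. Below is a minimal sketch of a program that stays within that subset, assuming (not confirmed by the post) that the restricted Thread, Mutex and Condition modules keep the signatures of the standard OCaml modules of the same names; with stock OCaml it builds against the standard threads library. The names n_workers, worker, remaining and finished_ids are illustrative only. Each worker records its Thread.id and decrements a shared counter under a mutex; the main thread waits on a condition variable until the counter reaches zero, then joins every worker.

```ocaml
(* Sketch: uses only Thread.create/join/self/id, Mutex and Condition.
   Assumes the standard OCaml signatures for these modules. *)

let n_workers = 4

(* Shared state, always accessed while holding [m]. *)
let m = Mutex.create ()
let all_done = Condition.create ()
let remaining = ref n_workers
let finished_ids : int list ref = ref []

let worker () =
  let my_id = Thread.id (Thread.self ()) in
  Mutex.lock m;
  finished_ids := my_id :: !finished_ids;
  decr remaining;
  (* The last worker to finish wakes up the waiting main thread. *)
  if !remaining = 0 then Condition.signal all_done;
  Mutex.unlock m

let () =
  (* Array.init is used rather than List.init so the code also
     compiles with older compilers such as 3.10.2. *)
  let threads = Array.init n_workers (fun _ -> Thread.create worker ()) in
  Mutex.lock m;
  (* Re-check the predicate in a loop to tolerate spurious wake-ups. *)
  while !remaining > 0 do
    Condition.wait all_done m
  done;
  Mutex.unlock m;
  Array.iter Thread.join threads;
  Printf.printf "%d workers finished\n" (List.length !finished_ids)
```

The condition variable is not strictly needed here, since Thread.join alone would suffice; it is included to exercise the whole listed subset in one small program.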
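The post reports near-linear speedup when threads make few concurrent memory accesses. As a sketch of a test program with that property, again assuming the standard Thread API, a CPU-bound, allocation-free loop can be split into n independent chunks, with each thread writing only its own result slot. sum_range and parallel_sum are hypothetical names. With the stock OCaml runtime, threads are serialized by the runtime lock, so no speedup should be expected there; the sketch only illustrates a workload shape that keeps shared memory traffic low.

```ocaml
(* CPU-heavy, allocation-free inner loop over [lo, hi). *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + (i mod 7)
  done;
  !acc

(* Split [0, total) into [n] chunks, one thread per chunk.
   Each thread writes only results.(k), so threads share almost
   no mutable memory while computing. *)
let parallel_sum n total =
  let results = Array.make n 0 in
  let chunk = total / n in
  let work k =
    let lo = k * chunk in
    (* The last chunk absorbs the remainder of total / n. *)
    let hi = if k = n - 1 then total else lo + chunk in
    results.(k) <- sum_range lo hi
  in
  let threads = Array.init n (fun k -> Thread.create work k) in
  Array.iter Thread.join threads;
  Array.fold_left ( + ) 0 results

let () =
  let total = 10_000_000 in
  let seq = sum_range 0 total in
  let par = parallel_sum 4 total in
  Printf.printf "sequential=%d parallel=%d\n" seq par
```

Timing the sequential and parallel calls separately (for instance with an external timer) would turn this into a speedup measurement on the patched runtime.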