RE: [Caml-list] Re: multicore wish
Mihamina, > Ok, so for the beginner I am (must I ask on the beginners ML?): is > multicore support just useless or not? That *entirely* depends on what you want to do. If, for example, you have to do a large calculation that is limited by memory and not by CPU, or, if you have an application that is trivially parallelized anyway, multicore support won't make much of a difference. There are (many) other applications, however, where it does matter quite a lot. Actually, the biggest effect of multicore architectures I see is to shift the emphasis from raw CPU power to memory bandwidth. -- Thomas ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On Monday 21 December 2009 14:19:36 Mihamina Rakotomandimby wrote: > Ok, so for the beginner I am (must I ask on the beginners ML?): is > multicore support just useless or not? I have found a great many uses for multicores but you need a decent foundation to make effective use of it. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Hi, > I am beginning using Ocsigen, for a growing web project: > Is multicore support useless for scaling on Ocsigen? Categorically, yes. In fact, I would say that the model used by Ocsigen is close to being optimal performance-wise as far as web applications are concerned. The Ocsigen server and Eliom applications use Lwt for concurrency, ensuring that the CPU is always busy and will not idle waiting for I/O. Moreover, the green threads offered by Lwt are much lighter than system threads, and avoid the context switching penalty incurred by the latter. And what about those multiple cores? Simple, if you have n cores, then simply fire up n instances of the Ocsigen server, and put a dispatching server like HAProxy or Ocsigen itself as frontend (there are some simple tricks to ensure that the same client is always directed to same server). This solution takes advantage of the fact that serving web requests from multiple clients is embarrasingly parallel from the web application standpoint. (Sure, you'll have contention on the database side, but most DBMSs handle that reasonably well). (And yes, I am aware that Lwt's performance could be improved further by using syscalls like epoll/kqeueue/etc instead of select. That is however an implementation issue, not an architectural flaw). Best regards, Dario Teixeira ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > Jon Harrop writes: > > 1. The array "a" is just an ordinary array of any type of values on the > > shared heap in F# but, for generality in OCaml, this must be both the > > underlying ordinary data and a manually-managed shared big array of > > indices into the ordinary data where the indices get sorted while the > > original data remain in place until they are permuted at the end. > > Unless you have a primitive type that isn't a pointer. In OCaml, you would need to write a custom quicksort optimized for that particular type. In F#, the generic version just works and works efficiently. > The advantage with ocaml though is that you never have pointers into a > structure. Makes thinks a lot simpler for the GC and avoids large > overheads in memory. I don't understand what you mean by OCaml "never has pointers into a structure". Half the problem with OCaml is that OCaml almost always uses pointers and the programmer has no choice, e.g. for complex numbers. > > 2. The sorted subarrays are contiguous in memory and, at some > > subdivision, will fit into L2 cache. So F# offers optimal locality. In > > contrast, there is no locality whatsoever in the OCaml code and most > > accesses into the unsorted original array will incur cache misses right > > up to main memory. So the OCaml approach does not scale as well and will > > never see superlinear speedup because it cannot be cache friendly. > > On the other hand swapping two elements in the array has a constant > cost no matter what size they have. At some size there will be a break > even point where copying the data costs more than the cache misses and > with increasing size the cache won't help F# so much either. In theory, yes. In practice, that threshold is far larger than any value type that a real program would use so it is of no practical concern. Moreover, F# gives the programmer control over whether data are unboxed (value types) or boxed (reference types) anyway. In contrast, OCaml is tied to a few value types that are hard-coded into the GC. > > 3. Child tasks are likely to be executed on the same core as their parent > > and use a subset of their parent's data in F#, so they offer the best > > possible locality. In contrast, child processes are likely to be executed > > on another core in OCaml and offer the worst possible locality. > > But, if I understood you right, you first fork one process per > core. No, in OCaml I fork every child. That is the only transparent way to give the child a coherent view of the heap but it is extremely slow (~1ms): "F# can do 60MFLOPS of computation in the time it takes OCaml to fork a single process" - http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html > Those should then also each pin themself to one core. You have no idea which core a forked child process will run on. > Each process then has a work queue which is works through. So they will > always use the local data. Only when a queue runs dry they steal from > another process and ruin locality. You are correctly describing the efficient solution based upon work-stealing queues that F# uses but OCaml cannot express it. > So I don't see where your argument fits. You are not creating childs > on the fly. Only at the start and they run till all the work is done. > At least in this example. No, every recursive invocation of the parallel quicksort spawns another child on-the-fly. That's precisely why it parallelizes so efficiently when you have wait-free work-stealing task deques and a shared heap. In general, you rewrite algorithms into this recursive divide and conquer form and parallelize when possible. You can parallelize a *lot* of problems efficiently that way. > > 4. Task deques can handle an arbitrary number of tasks limited only by > > memory whereas processes are a scarce resource and forking is likely to > > fail, whereupon the "invoke" combinator will simply execute sequentially. > > So it is much easier to write reliable and performant code in F# than > > OCaml. > > Why would you fork in invoke? Fork is currently the only transparent way to implement "invoke" but it is extremely slow and unreliable. > > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude > > slower than pushing a closure onto a concurrent task deque in F#. > > > > 6. The "threshold" value is a magic number derived from measurements on a > > given machine in my OCaml code but is dynamically adjusted in a > > machine-independent way by the "invoke" combinator in my F# code using > > atomic operations and real time profiling of the proportion of time spent > > spawning tasks vs doing actual work. > > 5+6 seem to be an implementation detail of some specific > implementation you are talking about. Yes. I'm talking about today's OCaml and F# implementations. > I don't see anything in the theory that would require that. If by "
Re: [Caml-list] Re: multicore wish
On 12/22/2009 01:12 PM, Jon Harrop wrote: On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: The advantage with ocaml though is that you never have pointers into a structure. Makes thinks a lot simpler for the GC and avoids large overheads in memory. I don't understand what you mean by OCaml "never has pointers into a structure". Half the problem with OCaml is that OCaml almost always uses pointers and the programmer has no choice, e.g. for complex numbers. I think he means that ocaml structs (records, tuples) will only ever have pointers pointing to their beginning - you can't have a pointer to somewhere in the middle of a structure. E ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: > On 12/22/2009 01:12 PM, Jon Harrop wrote: > > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > >> The advantage with ocaml though is that you never have pointers into a > >> structure. Makes thinks a lot simpler for the GC and avoids large > >> overheads in memory. > > > > I don't understand what you mean by OCaml "never has pointers into a > > structure". Half the problem with OCaml is that OCaml almost always uses > > pointers and the programmer has no choice, e.g. for complex numbers. > > I think he means that ocaml structs (records, tuples) will only ever > have pointers pointing to their beginning - you can't have a pointer to > somewhere in the middle of a structure. If so then I do not understand the relevance. You cannot have pointers into a structure in F# or HLVM either... -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Jon Harrop writes: > On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: >> On 12/22/2009 01:12 PM, Jon Harrop wrote: >> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: >> >> The advantage with ocaml though is that you never have pointers into a >> >> structure. Makes thinks a lot simpler for the GC and avoids large >> >> overheads in memory. >> > >> > I don't understand what you mean by OCaml "never has pointers into a >> > structure". Half the problem with OCaml is that OCaml almost always uses >> > pointers and the programmer has no choice, e.g. for complex numbers. >> >> I think he means that ocaml structs (records, tuples) will only ever >> have pointers pointing to their beginning - you can't have a pointer to >> somewhere in the middle of a structure. Exactly. > If so then I do not understand the relevance. You cannot have pointers into a > structure in F# or HLVM either... If you have an array a of (int * int) then in ocaml a is an array of pointers to tuples. a.(5) is a pointer to the 6th tuple. You said that in F# the array will be really an array of tuples. Then a.(5) will be a pointer into the array at the position where the 6th tuple is. It does not point to the begining of an allocated block but to the middle of one. That means as long as a.(5) is reachable the full array has to remain allocated or the GC has to recognise that only one (a few) items of an array are reachable and copy them before freeing the array. The GC also needs a way to find the begining of an allocated block from a pointer into the block. Which means extra overhead in both memory and time. Another think is that tuples are immutable but arrays are mutable. In ocaml you get this nice behaviour: # let a = Array.init 5 (fun x -> (x,x));; val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|] # let x = a.(1);; val x : int * int = (1, 1) # a.(1) <- (2,2);; - : unit = () # x;; - : int * int = (1, 1) In F# you would get (2,2) for your fast sortable array. A different semantic. MfG Goswin ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Jon Harrop writes: > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: >> Jon Harrop writes: >> > 1. The array "a" is just an ordinary array of any type of values on the >> > shared heap in F# but, for generality in OCaml, this must be both the >> > underlying ordinary data and a manually-managed shared big array of >> > indices into the ordinary data where the indices get sorted while the >> > original data remain in place until they are permuted at the end. >> >> Unless you have a primitive type that isn't a pointer. > > In OCaml, you would need to write a custom quicksort optimized for that > particular type. In F#, the generic version just works and works efficiently. > >> The advantage with ocaml though is that you never have pointers into a >> structure. Makes thinks a lot simpler for the GC and avoids large >> overheads in memory. > > I don't understand what you mean by OCaml "never has pointers into a > structure". Half the problem with OCaml is that OCaml almost always uses > pointers and the programmer has no choice, e.g. for complex numbers. > >> > 2. The sorted subarrays are contiguous in memory and, at some >> > subdivision, will fit into L2 cache. So F# offers optimal locality. In >> > contrast, there is no locality whatsoever in the OCaml code and most >> > accesses into the unsorted original array will incur cache misses right >> > up to main memory. So the OCaml approach does not scale as well and will >> > never see superlinear speedup because it cannot be cache friendly. >> >> On the other hand swapping two elements in the array has a constant >> cost no matter what size they have. At some size there will be a break >> even point where copying the data costs more than the cache misses and >> with increasing size the cache won't help F# so much either. > > In theory, yes. In practice, that threshold is far larger than any value type > that a real program would use so it is of no practical concern. Moreover, F# > gives the programmer control over whether data are unboxed (value types) or > boxed (reference types) anyway. In contrast, OCaml is tied to a few value > types that are hard-coded into the GC. > >> > 3. Child tasks are likely to be executed on the same core as their parent >> > and use a subset of their parent's data in F#, so they offer the best >> > possible locality. In contrast, child processes are likely to be executed >> > on another core in OCaml and offer the worst possible locality. >> >> But, if I understood you right, you first fork one process per >> core. > > No, in OCaml I fork every child. That is the only transparent way to give the > child a coherent view of the heap but it is extremely slow (~1ms): So if you add a (sleep 60) to the ocaml code then ocaml gets even more worse than F#? You are comparing an implementation that is specifically optimized to use things F# is good at and ocaml is bad. To give a fair comparison you need to optimize the implementation to the language. > "F# can do 60MFLOPS of computation in the time it takes OCaml > to fork a single process" - > http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html > >> Those should then also each pin themself to one core. > > You have no idea which core a forked child process will run on. I can pin each child to one core and the scheduler will move it there. I would expect a multi-core ocaml to actualy do this internaly already so there is on GC threads per core ocaml runs on. But that depends on the GC implementation that will be used. >> Each process then has a work queue which is works through. So they will >> always use the local data. Only when a queue runs dry they steal from >> another process and ruin locality. > > You are correctly describing the efficient solution based upon work-stealing > queues that F# uses but OCaml cannot express it. You mean that you didn't implement that way. I can easily express that with closures and pre-forked worker threads. >> So I don't see where your argument fits. You are not creating childs >> on the fly. Only at the start and they run till all the work is done. >> At least in this example. > > No, every recursive invocation of the parallel quicksort spawns another child > on-the-fly. That's precisely why it parallelizes so efficiently when you have > wait-free work-stealing task deques and a shared heap. In general, you > rewrite algorithms into this recursive divide and conquer form and > parallelize when possible. You can parallelize a *lot* of problems > efficiently that way. That seems awfully ineficient. Then you have a million children running on maybe 4 cores and each job queue holds no job. I think we mean different things by children. By children I mean what the kernel sees as children. Newly forked threads. I think you mean jobs that get put into the queues. There is no reason to fork a new system thread every time the parallel quicksort splits an array in two. >> >
Re: [Caml-list] Re: multicore wish
On Thursday 24 December 2009 12:58:18 Goswin von Brederlow wrote: > Jon Harrop writes: > > On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote: > >> On 12/22/2009 01:12 PM, Jon Harrop wrote: > >> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote: > >> >> The advantage with ocaml though is that you never have pointers into > >> >> a structure. Makes thinks a lot simpler for the GC and avoids large > >> >> overheads in memory. > >> > > >> > I don't understand what you mean by OCaml "never has pointers into a > >> > structure". Half the problem with OCaml is that OCaml almost always > >> > uses pointers and the programmer has no choice, e.g. for complex > >> > numbers. > >> > >> I think he means that ocaml structs (records, tuples) will only ever > >> have pointers pointing to their beginning - you can't have a pointer to > >> somewhere in the middle of a structure. > > Exactly. > > > If so then I do not understand the relevance. You cannot have pointers > > into a structure in F# or HLVM either... > > If you have an array a of (int * int) then in ocaml a is an array of > pointers to tuples. a.(5) is a pointer to the 6th tuple. Yes. > You said that in F# the array will be really an array of tuples. Then > a.(5) will be a pointer into the array at the position where the 6th > tuple is. No. The expression a.(5) loads all fields of the value type. For example, if it were an array of complex number then a.(5) would load the real and imaginary parts. > That means as long as a.(5) is reachable the full array has to remain > allocated or the GC has to recognise that only one (a few) items of an > array are reachable and copy them before freeing the array. The GC > also needs a way to find the begining of an allocated block from a > pointer into the block. Which means extra overhead in both memory and > time. > > Another think is that tuples are immutable but arrays are mutable. In > ocaml you get this nice behaviour: > > # let a = Array.init 5 (fun x -> (x,x));; > val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|] > # let x = a.(1);; > val x : int * int = (1, 1) > # a.(1) <- (2,2);; > - : unit = () > # x;; > - : int * int = (1, 1) > > In F# you would get (2,2) for your fast sortable array. A different > semantic. No. Even if F# represented tuples as structs (which it does not, unfortunately) you would still get the same result as you do in OCaml. HLVM already implements tuples as structs and your example works exactly as the OCaml does: # let xs = create(6, (1, 1)) in let x = xs.(1) in xs.(1) <- (2, 2); x;; - : `Struct[`Int; `Int] = (1, 1) The only difference is that HLVM unboxes the pairs in the array so the heap contains only a single pointer and accessing a pair requires only one indirection. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: > Jon Harrop writes: > > No, in OCaml I fork every child. That is the only transparent way to give > > the child a coherent view of the heap but it is extremely slow (~1ms): > > So if you add a (sleep 60) to the ocaml code then ocaml gets even > more worse than F#? You are comparing an implementation that is > specifically optimized to use things F# is good at and ocaml is > bad. To give a fair comparison you need to optimize the implementation > to the language. Post a better OCaml solution if you can. > >> Each process then has a work queue which is works through. So they will > >> always use the local data. Only when a queue runs dry they steal from > >> another process and ruin locality. > > > > You are correctly describing the efficient solution based upon > > work-stealing queues that F# uses but OCaml cannot express it. > > You mean that you didn't implement that way. No, I mean OCaml cannot express it. > I can easily express that with closures and pre-forked worker threads. OCaml's threads do not run in parallel so that will not work. > >> So I don't see where your argument fits. You are not creating childs > >> on the fly. Only at the start and they run till all the work is done. > >> At least in this example. > > > > No, every recursive invocation of the parallel quicksort spawns another > > child on-the-fly. That's precisely why it parallelizes so efficiently > > when you have wait-free work-stealing task deques and a shared heap. In > > general, you rewrite algorithms into this recursive divide and conquer > > form and parallelize when possible. You can parallelize a *lot* of > > problems efficiently that way. > > That seems awfully ineficient. Then you have a million children > running on maybe 4 cores and each job queue holds no job. > > I think we mean different things by children. By children I mean what > the kernel sees as children. Newly forked threads. I think you mean > jobs that get put into the queues. There is no reason to fork a new > system thread every time the parallel quicksort splits an array in > two. The children are lightweight tasks, not threads or processes. > >> Why would you fork in invoke? > > > > Fork is currently the only transparent way to implement "invoke" but it > > is extremely slow and unreliable. > > No, it is the insane way. Use closures. Please demonstrate. > > parallel programs that leverage multicores and run a *lot* faster than > > anything that can be written in OCaml. You can even make this accessible > > to OCaml programmers as a DSL with automatic interop to make high > > performance parallel programming as easy as possible from OCaml. > > Or just implement it properly in ocaml. The problem part is the > GC. The rest is easy. No kidding. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Jon Harrop writes: > On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: >> Jon Harrop writes: >> > No, in OCaml I fork every child. That is the only transparent way to give >> > the child a coherent view of the heap but it is extremely slow (~1ms): >> >> So if you add a (sleep 60) to the ocaml code then ocaml gets even >> more worse than F#? You are comparing an implementation that is >> specifically optimized to use things F# is good at and ocaml is >> bad. To give a fair comparison you need to optimize the implementation >> to the language. > > Post a better OCaml solution if you can. > >> >> Each process then has a work queue which is works through. So they will >> >> always use the local data. Only when a queue runs dry they steal from >> >> another process and ruin locality. >> > >> > You are correctly describing the efficient solution based upon >> > work-stealing queues that F# uses but OCaml cannot express it. >> >> You mean that you didn't implement that way. > > No, I mean OCaml cannot express it. > >> I can easily express that with closures and pre-forked worker threads. > > OCaml's threads do not run in parallel so that will not work. >> Or just implement it properly in ocaml. The problem part is the >> GC. The rest is easy. > > No kidding. There is one implementation: http://www.algo-prog.info/ocmc/web/ But as said maybe not a verry good one. I tried implementing parallel threads under the original GC by forking multiple instances of the same program and using a Bigarray to mmap /dev/null for shared memory between the instances. That works for sharing primitive types, flat records (records of primitive types) and even fixed (or bound) sized structures but it gets more and more complex to share each and needs some Obj magic, marshaling or C stubs (except for primitive types). It works but is a real hack. I hope someone will pick up the pices and update OCaml4Multicore to the latest ocaml or maybe for ocaml to add it directly. If not then I fear ocaml will be left behind soon. MfG Goswin ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On Sunday 27 December 2009 12:45:53 Goswin von Brederlow wrote: > There is one implementation: http://www.algo-prog.info/ocmc/web/ > But as said maybe not a very good one. > > I tried implementing parallel threads under the original GC by forking > multiple instances of the same program and using a Bigarray to mmap > /dev/null for shared memory between the instances. That works for > sharing primitive types, flat records (records of primitive types) and > even fixed (or bound) sized structures but it gets more and more > complex to share each and needs some Obj magic, marshaling or C stubs > (except for primitive types). It works but is a real hack. Once you've conceded to manual memory management (mmap of shared bigarrays) and low-level programming (no polymorphism, Obj.magic) you've lost the main advantages of OCaml and you still cannot get great performance. > I hope someone will pick up the pices and update OCaml4Multicore to > the latest ocaml or maybe for ocaml to add it directly. If not then I > fear ocaml will be left behind soon. Building upon OCaml rather than starting from scratch makes it vastly more difficult to implement a useful parallel GC not just because the GC and code gen must work in harmony together but because OCaml has been so heavily optimized in the wrong direction for this (e.g. a bit twiddled uniform representation that burdens the GC with everything from complex numbers to pairs). That's why I think the best solution is to start from scratch and build a completely separate VM bred for shared-memory parallelism. Indeed, HLVM already outperforms OC4MC even though I have put a fraction of the effort into it and built a lot more surrounding infrastructure. For example, my latest GC only took 5 days to write. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Am Sonntag, den 27.12.2009, 13:45 +0100 schrieb Goswin von Brederlow: > Jon Harrop writes: > > > On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: > >> Jon Harrop writes: > >> > No, in OCaml I fork every child. That is the only transparent way to give > >> > the child a coherent view of the heap but it is extremely slow (~1ms): > >> > >> So if you add a (sleep 60) to the ocaml code then ocaml gets even > >> more worse than F#? You are comparing an implementation that is > >> specifically optimized to use things F# is good at and ocaml is > >> bad. To give a fair comparison you need to optimize the implementation > >> to the language. > > > > Post a better OCaml solution if you can. > > > >> >> Each process then has a work queue which is works through. So they will > >> >> always use the local data. Only when a queue runs dry they steal from > >> >> another process and ruin locality. > >> > > >> > You are correctly describing the efficient solution based upon > >> > work-stealing queues that F# uses but OCaml cannot express it. > >> > >> You mean that you didn't implement that way. > > > > No, I mean OCaml cannot express it. > > > >> I can easily express that with closures and pre-forked worker threads. > > > > OCaml's threads do not run in parallel so that will not work. > > >> Or just implement it properly in ocaml. The problem part is the > >> GC. The rest is easy. > > > > No kidding. > > There is one implementation: http://www.algo-prog.info/ocmc/web/ > But as said maybe not a verry good one. > > I tried implementing parallel threads under the original GC by forking > multiple instances of the same program and using a Bigarray to mmap > /dev/null for shared memory between the instances. That works for > sharing primitive types, flat records (records of primitive types) and > even fixed (or bound) sized structures but it gets more and more > complex to share each and needs some Obj magic, marshaling or C stubs > (except for primitive types). It works but is a real hack. It works with all types: https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli look for init_value. It's non-released code yet. However, there are some problems: Values outside the heap do not support the polymorphic comparison and hash functions. That's a hard limitation, e.g. you cannot even compare two strings, or build a hash table with strings as keys. That limits the usefulness of shared memory. Also, as we are striving for safe programs, there is the question when shared memory can be safely released. The GC could help here, but there are no ways to hook in, e.g. to get notified when there are no pointers anymore to a value living in shared memory. Gerd > > I hope someone will pick up the pices and update OCaml4Multicore to > the latest ocaml or maybe for ocaml to add it directly. If not then I > fear ocaml will be left behind soon. > > MfG > Goswin > > ___ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany g...@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
On 28 Dec 2009, at 12:28, Gerd Stolpmann wrote: > However, there are some problems: Values outside the heap do not support > the polymorphic comparison and hash functions. That's a hard limitation, > e.g. you cannot even compare two strings, or build a hash table with > strings as keys. That limits the usefulness of shared memory. Camlp4 may help here; Thomas Gazagnaire and I have been working on a language-integrated ORM, and it has a reliable type-conv hash generator library [1]. It works with mutable records as well as immutable ones (the main limitation of the built-in polymorphic hash function) but it could be used to generate explicit comparison functions for shared memory as well. We need to split out the various support libraries in the ORM separately at some point anyway, so if the hash generator is of any use I'll cut a release. [1] http://github.com/avsm/ocaml-orm-sqlite/tree/master/hash/ -anill ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Gerd Stolpmann wrote: > It works with all types: > > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli > > look for init_value. It's non-released code yet. > > However, there are some problems: Values outside the heap do not support > the polymorphic comparison and hash functions. That's a hard limitation, > e.g. you cannot even compare two strings, or build a hash table with > strings as keys. That limits the usefulness of shared memory. In OCaml 3.11 and later, you can call caml_page_table_add(In_static_area, start, end) to inform the run-time system that the address range [start, end) contains well-formed Caml data that polymorphic primitives can safely work on. This should solve your problem. - Xavier Leroy ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs
Re: [Caml-list] Re: multicore wish
Am Montag, den 28.12.2009, 19:05 +0100 schrieb Xavier Leroy: > Gerd Stolpmann wrote: > > > It works with all types: > > > > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli > > > > look for init_value. It's non-released code yet. > > > > However, there are some problems: Values outside the heap do not support > > the polymorphic comparison and hash functions. That's a hard limitation, > > e.g. you cannot even compare two strings, or build a hash table with > > strings as keys. That limits the usefulness of shared memory. > > In OCaml 3.11 and later, you can call > >caml_page_table_add(In_static_area, start, end) > > to inform the run-time system that the address range [start, end) > contains well-formed Caml data that polymorphic primitives can safely > work on. This should solve your problem. Yes, in deed! Very nice. As there are still unused bits in the page table... I'm thinking about the following. One bit could be used by the GC to indicate that values somewhere in the page are still referenced from heap memory. At the beginning of the mark phase the bit is cleared. Whenever a pointer is seen that goes to a page marked as In_static_area, the bit for the area is set. As before, the mark phase ignores these pointers otherwise (they are not followed). After the mark phase, one could check which of the pages are still used, and this would make it possible to call finalisers for whole pages. As we only add an "else case" to the mark loop, the costs for this are negligible. In the shared memory scenario, each process mapping share memory could define such a finaliser, and only when it is called the memory block is unmapped. (And when a global counter is decreased to 0, the shared memory object as such can be deleted. But that's outside the ocaml runtime.) With that help, shared memory would be relatively safe to use. The only remaining trap would be value pointers that go from shared memory to heap memory - but ocaml has already lots of ways to control mutability of values, so I think it is ok to let the programmer do this. Gerd -- Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany g...@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ___ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs