Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-11-08 Thread Jon Harrop
On Friday 25 September 2009 00:28:57 Jon Harrop wrote:
 Just to quantify this with a data point: the fastest (serial) version of my
 ray tracer benchmark is 10x slower with the new GC. However, this is
 anomalous with respect to complexity and the relative performance is much
 better for simpler renderings. For example, the new GC is only 1.7x slower
 with n=6 instead of n=9.

The new SmartPumpkin release of OC4MC does a lot better. Specifically, the 
version compiled with partial collections is now only 3.9x slower on a serial 
ray tracer with n=9 (compared to 10x slower before). I'll try it in more 
detail...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-10-09 Thread Jon Harrop
On Saturday 26 September 2009 00:26:50 Benjamin Canou wrote:
 On the maintenance side, as Philippe said, we already have some half
 working version with ocaml 3.11.x, but partly because of the changes
 made to the native runtime in this release and partly because of [1],
 porting the patch is not trivial.

OC4MC seems to work very well for numerical problems that do not allocate at 
all but introducing even the slightest mutation (not even in the inner loop) 
completely destroys performance and scaling. I'm guessing the reason is that 
any allocations eventually trigger collections and those are copying the 
entire heap which, in this case, consists almost entirely of float array 
arrays.

My guess was that using big arrays would alleviate this problem by placing 
most of the data outside the OCaml heap (I'm guessing that oc4mc leaves the 
element data of a big array alone and copies only the small reference to 
it?). However, it does not seem to handle bigarrays:

../out/lib/ocaml//libbigarray.a(bigarray_stubs.o): In function 
`caml_ba_compare':
bigarray_stubs.c:(.text+0x1e5): undefined reference to 
`caml_compare_unordered'
bigarray_stubs.c:(.text+0x28d): undefined reference to 
`caml_compare_unordered'
collect2: ld returned 1 exit status
Error during linking

If I am correct then I would value functioning bigarrays above OCaml 3.11 
support.
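
To make the distinction above concrete, here is a minimal sketch using the
standard Bigarray module (a separate library in OCaml 3.x, part of the
standard library in recent releases). It contrasts a float array array, whose
rows are ordinary heap blocks, with a Bigarray.Array2, whose element data is
allocated outside the OCaml heap. The names n, boxed_rows, flat, sum_rows and
sum_flat are illustrative only, and the code says nothing about how oc4mc's
collector actually treats bigarrays: that remains the guess made above.

```ocaml
(* Two representations of an n x n matrix of floats. *)
let n = 4

(* float array array: n + 1 heap blocks that a copying GC has to move. *)
let boxed_rows : float array array =
  Array.init n (fun i -> Array.init n (fun j -> float_of_int (i * n + j)))

(* Bigarray: the element data lives outside the OCaml heap; only a
   small handle is a heap value. *)
let flat =
  let open Bigarray in
  let a = Array2.create float64 c_layout n n in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      a.{i, j} <- float_of_int (i * n + j)
    done
  done;
  a

(* Sum all elements of the nested-array representation. *)
let sum_rows m =
  Array.fold_left (fun acc row -> Array.fold_left ( +. ) acc row) 0.0 m

(* Sum all elements of the bigarray representation. *)
let sum_flat a =
  let s = ref 0.0 in
  for i = 0 to Bigarray.Array2.dim1 a - 1 do
    for j = 0 to Bigarray.Array2.dim2 a - 1 do
      s := !s +. a.{i, j}
    done
  done;
  !s

let () = Printf.printf "%g %g\n" (sum_rows boxed_rows) (sum_flat flat)
```

Both sums are the same; only the place where the element data is stored
differs.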

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-26 Thread kcheung
 On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:
 Perhaps an off-topic and naive question: What does it take to beat F#
 and
 still have predictable performance?

 Provided you're talking about today's machines and don't care about pause
 times, HLVM with a parallel GC (not unlike the oc4mc one) and a task
 library
 would beat F# and still have predictable performance.

If I understand correctly, HLVM is an
analog of Microsoft's CLR.  So theoretically,
one can build a compiler for ocaml that
compiles to HLVM.  Would that make ocaml
beat F#?

Kevin.



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-26 Thread Jon Harrop
On Saturday 26 September 2009 14:51:21 kche...@math.carleton.ca wrote:
  On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:
  Perhaps an off-topic and naive question: What does it take to beat F#
  and
  still have predictable performance?
 
  Provided you're talking about today's machines and don't care about
  pause times, HLVM with a parallel GC (not unlike the oc4mc one) and a
  task library
  would beat F# and still have predictable performance.

 If I understand correctly, HLVM is an
 analog of Microsoft's CLR.

HLVM certainly draws upon ideas from the CLR but it is different in many 
respects. One important advantage of HLVM over the CLR is that it handles 
structs correctly in the presence of tail calls (thanks to LLVM). This means 
that tuples can be represented (in the absence of polymorphic recursion) as 
unboxed C structs which *greatly* reduces the burden on the garbage 
collector. HLVM also uses a far superior code generator (LLVM) compared to 
the CLR and OCaml.

 So theoretically, 
 one can build a compiler for ocaml that
 compiles to HLVM.  Would that make ocaml
 beat F#?

That would beat the performance of F# with minimal effort. That was the goal 
of my HLVM hobby project but I was forced to shelve it when the recession 
hit. Hopefully I'll get back to it in 2010...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Hugo Ferreira

Hello,

I tried not to get into this discussion but I could not resist
commenting on the following:

Jacques Garrigue wrote:
...
 ... There are applications for that (ray tracing is
 one), but this is not the kind of needs most people have.
...

As with most technology people will or will not use something
according to their perceived effort/pleasure to learn/use
something and the advantages it is supposed to bring.

Put it another way; if parallel/concurrent programming could be
easily used with a minimum of effort then I believe most people
would use it simply because it is available.

In other words the (ready) availability of (multi-core PCs and)
parallel computing support (in Ocaml) will certainly influence the
number of people that will take advantage of it simply because it
is available (confer with e-mails on this thread).

...
 If I tell you that you just have to modify a bit your program to get a
 near linear speedup, then it looks great. But in practice it is rather
 having to rethink completely your algorithm, to eventually get a
 speedup bounded by bandwidth, and starting from a point lower than the
 original single thread program.
...

Rethinking our application/algorithmic structure may not be a real
deterrent. An application does not require parallel/concurrent
processing everywhere. It is really a question of identifying where
and when this is useful. Much like selecting the most appropriate 
data-structure for any application. It's not an all or nothing
proposition.

My 2 cents.

Regards,
Hugo F.



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Philippe Wang


On Sep 25, 2009, at 6:07 AM, Jacques Garrigue wrote:
 First, like everybody else, I'd like very much to try this out.
 Is there any chance it could compile on Snow Leopard :-)
 (I suppose it's near impossible, but still ask...)

I haven't tried that yet, mostly because I guess that it wouldn't work  
out-of-the-box.
However, the .asm file should be ok with OS X and what may clash are  
configure file behavior and C macros.

I should take a closer look at that, since SL now seems to work well.

Cheers,


--
Philippe Wang
  philippe.w...@lip6.fr
  http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Jon Harrop
On Friday 25 September 2009 08:32:26 Hugo Ferreira wrote:
 Put it another way; if parallel/concurrent programming could be
 easily used with a minimum of effort then I believe most people
 would use it simply because it is available.

Once your run-time supports it, you just need a library that farms tasks out 
to threads via queues and a lot of parallelism really is easy.
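
As a rough illustration of such a library, here is a minimal sketch of a task
farm built on a mutex-protected queue. It assumes the Domain, Mutex and Queue
modules of OCaml 5, which postdate this thread and are not the oc4mc API; the
names take and run_tasks are hypothetical.

```ocaml
(* Requires OCaml 5 (Domain and Mutex in the standard library). *)

(* Pop the next task, or None when the queue is empty. *)
let take q m =
  Mutex.lock m;
  let t = Queue.take_opt q in
  Mutex.unlock m;
  t

(* Run every task in [tasks] on [num_workers] domains and return the
   results in task order. *)
let run_tasks ~num_workers (tasks : (unit -> 'a) array) : 'a option array =
  let q = Queue.create () and m = Mutex.create () in
  Array.iteri (fun i f -> Queue.add (i, f) q) tasks;
  let results = Array.make (Array.length tasks) None in
  let rec worker () =
    match take q m with
    | None -> ()
    | Some (i, f) ->
      (* Each slot is written by exactly one worker. *)
      results.(i) <- Some (f ());
      worker ()
  in
  let domains = List.init num_workers (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  results

let () =
  let tasks = Array.init 8 (fun i () -> i * i) in
  let results = run_tasks ~num_workers:4 tasks in
  Array.iter (function Some r -> Printf.printf "%d " r | None -> ()) results;
  print_newline ()
```

All tasks are enqueued before the workers start, so no condition variable is
needed; a library meant for real use would also have to handle tasks that
spawn further tasks and tasks that raise exceptions.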

  ...
   If I tell you that you just have to modify a bit your program to get a
   near linear speedup, then it looks great. But in practice it is rather
   having to rethink completely your algorithm, to eventually get a
   speedup bounded by bandwidth, and starting from a point lower than the
   original single thread program.
  ...

 Rethinking our application/algorithmic structure may not be a real
 deterrent. An application does not require parallel/concurrent
 processing everywhere. It is really a question of identifying where
 and when this is useful. Much like selecting the most appropriate
 data-structure for any application. It's not an all or nothing
 proposition.

Right. Parallelizing programs generally consists of identifying a performance 
bottleneck via measurement and performing the outermost parallelizable loops 
in parallel. You can do many more clever things but they are far less common.
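
A minimal sketch of "performing the outermost loop in parallel", using matrix
multiplication as the example. It assumes OCaml 5's Domain module rather than
oc4mc itself, and the names mul_rows and par_matmul are illustrative.

```ocaml
(* Requires OCaml 5 for Domain. *)

(* Compute rows lo .. hi-1 of c = a * b for n x n matrices. *)
let mul_rows a b c n lo hi =
  for i = lo to hi - 1 do
    for j = 0 to n - 1 do
      let s = ref 0.0 in
      for k = 0 to n - 1 do
        s := !s +. a.(i).(k) *. b.(k).(j)
      done;
      c.(i).(j) <- !s
    done
  done

(* Parallelize only the outermost loop: each domain gets a band of rows. *)
let par_matmul ~domains a b =
  let n = Array.length a in
  let c = Array.make_matrix n n 0.0 in
  let band d = (d * n / domains, (d + 1) * n / domains) in
  List.init domains (fun d ->
      let lo, hi = band d in
      Domain.spawn (fun () -> mul_rows a b c n lo hi))
  |> List.iter Domain.join;
  c

let () =
  let n = 64 in
  let a =
    Array.init n (fun i -> Array.init n (fun j -> float_of_int (i + j))) in
  let id =
    Array.init n (fun i -> Array.init n (fun j -> if i = j then 1.0 else 0.0)) in
  (* Multiplying by the identity must give back the original matrix. *)
  assert (par_matmul ~domains:4 a id = a);
  print_endline "ok"
```

The inner two loops are untouched; the bands write to disjoint rows of c, so
no locking is needed.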

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Philippe Wang
On Fri, Sep 25, 2009 at 1:28 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:
 Very few programs that are not written with multicore in mind would
 not be penalized.
 I mean our GC is much much dumber than INRIA OCaml's one.
 Our goal was to show it was possible to have good performance with
 multicores for OCaml.
 Maybe someday we'll find some time to optimize the GC, but it's likely
 not very soon.

 Just to quantify this with a data point: the fastest (serial) version of my
 ray tracer benchmark is 10x slower with the new GC. However, this is
 anomalous with respect to complexity and the relative performance is much
 better for simpler renderings. For example, the new GC is only 1.7x slower
 with n=6 instead of n=9.

I just put a version with a bug fix on some structures allocation (20090925).
I hope it removes this anomaly.

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Xavier Leroy

Jon Harrop wrote:
 On Thursday 24 September 2009 13:39:40 Stefano Zacchiroli wrote:
  On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
   The next steps are to get oc4mc into the apt repositories and build

  Uhm, I'm curious: how do you plan to achieve that?

 Good question. I have no idea, of course. :-)

That would be suicidal.  I definitely do not want to belittle the work
of Philippe and his teammates -- what they did is an amazing hack
indeed --, but you need to keep in mind the difference between a
proof-of-concept experiment and a product.

In a proof-of-concept experiment, you implement the feature you want to
experiment with and keep everything else as simple as possible
(otherwise there is little chance that you'll complete the
experiment).  That's exactly what Philippe et al did, and rightly so:
their GC is about the simplest you can think of, they didn't bother
adapting some features of the run-time system, they target AMD64/Unix
only, etc.  Now they have a platform they can experiment with and make
measurements on: mission accomplished.

In a product, you'd need something that is essentially a drop-in
replacement for today's OCaml and can run, say, Coq with at most a 10%
slowdown.  That's a long way to go (I'd say a couple of years of work).
For example, single-generation stop-and-copy GC is known to have
terrible performance (both in running time and in latency) for
programs that have large data sets and allocate intensively.  This is
true in the sequential case and even worse in a stop-the-world
parallel setting, by Amdahl's law.  Note that the programs I mentioned
above are exactly those that the Caml user community cares most about
-- not matrix multiply nor ray tracers, Harrop's propaganda
notwithstanding -- and those for which OCaml has been delivering
top-class performance for the last 12 years -- again, Harrop's
propaganda notwithstanding.
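
For reference, Amdahl's law bounds the speedup on n cores by
S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can
run in parallel. The sketch below evaluates it for a few values of p. The
fractions are made up for illustration and are not measurements of OC4MC, and
counting stop-the-world collection time as part of the serial fraction 1 - p
is an assumption of this example.

```ocaml
(* Amdahl's law: speedup on n cores when a fraction p of the run time
   is parallelizable and the rest is serial. *)
let amdahl ~p ~n = 1.0 /. ((1.0 -. p) +. p /. float_of_int n)

let () =
  List.iter
    (fun p ->
       Printf.printf "p=%.2f: 2 cores %.2fx, 8 cores %.2fx\n"
         p (amdahl ~p ~n:2) (amdahl ~p ~n:8))
    (* Hypothetical parallel fractions. *)
    [ 0.95; 0.75; 0.50 ]
```

The larger the share of time spent with the world stopped, the smaller p is
and the lower the ceiling on the achievable speedup, whatever the core count.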

On your way to a product, you'd need independently-collectable
generations (which means some work on the compiler as well), plus a
parallel or even better concurrent major collector.  And of course a
lot more work on the runtime system and C interface to make everything
truly reentrant while remaining portable.  And probably some kind of
two-level scheduler for threads.  And after all that work
you'd end up with an extremely low-level and unsafe parallel
programming model that you'd need to tame by developing clever
libraries that mere mortals can use effectively (Apple's Grand Central
was mentioned on this thread; it's a good example)...

In summary, Philippe and his coauthors do deserve a round of applause,
but please keep a cool head.

- Xavier Leroy



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Jon Harrop
On Friday 25 September 2009 05:07:21 Jacques Garrigue wrote:
 Your benchmark seems strange to me, as you are comparing apples with
 oranges.

In some sense, yes. I was interested in the performance of the 
de facto standard hash table implementations and not the performance that can 
be obtained by reinventing the wheel.

 Hashtables in Python are a basic feature of the language, 
 and they are of course implemented in C. In ocaml, they are
 implemented in ocaml (except the hashing function, which has to be
 polymorphic), using an array of association lists!
 (Actually the pairs are flattened for better performance, but still)
 What is impressive is that you don't need any special optimization to
 get reasonably good performance.

OCaml is 4x slower than F# on that benchmark for several reasons:

1. Overhead of 31-bit int arithmetic.

2. Lack of constant table sizes in the implementation and OCaml's failure to 
optimize mod-by-a-constant.

3. No monomorphization.

You can write a far more efficient hash table implementation in F# than you 
can in OCaml because it addressed all of those deficiencies.
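
On point 3, the closest thing in stock OCaml is specializing by hand with the
Hashtbl.Make functor, which replaces the polymorphic hash and equality with
ones written for a single key type. A minimal sketch follows; the module name
IntTbl is illustrative, and this does nothing about points 1 and 2.

```ocaml
(* A hash table specialized to int keys by hand. *)
module IntTbl = Hashtbl.Make (struct
    type t = int
    (* Monomorphic integer comparison instead of polymorphic equality. *)
    let equal (a : int) (b : int) = a = b
    (* Non-negative hash computed without the polymorphic hash function. *)
    let hash (x : int) = x land max_int
  end)

let () =
  let t = IntTbl.create 16 in
  for i = 1 to 1000 do
    IntTbl.replace t i (float_of_int i)
  done;
  Printf.printf "%d bindings, t[42] = %g\n" (IntTbl.length t) (IntTbl.find t 42)
```

The functor has to be applied once per key type, which is the manual
equivalent of what a monomorphizing compiler does automatically.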

 Actually the only tuning you need is to start from a reasonable table size,
 which you didn't... 

No, the exact opposite is true: OCaml had the unfair advantage of starting 
from the optimal table size for the problem whereas F# started from the 
default size and had to resize. If you level the playing field then OCaml is 
8x slower than F#.
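
The effect of the initial size is easy to measure in isolation. The sketch
below is not the benchmark discussed in this thread; it is a small
illustrative harness (the names fill and time and the value of n are
assumptions) that fills a standard Hashtbl once from a tiny initial size and
once from a size matching the number of keys.

```ocaml
(* Fill a table with n int keys, starting from a given initial size. *)
let fill ~initial n =
  let t = Hashtbl.create initial in
  for i = 1 to n do
    Hashtbl.replace t i i
  done;
  t

(* Run f and return its result together with the CPU time it took. *)
let time f =
  let t0 = Sys.time () in
  let r = f () in
  (r, Sys.time () -. t0)

let () =
  let n = 1_000_000 in
  (* Tiny initial size: the table has to resize repeatedly. *)
  let small, t_small = time (fun () -> fill ~initial:1 n) in
  (* Initial size chosen for the problem: no resizing needed. *)
  let sized, t_sized = time (fun () -> fill ~initial:n n) in
  Printf.printf "small start: %.3fs, %d buckets\n"
    t_small (Hashtbl.stats small).Hashtbl.num_buckets;
  Printf.printf "pre-sized:   %.3fs, %d buckets\n"
    t_sized (Hashtbl.stats sized).Hashtbl.num_buckets
```

Timings depend on the machine and the OCaml version, so none are quoted here.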

  Even if that were not the case, the idea of cherry picking interpreted
  scripting languages to compete with because OCaml has fallen so far
  behind mainstream languages (let alone modern languages) is embarrassing.
  What's next, OCaml vs Bash for your high performance needs?

 OCaml was never touted as an HPC language!

I started learning OCaml because people were running high performance OCaml 
code on a 256-CPU supercomputer in Cambridge. I have been touting OCaml for 
HPC ever since. Thousands of scientists and engineers all over the world have 
used OCaml for technical computing and chose it precisely because it was 
competitively performant.

 The only claim I've seen is that it intends to stay within 2x of C for most
 applications. (Which is not so easy these days, gcc getting much faster.)

Yes. The infrastructure for compiler writers is improving rapidly as well 
though, e.g. LLVM.

 Actually, I believe that Philippe's point is rather different.
 Making a functional language work well on multicores is difficult.
 If I tell you that you just have to modify a bit your program to get a
 near linear speedup, then it looks great. But in practice it is rather
 having to rethink completely your algorithm,

Sure. The free lunch is over. However, the solution usually consists either of 
spawning independent computations or parallelizing outer loops, both of which 
can be made very easy by the language implementor.

 to eventually get a speedup bounded by bandwidth,

For some applications under certain circumstances, yes.

 and starting from a point lower than the original single thread program.

Yes.

 There are applications for that (ray tracing is one), but this is not the
 kind of needs most people have. 

Not the kind of needs the remaining OCaml programmers have, perhaps. Outside 
the OCaml world, a lot of people are now programming for multicores.

 By the way, I was discussing with numerical computation people working
 on BLAS the other day, and their answer was clear: if you need high 
 performance, better use a grid than SMP, since bandwidth is 
 paramount. 

That is a false dichotomy. Grids are inevitably composed of multicores so you 
will still lose out if you fail to leverage SMP when programming for a grid.

 ...And you have to write in C or FORTRAN (or asm), because the timing of
 instructions matter. 

I have written linear algebra code in F# that outperforms Intel's vendor tuned 
Fortran (the MKL) by a substantial margin on Intel hardware. Moreover, their 
code only works on certain types whereas mine is generic.

OCaml is an excellent language for this kind of work but it requires an 
implementation with a performance profile that is very different from 
OCaml's.

 The funniest part was that those people were working on integer
 computations, but had to stick to floating point, because timing on integers
 is unpredictable, making synchronization harder.   

Interesting.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread kcheung
 I will add that we did not make this experiment to beat F# or python's
 hashtables, so I will not comment on that here. The point about
 performance is that it should be *predictable*.

Perhaps an off-topic and naive question:
What does it take to beat F# and still
have predictable performance?

In any case, OC4MC is very encouraging.
Congrats to the team!




Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Jon Harrop
On Saturday 26 September 2009 01:45:50 kche...@math.carleton.ca wrote:
 Perhaps an off-topic and naive question: What does it take to beat F# and
 still have predictable performance?

Provided you're talking about today's machines and don't care about pause 
times, HLVM with a parallel GC (not unlike the oc4mc one) and a task library 
would beat F# and still have predictable performance.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread rixed
  Wow! 2.6x faster on 2 cores is good. ;-)
 
 Isn't that impossible?  Or is the multicore GC better than the single
 threaded one?  (Sorry if this is a stupid or obvious question)

There are so many factors that makes the running time unpredictable that
nothing is surprising any more. Haven't you read this paper [1] about the
length of an environment variable causing a program to be 10% faster or
slower ? :)

[1]: http://www-plan.cs.colorado.edu/diwan/asplos09.pdf



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread kcheung
 On Thursday 24 September 2009 01:01:58 you wrote:

 No problem. I'll be happy to get anything working!

 Following your advice, it seems to work perfectly now:

I'm not too familiar with concurrency in ocaml.
How does OC4MC compare with JoCaml?


 $ ./matmul.th 500 1
 Temp de calcul: utime 2.324145, stime 0.020001, rtime 2.325608
 $ ./matmul.th 500 2
 Temp de calcul: utime 1.780111, stime 0.00, rtime 0.890797
 $ ./matmul.th 500 3
 Temp de calcul: utime 1.784111, stime 0.004000, rtime 0.608895
 $ ./matmul.th 500 4
 Temp de calcul: utime 1.764110, stime 0.004000, rtime 0.451214
 $ ./matmul.th 500 5
 Temp de calcul: utime 1.768111, stime 0.00, rtime 0.393285
 $ ./matmul.th 500 6
 Temp de calcul: utime 1.924120, stime 0.004001, rtime 0.333215
 $ ./matmul.th 500 7
 Temp de calcul: utime 1.788112, stime 0.00, rtime 0.302328
 $ ./matmul.th 500 8
 Temp de calcul: utime 1.992124, stime 0.00, rtime 0.290383

 Wow! 2.6x faster on 2 cores is good. ;-)

 That's a really fantastic piece of work. I'll do my best to study it and
 write
 literature about it. May I ask, can you give a rough overview of the
 design?
 For example, is there a separate nursery per thread so each thread can
 allocate a certain amount before incurring a global pause? Do you have any
 ideas for libraries built on top of this, such as a task parallel library
 using work-stealing deques?

 Thanks very much!!!

 --
 Dr Jon Harrop, Flying Frog Consultancy Ltd.
 http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Florian Hars
Richard Jones schrieb:
 On Thu, Sep 24, 2009 at 02:47:17AM +0100, Jon Harrop wrote:
 Wow! 2.6x faster on 2 cores is good. ;-)
 
 Isn't that impossible?  Or is the multicore GC better than the single
 threaded one?  (Sorry if this is a stupid or obvious question)

It might just happen that the size of the working set and memory
access pattern of the application is just right so that you get a
better interleaving of cache misses and thread execution if you
run more than two threads on two cores. Hyperthreading might muddle
things further.

- Florian



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 10:49:43 Richard Jones wrote:
 On Thu, Sep 24, 2009 at 02:47:17AM +0100, Jon Harrop wrote:
  Wow! 2.6x faster on 2 cores is good. ;-)

 Isn't that impossible?  Or is the multicore GC better than the single
 threaded one?  (Sorry if this is a stupid or obvious question)

Superlinear scaling is entirely possible because more cores can mean more 
cache in play. However, I have only seen superlinear scaling on AMD hardware 
and not Intel hardware.
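
For reference, the speedup and parallel efficiency behind the "2.6x on 2
cores" figure can be computed directly from the rtime values of the matmul
run quoted elsewhere in this thread. The function names below are
illustrative.

```ocaml
(* Wall-clock times (rtime, in seconds) from the matmul run quoted in
   this thread, indexed by thread count. *)
let rtimes =
  [ 1, 2.325608; 2, 0.890797; 3, 0.608895; 4, 0.451214;
    5, 0.393285; 6, 0.333215; 7, 0.302328; 8, 0.290383 ]

(* Speedup relative to the one-thread run. *)
let speedup t1 tn = t1 /. tn

(* Speedup divided by the number of threads; above 1.0 is superlinear. *)
let efficiency t1 tn n = speedup t1 tn /. float_of_int n

let () =
  let t1 = List.assoc 1 rtimes in
  List.iter
    (fun (n, tn) ->
       Printf.printf "%d threads: speedup %.2fx, efficiency %.2f%s\n"
         n (speedup t1 tn) (efficiency t1 tn n)
         (if speedup t1 tn > float_of_int n then " (superlinear)" else ""))
    rtimes
```

The two-thread row comes out at roughly 2.6x, matching the figure above. Note
that the one-thread run also reports a higher utime than the others, so the
choice of baseline matters when reading these ratios.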

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 11:00:57 kche...@math.carleton.ca wrote:
  On Thursday 24 September 2009 01:01:58 you wrote:
 
  No problem. I'll be happy to get anything working!
 
  Following your advice, it seems to work perfectly now:

 I'm not too familiar with concurrency in ocaml.
 How does OC4MC compare with JoCaml?

JoCaml is all about concurrency: minimizing latency. Oc4mc is all about 
parallelism: maximizing throughput.

Until now, OCaml sucked at parallelism. You can sometimes obtain some 
parallelism by forking threads but it is asymptotically slower than using 
shared memory. Consequently, oc4mc is a hugely-important development in the 
OCaml world because it means that OCaml programmers can write OCaml programs 
that use multicore machines efficiently for the first time.

The next steps are to get oc4mc into the apt repositories and build some 
libraries that make parallelism easier (like Microsoft's Task Parallel 
Library).

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Rakotomandimby Mihamina

09/24/2009 02:52 PM, Jon Harrop:
 The next steps are to get oc4mc into the apt repositories

Amen! ;-)

--
  Architecte Informatique chez Blueline/Gulfsat:
   Administration Systeme, Recherche  Developpement
   +261 34 29 155 34



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread rixed
 Until now, OCaml sucked at parallelism. (...) OCaml programmers
 can write OCaml programs that use multicore machines efficiently
 for the first time.

Subtle and strongly argued, as expected.



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop j...@ffconsultancy.com wrote:
 Following your advice, it seems to work perfectly now:

:-)

 Wow! 2.6x faster on 2 cores is good. ;-)

your machine is more generous than ours (which is Intel, not AMD) :-)

 That's a really fantastic piece of work. I'll do my best to study it and write
 literature about it. May I ask, can you give a rough overview of the design?
 For example, is there a separate nursery per thread so each thread can
 allocate a certain amount before incurring a global pause? Do you have any
 ideas for libraries built on top of this, such as a task parallel library
 using work-stealing deques?

A few words on the GC's design (which uses the stop-and-copy algorithm
several times):

Heaps:
- a set of pages gives threads the possibility to allocate memory
without interfering with other threads, so that there is no mutex
locking on local memory allocation. Each thread is born with an
empty page; when it is full, the thread takes another one.
- a big heap is shared between all threads; a mutex protects it to
prevent parallel memory allocation into it.

Collection:
- when there are no pages left, a collection stops the world and
copies living values (of the pages) to the shared heap
- when the shared heap is full, a collection stops the world and
copies all living values (pages + shared heap) to a new shared heap
(which can be grown if need be)

Special operations:
- if there is a blocking operation (e.g. mutex lock or I/O operation),
the mechanism is roughly the same as in the original INRIA OCaml: the
thread tells the GC that there is no need to stop it when stopping the world.
- if there is a thread with no allocation and no blocking operation,
the behaviour is the same as in INRIA OCaml.


The number of pages, the size of a page, and the size of the shared
heap can be changed before running a program by setting some
environment variables (cf. last lines README file included in the
distribution package).
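
To make the scheme easier to follow, here is a toy, single-threaded model of
it. This is not OC4MC code: it only counts words and collections, all sizes
and the survival rate are made-up parameters, and the names heap, thread,
collect and alloc are hypothetical.

```ocaml
(* Toy model of a two-level stop-and-copy heap. All sizes are made up. *)
type heap = {
  page_words : int;            (* capacity of one thread-local page *)
  num_pages : int;             (* pages in the pool *)
  mutable free_pages : int;    (* pages not yet handed to a thread *)
  mutable shared_used : int;   (* words live in the shared heap *)
  mutable shared_cap : int;    (* capacity of the shared heap *)
  mutable page_gcs : int;      (* collections of the pages only *)
  mutable full_gcs : int;      (* collections of pages + shared heap *)
}

type thread = { mutable room : int }   (* words left in the current page *)

(* Stop the world and copy the surviving fraction of the page data to
   the shared heap. If it does not fit, also collect the shared heap
   and grow it if the survivors still do not fit. *)
let collect h threads ~survive =
  let used_pages = h.num_pages - h.free_pages in
  let live =
    int_of_float (survive *. float_of_int (used_pages * h.page_words)) in
  if h.shared_used + live > h.shared_cap then begin
    h.full_gcs <- h.full_gcs + 1;
    h.shared_used <- int_of_float (survive *. float_of_int h.shared_used);
    if h.shared_used + live > h.shared_cap then
      h.shared_cap <- 2 * (h.shared_used + live)
  end else
    h.page_gcs <- h.page_gcs + 1;
  h.shared_used <- h.shared_used + live;
  h.free_pages <- h.num_pages;
  List.iter (fun t -> t.room <- 0) threads

(* Allocate [words] (at most one page) for thread [t]: bump inside the
   current page without locking, take a fresh page when it is full, and
   collect when no pages are left. *)
let alloc h threads t words ~survive =
  if t.room < words then begin
    if h.free_pages = 0 then collect h threads ~survive;
    h.free_pages <- h.free_pages - 1;
    t.room <- h.page_words
  end;
  t.room <- t.room - words

let () =
  let h = { page_words = 1024; num_pages = 8; free_pages = 8;
            shared_used = 0; shared_cap = 16_384;
            page_gcs = 0; full_gcs = 0 } in
  let threads = List.init 4 (fun _ -> { room = 0 }) in
  for _i = 1 to 100_000 do
    List.iter (fun t -> alloc h threads t 4 ~survive:0.1) threads
  done;
  Printf.printf "page collections: %d, full collections: %d, shared: %d/%d\n"
    h.page_gcs h.full_gcs h.shared_used h.shared_cap
```

Changing num_pages, page_words and shared_cap in the model plays the role of
the tunables mentioned above; the names of the real environment variables are
given in the README, not here.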



-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Stefano Zacchiroli
On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
 The next steps are to get oc4mc into the apt repositories and build

Uhm, I'm curious: how do you plan to achieve that?
AFAICT the patch is only against 3.10.2, and in Debian we're at 3.11.1.

Thus far, we have never had support for more than one version of OCaml
at a time. If it were worth it we could surely consider that, but the current
uncertainty about OC4MC future doesn't seem enough to justify that.

So, the real question is: is OC4MC going to be ported to mainline OCaml
and supported in the future or not? If the answer is no, I don't see it
arriving in Debian anytime soon.

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...| ..: | Je dis tu à tous ceux que j'aime




Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 13:39:40 Stefano Zacchiroli wrote:
 On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
  The next steps are to get oc4mc into the apt repositories and build

 Uhm, I'm curious: how do you plan to achieve that?

Good question. I have no idea, of course. :-)

 AFAICT the patch is only against 3.10.2, and in Debian we're at 3.11.1.

Philippe, is it feasible to bring your patches up to date wrt OCaml?

 Thus far, we have never had support for more than one version of OCaml
 at a time. If it were worth we can surely consider that, but the current
 uncertainty about OC4MC future doesn't seem enough to justify that.

Fair enough. I think this is the single most important development OCaml has 
seen since its inception so I would personally drop OCaml in favor of oc4mc 
even if it meant reverting to 3.10.2.

There is also the issue that this is x64 only...

 So, the real question is: is OC4MC going to be ported to mainline OCaml
 and support in the future or not? If the answer is no, I don't see it
 arriving in Debian anytime soon.

Yes, that would be ideal. Pretty please, Xavier? ;-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 13:14:35 Philippe Wang wrote:
 On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop j...@ffconsultancy.com wrote:
  Following your advice, it seems to work perfectly now:
 
 :-)
 :
  Wow! 2.6x faster on 2 cores is good. ;-)

 your machine is more generous than ours (which is Intel, not AMD) :-)

Yes. I don't know why AMD are so much better at this but I have seen it 
several times now.

  That's a really fantastic piece of work. I'll do my best to study it and
  write literature about it. May I ask, can you give a rough overview of
  the design? For example, is there a separate nursery per thread so each
  thread can allocate a certain amount before incurring a global pause? Do
  you have any ideas for libraries built on top of this, such as a task
  parallel library using work-stealing deques?

 A few words on the GC's design (which uses a stopcopy algorithm several
 times):

 Heaps:
 - a set of pages lets threads allocate memory without interfering with
 other threads, so that no mutex is locked for local memory allocation.
 Each thread is born with an empty page; when it is full, the thread
 takes another one.
 - a big heap is shared between all threads; a mutex over it prevents
 parallel memory allocation into this one.

 Collection:
 - when there are no pages left, a collection stops the world and
 copies living values (of the pages) to the shared heap
 - when the shared heap is full, a collection stops the world and
 copies all living values (pages + shared heap) to a new shared heap
 (which can grow if need be)

Ok, so this is stopcopy GC with per-thread nurseries/gen0.
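
As a rough illustration of the heap layout described in the quoted text,
here is a minimal single-threaded OCaml model of the allocation path: a
thread bump-allocates in its own page without locking, takes a fresh page
from a shared pool when the current one is full, and signals that a
collection is needed when the pool is empty. This is only a sketch under
stated assumptions, not OC4MC's code: the names page, pool, thread_state,
take_page, alloc and Need_collection are invented for the example, sizes
are counted in words, and requests larger than a page are simply rejected
because the thread does not say how OC4MC handles them.

```ocaml
(* Toy model of per-thread page allocation; all names are hypothetical. *)

type page = { size : int; mutable used : int }

type pool = { mutable free_pages : page list }   (* shared set of pages *)

type thread_state = { mutable current : page }   (* one per thread *)

exception Need_collection   (* stands in for "stop the world and copy" *)

(* Taking a page touches shared state, so a real runtime would have to
   synchronise here. This model is single-threaded and does not. *)
let take_page pool =
  match pool.free_pages with
  | [] -> raise Need_collection
  | p :: rest -> pool.free_pages <- rest; p

(* Returns the offset of the new block inside the thread's current page.
   Fast path: bump the thread's own page, no lock involved.
   Slow path: the page is full, so fetch another one and retry. *)
let rec alloc pool th words =
  let p = th.current in
  if words > p.size then invalid_arg "alloc: larger than a page"
  else if p.used + words <= p.size then begin
    let offset = p.used in
    p.used <- p.used + words;
    offset
  end else begin
    th.current <- take_page pool;
    alloc pool th words
  end

let () =
  let mk () = { size = 4; used = 0 } in
  let pool = { free_pages = [ mk (); mk () ] } in
  let th = { current = take_page pool } in
  List.iter
    (fun words ->
      match alloc pool th words with
      | offset -> Printf.printf "allocated %d words at offset %d\n" words offset
      | exception Need_collection ->
        Printf.printf "no page left for %d words: collection needed\n" words)
    [ 3; 2; 2; 1 ]
```

The point of the design is that the common case (the first branch of
alloc) touches only thread-local state, so it needs no mutex.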

Are values such as float arrays copied in their entirety or are they allocated 
outside the shared heap and only a pointer to them is copied?

Is the copy operation parallelized?

Is there a write barrier but no read barrier? If so, what exactly does the 
write barrier do?

 Special operations :
 - if there is a blocking operation (e.g. mutex lock or I/O operation),
 the mechanism is roughly the same as original INRIA OCaml's : it tells
 the GC that there is no need to stop it when stopping the world.

Can users mark external calls in their bindings as blocking so the GC will 
treat them appropriately?

 - if there is a thread with no allocation and no blocking operation,
 the behaviour is the same as INRIA OCaml.

 The number of pages, the size of a page, and the size of the shared
 heap can be changed before running a program by setting some
 environment variables (cf. the last lines of the README file included
 in the distribution package).

Great!

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Rakotomandimby Mihamina

09/24/2009 03:39 PM, Stefano Zacchiroli:

So, the real question is: is OC4MC going to be ported to mainline OCaml
and support in the future or not?


I don't write many programs that would really require multiple cores.
But I think this is such a good feature that it should be included in
the main distribution...

--
  IT Architect at Blueline/Gulfsat:
   System Administration, Research & Development
   +261 34 29 155 34



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Mike Lin
On Thu, Sep 24, 2009 at 8:39 AM, Stefano Zacchiroli z...@debian.org wrote:


 So, the real question is: is OC4MC going to be ported to mainline OCaml
 and support in the future or not?


Recalling how mainline had us waiting like 5 years for native exception
backtraces, and then another like 3 years for the ability to access the
backtrace within the program, I most certainly hope NOT :)
(Nothing personal to INRIA, I work on academic projects and well know how
these things go, it's just not the most awesome maintenance schedule for
one's main PL)


Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Dario Teixeira
Hi,

Cheers for the work you guys put into this project!  And I'd like to join
the crowd that has questions, if I may:

a) If I understand correctly, part of the prerequisites for implementing
   the new GC was cleaning up the excessive use of imperative constructs
   in the compiler's tree.  Will the new tree also be more amenable to the
   implementation of new language constructs such as GADTs?

b) Could you quantify the performance penalty (if any) of using the new GC
   in a single-thread context?  And should this penalty be significant, are
   there provisions for a compile-time choice of which GC to use?

c) Is there an understanding between you and the folks at INRIA concerning
   the eventual merging of this code into the mainline tree?

Thanks a lot for your time!
Best regards,
Dario Teixeira







Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:40 PM, Rakotomandimby Mihamina
miham...@gulfsat.mg wrote:
 09/24/2009 03:39 PM, Stefano Zacchiroli:

 So, the real question is: is OC4MC going to be ported to mainline OCaml
 and support in the future or not?

 I dont write so much programs that would really require multiple cores.
 But I think this is such a good feature that should be inclided in
 the main distribution...

Thing is that having a runtime library that supports parallel threads
costs more than having a runtime library that doesn't.

Programs that take advantage of multicore architectures are not easy
to write, not easy to maintain, not easy to debug, ...
So "it's a great feature, so it should get into mainstream" is not a
good enough reason for INRIA's team. It's probably up to the community
to find a great way of taking advantage of multicore architectures.

One must be aware that
- parallel threads vs non-parallel threads: if a program is well
suited to parallel computing on multicore CPUs, then a runtime library
that is not parallel-capable puts the performance bottleneck at the
CPU. Allowing parallel threads then means *moving* this bottleneck
(moving, not removing): indeed, it's very likely that the bottleneck
will then be memory (RAM) bandwidth. See, if your memory is 1000 MHz,
having 8 cores means 125MHz/core, which becomes ridiculous. Even if it
were 2400MHz it would mean only 300MHz/core: imagine a 300MHz memory
bandwidth for a 3GHz core! So it's *very* important to keep that in
mind.
- for programming languages that are from the very beginning quite a
bit slower than INRIA OCaml, it's much easier to gain performance
because they come from far, sometimes from very very far.
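
The per-core figures above follow from dividing the memory clock evenly
among the cores. The small OCaml sketch below just reproduces that
arithmetic; the function name per_core_mhz is made up for the example,
and the even split is the message's own back-of-the-envelope model (a
clock frequency is not a bandwidth, and real memory systems do not share
it this simply).

```ocaml
(* Naive model from the message: the memory clock is split evenly
   across cores. per_core_mhz is a hypothetical helper name. *)
let per_core_mhz ~memory_mhz ~cores = memory_mhz /. float_of_int cores

let () =
  List.iter
    (fun mhz ->
      Printf.printf "%.0f MHz memory shared by 8 cores: %.0f MHz per core\n"
        mhz (per_core_mhz ~memory_mhz:mhz ~cores:8))
    [ 1000.; 2400. ]
```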

Well, from a quite subjective personal point of view, of course it
would be really great to give parallel threads capability to
mainstream INRIA OCaml, because it would mean having found a (great)
acceptable solution.

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Stefano Zacchiroli
On Thu, Sep 24, 2009 at 04:40:53PM +0300, Rakotomandimby Mihamina wrote:
 I dont write so much programs that would really require multiple cores.
 But I think this is such a good feature that should be inclided in
 the main distribution...

I think you are missing what that would mean in terms of effort for
maintaining the corresponding packages. De facto, it would mean
duplicating all source packages of the libraries you want to be able to
build against ocaml 3.10.2 + OC4MC.

You want PCRE? then you need two PCRE packages (3.11 and 3.10.2 4MC)
You want ocamlnet? then you need two ocamlnet packages

You got the picture :-)

Additionally, it would also mean handling in-house any security problems
arising in an old version of the compiler (or even in 3rd party
libraries, when you are forced to fork them due to source-level
incompatibilities between versions) without any upstream support.

Not fun.

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...| ..: | Je dis tu à tous ceux que j'aime



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Stefano Zacchiroli
On Thu, Sep 24, 2009 at 09:55:53AM -0400, Mike Lin wrote:
 On Thu, Sep 24, 2009 at 8:39 AM, Stefano Zacchiroli z...@debian.org wrote:
  So, the real question is: is OC4MC going to be ported to mainline OCaml
  and support in the future or not?
 
 Recalling how mainline had us waiting like 5 years for native exception
 backtraces, and then another like 3 years for the ability to access the
 backtrace within the program, I most certainly hope NOT :)
 (Nothing personal to INRIA, I work on academic projects and well know how
 these things go, it's just not the most awesome maintenance schedule for
 one's main PL)

But the result you are anticipating will actually mean low acceptance of
OC4MC among common users, possibly close to 0. All mainstream ways
of distributing OCaml (both .rpm and .deb distros, GODI, ...) regularly
switch to the most recent versions of the compiler.

The only people able to stay on 3.10.2 to benefit from OC4MC will be
companies that have fixed their development on a specific version and do
not plan to change.

Or am I missing something?
Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...| ..: | Je dis tu à tous ceux que j'aime



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:11 PM, Jon Harrop j...@ffconsultancy.com wrote:
 Are values such as float arrays copied in their entirety or are they allocated
 outside the shared heap and only a pointer to them is copied?

They should be in a heap (page or shared). We don't allocate many
things outside the heaps.

 Is the copy operation parallelized?

Nope. When the world is stopped for the collection, everything is done
sequentially until the world is resumed.
I don't think it's relevant to parallelize the copy operation (hell to
implement & debug, then I don't think that performance would be very
interesting because we would probably need a write mutex on the
destination heap)

 Is there a write barrier but no read barrier? If so, what exactly does the
 write barrier do?

There is a lock when a thread is created because we need to update the
list of existing threads and we have to give it a page.
Then, each time a thread wants memory, it checks if the world needs to
be stopped. If the world needs to be stopped, it means that there is a
necessary collection waiting for the world to be stopped.
There is a lock if a thread needs to allocate memory in the shared heap
so that two threads don't end up using the same space for different
things.
If two threads want to write in the same block, it's up to the
programmer to prevent (or allow) such a thing with a mutex (or
whatever other mechanism).

 Special operations :
 - if there is a blocking operation (e.g. mutex lock or I/O operation),
 the mechanism is roughly the same as original INRIA OCaml's : it tells
 the GC that there is no need to stop it when stopping the world.

 Can users mark external calls in their bindings as blocking so the GC will
 treat them appropriately?

Yes, it's the same as INRIA OCaml : enter_blocking_operation /
leave_blocking_operation functions.
It's mandatory that in the section between entrance and exit, the
thread is not accessing anything allocated in a Caml heap.
If there is need to write some value returned by the blocking
operation, it should be written in a C side value (on C stack or with
C malloc) and put back to Caml heap after exit (and then C free if C
malloced).


-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
I've seen a question about 3.11 and I think I didn't answer, so I'm
answering here :

We have tried to make OC4MC work with OCaml 3.11 (I don't remember the
subsubversion number). Currently, it does not work properly (it's
still too easy to write a program that crashes or deadlocks).

Cheers,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Dario Teixeira
Hi,

 Very few programs that are not written with multicore in mind would
 not be penalized. I mean our GC is much much dumber than INRIA OCaml's
 one. Our goal was to show it was possible to have good performance with
 multicores for OCaml.  Maybe someday we'll find some time to optimize
 the GC, but it's likely not very soon.

Thanks for the clarification.  While not detracting from your work (which
I think is very interesting and valuable), for me single-thread performance
is still paramount.  I am working in a domain (doing backend web application
programming using the Ocsigen framework) where multi-threaded parallelism
is a bit silly, since you can get much better performance and design
simplicity by running multiple independent servers (one for each core).
Each server runs multiple concurrent Lwt-threads (a cooperative form of
green threads) to make sure the CPU is always busy and not waiting on I/O.

This solution has the advantage of requiring no process context-switching
within each server, while still maximising CPU utilisation.  And I suspect
there are many other fields where a similar approach could be used
advantageously instead of thread-based parallelism.


 I guess that if INRIA decides to implement parallel threads capability,
 they will have to make the runtime library ready (clean up some global
 variables, tidy the code like remove compatibility.h and such stuff)
 before thinking about the GC. This could take some time, because it's
 not good to break everything at once. Then, if they have finished this
 step, I would be confident that they could integrate an awesome GC.
 But that's only my personal opinion...

Again, it's a question of whether the benefits justify the cost.
Personally, I'm in the camp that would rather see improvements to
the type system (like native GADTs!)...

Anyway, keep us apprised of your work.  It's very welcome.

Best regards,
Dario Teixeira



  



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 13:11:24 ri...@happyleptic.org wrote:
  Until now, OCaml sucked at parallelism. (...) OCaml programmers
  can write OCaml programs that use multicore machines efficiently
  for the first time.

 Subtle and strongly argumented, as expected.

I forgot to mention that multithreaded programming is vastly easier than 
multi-process programming in the context of parallelism because you get 
automatic memory management and O(1) communication.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Pascal Cuoq

On Sep 24, 2009, at 5:47 PM, Philippe Wang wrote:


Is the copy operation parallelized?


Nope. When the world is stopped for the collection, everything is done
sequentially until the world is resumed.
I don't think it's relevant to parallelize the copy operation (hell to
implement & debug, then I don't think that performance would be very
interesting because we would probably need a write mutex on the
destination heap)


Well, you could start copying to the bottom of the next heap with
one thread going up and to the top of it with another going down.
Assume optimistically that the two threads will not reach the same
cacheline at the end of the copies, and you don't need any
synchronisation at all between them, except joining at the end.

After checking, if they have reached the same cacheline,
you need to reallocate the destination heap anyway.

You still get a single unfragmented free block as a result.

Even better: stop the world just before there remains less than one
cacheline of free space and you don't need to check if the two threads
have met. You still need to reallocate the destination heap sometimes
though.
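
To make the two-ended scheme concrete, here is a small OCaml sketch that
models it sequentially: one copier fills the destination from index 0
upwards, the other from the end downwards, neither looks at the other's
cursor, and the only check happens once both are done. Everything here
is an assumption made for illustration: the function names are invented,
blocks are plain int arrays, the two loops run one after the other rather
than on separate threads, and the check is done at word granularity
rather than per cacheline as suggested above.

```ocaml
(* Sequential toy model of a two-ended copy; hypothetical names throughout. *)

(* Copy [blocks] upwards from index 0; returns the first free index. *)
let copy_up dst blocks =
  List.fold_left
    (fun pos b ->
      Array.blit b 0 dst pos (Array.length b);
      pos + Array.length b)
    0 blocks

(* Copy [blocks] downwards from the end; returns the lowest used index. *)
let copy_down dst blocks =
  List.fold_left
    (fun pos b ->
      let start = pos - Array.length b in
      Array.blit b 0 dst start (Array.length b);
      start)
    (Array.length dst) blocks

(* Returns [Some (dst, lo, hi)] where dst.(lo) .. dst.(hi - 1) is the single
   free block left in the middle, or [None] if the copies collided and the
   destination must be reallocated with a larger size. *)
let two_ended_copy ~heap_size from_bottom from_top =
  let total l = List.fold_left (fun n b -> n + Array.length b) 0 l in
  (* Guard that only keeps this model inside the array bounds. *)
  if total from_bottom > heap_size || total from_top > heap_size then None
  else begin
    let dst = Array.make heap_size 0 in
    let lo = copy_up dst from_bottom in
    let hi = copy_down dst from_top in
    (* The only cross-copier check: did the two cursors cross? *)
    if lo <= hi then Some (dst, lo, hi) else None
  end

let () =
  match two_ended_copy ~heap_size:8 [ [| 1; 2 |]; [| 3 |] ] [ [| 7; 8 |] ] with
  | Some (_, lo, hi) -> Printf.printf "free block: indices %d to %d\n" lo (hi - 1)
  | None -> print_endline "collision: reallocate the destination heap"
```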

Oh, and I meant to say, but everyone else was faster than me:
well done!

Pascal



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang



On Sep 24, 2009, at 18:02 GMT+02:00, Pascal Cuoq wrote:


On Sep 24, 2009, at 5:47 PM, Philippe Wang wrote:


Is the copy operation parallelized?


Nope. When the world is stopped for the collection, everything is done
sequentially until the world is resumed.
I don't think it's relevant to parallelize the copy operation (hell to
implement & debug, then I don't think that performance would be very
interesting because we would probably need a write mutex on the
destination heap)


Well, you could start copying to the bottom of the next heap with
one thread going up and to the top of it with another going down.
Assume optimistically that the two threads will not reach the same
cacheline at the end of the copies, and you don't need any
synchronisation at all between them, except joining at the end.

After checking, if they have reached the same cacheline,
you need to reallocate the destination heap anyway.

You still get a single unfragmented free block as a result.

Even better: stop the world just before there remains less than one
cacheline of free space and you don't need to check if the two threads
have met. You still need to reallocate the destination heap sometimes
though.


A concurrent copy means that there would be bad overhead on a single
core. It also means moving the bottleneck to memory bandwidth, as memory
copy operations are clearly quickly limited by this bandwidth, not by
the CPU. Hopefully this will no longer be true in a few years, but
hardware manufacturers don't seem to be excited by that; they seem to
prefer marketing the number of cores. Look at GPUs: they have very fast
graphics RAM, but they also have a huge number of processing units. I
don't really see the point in that (i.e. having a huge number of PUs)
anyway (except marketing).


Ok, back to GC stuff. A stopcopy algorithm needs to have a set of  
roots to make the copy of living values.
Each thread has its stack, so it has its subset of roots. Then what ?  
Parallelize the copy from each thread ? Ok we have to determine the  
best number of threads according to number of cores but more  
importantly according to memory bandwidth given per core. (what a  
nightmare!)
Then there are shared values (in the shared heap for instance, but  
what if there are lateral pointers due to mutable values?). (We are  
leaving the nightmare for hell! but some people have been there.)  
Copying a living value means that if later you encounter something
pointing to its old address, you have to know the new one. This means
writing at the old address. I don't see how we could make something
very interesting *today* out of a concurrent stopcopy algorithm. I
believe (but I'm *not* a GC expert at all) that concurrent GCs are not
based on a stopcopy algorithm but rather on some
mark & {do-some-stuff-such-as-sweep}.
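
The remark about "writing at the old address" refers to what is usually
called a forwarding pointer. The OCaml sketch below shows the idea on a
toy heap where objects live in an array and point to each other by
index. It is an assumption-laden illustration, not OC4MC's collector: the
obj type and the collect function are invented for the example, the
traversal is a simple recursion rather than the scan loop a real copying
collector would use, and nothing here is concurrent.

```ocaml
(* Toy stop-and-copy over an index-based heap; hypothetical model. *)

type obj = {
  fields : int array;      (* indices of the objects this one points to *)
  mutable forward : int;   (* new index once copied, -1 until then *)
}

(* Copy everything reachable from [roots] out of [from_space].
   Returns the new space and the relocated roots. *)
let collect (from_space : obj array) (roots : int list) =
  let to_space = ref [] and next = ref 0 in
  (* [relocate i] returns the new index of object [i], copying it the
     first time it is reached. The new index is recorded in the OLD
     object, so any later pointer to it can be redirected. *)
  let rec relocate i =
    let o = from_space.(i) in
    if o.forward >= 0 then o.forward
    else begin
      let j = !next in
      incr next;
      o.forward <- j;   (* set before recursing, so cycles terminate *)
      let copy = { fields = Array.copy o.fields; forward = -1 } in
      to_space := copy :: !to_space;
      Array.iteri (fun k child -> copy.fields.(k) <- relocate child) o.fields;
      j
    end
  in
  let new_roots = List.map relocate roots in
  (Array.of_list (List.rev !to_space), new_roots)

let () =
  let mk fields = { fields; forward = -1 } in
  (* Object 1 is unreachable; objects 0 and 2 form a cycle. *)
  let from_space = [| mk [| 2 |]; mk [| 0 |]; mk [| 0; 2 |] |] in
  let to_space, _ = collect from_space [ 0 ] in
  Printf.printf "%d of %d objects survived\n"
    (Array.length to_space) (Array.length from_space)
```

With several copiers running at once, two of them could reach the same
old object at the same time, so the read and write of the forward field
would need some form of synchronisation; that is the difficulty the
message alludes to.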




Oh, and I meant to say, but everyone else was faster than me:
well done!


Thank you, and thanks to everyone else who appreciates this work. :-)

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Richard Jones
On Thu, Sep 24, 2009 at 02:09:56PM +0100, Jon Harrop wrote:
 Fair enough. I think this is the single most important development OCaml has 
 seen since its inception so I would personally drop OCaml in favor of oc4mc 
 even if it meant reverting to 3.10.2.

I think 'personally' is the key word there.  You forget that people
are quite happily programming in very slow languages like Perl,
Python, Ruby and Visual Basic, and those people vastly outnumber the
ones using F#, Haskell, OCaml, SML etc.  (They don't even have static
safety, dammit!).

Rich.

-- 
Richard Jones
Red Hat



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang



On Sep 24, 2009, at 18:49 GMT+02:00, Richard Jones wrote:


On Thu, Sep 24, 2009 at 02:09:56PM +0100, Jon Harrop wrote:
Fair enough. I think this is the single most important development OCaml
has seen since its inception so I would personally drop OCaml in favor of
oc4mc even if it meant reverting to 3.10.2.


I think 'personally' is the key word there.  You forget that people
are quite happily programming in very slow languages like Perl,
Python, Ruby and Visual Basic, and those people vastly outnumber the
ones using F#, Haskell, OCaml, SML etc.  (They don't even have static
safety, dammit!).


Should we tell them that burning CPU cycles for nothing (a side effect
of using a slow language) has a bad effect on global warming? Could it
be a wake-up call? :-p


half-kidding,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread David Teller
Well, let me join the chorus and congratulate.

I'll need to test this as soon as possible.

Cheers,
 David

On Tue, 2009-09-22 at 23:30 +0200, Philippe Wang wrote:
 This is some additional noise about OCaml for Multicore  
 architectures (or Ok with parallel threads GC).
 
 
 Dear list,
 
 We have implemented an alternative runtime library for OCaml, one that  
 allows threads to compute in parallel on different cores of now  
 widespread CPUs.
 
 This project will be presented at IFL 2009
 (http://blogs.shu.edu/projects/IFL2009/).
 
 A testing version is available online at
 http://www.algo-prog.info/ocmc/
 It works with OCaml 3.10.2 for Linux x86-64bit; we haven't met any
 bugs with the latest build (it doesn't *unexpectedly* crash, not yet).
 
 Hope you'll enjoy,
 
 --
 Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou,  
 Philippe Wang
 
 



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 17:49:33 Richard Jones wrote:
 On Thu, Sep 24, 2009 at 02:09:56PM +0100, Jon Harrop wrote:
  Fair enough. I think this is the single most important development OCaml
  has seen since its inception so I would personally drop OCaml in favor of
  oc4mc even if it meant reverting to 3.10.2.

 I think 'personally' is the key word there. You forget that people 
 are quite happily programming in very slow languages like Perl,
 Python, Ruby and Visual Basic,

Visual Basic has been a *lot* faster than OCaml for several years now, not 
least because it makes efficient multicore programming easy. Even Python is 
beating OCaml on benchmarks now:

http://flyingfrogblog.blogspot.com/2009/04/f-vs-ocaml-vs-haskell-hash-table.html

Even if that were not the case, the idea of cherry picking interpreted 
scripting languages to compete with because OCaml has fallen so far behind 
mainstream languages (let alone modern languages) is embarrassing. What's 
next, OCaml vs Bash for your high performance needs?

 and those people vastly outnumber the ones using F#, Haskell, OCaml, SML
 etc.  (They don't even have static safety, dammit!).

If you want to draw aspirations based upon popularity, look at the most 
popular languages: Java and C#. They are far more popular than OCaml for many 
reasons but parallel threads to make efficient multicore programming easy is 
a big one.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread rixed
 Visual Basic has been a *lot* faster than OCaml for several years now, not
 (...) Even Python (...) Java and C#. They are far more popular than OCaml for
 many reasons but parallel threads to make efficient multicore programming
 easy is a big one.

In general you sound like a reasonable and knowledgeable person, yet in
some messages you seem to completely lose contact with reality.

Either you have a small kid at home who steals your identity when you
are away, or, considering that it always happens when the topic gets
close to concurrency or the dotnet platform, you might be suffering
in some way.



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jon Harrop
On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:
 Very few programs that are not written with multicore in mind would
 not be penalized.
 I mean our GC is much much dumber than INRIA OCaml's one.
 Our goal was to show it was possible to have good performance with
 multicores for OCaml.
 Maybe someday we'll find some time to optimize the GC, but it's likely
 not very soon.

Just to quantify this with a data point: the fastest (serial) version of my 
ray tracer benchmark is 10x slower with the new GC. However, this is 
anomalous with respect to complexity and the relative performance is much 
better for simpler renderings. For example, the new GC is only 1.7x slower 
with n=6 instead of n=9.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang


On Sep 25, 2009, at 1:28 AM, Jon Harrop wrote:


On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:

Very few programs that are not written with multicore in mind would
not be penalized.
I mean our GC is much much dumber than INRIA OCaml's one.
Our goal was to show it was possible to have good performance with
multicores for OCaml.
Maybe someday we'll find some time to optimize the GC, but it's likely
not very soon.


Just to quantify this with a data point: the fastest (serial) version of my
ray tracer benchmark is 10x slower with the new GC. However, this is
anomalous with respect to complexity and the relative performance is much
better for simpler renderings. For example, the new GC is only 1.7x slower
with n=6 instead of n=9.



Can you tell what data structures (and their sizes if possible) you  
are using?

Thanks for your feedback.

--
Philippe Wang
  philippe.w...@lip6.fr
  http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Jacques Garrigue
First, like everybody else, I'd like very much to try this out.
Is there any chance it could compile on Snow Leopard? :-)
(I suppose it's nearly impossible, but I'll still ask...)

From: Jon Harrop j...@ffconsultancy.com
 Visual Basic has been a *lot* faster than OCaml for several years now, not 
 least because it makes efficient multicore programming easy. Even Python is 
 beating OCaml on benchmarks now:
 
 http://flyingfrogblog.blogspot.com/2009/04/f-vs-ocaml-vs-haskell-hash-table.html

IIRC, currently Visual Basic is just a skin for C#. You have to
write all the types, so it's rather hard to call it Basic. And yes,
MS has invested a lot in the CLR, and that pays.

Your benchmark seems strange to me, as you are comparing apples with
oranges. Hashtables in Python are a basic feature of the language,
and they are of course implemented in C. In ocaml, they are
implemented in ocaml (except the hashing function, which has to be
polymorphic), using an array of association lists!
(Actually the pairs are flattened for better performance, but still)
What is impressive is that you don't need any special optimization to
get reasonably good performance. Actually the only tuning you need is
to start from a reasonable table size, which you didn't (never start
from 1, you will have to redo all the hashing every time the table
needs to be grown).
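
To illustrate the point about initial table size, here is a minimal sketch. It is not the code from the blog benchmark: the element count, the int/float key and value types, and the helper names fill and time are made up for illustration.

```ocaml
(* Sketch: the same insertions with a poor and a reasonable initial size.
   The element count and key/value types are arbitrary illustrative choices. *)
let fill initial_size n =
  let table = Hashtbl.create initial_size in
  for i = 1 to n do
    Hashtbl.replace table i (float_of_int i)
  done;
  table

(* Run f and return its result together with the CPU seconds it took. *)
let time f =
  let start = Sys.time () in
  let result = f () in
  (result, Sys.time () -. start)

let () =
  let n = 1_000_000 in
  (* Starting from 1 forces the table to grow (and rehash) many times. *)
  let small, t_small = time (fun () -> fill 1 n) in
  (* Starting near the final size avoids the repeated rehashing. *)
  let sized, t_sized = time (fun () -> fill n n) in
  Printf.printf "initial size 1: %d bindings in %.3fs\n"
    (Hashtbl.length small) t_small;
  Printf.printf "initial size %d: %d bindings in %.3fs\n"
    n (Hashtbl.length sized) t_sized
```

Both tables end up with the same contents; only the amount of resizing work differs, so any timing gap depends on the machine and the OCaml version.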

 Even if that were not the case, the idea of cherry picking interpreted 
 scripting languages to compete with because OCaml has fallen so far behind 
 mainstream languages (let alone modern languages) is embarrassing. What's 
 next, OCaml vs Bash for your high performance needs?

OCaml was never touted as an HPC language! The only claim I've seen is
that it intends to stay within 2x of C for most applications. (Which
is not so easy these days, gcc getting much faster.)

Actually, I believe that Philippe's point is rather different.
Making a functional language work well on multicores is difficult.
If I tell you that you just have to modify your program a bit to get a
near linear speedup, then it looks great. But in practice it rather means
completely rethinking your algorithm, to eventually get a speedup
bounded by bandwidth, and starting from a point lower than the original
single-threaded program. There are applications for that (ray tracing is
one), but this is not the kind of need most people have.
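
The "starting from a lower point" argument can be put in numbers. The sketch below is only arithmetic under stated assumptions: the 10x and 1.7x serial slowdowns are the ray tracer figures reported in this thread, the 8 cores and the ideal linear scaling are assumptions, and net_speedup is a made-up helper name.

```ocaml
(* Speed relative to the original single-threaded program:
   the parallel speedup divided by the serial slowdown of the new runtime. *)
let net_speedup ~serial_slowdown ~parallel_speedup =
  parallel_speedup /. serial_slowdown

let () =
  let cores = 8.0 in  (* assumption: ideal linear scaling on 8 cores *)
  List.iter
    (fun slowdown ->
       Printf.printf "serial slowdown %.1fx -> net %.2fx on %g cores\n"
         slowdown
         (net_speedup ~serial_slowdown:slowdown ~parallel_speedup:cores)
         cores)
    [10.0; 1.7]  (* slowdowns reported for the ray tracer at n=9 and n=6 *)
```

With a 10x serial slowdown, even perfect scaling on 8 cores ends up at 0.8x the original program, which is the situation described above.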

By the way, I was discussing with numerical computation people working
on BLAS the other day, and their answer was clear: if you need high
performance, better use a grid than SMP, since bandwidth is
paramount. And you have to write in C or FORTRAN (or asm), because the
timing of instructions matters. The funniest part was that those people
were working on integer computations, but had to stick to floating
point, because timing on integers is unpredictable, making
synchronization harder.

Cheers,

Jacques



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Goswin von Brederlow
Philippe Wang philippe.w...@lip6.fr writes:

 This is some additional noise about OCaml for Multicore
 architectures (or Ok with parallel threads GC).
 

 Dear list,

 We have implemented an alternative runtime library for OCaml, one that
 allows threads to compute in parallel on different cores of now
 widespread CPUs.

 This project will be presented at IFL 2009
 (http://blogs.shu.edu/projects/IFL2009/).

 A testing version is available online at
 http://www.algo-prog.info/ocmc/
 It works with OCaml 3.10.2 for Linux x86-64bit, we haven't met any
 bugs with the latest build (it doesn't *unexpectedly* crash, not yet).

 Hope you'll enjoy,

 --
 Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou,
 Philippe Wang

Has anyone tested this yet? Any success stories?

MfG
Goswin



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Jon Harrop
On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
 Has anyone tested this yet? Any success stories?

It's compiling. :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Jon Harrop
On Wednesday 23 September 2009 13:21:35 Jon Harrop wrote:
 On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
  Has anyone tested this yet? Any success stories?

 It's compiling. :-)

Oops, I just compiled a vanilla OCaml 3.10 and their patch is not currently 
downloadable. I assume everyone else is thrashing their server instead of 
writing contentless posts here? :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
I've updated the download page, it should be more robust to multiple
downloads now.

Cheers,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Jon Harrop
On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
 Has anyone tested this yet? Any success stories?

Well, I've used the build.sh script to build a patched OCaml 3.10.2 that 
identifies itself as:

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.10.2+patch-ocaml4multicore-20090823
Standard library directory: /home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml

and I've built their tests:

$ cd tests
$ make matmul.nc
ocamlopt -o matmul.nc -thread unix.cmxa threads.cmxa graphics.cmxa matmul.ml
File matmul.ml, line 25, characters 8-13:
Warning Y: unused variable count.
File matmul.ml, line 26, characters 8-16:
Warning Y: unused variable last_col.

and run them:

$ time ./matmul.nc 1000 8
Temp de calcul: utime 38.930433, stime 0.012000, rtime 38.943138
Fatal error: exception Invalid_argument("index out of bounds")

real    0m38.974s
user    0m38.942s
sys     0m0.028s

Note the exception that (I think) should have been caught and handled 
silently.

But I cannot get anything to run in parallel. None of the tests use more than 
one core and my own busy-wait-loops-on-two-threads test also runs only on one 
core. Any idea what I'm doing wrong? Is there a flag to enable it or 
something?

One possible cause: I'm running in a 64-bit chroot.
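
For reference, a two-thread busy-loop check of the kind described above might look like the sketch below. It is not the actual test program: the names busy and measure and the iteration count are made up, and the build command in the comment simply mirrors the flags shown earlier in this message. The idea is to compare CPU time with wall-clock time, as the utime and rtime figures printed by matmul do.

```ocaml
(* Build (assumption, mirroring the flags used for matmul above):
     ocamlopt -thread unix.cmxa threads.cmxa busy.ml -o busy *)

(* Spin for the given number of steps and return the final counter. *)
let busy iterations =
  let counter = ref 0 in
  for _i = 1 to iterations do
    incr counter
  done;
  !counter

(* Run busy on several threads; return (CPU seconds, wall-clock seconds). *)
let measure ~threads ~iterations =
  let wall_start = Unix.gettimeofday () in
  let cpu_start = (Unix.times ()).Unix.tms_utime in
  let workers =
    Array.init threads
      (fun _ -> Thread.create (fun () -> ignore (busy iterations)) ())
  in
  Array.iter Thread.join workers;
  let cpu = (Unix.times ()).Unix.tms_utime -. cpu_start in
  let wall = Unix.gettimeofday () -. wall_start in
  (cpu, wall)

let () =
  let cpu, wall = measure ~threads:2 ~iterations:200_000_000 in
  (* A ratio near 1 means the threads took turns on one core;
     a ratio near 2 means both really ran at the same time. *)
  Printf.printf "cpu %.3fs, wall %.3fs, ratio %.2f\n" cpu wall (cpu /. wall)
```

With the stock runtime the two threads are serialized by the master lock, so the ratio should stay close to 1; a runtime that runs threads in parallel should push it towards 2.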

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
make program.nc uses original ocamlopt

make program.th uses the newly built ocamlopt with the necessary
options (lib links)

then you can compare program.nc and program.th

On Thu, Sep 24, 2009 at 2:21 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
 Has anyone tested this yet? Any success stories?

 Well, I've used the build.sh script to build a patched OCaml 3.10.2 that
 identifies itself as:

 $ ocamlopt -v
 The Objective Caml native-code compiler, version 3.10.2+patch-ocaml4multicore-20090823
 Standard library directory: /home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml

 and I've built their tests:

 $ cd tests
 $ make matmul.nc
 ocamlopt -o matmul.nc -thread unix.cmxa threads.cmxa graphics.cmxa matmul.ml
 File matmul.ml, line 25, characters 8-13:
 Warning Y: unused variable count.
 File matmul.ml, line 26, characters 8-16:
 Warning Y: unused variable last_col.

 and run them:

 $ time ./matmul.nc 1000 8
 Temp de calcul: utime 38.930433, stime 0.012000, rtime 38.943138
 Fatal error: exception Invalid_argument("index out of bounds")

 real    0m38.974s
 user    0m38.942s
 sys     0m0.028s

 Note the exception that (I think) should have been caught and handled
 silently.

 But I cannot get anything to run in parallel. None of the tests use more than
 one core and my own busy-wait-loops-on-two-threads test also runs only on one
 core. Any idea what I'm doing wrong? Is there a flag to enable it or
 something?

 One possible cause: I'm running in a 64-bit chroot.

 --
 Dr Jon Harrop, Flying Frog Consultancy Ltd.
 http://www.ffconsultancy.com/?e





-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
Ok... well, I guess that either
- something about your environment is too different from ours
(in which case build.sh is bad), or
- you have corrupted your installation (it could be by having
a bad PATH value that makes the original ocamlopt get mixed up with the
oc4mc ocamlopt).


What I suggest is to use a default PATH (without modifying it for the
purpose of OC4MC), and do these steps in a clean directory that is not
included in PATH :

1) wget oc4mc-2009.tgz
2) tar xzf oc4mc-2009.tgz
3) cd oc4mc-2009
4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
5) bash build.sh
   ... wait
6) cd tests
7) make matmul.th
8) time ./matmul.th 1000 8

Sorry it's messy, we are thinking about something cleaner... (there's
a matter of lack of time somewhere)

cheers,

-- 
Philippe Wang
   m...@philippewang.info


On Thu, Sep 24, 2009 at 2:05 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Thursday 24 September 2009 00:15:14 you wrote:
 make program.nc uses original ocamlopt

 make program.th uses the newly built ocamlopt with the necessary
 options (lib links)

 then you can compare program.nc and program.th

 Aha! Progress, but now I get errors:

 $ make matmul.th
 ../out/bin/ocamlopt -ccopt -march=native -ccopt -mtune=native -ccopt -O4 -I 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/ -I 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o
  -cclib -lgc -cclib  -g -thread
 unix.cmxa threads.cmxa graphics.cmxa -verbose -compact -rectypes -inline
 100 -fno-PIC  -cclib -lunix -cclib -lpthread matmul.ml -o matmul.th
 File matmul.ml, line 25, characters 8-13:
 Warning Y: unused variable count.
 File matmul.ml, line 26, characters 8-16:
 Warning Y: unused variable last_col.
 + as -o matmul.o /tmp/camlasm081590.s
 + as -o /tmp/camlstartupdac3e2.o /tmp/camlstartup8f7152.s
 +
 gcc   -o 'matmul.th' 
 -I'/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml'
  -march=native -mtune=native -O4 '/tmp/camlstartupdac3e2.o' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/std_exit.o'
  'matmul.o' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/graphics.a'
  
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads/threads.a'
  
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/unix.a' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/stdlib.a'
   '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/' 
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par' 
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads'
  
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml'
  '-lgraphics' '-lX11' '-lthreadsnat' '-lunix' '-lpthread' '-lunix' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o'
  '-lgc' '-g' '-lunix' '-lpthread' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a'
  -lm  -ldl
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
 In function `gc_end_roots':
 memory.c:(.text+0x10): multiple definition of `gc_end_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:948:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
 In function `gc_begin_roots':
 memory.c:(.text+0x12): multiple definition of `gc_begin_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:947:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(finalise.o):
 In function `caml_final_do_strong_roots':
 finalise.c:(.text+0x0): multiple definition of `caml_final_do_strong_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:301:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
 In function `stop_the_world':
 gci.c:(.text+0x38e): undefined reference to `caml_all_threads'
 gci.c:(.text+0x403): undefined reference to `caml_all_threads'
 gci.c:(.text+0x410): undefined reference to `caml_all_threads'
 gci.c:(.text+0x48a): undefined reference to `caml_all_threads'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
 In function `resume_the_world':
 gci.c:(.text+0x4c4): undefined reference to `caml_all_threads'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:gci.c:
 (.text+0x57c): more undefined references to `caml_all_threads' follow
 

Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Jon Harrop
On Thursday 24 September 2009 01:01:58 you wrote:
 Ok... well, I guess that either
 - something about your environment is too different from ours
 (in which case build.sh is bad), or
 - you have corrupted your installation (it could be by having
 a bad PATH value that makes the original ocamlopt get mixed up with the
 oc4mc ocamlopt).

 What I suggest is to use a default PATH (without modifying it for the
 purpose of OC4MC), and do these steps in a clean directory that is not
 included in PATH :

 1) wget oc4mc-2009.tgz
 2) tar xzf oc4mc-2009.tgz
 3) cd oc4mc-2009
 4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
 5) bash build.sh
 6) cd tests
 7) make matmul.th
 8) time ./matmul.th 1000 8

 Sorry it's messy, we are thinking about something cleaner... (there's
 a matter of lack of time somewhere)

No problem. I'll be happy to get anything working!

Following your advice, it seems to work perfectly now:

$ ./matmul.th 500 1
Temp de calcul: utime 2.324145, stime 0.020001, rtime 2.325608
$ ./matmul.th 500 2
Temp de calcul: utime 1.780111, stime 0.00, rtime 0.890797
$ ./matmul.th 500 3
Temp de calcul: utime 1.784111, stime 0.004000, rtime 0.608895
$ ./matmul.th 500 4
Temp de calcul: utime 1.764110, stime 0.004000, rtime 0.451214
$ ./matmul.th 500 5
Temp de calcul: utime 1.768111, stime 0.00, rtime 0.393285
$ ./matmul.th 500 6
Temp de calcul: utime 1.924120, stime 0.004001, rtime 0.333215
$ ./matmul.th 500 7
Temp de calcul: utime 1.788112, stime 0.00, rtime 0.302328
$ ./matmul.th 500 8
Temp de calcul: utime 1.992124, stime 0.00, rtime 0.290383

Wow! 2.6x faster on 2 cores is good. ;-)
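
The ratios behind that remark can be recomputed from the rtime column. The sketch below only does arithmetic on the figures printed above; the names rtimes, speedup and efficiency are made up for illustration.

```ocaml
(* Wall-clock times (rtime, in seconds) copied from the matmul.th runs above,
   for 1 to 8 threads. *)
let rtimes =
  [| 2.325608; 0.890797; 0.608895; 0.451214;
     0.393285; 0.333215; 0.302328; 0.290383 |]

(* Speedup relative to the single-thread run. *)
let speedup threads = rtimes.(0) /. rtimes.(threads - 1)

(* Speedup divided by the number of threads used. *)
let efficiency threads = speedup threads /. float_of_int threads

let () =
  Array.iteri
    (fun i _ ->
       let threads = i + 1 in
       Printf.printf "%d threads: speedup %.2fx, efficiency %.0f%%\n"
         threads (speedup threads) (100.0 *. efficiency threads))
    rtimes
```

Because the single-thread run is the baseline, any slowness in that run inflates every ratio. Its utime is 2.32s against roughly 1.8s to 2.0s for the other runs, which is consistent with the better-than-linear figure for 2 threads; the thread does not say why that run was slower.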

That's a really fantastic piece of work. I'll do my best to study it and write 
literature about it. May I ask, can you give a rough overview of the design? 
For example, is there a separate nursery per thread so each thread can 
allocate a certain amount before incurring a global pause? Do you have any 
ideas for libraries built on top of this, such as a task parallel library 
using work-stealing deques?

Thanks very much!!!

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
