Re: [Caml-list] ocamlclean : an OCaml bytecode cleaner (Was: (Announce) OCAPIC : OCaml for PIC18 microcontrollers)

2010-11-11 Thread Philippe Wang
Hello,

ocamlclean removes code to which there is no possible path. For instance, this 
program :
  let plop = List.map succ [1;2;3];;
uses module List (for map) and module Pervasives (for succ) but doesn't use a 
lot of functions of List or Pervasives (e.g., List.iter, List.fold_left, 
Pervasives.print_endline). So most functions of modules Pervasives and List are 
removed from the bytecode executable.
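To make the idea concrete, here is a toy sketch of reachability-based cleaning. It is only an illustration under stated assumptions: the dependency table `deps` is hypothetical and hand-written, and this is not how ocamlclean itself analyses bytecode.

```ocaml
module S = Set.Make (String)

(* Hypothetical dependency table: each definition and the names it uses. *)
let deps = [
  "main", ["List.map"; "Pervasives.succ"];
  "List.map", [];
  "List.iter", [];
  "List.fold_left", [];
  "Pervasives.succ", [];
  "Pervasives.print_endline", ["Pervasives.output_string"];
  "Pervasives.output_string", [];
]

(* Collect every name reachable from [root] by following references. *)
let reachable root =
  let rec visit seen name =
    if S.mem name seen then seen
    else
      let seen = S.add name seen in
      let children = try List.assoc name deps with Not_found -> [] in
      List.fold_left visit seen children
  in
  visit S.empty root

(* Everything outside this list could be dropped by a cleaner. *)
let kept = S.elements (reachable "main")
```

In this toy model only main, List.map and Pervasives.succ are kept; List.iter, List.fold_left and Pervasives.print_endline have no path from main and would be removed.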

If one dynamically loads some bytecode, for instance the previous program 
becomes
  let plop = List.map succ [1;2;3];;
  let _ = Dynlink.loadfile "stuff.cmo";;
then stuff.cmo should not reference anything that may not exist, such as 
Pervasives.(@) since it has been removed by ocamlclean. And we are not supposed 
to know at compile-time what stuff.cmo needs from stdlib. Hence I guess 
everything should be kept and ocamlclean not used.

On the other hand, if we statically know what is in stuff.cmo, then why load it 
dynamically? (I guess the answer can be just for fun but I'm not so sure it's 
such a good answer :-)

Though, I'm not very familiar with Dynlink, and I'm not sure what 
Dynlink.allow_only really does...

I haven't tested using dynlink to load a self-sufficient module. It might work, 
but I don't really see how it can be useful anyway...

Cheers,
Philippe Wang

On Nov 11, 2010, at 06:52 AM, Julien Signoles wrote:

 Hello,
 Is ocamlclean compatible with dynamic loading? That is code potentially used 
 by some unknown dynamically-loaded code must be kept.
 --
 Julien

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


[Caml-list] ocamlclean : an OCaml bytecode cleaner (Was: (Announce) OCAPIC : OCaml for PIC18 microcontrollers)

2010-11-10 Thread Philippe Wang
Dear all,

In short:
ocamlclean is now available in a separate package so that you don't have to get 
the whole ocapic distribution just to try ocamlclean.

More information:
ocamlclean takes a bytecode executable (generally, but not necessarily, 
produced by the ocamlc compiler) and reduces its size by eliminating some dead 
code. Dead code is identified statically. (It's impossible to eliminate all 
dead code, but in some cases ocamlclean can shrink bytecode executables 
tremendously.) 
It is meant to be compatible with standard bytecode such as that produced by 
ocamlc. (The DBUG section is currently not supported and is removed during the 
cleaning process. Other unsupported sections are left untouched.)

Web site:
http://www.algo-prog.info/ocaml_for_pic/

Developer: 
Benoît Vaugon



--
Philippe Wang
 http://www-apr.lip6.fr/~pwang/




Re: [Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers

2010-11-06 Thread Philippe Wang

On Nov 6, 2010, at 18:47 GMT+01:00, Goswin von Brederlow wrote:

 Philippe Wang philippe.w...@lip6.fr writes:
 
 Dear all,
 
 this is an announcement for OCAPIC, a project which brings OCaml to
 programming PIC micro-controllers.
 
 Some PIC18 series characteristics:
 - 8 bit architecture
 - low cost (a few US dollars), fairly widespread in the electronics world
 - very low volatile memory (a few bytes only, up to ~5000 bytes, depending on
 the model)
 - very low non-volatile memory (less than a KB up to 128 KB)
 - EEPROM : 0 to 1024 bytes
 
 Doesn't the overhead of boxed structures as well as losing a bit on
 ints make that impractical given the extremely limited memory?
 
 MfG
Goswin

Thanks for the question. Let me try to give an (indirect) answer.

OCAPIC has 16-1=15bit integers and 16bit blocks. And the overhead is quite 
acceptable to us.
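As an illustration of where the lost bit goes, here is a minimal sketch of a tagged 16-bit word. It assumes the usual OCaml convention (lowest bit set to 1 for immediate integers); the names `encode` and `decode` are made up for this example, and OCAPIC's actual encoding may differ.

```ocaml
(* Assumed layout: 16-bit word, lowest bit is the "immediate integer" tag. *)
let word_bits = 16

(* One bit is spent on the tag, so integers have 15 bits. *)
let min_int15 = - (1 lsl (word_bits - 2))
let max_int15 = (1 lsl (word_bits - 2)) - 1

(* Shift the value left and set the tag bit, keeping only 16 bits. *)
let encode n = ((n lsl 1) lor 1) land 0xFFFF

(* Sign-extend the 16-bit word, then drop the tag bit. *)
let decode w =
  let signed = if w land 0x8000 <> 0 then w - 0x10000 else w in
  signed asr 1

let () =
  Printf.printf "15-bit range: %d .. %d\n" min_int15 max_int15
```

With this layout the representable range is -16384 to 16383, which is what "16-1=15bit integers" amounts to.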

An AI for the gobblet game [1] was implemented and tested. (The OCaml code is 
included in the distribution so anyone can check it out.)
The first version of this game was very hard to beat (for a human). Then a 
strategy was found (to beat the AI). So some randomization was added to 
the AI to make it more interesting. Now the AI has become really very hard 
to beat.
(We used a PIC18F4620: flash memory = 64kiB; volatile memory = 3968B ; EEPROM = 
1KiB ; speed = 10 MIPS)

Between two moves, the AI may trigger the GC about ten times or more. 
However, the time between two moves is less than 2 seconds, and generally well 
under half a second (and at the beginning of the game the delay is hardly 
noticeable).

Providing a GC for programming PIC microcontrollers is a tremendous gain 
compared to managing everything (memory and computation) manually. 
Providing a high-level language makes it possible to implement algorithms that 
would be very hard or impossible to implement in ASM (or in most low-level 
languages such as C or Basic).

We haven't yet experimented with real-time constrained programming (e.g., 
ReactiveML might take OCAPIC a step further).

Now, maybe the direct answer to the question can be:
  programming PICs has been impractical for most people; now all readers of 
this list can potentially program them without much difficulty (and without 
paying too high a cost in performance).

:-)

[1] http://www.educationallearninggames.com/how-to-play-gobblet-game-rules.asp

Cheers,

--
Philippe Wang
  philippe.w...@lip6.fr
  http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers

2010-11-05 Thread Philippe Wang
PIC ASM is the first programming language Benoît learnt, a few years ago. He 
has practiced it ever since.
But meanwhile he learnt OCaml (among other languages). A few months ago, he 
suggested to me that we implement an OCaml virtual machine running on PICs, 
with maximum performance in mind. This is why OCAPIC's VM is implemented 
in ASM. 

The purpose is of course to program PICs with a high level language while 
remaining (relatively) *very* efficient.

Vincent St-Amour and Marc Feeley have a similar project (Scheme on PICs) which 
puts a much higher priority on portability: their VM is implemented in C.
http://www.ccs.neu.edu/home/stamourv/picobit-ifl.pdf

A side effect of our project, which may interest many OCaml users, is 
that OCAPIC provides ocamlclean, a tool that takes an OCaml bytecode 
binary (produced by ocamlc) and shrinks it by (statically) eliminating most of 
its dead code (of course dynlink is then broken; note that dynlink is not 
relevant on PICs). This tool is independent of the rest of OCAPIC.
Actually, this tool was mandatory for programs using the OO layer: without it, 
bytecode binaries embedding the OO layer were too big to fit on our PICs. 

Cheers,

Philippe


On Nov 5, 2010, at 1:35 PM, Daniel Bünzli wrote:

 Interesting project. Was the choice of PIC based on technical reasons
 or just familiarity of the authors with these chips ?
 
 I would have liked to give it a try but unfortunately I work with AVRs and avr-gcc.
 
 Best,
 
 Daniel



[Caml-list] (Announce) OCAPIC : OCaml for PIC18 microcontrollers

2010-11-04 Thread Philippe Wang
Dear all,

this is an announcement for OCAPIC, a project which brings OCaml to 
programming PIC micro-controllers.

Some PIC18 series characteristics:
- 8 bit architecture
- low cost (a few US dollars), fairly widespread in the electronics world
- very low volatile memory (a few bytes only, up to ~5000 bytes, depending on 
the model)
- very low non-volatile memory (less than a KB up to 128 KB)
- EEPROM : 0 to 1024 bytes

How to program those little chips with OCaml:
- write an OCaml program, compile it, transfer it to the PIC.

Well, actually it demands a little more than just that:
- write an OCaml program, as usual, while keeping in mind that the stack is 
more limited than usual, and so is the heap
- compile it (with ocamlc)
- reduce the binary (with ocamlclean: a bytecode reducer which removes 
dead code)
- transform the (reduced or not) binary (with bc2asm: it strips useless 
zeros, thereby reducing the binary size)
- transfer it to the PIC along with its OCaml VM.

Indeed, an OCaml VM has been implemented in PIC18 ASM in order to run OCaml 
programs on a PIC ! :-)

An example of a real program is in the distribution (open source, downloadable 
from the website):
ocapic-1.3/src/tests/goblet/ (722 lines of ML code).

We also provide a simulator so that you can run your programs written for 
PIC18 on a PC (it needs X11 (Linux/MacOSX) and GCC).

The whole implementation has been fairly well tested, however the documentation 
is still quite young.

Here is the website : 
http://www.algo-prog.info/ocaml_for_pic/ 

Cheers.

Benoît Vaugon (developer and initiator of OCAPIC project)
Philippe Wang (supervisor)
Emmanuel Chailloux (supervisor)

P.S. If you are a French speaker and contact us directly, please do so 
in French.



Re: [Caml-list] oc4mc status?

2010-07-02 Thread Philippe Wang
Hi, thank you for your interest :)

A few days ago, I updated the web site. It is now minimal (almost all in a 
single page) but shows the project status on the top of the page (and is easier 
to maintain).

Currently, the last entry is
[[ 
  2010 spring-summer (current work, in progress) : (ocaml-3.12-svn) “from 
scratch”, making the runtime library fully reentrant, first without threads 
preoccupation
]]

This means that with the very little manpower we have, we are currently 
concentrating on making the runtime library fully reentrant (while relying on 
past experience).
This work currently does not address parallel threads, which have become a 
secondary issue.

I may detail the motivations later, if they are not evident...

Cheers,

--
Philippe Wang
 http://www-apr.lip6.fr/~pwang/

On Jul 2, 2010, at 4:32 PM, Eray Ozkural wrote:

 Hi there,
 
 oc4mc looks like a cool project, I had heard it before but I never got to try 
 it, I suppose the latest development release worked with ocaml 3.10.2. I 
 downloaded it and want to give it a shot to see if I can get some speedups 
 with a parallel code I'm working on. So, how is the development going? I read 
 on their page that they are planning a release for this summer based on the 
 new ocaml.
 
 Cheers,
 
 -- 
 Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
 http://groups.yahoo.com/group/ai-philosophy
 http://myspace.com/arizanesil http://myspace.com/malfunct



Re: [Caml-list] Ques from a beginner: how to access a type defined in one .ml file in another .ml file

2010-05-12 Thread Philippe Wang
On Wed, May 12, 2010 at 12:30 PM, Tarun Sethi tarunseth...@gmail.com wrote:
 Hi,

 I m very new to ocaml and I am not sure if this the right forum to ask a
 beginner level question. I have tried reading tutorials and the manual but
 no help. Please help me on the problem below,

 In a.ml a record type t is defined and is also defined transparently
 in a.mli, i.e. in d interface so that the type definition is available
 to all other files.

 a.ml also has a function, func, which returns a list of t.

 Now in another file, b.ml  i m calling func, now obviously ocaml
 compiler wud nt be able to infer d type of objects stored in d list,
 for compiler its just a list. so in b.ml, i hav something like dis,

 let tlist = A.func in
 let vart = List.hd tlist in
 printf "%s\n" vart.name     (*name is a field in record t*)

 Now here i get a compiler error sayin "Unbound record field label
 name" which makes sense as compiler can't infer d type of vart.

 my first question: how do I explicitly provide d type of vart as t
 here?
                          i tried doing let vart:A.t =   but got the
 same error.

 I also tried creating another function to fetch the first element of d
 list and mentioning return type as A.t, but then i got the Unbound
 value A.t. I did this:

 let firstt = function
      [] -> 0
    | x :: _ -> A.t x ;;

 The problem is compiler is unable to recognize A.t (a type) in b.ml
 but is able to recognize function A.func. If I remove A.t from the
 b.ml, i don'get any compiler errors.

 Please help, its urgent work.
 Thanks in advance!

 ~Tarun

I guess this is not the right place to ask such a question... There is
a beginners' list.

However, this should answer your question :

write instead :
variable_name.Module_name.field_name

If variable_name has been defined in yet another module, you may write
YetAnotherModule.variable_name.Module_name.field_name

If you want to avoid module name prefixes, you may want to use open :

open Module_name;;
let foo = variable_name.field_name ;;

However (from my personal point of view) open should be avoided
because it often makes maintenance very tough.


About type constraints, the syntax is rather this :
(variable : type_name)
with parentheses most of the time.
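Here is a small self-contained sketch of these three forms. The module A, its record type and the function func are stand-ins for a.ml from the question (the field age and the sample values are made up for illustration); in a real project A would be a separate file.

```ocaml
(* Stand-in for a.ml / a.mli: a record type and a function returning a list. *)
module A = struct
  type t = { name : string; age : int }
  let func () = [ { name = "alice"; age = 30 }; { name = "bob"; age = 25 } ]
end

(* Qualify the field with its module: variable.Module.field *)
let first_name = (List.hd (A.func ())).A.name

(* A type constraint goes in parentheses: (variable : type) *)
let age_of (v : A.t) = v.A.age

(* A local open keeps the unqualified names confined to one expression. *)
let describe v =
  let open A in
  Printf.sprintf "%s (%d)" v.name v.age

let () = print_endline (describe (List.hd (A.func ())))
```

Note that A.t is a type, not a function, so it can only appear in a constraint such as (v : A.t), never applied to a value as in "A.t x".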

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] Threads Scheduling

2010-04-14 Thread Philippe Wang
On Tue, Apr 13, 2010 at 11:56 PM, Gregory Malecha gmale...@gmail.com wrote:
 Hi Jake,
 The documentation for Condition.wait says:
 wait c m atomically unlocks the mutex m and suspends the calling process on
 the condition variable c. The process will restart after the condition
 variable c has been signalled. The mutex m is locked again before wait
 returns.
 I figured that I needed to lock and unlock the mutex in the child threads
 because otherwise it is possible for the condition variable to be signaled
 before the main thread waits, which I thought means that the signal is
 lost.
 Thanks Daniel, I'll take a look at it.
 On Tue, Apr 13, 2010 at 5:04 PM, Daniel Bünzli daniel.buen...@erratique.ch
 wrote:

 You may also be interested in this thread [1].

 Daniel

 [1]
 http://groups.google.com/group/fa.caml/browse_thread/thread/9606b618dab79fb5



 --
 gregory malecha

Hi,

Your f function *might* prevent preemption...
For instance, if
let f () = while true do () done;;
then f neither allocates nor calls any external function, and
so the scheduler is stuck, because scheduling happens at allocation
or in *some* external functions (those that contain blocking sections,
e.g., I/O operations).
So when using the Thread module, it is important that the code reaches,
at some point, an allocation, a blocking operation, or a call to
Thread.yield, so that scheduling can happen.
As most functional code allocates, this problem is not so frequent, though.



-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] Question about ocaml threads and TLS (on linux)

2010-02-24 Thread Philippe Wang
Hi,

I'm not sure I understand (though I've read the whole text), but maybe
this will answer your question:
On Linux, OCaml threads (with the native compiler ocamlopt) are
implemented with POSIX threads (in C), so when your OCaml thread runs
the C stub, it's the same as if you were running the C stub in some C
thread.
When you are in a section declared as a blocking section, a collection
can be triggered concurrently in another thread, so the heap must
be accessed neither for reading nor for writing, that's all.

Using the recent __thread feature should also work if you manage to
compile everything correctly. Notably, we use it in some places in
ocaml4multicore (a patch to ocaml's runtime library to allow parallel
threads). However, I don't know how __thread is handled by the
compiler... I mean: is there a pointer for buf in every thread or
only in those that use it?

I hope my answer isn't useless!

Cheers,

-- 
Philippe Wang
   m...@philippewang.info



On Wed, Feb 24, 2010 at 10:00 PM, Goswin von Brederlow
goswin-...@web.de wrote:
 Hi,

 I'm having a little problem for my libfuse-ocaml bindings for the
 threaded interface. For those that don't want to read all of the mail my
 question is:

 Will every ocaml thread have its own thread-local-storage in the C stubs?


 I have the following calling sequence:

 User ocaml code   | Fuse C stub              | libfuse code
 ------------------+--------------------------+--------------------
 Fuse.process fs   | 'process stub'           |
                   | enter_blocking_section() |
                   | char *buf = malloc(size) |
                   | fuse_session_process()   |
                   |                          | ops->write(buf+off)
                   | 'write stub'             |
                   | leave_blocking_section() |
                   | a = caml_ba_alloc_dims() |
                   | caml_callback(...,a,...) |
 my_ref := a       |                          |
                   | enter_blocking_section() |
                   |                          | callback done
                   | 'process stub'           |
                   | free(buf)                |
                   | leave_blocking_section() |
 Fuse.process done |                          |


 The 'process stub' allocates a buffer and frees it at the end, which is
 usually fine. Except in the case of a write callback where the buffer is
 passed back to ocaml as a Bigarray. If the Bigarray is copied, like above,
 then the ocaml code still has a reference to the data at the point the
 'process stub' wants to free it.

 To solve that problem I need the write callback to signal that the
 buffer was passed to ocaml and is now under GC control. The buffer must
 not be free()ed by the 'process stub'. The libfuse API does not provide
 for this so I have to somehow communicate between 'process stub' and
 'write stub' around the libfuse code.


 Possible solution:
 --

 __thread char *buf = NULL;

 value ocaml_fuse_process(...) {
  buf = malloc(size);
  fuse_session_process()
  if (buf != NULL) free(buf);
 }

 void write_callback(...) {
  a = caml_ba_alloc_dims(...);
  buf = NULL;
 }


 This way ocaml_fuse_process will allocate a new buffer whenever it
 doesn't have one and the write_callback will take over the buffer and
 give it to the GC.


 Now my question is: Does that work? Is it safe? Will every ocaml thread
 have its own thread-local-storage buf?

 Currently I'm only interested in supporting Linux. If it is safe there
 that is enough.

 MfG
        Goswin



Re: [Caml-list] vm in ocaml

2010-02-24 Thread Philippe Wang
On Wed, Feb 3, 2010 at 4:47 PM, Joel Reymont joe...@gmail.com wrote:
 I have a translator from a Pascal-like trading language written in OCaml and 
 I need the output to run as a DLL embedded in a trading platform.

 I'm thinking of generating bytecode and have the user pass the path to the 
 bytecode file to the DLL during initialization.

 I don't want to load source code into my runtime since I want to do a lot of 
 error checking on it to make sure the runtime experience is smooth. I don't 
 want to ship ocamlc, etc. since I want to have a single executable. I'm not 
 sure if embedding OCaml (and thus a license!) is needed to generate OCaml 
 bytecode in my scenario, so the bytecode I'm talking about is my bytecode.

 I understand that a bit of C will be required to wrap the OCaml runtime in a 
 DLL. I would prefer to stay with OCaml for the whole project which prompts my 
 question...

I understand that:
- you want to generate some bytecode (with your own bytecode specs)
from the Pascal-like language
- you want to interpret this bytecode with a VM written in OCaml
but actually I don't quite understand your question :-/

 Has anyone used OCaml to write a virtual machine?

Some people (including some colleagues of mine (and me), actually)
have used OCaml to write an OCaml virtual machine.
(I've heard someone say (indirectly) that we were not the first).
It is an interesting exercise... for people who prefer writing in
OCaml rather than in C.
It's also interesting to run an OCaml VM in an OCaml VM ... in an
OCaml VM, the last one being in OCaml compiled with ocamlopt or in C
(or in Java [Cadmium] or in JavaScript [O'Browser], though we haven't
tried), and all the previous ones being in OCaml compiled with ocamlc.

 How big is the OCaml runtime when bundled as a DLL or shared library?

Sorry I've no idea for this question.

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] Re: value restriction

2010-01-02 Thread Philippe Wang
On Sat, Jan 2, 2010 at 5:46 PM, Andrej Bauer andrej.ba...@andrej.com wrote:
 on another note (but staying very much on the same topic), why won't the
 following generalize:
 # let foo =
     let counter  = ref 0 in
     let bar = !counter  in
     let baz = fun x -> bar
     in
       baz

 val foo : '_a -> int = <fun>

 It's even worse:

        Objective Caml version 3.11.1

 # let _ = ref () in fun x -> x ;;
 - : '_a -> '_a = <fun>

 I am sure this makes sense in France. Happy new year!

 Andrej

The idea is to prevent potentially wrong programs.
It is bad to write (let x = ref [ ] in x := ["hello"] ; x := [2]).
So the algorithm that prevents the generalization of
expressions such as (ref [ ]) prevents the generalization of all
application expressions. (Actually, almost all, because I think there
are a few exceptions such as # let f = let x = ref [] in !x ;; val f :
'a list = []).

Making a perfect algorithm that generalizes always and only when
permitted is very hard (maybe even impossible, because undecidable?).
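A short sketch of what this means in practice (the names weak_id, poly_id and f are only for illustration): a definition whose right-hand side is an application gets a weak type variable that the first use pins down, whereas writing the argument explicitly (eta-expansion) makes the definition a function, which is generalized.

```ocaml
(* A partial application is not a syntactic value, so its type variable
   stays weak until the first use fixes it. *)
let weak_id = List.map (fun x -> x)
let ints = weak_id [1; 2; 3]          (* the weak variable is now int *)
(* let strs = weak_id ["a"]  would be rejected: string is not int *)

(* Eta-expansion: a function definition is a value, so it is generalized. *)
let poly_id l = List.map (fun x -> x) l
let ints' = poly_id [1; 2; 3]
let strs = poly_id ["a"; "b"]

(* The exception mentioned above: the result type of this application
   is still generalized, so f can be used at two different types. *)
let f = let x = ref [] in !x
let as_ints : int list = f
let as_strs : string list = f
```

The same trick applies to Andrej's examples: naming the argument at the outermost level, instead of returning a function from a let ... in block, restores full polymorphism.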

This example shows a program that is rejected because its type is not
computable in Caml's type system:
(fun x -> x x) (fun x -> x) (fun x -> x)
It could be a valid program (i.e. it wouldn't lead to a type error at
runtime), but it is rejected because the type system is not capable of
asserting its correctness.

(I am not certain I am not off topic)

Cheers,

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] How to write a CUDA kernel in ocaml?

2009-12-17 Thread Philippe Wang
On Thu, Dec 17, 2009 at 7:45 AM, Eray Ozkural examach...@gmail.com wrote:
 What I want to do is to run the ocaml bytecode interpreter on each core, and
 then feed the relevant bytecode to those. It can be done, I suppose? Or am I
 missing something crucial? :) The runtime library would have to be ported to
 OpenCL/CUDA, as well, isn't that possible?

I don't see why it wouldn't be possible. After all, there are Java,
JavaScript and OCaml implementations of that VM, so it could probably
be implemented with any normal programming language (exclude those
that are not Turing complete and exclude those such as brainfuck or
sed) ! But I don't quite see how it could help gaining performance, at
least not yet.

Anyway, I'm looking forward to seeing a new esoteric implementation of
that nice VM ! :-)

 PS: Sorry for having mailed this to you personally, I intended to post
 it to the
 mailing list.

no problem ;-)

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] How to write a CUDA kernel in ocaml?

2009-12-16 Thread Philippe Wang
On Wed, Dec 16, 2009 at 2:47 PM, Eray Ozkural examach...@gmail.com wrote:

 One trivial and low-performance solution that comes to mind is: make
 an ocaml bytecode interpreter into a CUDA kernel and then pass the
 bytecode to it, and then voila, at least we have some 512-way
 parallelism on the GT300. How does that sound? We'd be losing some
 performance but massive parallelism will cover up for some of that.


With parallel processors, the performance bottleneck moves very quickly
from the processor(s) to memory bandwidth, such that
- it's hell to program because you have to manage concurrency and it
has a real cost
- it's useful for very specific programs that make very few memory
accesses compared to processor computations (such as some compression
algorithms; a more specific and very easy to write example is matrix
multiplication).

Imagine you have 3000MHz for memory bandwidth, which is extremely good
today (I think). And imagine you have 100 processors that share this
memory bandwidth. If they all want to access memory at the same time,
even if you forget the concurrency management cost, you have
3000/100MHz/processor=30MHz/processor, which is very very very low. So
think about 10 processors instead of 100 to be more realistic, it's
still 300MHz/processor, which looks like what we had about a decade
ago...
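The back-of-the-envelope division above can be written down as a tiny sketch. It assumes, as the paragraph does, that the bandwidth is shared evenly and that contention costs nothing; the function name per_processor_mhz is made up for this example.

```ocaml
(* Even share of a memory bandwidth among processors that all access
   memory at the same time (concurrency management cost ignored). *)
let per_processor_mhz ~total_mhz ~processors =
  total_mhz /. float_of_int processors

let () =
  List.iter
    (fun n ->
      Printf.printf "%3d processors: %.0f MHz each\n" n
        (per_processor_mhz ~total_mhz:3000.0 ~processors:n))
    [1; 10; 100]
```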

(IMHO) A not-too-bad-but-still-realistic way to benefit from
GPUs today, with OCaml (or any high-level language), is to write
computation functions in C (possibly with some assembly), and to write
composition functions in OCaml. Or (less realistic in a short amount
of time) maybe to write a compiler that does the job for you, but
that's not quite easy...

Good luck,

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] threads, signals, and timeout

2009-10-26 Thread Philippe Wang
Considering that POSIX signals are not real-time *anyway*, using them
to program specific per-thread handling is hmmm... say a nightmare!
Plus I don't quite see how you could ever end up with a non-broken
implementation. Gerd Stolpmann emphasized this, if I understood correctly.

One solution would be to use state variables to check every once in a while.
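A minimal sketch of that polling idea, under stated assumptions: with_deadline and check are hypothetical names, and Sys.time measures processor time for the whole process, so in a multi-threaded program a wall clock such as Unix.gettimeofday would probably be a better choice.

```ocaml
exception Timeout

(* Run [f], handing it a [check] function that it must call regularly.
   [check] raises Timeout once the deadline has passed. *)
let with_deadline ~seconds f =
  let deadline = Sys.time () +. seconds in
  let check () = if Sys.time () > deadline then raise Timeout in
  f check

(* A computation that polls the deadline on every iteration. *)
let spin check =
  let xs = [1; 2; 3] in
  while true do
    ignore (List.map (fun x -> x + 1) xs);
    check ()
  done

let () =
  try with_deadline ~seconds:0.05 spin
  with Timeout -> prerr_endline "Time is up!"
```

Each thread can call with_deadline with its own limit, since no signal handler or global alarm is shared; the price is that the computation has to cooperate by calling check.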

Or maybe to use fairthreads instead, but I guess that the problem is
actually much more complicated than just that.

Well, I thought I had more interesting things to say. I was wrong,
so these are just my two cents.
Anyways, good luck!

Cheers,

Philippe Wang


On Mon, Oct 26, 2009 at 7:08 PM, yoann padioleau pad.a...@gmail.com wrote:
 Hi,

 I would like to create different threads where each thread does some
 computation and is subject to a different
 timeout. Without threads I usually use Unix.alarm with a SIGALRM
 handler that just raises a Timeout exception
 and everything works fine, but when I try to do something similar with
 threads it does not work
 because apparently the Unix.alarm done in one thread overrides the
 Unix.alarm done in another
 thread. I had a look at thread.mli but was not able to find anything
 related to timeouts.
 Is there a way to have multiple timeouts and multiple threads at the same time?

 Here is a program that unfortunately gets the first timeout, but not the second :(


 (*
 ocamlc -g -thread unix.cma threads.cma signals_and_threads.ml
 *)

 exception Timeout

 let mytid () =
   let t = Thread.self () in
   let i = Thread.id t in
   i

 let set_timeout () =

   Sys.set_signal Sys.sigalrm
     (Sys.Signal_handle (fun _ ->
       prerr_endline "Time is up!";
       print_string (Printf.sprintf "id: %d\n" (mytid()));
       raise Timeout
     ));

   ignore(Unix.alarm 1);
   ()


 let main =
   let t1 =
     Thread.create (fun () ->
       set_timeout ();
       print_string (Printf.sprintf "t1 id: %d\n" (mytid()));

       let xs = [1;2;3] in
       while(true) do
         let _ = List.map (fun x -> x + 1) xs in
         ()
       done;
       ()
     ) ()
   in

   let t2 =
     Thread.create (fun () ->
       set_timeout ();
       print_string (Printf.sprintf "t2 id: %d\n" (mytid()));

       let xs = [1;2;3] in
       while(true) do
         let _ = List.map (fun x -> x + 1) xs in
         ()
       done;
       ()
     ) ()
   in
   Thread.join t1;
   Thread.join t2;
   ()

 --




 Here is the output
 Time is up!
 t2 id: 2
 t1 id: 1
 id: 1
 Thread 1 killed on uncaught exception Signals_and_threads.Timeout
  the program loops, meaning the second thread never received its timeout




Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Philippe Wang


On Sep 25, 2009, at 6:07 AM, Jacques Garrigue wrote:


First, like everybody else, I'd like very much to try this out.
Is there any chance it could compile on Snow Leopard :-)
(I suppose it's near impossible, but still ask...)


I haven't tried that yet, mostly because I guess that it wouldn't work 
out of the box.
However, the .asm file should be OK with OS X; what may clash is the 
behavior of the configure file and the C macros.

I should take a closer look at that, since SL now seems to work well.

Cheers,


--
Philippe Wang
  philippe.w...@lip6.fr
  http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-25 Thread Philippe Wang
On Fri, Sep 25, 2009 at 1:28 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:
 Very few programs that are not written with multicore in mind would
 not be penalized.
 I mean our GC is much much dumber than INRIA OCaml's one.
 Our goal was to show it was possible to have good performance with
 multicores for OCaml.
 Maybe someday we'll find some time to optimize the GC, but it's likely
 not very soon.

 Just to quantify this with a data point: the fastest (serial) version of my
 ray tracer benchmark is 10x slower with the new GC. However, this is
 anomalous with respect to complexity and the relative performance is much
 better for simpler renderings. For example, the new GC is only 1.7x slower
 with n=6 instead of n=9.

I just put a version with a bug fix on some structures allocation (20090925).
I hope it removes this anomaly.

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop j...@ffconsultancy.com wrote:
 Following your advice, it seems to work perfectly now:

:-)

 Wow! 2.6x faster on 2 cores is good. ;-)

your machine is more generous than ours (which is Intel, not AMD) :-)

 That's a really fantastic piece of work. I'll do my best to study it and write
 literature about it. May I ask, can you give a rough overview of the design?
 For example, is there a separate nursery per thread so each thread can
 allocate a certain amount before incurring a global pause? Do you have any
 ideas for libraries built on top of this, such as a task parallel library
 using work-stealing deques?

A few words on the GC's design (which uses the stop-and-copy algorithm
several times) :

Heaps :
- a set of pages gives threads the possibility to allocate memory
without interfering with other threads, so that there is no mutex
locking on local memory allocation. Each thread is born with an empty
page; when it's full, the thread takes another one.
- a big heap is shared between all threads; a mutex protects it to
prevent parallel memory allocation into it.

Collection :
- when there are no pages left, a collection stops the world and
copies living values (of the pages) to the shared heap
- when the shared heap is full, a collection stops the world and
copies all living values (pages + shared heap) to a new shared heap
(which can grow if need be)

Special operations :
- if there is a blocking operation (e.g. mutex lock or I/O operation),
the mechanism is roughly the same as original INRIA OCaml's : the thread
tells the GC that there is no need to stop it when stopping the world.
- if there is a thread with no allocation and no blocking operation,
the behaviour is the same as INRIA OCaml.


The number of pages, the size of a page, and the size of the shared
heap can be changed before running a program by setting some
environment variables (cf. last lines README file included in the
distribution package).
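
To make the allocation scheme above more concrete, here is a toy,
single-threaded model in OCaml. It is only a sketch under assumptions:
the record types, the function names (make_pool, take_page, alloc) and
the sizes are invented for illustration and are not OC4MC's actual code,
and the "collection" only counts the event and hands out fresh pages
instead of copying living values to the shared heap.

```ocaml
(* Toy model of per-thread page allocation; all names are hypothetical. *)
type page = { size : int; mutable used : int }

type pool = {
  pages : int;                     (* number of pages in the pool *)
  page_size : int;                 (* words per page *)
  mutable free_pages : page list;  (* pages not yet handed to a thread *)
  mutable collections : int;       (* how many times the pool ran dry *)
}

let fresh_pages n size = List.init n (fun _ -> { size; used = 0 })

let make_pool ~pages ~page_size =
  { pages; page_size; free_pages = fresh_pages pages page_size; collections = 0 }

(* Hand out one page. When none is left, a real collector would stop the
   world and copy living values to the shared heap; here we only count it. *)
let take_page pool =
  if pool.free_pages = [] then begin
    pool.collections <- pool.collections + 1;
    pool.free_pages <- fresh_pages pool.pages pool.page_size
  end;
  match pool.free_pages with
  | p :: rest -> pool.free_pages <- rest; p
  | [] -> invalid_arg "take_page: pool configured with zero pages"

(* Bump allocation in the thread's current page: no lock is needed because
   the page belongs to one thread only. Assumes words <= page_size. *)
let alloc pool (current : page ref) words =
  let p = !current in
  let p =
    if p.used + words > p.size then begin
      let q = take_page pool in
      current := q;
      q
    end else p
  in
  p.used <- p.used + words
```

A thread would start with `ref (take_page pool)` as its current page and
call `alloc` for each allocation; only `take_page` would need a lock in a
real multi-threaded setting.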



-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:40 PM, Rakotomandimby Mihamina
miham...@gulfsat.mg wrote:
 09/24/2009 03:39 PM, Stefano Zacchiroli:

 So, the real question is: is OC4MC going to be ported to mainline OCaml
 and support in the future or not?

 I don't write many programs that would really require multiple cores.
 But I think this is such a good feature that it should be included in
 the main distribution...

Thing is that having a runtime library that supports parallel threads
costs more than having a runtime library that doesn't.

Programs that take advantage of multicore architectures are not easy
to write, not easy to maintain, not easy to debug, ...
"It's a great feature, so it should get into mainstream" is not a
good enough reason for INRIA's team. It's probably up to the community
to find a great way of taking advantage of multicore architectures.

One must be aware that
- parallel threads vs non-parallel threads : if a program is well
suited to parallel computing on multicore CPUs, then it means that a
non-parallel-capable runtime library puts the performance bottleneck
at the CPU. Then, allowing parallel threads means *moving* this
bottleneck (moving, not removing) : indeed, it's very likely that the
bottleneck will then be at memory (RAM) bandwidth. See, if your memory
runs at 1000 MHz, having 8 cores means 125 MHz/core, which becomes
ridiculous. Even if it were 2400 MHz, it would mean only 300 MHz/core:
imagine a 300 MHz memory bandwidth for a 3 GHz core! So it's *very*
important to keep that in mind.
- for programming languages that have been much slower than INRIA OCaml
from the very beginning, it's much easier to gain performance because
they come from far, sometimes from very very far.
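
The back-of-the-envelope estimate in the first point can be written down
as a tiny OCaml sketch. The function name per_core_mhz is made up for
illustration, and the even split across cores is the same naive
assumption as in the text, not a model of real memory controllers.

```ocaml
(* Naive even split of the memory frequency across all cores. *)
let per_core_mhz ~memory_mhz ~cores = memory_mhz / cores

let () =
  List.iter
    (fun memory_mhz ->
      Printf.printf "%d MHz memory, 8 cores: %d MHz per core\n"
        memory_mhz (per_core_mhz ~memory_mhz ~cores:8))
    [1000; 2400]
```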

Well, from a quite subjective personal point of view, of course it
would be really great to give parallel threads capability to
mainstream INRIA OCaml, because it would mean having found a (great)
acceptable solution.

-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
On Thu, Sep 24, 2009 at 3:11 PM, Jon Harrop j...@ffconsultancy.com wrote:
 Are values such as float arrays copied in their entirety or are they allocated
 outside the shared heap and only a pointer to them is copied?

They should be in a heap (page or shared). We don't allocate many
things outside the heaps.

 Is the copy operation parallelized?

Nope. When the world is stopped for the collection, everything is done
sequentially until the world is resumed.
I don't think it's relevant to parallelize the copy operation (hell to
implement & debug, then I don't think that performance would be very
interesting because we would probably need a write mutex on the
destination heap)

 Is there a write barrier but no read barrier? If so, what exactly does the
 write barrier do?

There is a lock when a thread is created because we need to update the
list of existing threads and we have to give it a page.
Then, each time a thread wants memory, it checks if the world needs to
be stopped. If the world needs to be stopped, it means that there is a
necessary collection waiting for the world to be stopped.
There is a lock if a thread needs to allocate memory in the shared heap
so that two threads don't end up using the same space for different
things.
If two threads want to write in the same block, it's up to the
programmer to prevent (or allow) such a thing with a mutex (or
whatever other mechanism).

 Special operations :
 - if there is a blocking operation (e.g. mutex lock or I/O operation),
 the mechanism is roughly the same as original INRIA OCaml's : it tells
 the GC that there is no need to stop it when stopping the world.

 Can users mark external calls in their bindings as blocking so the GC will
 treat them appropriately?

Yes, it's the same as INRIA OCaml : enter_blocking_section /
leave_blocking_section functions.
It's mandatory that in the section between entrance and exit, the
thread does not access anything allocated in a Caml heap.
If there is a need to keep some value returned by the blocking
operation, it should be written to a C-side value (on the C stack or
with C malloc) and put back into the Caml heap after exit (and then C
free if C malloced).


-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang
I've seen a question about 3.11 and I think I didn't answer, so I'm
answering here :

We have tried to make OC4MC work with OCaml 3.11 (I don't remember the
subsubversion number). Currently, it does not work properly (it's
still too easy to write a program that crashes or deadlocks).

Cheers,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang



On Sep 24, 2009, at 18:02 GMT+02:00, Pascal Cuoq wrote:

> On Sep 24, 2009, at 5:47 PM, Philippe Wang wrote:
>
>>> Is the copy operation parallelized?
>>
>> Nope. When the world is stopped for the collection, everything is done
>> sequentially until the world is resumed.
>> I don't think it's relevant to parallelize the copy operation (hell to
>> implement & debug, then I don't think that performance would be very
>> interesting because we would probably need a write mutex on the
>> destination heap)
>
> Well, you could start copying to the bottom of the next heap with
> one thread going up and to the top of it with another going down.
> Assume optimistically that the two threads will not reach the same
> cacheline at the end of the copies, and you don't need any
> synchronisation at all between them, except joining at the end.
>
> After checking, if they have reached the same cacheline,
> you need to reallocate the destination heap anyway.
>
> You still get a single unfragmented free block as a result.
>
> Even better: stop the world just before there remains less than one
> cacheline of free space and you don't need to check if the two threads
> have met. You still need to reallocate the destination heap sometimes
> though.


A concurrent copy means that there would be a bad overhead on a single
core. It also means moving the bottleneck to memory bandwidth, as memory
copy operations are quickly limited by this bandwidth, not by the CPU.
This may hopefully become false in a few years, but hardware
manufacturers don't seem to be excited by that; they seem to prefer
marketing the number of cores. Look at GPUs : they have very fast
graphics RAM, but they have a huge number of processing units. I don't
really see the point of that (i.e. having a huge number of PUs) anyway
(except marketing).


Ok, back to GC stuff. A stop-and-copy algorithm needs a set of roots
from which to copy living values.
Each thread has its stack, so it has its subset of roots. Then what ?
Parallelize the copy from each thread ? Ok, then we have to determine
the best number of threads according to the number of cores but, more
importantly, according to the memory bandwidth available per core (what
a nightmare!).
Then there are shared values (in the shared heap for instance, but
what if there are lateral pointers due to mutable values?). (We are
leaving the nightmare for hell! but some people have been there.)
Copying a living value means that if later you encounter something
pointing to its old address, you have to know the new one. This means
writing at the old address. I don't see how we can make *today*
something very interesting that is concurrent with a stop-and-copy
algorithm. I believe (but I'm *not* a GC expert at all) that concurrent
GCs are not based on a stop-and-copy algorithm but rather on some
mark&{do-some-stuff-such-as-sweep}.
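
The point about "writing at the old address" can be illustrated with a
toy copying pass in OCaml. This is a sketch under assumptions, not
OC4MC's collector: the heap is modelled as an array of objects whose
fields are indices into that array, and the names obj, forward and
collect are invented for the example. The write to `forward` is the
step that makes a parallel copy hard, since two copying threads could
race on it.

```ocaml
(* Toy sequential copying pass with forwarding pointers. *)
type obj = {
  fields : int array;           (* indices of referenced objects (old space) *)
  mutable forward : int option; (* new index, once the object has been copied *)
}

(* Copy everything reachable from [roots] into a fresh space.
   Returns the new space (fields hold new indices) and the new roots. *)
let collect (from_space : obj array) (roots : int list) =
  let to_space = Array.make (Array.length from_space) [||] in
  let next = ref 0 in
  let rec copy i =
    match from_space.(i).forward with
    | Some j -> j  (* already copied: follow the forwarding pointer *)
    | None ->
      let j = !next in
      incr next;
      (* Write the new address at the old location before copying the
         fields, so that cycles and shared values end up at one copy. *)
      from_space.(i).forward <- Some j;
      to_space.(j) <- Array.map copy from_space.(i).fields;
      j
  in
  let new_roots = List.map copy roots in
  (Array.sub to_space 0 !next, new_roots)
```

Objects that are never reached keep `forward = None` and are simply not
copied, which is how the garbage is dropped.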




> Oh, and I meant to say, but everyone else was faster than me:
> well done!


Thank you, and thanks to everyone else who appreciates this work. :-)

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang



On Sep 24, 2009, at 18:49 GMT+02:00, Richard Jones wrote:

> On Thu, Sep 24, 2009 at 02:09:56PM +0100, Jon Harrop wrote:
>> Fair enough. I think this is the single most important development OCaml
>> has seen since its inception so I would personally drop OCaml in favor
>> of oc4mc even if it meant reverting to 3.10.2.
>
> I think 'personally' is the key word there.  You forget that people
> are quite happily programming in very slow languages like Perl,
> Python, Ruby and Visual Basic, and those people vastly outnumber the
> ones using F#, Haskell, OCaml, SML etc.  (They don't even have static
> safety, dammit!).

Should we tell them that using CPU for nothing (a side effect of using
a slow language) has a bad effect on global warming? Could it be a
wake-up call? :-p


half-kidding,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-24 Thread Philippe Wang


On Sep 25, 2009, at 1:28 AM, Jon Harrop wrote:

> On Thursday 24 September 2009 15:38:06 Philippe Wang wrote:
>> Very few programs that are not written with multicore in mind would
>> not be penalized.
>> I mean our GC is much much dumber than INRIA OCaml's one.
>> Our goal was to show it was possible to have good performance with
>> multicores for OCaml.
>> Maybe someday we'll find some time to optimize the GC, but it's likely
>> not very soon.
>
> Just to quantify this with a data point: the fastest (serial) version of my
> ray tracer benchmark is 10x slower with the new GC. However, this is
> anomalous with respect to complexity and the relative performance is much
> better for simpler renderings. For example, the new GC is only 1.7x slower
> with n=6 instead of n=9.

Can you tell what data structures (and their sizes if possible) you
are using?

Thanks for your feedback.

--
Philippe Wang
  philippe.w...@lip6.fr
  http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
I've updated the download page, it should be more robust to multiple
downloads now.

Cheers,

Philippe Wang



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
make program.nc uses original ocamlopt

make program.th uses the newly built ocamlopt with the necessary
options (lib links)

then you can compare program.nc and program.th

On Thu, Sep 24, 2009 at 2:21 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
 Has anyone tested this yet? Any success stories?

 Well, I've used the build.sh script to build a patched OCaml 3.10.2 that
 identifies itself as:

 $ ocamlopt -v
 The Objective Caml native-code compiler, version
 3.10.2+patch-ocaml4multicore-20090823
 Standard library
 directory: 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml

 and I've built their tests:

 $ cd tests
 $ make matmul.nc
 ocamlopt -o matmul.nc -thread unix.cmxa threads.cmxa
 graphics.cmxa matmul.ml
 File matmul.ml, line 25, characters 8-13:
 Warning Y: unused variable count.
 File matmul.ml, line 26, characters 8-16:
 Warning Y: unused variable last_col.

 and run them:

 $ time ./matmul.nc 1000 8
 Temp de calcul: utime 38.930433, stime 0.012000, rtime 38.943138
 Fatal error: exception Invalid_argument("index out of bounds")

 real    0m38.974s
 user    0m38.942s
 sys     0m0.028s

 Note the exception that (I think) should have been caught and handled
 silently.

 But I cannot get anything to run in parallel. None of the tests use more than
 one core and my own busy-wait-loops-on-two-threads test also runs only on one
 core. Any idea what I'm doing wrong? Is there a flag to enable it or
 something?

 One possible cause: I'm running in a 64-bit chroot.

 --
 Dr Jon Harrop, Flying Frog Consultancy Ltd.
 http://www.ffconsultancy.com/?e





-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-23 Thread Philippe Wang
Ok... well, I guess that
- either it is something about your environment that is too different
from ours (in which case build.sh is bad),
- or you have corrupted your installation (it could be by having
a bad PATH value that makes the original ocamlopt be mixed up with the
oc4mc ocamlopt)


What I suggest is to use a default PATH (without modifying it for the
purpose of OC4MC), and do these steps in a clean directory that is not
included in PATH :

1) wget oc4mc-2009.tgz
2) tar xzf oc4mc-2009.tgz
3) cd oc4mc-2009
4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
5) bash build.sh
   ... wait
6) cd tests
7) make matmul.th
8) time ./matmul.th 1000 8

Sorry it's messy, we are thinking about something cleaner... (there's
a matter of lack of time somewhere)

cheers,

-- 
Philippe Wang
   m...@philippewang.info


On Thu, Sep 24, 2009 at 2:05 AM, Jon Harrop j...@ffconsultancy.com wrote:
 On Thursday 24 September 2009 00:15:14 you wrote:
 make program.nc uses original ocamlopt

 make program.th uses the newly built ocamlopt with the necessary
 options (lib links)

 then you can compare program.nc and program.th

 Aha! Progress, but now I get errors:

 $ make matmul.th
 ../out/bin/ocamlopt -ccopt -march=native -ccopt -mtune=native -ccopt -O4 -I 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/ -I 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par 
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o
  -cclib -lgc -cclib  -g -thread
 unix.cmxa threads.cmxa graphics.cmxa -verbose -compact -rectypes -inline
 100 -fno-PIC  -cclib -lunix -cclib -lpthread matmul.ml -o matmul.th
 File matmul.ml, line 25, characters 8-13:
 Warning Y: unused variable count.
 File matmul.ml, line 26, characters 8-16:
 Warning Y: unused variable last_col.
 + as -o matmul.o /tmp/camlasm081590.s
 + as -o /tmp/camlstartupdac3e2.o /tmp/camlstartup8f7152.s
 +
 gcc   -o 'matmul.th' 
 -I'/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml'
  -march=native -mtune=native -O4 '/tmp/camlstartupdac3e2.o' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/std_exit.o'
  'matmul.o' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/graphics.a'
  
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads/threads.a'
  
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/unix.a' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/stdlib.a'
   '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/' 
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par' 
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads'
  
 '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml'
  '-lgraphics' '-lX11' '-lthreadsnat' '-lunix' '-lpthread' '-lunix' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o'
  '-lgc' '-g' '-lunix' '-lpthread' 
 '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a'
  -lm  -ldl
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
 In function `gc_end_roots':
 memory.c:(.text+0x10): multiple definition of `gc_end_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:948:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
 In function `gc_begin_roots':
 memory.c:(.text+0x12): multiple definition of `gc_begin_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:947:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(finalise.o):
 In function `caml_final_do_strong_roots':
 finalise.c:(.text+0x0): multiple definition of `caml_final_do_strong_roots'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:301:
 first defined here
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
 In function `stop_the_world':
 gci.c:(.text+0x38e): undefined reference to `caml_all_threads'
 gci.c:(.text+0x403): undefined reference to `caml_all_threads'
 gci.c:(.text+0x410): undefined reference to `caml_all_threads'
 gci.c:(.text+0x48a): undefined reference to `caml_all_threads'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
 In function `resume_the_world':
 gci.c:(.text+0x4c4): undefined reference to `caml_all_threads'
 /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:gci.c:
 (.text+0x57c): more undefined references to `caml_all_threads' follow
 /home/jdh30/src/ocaml

[Caml-list] OC4MC : OCaml for Multicore architectures

2009-09-22 Thread Philippe Wang
This is some additional noise about "OCaml for Multicore
architectures" (or "Ok with parallel threads GC").



Dear list,

We have implemented an alternative runtime library for OCaml, one that  
allows threads to compute in parallel on different cores of now  
widespread CPUs.


This project will be presented at IFL 2009 (http://blogs.shu.edu/projects/IFL2009/).


A testing version is available online at
http://www.algo-prog.info/ocmc/
It works with OCaml 3.10.2 for Linux x86-64; we haven't met any
bugs with the latest build (it doesn't *unexpectedly* crash, not yet).


Hope you'll enjoy,

--
Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou,  
Philippe Wang




Re: [Caml-list] threads

2009-09-08 Thread Philippe Wang
Hi,

  let x = Array.make 100 []
  let update i n = x.(i) <- n :: x.(i)
  let read i = x.(i)

I don't think you can obtain funny results when you don't put a mutex
on these two specific functions, update and read.
What is sure is that the update function is not atomic, because there is
a value allocation at the right of <- (with the :: operator), and this
may trigger garbage collection and/or make the scheduler change the
running thread.

What you can be sure of with the current official OCaml distribution is
that there won't be, at the exact same time, both a (<-) operation and a
(.()) operation.
But it is actually possible, for instance, for a thread to compute
while another one is simultaneously writing on a socket. So it is
generally not a good idea to count on the atomicity of some operation to
decide whether or not to use a mutex lock (well, it's a good way to
write some hard-to-debug code)...
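
For reference, a minimal sketch of the mutex-protected variant could
look like the following. The helper with_lock and the single global
mutex m are illustrative choices, not something from the original
question. It assumes OCaml 4.08 or later for Fun.protect; as far as I
know the Mutex module comes from the threads library on OCaml 4.x and
from the standard library since 5.0.

```ocaml
(* One mutex guarding the whole shared array (names are illustrative). *)
let x : int list array = Array.make 100 []
let m = Mutex.create ()

(* Run [f] with the mutex held, releasing it even if [f] raises. *)
let with_lock f =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* The read of x.(i), the allocation of the new cell and the write all
   happen under the lock, so a reader never interleaves with an update. *)
let update i n = with_lock (fun () -> x.(i) <- n :: x.(i))
let read i = with_lock (fun () -> x.(i))
```

A single lock is the simplest choice; one mutex per cell would allow
more parallelism at the price of more bookkeeping.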

Cheers,

Philippe Wang


On Tue, Sep 8, 2009 at 7:33 PM, ygrek ygrekhere...@gmail.com wrote:
 Hello,

  let x = Array.make 100 []
  let update i n = x.(i) <- n :: x.(i)
  let read i = x.(i)

  Consider the following scenario: one thread is `update`ing x, another
 thread(s) uses only `read`. Is it safe to use these functions without
 locking on mutex?

  I.e. is Array.set atomic? What about updating references (:=) ?

  If I understand correctly these operations require only one cpu
 instruction to update one machine word and so should be atomic. Taking
 into account single-cpu affinity of ocaml program it should be safe
 to write such multithreaded code. Is it true?

  Is it safe to assume that ocamlopt won't skip reads/writes to globally
 visible memory address using cached value in a register?

 --
  ygrek
  http://ygrek.org.ua






-- 
Philippe Wang
   m...@philippewang.info



Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)

2009-04-23 Thread Philippe Wang

On Sat, Apr 18, 2009 at 1:05 AM, fo...@x9c.fr fo...@x9c.fr wrote:
> I was indeed mostly worried with the runtime itself. I wanted to have a
> fully reentrant runtime for the OCaml-Java project (to be able to execute
> several programs in the very same JVM) and remember that it implied
> primitives from compare.c, hash.c, intern.c and extern.c among
> others to be written differently for this purpose.

Indeed. We have made them reentrant but we haven't done much stress
testing on that.
Reentrancy for those is not free (it has a cost), and the way we
chose is the simplest or quickest one.

> Out of curiosity: you state that your GC is of stop-the-world nature,
> what about finalizers ?
> Are they executed by the GC thread when the world is stopped or
> concurrently with application threads ?
> Not sure this question really matters, just curious (I mean, it is
> doubtful that one would write finalizers with a long execution time).

Finalizers are supposed to be called by the thread that does the
garbage collection, so there is no concurrency with finalizers as the
rest of the world is meant to be stopped when garbage collecting.
(Our garbage collector does not try to be as smart as the original one
on many many things)

By the way, we are late on writing the documentation for our future
release...
but we have just implemented a (simple) experimental growing heap.

Here is a quote from wikipedia (http://en.wikipedia.org/wiki/Speedup):

"Sometimes a speedup of more than N when using N processors is observed
in parallel computing, which is called super linear speedup. Super
linear speedup rarely happens and often confuses beginners, who
believe the theoretical maximum speedup should be N when N processors
are used."

Well, we *have* observed that on a matrix multiplication :-)
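
As a small illustration of the definition quoted above, speedup is the
sequential time divided by the parallel time, and it is super linear
when it exceeds the number of processors. The function names below are
made up for this sketch; they are not taken from the OC4MC tests.

```ocaml
(* Speedup of a parallel run relative to the sequential run. *)
let speedup ~t_seq ~t_par = t_seq /. t_par

(* Super linear: the speedup exceeds the number of cores used. *)
let is_super_linear ~cores ~t_seq ~t_par =
  speedup ~t_seq ~t_par > float_of_int cores
```

For instance, with hypothetical timings of 26 s sequentially and 10 s
on 2 cores, `is_super_linear ~cores:2 ~t_seq:26.0 ~t_par:10.0` holds,
whereas the same timings on 4 cores would not qualify.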

--
Philippe Wang
 philippe.w...@lip6.fr
 http://www-apr.lip6.fr/~pwang/



Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)

2009-04-17 Thread Philippe Wang


On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote:

>> A negative answer would imply that you patched the OCaml
>> runtime to make it reentrant. To illustrate my point, I will take
>> the example of the file byterun/compare.c. In this file, the
>> code for the comparison of values makes use of a global
>> variable (named caml_compare_unordered).
>
> It should be, unless bugs are still hiding under that.
> It is supposed to have been done for a while. We'll make one more check.
>
>> If you patched the runtime to allow multiple threads to use
>> it concurrently, I would also be interested by the strategy
>> used: is the problem solved by additional parameters ?
>> by the use of thread-local storage ? by any other mean ?
>
> - local variable instead of global variable in functions
> - some functions are parameterized by thread identifier (that is, one
> more parameter than before) (e.g. in amd64.S)


Well, we went back into the runtime code implementation.

This is what can be said rapidly :
- compare.c contains no global variables anymore, we use local
variables instead
- if a Caml-C or C-Caml interface uses caml_compare_unordered, we
don't know what can happen with parallel threads
- we have many global mutex locks with small scopes
- we do use an enter/leave blocking section mechanism to prevent the
GC from waiting on a blocking operation such as I/O or mutex locking
etc.
- we don't support weak values (not sure whether they don't work or
they became strong; if they don't work anymore, they can be back in 2
minutes as strong values anyway)
- serialisation of values is a little bit tricky, though it should work
- most important : many global variables do not exist anymore because
they are irrelevant in our implementation
- we do not support unofficial features of ocaml 3.10, e.g. the new
features that come with 3.11 but actually have their roots in previous
versions
~ it is almost sad to see all the based-on-one-thread-at-a-time
optimisations removed...
+ (it looks like it works just fine)


I hope there are no hidden bad global variables.

Is it fully reentrant ? H... maybe.

We use a stop-the-world GC (which means no one is running in parallel
with the GC). That is actually like the original ocaml, and it comes with
its inconveniences : C calls not declared as blocking sections (which
has quite a cost) may prevent other threads from running when the heap
is full.


Graphics module, for instance, is not reentrant at all (anyhow it's
not part of the runtime). Same for Str.
Funny thing is we can open several windows by launching parallel
threads (though only one is useful at the end).



Anyway, thank you for your questions and interest, they have helped us
find & fix some bugs.



--
Philippe Wang
  http://www-apr.lip6.fr/~pwang/


PS.  We tried to switch to 3.11, but it seems to need too much time,  
it's far from being a piece of cake.
We have tried to make it work on Leopard (actually, I failed the 1st  
time - half the way, I may try again if I have time).


=
A free, very personal piece of advice that may save you some headaches:
do not program in concurrent shared memory style, especially when you
can replace "concurrent" by "parallel".
Even if it may have a future, even if it may sound great, even if it
sounds exciting, even if it helps you go faster, even if
put-here-whatever-you-want,
it is **awful**. Well, if you really really don't have a choice, never
mind what I said.




Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)

2009-04-16 Thread Philippe Wang

Le 14 avr. 09 à 12:21, Philippe Wang a écrit :

The garbage collector is clearly separated from the rest of the  
runtime library: the GC is contained in a libgc.a and our patched  
ocamlopt links programs to this GC. The GC variables are known by  
the GC only.


Well, this is not what I had in mind, but I realize that my question
was not clear. A better question would have been:
   Is your implementation still based on a global runtime lock ?


No, it isn't. (And it would probably too often prevent parallel  
threads, wouldn't it.)



A negative answer would imply that you patched the OCaml
runtime to make it reentrant. To illustrate my point, I will take
the example of the file byterun/compare.c. In this file, the
code for the comparison of values makes use of a global
variable (named caml_compare_unordered).


It should be reentrant, unless bugs are still hiding in there.
That work is supposed to have been done a while ago. We'll check once more.


If you patched the runtime to allow multiple threads to use
it concurrently, I would also be interested in the strategy
used: is the problem solved by additional parameters ?
by the use of thread-local storage ? by any other means ?


- local variables instead of global variables in functions
- some functions are parameterized by a thread identifier (that is, they
take one more parameter than before) (e.g. in amd64.S)
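
To make the two strategies above concrete, here is a small sketch written
in OCaml rather than in the C and assembly of the runtime. It only models
the idea behind a flag like caml_compare_unordered; the names
unordered_global, compare_floats_local, compare_floats_tid, max_threads
and unordered_by_thread are hypothetical and do not come from the patched
runtime.

```ocaml
(* Non-reentrant style: the "unordered" flag lives in a global variable,
   so two threads comparing at the same time could overwrite each
   other's flag. *)
let unordered_global = ref false

let compare_floats_global (a : float) (b : float) : int =
  unordered_global := false;
  if a < b then -1
  else if a > b then 1
  else if a = b then 0
  else begin
    unordered_global := true;   (* at least one operand is nan *)
    1
  end

(* Strategy 1: a local value instead of a global variable.
   The flag is returned to the caller together with the result. *)
let compare_floats_local (a : float) (b : float) : int * bool =
  if a < b then (-1, false)
  else if a > b then (1, false)
  else if a = b then (0, false)
  else (1, true)

(* Strategy 2: one more parameter, a thread identifier, selects the
   per-thread slot. max_threads is an arbitrary bound for this sketch. *)
let max_threads = 16
let unordered_by_thread = Array.make max_threads false

let compare_floats_tid (tid : int) (a : float) (b : float) : int =
  let (result, unordered) = compare_floats_local a b in
  unordered_by_thread.(tid) <- unordered;
  result
```

In the first variant nothing is shared at all; in the second, each thread
only ever touches its own slot, so no lock is needed around the flag.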


Philippe Wang


Re: [Caml-list] ok with parallel threads GC (aka ocaml for multicore)

2009-04-14 Thread Philippe Wang

On Apr 10, 2009, at 20:36 CEDT, fo...@x9c.fr wrote:


Would it be correct to assume that the current state of the project
implies that you have patched the OCaml runtime to make it
fully reentrant ?


Is the following partial answer relevant enough?
The garbage collector is clearly separated from the rest of the  
runtime library: the GC is contained in a libgc.a and our patched  
ocamlopt links programs to this GC. The GC variables are known by the  
GC only.



If so, is this code/patch available for download ?


Officially, not yet (and not before April 20th).

We did not expect the debugging part to be so complex and hard, or to
take so long.
Manpower decreased dramatically in late September: the 2 Master's
students went back to their Master's courses, and the 3 supervisors
had to do research in parallel with teaching.
Some major bug fixes were made in February/March, and a lot of major bug
fixes were made in April (yes, these last 2 weeks).
You know, bugs hiding other bugs... however, we are hopefully getting
close to the fixed point: today there is no known bug! :-)
Unsupported features are, of course, not considered bugs. For
instance, POSIX signals are (currently) not supported. And, as
parallel computing *potentially* requires quite a lot more memory,
some programs can easily end up in a blocked state when the heap
becomes full: our GC (currently) uses fixed-size pages and a fixed-size
heap (both sizes are parameters).


Over the next few days, we will concentrate on running benchmarks (if you
have some relevant test programs, they are welcome), and if we don't
discover new bugs, we will then focus on finishing the documentation
and a build script for the release.
If we released it as is now, we would have too much support to do because
of the lack of documentation, so it's not really a good idea...
Once we have minimal-but-sufficient documentation, we will make
the release :-)


In parallel, we are trying to make it work with 64-bit OS X Leopard and/or
OCaml 3.11 (currently we only support OCaml 3.10.2 on Linux x86_64).



Anyway, wholehearted  respect for undertaking such a complex project.
Good luck in your bug-chasing tasks !


Thanks.

--
Philippe Wang
   Philippe.Wang \at/ lip6.fr

N.B. I hope we will not discover new bugs in our amd64.S: our assembly
guru is enjoying his Easter holidays (abroad, with no web access)...



[Caml-list] ok with parallel threads GC (aka ocaml for multicore)

2009-04-10 Thread Philippe Wang

Hello list,

Mathias and Adrien have just started their internship (for their  
Master's degree requirements).
Thus they have some time to spend on this project. Moreover, Mathias'  
internship is strongly related to this project.

=> manpower dramatically increased

We are currently searching for the last remaining bugs.

Our thread library is restricted; it contains:
Thread : create, join, yield, id, self, delay
Mutex : full module
Condition : full module
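
As an illustration of what fits inside this restricted interface, here is
a minimal producer/consumer sketch that uses only Thread.create,
Thread.join, Mutex and Condition. It is written against the standard
OCaml threads library (built with something like
"ocamlfind ocamlopt -package threads.posix -linkpkg demo.ml"); the
function names producer, consumer and run are made up for this example,
and nothing here has been checked against the patched runtime.

```ocaml
(* Shared state: a queue protected by a mutex, plus a condition
   variable signalled whenever an element is added. *)
let m = Mutex.create ()
let nonempty = Condition.create ()
let queue : int Queue.t = Queue.create ()

(* Push 1..n into the shared queue. *)
let producer (n : int) : unit =
  for i = 1 to n do
    Mutex.lock m;
    Queue.push i queue;
    Condition.signal nonempty;
    Mutex.unlock m
  done

(* Pop n elements and return their sum, sleeping while the queue is empty. *)
let consumer (n : int) : int =
  let sum = ref 0 in
  for _i = 1 to n do
    Mutex.lock m;
    while Queue.is_empty queue do
      Condition.wait nonempty m
    done;
    sum := !sum + Queue.pop queue;
    Mutex.unlock m
  done;
  !sum

(* Run one producer and one consumer, wait for both, return the sum. *)
let run (n : int) : int =
  let result = ref 0 in
  let c = Thread.create (fun () -> result := consumer n) () in
  let p = Thread.create producer n in
  Thread.join p;
  Thread.join c;
  !result
```

The while loop around Condition.wait re-checks the queue after every
wake-up, which is the usual way to use a condition variable.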


Our alternative garbage collector
 - uses a Stop(the world)&Copy algorithm
 - has memory pages for threads (each thread takes a page at its
creation)
 - has a shared heap for shared values and for old generation from
pages (i.e. memory pages are flushed to this heap)
 - should not be too hard to replace.
Blocking sections such as I/O operations or mutex locks do not prevent  
garbage collection.
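
The page/heap organisation above can be pictured with the following toy
model. It is only a sketch under strong assumptions: values are plain
ints, a page is a fixed-size array, and a flush moves every value of the
page to the shared heap, whereas a real copying collector would stop the
world and copy only the live values. The types page and heap and the
functions make_page, flush and alloc are hypothetical names, not the API
of our GC.

```ocaml
(* One allocation page per thread: a fixed-size buffer with a fill level. *)
type page = { slots : int array; mutable used : int }

(* The shared heap receives the contents of flushed pages. *)
type heap = { mutable old : int list; mutable old_count : int }

let make_page (size : int) : page = { slots = Array.make size 0; used = 0 }

(* Flush: move everything in the page to the shared heap, empty the page. *)
let flush (h : heap) (p : page) : unit =
  for i = 0 to p.used - 1 do
    h.old <- p.slots.(i) :: h.old
  done;
  h.old_count <- h.old_count + p.used;
  p.used <- 0

(* Allocate in the thread's own page; flush first if the page is full. *)
let alloc (h : heap) (p : page) (v : int) : unit =
  if p.used = Array.length p.slots then flush h p;
  p.slots.(p.used) <- v;
  p.used <- p.used + 1
```

In this model, allocation touches only the thread's own page until the
page fills up, so the shared heap is involved only at flush time.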


We currently do *not* support POSIX signals (let's say their behaviour  
is not specified).


We should make a release soon, but before:
 - some code has to be cleaned
 - some benchmarks have to be done
 - some documentation has to be completed
 - an installation script still has to be written.
Thus not a lot is left to do before the release :-)

We are writing test programs to search for the last remaining bugs but
also to measure performance.


So far, as long as there are not too many concurrent memory accesses,
it is not too hard to go n times faster with an n-core CPU;
though intense memory accesses generate page faults and divide memory
bandwidth by the number of concurrent accesses,
and memory-intensive programs show that our GC is not as efficient
as INRIA's, of course.
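
The favourable case (few concurrent accesses to shared memory) typically
looks like the sketch below: each thread works on its own slice, keeps
its accumulator local, and writes to shared memory exactly once. The
function parallel_sum is a hypothetical example, not one of our
benchmarks; it assumes n_threads >= 1 and the standard threads library.
With the stock runtime of the time, the same program runs correctly but
its threads are interleaved rather than run in parallel.

```ocaml
(* Sum an array with n_threads workers, one contiguous slice per worker. *)
let parallel_sum (n_threads : int) (a : int array) : int =
  let len = Array.length a in
  (* One result cell per worker; each worker writes only its own cell. *)
  let partial = Array.make n_threads 0 in
  let worker k =
    let lo = k * len / n_threads
    and hi = (k + 1) * len / n_threads in
    let acc = ref 0 in                (* local accumulator *)
    for i = lo to hi - 1 do
      acc := !acc + a.(i)
    done;
    partial.(k) <- !acc               (* single write to shared memory *)
  in
  let threads = Array.init n_threads (fun k -> Thread.create worker k) in
  Array.iter Thread.join threads;
  Array.fold_left (+) 0 partial
```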


Cheers,

--
Philippe Wang
  Philippe.Wang \at/ lip6.fr


PS: Sorry for taking so much time, debugging parallel threads in  
shared memory style is hell (you can give it a try).

