
Re: [Caml-list] Lazy and Threads

2009-02-20 Thread Yaron Minsky

You're totally right. I withdraw my complaint.

y

Yaron Minsky

On Feb 20, 2009, at 1:36 PM, Xavier Leroy  wrote:


Victor Nicollet wrote:
   I'm working with both lazy expressions and threads, and noticed
   that the evaluation of lazy expressions is not thread-safe:

Yaron Minsky wrote:

At a minimum, this seems like a bug in the documentation.  The
documentation states very clearly that Undefined is raised when a
value is recursively forced.  Clearly, you get the same error when you
force a lazy value that is in the process of being forced for the
first time.  It does seem like fixing the behavior to match the
current documentation would be superior to fixing the documentation to
match the current behavior.


It's not just the Lazy module: in general, the whole standard library
is not thread-safe.  Probably that should be stated in the
documentation for the threads library, but there isn't much point in
documenting it per standard library module.

As to making the standard library thread-safe by sprinkling it with
mutexes, Java-style: no way.  There is one part of the stdlib that is
made thread-safe this way: buffered I/O operations.  (The reason is
that, owing to the C implementation of some of these operations, a
race condition in buffered I/O could actually crash the whole program,
rather than just result in unexpected results as in the case of pure
Caml modules.)  You (Yaron) and others recently complained that such
locking around buffered I/O made some operations too slow for your
taste.  Wait until you wrap a mutex around all Lazy.force
operations...

More generally speaking, locking within a standard library is the
wrong thing to do: that doesn't prevent race conditions at the
application level, and for reasonable performance you need to lock at
a much coarser grain, again at the application level.  (That's one of
the things that make shared-memory programming with threads and locks
so incredibly painful and non-modular.)

Coming back to Victor's original question:

   Aside from handling a mutex myself (which I don't find very
   elegant for a read operation in a pure functional program) is
   there a solution I can use to manipulate lazy expressions in a
   pure functional multi-threaded program?


You need to think more / tell us more about what you're trying to
achieve with sharing lazy values between threads.

If your program is really purely functional (i.e. no I/O of any kind),
OCaml's multithreading is essentially useless, as you're not going to
get any speedup from it and would be better off with sequential
computations.  If your program does use threads to overlap computation
and I/O, using threads might be warranted, but then what is the
appropriate granularity of locking that you'd need?

A somewhat related question is: what semantics do you expect from
concurrent Lazy.force operations on a shared suspension?  One thread
blocks while the other completes the computation?  Same but with busy
waiting?  (if the computations are generally small).  Or do you want
speculative execution?  (Both threads may evaluate the suspended
computation.)

There is no unique answer to these questions: it all depends on what
you're trying to achieve...

- Xavier Leroy
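[Editor's note: the "one thread blocks while the other completes the
computation" option Xavier mentions can be sketched by guarding each
shared suspension with its own mutex.  This is a minimal illustration,
not a standard-library API: the type and function names are invented,
it requires the threads library (compile with -thread, linking
threads.cma), and the per-force locking cost is exactly the overhead
Xavier warns about.]

```ocaml
(* Hypothetical sketch: a lazy value paired with a mutex so that at
   most one thread runs the suspension; other threads block until it
   completes, then read the memoized result. *)
type 'a protected_lazy = {
  mutex : Mutex.t;
  thunk : 'a Lazy.t;
}

let make thunk = { mutex = Mutex.create (); thunk }

let force pl =
  Mutex.lock pl.mutex;
  let result =
    (* Release the lock even if the suspension raises. *)
    try Lazy.force pl.thunk
    with e -> Mutex.unlock pl.mutex; raise e
  in
  Mutex.unlock pl.mutex;
  result

let () =
  let pl = make (lazy (1 + 2)) in
  assert (force pl = 3);
  assert (force pl = 3)  (* second force returns the memoized value *)
```

Note the coarse granularity: every force pays for a lock/unlock pair,
and this does nothing for race conditions at the application level,
which is Xavier's broader point.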


___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Mike Lin
These are good points. I tend to compulsively eliminate any kind of memory
allocation from my numerical loops -- it's true the OCaml allocator is a lot
faster than malloc, but you could end up repaying a lot of that back to the
GC later!
The silly library I sent out does operate on OCaml float arrays (which are
unboxed and directly accessible as double arrays to C, though I'm not sure
if it's totally kosher to use 'em that way). Something to kick around...
Mike

On Fri, Feb 20, 2009 at 5:30 PM, Will M. Farr  wrote:

> Mike and Erick,
>
> In some of my work, I've got code which is constantly creating and
> multiplying 4x4 matrices (Lorentz transforms).  I usually write in a
> functional style, so I do not generally overwrite old matrices with
> the multiplication results.  I have discovered that, at these sizes,
> it's about a factor of two to three faster to do this with OCaml
> arrays for the matrices rather than the Bigarrays you would need to
> use to interface with the BLAS in Lacaml or ocamlgsl.  The reason is
> that the memory block for a bigarray is malloc-ed, which is extremely
> slow compared to the OCaml native allocation.  That overhead
> completely eliminates the advantage of going through C or Fortran
> libraries.
>
> The ideal solution would be to use OCaml native float arrays (fast
> allocation) with LAPACK (fast matrix multiply), but I'm satisfied with
> the performance of my code, and too lazy to put out the effort to
> implement the C glue for this.
>
> Just thought you might appreciate another data point.  I have no idea
> where the turnover is between 4x4 and 61x61 :).
>
> Will
>
> 2009/2/20 Mike Lin :
> > Erick, we should compare notes sometime. I have a lot of code for doing
> this
> > kind of stuff (I am working on empirical codon models with 61x61 rate
> > matrices). The right way to speed up matrix-vector operations is to use
> BLAS
> > via either Lacaml or ocamlgsl. But if, like me, you like to
> > counterproductively fiddle around with stupid things, here's a little
> ocaml
> > library I wrote that links to a C function to do vector-vector dot using
> > SSE2 vectorization.
> >
> > http://www.broad.mit.edu/~mlin/for_others/SuperFast_dot.zip
> >
> > IIRC in my tests it gave about 50% speedup vs. the obvious C code and
> 100%
> > speedup vs. the obvious ocaml code. But, I am doing 61x61 and I'm not
> sure
> > if the speedup scales down to 20x20 or especially 4x4.
> >
> > Mike
> >
> > On Fri, Feb 20, 2009 at 2:53 PM, Erick Matsen 
> wrote:
> >>
> >> Wow, once again I am amazed by the vitality of this list. Thank you
> >> for your suggestions.
> >>
> >> Here is the context: we are interested in calculating the likelihood
> >> of taxonomic placement of short "metagenomics" sequence fragments from
> >> unknown organisms in the ocean. We start by assuming a model of
> >> sequence evolution, which is a reversible Markov chain. The taxonomy
> >> is represented as a tree, and the sequence information is a collection
> >> of likelihoods of sequence identities. As we move up the tree, these
> >> sequences "evolve" by getting multiplied by the exponentiated
> >> instantaneous Markov matrix.
> >>
> >> The matrices are of the size of the sequence model: 4x4 when looking
> >> at nucleotides, and 20x20 when looking at proteins.
> >>
> >> The bottleneck is (I mis-spoke before) that we are multiplying many
> >> length-4 or length-20 vectors by a collection of matrices which
> >> represent the time evolution of those sequences as follows.
> >>
> >> Outer loop:
> >>  modify the amount of time each Markov process runs
> >>  exponentiate the rate matrices to get transition matrices
> >>
> >>  Recur over the tree, starting at the leaves:
> >>at a node, multiply all of the daughter likelihood vectors together
> >>return the multiplication of that product by the transition matrix
> >> (bottleneck!)
> >>
> >> The trees are on the order of 50 leaves, and there are about 500
> >> likelihood vectors done at once.
> >>
> >> All of this gets run on a big cluster of Xeons. It's not worth
> >> parallelizing because we are running many instances of this process
> >> already, which fills up the cluster nodes.
> >>
> >> So far I have been doing the simplest thing possible, which is just to
> >> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> >> embarrassing.
> >>
> >> let mul_vec m v =
> >>if Array.length v <> n_cols m then
> >>  failwith "mul_vec: matrix size and vector size don't match!";
> >>let result = Array.create (n_rows m) N.zero in
> >>for i=0 to (n_rows m)-1 do
> >>  for j=0 to (n_cols m)-1 do
> >>result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
> >>  done;
> >>done;
> >>result
> >>
> >> I have implemented it in a functorial way for flexibility. N is the
> >> number class. How much improvement might I hope for if I make a
> >> dedicated float vector multiplication function?

Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Markus Mottl
Unless you want to interface C-calls into BLAS/LAPACK directly without
bounds checking, releasing the OCaml-lock, and other "fru-fru", it
seems unlikely that you will get much of an advantage using those
libraries given the small size of your matrices.  E.g. Lacaml is
optimized for larger matrices (probably > 10x10).

I guess you should be fine rolling your own implementation for such
small matrices.

Regards,
Markus

-- 
Markus Mottl    http://www.ocaml.info    markus.mo...@gmail.com



Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Will M. Farr
Mike and Erick,

In some of my work, I've got code which is constantly creating and
multiplying 4x4 matrices (Lorentz transforms).  I usually write in a
functional style, so I do not generally overwrite old matrices with
the multiplication results.  I have discovered that, at these sizes,
it's about a factor of two to three faster to do this with OCaml
arrays for the matrices rather than the Bigarrays you would need to
use to interface with the BLAS in Lacaml or ocamlgsl.  The reason is
that the memory block for a bigarray is malloc-ed, which is extremely
slow compared to the OCaml native allocation.  That overhead
completely eliminates the advantage of going through C or Fortran
libraries.

The ideal solution would be to use OCaml native float arrays (fast
allocation) with LAPACK (fast matrix multiply), but I'm satisfied with
the performance of my code, and too lazy to put out the effort to
implement the C glue for this.

Just thought you might appreciate another data point.  I have no idea
where the turnover is between 4x4 and 61x61 :).

Will

2009/2/20 Mike Lin :
> Erick, we should compare notes sometime. I have a lot of code for doing this
> kind of stuff (I am working on empirical codon models with 61x61 rate
> matrices). The right way to speed up matrix-vector operations is to use BLAS
> via either Lacaml or ocamlgsl. But if, like me, you like to
> counterproductively fiddle around with stupid things, here's a little ocaml
> library I wrote that links to a C function to do vector-vector dot using
> SSE2 vectorization.
>
> http://www.broad.mit.edu/~mlin/for_others/SuperFast_dot.zip
>
> IIRC in my tests it gave about 50% speedup vs. the obvious C code and 100%
> speedup vs. the obvious ocaml code. But, I am doing 61x61 and I'm not sure
> if the speedup scales down to 20x20 or especially 4x4.
>
> Mike
>
> On Fri, Feb 20, 2009 at 2:53 PM, Erick Matsen  wrote:
>>
>> Wow, once again I am amazed by the vitality of this list. Thank you
>> for your suggestions.
>>
>> Here is the context: we are interested in calculating the likelihood
>> of taxonomic placement of short "metagenomics" sequence fragments from
>> unknown organisms in the ocean. We start by assuming a model of
>> sequence evolution, which is a reversible Markov chain. The taxonomy
>> is represented as a tree, and the sequence information is a collection
>> of likelihoods of sequence identities. As we move up the tree, these
>> sequences "evolve" by getting multiplied by the exponentiated
>> instantaneous Markov matrix.
>>
>> The matrices are of the size of the sequence model: 4x4 when looking
>> at nucleotides, and 20x20 when looking at proteins.
>>
>> The bottleneck is (I mis-spoke before) that we are multiplying many
>> length-4 or length-20 vectors by a collection of matrices which
>> represent the time evolution of those sequences as follows.
>>
>> Outer loop:
>>  modify the amount of time each Markov process runs
>>  exponentiate the rate matrices to get transition matrices
>>
>>  Recur over the tree, starting at the leaves:
>>at a node, multiply all of the daughter likelihood vectors together
>>return the multiplication of that product by the transition matrix
>> (bottleneck!)
>>
>> The trees are on the order of 50 leaves, and there are about 500
>> likelihood vectors done at once.
>>
>> All of this gets run on a big cluster of Xeons. It's not worth
>> parallelizing because we are running many instances of this process
>> already, which fills up the cluster nodes.
>>
>> So far I have been doing the simplest thing possible, which is just to
>> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
>> embarrassing.
>>
>> let mul_vec m v =
>>if Array.length v <> n_cols m then
>>  failwith "mul_vec: matrix size and vector size don't match!";
>>let result = Array.create (n_rows m) N.zero in
>>for i=0 to (n_rows m)-1 do
>>  for j=0 to (n_cols m)-1 do
>>result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
>>  done;
>>done;
>>result
>>
>> I have implemented it in a functorial way for flexibility. N is the
>> number class. How much improvement might I hope for if I make a
>> dedicated float vector multiplication function? I'm sorry, I know
>> nothing about "boxing." Where can I read about that?
>>
>>
>> Thank you again,
>>
>> Erick
>>
>>
>>
>> On Fri, Feb 20, 2009 at 10:46 AM, Xavier Leroy 
>> wrote:
>> >> I'm working on speeding up some code, and I wanted to check with
>> >> someone before implementation.
>> >>
>> >> As you can see below, the code primarily spends its time multiplying
>> >> relatively small matrices. Precision is of course important but not
>> >> an incredibly crucial issue, as the most important thing is relative
>> >> comparison between things which *should* be pretty different.
>> >
>> > You need to post your matrix multiplication code so that the regulars
>> > on this list can tear it to pieces :-)
>> >
>> > From the profile you gave, it looks like you parameterized your
>> > matrix multiplication code over the + and * operations over matrix
>> > elements.

Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Mike Lin
Erick, we should compare notes sometime. I have a lot of code for doing this
kind of stuff (I am working on empirical codon models with 61x61 rate
matrices). The right way to speed up matrix-vector operations is to use BLAS
via either Lacaml or ocamlgsl. But if, like me, you like to
counterproductively fiddle around with stupid things, here's a little ocaml
library I wrote that links to a C function to do vector-vector dot using
SSE2 vectorization.

http://www.broad.mit.edu/~mlin/for_others/SuperFast_dot.zip

IIRC in my tests it gave about 50% speedup vs. the obvious C code and 100%
speedup vs. the obvious ocaml code. But, I am doing 61x61 and I'm not sure
if the speedup scales down to 20x20 or especially 4x4.

Mike

On Fri, Feb 20, 2009 at 2:53 PM, Erick Matsen  wrote:

> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
>
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
>
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
>
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
>
> Outer loop:
>  modify the amount of time each Markov process runs
>  exponentiate the rate matrices to get transition matrices
>
>  Recur over the tree, starting at the leaves:
>at a node, multiply all of the daughter likelihood vectors together
>return the multiplication of that product by the transition matrix
> (bottleneck!)
>
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
>
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
>
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> embarrassing.
>
> let mul_vec m v =
>if Array.length v <> n_cols m then
>  failwith "mul_vec: matrix size and vector size don't match!";
>let result = Array.create (n_rows m) N.zero in
>for i=0 to (n_rows m)-1 do
>  for j=0 to (n_cols m)-1 do
>result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
>  done;
>done;
>result
>
> I have implemented it in a functorial way for flexibility. N is the
> number class. How much improvement might I hope for if I make a
> dedicated float vector multiplication function? I'm sorry, I know
> nothing about "boxing." Where can I read about that?
>
>
> Thank you again,
>
> Erick
>
>
>
> On Fri, Feb 20, 2009 at 10:46 AM, Xavier Leroy 
> wrote:
> >> I'm working on speeding up some code, and I wanted to check with
> >> someone before implementation.
> >>
> >> As you can see below, the code primarily spends its time multiplying
> >> relatively small matrices. Precision is of course important but not
> >> an incredibly crucial issue, as the most important thing is relative
> >> comparison between things which *should* be pretty different.
> >
> > You need to post your matrix multiplication code so that the regulars
> > on this list can tear it to pieces :-)
> >
> > From the profile you gave, it looks like you parameterized your matrix
> > multiplication code over the + and * operations over matrix elements.
> > This is good for genericity but not so good for performance, as it
> > will result in more boxing (heap allocation) of floating-point values.
> > The first thing you should try is write a version of matrix
> > multiplication that is specialized for type "float".
> >
> > Then, there are several ways to write the textbook matrix
> > multiplication algorithm, some of which perform less boxing than
> > others.  Again, post your code and we'll let you know.
> >
> >> Currently I'm just using native (double-precision) ocaml floats and
> >> the native ocaml arrays for a first pass on the problem.  Now I'm
> >> thinking about moving to using float32 bigarrays, and I'm hoping
> >> that the code will double in speed. I'd like to know: is that
> >> realistic? Any other suggestions?
> >
> > It won't double in speed: arithmetic operations will take exactly the
> > same time in single or double precision.  What single-precision
> > bigarrays buy you is halving the memory footprint of your matrices.
> > That could result in better cache behavior and therefore slightly
> > better speed, but it depends 

Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Martin Jambon
Erick Matsen wrote:
> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
> 
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
> 
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
> 
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
> 
> Outer loop:
>   modify the amount of time each Markov process runs
>   exponentiate the rate matrices to get transition matrices
> 
>   Recur over the tree, starting at the leaves:
> at a node, multiply all of the daughter likelihood vectors together
> return the multiplication of that product by the transition matrix
> (bottleneck!)
> 
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
> 
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
> 
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j. Um, this is a bit
> embarrassing.
> 
> let mul_vec m v =
> if Array.length v <> n_cols m then
>   failwith "mul_vec: matrix size and vector size don't match!";
> let result = Array.create (n_rows m) N.zero in
> for i=0 to (n_rows m)-1 do
>   for j=0 to (n_cols m)-1 do
>   result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
>   done;
> done;
> result
> 
> I have implemented it in a functorial way for flexibility. N is the
> number class. How much improvement might I hope for if I make a
> dedicated float vector multiplication function? I'm sorry, I know
> nothing about "boxing." Where can I read about that?

Depending on the savings, you can afford to spend more or less time optimizing
this. Here are some simple things to consider:

In the OCaml land, try first getting rid of the functor (or use a
defunctorizer; ocamldefun?).

Limit memory accesses, by doing something like:

for i = 0 to m - 1 do
  let a_i = a.(i) in  (* a is the m-by-n matrix *)
  for j = 0 to n - 1 do
    let a_ij = a_i.(j) in (* instead of a.(i).(j) *)
    ...
  done
done

Also you can use Array.unsafe_get where it really matters.


You can also use bigarrays and implement the loop in C. It could be fun. I'm
not sure how much it saves.
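[Editor's note: putting the suggestions in this thread together, a
float-specialized version of Erick's mul_vec might look like the
following.  This is a sketch, not code from the thread: the function
name is invented, and n_rows/n_cols are replaced by plain array
lengths so the whole loop works on unboxed float arrays.]

```ocaml
(* Float-specialized matrix-vector multiply: no functor, no boxing.
   The row is hoisted out of the inner loop and the accumulator is a
   local float ref, per Martin's advice; Array.unsafe_get skips bounds
   checks where the dimensions have already been validated. *)
let mul_vec_float (m : float array array) (v : float array) =
  let n_rows = Array.length m in
  let n_cols = if n_rows = 0 then 0 else Array.length m.(0) in
  if Array.length v <> n_cols then
    failwith "mul_vec_float: matrix size and vector size don't match!";
  let result = Array.make n_rows 0.0 in
  for i = 0 to n_rows - 1 do
    let row = m.(i) in            (* one row access per i, not per (i,j) *)
    let acc = ref 0.0 in          (* stays an unboxed float *)
    for j = 0 to n_cols - 1 do
      acc := !acc +. Array.unsafe_get row j *. Array.unsafe_get v j
    done;
    result.(i) <- !acc
  done;
  result
```

For example, multiplying [| [|1.; 2.|]; [|3.; 4.|] |] by [|1.; 1.|]
yields [|3.; 7.|].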


Martin



Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Will M. Farr
Erick,

Sorry about the long email, but here is an explanation of what
"boxing" means, how it slows you down in this case, and how you can
(eventually) figure out whether it will slow you down in general.  I'm
not an expert, so I've probably made mistakes in the following, but I
think the broad outlines are correct.  True experts can weigh in to
correct me.

Due to polymorphism (i.e. the fact that one must be able to stuff
*any* ocaml value into an 'a list), every ocaml value must have a
uniform format and size if it is possible that it can be used
polymorphically.  The current compiler can prove in certain cases that
a value is not used polymorphically; when this is possible, the value
can be stored more efficiently.  For example, in the expression:

let f a b =
  let x = a *. b in
  3.0 *. x

the compiler can tell that x is always going to be used as a float and
not polymorphically.  However, a, b, and (3.0 *. x) can escape to the
larger world, where they could potentially be used polymorphically,
and therefore need to have the uniform representation.

It turns out that the representation chosen for the OCaml runtime is a
32-bit integer (i.e. a pointer).  So, to represent a float that can be
used polymorphically, the runtime allocates 64-bits of memory, stuffs
the float into it, and uses the pointer to the memory to represent the
float in the program.  This procedure is called "boxing".  In the case
above, the contents of x can be represented "unboxed" as a real 64-bit
floating point number, while the contents of a and b, and the result
(3.0 *. x) must be boxed.

The boxing rules for the ocaml compiler are described at
http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html (I don't
know if this is current or not).  The relevant rules for analyzing
your code are probably a) no cross-module inlining of functor argument
functions (i.e. N.add, N.mul, etc), b) boxing of floats at all
(non-inlined) function boundaries, and c) floating point arrays store
their elements unboxed.  There are other rules, given at the above
website.

So, the code you wrote

result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))

allocates a box for the result of (get m i j) (unless this procedure
is inlined, which is possible if it's short, and not coming in via
functor argument---it would be better to use m.(i).(j) if possible),
another box for v.(j), passes them to N.mul, which allocates a box for
the result of the multiplication.  Then a box is allocated for
result.(i), and the previous result and result.(i) are passed to
N.add, which allocates a box for its result.  Then this box is opened
and the value in it is stored in result.(i).  Ugh.  If, instead, you
write

result.(i) <- result.(i) +. (get m i j)*.v.(j)

there is *at most* one box for (get m i j).  If we assume that (get m
i j) is inlined and therefore the boxing overhead is eliminated, the
entire process takes only a few clock cycles.  Compare to possibly
several hundred or more clock cycles above, plus the extra stress on
the GC from collecting all the boxes when they are no longer
referenced by the program, and it adds up to *significant* overhead.

In short, you'll get a lot faster if you specialize the operations to
floats.  Like order-of-magnitude or greater speedups.
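[Editor's note: a toy illustration of the contrast Will describes.
The names are invented; the generic version takes its add/mul as
function arguments, standing in for the functor's N.add and N.mul, so
its float arguments and results must be boxed at each call.]

```ocaml
(* Generic dot product: add/mul are closures, so every element and
   every intermediate result is a boxed float. *)
let dot_generic add mul zero a b =
  let acc = ref zero in
  for i = 0 to Array.length a - 1 do
    acc := add !acc (mul a.(i) b.(i))
  done;
  !acc

(* Specialized to float: +. and *. compile to raw FP instructions,
   float arrays are stored unboxed, and the accumulator never leaves
   a register / the stack. *)
let dot_float (a : float array) (b : float array) =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i) *. b.(i)
  done;
  !acc
```

Both compute the same result, e.g. 11.0 for [|1.; 2.|] and
[|3.; 4.|]; only the allocation behavior differs.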

Good luck!

Will

On Fri, Feb 20, 2009 at 2:53 PM, Erick Matsen  wrote:
> Wow, once again I am amazed by the vitality of this list. Thank you
> for your suggestions.
>
> Here is the context: we are interested in calculating the likelihood
> of taxonomic placement of short "metagenomics" sequence fragments from
> unknown organisms in the ocean. We start by assuming a model of
> sequence evolution, which is a reversible Markov chain. The taxonomy
> is represented as a tree, and the sequence information is a collection
> of likelihoods of sequence identities. As we move up the tree, these
> sequences "evolve" by getting multiplied by the exponentiated
> instantaneous Markov matrix.
>
> The matrices are of the size of the sequence model: 4x4 when looking
> at nucleotides, and 20x20 when looking at proteins.
>
> The bottleneck is (I mis-spoke before) that we are multiplying many
> length-4 or length-20 vectors by a collection of matrices which
> represent the time evolution of those sequences as follows.
>
> Outer loop:
>  modify the amount of time each Markov process runs
>  exponentiate the rate matrices to get transition matrices
>
>  Recur over the tree, starting at the leaves:
>at a node, multiply all of the daughter likelihood vectors together
>return the multiplication of that product by the transition matrix
> (bottleneck!)
>
> The trees are on the order of 50 leaves, and there are about 500
> likelihood vectors done at once.
>
> All of this gets run on a big cluster of Xeons. It's not worth
> parallelizing because we are running many instances of this process
> already, which fills up the cluster nodes.
>
> So far I have been doing the simplest thing possible, which is just to
> multiply the matrices out like \sum_j a_ij v_j.

Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Erick Matsen
Wow, once again I am amazed by the vitality of this list. Thank you
for your suggestions.

Here is the context: we are interested in calculating the likelihood
of taxonomic placement of short "metagenomics" sequence fragments from
unknown organisms in the ocean. We start by assuming a model of
sequence evolution, which is a reversible Markov chain. The taxonomy
is represented as a tree, and the sequence information is a collection
of likelihoods of sequence identities. As we move up the tree, these
sequences "evolve" by getting multiplied by the exponentiated
instantaneous Markov matrix.

The matrices are of the size of the sequence model: 4x4 when looking
at nucleotides, and 20x20 when looking at proteins.

The bottleneck is (I mis-spoke before) that we are multiplying many
length-4 or length-20 vectors by a collection of matrices which
represent the time evolution of those sequences as follows.

Outer loop:
  modify the amount of time each Markov process runs
  exponentiate the rate matrices to get transition matrices

  Recur over the tree, starting at the leaves:
    at a node, multiply all of the daughter likelihood vectors together
    return that product multiplied by the transition matrix (bottleneck!)

The trees are on the order of 50 leaves, and there are about 500
likelihood vectors done at once.
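The recursion described above can be sketched as follows. All type and function names here are invented for illustration; they are not the poster's actual code, and a real version would handle empty node lists and reuse buffers.

```ocaml
(* Hypothetical sketch of the likelihood recursion described above. *)
type tree = Leaf of float array | Node of tree list

(* elementwise product of daughter likelihood vectors *)
let vec_mul a b = Array.mapi (fun i x -> x *. b.(i)) a

(* dense matrix-vector product: (m v)_i = sum_j m_ij v_j *)
let mat_vec m v =
  Array.map
    (fun row ->
       let s = ref 0.0 in
       Array.iteri (fun j x -> s := !s +. x *. v.(j)) row;
       !s)
    m

(* recur over the tree: combine the daughters' vectors, then evolve the
   result by the transition matrix (the bottleneck step) *)
let rec likelihood trans = function
  | Leaf v -> mat_vec trans v
  | Node daughters ->
    let vs = List.map (likelihood trans) daughters in
    let combined = List.fold_left vec_mul (List.hd vs) (List.tl vs) in
    mat_vec trans combined
```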

All of this gets run on a big cluster of Xeons. It's not worth
parallelizing because we are running many instances of this process
already, which fills up the cluster nodes.

So far I have been doing the simplest thing possible, which is just to
multiply the matrices out as \sum_j a_ij v_j. Um, this is a bit
embarrassing.

let mul_vec m v =
  if Array.length v <> n_cols m then
    failwith "mul_vec: matrix size and vector size don't match!";
  (* result.(i) = sum_j m(i,j) * v(j), via the generic N operations *)
  let result = Array.create (n_rows m) N.zero in
  for i = 0 to n_rows m - 1 do
    for j = 0 to n_cols m - 1 do
      result.(i) <- N.add result.(i) (N.mul (get m i j) v.(j))
    done
  done;
  result

I have implemented it in a functorial way for flexibility. N is the
number class. How much improvement might I hope for if I make a
dedicated float vector multiplication function? I'm sorry, I know
nothing about "boxing." Where can I read about that?
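For comparison, a float-specialized version might look like the sketch below. It assumes the matrix is a plain float array array, which may differ from the actual Mat representation; the point is that monomorphic float arrays are stored unboxed, so the N.add/N.mul closure calls and per-element heap allocations disappear.

```ocaml
(* Float-specialized sketch (assumed representation: float array array). *)
let mul_vec_float (m : float array array) (v : float array) =
  let n_rows = Array.length m in
  let n_cols = Array.length v in
  let result = Array.make n_rows 0.0 in
  for i = 0 to n_rows - 1 do
    let row = m.(i) in
    if Array.length row <> n_cols then
      failwith "mul_vec_float: matrix size and vector size don't match!";
    (* accumulate in a local float ref so the running sum stays unboxed *)
    let s = ref 0.0 in
    for j = 0 to n_cols - 1 do
      s := !s +. row.(j) *. v.(j)
    done;
    result.(i) <- !s
  done;
  result
```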


Thank you again,

Erick



On Fri, Feb 20, 2009 at 10:46 AM, Xavier Leroy  wrote:
>> I'm working on speeding up some code, and I wanted to check with
>> someone before implementation.
>>
>> As you can see below, the code primarily spends its time multiplying
>> relatively small matrices. Precision is of course important but not
>> an incredibly crucial issue, as the most important thing is relative
>> comparison between things which *should* be pretty different.
>
> You need to post your matrix multiplication code so that the regulars
> on this list can tear it to pieces :-)
>
> From the profile you gave, it looks like you parameterized your matrix
> multiplication code over the + and * operations over matrix elements.
> This is good for genericity but not so good for performance, as it
> will result in more boxing (heap allocation) of floating-point values.
> The first thing you should try is write a version of matrix
> multiplication that is specialized for type "float".
>
> Then, there are several ways to write the textbook matrix
> multiplication algorithm, some of which perform less boxing than
> others.  Again, post your code and we'll let you know.
>
>> Currently I'm just using native (double-precision) ocaml floats and
>> the native ocaml arrays for a first pass on the problem.  Now I'm
>> thinking about moving to using float32 bigarrays, and I'm hoping
>> that the code will double in speed. I'd like to know: is that
>> realistic? Any other suggestions?
>
> It won't double in speed: arithmetic operations will take exactly the
> same time in single or double precision.  What single-precision
> bigarrays buy you is halving the memory footprint of your matrices.
> That could result in better cache behavior and therefore slightly
> better speed, but it depends very much on the sizes and number of your
> matrices.
>
> - Xavier Leroy
>

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] Building pcre-ocaml on OCaml 3.11.0 on MinGW

2009-02-20 Thread Gerd Stolpmann
GODI now includes MinGW support, and pcre is among the packages that
actually work. Just use it, or look there to see how the build is done.

Note that you should use GODI for 3.10, because there is still a bug in
the 3.11 version.

Gerd

On Friday, 20.02.2009, at 14:28, David Allsopp wrote:
> I've just had an enlightening few hours getting pcre-ocaml to compile under
> Windows (I tried a few years ago and, very lazily, just gave up). I've
> managed to get it to work but I'm wondering whether anyone else has done
> this and, if so, whether they can explain/confirm/correct a couple of the
> steps involved. I'm very much indebted to Alain Frisch's instructions for
> building PCRE under OCaml 3.10.0 which are part of the CDuce distribution or
> I would've been completely at sea with this!
> 
> The main thing that's got me puzzled is the renaming of libpcre.dll.a and
> libpcre.a that I have to do to get the thing to link.
> 
> Note that I'm building the "official" way - so MinGW running within Cygwin.
> 
> 
> +++
> Building pcre for MinGW
>   (Ensure that Cygwin's PCRE libraries are *not* installed)
>   Unpacked PCRE 7.8
>   ./configure --prefix="C:/Dev/OCaml" \  # set to install PCRE
>       --includedir="C:/Dev/OCaml/lib" \  # to my OCaml tree.
>       --disable-cpp \                    # from Alain's instructions.
>       --enable-utf8 \                    # Similarly.
>       --build=mingw32 \                  # MinGW, not Cygwin build
>       CC="gcc -mno-cygwin"               # Necessary to ensure that
>                                          # autoconf detects the correct
>                                          # library and include dirs
>                                          # when querying gcc (CFLAGS
>                                          # won't work here)
>   make
>   make install
> 
> 
> +++
> Building pcre-ocaml
>   (Note that with older (< 0.14) versions of flexlink, the linker errors
> noted don't show up and the resulting library is broken)
>   Unpacked pcre-ocaml 5.15.1
>   Edit pcre-ocaml-release-5.15.1/Makefile.conf to contain:
> export LIBDIRS := C:/Dev/OCaml/lib # Location of PCRE
> export MINGW=1 # MinGW build
> export CC=gcc  # Or you get lots of errors!
>   patch -p0 -i pcre-ocaml-release-5.15.1.patch # (attached)
> 
>   The patch "fixes" two things in OCamlMakefile
> a) It causes the ocamlc -where check to pass the result through cygpath.
> My OCAMLLIB variable is correctly set (for Windows) as C:\Dev\OCaml\lib and
> Cygwin configure scripts should generally respect that (build in Cygwin, run
> in Windows so OCAMLLIB is a Windows environment variable...). I've been lazy
> though and not done the same thing for the camlp4 -where test...
> b) It adds a simple check for OCaml 3.11 (it would be better if it did a
> >= 3.11 check but I haven't bothered to bring up the GNU make info pages to
> check the syntax for doing that!) and uses ocamlmklib instead of manually
> building the stub libraries if OCaml 3.11 is found - the manual build
> instructions included in OCamlMakefile are for 3.10 and earlier and so don't
> work (i.e. non-flexlink linking)
> 
>   OK, so at this stage it looks as though we should be ready to build. But
> if I run make (with flexlink 0.14 or later) then I get the following errors
> (compiling with flexlink 0.13 works, but the resulting library is broken):
> 
> ocamlmklib -LC:/Dev/OCaml/lib -o pcre_stubs  pcre_stubs.o -lpcre
> c:\Users\DRA\AppData\Local\Temp\dyndll8e6a10.o:pcre_stubs.c:(.text+0x205):
> undefined reference to `__imp__pcre_callout'
> [and several more missing __imp__pcre_... messages]
> 
>   The problem is in C:\Dev\OCaml\lib, it appears. In there are libpcre.a,
> libpcre.dll.a and libpcre.la. If I rename libpcre.a to libpcre.old.a and
> then libpcre.dll.a to libpcre.a then the build works and the resulting
> library builds. As far as I can tell, this is something to do with libtool
> but I know very little about this - is the inability of the library to link
> without renaming these files something to do with using flexlink as the
> linker? If I link a test program in C using -lpcre then it works - but is
> that because gcc knows how to read .la files and looks for the libpcre.dll.a
> file correctly?
> 
>   However, once this is followed through, the library does correctly build
> and install - and the examples all seem to be working. So, finally, it's
> cheerio to the Str module for me :o)
> 
>   Any pointers appreciated!
> 
> 
> David

Re: [Caml-list] true parallelism / threads

2009-02-20 Thread Gerd Stolpmann

On Friday, 20.02.2009, at 10:40 -0600, Atmam Ta wrote:
> Hi,
> 
> I am trying to evaluate ocaml for a project involving large scale
> numerical calculations. We would need parallel processing, i.e. a
> library that distributes jobs across multiple processors within a
> machine and across multiple PCs.
> Speed and easy programmability are important. I have tried to search
> this issue first, but the postings I found were usually negative and
> 4-5 years old. On the other hand, I see a number of libraries in the
> Hump that by now might be taking care of these things.
> 
> My question is: is ocaml good for parallel processing / threaded
> computation, are there (mature) libraries or tools that let developers
> make use of multicore and multimachine environments?

Ocamlnet contains a mature SunRPC implementation, and a framework for
multi-processing. It is used in professional cluster environments, e.g.
by the Wink people search engine.

See here for a commented example:
http://blog.camlcity.org/blog/parallelmm.html

Gerd
-- 

Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
g...@gerd-stolpmann.de  http://www.gerd-stolpmann.de
Phone: +49-6151-153855  Fax: +49-6151-997714





Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Xavier Leroy
> I'm working on speeding up some code, and I wanted to check with
> someone before implementation.
> 
> As you can see below, the code primarily spends its time multiplying
> relatively small matrices. Precision is of course important but not
> an incredibly crucial issue, as the most important thing is relative
> comparison between things which *should* be pretty different.

You need to post your matrix multiplication code so that the regulars
on this list can tear it to pieces :-)

From the profile you gave, it looks like you parameterized your matrix
multiplication code over the + and * operations over matrix elements.
This is good for genericity but not so good for performance, as it
will result in more boxing (heap allocation) of floating-point values.
The first thing you should try is write a version of matrix
multiplication that is specialized for type "float".

Then, there are several ways to write the textbook matrix
multiplication algorithm, some of which perform less boxing than
others.  Again, post your code and we'll let you know.
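To make the boxing point concrete, here is a toy contrast (names invented for illustration): the generic version calls the `add` and `mul` closures on boxed floats, while the float-specialized one compiles to raw floating-point operations on unboxed values.

```ocaml
(* Generic over the element operations: every addition and multiplication
   goes through a closure, and intermediate floats are heap-allocated. *)
let dot_generic add mul zero a b =
  let s = ref zero in
  Array.iteri (fun i x -> s := add !s (mul x b.(i))) a;
  !s

(* Specialized to float: direct +. and *. on unboxed values. *)
let dot_float (a : float array) (b : float array) =
  let s = ref 0.0 in
  Array.iteri (fun i x -> s := !s +. x *. b.(i)) a;
  !s
```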

> Currently I'm just using native (double-precision) ocaml floats and
> the native ocaml arrays for a first pass on the problem.  Now I'm
> thinking about moving to using float32 bigarrays, and I'm hoping
> that the code will double in speed. I'd like to know: is that
> realistic? Any other suggestions?

It won't double in speed: arithmetic operations will take exactly the
same time in single or double precision.  What single-precision
bigarrays buy you is halving the memory footprint of your matrices.
That could result in better cache behavior and therefore slightly
better speed, but it depends very much on the sizes and number of your
matrices.
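For reference, a minimal sketch of the single- vs double-precision trade-off using the standard Bigarray module: a float32 element occupies 4 bytes instead of 8, halving the footprint, while the values you read and write are still ordinary OCaml floats (doubles).

```ocaml
(* 20x20 matrices in single and double precision; same API, half the
   memory for the float32 one. *)
let m32 = Bigarray.Array2.create Bigarray.float32 Bigarray.c_layout 20 20
let m64 = Bigarray.Array2.create Bigarray.float64 Bigarray.c_layout 20 20

let () =
  Bigarray.Array2.fill m32 1.5;
  Bigarray.Array2.fill m64 1.5
```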

- Xavier Leroy



Re: [Caml-list] Lazy and Threads

2009-02-20 Thread Xavier Leroy
Victor Nicollet wrote:
> I'm working with both lazy expressions and threads, and noticed that the
> evaluation of lazy expressions is not thread-safe:

Yaron Minsky wrote:
> At a minimum, this seems like a bug in the documentation. The
> documentation states very clearly that Undefined is called when a value
> is recursively forced.  Clearly, you get the same error when you force a
> lazy value that is in the process of being forced for the first time
> It does seem like fixing the behavior to match the current documentation
> would be superior to fixing the documentation to match the current behavior.
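The documented recursive-forcing behavior is easy to reproduce without any threads at all; a minimal sketch:

```ocaml
(* A suspension that forces itself while it is being forced:
   Lazy.force raises Lazy.Undefined, as the documentation describes. *)
let rec l = lazy (Lazy.force l + 1)

let outcome =
  try ignore (Lazy.force l); "no exception"
  with Lazy.Undefined -> "Lazy.Undefined"
```

The thread-safety question is whether a second thread forcing `l` mid-evaluation should see this same exception, block, or recompute.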

It's not just the Lazy module: in general, the whole standard library
is not thread-safe.  Probably that should be stated in the
documentation for the threads library, but there isn't much point in
documenting it per standard library module.

As to making the standard library thread-safe by sprinkling it with
mutexes, Java-style: no way.  There is one part of the stdlib that is
made thread-safe this way: buffered I/O operations.  (The reason is
that, owing to the C implementation of some of these operations, a
race condition in buffered I/O could actually crash the whole program,
rather than just result in unexpected results as in the case of pure
Caml modules.)  You (Yaron) and others recently complained that such
locking around buffered I/O made some operations too slow for your
taste.  Wait until you wrap a mutex around all Lazy.force
operations...

More generally speaking, locking within a standard library is the
wrong thing to do: that doesn't prevent race conditions at the
application level, and for reasonable performance you need to lock at
a much coarser grain, again at the application level.  (That's one of
the things that make shared-memory programming with threads and locks
so incredibly painful and non-modular.)

Coming back to Victor's original question:

> Aside from handling a mutex myself (which I don't find very elegant for
> a read operation in a pure functional program) is there a solution I can
> use to manipulate lazy expressions in a pure functional multi-threaded
> program?

You need to think more / tell us more about what you're trying to
achieve with sharing lazy values between threads.

If your program is really purely functional (i.e. no I/O of any kind),
OCaml's multithreading is essentially useless, as you're not going to
get any speedup from it and would be better off with sequential
computations.  If your program does use threads to overlap computation
and I/O, using threads might be warranted, but then what is the
appropriate granularity of locking that you'd need?

A somewhat related question is: what semantics do you expect from
concurrent Lazy.force operations on a shared suspension?  One thread
blocks while the other completes the computation?  Same but with busy
waiting?  (if the computations are generally small).  Or do you want
speculative execution?  (Both threads may evaluate the suspended
computation.)

There is no unique answer to these questions: it all depends on what
you're trying to achieve...

- Xavier Leroy



Re: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Jon Harrop
On Friday 20 February 2009 15:40:00 Erick Matsen wrote:
> Hello Ocaml community---
>
> I'm working on speeding up some code, and I wanted to check with
> someone before implementation.
>
> As you can see below, the code primarily spends its time multiplying
> relatively small matrices. Precision is of course important but not an
> incredibly crucial issue, as the most important thing is relative
> comparison between things which *should* be pretty different. Currently I'm
> just using native
> (double-precision) ocaml floats and the native ocaml arrays for a first
> pass on the problem.
>
> Now I'm thinking about moving to using float32 bigarrays, and I'm hoping
> that the code will double in speed. I'd like to know: is that realistic?
> Any other suggestions?

What exactly are you doing? Exactly how big are the matrices? What is your 
exact code? What is the higher-level algorithm using the matrix multiply? Are 
you doing a loop just over many matrix multiplies? What platform and 
architecture are you using? Is the code parallelized?

Depending upon the answers to the above questions, you may be able to achieve 
huge performance improvements using tools like LLVM to generate SSE code 
whilst still acting upon OCaml's data structures.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



Re: [Caml-list] true parallelism / threads

2009-02-20 Thread Markus Mottl
2009/2/20 Atmam Ta :
> My question is: is ocaml good for parallel processing / threaded computation,
> are there (mature) libraries or tools that let developers make use of
> multicore and multimachine environments?

For heavy-duty linear algebra you might want to use Lacaml:

  http://ocaml.info/home/ocaml_sources.html#lacaml

It interfaces almost all functions in BLAS and LAPACK and allows
executing multiple computations in several threads in parallel on
multi-core machines.  If you combine this with some tool for
distributed computation (e.g. MPI-based, etc.), you should get what
you need.

Regards,
Markus

-- 
Markus Mottl    http://www.ocaml.info    markus.mo...@gmail.com



[Caml-list] Draft paper submission deadline extended: SETP-09

2009-02-20 Thread John Edward
Draft paper submission deadline extended: SETP-09
 
The deadline for draft paper submission at the 2009 International Conference on 
Software Engineering Theory and Practice (SETP-09) (website: 
http://www.PromoteResearch.org ) is extended due to numerous requests from the 
authors. The conference will be held during July 13-16 2009 in Orlando, FL, 
USA. We invite draft paper submissions. The conference will take place at the 
same time and venue where several other international conferences are taking 
place. The other conferences include:
· International Conference on Artificial Intelligence and Pattern 
Recognition (AIPR-09) 
· International Conference on Automation, Robotics and Control Systems 
(ARCS-09)
· International Conference on Bioinformatics, Computational Biology, 
Genomics and Chemoinformatics (BCBGC-09) 
· International Conference on Enterprise Information Systems and Web 
Technologies (EISWT-09)
· International Conference on High Performance Computing, Networking 
and Communication Systems (HPCNCS-09) 
· International Conference on Information Security and Privacy (ISP-09)
· International Conference on Recent Advances in Information Technology 
and Applications (RAITA-09)
· International Conference on Theory and Applications of Computational 
Science (TACS-09)
· International Conference on Theoretical and Mathematical Foundations 
of Computer Science (TMFCS-09)
 
The website http://www.PromoteResearch.org contains more details.
 
Sincerely
John Edward
Publicity committee
 




Re: [Caml-list] true parallelism / threads

2009-02-20 Thread Will M. Farr
Atmam,

I've had some luck using OCaml with MPI (using the OCamlMPI library at
http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=401 ).  That may not
satisfy your needs as far as multi-core goes, but perhaps it will.  I
can't speak to the speed of the interface (my operations were
compute-bound on the individual processors, not communication bound,
so any OCaml overhead on the MPI communication was lost in the noise),
but it was definitely easy to use.  At the extreme easy-to-use end,
you can simply send arbitrary OCaml values over the MPI channels; for
more performance, you can use the functions specific to common types
(float arrays, int arrays, etc) to speed up the operations.
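Under the hood, sending arbitrary OCaml values amounts to serializing them with the Marshal module; the sketch below is a round trip through a string, not OCamlMPI's actual API. The type annotation on the receiving side is needed because Marshal.from_string is unsafe and untyped.

```ocaml
(* Serialize a value to a byte string, as an MPI send would... *)
let send_buf : string = Marshal.to_string (1, "hello", [| 2.0; 3.0 |]) []

(* ...and reconstruct it on the receiving side. *)
let received : int * string * float array = Marshal.from_string send_buf 0
```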

As far as single-core OCaml speed goes, I find that it is always
within a factor of 2 of C for straight-line loops (i.e. matrix-vector
multiply, etc), and usually *much* faster whenever more complicated
data structures are involved (maps, binary trees, etc), unless you
really sweat blood with the C implementation.

Hope this helps!

Will

2009/2/20 Atmam Ta :
> Hi,
>
> I am trying to evaluate ocaml for a project involving large scale numerical
> calculations. We would need parallel processing, i.e. a library that
> distributes jobs accross multiple processors within a machine and accross
> multiple PCs.
> Speed and easy programability are important. I have tried to search this
> issue first, but the postings I found were usually negative and 4-5 years
> old. On the other hand, I see a number of libraries in the Hump that by now
> might be taking care of these things.
>
> My question is: is ocaml good for parallel processing / threaded computation,
> are there (mature) libraries or tools that let developers make use of
> multicore and multimachine environments?
>
> cheers,
> Atmam
>



Re: [Caml-list] true parallelism / threads

2009-02-20 Thread Hezekiah M. Carty
2009/2/20 Atmam Ta :
> Hi,
>
> I am trying to evaluate ocaml for a project involving large scale numerical
> calculations. We would need parallel processing, i.e. a library that
> distributes jobs accross multiple processors within a machine and accross
> multiple PCs.
> Speed and easy programability are important. I have tried to search this
> issue first, but the postings I found were usually negative and 4-5 years
> old. On the other hand, I see a number of libraries in the Hump that by now
> might be taking care of these things.
>
> My question is: is ocaml good for parallel processing / hreaded computation,
> are there (mature) libraries or tools that let developers make use of
> multicore and multimachine environments?
>
> cheers,
> Atmam

There are several libraries available which seem to be reasonably
usable in their current state.
Distributed processing across multiple machines:
- OCAMLMPI - http://pauillac.inria.fr/~xleroy/software.html
- OCamlP3l - http://camlp3l.inria.fr/eng.htm
- BSML - http://frederic.loulergue.eu/research/bsmllib/bsml-0.4beta.html

Fork-based parallelism for exploiting multiple cores/processors locally:
- Prelude.ml - http://github.com/kig/preludeml/tree/master

There is also JoCaml (http://jocaml.inria.fr/), which is an extension
of OCaml itself.  JoCaml has examples for various distributed
processing methods.

Hez

-- 
Hezekiah M. Carty
Graduate Research Assistant
University of Maryland
Department of Atmospheric and Oceanic Science



Re: [Caml-list] true parallelism / threads

2009-02-20 Thread Yoann Padioleau
Atmam Ta  writes:

> Hi,
>
> I am trying to evaluate ocaml for a project involving large scale numerical
> calculations. We would need parallel processing, i.e. a library that
> distributes jobs across multiple processors within a machine and across
> multiple PCs.
> Speed and easy programmability are important. I have tried to search this
> issue first, but the postings I found were usually negative and 4-5 years
> old. On the other hand, I see a number of libraries in the Hump that by now
> might be taking care of these things.
>
> My question is: is ocaml good for parallel processing / threaded computation,
> are there (mature) libraries or tools that let developers make use of
> multicore and multimachine environments?

MPI ... 
http://pauillac.inria.fr/~xleroy/software.html#ocamlmpi

Then it's quite easy to define your own helpers on top of that.
Here, for example, is my poor man's Google map-reduce in OCaml:
http://aryx.kicks-ass.org/~pad/darcs/commons/distribution.ml
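The shape of such a helper, reduced to its sequential skeleton (the signature is invented for illustration; a distributed version would farm the `map` calls out to MPI workers and fold results as they arrive):

```ocaml
(* Sequential skeleton of a map-reduce helper. *)
let map_reduce ~map ~reduce ~init xs =
  List.fold_left (fun acc x -> reduce acc (map x)) init xs
```

For example, `map_reduce ~map:(fun x -> x * x) ~reduce:( + ) ~init:0 [1; 2; 3; 4]` sums the squares of the inputs.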


> cheers,
> Atmam



[Caml-list] true parallelism / threads

2009-02-20 Thread Atmam Ta
Hi,

I am trying to evaluate ocaml for a project involving large scale numerical
calculations. We would need parallel processing, i.e. a library that
distributes jobs across multiple processors within a machine and across
multiple PCs.
Speed and easy programmability are important. I have tried to search this
issue first, but the postings I found were usually negative and 4-5 years
old. On the other hand, I see a number of libraries in the Hump that by now
might be taking care of these things.

My question is: is ocaml good for parallel processing / threaded computation,
are there (mature) libraries or tools that let developers make use of
multicore and multimachine environments?

cheers,
Atmam


Re: [Caml-list] ocamlbuild & deps

2009-02-20 Thread Daniel Bünzli


On 20 Feb 2009 at 16:39, Romain Bardou wrote:

> I think there is a difference. It is indeed an optimization issue but
> not at the level of Ocamlbuild itself: it is at the level of your
> compilation process. If A *dynamically* depends on B, and your whole
> project (say, 10 hours of compilation) depends on A, but you have no
> way to build B, then Ocamlbuild will start to compile your project
> until it finds out that A cannot be built (maybe several hours later).
> If B had been put as a ~dep, then Ocamlbuild would not even have
> started building the project in the first place, saving you a lot of
> time.

Er, no. If B cannot be built then the compilation of A stops and the
compilation of your project stops.

It is however true that if A has a dependency on a heavy C in parallel
with B, you'll have to wait for the end of C. But even in this case, it's
just a matter of calling 'build' with B and C in a sensible order (and
not in parallel).


Best,

Daniel


RE: [Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread RABIH.ELCHAAR
I don't think you can do better than calling some C functions (bounds
checking, ...).
Why not have a look at the OCaml bindings of C libraries (using bigarrays),
like ocamlgsl (O. Andrieu) http://oandrieu.nerim.net/ocaml/gsl/
or Lacaml http://caml.inria.fr/cgi-bin/hump.fr.cgi?contrib=255

Hope this helps

Rabih

-----Original Message-----
From: caml-list-boun...@yquem.inria.fr [mailto:caml-list-boun...@yquem.inria.fr]
On behalf of Erick Matsen
Sent: Friday, 20 February 2009 16:40
To: caml-l...@inria.fr
Subject: [Caml-list] speeding up matrix multiplication (newbie question)

Hello Ocaml community---


I'm working on speeding up some code, and I wanted to check with
someone before implementation.

As you can see below, the code primarily spends its time multiplying relatively
small matrices. Precision is of course important but not an incredibly crucial
issue, as the most important thing is relative comparison between things which
*should* be pretty different. Currently I'm just using native
(double-precision) ocaml floats and the native ocaml arrays for a first pass on
the problem.

Now I'm thinking about moving to using float32 bigarrays, and I'm hoping that
the code will double in speed. I'd like to know: is that realistic? Any other
suggestions?


Thank you,

Erick



==== profiling information ====

  %   cumulative    self                 self    total
 time   seconds    seconds      calls   s/call  s/call  name
30.27      7.44       7.44     836419     0.00    0.00  camlMat__mul_vec_263
15.42     11.23       3.79  335237785     0.00    0.00  camlMat__get_447
14.65     14.83       3.60  334624076     0.00    0.00  camlNumber__mul_185
13.75     18.21       3.38  682814594     0.00    0.00  caml_apply2
11.31     20.99       2.78  334624076     0.00    0.00  camlNumber__add_183
 6.02     22.47       1.48  335724401     0.00    0.00  caml_apply3
 1.14     22.75       0.28     480860     0.00    0.00  camlDiagd__fun_304
 1.06     23.01       0.26     159338     0.00    0.00  caml_oldify_local_roots
 1.06     23.27       0.26      79634     0.00    0.00  sweep_slice
 0.90     23.49       0.22      79828     0.00    0.00  mark_slice
 0.65     23.65       0.16   10455018     0.00    0.00  camlQtree__code_begin
 0.61     23.80       0.15    1517329     0.00    0.00  caml_oldify_one
 0.57     23.94       0.14   17592082     0.00    0.00  camlMat__n_cols_458
 0.57     24.08       0.14   13102569     0.00    0.00  caml_modify
 0.57     24.22       0.14     522761     0.00    0.00  camlArray__mapi_142
...



Re: [Caml-list] ocamlbuild, disabling caml rules?

2009-02-20 Thread Romain Bardou

Daniel Bünzli a écrit :

Is it possible to disable the default rules ?

I'm using ocamlbuild for a plain C project with my own rules and it is 
painfull when something fails that it fallbacks on ocaml C's compilation 
rules. These rules wil anyway fail and they override the error that 
occured with my rule that should have been used.


I don't think there is any. In fact, the rules are hard-coded in the 
source code (in ocaml_specific.ml) like this:


rule "ocaml: ml -> d.cmo & cmi" ...;;
rule "ocaml: ml -> cmo & cmi" ...;;

So there is no test; they are always run. And there is no way to delete 
a rule, AFAIK.


--
Romain Bardou



[Caml-list] speeding up matrix multiplication (newbie question)

2009-02-20 Thread Erick Matsen
Hello Ocaml community---


I'm working on speeding up some code, and I wanted to check with
someone before implementation.

As you can see below, the code spends most of its time multiplying relatively
small matrices. Precision is of course important, but not absolutely crucial:
what matters most is the relative comparison between quantities that *should*
be quite different. Currently I'm just using native (double-precision) OCaml
floats and native OCaml arrays for a first pass at the problem.

Now I'm thinking about moving to float32 bigarrays, and I'm hoping that the
code will double in speed. I'd like to know: is that realistic? Any other
suggestions?


Thank you,

Erick



profiling information

  %    cumulative    self                  self     total
 time    seconds    seconds      calls    s/call   s/call   name
30.27      7.44       7.44      836419     0.00     0.00   camlMat__mul_vec_263
15.42     11.23       3.79   335237785     0.00     0.00   camlMat__get_447
14.65     14.83       3.60   334624076     0.00     0.00   camlNumber__mul_185
13.75     18.21       3.38   682814594     0.00     0.00   caml_apply2
11.31     20.99       2.78   334624076     0.00     0.00   camlNumber__add_183
 6.02     22.47       1.48   335724401     0.00     0.00   caml_apply3
 1.14     22.75       0.28      480860     0.00     0.00   camlDiagd__fun_304
 1.06     23.01       0.26      159338     0.00     0.00   caml_oldify_local_roots
 1.06     23.27       0.26       79634     0.00     0.00   sweep_slice
 0.90     23.49       0.22       79828     0.00     0.00   mark_slice
 0.65     23.65       0.16    10455018     0.00     0.00   camlQtree__code_begin
 0.61     23.80       0.15     1517329     0.00     0.00   caml_oldify_one
 0.57     23.94       0.14    17592082     0.00     0.00   camlMat__n_cols_458
 0.57     24.08       0.14    13102569     0.00     0.00   caml_modify
 0.57     24.22       0.14      522761     0.00     0.00   camlArray__mapi_142
...
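A float32 Bigarray version of the hot loop might look like the sketch below; `mat_mul` and the naive triple loop are illustrative assumptions, not the actual code being profiled. Note that float32 halves memory traffic, but elements are still converted to native floats on access, so a full 2x speedup is not guaranteed.

```ocaml
(* Hypothetical sketch: dense product of two float32 Bigarray matrices. *)
open Bigarray

let mat_mul a b =
  let n = Array2.dim1 a and k = Array2.dim2 a and m = Array2.dim2 b in
  assert (Array2.dim1 b = k);
  let c = Array2.create float32 c_layout n m in
  for i = 0 to n - 1 do
    for j = 0 to m - 1 do
      let s = ref 0.0 in
      for p = 0 to k - 1 do
        s := !s +. a.{i, p} *. b.{p, j}  (* a.{i,p} reads are unboxed by the compiler *)
      done;
      c.{i, j} <- !s
    done
  done;
  c
```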



Re: [Caml-list] ocamlbuild & deps

2009-02-20 Thread Romain Bardou
Am I right in thinking that, in rule specifications, we could get rid of 
the ~dep(s) parameter of rules and have all deps be specified/discovered 
dynamically via the 'build' argument? Stated otherwise, is ~dep(s) just 
an optimization?


Out of curiosity, any idea of the cost of suppressing these arguments 
(i.e. was that road actually followed at some point)?


If the answer to the first question is yes, then I think the 
documentation could be made clearer by stating that whatever is asked to 
be built via the 'build' argument is considered a dependency. However, if 
you know some deps statically, you can specify them with the dep(s) 
argument; this will just implicitly add them to the list given to the 
'build' argument.


I think there is a difference. It is indeed an optimization issue, but 
not at the level of Ocamlbuild itself: it is at the level of your 
compilation process. If A *dynamically* depends on B, and your whole 
project (say, 10 hours of compilation) depends on A, but you have no way 
to build B, then Ocamlbuild will start to compile your project until it 
finds out that A cannot be built (maybe several hours later). If B had 
been declared as a ~dep, then Ocamlbuild would not even have started 
building the project in the first place, saving you a lot of time.


Well, at least that's how I understand it ;)

--
Romain Bardou



[Caml-list] Building pcre-ocaml on OCaml 3.11.0 on MinGW

2009-02-20 Thread David Allsopp
I've just had an enlightening few hours getting pcre-ocaml to compile under
Windows (I tried a few years ago and, very lazily, just gave up). I've
managed to get it to work but I'm wondering whether anyone else has done
this and, if so, whether they can explain/confirm/correct a couple of the
steps involved. I'm very much indebted to Alain Frisch's instructions for
building PCRE under OCaml 3.10.0 which are part of the CDuce distribution or
I would've been completely at sea with this!

The main thing that's got me puzzled is the renaming of libpcre.dll.a and
libpcre.a that I have to do to get the thing to link.

Note that I'm building the "official" way - so MinGW running within Cygwin.


+++
Building pcre for MinGW
  (Ensure that Cygwin's PCRE libraries are *not* installed)
  Unpacked PCRE 7.8
  ./configure --prefix="C:/Dev/OCaml" \   # set to install PCRE
      --includedir="C:/Dev/OCaml/lib" \   # to my OCaml tree.
      --disable-cpp \                     # from Alain's instructions.
      --enable-utf8 \                     # Similarly.
      --build=mingw32 \                   # MinGW, not Cygwin build
      CC="gcc -mno-cygwin"                # Necessary to ensure that
                                          # autoconf detects the correct
                                          # library and include dirs
                                          # when querying gcc (CFLAGS
                                          # won't work here)
  make
  make install


+++
Building pcre-ocaml
  (Note that with older (< 0.14) versions of flexlink, the linker errors
noted below don't show up and the resulting library is broken)
  Unpacked pcre-ocaml 5.15.1
  Edit pcre-ocaml-release-5.15.1/Makefile.conf to contain:
export LIBDIRS := C:/Dev/OCaml/lib # Location of PCRE
export MINGW=1 # MinGW build
export CC=gcc  # Or you get lots of errors!
  patch -p0 -i pcre-ocaml-release-5.15.1.patch # (attached)

  The patch "fixes" two things in OCamlMakefile
a) It causes the ocamlc -where check to pass the result through cygpath.
My OCAMLLIB variable is correctly set (for Windows) as C:\Dev\OCaml\lib and
Cygwin configure scripts should generally respect that (build in Cygwin, run
in Windows so OCAMLLIB is a Windows environment variable...). I've been lazy
though and not done the same thing for the camlp4 -where test...
b) It adds a simple check for OCaml 3.11 (it would be better if it did a
>= 3.11 check but I haven't bothered to bring up the GNU make info pages to
check the syntax for doing that!) and uses ocamlmklib instead of manually
building the stub libraries if OCaml 3.11 is found - the manual build
instructions included in OCamlMakefile are for 3.10 and earlier and so don't
work (i.e. non-flexlink linking)

  OK, so at this stage it looks as though we should be ready to build. But
if I run make (with flexlink 0.14 or later) then I get the following errors
(compiling with flexlink 0.13 works, but the resulting library is broken):

ocamlmklib -LC:/Dev/OCaml/lib -o pcre_stubs  pcre_stubs.o -lpcre
c:\Users\DRA\AppData\Local\Temp\dyndll8e6a10.o:pcre_stubs.c:(.text+0x205):
undefined reference to `__imp__pcre_callout'
[and several more missing __imp__pcre_... messages]

  The problem is in C:\Dev\OCaml\lib, it appears. In there are libpcre.a,
libpcre.dll.a and libpcre.la. If I rename libpcre.a to libpcre.old.a and
then libpcre.dll.a to libpcre.a then the build works and the resulting
library builds. As far as I can tell, this is something to do with libtool
but I know very little about this - is the inability of the library to link
without renaming these files something to do with using flexlink as the
linker? If I link a test program in C using -lpcre then it works - but is
that because gcc knows how to read .la files and looks for the libpcre.dll.a
file correctly?
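
For reference, the renaming step described above amounts to the following; the real prefix in this message is C:/Dev/OCaml/lib, but a scratch directory stands in here so the commands are safe to try:

```shell
# Sketch of the libpcre renaming workaround (paths are stand-ins).
LIBDIR=$(mktemp -d)                                # stands in for C:/Dev/OCaml/lib
touch "$LIBDIR/libpcre.a" "$LIBDIR/libpcre.dll.a"  # stand-ins for the installed libraries
mv "$LIBDIR/libpcre.a" "$LIBDIR/libpcre.old.a"     # park the static library
mv "$LIBDIR/libpcre.dll.a" "$LIBDIR/libpcre.a"     # let -lpcre resolve to the import library
ls "$LIBDIR"                                       # libpcre.a and libpcre.old.a remain
```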

  However, once this is followed through, the library does correctly build
and install - and the examples all seem to be working. So, finally, it's
cheerio to the Str module for me :o)

  Any pointers appreciated!


David


pcre-ocaml-release-5.15.1.patch
Description: Binary data


[Caml-list] ocamlbuild, disabling caml rules?

2009-02-20 Thread Daniel Bünzli

Is it possible to disable the default rules?

I'm using ocamlbuild for a plain C project with my own rules, and it is 
painful that when something fails it falls back on OCaml's C compilation 
rules. These rules will fail anyway, and they override the error that 
occurred with the rule of mine that should have been used.


Best,

Daniel



[Caml-list] ocamlbuild & deps

2009-02-20 Thread Daniel Bünzli
Am I right in thinking that, in rule specifications, we could get rid of 
the ~dep(s) parameter of rules and have all deps be specified/discovered 
dynamically via the 'build' argument? Stated otherwise, is ~dep(s) just 
an optimization?


Out of curiosity, any idea of the cost of suppressing these arguments 
(i.e. was that road actually followed at some point)?


If the answer to the first question is yes, then I think the 
documentation could be made clearer by stating that whatever is asked to 
be built via the 'build' argument is considered a dependency. However, if 
you know some deps statically, you can specify them with the dep(s) 
argument; this will just implicitly add them to the list given to the 
'build' argument.


Best,

Daniel


