Re: [Caml-list] JIT & HLVM, LLVM

2009-09-29 Thread Mikkel Fahnøe Jørgensen
2009/9/27 Jon Harrop :
> On Sunday 27 September 2009 20:23:13 kche...@math.carleton.ca wrote:
>> If Grand Central Dispatch makes its way
>> into *nix...
>
> Incidentally, what exactly (technically) is Grand Central and how does it
> relate to existing alternatives and the state-of-the-art? I Googled it but
> failed to find anything useful...
>

Grand Central Dispatch (GCD, where libdispatch is client side library)
is a very simple piece of code (with some non-trivial implementation
details). Some would say it is just a thread queue, but it is quite
clever the same way map-reduce is clever and simple.

It may help to read the Ars Technica article first, since I dig a
little below the user level api.

Two important concepts: GCD does not require threads, and it does not
require kernel support. When present, GCD becomes efficient at
handling multiple cores - or this is the claim, and at least OS X 10.6
Snow Leopard seems to be much smoother than OS X 10.5 Leopard, so I
guess there is something to it. But without threads and kernel
support, it is still a good programming model, it would seem (haven't
burned my fingers yet...)


To better understand GCD, think of a single threaded event driven
application with support for nested event queues.

We start with a single thread, later extend the concept to multiple
threads and cores:

There is set of global queues with different priorities, one which is
the default.
Since we assume single threaded for now, we limit this to only one global queue.

Tasks are pushed onto the global queue.

A task is just function with data, i.e. a closure.

In C this is a function pointer and a void * to context, or it is
block if you have the blocks extension, but the blocks extension
really is optional, although very useful since it can take an standard
C block and with a little syntax turn it into a closure with a copy of
the parent scope state which can then quickly be turned into a
parallel loop. But this is just syntax, really not too important for
the underpinnings of libdispatch.

A function can push other functions to the global queue. This in
effect creates a continuation, and the concept is similar to
continuation passing style or LWT threads, but requires no
continuation arguments to drag around.

A task should return soon, and avoid blocking operations - especially
in single threaded setup, but generally blocking is bad in GCD.

The interesting part is how this gets scheduled: There is no central scheduler.

When our current task function returns (which it should do soon),
there is another piece of calling code that earlier picked our task
from a queue. Upon task return this code now picks the next task on
the queue and calls it - ad infinitum until we reach an empty queue,
in which case our control code can exit - i.e. exit the thread.

So far this your standard http web server event loop.

Now we add a thread pool to the queue.

This means that N threads are all either executing a task from the
queue, or actively executing a task on the queue. One a task
completes, the next task is popped. Since there are N threads, we have
contention on the queue, so it must be briefly locked with
compare-exchange logic.

Now we have a concurrent queue with the semantic that tasks are
started in order, but execute concurrently. Any and all
synchronization between tasks must be managed by client code if they
access shared resources.

Which brings us to the next point:

A task (or the main program thread) can create a new serial queue. A
serial queue is really just a task pushed onto the default global
queue, but this is mostly invisible to the user, but internally this
makes the control very simple and elegant (thread local storage is
used to identify the current parent queue in the queue creation
function).

Our active task can now push new tasks onto the serial queue. These
tasks are guaranteed to be isolated. The serial queue tasks will
eventually be popped from the global concurrent queue either after our
function exits, or sooner if there or other vacant threads in the
pool. Once the serial queue task is popped, it starts popping tasks of
the queue, while still locking pop operations to avoid conflict with
new pushes.

Note that every single thread cooperate on scheduling by popping the
next task from the queue it is currently assigned to. When a task is a
queue task, it will temporarily change the thread task to the new
queue until that queue is exhausted, then the thread returns focus to
the previous queue.

A task can choose to wait on the completion of another task it pushes,
in which case it is easy to deadlock if not being careful.

It is possibly to create an insanely huge amounts of queues in this
system. The only real overhead is the costs of memory barriers used
when pushing and popping.

Apple refers to this as islands of serialization in a sea of concurrency.

There is also the event signalling subsystem that grabs descriptor
events and holds a task until signalled, then

Re: [Caml-list] xpath or alternatives

2009-09-29 Thread Mikkel Fahnøe Jørgensen
In line with what Yaron suggests, you can use a combinator parser.

I do this to parse json, and this parser could be adapted to xml by
focusing on basic syntax and ignoring the details, or you could
prefilter xml and use the json parser directly.

See the Fleece parser embedded here:

There is also the object abstraction that dives into an object
hierarchy after parsing, see the Objects module. The combination of
these two makes it quite easy to work on structured data, but 3 lines
only come after some xml adaptation work - but you can see many
one-liner json access in the last part of the file.

http://git.dvide.com/pub/symbiosis/tree/myocamlbuild_config.ml

Otherwise there is xmlm which is self-contained in single xml file,
and as I recall, has some sort of zipper navigator. (I initially
intended to use it before deciding on the json format):

http://erratique.ch/software/xmlm

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] ANN: pa_polyrec: syntax for polymorphic recursion

2009-09-29 Thread Jeremy Yallop

Thanks for the comments, Damien.

Damien Guichard wrote:

Problems are :
   1. functions are harder to type
   2. type errors are quite hard to interpret and to locate
  (although *pa-polyrec* certainly helps in this department)
   3. benchmarks show you pay a certain (small) price for the added type
  expressivity


I think that the arguments you make against polymorphic recursion could also be 
made against other features of OCaml, such as structurally-typed objects and 
polymorphic variants: they make type errors more complex and can be less 
efficient than the alternatives. Still, I'm glad that OCaml has those features 
because they make it possible to write programs that would otherwise be impossible.


I'm glad you raised these points, though, because they highlight the fact that 
there is a tradeoff involved.  If you use nested data types and polymorphic 
recursion then you may have to work harder to satisfy the compiler.  However, in 
return the compiler will guarantee the absence of certain errors that would 
otherwise remain undetected until runtime, and perhaps forever.



May be polymorphic recursion is actually a tool for the working ML programmer.
Yet i am still waiting to see the exemple that is more than a nice trick for the ML hacker. 


Here is one example: the following paper shows how to use OCaml extended with 
GADTs to give a precise type to the output of a parser generator:


   Towards efficient, typed LR parsers
   François Pottier and Yann Régis-Gianas
   ACM Workshop on ML, 2006
   http://gallium.inria.fr/~fpottier/publis/fpottier-regis-gianas-typed-lr.pdf

The run and gotoE procedures, which form a central part of the implementation 
described in that paper, are inherently polymorphic-recursive, just like most 
other non-trivial functions over GADTs.  OCaml doesn't have GADTs, of course, 
but Oleg recently described a technique that can be used to translate many 
programs that use GADTs into OCaml:



http://caml.inria.fr/pub/ml-archives/caml-list/2009/07/2984f23799f442d0579faacbf4e6e904.en.html

Polymorphic recursion is an important element of that technique.

In my opinion, when wants the added type expressivity he actually wants 
dependant types. 


That may be true, but since OCaml doesn't support dependent types I think that 
it's useful to know what can be accomplished without them.


There is an ancillary benefit to pa_polyrec (besides polymorphic recursion): it 
provides a clear way of ensuring that a function is polymorphic, giving a 
solution to the problem described in this blog post by Stephen Weeks:


http://ocaml.janestreet.com/?q=node/29

For example, suppose that you have a function

   let f = fun x -> 13 :: x

and you want to tell the compiler that it has type 'a list -> 'a list, for any 
type 'a.  (Since it actually does *not* have that type, we want the compiler to 
complain.)  This is surprisingly difficult to write straightforwardly in OCaml! 
 You can't write the following


   let f : 'a list -> 'a list = fun x -> 13 :: x

since OCaml will simply unify the variable 'a with the element type of the list, 
and type-checking will succeed.


That blog post shows how to use an uninhabited type to guarantee polymorphism: 
you can write something like the following


   type a

   let f = fun x -> 13 :: x
   let (_ : a list -> a list) = f

and the compiler will duly complain that 'a' and 'int' cannot be unified.  This 
is certainly a useful trick; still, it would be nice to be able to ascribe the 
polymorphic signature directly.  Using pa_polyrec one can write


   let rec f : 'a. 'a list -> 'a list = fun x -> 13 :: x

and the compiler will complain that the type of the right hand side is 
insufficiently general.  (Of course, it would be better to be able to omit the 
'rec'; perhaps a future release will permit that.)


___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] Incremental linking

2009-09-29 Thread Gerd Stolpmann

Am Dienstag, den 29.09.2009, 20:39 +0200 schrieb Dawid Toton:
> I have lot of modules and they are compiled to native code.
> So I have .cmx and .o files and want to link them faster.

Well, you can link several .cmx files (and their accompanying .o files)
to a .cmxa file (and an accompanying .a file): ocamlopt -a

You cannot do the same again on the next level, i.e. link
several .cmxa/.a together to get another .cmxa/.a. (I don't remember why
this restriction exists.)

Gerd

> 
> Is is possible to make linking an associative operation acting on modules?
> 
> I would like to do something like the following:
> Knowing the correct partial order of modules (that compiler requires) I
> can create a tree that preserves that order. Leafs are modules. Other
> nodes of the tree correspond to a result of linking all descendant
> modules. Modules that are frequently recompiled are placed closer to the
> root. This way I expect to execute less linking operations during
> development.
> 
> Documentation of ld says that files produced with --relocatable can be
> used as intermediate partially linked files. Can something like this be
> done with object code produced by ocamlopt?
> I don't know ocaml-specific details of linking, so maybe I overlook some
> obvoius obstacle?
> 
> Dawid
> 
> ___
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
-- 

Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
g...@gerd-stolpmann.de  http://www.gerd-stolpmann.de
Phone: +49-6151-153855  Fax: +49-6151-997714


___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


[Caml-list] Incremental linking

2009-09-29 Thread Dawid Toton

I have lot of modules and they are compiled to native code.
So I have .cmx and .o files and want to link them faster.

Is is possible to make linking an associative operation acting on modules?

I would like to do something like the following:
Knowing the correct partial order of modules (that compiler requires) I
can create a tree that preserves that order. Leafs are modules. Other
nodes of the tree correspond to a result of linking all descendant
modules. Modules that are frequently recompiled are placed closer to the
root. This way I expect to execute less linking operations during
development.

Documentation of ld says that files produced with --relocatable can be
used as intermediate partially linked files. Can something like this be
done with object code produced by ocamlopt?
I don't know ocaml-specific details of linking, so maybe I overlook some
obvoius obstacle?

Dawid

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


[Caml-list] Incremental linking

2009-09-29 Thread Dawid Toton

I have lot of modules and they are compiled to native code.
So I have .cmx and .o files and want to link them faster.

Is is possible to make linking an associative operation acting on modules?

I would like to do something like the following:
Knowing the correct partial order of modules (that compiler requires) I 
can create a tree that preserves that order. Leafs are modules. Other 
nodes of the tree correspond to a result of linking all descendant 
modules. Modules that are frequently recompiled are placed closer to the 
root. This way I expect to execute less linking operations during 
development.


Documentation of ld says that files produced with --relocatable can be 
used as intermediate partially linked files. Can something like this be 
done with object code produced by ocamlopt?
I don't know ocaml-specific details of linking, so maybe I overlook some 
obvoius obstacle?


Dawid

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] ANN: pa_polyrec: syntax for polymorphic recursion

2009-09-29 Thread Damien Guichard

Hi Jeremy,

Your syntax extension is nice and simple enough.
It will really help in demystifying polymorphic recursion.

The provided examples are particularly usefull in understanding where 
polymorphic recursion matters.

My opinion is that polymorphic recursion is largeley overestimated and has 
little real-life use in a ML context.

Basically there are two usage patterns :
the Okasaki/Paterson style
the J.J.Hallett & A.J.Kfoury style

The J.J.Hallett & A.J.Kfoury style is just plain misuse of the rec keyword.

That is instead of  a wrong rec scope :
  let rec double : 'a . ('a -> 'a) -> 'a -> 'a
  = fun f y -> f (f y)
  and foo : 'x . int -> int
  = fun v -> double (fun x -> x + 1) v
  and goo : 'y . float -> float 
  = fun w -> double sqrt w
  in (foo 3, goo 16.0)

One shoud use a correct rec scope : 
  let rec double f y =
f (f y)
  in let foo v =
double (fun x -> x + 1) v
  and goo w = 
double sqrt w
  in (foo 3, goo 16.0)

Especially, OCaml beginner should be warned that polymorphic recursion is not a 
solution to their typing problems.

The Okasaki/Paterson style is more insteresting.
Basically it's a poor man's dependant types.

Instead of  :

   type 'a perfect = Leaf of 'a | Node of 'a perfect * 'a perfect

One does write :

   type 'a perfect = Zero of 'a | Succ of ('a * 'a) perfect

Then the compiler can statically check that the tree is always complete 
(perfectly balanced).

Problems are :
functions are harder to type
type errors are quite hard to interpret and to locate (although pa-polyrec 
certainly helps in this department)
benchmarks show you pay a certain (small) price for the added type expressivity

In my opinion, when wants the added type expressivity he actually wants 
dependant types. 
I mean one should prefer the following Coq code, that is dependant types rather 
than nested data types :

  Variable A : Set.

  Inductive perfect : nat -> Set :=
  | Leaf : perfect O
  | Node : forall n, A -> perfect n -> perfect n -> perfect (S n).

The de Bruijn exemple is representative enough :
either you seek after efficiency then nested data type will just hinder 
interpreter performance
either you seek after interpreter validation then dependant types is the way to 
go

May be polymorphic recursion is actually a tool for the working ML programmer.
Yet i am still waiting to see the exemple that is more than a nice trick for 
the ML hacker. 

Hoping i have made the matter less esoteric to OCaml beginners,
Regards,


- damien





Damien Guichard
2009-09-29



En réponse au message
de : Jeremy Yallop
du : 2009-09-28 22:56:32
À : caml-l...@inria.fr
CC : 
Sujet : [Caml-list] ANN: pa_polyrec: syntax for polymorphic recursion


I'm pleased to announce the initial release of pa_polyrec, a syntax extension 
for polymorphic recursion in OCaml.

https://forge.ocamlcore.org/projects/pa-polyrec/

There are several methods for encoding polymorphic-recursive functions in 
OCaml; 
this extension allows them to be written directly, using a natural syntax.  For 
example, given the following type of perfectly-balanced trees we might wish to 
write a function for summing the leaves.

   type 'a perfect = Zero of 'a | Succ of ('a * 'a) perfect

In standard OCaml such a function can be written as follows:

   let sump f =
 (object (o)
 method sump : 'a. ('a - > int) - > 'a perfect - > int =
   fun f - > function
| Zero x - > f x
| Succ x - > o#sump (fun (a, b) - > f a + f b) x
  end)#sump f

   let sum_perfect = sump id

Using pa_polyrec one can write the function in the following less obfuscated 
style:

   let rec sump : 'a. ('a - > int) - > 'a perfect - > int =
 fun f - > function
  | Zero x - > f x
  | Succ x - > sump (fun (a, b) - > f a + f b) x

   let sum_perfect = sump id

Note that the type variable 'a in the type of the function is quantified: this 
is what differentiates polymorphic-recursive functions from standard OCaml 
recursive function definitions.

More complex usage is supported, including mutual recursion.  A number of 
examples are included in the distribution, including several modules from Chris 
Okasaki's thesis "Purely Functional Data Structures" and code from Richard Bird 
and Ross Paterson's paper "de Bruijn notation as a nested datatype".

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs