Re: [Caml-list] Ocamlopt code generator question

2009-05-05 Thread Jon Harrop
On Tuesday 05 May 2009 15:15:33 Jean-Marc Eber wrote:
> Hi Dimitry,
>
> LexiFi, for instance, _is_ clearly interested in an sse2 32-bit code generator.
>
> One should probably have the following in mind and/or ask the following
> questions:
>
> - it is probably not a good idea to support both backends (sse2 and the old
> stack-based fp i386 architecture). It will be necessary to make a choice
> (especially taking into account the limited INRIA resources and the burden
> of already supporting different Windows ports).
>
> - would INRIA be ok to switch to a sse2 code generator (based on Dimitry's
> patch - supposing that he is ok to donate it to INRIA - or Xavier's work or
> whatever)?
>
> - I also guess that a sse2 code generator would be simpler than the current
> one (that has to support this horrible fp stack architecture) and would
> therefore be a better candidate for further enhancements.
>
> - what is the opinion on this list, as a switch to a sse2 backend would
> exclude "old" processors from being OCaml compatible (I don't have a
> precise list at hand for now) ?
>
> My opinion is that support for legacy hardware is not important, but I
> guess others will argue the opposite... :-)
>
> But again, having better floating point performance (and predictable
> behaviour, compared to the bytecode version) would be a big plus for some
> applications.

If the idea is to provide better code generation on x86 going forwards with 
minimal effort then I'd have thought an LLVM-based backend would be the 
obvious choice. My tests with HLVM showed that numerical code can be a 
whopping 8x faster than today's ocamlopt on x86 and, of course, LLVM is 
improving much more rapidly.

LLVM can probably replace the x86, x64 and ppc backends. LLVM also seems like 
a sane approach to providing a native-code top level via its existing JIT 
functionality.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] Ocamlopt code generator question

2009-05-05 Thread Jean-Marc Eber

Hi Dimitry,

LexiFi, for instance, _is_ clearly interested in an sse2 32-bit code generator.

One should probably have the following in mind and/or ask the following 
questions:

- it is probably not a good idea to support both backends (sse2 and the old
stack-based fp i386 architecture). It will be necessary to make a choice
(especially taking into account the limited INRIA resources and the burden of
already supporting different Windows ports).


- would INRIA be ok to switch to a sse2 code generator (based on Dimitry's patch 
- supposing that he is ok to donate it to INRIA - or Xavier's work or whatever)?


- I also guess that a sse2 code generator would be simpler than the current one 
(that has to support this horrible fp stack architecture) and would therefore be 
a better candidate for further enhancements.


- what is the opinion on this list, as a switch to a sse2 backend would exclude 
"old" processors from being OCaml compatible (I don't have a precise list at 
hand for now) ?


My opinion is that support for legacy hardware is not important, but I guess 
others will argue the opposite... :-)


But again, having better floating point performance (and predictable behaviour, 
compared to the bytecode version) would be a big plus for some applications.


Best regards,

Jean-Marc




Dmitry Bely wrote:


> I see. The reason I asked: trying to improve floating-point performance
> on the 32-bit x86 platform, I have merged the floating-point SSE2 code
> generator from the amd64 ocamlopt back end into the i386 one, creating an
> ia32sse2 architecture. It also inlines sqrt() via the -ffast-math flag and
> slightly optimizes emit_float_test (usually eliminating an extra jump);
> these features are missing from the original amd64 code generator. All
> this seems to work OK: beyond my own code, all tests found in the OCaml CVS
> test directory pass. Of course this idea is not new: you had a
> working IA32+SSE2 back end several years ago [1] but unfortunately
> never released it to the public.
>
> Is this of any interest to anybody?
>
> - Dmitry Bely
>
> [1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html



Re: [Caml-list] Ocamlopt code generator question

2009-05-05 Thread Dmitry Bely
On Tue, May 5, 2009 at 1:24 PM, Xavier Leroy  wrote:
>> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>>
>> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>>                                xmm0 - xmm3: C function arguments
>>                                xmm0: Caml and C function results
>>                                xmm6-xmm15 are preserved by C *)
>>
>> let loc_arguments arg =
>>  calling_conventions 0 9 100 109 outgoing arg
>> let loc_parameters arg =
>>  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
>> let loc_results res =
>>  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>>
>> What do these first_float=100 and last_float=109 affect for loc_arguments and
>> loc_parameters? My impression is that floats are always passed
>> boxed, so xmm registers are in fact never used to pass parameters. And
>> float values are returned as a pointer in eax, not a value in xmm0 as
>> loc_results would suggest.
>
> The ocamlopt code generators support unboxed floats as function
> parameters and results, as well as returning multiple results in
> several registers.  (Except for the 32-bit x86 port, because of the
> weird floating-point model of this architecture.)  You're right that
> the ocamlopt "middle-end" does not currently take advantage of this
> possibility, since floats are passed between functions in boxed state.

I see. The reason I asked: trying to improve floating-point performance
on the 32-bit x86 platform, I have merged the floating-point SSE2 code
generator from the amd64 ocamlopt back end into the i386 one, creating an
ia32sse2 architecture. It also inlines sqrt() via the -ffast-math flag and
slightly optimizes emit_float_test (usually eliminating an extra jump);
these features are missing from the original amd64 code generator. All
this seems to work OK: beyond my own code, all tests found in the OCaml CVS
test directory pass. Of course this idea is not new: you had a
working IA32+SSE2 back end several years ago [1] but unfortunately
never released it to the public.
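
For anyone wanting to compare back ends, a small kernel along the following
lines touches the three things the patch changes: plain float arithmetic,
sqrt, and float comparisons followed by a branch. This is only an
illustrative sketch; the names norm and count_above are made up here and are
not part of the patch or of the OCaml test directory.

```ocaml
(* Euclidean norm: one multiply and one add per element, then one sqrt. *)
let norm (v : float array) : float =
  let acc = ref 0.0 in
  for i = 0 to Array.length v - 1 do
    acc := !acc +. v.(i) *. v.(i)
  done;
  sqrt !acc

(* Count the elements strictly above a threshold: one float comparison
   and one conditional branch per element. *)
let count_above (threshold : float) (v : float array) : int =
  let n = ref 0 in
  Array.iter (fun x -> if x > threshold then incr n) v;
  !n

let () =
  let v = Array.init 1_000 (fun i -> float_of_int (i mod 5)) in
  Printf.printf "norm = %g, above 2.0: %d\n" (norm v) (count_above 2.0 v)
```

Compiling the same file with two ocamlopt builds and timing it on a larger
array would give a rough idea of the difference, though a real comparison
would need a more representative workload.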

Is this of any interest to anybody?

- Dmitry Bely

[1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html



Re: [Caml-list] Ocamlopt code generator question

2009-05-05 Thread Xavier Leroy

> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>
> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>                                   xmm0 - xmm3: C function arguments
>                                   xmm0: Caml and C function results
>                                   xmm6-xmm15 are preserved by C *)
>
> let loc_arguments arg =
>   calling_conventions 0 9 100 109 outgoing arg
> let loc_parameters arg =
>   let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
> let loc_results res =
>   let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>
> What do these first_float=100 and last_float=109 affect for loc_arguments and
> loc_parameters? My impression is that floats are always passed
> boxed, so xmm registers are in fact never used to pass parameters. And
> float values are returned as a pointer in eax, not a value in xmm0 as
> loc_results would suggest.


The ocamlopt code generators support unboxed floats as function
parameters and results, as well as returning multiple results in
several registers.  (Except for the 32-bit x86 port, because of the
weird floating-point model of this architecture.)  You're right that
the ocamlopt "middle-end" does not currently take advantage of this
possibility, since floats are passed between functions in boxed state.
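
One way to observe the boxed representation from OCaml itself is to inspect a
float result with the Obj module. The sketch below is illustrative only:
is_boxed_float and hypot2 are hypothetical helpers, and Obj is meant for this
kind of low-level inspection rather than for normal code.

```ocaml
(* A float seen as a generic value is a pointer to a heap block tagged
   Obj.double_tag, whereas an int is an immediate (non-pointer) value. *)
let is_boxed_float (v : Obj.t) : bool =
  Obj.is_block v && Obj.tag v = Obj.double_tag

(* An ordinary float function: its result is handed back to the caller
   as a boxed float. *)
let hypot2 (x : float) (y : float) : float = sqrt (x *. x +. y *. y)

let () =
  let r = hypot2 3.0 4.0 in
  Printf.printf "result = %g, boxed = %b\n" r (is_boxed_float (Obj.repr r));
  Printf.printf "int 42 boxed = %b\n" (is_boxed_float (Obj.repr 42))
```

Inside a single function body the compiler can keep intermediate floats
unboxed in registers; the check above only shows what the value looks like
once it is treated as an ordinary OCaml value.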

- Xavier Leroy



[Caml-list] Ocamlopt code generator question

2009-04-28 Thread Dmitry Bely
For amd64 we have in asmcomp/amd64/proc_nt.mlp:

(*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
                                  xmm0 - xmm3: C function arguments
                                  xmm0: Caml and C function results
                                  xmm6-xmm15 are preserved by C *)

let loc_arguments arg =
  calling_conventions 0 9 100 109 outgoing arg
let loc_parameters arg =
  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
let loc_results res =
  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc

What do these first_float=100 and last_float=109 affect for loc_arguments and
loc_parameters? My impression is that floats are always passed
boxed, so xmm registers are in fact never used to pass parameters. And
float values are returned as a pointer in eax, not a value in xmm0 as
loc_results would suggest.
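
To make the question concrete, here is a simplified model of what a function
like calling_conventions does with those four bounds. It is not the code from
asmcomp: the types arg_type and location, the fixed 8-byte stack slots, and
the omission of the outgoing/incoming argument are assumptions made to keep
the sketch self-contained. Register numbers follow the comment above, with
100 - 115 standing for xmm0 - xmm15.

```ocaml
(* Hypothetical, simplified model; not the compiler's own types. *)
type arg_type = Int_or_addr | Float

type location =
  | Reg of int     (* physical register number *)
  | Stack of int   (* byte offset in the stack argument area *)

(* Give each argument the next free register of its class and fall back
   to a stack slot once that class is exhausted. *)
let calling_conventions first_int last_int first_float last_float
    (args : arg_type array) : location array * int =
  let next_int = ref first_int in
  let next_float = ref first_float in
  let ofs = ref 0 in
  let assign next last =
    if !next <= last then begin
      let r = !next in
      incr next;
      Reg r
    end else begin
      let slot = Stack !ofs in
      ofs := !ofs + 8;   (* assumption: 8-byte slots for both classes *)
      slot
    end
  in
  let locs = Array.make (Array.length args) (Stack 0) in
  for i = 0 to Array.length args - 1 do
    locs.(i) <-
      (match args.(i) with
       | Int_or_addr -> assign next_int last_int
       | Float -> assign next_float last_float)
  done;
  (locs, !ofs)

let () =
  let locs, stack_bytes =
    calling_conventions 0 9 100 109 [| Int_or_addr; Float; Float |]
  in
  Array.iter
    (function
      | Reg r -> Printf.printf "reg %d\n" r
      | Stack o -> Printf.printf "stack+%d\n" o)
    locs;
  Printf.printf "stack bytes: %d\n" stack_bytes
```

In this model the float bounds only matter when an argument is actually
typed as an unboxed float, which is the case the question is asking about.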

- Dmitry Bely
