Re: [Caml-list] Alignment of data

2010-01-27 Thread Goswin von Brederlow
Pascal Cuoq  writes:

> Goswin von Brederlow  wrote:
>
>> You need to write a new function
>>
>> CAMLextern value caml_alloc_double_array (mlsize_t),
>>
>> or similar that ensures alignment on 8 byte for double even for 32bit
>> systems.
>>
>> You should also check the CAMLextern value caml_copy_double (double);
>> that it does the same.
>
> If you decide to go this route, which this message
> neither endorses nor condemns, you also need to
>
> A1/ allocate the doubles directly in the major heap, and
> A2/ deactivate compactions
>
> or
>
> B/ modify the garbage-collector.
>
> Pascal

Doubles are tagged with Double_tag and arrays of doubles with
Double_array_tag. So the GC knows where doubles are.

Would it be hard to patch the allocation to leave a 4-byte gap in the
minor heap when needed to align doubles, and to patch the compaction to
do the same?

The 4 bytes would mean inserting an Atom(0) during allocation and
compaction. Not the nicest way to do this but should be simple to patch
in.
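
The arithmetic behind such a gap can be sketched in plain C, independent
of the OCaml runtime. This is only an illustration under stated
assumptions: a 32-bit layout where each block is one 4-byte header word
followed directly by its data, and an allocation pointer that moves
upward (the real minor heap may not work that way). The names
pad_before_header, HEADER_BYTES and DOUBLE_ALIGN are made up for the
example and are not part of the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit layout: one 4-byte header word, then the payload. */
#define HEADER_BYTES 4u
#define DOUBLE_ALIGN 8u

/* Bytes to skip at the allocation pointer `next_free` so that the
   payload, which starts right after the header, is 8-byte aligned.
   For a word-aligned pointer the result is either 0 or 4. */
static size_t pad_before_header(uintptr_t next_free)
{
    assert(next_free % HEADER_BYTES == 0);  /* heap is word-aligned */
    uintptr_t payload = next_free + HEADER_BYTES;
    return (size_t)((DOUBLE_ALIGN - payload % DOUBLE_ALIGN) % DOUBLE_ALIGN);
}
```

A result of 4 is exactly the one-word gap that a header-only block such
as Atom(0) would fill.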

MfG
Goswin

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


Re: [Caml-list] Alignment of data

2010-01-27 Thread Richard Jones
On Wed, Jan 27, 2010 at 06:20:44PM +0100, Christophe Papazian wrote:
> Is there a 64-bit PowerPC Linux (ELF) support in ocaml ? I thought
> it was only a 64-bit PowerPC OSX (Darwin) support...

Yes indeed there is.  For years we maintained an out of tree patch to
support this for Fedora/ppc64:

http://cvs.fedoraproject.org/viewvc/F-12/ocaml/ocaml-3.11.0-ppc64.patch

However, Fedora 13 (onwards) has relegated ppc (32 & 64 bit) support to
the status of a "secondary architecture"[1], which effectively means we
don't care about it.  For this reason I dropped this patch and don't
intend to maintain it.

The patch itself seems relatively trouble-free.  We built all the
Fedora packages with it, and only a couple had problems compiling on
ppc64.  Since I never had access to a real ppc64 machine, I was never
able to determine if these build problems were because this patch is
faulty or for some other unrelated reason, so YMMV.

Rich.

[1] http://fedoraproject.org/wiki/Architectures#Structure

-- 
Richard Jones
Red Hat



[Caml-list] Alignment of data

2010-01-27 Thread Christophe Papazian

Dear Xavier Leroy,
thank you for your answer.

>> I am working on some ppc architecture, and I realize that I have a
>> (very) big slowdown due to bad alignment of data by ocamlopt. I
>> need to have my data aligned in memory depending on the size of the
>> data: floats are to be aligned on 8 bytes, ints on 4 bytes, etc.
>
> First, make sure that misalignment is really the source of your
> slowdown.  The PowerPC processors I'm familiar with can access
> 4-aligned 8-byte floats with minimal overhead, while the penalty is
> much bigger for other misalignments.

I am sorry, but I am sure of that. I ran some tests to ensure that the
problem is coming from that particular point.

> Data allocated in the Caml heap is word-aligned, where a word is 4
> bytes on a 32-bit platform and 8 bytes on a 64-bit platform.  This is
> deeply ingrained in the Caml GC and allocator, so don't expect to
> change this easily.

I didn't expect to change such a deep feature of OCaml myself, but I
hoped that you or somebody in your team could. Would it be possible to
have everything 8-aligned on a 32-bit platform with minimal effort? Any
help is welcome!

> What you can do, however:
>
> 1- Use the 64-bit PowerPC port.  Everything will be 8-aligned then.

Is there 64-bit PowerPC Linux (ELF) support in OCaml? I thought there
was only 64-bit PowerPC OSX (Darwin) support...

Thank you to Goswin von Brederlow and Pascal Cuoq for their answers,
but I should say that I really prefer to use the GC as usual, without
rewriting it :)

Christophe



[Caml-list] Alignment of data

2010-01-27 Thread Pascal Cuoq


Goswin von Brederlow  wrote:


> You need to write a new function
>
> CAMLextern value caml_alloc_double_array (mlsize_t),
>
> or similar that ensures alignment on 8 byte for double even for 32bit
> systems.
>
> You should also check the CAMLextern value caml_copy_double (double);
> that it does the same.


If you decide to go this route, which this message
neither endorses nor condemns, you also need to

A1/ allocate the doubles directly in the major heap, and
A2/ deactivate compactions

or

B/ modify the garbage-collector.

Pascal



Re: [Caml-list] Alignment of data

2010-01-27 Thread Xavier Leroy
> I am working on some ppc architecture, and I realize that I have a
> (very) big slowdown due to bad alignment of data by ocamlopt. I need to
> have my data aligned in memory depending of the size of the data :
> floats are to be aligned on 8 bytes, int on 4 bytes, etc


First, make sure that misalignment is really the source of your
slowdown.  The PowerPC processors I'm familiar with can access
4-aligned 8-byte floats with minimal overhead, while the penalty is
much bigger for other misalignments.  Indeed, the PowerPC calling
conventions mandate that some 8-byte float arguments are passed on the
stack at 4-aligned addresses, so that's strong incentive for the
hardware people to implement those accesses efficiently.

> BUT, after verification, I remark that ocamlopt doesn't align as I need.
> I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what I
> thought, and doesn't change anything for my needs. Is there ANY way to
> obtain what I need easily or at least quickly ?


Data allocated in the Caml heap is word-aligned, where a word is 4
bytes on a 32-bit platform and 8 bytes on a 64-bit platform.  This is
deeply ingrained in the Caml GC and allocator, so don't expect to
change this easily.

What you can do, however:

1- Use the 64-bit PowerPC port.  Everything will be 8-aligned then.

2- Use a bigarray instead of a float array.  Bigarray data is
allocated outside the heap, at naturally-aligned addresses.
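
To illustrate what "naturally-aligned" storage outside the heap looks
like, here is a small C sketch. It assumes the data comes from the C
allocator; the C standard guarantees that malloc returns storage
suitably aligned for any fundamental type, double included, which on
usual platforms means at least 8 bytes. The helper names is_aligned and
alloc_doubles are invented for this example and are not part of the
Bigarray library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* True when `p` is a multiple of `align` (which must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocate room for `n` doubles outside any garbage-collected heap.
   The caller owns the buffer and must free() it. */
static double *alloc_doubles(size_t n)
{
    return malloc(n * sizeof(double));
}
```

Checking is_aligned(buf, 8) on a buffer returned by alloc_doubles is a
quick way to confirm the property on a given platform.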

- Xavier Leroy



Re: [Caml-list] Alignment of data

2010-01-27 Thread Goswin von Brederlow
Christophe Papazian  writes:

> Dear users and developers of OCAML,
>
> I am working on some ppc architecture, and I realize that I have a
> (very) big slowdown due to bad alignment of data by ocamlopt. I need
> to have my data aligned in memory depending of the size of the data :
> floats are to be aligned on 8 bytes, int on 4 bytes, etc
> BUT, after verification, I remark that ocamlopt doesn't align as I
> need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what
> I thought, and doesn't change anything for my needs. Is there ANY way
> to obtain what I need easily or at least quickly ?
>
> You can use the following code to test your alignment on your
> architecture :
> [ compile with ocamlopt align_stubs.c align.ml -o align ]
>
> # align.ml #
> open Obj
>
> external get_addr : 'a -> int * string = "get_addr"
>
> let rec align acc r =
>   if r mod 2 = 1 then acc else align (acc*2) (r/2)
>
> let get_addr_print v = let a,b = get_addr v in Printf.printf "%6X %s
> \n" a b; a

That will cut off the upper bits of my address. Not important for
alignment but bad practice.

> let rec get_align acc = function
> h::q as l -> get_align (acc lor get_addr_print l) q
>   | [] -> acc
>
> let  f block s l =
>   let r =
> if block then (* if the element is a block, consider it like a
> pointer *)
>   List.fold_left (fun r e -> r lor get_addr_print e) 0 l
> else get_align 0 l
>   in
>   Printf.printf "%s are aligned on %i bytes\n%!" s (align 1 r)
>
> let build_list v l = List.map (fun i -> Array.make i v) l
>
> let main =
>   f false "Chars" ['a';'b';'c';'d';'e'];
>   f false "Integers" [0;1;2;3;4];
>   f true "Floats" [0.;1./.3.;2./.5.;3./.7.;4./.9.];
>   f true "Int Arrays" (build_list  37 [3;4;5;6;7]);
>   f true "Float Arrays" (build_list  (1./.3.) [2;3;4;5;6]);
>   f true "Other Float Arrays" [Array.make 1 max_float;Array.make 2
> 0.;Array.make 3 0.;Array.make 37 0.;Array.make 17 0.];
>
> ### align_stubs.c 
>
> #include <stdio.h>
>
> #include <stdlib.h>
> #include <caml/mlvalues.h>
> #include <caml/alloc.h>
> #include <caml/memory.h>
>
> CAMLprim
> value get_addr(value v)
> {
>   CAMLparam1 (v);
>   char *repr = malloc(9);
>   value res = alloc_tuple(2);
>   Field(res,0) = Val_int((unsigned int) v);
>   sprintf(repr,"%8X", *((int*)v));

Again cutting off upper bits. I have a 64bit cpu so up to 16 hex digits
for an address.
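
A minimal, runtime-independent C sketch of printing an address at full
pointer width, using uintptr_t and the PRIXPTR macro from <inttypes.h>.
The helper names format_addr and low_32_bits are made up for
illustration; the second one only shows what a cast to a 32-bit
unsigned type keeps of a 64-bit address.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write the address held in `p` into `buf` as upper-case hex, using the
   full pointer width.  Returns the number of characters written, not
   counting the terminating NUL, as snprintf does. */
static int format_addr(const void *p, char *buf, size_t size)
{
    assert(buf != NULL && size >= 2 * sizeof(uintptr_t) + 1);
    return snprintf(buf, size, "%" PRIXPTR, (uintptr_t)p);
}

/* What a cast to a 32-bit unsigned type keeps of a 64-bit address. */
static uint32_t low_32_bits(uint64_t addr)
{
    return (uint32_t)addr;
}
```

A buffer of 2 * sizeof(uintptr_t) + 1 bytes is always large enough: 17
bytes on a 64-bit machine, 9 on a 32-bit one.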

>   Field(res,1) = (caml_copy_string(repr));
>   CAMLreturn(res);
> }
>
> # Results ##
>
>  1D8C0   C3
>  1D8CC   C5
>  1D8D8   C7
>  1D8E4   C9
>  1D8F0   CB
> Chars are aligned on 4 bytes
>  1D878    1
>  1D884    3
>  1D890    5
>  1D89C    7
>  1D8A8    9
> Integers are aligned on 4 bytes
>  1D85C    0
>  7612C 
>  76114 999A
>  760FC DB6DB6DB
>  760E4 1C71C71C
> Floats are aligned on 4 bytes
>  74A2C   4B
>  74A18   4B
>  74A00   4B
>  749E4   4B
>  749C4   4B
> Int Arrays are aligned on 4 bytes
>  732C0 
>  732A4 
>  73280 
>  73254 
>  73220 
> Float Arrays are aligned on 4 bytes
>  71928 
>  71940    0
>  71960    0
>  71988    0
>  71AC0    0
> Other Float Arrays are aligned on 8 bytes
>
> You can see the addresses in memory of each element of the lists and
> it's internal representation (to check
> if the memory pointer really point to the right value : you can even
> see that 31 bit ocaml integer (and Chars) i have a C representation of
> 2*i+1).
> It seems that small values
> are on the minor heap, and large values are on major heap.
> Note that the last array is correctly aligned, but it's just a matter
> of luck : If I change
> something else before this line in my code, I usually get the last
> array aligned on 4 bytes.
> (But I can't find a way to obtain a float array aligned on 8 bytes
> with the use of "build_list")

Everything is aligned to the size of a value (one machine word). I don't
think there is a special alloc call for the GC that gives you double
alignment. Nothing in caml/alloc.h anyway.

> Si if you have any idea of how to get floats and floats arrays aligned
> on 8 bytes both on major and minor heap, please answer me !
>
> Thank you very much
>
>   Christophe

You need to write a new function

CAMLextern value caml_alloc_double_array (mlsize_t),

or similar that ensures alignment on 8 byte for double even for 32bit
systems.

You should also check the CAMLextern value caml_copy_double (double);
that it does the same.


An alternative might be to use a Bigarray.

MfG
Goswin



[Caml-list] Alignment of data

2010-01-27 Thread Christophe Papazian

Dear users and developers of OCAML,

I am working on a ppc architecture, and I find that I have a
(very) big slowdown due to bad alignment of data by ocamlopt. I need
to have my data aligned in memory depending on the size of the data:
floats are to be aligned on 8 bytes, ints on 4 bytes, etc.
BUT, after verification, I notice that ocamlopt doesn't align as I
need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to do what
I thought, and doesn't change anything for my needs. Is there ANY way
to obtain what I need easily or at least quickly?


You can use the following code to test your alignment on your  
architecture :

[ compile with ocamlopt align_stubs.c align.ml -o align ]

# align.ml #
open Obj

external get_addr : 'a -> int * string = "get_addr"

let rec align acc r =
  if r mod 2 = 1 then acc else align (acc*2) (r/2)

let get_addr_print v =
  let a,b = get_addr v in
  Printf.printf "%6X %s\n" a b;
  a

let rec get_align acc = function
  | _::q as l -> get_align (acc lor get_addr_print l) q
  | [] -> acc

let f block s l =
  let r =
    if block then
      (* if the element is a block, consider it like a pointer *)
      List.fold_left (fun r e -> r lor get_addr_print e) 0 l
    else get_align 0 l
  in
  Printf.printf "%s are aligned on %i bytes\n%!" s (align 1 r)

let build_list v l = List.map (fun i -> Array.make i v) l

let main =
  f false "Chars" ['a';'b';'c';'d';'e'];
  f false "Integers" [0;1;2;3;4];
  f true "Floats" [0.;1./.3.;2./.5.;3./.7.;4./.9.];
  f true "Int Arrays" (build_list  37 [3;4;5;6;7]);
  f true "Float Arrays" (build_list  (1./.3.) [2;3;4;5;6]);
  f true "Other Float Arrays"
    [Array.make 1 max_float; Array.make 2 0.; Array.make 3 0.;
     Array.make 37 0.; Array.make 17 0.]


### align_stubs.c 

#include <stdio.h>

#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/alloc.h>
#include <caml/memory.h>

CAMLprim
value get_addr(value v)
{
  CAMLparam1 (v);
  char *repr = malloc(9);
  value res = alloc_tuple(2);
  Field(res,0) = Val_int((unsigned int) v);
  sprintf(repr,"%8X", *((int*)v));
  Field(res,1) = (caml_copy_string(repr));
  CAMLreturn(res);
}

# Results ##

 1D8C0   C3
 1D8CC   C5
 1D8D8   C7
 1D8E4   C9
 1D8F0   CB
Chars are aligned on 4 bytes
 1D878    1
 1D884    3
 1D890    5
 1D89C    7
 1D8A8    9
Integers are aligned on 4 bytes
 1D85C    0
 7612C 
 76114 999A
 760FC DB6DB6DB
 760E4 1C71C71C
Floats are aligned on 4 bytes
 74A2C   4B
 74A18   4B
 74A00   4B
 749E4   4B
 749C4   4B
Int Arrays are aligned on 4 bytes
 732C0 
 732A4 
 73280 
 73254 
 73220 
Float Arrays are aligned on 4 bytes
 71928 
 71940    0
 71960    0
 71988    0
 71AC0    0
Other Float Arrays are aligned on 8 bytes

You can see the address in memory of each element of the lists and
its internal representation (to check that the memory pointer really
points to the right value: you can even see that a 31-bit OCaml integer
(or char) i has the C representation 2*i+1).

It seems that small values are on the minor heap, and large values are
on the major heap.
Note that the last array is correctly aligned, but that is just a matter
of luck: if I change something else before this line in my code, I
usually get the last array aligned on 4 bytes.
(But I can't find a way to obtain a float array aligned on 8 bytes  
with the use of "build_list")
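
The measurement described above can be restated as a short standalone
C sketch: OR the addresses together and keep the lowest set bit, which
is what the align function in align.ml computes by repeated halving.
The helper names common_alignment and tag_int are invented for this
illustration; the sample addresses in the note below are the ones
printed in the results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two dividing every address in `addrs`: OR the
   addresses together, then isolate the lowest set bit. */
static uintptr_t common_alignment(const uintptr_t *addrs, size_t n)
{
    uintptr_t bits = 0;
    assert(addrs != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        bits |= addrs[i];
    assert(bits != 0);          /* an all-zero input has no lowest bit */
    return bits & (~bits + 1);  /* lowest set bit */
}

/* Tagged representation of an OCaml integer i: 2*i + 1. */
static intptr_t tag_int(intptr_t i)
{
    return 2 * i + 1;
}
```

Fed the five addresses of the "Chars" run (1D8C0 to 1D8F0) it gives 4,
and with the "Other Float Arrays" addresses (71928 to 71AC0) it gives 8.
Likewise tag_int('a') is 0xC3, the value shown on the first "Chars" line.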


So if you have any idea of how to get floats and float arrays aligned
on 8 bytes on both the major and the minor heap, please answer me!


Thank you very much

Christophe




