Re: [Caml-list] Alignment of data
Pascal Cuoq writes:

> Goswin von Brederlow wrote:
>
> > You need to write a new function
> > CAMLextern value caml_alloc_double_array (mlsize_t),
> > or similar that ensures alignment on 8 byte for double even for 32bit
> > systems.
> >
> > You should also check the CAMLextern value caml_copy_double (double);
> > that it does the same.
>
> If you decide to go this route, which this message
> neither endorses nor condemns, you also need to
>
> A1/ allocate the doubles directly in the major heap, and
> A2/ deactivate compactions
>
> or
>
> B/ modify the garbage-collector.
>
> Pascal

Doubles are tagged with Double_tag and arrays of doubles with
Double_array_tag. So the GC knows where doubles are. Would it be hard to
patch the allocation to leave a 4 byte gap in the minor heap when needed
to align doubles, and to patch the compaction to do the same?

The 4 bytes would mean inserting an Atom(0) during allocation and
compaction. Not the nicest way to do this, but it should be simple to
patch in.

Regards,
        Goswin

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
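The gap described in this message can be sketched in plain C, outside the OCaml runtime. This is only an illustration under stated assumptions: a 32-bit layout with 4-byte words, one header word in front of each block, and a hypothetical bump allocator that hands out increasing addresses (the real minor heap is not claimed to work this way). The names `WORD_SIZE`, `pad_words_for_double` and `place_double_block` are invented for the sketch and are not part of the OCaml runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit layout: 4-byte words, one header word before the payload. */
#define WORD_SIZE 4u

/* Number of filler words (0 or 1) to insert at the word-aligned address
   `next_free` so that the payload, which starts one header word later,
   lands on an 8-byte boundary. */
unsigned pad_words_for_double(uint32_t next_free)
{
    assert(next_free % WORD_SIZE == 0u);        /* allocator is word-aligned */
    uint32_t payload = next_free + WORD_SIZE;   /* address just after the header */
    return (payload % 8u == 0u) ? 0u : 1u;
}

/* Address of the payload once the optional filler word has been inserted. */
uint32_t place_double_block(uint32_t next_free)
{
    uint32_t pad = pad_words_for_double(next_free) * WORD_SIZE;
    return next_free + pad + WORD_SIZE;
}
```

In this model the single filler word plays the role of the Atom(0) mentioned above: it costs at most 4 bytes per double block, and the same computation would have to be repeated whenever a block is moved.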
Re: [Caml-list] Alignment of data
On Wed, Jan 27, 2010 at 06:20:44PM +0100, Christophe Papazian wrote:
> Is there a 64-bit PowerPC Linux (ELF) support in ocaml ? I thought
> it was only a 64-bit PowerPC OSX (Darwin) support...

Yes indeed there is. For years we maintained an out of tree patch to
support this for Fedora/ppc64:

http://cvs.fedoraproject.org/viewvc/F-12/ocaml/ocaml-3.11.0-ppc64.patch

However Fedora 13 (onwards) has relegated ppc (32 & 64 bit) support to
the status of a "secondary architecture"[1], which effectively means we
don't care about it. For this reason I dropped this patch and don't
intend to maintain it.

The patch itself seems relatively trouble-free. We built all the
Fedora packages with it, and only a couple had problems compiling on
ppc64. Since I never had access to a real ppc64 machine, I was never
able to determine if these build problems were because this patch is
faulty or for some other unrelated reason, so YMMV.

Rich.

[1] http://fedoraproject.org/wiki/Architectures#Structure

-- 
Richard Jones
Red Hat
[Caml-list] Alignment of data
Dear Xavier Leroy, thank you for your answer.

> > I am working on some ppc architecture, and I realize that I have a
> > (very) big slowdown due to bad alignment of data by ocamlopt. I need
> > to have my data aligned in memory depending on the size of the data:
> > floats are to be aligned on 8 bytes, int on 4 bytes, etc.
>
> First, make sure that misalignment is really the source of your
> slowdown. The PowerPC processors I'm familiar with can access
> 4-aligned 8-byte floats with minimal overhead, while the penalty is
> much bigger for other misalignments.

I am sorry, but I am sure of that. I ran some tests to make sure that the
problem comes from that particular point.

> Data allocated in the Caml heap is word-aligned, where a word is 4
> bytes on a 32-bit platform and 8 bytes on a 64-bit platform. This is
> deeply ingrained in the Caml GC and allocator, so don't expect to
> change this easily.

I did not expect to change such a deep feature of OCaml myself, but I
hoped that you or somebody in your team could. Would it be possible to
have everything 8-aligned on a 32-bit platform with minimal effort?
Any help is welcome!

> What you can do, however:
>
> 1- Use the 64-bit PowerPC port. Everything will be 8-aligned then.

Is there a 64-bit PowerPC Linux (ELF) support in ocaml ? I thought
it was only a 64-bit PowerPC OSX (Darwin) support...

Thank you to Goswin von Brederlow and Pascal Cuoq for their answers, but
I should say that I would really prefer to use the GC as usual, without
rewriting it :)

Christophe
[Caml-list] Alignment of data
Goswin von Brederlow wrote:

> You need to write a new function
> CAMLextern value caml_alloc_double_array (mlsize_t),
> or similar that ensures alignment on 8 byte for double even for 32bit
> systems.
>
> You should also check the CAMLextern value caml_copy_double (double);
> that it does the same.

If you decide to go this route, which this message
neither endorses nor condemns, you also need to

A1/ allocate the doubles directly in the major heap, and
A2/ deactivate compactions

or

B/ modify the garbage-collector.

Pascal
Re: [Caml-list] Alignment of data
> I am working on some ppc architecture, and I realize that I have a
> (very) big slowdown due to bad alignment of data by ocamlopt. I need
> to have my data aligned in memory depending on the size of the data:
> floats are to be aligned on 8 bytes, int on 4 bytes, etc.

First, make sure that misalignment is really the source of your
slowdown. The PowerPC processors I'm familiar with can access
4-aligned 8-byte floats with minimal overhead, while the penalty is
much bigger for other misalignments. Indeed, the PowerPC calling
conventions mandate that some 8-byte float arguments are passed on the
stack at 4-aligned addresses, so that's strong incentive for the
hardware people to implement those accesses efficiently.

> BUT, after verification, I remark that ocamlopt doesn't align as I
> need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what
> I thought, and doesn't change anything for my needs. Is there ANY way
> to obtain what I need easily or at least quickly ?

Data allocated in the Caml heap is word-aligned, where a word is 4
bytes on a 32-bit platform and 8 bytes on a 64-bit platform. This is
deeply ingrained in the Caml GC and allocator, so don't expect to
change this easily.

What you can do, however:

1- Use the 64-bit PowerPC port. Everything will be 8-aligned then.

2- Use a bigarray instead of a float array. Bigarray data is
   allocated outside the heap, at naturally-aligned addresses.

- Xavier Leroy
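The second suggestion can be pictured in C terms: storage obtained from `malloc` lives outside any garbage-collected heap and is guaranteed by the C standard to be suitably aligned for `double`. The sketch below only illustrates that property; it is not taken from the Bigarray implementation, and `is_aligned` and `alloc_doubles` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* True if `p` is a multiple of `alignment`, which must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocate room for `n` doubles outside any garbage-collected heap.
   malloc's result is suitably aligned for double (or NULL on failure). */
double *alloc_doubles(size_t n)
{
    return malloc(n * sizeof(double));
}
```

A caller would check `is_aligned(p, 8)` on the returned pointer and release the storage with `free` when done; a word-aligned heap address such as 0x732A4 from the posted results passes the 4-byte check but not the 8-byte one.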
Re: [Caml-list] Alignment of data
Christophe Papazian writes:

> Dear users and developers of OCAML,
>
> I am working on some ppc architecture, and I realize that I have a
> (very) big slowdown due to bad alignment of data by ocamlopt. I need
> to have my data aligned in memory depending on the size of the data:
> floats are to be aligned on 8 bytes, int on 4 bytes, etc.
>
> BUT, after verification, I remark that ocamlopt doesn't align as I
> need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what
> I thought, and doesn't change anything for my needs. Is there ANY way
> to obtain what I need easily or at least quickly ?
>
> You can use the following code to test your alignment on your
> architecture :
> [ compile with ocamlopt align_stubs.c align.ml -o align ]
>
> # align.ml #
> open Obj
>
> external get_addr : 'a -> int * string = "get_addr"
>
> let rec align acc r =
>   if r mod 2 = 1 then acc else align (acc*2) (r/2)
>
> let get_addr_print v =
>   let a,b = get_addr v in Printf.printf "%6X %s \n" a b; a

That will cut off the upper bits of my address. Not important for
alignment but bad practice.

> let rec get_align acc = function
>     h::q as l -> get_align (acc lor get_addr_print l) q
>   | [] -> acc
>
> let f block s l =
>   let r =
>     if block then (* if the element is a block, consider it like a pointer *)
>       List.fold_left (fun r e -> r lor get_addr_print e) 0 l
>     else get_align 0 l
>   in
>   Printf.printf "%s are aligned on %i bytes\n%!" s (align 1 r)
>
> let build_list v l = List.map (fun i -> Array.make i v) l
>
> let main =
>   f false "Chars" ['a';'b';'c';'d';'e'];
>   f false "Integers" [0;1;2;3;4];
>   f true "Floats" [0.;1./.3.;2./.5.;3./.7.;4./.9.];
>   f true "Int Arrays" (build_list 37 [3;4;5;6;7]);
>   f true "Float Arrays" (build_list (1./.3.) [2;3;4;5;6]);
>   f true "Other Float Arrays" [Array.make 1 max_float;Array.make 2 0.;
>     Array.make 3 0.;Array.make 37 0.;Array.make 17 0.];
>
> ### align_stubs.c
>
> #include <stdio.h>
>
> #include <stdlib.h>
> #include <caml/mlvalues.h>
> #include <caml/memory.h>
> #include <caml/alloc.h>
>
> CAMLprim
> value get_addr(value v)
> {
>   CAMLparam1 (v);
>   char *repr = malloc(9);
>   value res = alloc_tuple(2);
>   Field(res,0) = Val_int((unsigned int) v);
>   sprintf(repr,"%8X", *((int*)v));

Again cutting off upper bits. I have a 64bit cpu so up to 16 hex digits
for an address.

>   Field(res,1) = (caml_copy_string(repr));
>   CAMLreturn(res);
> }
>
> # Results ##
>
> 1D8C0 C3
> 1D8CC C5
> 1D8D8 C7
> 1D8E4 C9
> 1D8F0 CB
> Chars are aligned on 4 bytes
> 1D8781
> 1D8843
> 1D8905
> 1D89C7
> 1D8A89
> Integers are aligned on 4 bytes
> 1D85C0
> 7612C
> 76114 999A
> 760FC DB6DB6DB
> 760E4 1C71C71C
> Floats are aligned on 4 bytes
> 74A2C 4B
> 74A18 4B
> 74A00 4B
> 749E4 4B
> 749C4 4B
> Int Arrays are aligned on 4 bytes
> 732C0
> 732A4
> 73280
> 73254
> 73220
> Float Arrays are aligned on 4 bytes
> 71928
> 719400
> 719600
> 719880
> 71AC00
> Other Float Arrays are aligned on 8 bytes
>
> You can see the addresses in memory of each element of the lists and
> its internal representation (to check if the memory pointer really
> points to the right value: you can even see that a 31-bit OCaml
> integer (or char) i has the C representation 2*i+1).
> It seems that small values are on the minor heap, and large values
> are on the major heap.
> Note that the last array is correctly aligned, but it's just a matter
> of luck: if I change something else before this line in my code, I
> usually get the last array aligned on 4 bytes.
> (But I can't find a way to obtain a float array aligned on 8 bytes
> with the use of "build_list".)

Everything is aligned to a value. I don't think there is a special
alloc call for the GC that gives you double alignment. Nothing in
caml/alloc.h anyway.

> So if you have any idea of how to get floats and float arrays aligned
> on 8 bytes both on the major and minor heap, please answer me!
>
> Thank you very much
>
> Christophe

You need to write a new function
CAMLextern value caml_alloc_double_array (mlsize_t),
or similar that ensures alignment on 8 byte for double even for 32bit
systems.

You should also check the CAMLextern value caml_copy_double (double);
that it does the same.

An alternative might be to use a Bigarray.

Regards,
        Goswin
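The remark about lost upper bits can be illustrated with a small stand-alone C sketch. It models addresses as `uint64_t` so that it behaves the same on any host. The helper names `truncate_addr` and `format_addr` are hypothetical, and the sample address used in the note after the code is made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* What a cast to a 32-bit unsigned int keeps of a 64-bit address. */
uint32_t truncate_addr(uint64_t addr)
{
    return (uint32_t)addr;   /* the upper 32 bits are discarded */
}

/* Write the full address in hexadecimal into `buf`;
   returns the number of hex digits written. */
int format_addr(char *buf, size_t size, uint64_t addr)
{
    assert(buf != NULL && size >= 17);   /* 16 hex digits + terminating NUL */
    return snprintf(buf, size, "%" PRIX64, addr);
}
```

For a made-up address such as 0x7F12345678C0, `truncate_addr` keeps only 0x345678C0, while `format_addr` needs 12 digits to print it in full. The low bits survive the truncation, which is why the alignment result is unaffected.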
[Caml-list] Alignment of data
Dear users and developers of OCAML,

I am working on some ppc architecture, and I realize that I have a
(very) big slowdown due to bad alignment of data by ocamlopt. I need
to have my data aligned in memory depending on the size of the data:
floats are to be aligned on 8 bytes, int on 4 bytes, etc.

BUT, after verification, I remark that ocamlopt doesn't align as I
need. I tried to use ARCH_ALIGN_DOUBLE, but it doesn't seem to be what
I thought, and doesn't change anything for my needs. Is there ANY way
to obtain what I need easily or at least quickly ?

You can use the following code to test your alignment on your
architecture :
[ compile with ocamlopt align_stubs.c align.ml -o align ]

# align.ml #
open Obj

external get_addr : 'a -> int * string = "get_addr"

let rec align acc r =
  if r mod 2 = 1 then acc else align (acc*2) (r/2)

let get_addr_print v =
  let a,b = get_addr v in Printf.printf "%6X %s \n" a b; a

let rec get_align acc = function
    h::q as l -> get_align (acc lor get_addr_print l) q
  | [] -> acc

let f block s l =
  let r =
    if block then (* if the element is a block, consider it like a pointer *)
      List.fold_left (fun r e -> r lor get_addr_print e) 0 l
    else get_align 0 l
  in
  Printf.printf "%s are aligned on %i bytes\n%!" s (align 1 r)

let build_list v l = List.map (fun i -> Array.make i v) l

let main =
  f false "Chars" ['a';'b';'c';'d';'e'];
  f false "Integers" [0;1;2;3;4];
  f true "Floats" [0.;1./.3.;2./.5.;3./.7.;4./.9.];
  f true "Int Arrays" (build_list 37 [3;4;5;6;7]);
  f true "Float Arrays" (build_list (1./.3.) [2;3;4;5;6]);
  f true "Other Float Arrays" [Array.make 1 max_float;Array.make 2 0.;
    Array.make 3 0.;Array.make 37 0.;Array.make 17 0.];

### align_stubs.c

#include <stdio.h>

#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>

CAMLprim
value get_addr(value v)
{
  CAMLparam1 (v);
  char *repr = malloc(9);
  value res = alloc_tuple(2);
  Field(res,0) = Val_int((unsigned int) v);
  sprintf(repr,"%8X", *((int*)v));
  Field(res,1) = (caml_copy_string(repr));
  CAMLreturn(res);
}

# Results ##

1D8C0 C3
1D8CC C5
1D8D8 C7
1D8E4 C9
1D8F0 CB
Chars are aligned on 4 bytes
1D8781
1D8843
1D8905
1D89C7
1D8A89
Integers are aligned on 4 bytes
1D85C0
7612C
76114 999A
760FC DB6DB6DB
760E4 1C71C71C
Floats are aligned on 4 bytes
74A2C 4B
74A18 4B
74A00 4B
749E4 4B
749C4 4B
Int Arrays are aligned on 4 bytes
732C0
732A4
73280
73254
73220
Float Arrays are aligned on 4 bytes
71928
719400
719600
719880
71AC00
Other Float Arrays are aligned on 8 bytes

You can see the addresses in memory of each element of the lists and
its internal representation (to check if the memory pointer really
points to the right value: you can even see that a 31-bit OCaml
integer (or char) i has the C representation 2*i+1).
It seems that small values are on the minor heap, and large values
are on the major heap.
Note that the last array is correctly aligned, but it's just a matter
of luck: if I change something else before this line in my code, I
usually get the last array aligned on 4 bytes.
(But I can't find a way to obtain a float array aligned on 8 bytes
with the use of "build_list".)

So if you have any idea of how to get floats and float arrays aligned
on 8 bytes both on the major and minor heap, please answer me!

Thank you very much

Christophe
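The measurement made by `align` and the 2*i+1 representation mentioned in this message can be restated as a small stand-alone C sketch. It does not use the OCaml runtime; `common_alignment` and `tag_int` are hypothetical names chosen for the illustration. The idea is the one used in the post: OR all addresses together and take the lowest set bit of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two dividing every address in `addrs`, computed as the
   lowest set bit of their bitwise OR. Returns 0 if every address is zero
   or if the array is empty. */
uintptr_t common_alignment(const uintptr_t *addrs, size_t n)
{
    uintptr_t acc = 0;
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc |= addrs[i];
    return acc & (~acc + 1);   /* isolate the lowest set bit */
}

/* Tagged representation of a small integer i, as described above: 2*i + 1. */
uintptr_t tag_int(uintptr_t i)
{
    return (i << 1) | 1u;
}
```

Applied to the five "Float Arrays" addresses listed in the results (732C0, 732A4, 73280, 73254, 73220), `common_alignment` gives 4, matching the reported line. `tag_int('a')` gives 0xC3 and `tag_int(37)` gives 0x4B, the values shown next to the Chars and Int Arrays addresses. Unlike the recursive `align`, this version returns 0 instead of looping when the input is 0.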