Hello George:
While the change on the shm side does initially seem unnecessary, it
is handling a bus error case on the sending side, not on the receiving
side.
The change in the mca_btl_sm_hdr_t is necessary because of the way the
pml and btl headers are stored in shared memory and because of the
fact that in some cases, the pml header has a uint64_t in it. If the
mca_btl_sm_hdr_t is size 12, then the pml header does not start on a
double-word aligned boundary. When the pml header is a
mca_pml_ob1_rendezvous_hdr_t, we get a bus error while setting
hdr_msg_length. Here is one example, although it can happen in
other places as well. Note that the address dbx prints below is
4-byte aligned but not 8-byte aligned. (Line numbers are approximate;
they are within a few lines of what is in the trunk.)
program terminated by signal BUS (invalid address alignment)
Current function is mca_pml_ob1_send_request_start_rndv (optimized)
743 hdr->hdr_rndv.hdr_msg_length = sendreq->req_send.req_bytes_packed;
(dbx) print &(hdr->hdr_rndv.hdr_msg_length)
&hdr->hdr_rndv.hdr_msg_length = 0xf4d1e81c
(dbx) where
=>[1] mca_pml_ob1_send_request_start_rndv() (optimized),
at 0xfd5f76b8 (line ~743) in "pml_ob1_sendreq.c"
[2] mca_pml_ob1_send_request_start() (optimized),
at 0xfd5d013c (line ~388) in "pml_ob1_sendreq.h"
[3] mca_pml_ob1_send() (optimized), at 0xfd5d1544 (line ~117) in
"pml_ob1_isend.c"
[4] PMPI_Send(), at 0xfedd7204 (line ~65) in "psend.c"
[5] main(0xffbfed40, 0xfffffff8, 0x2, 0x0, 0x7d1, 0x7d0), at 0x125bc
(dbx)
George Bosilca wrote:
Rolf,
If we memcpy instead of assigning the header in the OB1 PML why do we
need the padding in the frag header ?
Thanks,
george.
On Jan 3, 2008, at 2:47 PM, Rolf vandeVaart wrote:
Greetings. We have seen some bus errors when compiling a user
application with certain compiler flags and running on a SPARC-based
server. The issue is that some structures are not word or double-word
aligned, which causes a bus error. I have tracked down two places where I can
make a minor change and everything seems to work fine. However, I want
to see if anyone has issues with these changes. The two changes are
shown below.
burl-ct-v440-0 206 =>svn diff
Index: ompi/mca/btl/sm/btl_sm_frag.h
===================================================================
--- ompi/mca/btl/sm/btl_sm_frag.h (revision 17039)
+++ ompi/mca/btl/sm/btl_sm_frag.h (working copy)
@@ -9,6 +9,7 @@
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
+ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -41,6 +42,10 @@
struct mca_btl_sm_frag_t *frag;
size_t len;
mca_btl_base_tag_t tag;
+ /* Add a 4 byte pad to round out structure to 16 bytes for 32-bit
+ * and to 24 bytes for 64-bit. Helps prevent bus errors for strict
+ * alignment cases like SPARC. */
+ char pad[4];
};
typedef struct mca_btl_sm_hdr_t mca_btl_sm_hdr_t;
Index: ompi/mca/pml/ob1/pml_ob1_recvfrag.h
===================================================================
--- ompi/mca/pml/ob1/pml_ob1_recvfrag.h (revision 17039)
+++ ompi/mca/pml/ob1/pml_ob1_recvfrag.h (working copy)
@@ -9,6 +9,7 @@
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
+ * Copyright (c) 2008 Sun Microsystems, Inc. All rights reserved.
* $COPYRIGHT$
*
* Additional copyrights may follow
@@ -67,7 +68,8 @@
unsigned char* _ptr = (unsigned char*)frag->addr; \
/* init recv_frag */ \
frag->btl = btl; \
- frag->hdr = *(mca_pml_ob1_hdr_t*)hdr; \
+ memcpy(&frag->hdr, (void *)((mca_pml_ob1_hdr_t*)hdr), \
+ sizeof(mca_pml_ob1_hdr_t)); \
frag->num_segments = 1; \
_size = segs[0].seg_len; \
for( i = 1; i < cnt; i++ ) { \
burl-ct-v440-0 207 =>
The ticket associated with this issue is
https://svn.open-mpi.org/trac/ompi/ticket/1148
Rolf
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
=========================
rolf.vandeva...@sun.com
781-442-3043
=========================