Michael S. Tsirkin wrote: >>Quoting John W. Marland <[EMAIL PROTECTED]>: >>Subject: Re: [PATCH] IB/core - ib_umad can cause address alignment fault on >>ia64 >> >>Michael S. Tsirkin wrote: >> >> >> >>>>Quoting Ralph Campbell <[EMAIL PROTECTED]>: >>>>Subject: [PATCH] IB/core - ib_umad can cause address alignment fault on ia64 >>>> >>>>IB/core - ib_umad can cause address alignment fault >>>> >>>>In user_mad.c, the definition for struct ib_umad_packet includes >>>>struct ib_user_mad at an odd 32-bit offset. When ib_umad_write() >>>>tries to assign rmpp_mad->mad_hdr.tid, there is an alignment fault on >>>>architectures which have strict alignment for load/stores. >>>>This patch fixes the problem by changing the offset on which >>>>struct ib_user_mad is defined within struct ib_umad_packet. >>>> >>>>Thanks go to John W. Marland <[EMAIL PROTECTED]> for finding this. >>>> >>>>Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]> >>>> >>>>diff -r b1128b48dc99 drivers/infiniband/core/user_mad.c >>>>--- a/drivers/infiniband/core/user_mad.c Fri Jan 12 20:00:03 2007 +0000 >>>>+++ b/drivers/infiniband/core/user_mad.c Wed Jan 17 14:09:37 2007 -0800 >>>>@@ -125,7 +125,7 @@ struct ib_umad_packet { >>>> struct ib_mad_send_buf *msg; >>>> struct ib_mad_recv_wc *recv_wc; >>>> struct list_head list; >>>>- int length; >>>>+ long length; >>>> struct ib_user_mad mad; >>>>}; >>>> >>>> >>>> >>>> >>>This does not make sense to me - do we have to replace all int fields with >>>long >>>now? Looks like a compiler or makefile bug in your setup - struct fields >>>should >>>be naturally aligned. >>> >>> >>> >>> >>> >> We should probably have given a more complete explanation. The >>unaligned access hits in two places, that I've tracked down so far. >> The one where it's easiest to see what's happening is in ib_umad_write. >>______________________________________________________________________________________ >> if (!ib_response_mad(packet->msg->mad)) { >> tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; >> *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | >> (be64_to_cpup(tid) & 0xffffffff)); >> >>---> this line causes the access problem >>rmpp_mad->mad_hdr.tid = *tid; >> } >>________________________________________________________________________________________ >> The rmpp_mad variable is an ib_rmpp_mad pointer that is initialized >> from the packet->mad.data early in the function. >> Because the ib_umad_packet structure has a as it's last element an >> ib_user_mad structure, not a pointer to one, but the structure. >> This means that the Data[0] declaration at the end of the ib_umad >> structure is forced onto a 4 byte boundary. >> >> > >So the issue is that we are casting char *data which has no alignment >guarantees >to 64 bit number. We really must find a way to force 64 bit alignment for >struct ib_user_mad all over. Would not something like the following simple >trick work? > >struct ib_user_mad_hdr { > ............. >} __attribute__((aligned (8))); > > In this case I don't think that will solve it. The memory area where the structure area is copied is one of those open ended declarations .i.e. ... __u8 Data[0] }; The allocation allows one of two different sizes of structures to be placed on that Data[0]. Which would STILL work fine if the ib_umad_packet and ib_user_mad and varied sized data area are all allocated in one lump. I've never cared for these open ended structures. This is a good reason how/why it will eventually bite you. ....JW
> > > _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general