Michael S. Tsirkin wrote: >>Quoting Ralph Campbell <[EMAIL PROTECTED]>: >>Subject: [PATCH] IB/core - ib_umad can cause address alignment fault on ia64 >> >>IB/core - ib_umad can cause address alignment fault >> >>In user_mad.c, the definition for struct ib_umad_packet includes >>struct ib_user_mad at an odd 32-bit offset. When ib_umad_write() >>tries to assign rmpp_mad->mad_hdr.tid, there is an alignment fault on >>architectures which have strict alignment for load/stores. >>This patch fixes the problem by changing the offset on which >>struct ib_user_mad is defined within struct ib_umad_packet. >> >>Thanks go to John W. Marland <[EMAIL PROTECTED]> for finding this. >> >>Signed-off-by: Ralph Campbell <[EMAIL PROTECTED]> >> >>diff -r b1128b48dc99 drivers/infiniband/core/user_mad.c >>--- a/drivers/infiniband/core/user_mad.c Fri Jan 12 20:00:03 2007 +0000 >>+++ b/drivers/infiniband/core/user_mad.c Wed Jan 17 14:09:37 2007 -0800 >>@@ -125,7 +125,7 @@ struct ib_umad_packet { >> struct ib_mad_send_buf *msg; >> struct ib_mad_recv_wc *recv_wc; >> struct list_head list; >>- int length; >>+ long length; >> struct ib_user_mad mad; >> }; >> >> > >This does not make sense to me - do we have to replace all int fields with long >now? Looks like a compiler or makefile bug in your setup - struct fields should >be naturally aligned. > > > We should probably have given a more complete explanation. The unaligned access hits in two places, that I've tracked down so far. The one where it's easiest to see what's happening is in ib_umad_write. ______________________________________________________________________________________ if (!ib_response_mad(packet->msg->mad)) { tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid; *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 | (be64_to_cpup(tid) & 0xffffffff));
---> this line causes the access problem rmpp_mad->mad_hdr.tid = *tid; } ________________________________________________________________________________________ The rmpp_mad variable is an ib_rmpp_mad pointer that is initialized from the packet->mad.data early in the function. Because the ib_umad_packet structure has a as it's last element an ib_user_mad structure, not a pointer to one, but the structure. This means that the Data[0] declaration at the end of the ib_umad structure is forced onto a 4 byte boundary. A chunk-o-memory is allocated to encapsulate the ib_umad_packet with the ib_user_mad AND enough space for a user created rmpp header. So the incoming structure is copied into the Data[0] area - and tid ends up on a 4 byte boundary even thought it's an 8 byte wide variable. Nothing to do with the compiler - more like pilot error :-) --------- Other possible solutions would be to exchange the position of the tid with the 16 bit variable before it or the one after it. That would put it on a correct alignment, but what about when someone else attaches the ib_mad_hdr to some other random place? Another solution would be to make the ib_user_mad structure in the ib_umad_packet into a pointer, but that would mean another kzalloc. The best solution general rule is probably to make sure that if you are going to embed a structure in another, that it's embedded on an $(ARCH_SIZE) boundary. I could have used an int PADD; variable but since there was a length field I figured that no one would mind if a longer length was possible. ....JW John W. Marland System Fabric Works _______________________________________________ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general