On Fri, 2009-06-12 at 21:22 -0700, Roland Dreier wrote:
> > Valgrind replaces the libc memcpy call with a simple version that
>  > copies a byte at a time (in order).  If libmlx4 is not built with
>  > --with-valgrind, valgrind considers each write an invalid write and
>  > spends a very long time after each write updating its error database.
>  > We experimented with replacing the Valgrind error database update
>  > with a configurable spin loop and found that if we put a delay of
>  > around 100,000 cycles between writes in the 'byte memcpy' when
>  > writing to the blueflame page, that a sent message gets
>  > lost/misplaced in a simple testcase with two MPI_barriers back to
>  > back (resulting in a hang because not all processes exit the first
>  > barrier).  Our theory is the card sees 'byte' writes to the blueflame
>  > page and due to the long delay, uses the information before it is all
>  > written out (and thus getting wrong info).
> 
> That makes sense.  The HW documentation says that blueflame writes must
> be done in aligned chunks of at least 4 bytes, so it's not surprising
> that byte writes confuse the HW in some cases.
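
To make the constraint concrete, here is a minimal sketch of a copy routine
that only ever issues aligned 4-byte stores, in contrast to the byte-at-a-time
memcpy replacement described above.  The function name copy_words32 and its
error convention are hypothetical and are not taken from libmlx4.  It also
assumes the compiler emits one 32-bit store per volatile assignment, which is
typical but not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libmlx4 code): copy `bytes` bytes using only
 * aligned 32-bit stores.  Returns 0 on success, or -1 if either pointer
 * or the length is not a multiple of 4, so that no partial (byte-sized)
 * write is ever attempted. */
static int copy_words32(volatile uint32_t *dst, const uint32_t *src,
                        size_t bytes)
{
    /* Reject anything that would force a sub-word or misaligned access. */
    if (((uintptr_t)dst | (uintptr_t)src | bytes) & 3u)
        return -1;

    /* One aligned 4-byte store per iteration, in ascending order. */
    for (size_t i = 0; i < bytes / 4; i++)
        dst[i] = src[i];

    return 0;
}
```

The volatile qualifier on the destination is there to keep the compiler from
merging or splitting the stores.  A real doorbell/blueflame path would
presumably also need the appropriate write barriers, which this sketch leaves
out.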

Hi Roland.  I'm removing the XRC patch from our kernel like we discussed
(for the other people on the list, it's because the XRC API is likely to
change by the time integration into the official packages is done, and
we don't want to ship one API for the support now and then a different
API later).  This also means that I need to respin libibverbs and all
driver packages.  However, my deadline is *very* tight for getting this
done.  Any chance I could talk you into rushing the release?  Today
would be best, but tomorrow would work too.

-- 
Doug Ledford <[email protected]>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

