Hi,

We have seen a huge performance drop in Lustre 1.6.3, caused by checksums 
being enabled by default.  I looked at the algorithm being used, and it is 
actually a CRC32, which is very strong at detecting all sorts of problems, 
such as single-bit errors, swapped bytes, and missing bytes.
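For reference, a minimal bitwise CRC-32 looks roughly like the sketch below. 
It assumes the common IEEE 802.3 variant (reflected polynomial 0xEDB88320), 
and the name crc32_ieee is hypothetical; it is written for clarity, not 
speed, and is not the Lustre code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, IEEE 802.3 variant (reflected polynomial 0xEDB88320).
 * Hypothetical helper for illustration; real implementations usually use
 * a lookup table or hardware instructions instead of the inner bit loop. */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;            /* standard initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        /* Eight shift/conditional-XOR steps per input byte. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;              /* standard final XOR */
}
```

The inner loop shows where the cost comes from: every byte goes through 
eight dependent shift/XOR steps.  Table-driven versions reduce that to one 
lookup per byte, but that is still per-byte work.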

I've been experimenting with a simple XOR algorithm, and with it I've been 
able to recover most of the lost performance.  This algorithm will detect any 
corruption confined to a single byte or word.  It will not detect swapped-byte 
errors, but I think these are pretty rare.  It will not detect missing bytes 
either, but I suspect that other layers in Lustre or LNET will catch that 
problem.  Finally, it will not detect two errors that cancel each other out, 
such as the same bit flipped in two different words (i.e. two single-bit 
errors a multiple of 4 bytes apart).
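A minimal sketch of such a word-wise XOR checksum follows.  The name 
xor32_checksum is hypothetical, and the 32-bit word size and zero-padding of 
a trailing partial word are assumptions for illustration, not necessarily 
what the experimental code does.  The speed comes from doing one XOR per 
4 bytes instead of per-byte CRC work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR all 32-bit words of the buffer together (hypothetical helper).
 * A trailing partial word (1-3 bytes) is zero-padded.  memcpy avoids
 * unaligned loads; compilers typically turn it into a plain load. */
static uint32_t xor32_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        uint32_t word;
        memcpy(&word, buf + i, sizeof(word));
        sum ^= word;
    }
    if (i < len) {                         /* leftover bytes */
        uint32_t word = 0;
        memcpy(&word, buf + i, len - i);
        sum ^= word;
    }
    return sum;
}
```

For example, with an 8-byte buffer, flipping the low bit of byte 0 changes 
the checksum, but also flipping the low bit of byte 4 cancels it out again. 
Exchanging the two 4-byte words likewise goes unnoticed, since XOR does not 
depend on order.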

Should we consider using a more efficient checksum algorithm, in order to 
regain performance?  Should the algorithm be configurable?  

-Roger

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
