Hi,
We have seen a huge performance drop in 1.6.3, due to the checksum being
enabled by default. I looked at the algorithm being used, and it is actually a
CRC32, which is a very strong algorithm for detecting all sorts of problems,
such as single bit errors, swapped bytes, and missing bytes.
I've been experimenting with a simple XOR algorithm, and I've been able to
recover most of the lost performance. This algorithm will detect corrupted
bytes and words. It will not detect swapped-byte errors, but I think that
these are pretty rare. It will not detect missing bytes, but I suspect that
other things in Lustre or LNET will detect this problem. It also will not
detect two errors that offset each other, such as the same bit flipped in
two words that are a multiple of 4 bytes apart.
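
To make the trade-off concrete, here is a minimal sketch of the kind of
word-wise XOR checksum I mean. The function name xor32_checksum and the
zero-padding of a trailing partial word are assumptions for illustration;
this is not the code in Lustre.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the Lustre implementation): XOR together all
 * 32-bit words of a buffer. If len is not a multiple of 4, the trailing
 * bytes are zero-padded into one final word.
 */
static uint32_t xor32_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    uint32_t word;

    while (len >= sizeof(word)) {
        /* memcpy avoids unaligned loads on strict-alignment CPUs */
        memcpy(&word, p, sizeof(word));
        sum ^= word;
        p += sizeof(word);
        len -= sizeof(word);
    }
    if (len > 0) {
        word = 0;
        memcpy(&word, p, len);
        sum ^= word;
    }
    return sum;
}
```

Because XOR is commutative and self-inverse, reordering whole words or
flipping the same bit position in two different words leaves the result
unchanged, which is exactly the set of misses described above.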
Should we consider using a more efficient checksum algorithm, in order to
regain performance? Should the algorithm be configurable?
-Roger
_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel