On Aug 28, 2008  21:49 +0800, Stuart Midgley wrote:
> for completeness, here are the logs from 172.16.4.93
>
> Aug 27 07:49:55 clus093 kernel: LustreError: 132-0: BAD WRITE
> CHECKSUM: changed on the client after we checksummed it - likely false
> positive due to mmap IO (bug 11742): from [EMAIL PROTECTED] inum

This is the important part to note - if you are doing mmap writes then the
VM doesn't protect the pages from being modified while they are being
checksummed and sent over the wire.

> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:
> 1162:check_write_checksum()) original client csum 2dbc1696 (type 2),
> server csum 9d081697 (type 2), client csum now 9d081697

This means the data changed after the initial checksum was computed, but
it has now settled down.  In some cases the "client csum now" value can have
changed again, depending on whether the process is rewriting the same
file repeatedly.

> Aug 27 07:49:55 clus093 kernel: LustreError: 28573:0:(osc_request.c:
> 1372:osc_brw_redo_request()) @@@ redo for recoverable error
> [EMAIL PROTECTED] x4720217/t820873 o4->p1-
> [EMAIL PROTECTED]@tcp:6/4 lens 384/480 e 0 to 100 dl 1219794694
> ref 2 fl Interpret:R/0/0 rc 0/0

Here it tells you it is resending the RPC.

> > always from the same cluster node...  Should we be worried?  I
> > suspect this means we shouldn't turn check summing off?  I assume
> > these are rejected and resent from the client?

If you are NOT doing mmap IO (just normal read/write) then it is possible
your node has memory corruption.  There is an extra check that can be
enabled to checksum the pages while they are in memory, instead of just
over the wire.  It adds overhead, of course, but it can help isolate the
problem.

	echo 1 > /proc/fs/lustre/llite/*/checksum_pages

This will also enable on-wire checksumming, which is already on by default.
One caveat is that turning off checksum_pages will also turn off the
on-wire checksums (which can be re-enabled via
/proc/fs/lustre/osc/*/checksums)...  Blame Phil.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
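
As an illustration of how the three checksums in the check_write_checksum() message relate to each other, here is a minimal Python sketch. It is not Lustre code: the function name classify_write_checksum, the returned strings, and the use of zlib.adler32 as a stand-in checksum are all assumptions made for the example. Only the first two outcomes are described in the message above; the last two branches are a guess at how the remaining cases could be read.

```python
import zlib


def classify_write_checksum(original_client, server, client_now):
    """Hypothetical helper: interpret the three checksums from the log line."""
    if original_client == server:
        # Server saw exactly what the client checksummed: nothing to report.
        return "match"
    if client_now == server:
        # Client data changed after the first checksum and then settled;
        # this is the case the log attributes to mmap IO.
        return "changed on client after checksum"
    if client_now == original_client:
        # Assumption: client data is unchanged, so the mismatch arose elsewhere.
        return "client data stable, mismatch arose elsewhere"
    # Assumption: the client data is still being rewritten.
    return "client data still changing"


# Values copied from the log excerpt quoted above.
logged = classify_write_checksum(0x2DBC1696, 0x9D081697, 0x9D081697)
print(logged)

# Simulate the race: a page is checksummed, then modified before it is sent.
page = bytearray(b"A" * 4096)
original = zlib.adler32(page)        # checksum computed first
page[0] = ord("B")                   # stand-in for a concurrent mmap write
server = zlib.adler32(bytes(page))   # what the receiver would compute
client_now = zlib.adler32(page)      # client recomputes after the mismatch
simulated = classify_write_checksum(original, server, client_now)
print(simulated)
```

Both calls land in the "changed on client after checksum" branch, which matches the situation where the client simply resends the RPC.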