A bug has been identified in the 1.8 releases (1.8.0, 1.8.0.1 and 1.8.1 are affected) that can cause data corruption on the OSTs. The problem is related to the OSS read cache feature introduced in 1.8.0. Corruption can occur when a bulk read or write request is aborted, either because the client was evicted or because the data transfer over the network timed out. More details are available in bug 20560: https://bugzilla.lustre.org/show_bug.cgi?id=20560
A patch is under testing and will be included in 1.8.1.1. Until 1.8.1.1 is available, we recommend disabling the OSS read cache feature by running the following two commands on every OSS:

# lctl set_param obdfilter.*.writethrough_cache_enable=0
# lctl set_param obdfilter.*.read_cache_enable=0

These settings must be reapplied each time an OST is restarted.

Best regards,
Johann, for the Lustre team

_______________________________________________
Lustre-announce mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-announce
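Because the OSS read cache workaround has to be reapplied after every OST restart, the two lctl commands can be wrapped in a small script that also checks the result. This is a minimal sketch and is not part of the announcement. The function names `disable_oss_cache` and `oss_cache_disabled` and the `LCTL` override variable are hypothetical. The script relies only on `lctl set_param` and `lctl get_param -n` and on the two parameter names given above.

```shell
#!/bin/sh
# Sketch with hypothetical helper names: disable the OSS read cache on every
# OST served by this OSS, then verify that both tunables read back as 0.
# LCTL can be overridden (e.g. for a dry run); by default the real lctl is used.
LCTL=${LCTL:-lctl}

# Apply the two workaround settings; lctl expands the obdfilter.* wildcard.
disable_oss_cache() {
    "$LCTL" set_param 'obdfilter.*.writethrough_cache_enable=0' || return 1
    "$LCTL" set_param 'obdfilter.*.read_cache_enable=0' || return 1
}

# Return 0 only if every OST reports both caches as disabled.
# An empty result (no OSTs found) is conservatively treated as a failure.
oss_cache_disabled() {
    for param in writethrough_cache_enable read_cache_enable; do
        values=$("$LCTL" get_param -n "obdfilter.*.$param") || return 1
        # Fail if any reported line is something other than "0".
        if printf '%s\n' "$values" | grep -qv '^0$'; then
            return 1
        fi
    done
    return 0
}
```

For example, after an OST restart, source the script as root on the OSS and run `disable_oss_cache && oss_cache_disabled`; an exit status of 0 means both caches report as disabled on all OSTs.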
