Hello guys, I like btrfs and want to put it in production soon. One of the features I want to use is deduplication.
I test duperemove on btrfs frequently and have seen this problem before. I know that btrfs used to change mtime during dedup, but after the dedup fixes from Mark (https://github.com/markfasheh) I tried checking file checksums.

As far as I know, duperemove uses a kernel ioctl for deduplication, so this is not a duperemove issue: the kernel must keep the data consistent.

The file system is fresh and btrfs check does not show any metadata corruption.

GitHub issue:
https://github.com/markfasheh/duperemove/issues/91

System info:
    $ uname -a
    Linux titovetst-beplan 4.2.0-rc8-next-20150825-0959-ARCH #1 SMP Wed Aug 26 10:27:18 MSK 2015 x86_64 GNU/Linux

Mount options:
    rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home

Here is how I found it:

    md5sum_recursive(){
        find "$@" -type f -exec md5sum {} \;
    }
    cp -av --reflink=always ~/<src> ~/<dest>
    md5sum_recursive ~/<dest> > ~/dedup.before
    duperemove -vhrdb 8k ~/<dest>
    md5sum_recursive ~/<dest> > ~/dedup.after
    diff -up ~/dedup.before ~/dedup.after

What I got (full diff attached):

    --- /home/nefelim4ag/dedup.after 2015-08-26 21:36:55.773452558 +0300
    +++ /home/nefelim4ag/dedup.before 2015-08-26 21:21:01.203600761 +0300
    @@ -25139,9 +25139,9 @@
     caf9d41036e46b85d90a9541e8bc9ce1 /home/
    ....
    -0ccbc9c81a51f59dcf2ac0d102de37cb /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
    +e665b502ee977dc1c619ecbd415c91b8 /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
    ....

File sizes did not change, and the affected files are larger than 1 MB.
Every run gives me random data corruption.
The only dependency I have found is that a smaller block size gives more corruption and vice versa, i.e. the more data is deduplicated, the more gets corrupted.

Judging by the SMART data, the disk does not look damaged (attached).

What can I provide to help fix this issue?
If needed, I can recompile the kernel with particular options, of course.

Thanks.

--
Have a nice day,
Timofey.
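The before/after comparison described above can also be sketched as a small script that lists only the files whose checksum changed, and then the 8k blocks that differ from the original source file. This is a minimal sketch, not part of the original report: the function names checksum_manifest, changed_files and differing_blocks are hypothetical, it assumes GNU findutils and coreutils (sort -z, xargs -0 -r), and it assumes the source of the reflink copy is itself still intact.

```shell
#!/bin/sh
# Hypothetical helpers for narrowing down dedup corruption.
# Assumes GNU find/sort/xargs/md5sum and file names without newlines.

# Print "md5  path" for every regular file under $1, sorted by path.
checksum_manifest() {
    find "$1" -type f -print0 | sort -z | xargs -0 -r md5sum
}

# Print the paths whose checksum differs between manifest $1 and manifest $2.
# md5sum lines are: 32 hex digits, two separator characters, then the path.
changed_files() {
    awk 'NR == FNR { before[substr($0, 35)] = substr($0, 1, 32); next }
         {
             path = substr($0, 35)
             if (path in before && before[path] != substr($0, 1, 32))
                 print path
         }' "$1" "$2"
}

# Print the zero-based numbers of the blocks (size $3, default 8192 bytes)
# in which file $1 and file $2 differ. cmp -l reports 1-based byte offsets.
differing_blocks() {
    cmp -l "$1" "$2" | awk -v bs="${3:-8192}" '{ print int(($1 - 1) / bs) }' | uniq
}

# Example use, mirroring the steps in the report:
#   checksum_manifest ~/<dest> > ~/dedup.before
#   duperemove -vhrdb 8k ~/<dest>
#   checksum_manifest ~/<dest> > ~/dedup.after
#   changed_files ~/dedup.before ~/dedup.after
#   differing_blocks ~/<src>/some/file ~/<dest>/some/file 8192
```

Knowing which blocks differ, and whether they line up with the dedup block size, may help whoever debugs the ioctl path, since the report says smaller block sizes produce more corruption.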
diff.dedup
Description: Binary data
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.0-rc8-next-20150825-0959-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile
Device Model:     WDC WD10JPCX-24UE4T0
Serial Number:    WD-WX61AC3J6551
LU WWN Device Id: 5 0014ee 6599e2c1a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 26 22:28:49 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (18480) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 207) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   182   178   021    -    1883
  4 Start_Stop_Count        -O--CK   095   095   000    -    5413
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         POSR-K   200   200   051    -    0
  9 Power_On_Hours          -O--CK   091   091   000    -    7153
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1340
192 Power-Off_Retract_Count -O--CK   200   200   000    -    190
193 Load_Cycle_Count        -O--CK   191   191   000    -    28327
194 Temperature_Celsius     -O---K   093   085   000    -    54
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
240 Head_Flying_Hours       -O--CK   091   091   000    -    6968
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      38  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               70%        78         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    54 Celsius
Power Cycle Min/Max Temperature:     30/55 Celsius
Lifetime    Min/Max Temperature:     17/62 Celsius
Lifetime    Average Temperature:        37 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    128 (37)

Index    Estimated Time   Temperature Celsius
  38    2015-08-26 20:21    46  ***************************
  39    2015-08-26 20:22    46  ***************************
  40    2015-08-26 20:23    46  ***************************
  41    2015-08-26 20:24    47  ****************************
 ...    ..(  5 skipped).    ..  ****************************
  47    2015-08-26 20:30    47  ****************************
  48    2015-08-26 20:31    48  *****************************
 ...    ..(  6 skipped).    ..  *****************************
  55    2015-08-26 20:38    48  *****************************
  56    2015-08-26 20:39    49  ******************************
 ...    ..(  4 skipped).    ..  ******************************
  61    2015-08-26 20:44    49  ******************************
  62    2015-08-26 20:45    50  *******************************
 ...    ..( 17 skipped).    ..  *******************************
  80    2015-08-26 21:03    50  *******************************
  81    2015-08-26 21:04    51  ********************************
 ...    ..( 11 skipped).    ..  ********************************
  93    2015-08-26 21:16    51  ********************************
  94    2015-08-26 21:17    52  *********************************
  95    2015-08-26 21:18    52  *********************************
  96    2015-08-26 21:19    52  *********************************
  97    2015-08-26 21:20    53  **********************************
 ...    ..(  2 skipped).    ..  **********************************
 100    2015-08-26 21:23    53  **********************************
 101    2015-08-26 21:24    54  ***********************************
 ...    ..(  9 skipped).    ..  ***********************************
 111    2015-08-26 21:34    54  ***********************************
 112    2015-08-26 21:35    55  ************************************
 ...    ..( 15 skipped).    ..  ************************************
   0    2015-08-26 21:51    55  ************************************
   1    2015-08-26 21:52    54  ***********************************
 ...    ..( 35 skipped).    ..  ***********************************
  37    2015-08-26 22:28    54  ***********************************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           31  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         9736  Vendor specific