Hello guys,
I like btrfs and want to put it into production soon.
One of the features I want to use is deduplication.

I frequently test duperemove on btrfs and have seen this problem before.
I know that btrfs used to change mtime while deduping, but after the dedup
fixes from Mark (https://github.com/markfasheh) I tried comparing file
checksums before and after a dedup run.

As far as I know, duperemove uses the kernel ioctl for deduping, so this is
not a duperemove issue: the kernel must keep the data consistent.

The file system is fresh and btrfs check does not show any metadata corruption.

Github issue:
https://github.com/markfasheh/duperemove/issues/91

System info:
$ uname -a
Linux titovetst-beplan 4.2.0-rc8-next-20150825-0959-ARCH #1 SMP Wed Aug 26 10:27:18 MSK 2015 x86_64 GNU/Linux

Mount options:
rw,relatime,compress=lzo,space_cache,subvolid=257,subvol=/@home

Okay, here is how I found it:

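# Print the md5sum of every regular file under the given paths.
md5sum_recursive(){
        # Quote "$@" so paths containing spaces are passed through intact.
        find "$@" -type f -exec md5sum {} \;
}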

cp -av --reflink=always ~/<src> ~/<dest>
md5sum_recursive ~/<dest> > ~/dedup.before
duperemove -vhrdb 8k ~/<dest>
md5sum_recursive ~/<dest> > ~/dedup.after
diff -up ~/dedup.before ~/dedup.after

What I got (full diff attached):
--- /home/nefelim4ag/dedup.after        2015-08-26 21:36:55.773452558 +0300
+++ /home/nefelim4ag/dedup.before       2015-08-26 21:21:01.203600761 +0300
@@ -25139,9 +25139,9 @@ caf9d41036e46b85d90a9541e8bc9ce1  /home/
....
-0ccbc9c81a51f59dcf2ac0d102de37cb  /home/nefelim4ag/L4D2/left4dead2/pak01_003.vpk
+e665b502ee977dc1c619ecbd415c91b8  /home/nefelim4ag/L4D2/left4dead2/pak01_000.vpk
....

File sizes did not change, and the affected files are larger than 1MB.

Every time I get random data corruption.
The only dependency I have found is that a smaller block size gives more
corruption and vice versa, i.e. more data deduped -> more corrupted.
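
To put a number on this, the two checksum lists can be reduced to just the
paths whose hash changed. This is a minimal sketch, not part of the original
test: changed_files is a hypothetical helper name, and it assumes the plain
"<hash>  <path>" lines that md5sum prints for paths without newlines or
backslashes.

```shell
# Hypothetical helper: print every path whose md5 differs between two
# md5sum listings (first argument: before, second argument: after).
changed_files() {
    awk '
        # md5sum prints 32 hex digits, two separator characters, then the path.
        { hash = substr($0, 1, 32); path = substr($0, 35) }
        # First file: remember the hash seen for each path.
        NR == FNR { before[path] = hash; next }
        # Second file: report paths that existed before with a different hash.
        (path in before) && before[path] != hash { print path }
    ' "$1" "$2"
}
```

For example, "changed_files ~/dedup.before ~/dedup.after | wc -l" gives the
number of corrupted files for one run, which can then be compared between
runs with different -b values.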

Judging by the SMART data, the disk does not look damaged (output attached).

What can I provide to help fix this issue?
If needed, I can recompile the kernel with other parameters, of course.
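
One thing that might help narrow it down is listing which blocks of a
corrupted file differ from its untouched source copy. The following is only a
sketch under assumptions: differing_blocks is a hypothetical helper, the
default block size of 8192 simply mirrors the 8k passed to duperemove above,
and it relies on "cmp -l" printing the 1-based offset of every differing byte.

```shell
# Hypothetical helper: print the zero-based index of every block in which
# two files differ. Usage: differing_blocks FILE_A FILE_B [BLOCK_SIZE]
differing_blocks() {
    # cmp exits non-zero when the files differ; only its output matters here.
    cmp -l "$1" "$2" | awk -v bs="${3:-8192}" '
        # The first field of "cmp -l" output is the 1-based byte offset.
        { blk = int(($1 - 1) / bs) }
        # Print each block index once, in order of first appearance.
        !(blk in seen) { seen[blk] = 1; print blk }
    '
}
```

Running it on a file from ~/<src> and its counterpart in ~/<dest> would show
whether the damage is aligned to the dedup block size.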

Thanks.

-- 
Have a nice day,
Timofey.

Attachment: diff.dedup
Description: Binary data

smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.0-rc8-next-20150825-0959-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue Mobile
Device Model:     WDC WD10JPCX-24UE4T0
Serial Number:    WD-WX61AC3J6551
LU WWN Device Id: 5 0014ee 6599e2c1a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Aug 26 22:28:49 2015 MSK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(18480) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 207) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   182   178   021    -    1883
  4 Start_Stop_Count        -O--CK   095   095   000    -    5413
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         POSR-K   200   200   051    -    0
  9 Power_On_Hours          -O--CK   091   091   000    -    7153
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   099   099   000    -    1340
192 Power-Off_Retract_Count -O--CK   200   200   000    -    190
193 Load_Cycle_Count        -O--CK   191   191   000    -    28327
194 Temperature_Celsius     -O---K   093   085   000    -    54
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
240 Head_Flying_Hours       -O--CK   091   091   000    -    6968
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      38  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               70%        78         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    54 Celsius
Power Cycle Min/Max Temperature:     30/55 Celsius
Lifetime    Min/Max Temperature:     17/62 Celsius
Lifetime    Average Temperature:        37 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    128 (37)

Index    Estimated Time   Temperature Celsius
  38    2015-08-26 20:21    46  ***************************
  39    2015-08-26 20:22    46  ***************************
  40    2015-08-26 20:23    46  ***************************
  41    2015-08-26 20:24    47  ****************************
 ...    ..(  5 skipped).    ..  ****************************
  47    2015-08-26 20:30    47  ****************************
  48    2015-08-26 20:31    48  *****************************
 ...    ..(  6 skipped).    ..  *****************************
  55    2015-08-26 20:38    48  *****************************
  56    2015-08-26 20:39    49  ******************************
 ...    ..(  4 skipped).    ..  ******************************
  61    2015-08-26 20:44    49  ******************************
  62    2015-08-26 20:45    50  *******************************
 ...    ..( 17 skipped).    ..  *******************************
  80    2015-08-26 21:03    50  *******************************
  81    2015-08-26 21:04    51  ********************************
 ...    ..( 11 skipped).    ..  ********************************
  93    2015-08-26 21:16    51  ********************************
  94    2015-08-26 21:17    52  *********************************
  95    2015-08-26 21:18    52  *********************************
  96    2015-08-26 21:19    52  *********************************
  97    2015-08-26 21:20    53  **********************************
 ...    ..(  2 skipped).    ..  **********************************
 100    2015-08-26 21:23    53  **********************************
 101    2015-08-26 21:24    54  ***********************************
 ...    ..(  9 skipped).    ..  ***********************************
 111    2015-08-26 21:34    54  ***********************************
 112    2015-08-26 21:35    55  ************************************
 ...    ..( 15 skipped).    ..  ************************************
   0    2015-08-26 21:51    55  ************************************
   1    2015-08-26 21:52    54  ***********************************
 ...    ..( 35 skipped).    ..  ***********************************
  37    2015-08-26 22:28    54  ***********************************

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2           31  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         9736  Vendor specific
