Concerning family

2017-10-23 Thread Mr Andrew Hai
Did you receive my previous email regarding your family inheritance?

Andrew.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Scrub doesn't correct coruption

2017-10-23 Thread Wolf
Hi,
I'm having a problem with corruption in one file on my disk array. This is
the third time it has happened (probably). The first time I didn't check the
offending file, so I'm not sure, but it's likely. Btrfs scrub finds the
corruption and, according to both dmesg and its output, fixes it.
However, the next run finds it too.

However, according to SMART the disk appears to be healthy (see below).
Plus the corruption is limited to one file.

Is this an issue somewhere inside btrfs, or is it a disk HW related problem?

Thank you for your help :)

W.

smartctl -a /dev/sde
====================

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       116
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   140   140   020    Pre-fail  Offline      -       15
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       401
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       33
194 Temperature_Celsius     0x0002   147   147   000    Old_age   Always       -       44 (Min/Max 23/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%              357  -
# 2  Short offline       Completed without error       00%              335  -

uname -a
========

Linux ws 4.13.8-1-ARCH #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017 x86_64 GNU/Linux

btrfs --version
===============

btrfs-progs v4.13

btrfs fi show
=============

Label: none  uuid: db7e86f5-649d-44ce-9514-53c7ee0fbe09
	Total devices 2 FS bytes used 9.91GiB
	devid    1 size 103.79GiB used 20.03GiB path /dev/mapper/storage1-root
	devid    2 size 103.79GiB used 20.03GiB path /dev/mapper/storage2-root

Label: 'RAID'  uuid: 9a4be3ac-e942-4e6a-bb24-2c4009a42572
	Total devices 7 FS bytes used 6.48TiB
	devid    1 size 1.82TiB used 715.03GiB path /dev/mapper/data3
	devid    2 size 1.82TiB used 715.00GiB path /dev/mapper/data4
	devid    3 size 2.73TiB used 1.40TiB path /dev/mapper/data2
	devid    4 size 2.73TiB used 1.40TiB path /dev/mapper/data1
	devid    5 size 2.73TiB used 1.40TiB path /dev/mapper/data5
	devid    6 size 2.73TiB used 1.40TiB path /dev/mapper/data6
	devid    7 size 7.28TiB used 5.95TiB path /dev/mapper/data7

btrfs fi df /raid
=================

Data, RAID1: total=6.47TiB, used=6.47TiB
System, RAID1: total=64.00MiB, used=944.00KiB
Metadata, RAID1: total=9.00GiB, used=7.56GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg
=====

[0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
[0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
[0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=db7e86f5-649d-44ce-9514-53c7ee0fbe09 rw cryptdevice=UUID=eb4011d2-38cd-467d-b515-7acf3ef68f01:storage1:allow-discards cryptkey=rootfs:/boot/crypto_keyfile.bin cryptdevice2=UUID=dd0821ae-8fc4-41d2-aab8-f313e2f6d0e8:storage2:allow-discards cryptkey2=rootfs:/boot/crypto_keyfile2.bin root=/dev/mapper/storage1-root
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: S

Re: Scrub doesn't correct coruption

2017-10-23 Thread Qu Wenruo


On 2017-10-23 16:39, Wolf wrote:
> Hi,
> I'm having a problem with corruption in one file on my disk array. This is
> the third time it has happened (probably). The first time I didn't check the
> offending file, so I'm not sure, but it's likely. Btrfs scrub finds the
> corruption and, according to both dmesg and its output, fixes it.
> However, the next run finds it too.
> 
> However, according to SMART the disk appears to be healthy (see below).
> Plus the corruption is limited to one file.
> 
> Is this an issue somewhere inside btrfs, or is it a disk HW related problem?
> 
> Thank you for your help :)
> 
> W.
> 
> [...]
>
> btrfs fi df /raid
> =================
>
> Data, RAID1: total=6.47TiB, used=6.47TiB
> System, RAID1: total=64.00MiB, used=944.00KiB
> Metadata, RAID1: total=9.00GiB, used=7.56GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

RAID1 for both data and metadata.
So if nothing went wrong, it should get fixed.

And IIRC RAID1 repair is already tested and checked, so it should not
have such a problem.
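To make the RAID1 repair argument concrete, here is a minimal userspace sketch, assuming a simplified model in which one 4 KiB data block has two mirrored copies and one expected CRC-32C checksum (CRC-32C being the default btrfs data checksum). The names raid1_repair, SKETCH_BLOCK and SKETCH_MIRRORS are hypothetical; the real kernel scrub code also tracks per-sector state, metadata and device I/O, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK   4096  /* assumed block size for the example */
#define SKETCH_MIRRORS 2     /* RAID1 keeps two copies */

/* Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78). */
static uint32_t crc32c(const uint8_t *p, size_t n)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < n; i++) {
		crc ^= p[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Simplified RAID1 repair: find a copy whose checksum matches, then
 * overwrite every mismatching copy with it.  Returns the number of
 * copies rewritten, or -1 if no copy is good (uncorrectable).
 */
static int raid1_repair(uint8_t copies[SKETCH_MIRRORS][SKETCH_BLOCK],
			uint32_t expected)
{
	int good = -1, repaired = 0;

	for (int m = 0; m < SKETCH_MIRRORS; m++) {
		if (crc32c(copies[m], SKETCH_BLOCK) == expected) {
			good = m;
			break;
		}
	}
	if (good < 0)
		return -1;

	for (int m = 0; m < SKETCH_MIRRORS; m++) {
		if (m != good && crc32c(copies[m], SKETCH_BLOCK) != expected) {
			memcpy(copies[m], copies[good], SKETCH_BLOCK);
			repaired++;
		}
	}
	return repaired;
}
```

In this model a repair only counts once the good copy has actually replaced the bad one; in the real filesystem that means the rewrite has to reach the device holding the bad copy.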

> 
> dmesg
> =====
> 
> [0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
> [0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
> [0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017

Arch user here too.

> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-linux 
> root=UUID=db7e86f5-649d-44ce-9514-53c7ee0fbe09 rw 
> cryptdevice=UUID=eb4011d2-38cd-467d-b515-7acf3ef68f01:storage1:allow-discards 
> cryptkey=roo

Re: Scrub doesn't correct coruption

2017-10-23 Thread Qu Wenruo


On 2017-10-23 17:17, Qu Wenruo wrote:
> 
> 
> On 2017-10-23 16:39, Wolf wrote:
>> Hi,
>> I'm having a problem with corruption in one file on my disk array. This is
>> the third time it has happened (probably). The first time I didn't check the
>> offending file, so I'm not sure, but it's likely. Btrfs scrub finds the
>> corruption and, according to both dmesg and its output, fixes it.
>> However, the next run finds it too.
>>
>> However, according to SMART the disk appears to be healthy (see below).
>> Plus the corruption is limited to one file.
>>
>> Is this an issue somewhere inside btrfs, or is it a disk HW related problem?
>>
>> Thank you for your help :)
>>
>> W.
>>
>> [...]
>>
>> btrfs fi df /raid
>> =================
>>
>> Data, RAID1: total=6.47TiB, used=6.47TiB
>> System, RAID1: total=64.00MiB, used=944.00KiB
>> Metadata, RAID1: total=9.00GiB, used=7.56GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> RAID1 for both data and metadata.
> So if nothing went wrong, it should get fixed.
> 
> And IIRC RAID1 repair is already tested and checked, so it should not
> have such a problem.
> 
>>
>> dmesg
>> =====
>>
>> [0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
>> [0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
>> [0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017
> 
> Arch user here too.
> 
>> [0.00] Command line: BOOT_IMAGE=/boot/vm

Re: Scrub doesn't correct coruption

2017-10-23 Thread ein
On 10/23/2017 10:39 AM, Wolf wrote:
> [...]
>
> Is this an issue somewhere inside btrfs, or is it a disk HW related problem?

It's highly unlikely to be hardware related. According to SMART and dmesg,
there's no indication that would suggest a disk failure.


[PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels

2017-10-23 Thread Nikolay Borisov
Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels to
mimic the functionality of a simple do {} while loop. Refactor the function
to use a do {} while construct, making the intention clear and the code easier
to follow. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-inode.c | 52 +---
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 19e4ad2f3f2e..1bfdb90d7633 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1323,40 +1323,42 @@ static void btrfs_async_run_delayed_root(struct btrfs_work *work)
 	if (!path)
 		goto out;
 
-again:
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND / 2)
-		goto free_path;
+	do {
+		if (atomic_read(&delayed_root->items) <
+		    BTRFS_DELAYED_BACKGROUND / 2)
+			break;
 
-	delayed_node = btrfs_first_prepared_delayed_node(delayed_root);
-	if (!delayed_node)
-		goto free_path;
+		delayed_node = btrfs_first_prepared_delayed_node(delayed_root);
+		if (!delayed_node)
+			break;
 
-	path->leave_spinning = 1;
-	root = delayed_node->root;
+		path->leave_spinning = 1;
+		root = delayed_node->root;
 
-	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans))
-		goto release_path;
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans)) {
+			btrfs_release_path(path);
+			btrfs_release_prepared_delayed_node(delayed_node);
+			total_done++;
+			continue;
+		}
 
-	block_rsv = trans->block_rsv;
-	trans->block_rsv = &root->fs_info->delayed_block_rsv;
+		block_rsv = trans->block_rsv;
+		trans->block_rsv = &root->fs_info->delayed_block_rsv;
 
-	__btrfs_commit_inode_delayed_items(trans, path, delayed_node);
+		__btrfs_commit_inode_delayed_items(trans, path, delayed_node);
 
-	trans->block_rsv = block_rsv;
-	btrfs_end_transaction(trans);
-	btrfs_btree_balance_dirty_nodelay(root->fs_info);
+		trans->block_rsv = block_rsv;
+		btrfs_end_transaction(trans);
+		btrfs_btree_balance_dirty_nodelay(root->fs_info);
 
-release_path:
-	btrfs_release_path(path);
-	total_done++;
+		btrfs_release_path(path);
+		btrfs_release_prepared_delayed_node(delayed_node);
+		total_done++;
 
-	btrfs_release_prepared_delayed_node(delayed_node);
-	if ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK) ||
-	    total_done < async_work->nr)
-		goto again;
+	} while ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK)
+		 || total_done < async_work->nr);
 
-free_path:
 	btrfs_free_path(path);
 out:
 	wake_up(&delayed_root->wait);
-- 
2.7.4
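The refactoring relies on `continue` inside a `do {} while` loop jumping straight to the loop condition, just as `goto release_path` fell through to the re-check in the old code. The sketch below is a hypothetical userspace model (struct sim, run_goto and run_loop are made-up names, and the delayed-items threshold check is folded into "queue empty" for brevity) that runs both control-flow shapes over the same simulated queue so their iteration counts can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each queued delayed node is 0 (its items commit)
 * or 1 (btrfs_join_transaction() would fail for it). */
struct sim {
	const int *nodes;   /* queue of prepared delayed nodes */
	int n_nodes;        /* queue length */
	int next;           /* index of the next node to hand out */
	int committed;      /* nodes whose delayed items were committed */
};

/* Stand-in for btrfs_first_prepared_delayed_node(): false when empty. */
static bool first_prepared_node(struct sim *s, int *node)
{
	if (s->next >= s->n_nodes)
		return false;
	*node = s->nodes[s->next++];
	return true;
}

/* Control flow of the old goto-based version. */
static int run_goto(struct sim *s, int nr, int writeback)
{
	int total_done = 0, node;

again:
	if (!first_prepared_node(s, &node))
		goto free_path;
	if (node == 1)
		goto release_path;    /* transaction join failed */
	s->committed++;
release_path:
	total_done++;
	if ((nr == 0 && total_done < writeback) || total_done < nr)
		goto again;
free_path:
	return total_done;
}

/* Control flow of the do {} while version from the patch. */
static int run_loop(struct sim *s, int nr, int writeback)
{
	int total_done = 0, node;

	do {
		if (!first_prepared_node(s, &node))
			break;
		if (node == 1) {
			total_done++;
			continue;     /* jumps to the while condition */
		}
		s->committed++;
		total_done++;
	} while ((nr == 0 && total_done < writeback) || total_done < nr);

	return total_done;
}
```

Feeding both functions the same queue and limits should give identical total_done and committed counts, which is what "No functional changes" claims for this part of the patch.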



[PATCH 2/2] btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items

2017-10-23 Thread Nikolay Borisov
btrfs_balance_delayed_items is the sole caller of btrfs_wq_run_delayed_node and
already includes one of the checks for whether the delayed inodes should be run.
On the other hand, btrfs_wq_run_delayed_node duplicates that check and performs
an additional one for wq congestion.

Let's remove the duplicate check and move the congestion one into
btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only care
about setting up the wq run. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-inode.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1bfdb90d7633..b7a0ec2c41e6 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1371,10 +1371,6 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 {
 	struct btrfs_async_delayed_work *async_work;
 
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND ||
-	    btrfs_workqueue_normal_congested(fs_info->delayed_workers))
-		return 0;
-
 	async_work = kmalloc(sizeof(*async_work), GFP_NOFS);
 	if (!async_work)
 		return -ENOMEM;
@@ -1410,7 +1406,8 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_delayed_root *delayed_root = fs_info->delayed_root;
 
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND)
+	if ((atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND) ||
+	     btrfs_workqueue_normal_congested(fs_info->delayed_workers))
 		return;
 
 	if (atomic_read(&delayed_root->items) >= BTRFS_DELAYED_WRITEBACK) {
-- 
2.7.4



Re: [PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels

2017-10-23 Thread Qu Wenruo


On 2017-10-23 18:51, Nikolay Borisov wrote:
> Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels to
> mimic the functionality of a simple do {} while loop. Refactor the function
> to use a do {} while construct, making intention clear and code easier to
> follow. No functional changes
> 
> Signed-off-by: Nikolay Borisov 

Looks good to me.

Reviewed-by: Qu Wenruo 
> [...]



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/2] btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items

2017-10-23 Thread Qu Wenruo


On 2017-10-23 18:51, Nikolay Borisov wrote:
> btrfs_balance_delayed_items is the sole caller of btrfs_wq_run_delayed_node and
> already includes one of the checks for whether the delayed inodes should be run.
> On the other hand, btrfs_wq_run_delayed_node duplicates that check and performs
> an additional one for wq congestion.
> 
> Let's remove the duplicate check and move the congestion one into
> btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only care
> about setting up the wq run. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

This moves btrfs_workqueue_normal_congested() to the caller and removes
the duplicated atomic_read().

Unless delayed_root->items gets modified in between, it should be fine.
In any case, the original code has nothing protecting the separate
atomic_read() calls either, so I don't think this introduces any new
problem.

Reviewed-by: Qu Wenruo 

Thanks,
Qu
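Qu's remark about the separate atomic_read() calls can be illustrated with C11 atomics. A minimal sketch, assuming the values 128 and 512 for BTRFS_DELAYED_BACKGROUND and BTRFS_DELAYED_WRITEBACK and using hypothetical names (decide, BALANCE_*): the counter is read once into a local, so every comparison sees the same value, whereas two separate reads could each observe a different count if items are added in between. The check order follows the code after this patch, with the congestion test in the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed values of BTRFS_DELAYED_BACKGROUND and BTRFS_DELAYED_WRITEBACK. */
enum { DELAYED_BACKGROUND = 128, DELAYED_WRITEBACK = 512 };

/* Hypothetical names for what the caller ends up doing. */
enum balance_action {
	BALANCE_NONE,    /* too few items, or the workqueue is congested */
	BALANCE_ASYNC,   /* queue a batch of async work */
	BALANCE_WAIT,    /* queue work and wait for it */
};

/* Decide from one snapshot of the counter so that both threshold
 * comparisons are made against the same value. */
static enum balance_action decide(atomic_int *items, bool congested)
{
	int snap = atomic_load(items);

	if (snap < DELAYED_BACKGROUND || congested)
		return BALANCE_NONE;
	if (snap >= DELAYED_WRITEBACK)
		return BALANCE_WAIT;
	return BALANCE_ASYNC;
}
```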
> [...]





Re: [PATCH v4] btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

2017-10-23 Thread Edmund Nadolski
On 10/23/2017 12:58 AM, Nikolay Borisov wrote:
> btrfs_rm_dev_item calls several function under an activa transaction, however
^^
active

> it fails to abort it if an error happens. Fix this by adding explicit
> btrfs_abort_transaction/btrfs_end_transaction calls
> 
> Signed-off-by: Nikolay Borisov 
> ---
> V4:
>  * Reorder the code a bit to prevent duplication of btrfs_free_path 
>  invocation. 
> 
>  * Collapse the handling of btrfs_search_slot return value in a single if
>  branch rather than having it spread across 2 branches 
> 
> V3:
>  * The path needs to be freed before the transaction is committed,
>    otherwise we will deadlock.
>  fs/btrfs/volumes.c | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e8f16c305df..8b139d203f8c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1765,20 +1765,24 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info *fs_info,
>   key.offset = device->devid;
>  
>   ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> - if (ret < 0)
> - goto out;
> -
> - if (ret > 0) {
> - ret = -ENOENT;
> + if (ret) {
> + if (ret > 0)
> + ret = -ENOENT;
> + btrfs_abort_transaction(trans, ret);
> + btrfs_end_transaction(trans);
>   goto out;
>   }
>  
>   ret = btrfs_del_item(trans, root, path);
> - if (ret)
> - goto out;
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + btrfs_end_transaction(trans);
> + }
> +
>  out:
>   btrfs_free_path(path);
> - btrfs_commit_transaction(trans);
> + if (!ret)
> + ret = btrfs_commit_transaction(trans);
>   return ret;
>  }
>  
> 

Perhaps slightly simpler (and the 'out:' label maybe goes away):

.
	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
	if (ret > 0)
		ret = -ENOENT;
	else if (!ret)
		ret = btrfs_del_item(trans, root, path);

	if (ret) {
		btrfs_abort_transaction(trans, ret);
		btrfs_end_transaction(trans);
	}
out:
.


Ed
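Ed's suggestion funnels both failure sources into a single `if (ret)` block. A minimal userspace sketch of that control flow, assuming stubbed return values in place of btrfs_search_slot() and btrfs_del_item() and a hypothetical trans_log struct in place of the real transaction helpers (path freeing is omitted and the commit is assumed to succeed):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counters standing in for the transaction helpers. */
struct trans_log {
	int aborted;    /* btrfs_abort_transaction() calls */
	int ended;      /* btrfs_end_transaction() calls */
	int committed;  /* btrfs_commit_transaction() calls */
};

/*
 * search_ret models btrfs_search_slot() (<0 error, >0 key not found,
 * 0 found); del_ret models btrfs_del_item().
 */
static int rm_dev_item_flow(int search_ret, int del_ret, struct trans_log *t)
{
	int ret = search_ret;

	if (ret > 0)
		ret = -ENOENT;          /* key not found */
	else if (!ret)
		ret = del_ret;          /* delete only when the key was found */

	if (ret) {
		t->aborted++;           /* abort, then end the transaction */
		t->ended++;
		return ret;
	}
	t->committed++;                 /* success: commit */
	return 0;
}
```

Every error path, whether from the lookup or the deletion, ends up aborting and ending the transaction exactly once, and only the success path commits.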


Re: [PATCH v8 0/6] Btrfs: populate heuristic with code

2017-10-23 Thread Timofey Titovets
2017-10-22 16:44 GMT+03:00 Timofey Titovets :
> 2017-10-20 16:45 GMT+03:00 David Sterba :
>> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
>>> 2017-10-19 18:39 GMT+03:00 David Sterba :
>>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>>> >> > Compile tested, hand tested on live system
>>> >> >
>>> >> > Change v7 -> v8
>>> >> >   - All code moved to compression.c (again)
>>> >> >   - Heuristic workspaces implemented another way
>>> >> > i.e. only share logic with compression workspaces
>>> >> >   - Some style fixes suggested by David
>>> >> >   - Move sampling function from heuristic code
>>> >> > (I'm afraid of big functions)
>>> >> >   - Many more comments and explanations
>>> >>
>>> >> Thanks for the update, I went through the patches and they looked good
>>> >> enough to be put into for-next. I may have more comments about a few
>>> >> things, but nothing serious that would hinder testing.
>>> >
>>> > I did a final pass through the patches and edited comments where I was
>>> > not able to understand them. Please check the updated patches in [1] if
>>> > I did not accidentally change the meaning.
>>>
>>> I don't see a link [1] in the mail, maybe you missed it?
>>
>> Yeah, sorry:
>> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic
>
> I re-read the updated comments, they look OK to me
> (I only found one typo and left a comment).
>
>
> Thanks
> --
> Have a nice day,
> Timofey.

Can you please try the attached patch?

I've been thinking for a while about the performance hit of the heuristic
and how to avoid sorting.

The patch tries to find the min/max values in the array (before sorting)
and uses (max - min) to filter out edge data cases where the
byte core set size is < 64 or > 200.
It's a bit of a hacky workaround =\, but it shows about the same speedup
on my data set as using radix sort (i.e. a 2x speedup).

Thanks.

-- 
Have a nice day,
Timofey.
From fb2a329828e64ad0e224a8cb97dbc17147149629 Mon Sep 17 00:00:00 2001
From: Timofey Titovets 
Date: Mon, 23 Oct 2017 21:24:29 +0300
Subject: [PATCH] Btrfs: heuristic try avoid bucket sorting on edge data cases

The heap sort used in the kernel is too slow and costly,
so let's make some statistical assumptions about edge input data cases,
based on observing the difference between the min/max values in the bucket.

Signed-off-by: Timofey Titovets 
---
 fs/btrfs/compression.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0ca16909894e..56b67ec4fb5b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1310,8 +1310,46 @@ static int byte_core_set_size(struct heuristic_ws *ws)
 	u32 i;
 	u32 coreset_sum = 0;
 	const u32 core_set_threshold = ws->sample_size * 90 / 100;
+	struct bucket_item *max, *min;
+	struct bucket_item tmp;
 	struct bucket_item *bucket = ws->bucket;
 
+
+	/* Presort for find min/max value */
+	max = &bucket[0];
+	min = &bucket[BUCKET_SIZE - 1];
+	for (i = 1; i < BUCKET_SIZE - 1; i++) {
+		if (bucket[i].count > max->count) {
+			tmp = *max;
+			*max = bucket[i];
+			bucket[i] = tmp;
+		}
+		if (bucket[i].count < min->count) {
+			tmp = *min;
+			*min = bucket[i];
+			bucket[i] = tmp;
+		}
+	}
+
+	/*
+	 * Hacks to avoid sorting on edge data cases (sorting is too costly),
+	 * i.e. quickly filter out easily compressible
+	 * and badly compressible data.
+	 * Based on observation of the number distribution on different data sets.
+	 *
+	 * Assumption 1: For badly compressible data, the spread between
+	 * min and max will be less than 0.6% of the sample size.
+	 *
+	 * Assumption 2: For easily compressible data, the spread between
+	 * min and max will be far bigger than 4% of the sample size.
+	 */
+
+	if (max->count - min->count < ws->sample_size * 6 / 1000)
+		return BYTE_CORE_SET_HIGH + 1;
+
+	if (max->count - min->count > ws->sample_size * 4 / 100)
+		return BYTE_CORE_SET_LOW - 1;
+
 	/* Sort in reverse order */
 	sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL);
 
-- 
2.14.2
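The edge-case filter described above can be sketched in plain userspace C. This is a hypothetical illustration only (classify_sample, EDGE_* and SKETCH_BUCKET_SIZE are made-up names; the real code lives in fs/btrfs/compression.c and works on the sampled heuristic workspace). It finds the min and max bucket counts in a single read-only pass instead of swapping buckets in place. If I read the patch's swap loop correctly, bucket[0] and bucket[BUCKET_SIZE - 1] are never compared with each other, and a count swapped out of the min slot is not re-checked against max, so when bucket[0].count starts out smaller than bucket[BUCKET_SIZE - 1].count the computed max can end up below the true maximum; a plain scan avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKET_SIZE 256   /* one bucket per byte value */

/* Hypothetical labels for the three outcomes. */
enum edge_class { EDGE_NOT_COMPRESSIBLE, EDGE_COMPRESSIBLE, EDGE_NEEDS_SORT };

/* Count byte frequencies, find the largest and smallest bucket in one
 * pass, then apply the two spread thresholds from the patch
 * (0.6% and 4% of the sample size). */
static enum edge_class classify_sample(const uint8_t *sample, size_t len)
{
	uint32_t count[SKETCH_BUCKET_SIZE] = {0};
	uint32_t max, min;
	size_t i;

	assert(sample != NULL || len == 0);
	for (i = 0; i < len; i++)
		count[sample[i]]++;

	max = min = count[0];
	for (i = 1; i < SKETCH_BUCKET_SIZE; i++) {
		if (count[i] > max)
			max = count[i];
		if (count[i] < min)
			min = count[i];
	}

	if (max - min < len * 6 / 1000)
		return EDGE_NOT_COMPRESSIBLE;  /* flat distribution, looks random */
	if (max - min > len * 4 / 100)
		return EDGE_COMPRESSIBLE;      /* a few byte values dominate */
	return EDGE_NEEDS_SORT;            /* fall back to the full byte core set */
}
```

Keeping the scan read-only also leaves the bucket order untouched for the sort() call that follows in the patch.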



Re: Scrub doesn't correct coruption

2017-10-23 Thread Wolf
On , Qu Wenruo wrote:
> [27240.680874] perf: interrupt took too long (3952 > 3942), lowering kernel.perf_event_max_sample_rate to 50400
> > [30658.875802] BTRFS warning (device dm-12): checksum error at logical 37889245122560 on dev /dev/mapper/data7, sector 2743145096, root 23674, inode 206751, offset 762638336, length 4096, links 1 (path: アニメ/!waiting_for_better_quality/Gate: Jieitai Kanochi nite, Kaku Tatakaeri/GATE Jieitai Kanochi nite, Kaku Tatakaeri 05v2.mp4)
> 
> Well, it's several seasons ago, and I think there are better BDrip raws now.
> (Yeah, I'm also an Otaku)
> 
> Despite that, it's better to hide such personal info though.

Since downloading stuff from the internet is legal in my country, I don't
usually bother hiding stuff like this, but I will do so if it's an issue
on this mailing list.

> And, did you try to scrub the corrupted device rather than the whole fs?
> By default, btrfs scrub starts threads to scrub all devices at the same
> time; maybe some concurrency caused the false alert.

Tbh I had no idea I could scrub just the device and not the whole
filesystem. Running it now (but the scrub on this drive takes about 12
hours, so I'll see tomorrow if it helped).

> It may also be possible to check/repair it using btrfs-progs,
> although that feature is still out of tree.
> 
> Could you please try the following branch and use "btrfs scrub start
> --offline /dev/mapper/data7" to check whether it reports the corruption as
> fixable?
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub
> 
> Offline scrub gives us a fairly good indication of whether it's fixable,
> without the possible hassle in the kernel.
> So it's worth trying.

If scrubbing just the device doesn't help in any way, I'll give it a
try.

> (But hey, there are better BDrip raws already, so I don't think
> you're really interested in fixing the corruption)

True. Plus, since it's RAID1, no data was actually lost and the array is
working without problems. I'm mainly interested in knowing whether it's:

1) an issue with the HW, or
2) some hidden issue with the whole fs, meaning it's going to fall apart soon.

Thanks for the tips so far :)

W.
-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Scrub doesn't correct coruption

2017-10-23 Thread Wolf
On , ein wrote:
> On 10/23/2017 10:39 AM, Wolf wrote:
> > [...]
> >
> > Is this an issue somewhere inside btrfs or is it a disk HW related problem?
> 
> It's highly unlikely to be hardware related. According to SMART and dmesg,
> there's no indication that would suggest disk failure.

That's my thinking too (and the reason why the disk is still in the
array instead of going back for warranty), but since the scrub failed to
correct the issue despite saying it did, I'm a bit curious what's going
on.

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




[PATCH] fstests: btrfs/143: make test case more reliable

2017-10-23 Thread Liu Bo
Currently drop_caches is used to invalidate the file's page cache so that
the buffered read hits the disk, but the problem is that it may also
invalidate the metadata's page cache, so the test case may not get read
errors (and repair) if reading metadata has consumed the injected
faults.

This changes it to do 'fadvise -d' to first access all the metadata it
needs to locate the file and then drop only the test file's page
cache.  It also changes it to read the file only if pid%2 == 1.

Reported-by: Nikolay Borisov 
Signed-off-by: Liu Bo 
---
 tests/btrfs/143 | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/btrfs/143 b/tests/btrfs/143
index da7bfd8..dabd03d 100755
--- a/tests/btrfs/143
+++ b/tests/btrfs/143
@@ -127,16 +127,16 @@ echo "step 3..repair the bad copy" >>$seqres.full
 # since raid1 consists of two copies, and the bad copy was put on stripe #1
 # while the good copy lies on stripe #0, the bad copy only gets access when the
 # reader's pid % 2 == 1 is true
-while true; do
-   # start_fail only fails the following buffered read so the repair is
-   # supposed to work.
-   echo 3 > /proc/sys/vm/drop_caches
-   start_fail
-   $XFS_IO_PROG -c "pread 0 4K" "$SCRATCH_MNT/foobar" > /dev/null &
-   pid=$!
-   wait
-   stop_fail
-   [ $((pid % 2)) == 1 ] && break
+while [[ -z ${result} ]]; do
+# invalidate the page cache.
+$XFS_IO_PROG -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar
+
+start_fail
+result=$(bash -c "
+if [[ \$((\$\$ % 2)) -eq 1 ]]; then
+exec $XFS_IO_PROG -c \"pread 0 4K\" \"$SCRATCH_MNT/foobar\"
+fi");
+stop_fail
 done
 
 _scratch_unmount
-- 
2.5.0
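The test relies on the comment's premise that the raid1 copy is chosen from the reader's pid, so only a reader with an odd pid touches the bad copy on stripe #1. The same retry idea can be sketched in C: fork readers until one with an odd pid has dropped the file's cached pages and issued a 4 KiB pread. `read_with_odd_pid()` is a hypothetical helper, the fault injection (`start_fail`/`stop_fail`) is left out, and the pid-to-stripe mapping is taken from the test's comment rather than verified here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork readers until one with an odd pid has dropped
 * the file's cached pages and read its first 4 KiB. Returns that reader's
 * pid, or -1 on error.
 */
static pid_t read_with_odd_pid(const char *path)
{
	for (;;) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0) {
			char buf[4096];
			int fd;

			/* Even pid: per the test's comment, this reader would hit stripe #0. */
			if (getpid() % 2 != 1)
				_exit(2);
			fd = open(path, O_RDONLY);
			if (fd < 0)
				_exit(1);
			/* Like 'fadvise -d 0 128K': advisory, the kernel may ignore it. */
			(void)posix_fadvise(fd, 0, 128 * 1024, POSIX_FADV_DONTNEED);
			_exit(pread(fd, buf, sizeof(buf), 0) < 0 ? 1 : 0);
		}
		if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
			return -1;
		if (WEXITSTATUS(status) == 0)
			return pid;
		if (WEXITSTATUS(status) == 1)
			return -1;
		/* Status 2: the child had an even pid, so try again. */
	}
}
```

As in the patch, the parity check happens inside the child itself (the shell version uses `$$` in a fresh `bash -c`), so the parent never has to guess the pid in advance.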



[PATCH v4] Btrfs: compress_file_range() change page dirty status once

2017-10-23 Thread Timofey Titovets
We need to call extent_range_clear_dirty_for_io() on the compression
range to prevent the application from changing page content while the
pages are being compressed.

extent_range_clear_dirty_for_io() runs on each loop iteration, but
"(end - start)" can be much (up to 1024 times) bigger
than the compression range (BTRFS_MAX_UNCOMPRESSED).

That produces extra calls to the page management code.

Fix that behaviour by calling extent_range_clear_dirty_for_io()
only once.

v1 -> v2:
 - Make the change more obvious and safer

v2 -> v3:
 - Rebased on:
   Btrfs: compress_file_range() remove dead variable num_bytes
 - Update change log
 - Add comments

v3 -> v4:
 - Rebased on: kdave for-next
 - To avoid changing the dirty bit clear/set behaviour,
   call clear_bit once, instead of per compression range

Signed-off-by: Timofey Titovets 
---
 fs/btrfs/inode.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b93fe05a39c7..5816dd3cb6e6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -536,8 +536,10 @@ static noinline void compress_file_range(struct inode *inode,
 * If the compression fails for any reason, we set the pages
 * dirty again later on.
 */
-   extent_range_clear_dirty_for_io(inode, start, end);
-   redirty = 1;
+   if (!redirty) {
+   extent_range_clear_dirty_for_io(inode, start, end);
+   redirty = 1;
+   }

/* Compression level is applied here and only here */
ret = btrfs_compress_pages(
--
2.14.2
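To see the effect of the v4 guard, the loop can be simulated with a counter in place of extent_range_clear_dirty_for_io(). Assumptions for this sketch: BTRFS_MAX_UNCOMPRESSED is 128 KiB, the range is half-open `[start, end)` (the kernel's `end` is inclusive), and every iteration consumes one full compression range. All names here are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_UNCOMPRESSED (128u * 1024)   /* assumed BTRFS_MAX_UNCOMPRESSED */

/* Hypothetical stand-in: counts calls instead of touching pages. */
static unsigned int clear_dirty_calls;

static void clear_dirty_stub(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	clear_dirty_calls++;
}

/* Walk [start, end) in compression-sized steps; 'once' applies the v4 guard. */
static unsigned int simulate_compress_loop(uint64_t start, uint64_t end, int once)
{
	int redirty = 0;

	clear_dirty_calls = 0;
	while (start < end) {
		if (!once || !redirty) {
			/* Always clears the whole remaining range, as in the patch. */
			clear_dirty_stub(start, end);
			redirty = 1;
		}
		start += MAX_UNCOMPRESSED;
	}
	return clear_dirty_calls;
}
```

With a 128 MiB range the unguarded loop makes 1024 calls, matching the "up to 1024 times" figure in the changelog, while the guarded loop makes one.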


Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-23 Thread Zak Kohler
All three devices completed the 'long' SMART selftest without error:

# 1  Extended offline    Completed without error       00%


Here is the standard data that I forgot to include in my first message:
Running Arch Linux

$ uname -a
Linux HOSTNAME 4.9.56-1-lts #1 SMP Thu Oct 12 22:34:15 CEST 2017 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v4.13

$ sudo btrfs fi show
Label: 'CRUCIAL116'  uuid: 31c38558-c8c7-49c4-8fea-9d0730ee58a7
	Total devices 1 FS bytes used 7.77GiB
	devid    1 size 59.62GiB used 59.62GiB path /dev/sda2

Label: 'OfflineJ'  uuid: 88406942-e3e1-42c6-ad71-e23bb315caa7
	Total devices 3 FS bytes used 1.98TiB
	devid    1 size 1.82TiB used 679.00GiB path /dev/sdi
	devid    2 size 1.82TiB used 679.01GiB path /dev/sdh
	devid    3 size 1.82TiB used 679.01GiB path /dev/sdn

$ sudo btrfs fi df /mnt
Data, RAID0: total=1.98TiB, used=1.98TiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=3.00GiB, used=2.44GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ dmesg | grep BTRFS
[    5.262090] BTRFS: device label CRUCIAL116 devid 1 transid 98407 /dev/sda2
[   15.636475] BTRFS: device label OfflineJ devid 2 transid 612 /dev/sdh
[   15.646343] BTRFS: device label OfflineJ devid 1 transid 612 /dev/sdi
[   15.647194] BTRFS: device label OfflineJ devid 3 transid 612 /dev/sdn
[   15.754204] BTRFS info (device sda2): disk space caching is enabled
[   15.754206] BTRFS info (device sda2): has skinny extents
[   15.778659] BTRFS info (device sda2): detected SSD devices, enabling SSD mode
[   58.492530] BTRFS info (device sdn): disk space caching is enabled
[   58.492532] BTRFS info (device sdn): has skinny extents
[   61.243226] BTRFS info (device sdn): checking UUID tree
[  114.437424] BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum 1745651892 expected csum 3952841867
[  114.450699] BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum 1745651892 expected csum 3952841867
[38494.978379] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 876064455 expected csum 874979996
[38494.989301] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[38541.079264] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 876064455 expected csum 874979996
[38571.245421] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[39434.215600] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[73132.653297] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[73167.897106] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996


One thing I notice is that ino 4708 keeps returning a few different
'wrong' csums. I can also confirm that one of those 'csum failed'
messages gets written each time I run
'$ sudo btrfs send /mnt/dataroot.2017.10.21/ | pv -i5 > /dev/null'

Does anyone know why scrub did not catch these errors that show up in dmesg?
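For context on the dmesg lines: btrfs verifies each data block against a stored checksum, CRC-32C by default (assumed here, since the thread does not show the filesystem's checksum settings). Because the checksum is a deterministic function of the block contents, two different computed values for the same inode offset mean the bytes that were checksummed differed between those reads. A minimal bitwise CRC-32C sketch; it is not claimed to reproduce the exact decimal values dmesg prints, and `block_csum_ok()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Returns nonzero if a data block matches the checksum stored for it. */
static int block_csum_ok(const uint8_t *block, size_t len, uint32_t expected)
{
	return crc32c(block, len) == expected;
}
```

Flipping even a single bit of a 4 KiB block changes the result, which is why an unstable read shows up as a different "csum" on each attempt while the "expected csum" stays the same.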



On Mon, Oct 23, 2017 at 12:25 AM, Zak Kohler  wrote:
> I was attempting my first btrfs send/receive over ssh and continually
> received an ioctl error at different points, but always within the first 3
> minutes. The volume consists of three devices with only metadata
> duplication. I narrowed the error down to the send command by
> recreating it while redirecting to /dev/null. Sometimes it would
> happen after ~12GiB or ~7.6GiB; right now, rerunning it multiple times, it
> has stopped at exactly 3.76GiB each time.
>
> $ sudo btrfs send /mnt/dataroot.2017.10.21/ | pv -i5 > /dev/null
> At subvol /mnt/dataroot.2017.10.21/
> ERROR: send ioctl failed with -5: Input/output error]
> 3.76GiB 0:00:13 [ 290MiB/s] [  <=>  ]
>
>
> First I checked the btrfs device stats, each of the 3 drives appear clean:
> $ sudo btrfs device stats /mnt
> [/dev/sdi].write_io_errs    0
> [/dev/sdi].read_io_errs     0
> [/dev/sdi].flush_io_errs    0
> [/dev/sdi].corruption_errs  0
> [/dev/sdi].generation_errs  0
> [/dev/sdh].write_io_errs    0
> [/dev/sdh].read_io_errs     0
> [/dev/sdh].flush_io_errs    0
> [/dev/sdh].corruption_errs  0
> [/dev/sdh].generation_errs  0
> [/dev/sdn].write_io_errs    0
> [/dev/sdn].read_io_errs     0
> [/dev/sdn].flush_io_errs    0
> [/dev/sdn].corruption_errs  0
> [/dev/sdn].generation_errs  0
>
> Next I ran the SMART short self-test on each of the three drives and
> checked that it passed with no errors.
> $ sudo smartctl -l selftest /dev/sdh
> # 1  Short offline   Completed without error
>
>
> I read somewhere to check dmesg, which yielded some info:
> BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum
> 1745651892 expected csum 3952841867
>
> But when I went to see if scrub could detect the errors, nothing was found:
> $ sudo btrfs scrub

Re: [PATCH 1/2] btrfs-progs: fi: move dev_to_fsid() to cmds-fi-usage for later use

2017-10-23 Thread Anand Jain



On 10/23/2017 12:44 PM, Misono, Tomohiro wrote:

Move dev_to_fsid() from cmds-filesystem.c to cmds-fi-usage.c in order to
call it from both "fi show" and "fi usage".

Signed-off-by: Tomohiro Misono 


Reviewed-by: Anand Jain 

Thanks, Anand


---
  cmds-fi-usage.c   | 29 +
  cmds-fi-usage.h   |  1 +
  cmds-filesystem.c | 27 ---
  3 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index 6c846c1..a72fb4e 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "utils.h"
  #include "kerncompat.h"
@@ -29,6 +30,7 @@
  #include "string-table.h"
  #include "cmds-fi-usage.h"
  #include "commands.h"
+#include "disk-io.h"
  
  #include "version.h"
  #include "help.h"
@@ -506,6 +508,33 @@ static int cmp_device_info(const void *a, const void *b)
((struct device_info *)b)->path);
  }
  
+int dev_to_fsid(const char *dev, __u8 *fsid)
+{
+   struct btrfs_super_block *disk_super;
+   char buf[BTRFS_SUPER_INFO_SIZE];
+   int ret;
+   int fd;
+
+   fd = open(dev, O_RDONLY);
+   if (fd < 0) {
+   ret = -errno;
+   return ret;
+   }
+
+   disk_super = (struct btrfs_super_block *)buf;
+   ret = btrfs_read_dev_super(fd, disk_super,
+  BTRFS_SUPER_INFO_OFFSET, SBREAD_DEFAULT);
+   if (ret)
+   goto out;
+
+   memcpy(fsid, disk_super->fsid, BTRFS_FSID_SIZE);
+   ret = 0;
+
+out:
+   close(fd);
+   return ret;
+}
+
  /*
   *  This function loads the device_info structure and put them in an array
   */
diff --git a/cmds-fi-usage.h b/cmds-fi-usage.h
index a399517..0e82951 100644
--- a/cmds-fi-usage.h
+++ b/cmds-fi-usage.h
@@ -50,5 +50,6 @@ void print_device_chunks(struct device_info *devinfo,
struct chunk_info *chunks_info_ptr,
int chunks_info_count, unsigned unit_mode);
  void print_device_sizes(struct device_info *devinfo, unsigned unit_mode);
+int dev_to_fsid(const char *dev, __u8 *fsid);
  
  #endif

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index c39f2d1..3dc86a2 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -431,33 +431,6 @@ out:
return !found;
  }
  
-static int dev_to_fsid(const char *dev, __u8 *fsid)
-{
-   struct btrfs_super_block *disk_super;
-   char buf[BTRFS_SUPER_INFO_SIZE];
-   int ret;
-   int fd;
-
-   fd = open(dev, O_RDONLY);
-   if (fd < 0) {
-   ret = -errno;
-   return ret;
-   }
-
-   disk_super = (struct btrfs_super_block *)buf;
-   ret = btrfs_read_dev_super(fd, disk_super,
-  BTRFS_SUPER_INFO_OFFSET, SBREAD_DEFAULT);
-   if (ret)
-   goto out;
-
-   memcpy(fsid, disk_super->fsid, BTRFS_FSID_SIZE);
-   ret = 0;
-
-out:
-   close(fd);
-   return ret;
-}
-
  static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
  {
struct btrfs_fs_devices *cur_seed, *next_seed;
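dev_to_fsid() relies on btrfs_read_dev_super() from btrfs-progs. For readers without that tree, a rough standalone equivalent reads the primary superblock at 64 KiB and copies the 16-byte fsid that follows the 32-byte checksum field. `read_fsid()` is a hypothetical name, and unlike btrfs_read_dev_super() this sketch only checks the `_BHRfS_M` magic; it does not verify the superblock checksum or look at backup copies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SB_OFFSET    (64 * 1024)  /* primary superblock location */
#define SB_FSID_OFF  0x20         /* fsid follows the 32-byte csum */
#define SB_MAGIC_OFF 0x40         /* after fsid, bytenr and flags */
#define SB_MAGIC     "_BHRfS_M"
#define FSID_SIZE    16

/* Copy the fsid of the btrfs superblock on 'dev'; returns 0 or -errno. */
static int read_fsid(const char *dev, uint8_t fsid[FSID_SIZE])
{
	uint8_t sb[SB_MAGIC_OFF + 8];
	ssize_t n;
	int fd = open(dev, O_RDONLY);

	if (fd < 0)
		return -errno;
	n = pread(fd, sb, sizeof(sb), SB_OFFSET);
	close(fd);
	if (n != (ssize_t)sizeof(sb))
		return -EIO;
	if (memcmp(sb + SB_MAGIC_OFF, SB_MAGIC, 8) != 0)
		return -EINVAL;  /* not a btrfs superblock */
	memcpy(fsid, sb + SB_FSID_OFF, FSID_SIZE);
	return 0;
}
```

Like the original helper, an unreadable device (for example -EACCES without root) is reported through the return value, which is what the follow-up patch keys on.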




Re: [PATCH 2/2] btrfs-progs: fi: enable fi usage for filesystem top of seed device

2017-10-23 Thread Anand Jain



On 10/23/2017 12:45 PM, Misono, Tomohiro wrote:

Currently "fi usage" (and "dev usage") cannot run on a filesystem that
uses a seed device.

This is because the FS_INFO ioctl returns the number of devices excluding
seeds, but load_device_info() tries to access every valid device from devid 0
to max_id, which includes the seeds too (thus causing a mismatch
in the number of devices).


 A long time back I tried to fix this by fixing FS_INFO num_devs
 itself, but the concern was backward compatibility of the ioctl.
 However, there is no such concern here. I am OK with this approach.


Since only the size of non-seed devices matters, fix this by skipping
seed devices: check each device's fsid and compare it to the fsid
obtained by the FS_INFO ioctl.

Signed-off-by: Tomohiro Misono 


Reviewed-by: Anand Jain 

Thanks, Anand



---
  cmds-fi-usage.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index a72fb4e..50c7e51 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -545,6 +545,7 @@ static int load_device_info(int fd, struct device_info **device_info_ptr,
struct btrfs_ioctl_fs_info_args fi_args;
struct btrfs_ioctl_dev_info_args dev_info;
struct device_info *info;
+   __u8 fsid[BTRFS_UUID_SIZE];
  
  	*device_info_count = 0;
*device_info_ptr = NULL;
@@ -568,6 +569,7 @@ static int load_device_info(int fd, struct device_info **device_info_ptr,
if (ndevs >= fi_args.num_devices) {
error("unexpected number of devices: %d >= %llu", ndevs,
(unsigned long long)fi_args.num_devices);
+   error("if seed device is used, try run as root.");
goto out;
}
memset(&dev_info, 0, sizeof(dev_info));
@@ -580,6 +582,19 @@ static int load_device_info(int fd, struct device_info **device_info_ptr,
goto out;
}
  
+		/*
+		 * Skip seed devices by checking each device's fsid (requires root).
+		 * Ignore EACCES, since if no seed is used this function works
+		 * correctly without root privileges.
+		 */
+   ret = dev_to_fsid((const char *)dev_info.path, fsid);
+   if (ret != -EACCES) {
+   if (ret)
+   goto out;
+   if (memcmp(fi_args.fsid, fsid, BTRFS_FSID_SIZE) != 0)
+   continue;
+   }
+
info[ndevs].devid = dev_info.devid;
if (!dev_info.path[0]) {
strcpy(info[ndevs].path, "missing");
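The skip rule in the hunk above can be exercised in isolation. `struct dev_rec` and `count_non_seed()` are hypothetical stand-ins for the per-device info and the loop body of load_device_info(). As in the patch, a device whose fsid cannot be read because of -EACCES is kept, any other error aborts, and a device whose fsid differs from the FS_INFO fsid is treated as a seed and skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define FSID_SIZE 16

/* Hypothetical per-device record: devid plus the outcome of reading its fsid. */
struct dev_rec {
	uint64_t devid;
	int fsid_err;              /* 0, or a negative errno such as -EACCES */
	uint8_t fsid[FSID_SIZE];
};

/* Returns the number of non-seed devices kept, or -1 on a hard error. */
static int count_non_seed(const struct dev_rec *devs, int n,
			  const uint8_t fs_fsid[FSID_SIZE])
{
	int kept = 0;

	for (int i = 0; i < n; i++) {
		if (devs[i].fsid_err != -EACCES) {
			if (devs[i].fsid_err)
				return -1;        /* the patch does 'goto out' */
			if (memcmp(devs[i].fsid, fs_fsid, FSID_SIZE) != 0)
				continue;         /* different fsid: seed device */
		}
		kept++;
	}
	return kept;
}
```

The -EACCES case is what lets "fi usage" keep working for non-root users on filesystems without seeds; with a seed present and no root, the count mismatch is reported instead.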




Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-23 Thread Lakshmipathi.G
> Does anyone know why scrub did not catch these errors that show up in dmesg?

Can you try the offline scrub from this repo
https://github.com/gujx2017/btrfs-progs/tree/offline_scrub and see
whether it detects the issue?  "btrfs scrub start --offline "



Cheers,
Lakshmipathi.G
http://www.giis.co.in http://www.webminal.org


Re: [PATCH] fstests: btrfs/143: make test case more reliable

2017-10-23 Thread Nikolay Borisov


On 23.10.2017 23:57, Liu Bo wrote:
> Currently drop_caches is used to invalidate the file's page cache so that
> the buffered read hits the disk, but the problem is that it may also
> invalidate the metadata's page cache, so the test case may not get read
> errors (and repair) if reading metadata has consumed the injected
> faults.
> 
> This changes it to do 'fadvise -d' to first access all the metadata it
> needs to locate the file and then drop only the test file's page
> cache.  It also changes it to read the file only if pid%2 == 1.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Liu Bo 
> ---
>  tests/btrfs/143 | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/btrfs/143 b/tests/btrfs/143
> index da7bfd8..dabd03d 100755
> --- a/tests/btrfs/143
> +++ b/tests/btrfs/143
> @@ -127,16 +127,16 @@ echo "step 3..repair the bad copy" >>$seqres.full
>  # since raid1 consists of two copies, and the bad copy was put on stripe #1
>  # while the good copy lies on stripe #0, the bad copy only gets access when the
>  # reader's pid % 2 == 1 is true
> -while true; do
> - # start_fail only fails the following buffered read so the repair is
> - # supposed to work.
> - echo 3 > /proc/sys/vm/drop_caches
> - start_fail
> - $XFS_IO_PROG -c "pread 0 4K" "$SCRATCH_MNT/foobar" > /dev/null &
> - pid=$!
> - wait
> - stop_fail
> - [ $((pid % 2)) == 1 ] && break
> +while [[ -z ${result} ]]; do
> +# invalidate the page cache.
> +$XFS_IO_PROG -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar

I'm a bit worried about the guarantees of POSIX_FADV_DONTNEED:

https://linux.die.net/man/2/posix_fadvise:

"The advice is not binding; it merely constitutes an expectation on
behalf of the application."

This might very well be a moot point, but still.
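One way to check whether the DONTNEED advice (xfs_io's `fadvise -d`) was actually honoured is to map the file and ask mincore() which pages are still resident. This is not something the patch does; `resident_pages()` and `dontneed_and_check()` are hypothetical helpers. Dirty pages are flushed first because they cannot be dropped before writeback, and a nonzero result means some pages stayed cached despite the advice.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the pages of an open file that mincore() reports as resident. */
static long resident_pages(int fd)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	struct stat st;
	size_t len, npages, i;
	long resident = 0;
	void *map;

	if (page <= 0 || fstat(fd, &st) < 0 || st.st_size <= 0)
		return -1;
	len = (size_t)st.st_size;
	npages = (len + (size_t)page - 1) / (size_t)page;

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;
	vec = malloc(npages);
	if (vec && mincore(map, len, vec) == 0) {
		for (i = 0; i < npages; i++)
			resident += vec[i] & 1;   /* bit 0: page is resident */
	} else {
		resident = -1;
	}
	free(vec);
	munmap(map, len);
	return resident;
}

/* Flush, advise DONTNEED, then report how many pages are still cached. */
static long dontneed_and_check(int fd, off_t len)
{
	if (fdatasync(fd) < 0)
		return -1;   /* dirty pages cannot be dropped before writeback */
	if (posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED) != 0)
		return -1;   /* posix_fadvise returns an error number, not -1 */
	return resident_pages(fd);
}
```

A test could use a check like this to retry, or to skip rather than fail, when the cache was not actually invalidated.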



> +
> +start_fail
> +result=$(bash -c "
> +if [[ \$((\$\$ % 2)) -eq 1 ]]; then
> +exec $XFS_IO_PROG -c \"pread 0 4K\" \"$SCRATCH_MNT/foobar\"
> +fi");
> +stop_fail
>  done
>  
>  _scratch_unmount
> 


Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"

2017-10-23 Thread Zak Kohler
Yes, it is finding much more than just one error.

From dmesg:
[89520.441354] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996

$ sudo btrfs scrub start --offline --progress /dev/sdn
ERROR: data at bytenr 68431499264 mirror 1 csum mismatch, have 0x5aa0d40f expect 0xd4a15873
ERROR: extent 68431474688 len 14467072 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 83646357504 mirror 1 csum mismatch, have 0xfc0baabe expect 0x7f9cb681
ERROR: extent 83519741952 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 121936633856 mirror 1 csum mismatch, have 0x507016a5 expect 0x50609afe
ERROR: extent 121858334720 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 144872591360 mirror 1 csum mismatch, have 0x33964d73 expect 0xf9937032
ERROR: extent 144822386688 len 61231104 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 167961075712 mirror 1 csum mismatch, have 0xf43bd0e3 expect 0x5be589bb
ERROR: extent 167950999552 len 27537408 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 175643619328 mirror 1 csum mismatch, have 0x1e168ca1 expect 0xd413b1e0
ERROR: data at bytenr 175643754496 mirror 1 csum mismatch, have 0x6cfdc8ae expect 0xa6f8f5ef
ERROR: extent 175640539136 len 6381568 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 183316750336 mirror 1 csum mismatch, have 0x145bdf76 expect 0x7390565e
...
and the list goes on.


Questions:
1. Using "find /mnt -inum 4708" I can link the dmesg output to a specific
file. Is there a way to link the --offline ERRORs above to an inode?

2. How could "btrfs device stats /mnt" and a normal full scrub fail
to detect the csum errors?

3. Do these errors appear to be a hardware failure (despite pristine
SMART), user error on volume creation/mounting, or an actual btrfs issue?
I feel that the need for question #1 indicates a problem with btrfs
regardless of whether there is a real hardware failure or not.
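On question 1: assuming the offline scrub's "bytenr" values are btrfs logical addresses (the out-of-tree tool is not checked here), they can be resolved with the BTRFS_IOC_LOGICAL_INO ioctl from `<linux/btrfs.h>`, which is what `btrfs inspect-internal logical-resolve` wraps. A minimal sketch; `resolve_logical()` is a hypothetical helper, it needs root, and the 64 KiB result buffer reflects our understanding of the cap in older kernels. The printed inode numbers can then be turned into paths with `find -inum` or `btrfs inspect-internal inode-resolve`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

#define LOGICAL_INO_BUFSIZE (64 * 1024)

/*
 * Print the (inode, file offset, root) triples that reference btrfs logical
 * address 'logical'. 'mnt' is any path on the mounted filesystem.
 * Returns the number of triples, or -errno on failure.
 */
static int resolve_logical(const char *mnt, uint64_t logical)
{
	struct btrfs_ioctl_logical_ino_args args;
	struct btrfs_data_container *inodes;
	uint32_t i;
	int fd, ret;

	inodes = calloc(1, LOGICAL_INO_BUFSIZE);
	if (!inodes)
		return -ENOMEM;
	fd = open(mnt, O_RDONLY);
	if (fd < 0) {
		ret = -errno;
		free(inodes);
		return ret;
	}

	memset(&args, 0, sizeof(args));
	args.logical = logical;
	args.size = LOGICAL_INO_BUFSIZE;
	args.inodes = (uintptr_t)inodes;

	if (ioctl(fd, BTRFS_IOC_LOGICAL_INO, &args) < 0) {
		ret = -errno;
	} else {
		/* val[] is a flat array of (inode, offset, root) triples */
		for (i = 0; i + 2 < inodes->elem_cnt; i += 3)
			printf("inode %llu offset %llu root %llu\n",
			       (unsigned long long)inodes->val[i],
			       (unsigned long long)inodes->val[i + 1],
			       (unsigned long long)inodes->val[i + 2]);
		ret = (int)(inodes->elem_cnt / 3);
	}
	close(fd);
	free(inodes);
	return ret;
}
```

Usage would be something like `resolve_logical("/mnt", 68431499264ULL)` for the first offline-scrub error above.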


Next I will try an online scrub of only the sdn device; previously I
was running a full filesystem scrub.

On Tue, Oct 24, 2017 at 12:52 AM, Lakshmipathi.G
 wrote:
>> Does anyone know why scrub did not catch these errors that show up in dmesg?
>
> Can you try offline scrub from this repo
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub and see
> whether it
> detects the issue?  "btrfs scrub start --offline "
>
>
> 
> Cheers,
> Lakshmipathi.G
> http://www.giis.co.in http://www.webminal.org


[PATCH v3] Btrfs: free btrfs_device in place

2017-10-23 Thread Liu Bo
It's pointless to defer it to a kthread helper as we're not under a
special context.

For reference, commit 1f78160ce1b1 ("Btrfs: using rcu lock in the
reader side of devices list") introduced RCU freeing for device
structures.

Signed-off-by: Liu Bo 
Reviewed-by: Anand Jain 
---

v3: - Enhance changelog with commit id which introduced this for future
  reference.
- Now we can remove %rcu_work.

v2: - Clarify the lifetime of device and device->bdev respectively and
  clear the concern about raising the 'device is in use' problem.

 fs/btrfs/volumes.c | 14 ++
 fs/btrfs/volumes.h |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d983cea..4a72c45 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
mutex_unlock(&uuid_mutex);
 }
 
-static void __free_device(struct work_struct *work)
+static void free_device(struct rcu_head *head)
 {
struct btrfs_device *device;
 
-   device = container_of(work, struct btrfs_device, rcu_work);
+   device = container_of(head, struct btrfs_device, rcu);
rcu_string_free(device->name);
bio_put(device->flush_bio);
kfree(device);
 }
 
-static void free_device(struct rcu_head *head)
-{
-   struct btrfs_device *device;
-
-   device = container_of(head, struct btrfs_device, rcu);
-
-   INIT_WORK(&device->rcu_work, __free_device);
-   schedule_work(&device->rcu_work);
-}
-
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
if (device->bdev && device->writeable) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6108fdf..f60c535 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -133,7 +133,6 @@ struct btrfs_device {
 
struct btrfs_work work;
struct rcu_head rcu;
-   struct work_struct rcu_work;
 
/* readahead state */
spinlock_t reada_lock;
-- 
2.9.4
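After this change, the RCU callback frees the device directly instead of bouncing through a work item. The callback only receives a pointer to the embedded `rcu_head` and recovers the enclosing structure with container_of(). A userspace sketch of that step, with hypothetical stub types standing in for `struct rcu_head` and `struct btrfs_device` (the kernel's container_of() also adds type checking that this macro omits):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel macro (minus its type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stubs mirroring the shape of rcu_head and btrfs_device. */
struct rcu_head_stub {
	struct rcu_head_stub *next;
	void (*func)(struct rcu_head_stub *head);
};

struct device_stub {
	char *name;
	struct rcu_head_stub rcu;   /* embedded, like btrfs_device::rcu */
};

static int devices_freed;

/* Shape of the patched free_device(): recover the device, free it in place. */
static void free_device_stub(struct rcu_head_stub *head)
{
	struct device_stub *device = container_of(head, struct device_stub, rcu);

	free(device->name);
	free(device);
	devices_freed++;
}
```

In the kernel the RCU core invokes the callback after a grace period; the sketch simply calls it directly.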



[PATCH v2] Btrfs: add write_flags for compression bio

2017-10-23 Thread Liu Bo
The compression code path flags bios only with REQ_OP_WRITE, no matter
where they come from, but a bio could be a sync write if fsync started
this writeback, or a normal writeback write if the wb kthread started a
periodic writeback.

This breaks the rule that sync writes and writeback writes need to be
differentiated from each other, because from the POV of the block layer,
all bios need to be recognized by these flags in order to do some
management, e.g. throttling.

This passes writeback_control to the compression write path so that it can
send bios with proper flags to the block layer.
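The hunk that turns the writeback_control into `write_flags` is cut off in this archive. As an assumption, this sketch models it on the kernel's wbc_to_write_flags() helper: WB_SYNC_ALL writeback (e.g. from fsync) maps to REQ_SYNC, and kupdate/background writeback maps to REQ_BACKGROUND. The `*_STUB` flag values and `struct wbc_stub` are hypothetical; the real REQ_* bits differ.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel's REQ_* bits are different. */
#define REQ_OP_WRITE_STUB   1u
#define REQ_SYNC_STUB       (1u << 8)
#define REQ_BACKGROUND_STUB (1u << 9)

enum sync_mode_stub { WB_SYNC_NONE_STUB, WB_SYNC_ALL_STUB };

/* Only the writeback_control fields this sketch looks at. */
struct wbc_stub {
	enum sync_mode_stub sync_mode;
	unsigned int for_kupdate : 1;
	unsigned int for_background : 1;
};

/* Modeled on wbc_to_write_flags(): integrity writeback is marked sync. */
static unsigned int write_flags_from_wbc(const struct wbc_stub *wbc)
{
	if (wbc->sync_mode == WB_SYNC_ALL_STUB)
		return REQ_SYNC_STUB;
	if (wbc->for_kupdate || wbc->for_background)
		return REQ_BACKGROUND_STUB;
	return 0;
}

/* What the patched code stores in bio->bi_opf for a compressed write. */
static unsigned int compressed_bio_opf(const struct wbc_stub *wbc)
{
	return REQ_OP_WRITE_STUB | write_flags_from_wbc(wbc);
}
```

Before the patch every compressed bio got the equivalent of plain `REQ_OP_WRITE_STUB`, so the block layer could not tell an fsync-driven write from periodic background writeback.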

Signed-off-by: Liu Bo 
---

v2: Enhance changelog with more background details.

 fs/btrfs/compression.c |  7 ---
 fs/btrfs/compression.h |  3 ++-
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/extent_io.h   |  3 ++-
 fs/btrfs/inode.c   | 15 +++
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 280384b..3dae2f5 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -292,7 +292,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 unsigned long len, u64 disk_start,
 unsigned long compressed_len,
 struct page **compressed_pages,
-unsigned long nr_pages)
+unsigned long nr_pages,
+unsigned int write_flags)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct bio *bio = NULL;
@@ -324,7 +325,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
bdev = fs_info->fs_devices->latest_bdev;
 
bio = btrfs_bio_alloc(bdev, first_byte);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+   bio->bi_opf = REQ_OP_WRITE | write_flags;
bio->bi_private = cb;
bio->bi_end_io = end_compressed_bio_write;
refcount_set(&cb->pending_bios, 1);
@@ -371,7 +372,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
bio_put(bio);
 
bio = btrfs_bio_alloc(bdev, first_byte);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+   bio->bi_opf = REQ_OP_WRITE | write_flags;
bio->bi_private = cb;
bio->bi_end_io = end_compressed_bio_write;
bio_add_page(bio, page, PAGE_SIZE, 0);
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d2781ff..dc45b94 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -91,7 +91,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
  unsigned long len, u64 disk_start,
  unsigned long compressed_len,
  struct page **compressed_pages,
- unsigned long nr_pages);
+ unsigned long nr_pages,
+ unsigned int write_flags);
 blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 int mirror_num, unsigned long bio_flags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3e5bb0c..ea64ad0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3252,7 +3252,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
   delalloc_start,
   delalloc_end,
   &page_started,
-  nr_written);
+  nr_written, wbc);
/* File system has been set read-only */
if (ret) {
SetPageError(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index faffa28..a92fd98 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -116,7 +116,8 @@ struct extent_io_ops {
 */
int (*fill_delalloc)(void *private_data, struct page *locked_page,
 u64 start, u64 end, int *page_started,
-unsigned long *nr_written);
+unsigned long *nr_written,
+struct writeback_control *wbc);
 
int (*writepage_start_hook)(struct page *page, u64 start, u64 end);
void (*writepage_end_io_hook)(struct page *page, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 128f3e5..ee67773 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -367,6 +367,7 @@ struct async_cow {
struct page *locked_page;
u64 start;
u64 end;
+   unsigned int write_flags;
struct list_head extents;
struct btrfs_work work;
 };
@@ -846,7 +8