Concerning family
Did you receive my previous email regarding your family inheritance ? Andrew. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Scrub doesn't correct corruption
Hi,

I'm having a problem with corruption in one file on my disk array. This is
(probably) the third time it has happened. The first time I didn't check
which file was affected, so I'm not sure, but it's likely. Btrfs scrub finds
the corruption and, according to both dmesg and its own output, fixes it.
However, the next run finds it again.

According to SMART the disk appears to be healthy (see below), and the
corruption is limited to one file.

Is this an issue somewhere inside btrfs, or is it a disk HW related problem?

Thank you for your help :)

W.

smartctl -a /dev/sde
====================

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b  100   100   016    Pre-fail Always  -           0
  2 Throughput_Performance  0x0005  131   131   054    Pre-fail Offline -           116
  3 Spin_Up_Time            0x0007  100   100   024    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0012  100   100   000    Old_age  Always  -           8
  5 Reallocated_Sector_Ct   0x0033  100   100   005    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000b  100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005  140   140   020    Pre-fail Offline -           15
  9 Power_On_Hours          0x0012  100   100   000    Old_age  Always  -           401
 10 Spin_Retry_Count        0x0013  100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           8
 22 Unknown_Attribute       0x0023  100   100   025    Pre-fail Always  -           100
192 Power-Off_Retract_Count 0x0032  100   100   000    Old_age  Always  -           33
193 Load_Cycle_Count        0x0012  100   100   000    Old_age  Always  -           33
194 Temperature_Celsius     0x0002  147   147   000    Old_age  Always  -           44 (Min/Max 23/46)
196 Reallocated_Event_Count 0x0032  100   100   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0022  100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0008  100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a  200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        357              -
# 2  Short offline     Completed without error  00%        335              -

uname -a
========

Linux ws 4.13.8-1-ARCH #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017 x86_64 GNU/Linux

btrfs --version
===============

btrfs-progs v4.13

btrfs fi show
=============

Label: none  uuid: db7e86f5-649d-44ce-9514-53c7ee0fbe09
	Total devices 2 FS bytes used 9.91GiB
	devid    1 size 103.79GiB used 20.03GiB path /dev/mapper/storage1-root
	devid    2 size 103.79GiB used 20.03GiB path /dev/mapper/storage2-root

Label: 'RAID'  uuid: 9a4be3ac-e942-4e6a-bb24-2c4009a42572
	Total devices 7 FS bytes used 6.48TiB
	devid    1 size 1.82TiB used 715.03GiB path /dev/mapper/data3
	devid    2 size 1.82TiB used 715.00GiB path /dev/mapper/data4
	devid    3 size 2.73TiB used 1.40TiB path /dev/mapper/data2
	devid    4 size 2.73TiB used 1.40TiB path /dev/mapper/data1
	devid    5 size 2.73TiB used 1.40TiB path /dev/mapper/data5
	devid    6 size 2.73TiB used 1.40TiB path /dev/mapper/data6
	devid    7 size 7.28TiB used 5.95TiB path /dev/mapper/data7

btrfs fi df /raid
=================

Data, RAID1: total=6.47TiB, used=6.47TiB
System, RAID1: total=64.00MiB, used=944.00KiB
Metadata, RAID1: total=9.00GiB, used=7.56GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg
=====

[0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
[0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
[0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=db7e86f5-649d-44ce-9514-53c7ee0fbe09 rw cryptdevice=UUID=eb4011d2-38cd-467d-b515-7acf3ef68f01:storage1:allow-discards cryptkey=rootfs:/boot/crypto_keyfile.bin cryptdevice2=UUID=dd0821ae-8fc4-41d2-aab8-f313e2f6d0e8:storage2:allow-discards cryptkey2=rootfs:/boot/crypto_keyfile2.bin root=/dev/mapper/storage1-root
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: S
Re: Scrub doesn't correct corruption
On 2017-10-23 16:39, Wolf wrote:
> Hi,
> I'm having problem with corruption in one file on my disk array. This is
> third time it happened (probably). First time I didn't checked the
> offending file so I'm not sure but it's likely. Btrfs scrub finds the
> corruption, according to both dmesg and it's output it fixes it.
> However, next run finds it too.
>
> [...]
>
> btrfs fi df /raid
> =================
>
> Data, RAID1: total=6.47TiB, used=6.47TiB
> System, RAID1: total=64.00MiB, used=944.00KiB
> Metadata, RAID1: total=9.00GiB, used=7.56GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

RAID1 for both data and metadata, so if nothing went wrong, it should be fixed.

And IIRC RAID1 repair is already tested and checked, so it should not have
such a problem.

> dmesg
> =====
>
> [0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
> [0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
> [0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017

Arch user here too.

> [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-linux > root=UUID=db7e86f5-649d-44ce-9514-53c7ee0fbe09 rw > cryptdevice=UUID=eb4011d2-38cd-467d-b515-7acf3ef68f01:storage1:allow-discards > cryptkey=roo
Re: Scrub doesn't correct corruption
On 2017-10-23 17:17, Qu Wenruo wrote:
> On 2017-10-23 16:39, Wolf wrote:
>> Hi,
>> I'm having problem with corruption in one file on my disk array.
>>
>> [...]
>>
>> btrfs fi df /raid
>> =================
>>
>> Data, RAID1: total=6.47TiB, used=6.47TiB
>> System, RAID1: total=64.00MiB, used=944.00KiB
>> Metadata, RAID1: total=9.00GiB, used=7.56GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> RAID1 for both data and meta.
> So if nothing went wrong, it should be fixed.
>
> And IIRC RAID1 repair is already tested and checked, so it should not
> has such problem.
> >> >> dmesg >> = >> >> [0.00] microcode: microcode updated early to revision 0xba, date = >> 2017-04-09 >> [0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 >> with crng_init=0 >> [0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version >> 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017 > > Arch user here too. > >> [0.00] Command line: BOOT_IMAGE=/boot/vm
Re: Scrub doesn't correct corruption
On 10/23/2017 10:39 AM, Wolf wrote:
> [...]
>
> Is this and issue somewhere inside btrfs or is disk HW related problem?

Highly unlikely to be hardware related. According to SMART and dmesg, there is
no indication that would suggest a disk failure.
[PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels
Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels to
mimic the functionality of a simple do {} while loop. Refactor the function
to use a do {} while construct, making the intention clear and the code easier
to follow. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/delayed-inode.c | 52 +---
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 19e4ad2f3f2e..1bfdb90d7633 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1323,40 +1323,42 @@ static void btrfs_async_run_delayed_root(struct btrfs_work *work)
 	if (!path)
 		goto out;
 
-again:
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND / 2)
-		goto free_path;
+	do {
+		if (atomic_read(&delayed_root->items) <
+		    BTRFS_DELAYED_BACKGROUND / 2)
+			break;
 
-	delayed_node = btrfs_first_prepared_delayed_node(delayed_root);
-	if (!delayed_node)
-		goto free_path;
+		delayed_node = btrfs_first_prepared_delayed_node(delayed_root);
+		if (!delayed_node)
+			break;
 
-	path->leave_spinning = 1;
-	root = delayed_node->root;
+		path->leave_spinning = 1;
+		root = delayed_node->root;
 
-	trans = btrfs_join_transaction(root);
-	if (IS_ERR(trans))
-		goto release_path;
+		trans = btrfs_join_transaction(root);
+		if (IS_ERR(trans)) {
+			btrfs_release_path(path);
+			btrfs_release_prepared_delayed_node(delayed_node);
+			total_done++;
+			continue;
+		}
 
-	block_rsv = trans->block_rsv;
-	trans->block_rsv = &root->fs_info->delayed_block_rsv;
+		block_rsv = trans->block_rsv;
+		trans->block_rsv = &root->fs_info->delayed_block_rsv;
 
-	__btrfs_commit_inode_delayed_items(trans, path, delayed_node);
+		__btrfs_commit_inode_delayed_items(trans, path, delayed_node);
 
-	trans->block_rsv = block_rsv;
-	btrfs_end_transaction(trans);
-	btrfs_btree_balance_dirty_nodelay(root->fs_info);
+		trans->block_rsv = block_rsv;
+		btrfs_end_transaction(trans);
+		btrfs_btree_balance_dirty_nodelay(root->fs_info);
 
-release_path:
-	btrfs_release_path(path);
-	total_done++;
+		btrfs_release_path(path);
+		btrfs_release_prepared_delayed_node(delayed_node);
+		total_done++;
 
-	btrfs_release_prepared_delayed_node(delayed_node);
-	if ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK) ||
-	    total_done < async_work->nr)
-		goto again;
+	} while ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK)
+		 || total_done < async_work->nr);
 
-free_path:
 	btrfs_free_path(path);
 out:
 	wake_up(&delayed_root->wait);
-- 
2.7.4
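The equivalence this refactoring relies on can be checked outside the kernel. The following is only a sketch, not the btrfs code: `process_goto`, `process_loop`, the `items` array and the `limit` parameter are hypothetical stand-ins, and a negative item plays the role of a failed btrfs_join_transaction(). The point it illustrates is that `continue` inside a do {} while jumps to the loop condition, just as the old `goto release_path` fell through to the `goto again` test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Label-based form, shaped like the code being removed.
 * A negative item is skipped (but still counted), like a failed join.
 */
size_t process_goto(const int *items, size_t n, size_t limit, int *committed)
{
	size_t done = 0;

	*committed = 0;
again:
	if (done >= n)			/* nothing left: like "goto free_path" */
		goto out;
	if (items[done] < 0)		/* error: like "goto release_path" */
		goto release;
	(*committed)++;
release:
	done++;
	if (done < limit)
		goto again;
out:
	return done;
}

/* Loop-based form, shaped like the code being added. */
size_t process_loop(const int *items, size_t n, size_t limit, int *committed)
{
	size_t done = 0;

	*committed = 0;
	do {
		if (done >= n)
			break;
		if (items[done] < 0) {
			done++;
			continue;	/* evaluates the while () test next */
		}
		(*committed)++;
		done++;
	} while (done < limit);

	return done;
}
```

Both functions visit the same items and stop at the same point for any input, including `limit == 0`, where the body still runs once in either form.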
[PATCH 2/2] btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items
btrfs_balance_delayed_items is the sole caller of btrfs_wq_run_delayed_node
and already includes one of the checks for whether the delayed inodes should
be run. On the other hand, btrfs_wq_run_delayed_node duplicates that check and
performs an additional one for wq congestion.

Let's remove the duplicate check and move the congestion one into
btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only care
about setting up the wq run. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/delayed-inode.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1bfdb90d7633..b7a0ec2c41e6 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1371,10 +1371,6 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 {
 	struct btrfs_async_delayed_work *async_work;
 
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND ||
-	    btrfs_workqueue_normal_congested(fs_info->delayed_workers))
-		return 0;
-
 	async_work = kmalloc(sizeof(*async_work), GFP_NOFS);
 	if (!async_work)
 		return -ENOMEM;
@@ -1410,7 +1406,8 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
 {
 	struct btrfs_delayed_root *delayed_root = fs_info->delayed_root;
 
-	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND)
+	if ((atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND) ||
+	    btrfs_workqueue_normal_congested(fs_info->delayed_workers))
 		return;
 
 	if (atomic_read(&delayed_root->items) >= BTRFS_DELAYED_WRITEBACK) {
-- 
2.7.4
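As a rough illustration of the pattern (hoisting a guard from a helper into its only caller), here is a minimal sketch. Everything in it is hypothetical: `struct queue_state`, `schedule_work`, `balance_items` and the `DELAYED_BACKGROUND` value are invented for the example and are not the btrfs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DELAYED_BACKGROUND 128	/* hypothetical threshold for this sketch */

struct queue_state {
	int items;	/* pending delayed items */
	bool congested;	/* stands in for the workqueue congestion test */
	int scheduled;	/* how many times async work was set up */
};

/* After the change: the helper only sets up the work, no policy checks. */
int schedule_work(struct queue_state *q)
{
	assert(q != NULL);
	q->scheduled++;
	return 0;
}

/* The sole caller owns both checks, so each is evaluated once per call. */
void balance_items(struct queue_state *q)
{
	if (q->items < DELAYED_BACKGROUND || q->congested)
		return;

	schedule_work(q);
}
```

When moving a guard like this, it is worth checking whether any code between the old and new location is now skipped in cases where it used to run.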
Re: [PATCH 1/2] btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels
On 2017-10-23 18:51, Nikolay Borisov wrote:
> Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels to
> mimic the functionality of a simple do {} while loop. Refactor the function
> to use a do {} while construct, making intention clear and code easier to
> follow. No functional changes
>
> Signed-off-by: Nikolay Borisov

Looks good to me.

Reviewed-by: Qu Wenruo

> ---
>  fs/btrfs/delayed-inode.c | 52 +---
>  1 file changed, 27 insertions(+), 25 deletions(-)
>
> [...]
Re: [PATCH 2/2] btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items
On 2017-10-23 18:51, Nikolay Borisov wrote:
> btrfs_balance_delayed_items is the sole caller of btrfs_wq_run_delayed_node
> and already includes one of the checks whether the delayed inodes should be
> run. On the other hand btrfs_wq_run_delayed_node duplicates that check and
> performs an additional one for wq congestion.
>
> Let's remove the duplicate check and move the congestion one in
> btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only care
> about setting up the wq run. No functional changes.
>
> Signed-off-by: Nikolay Borisov

The btrfs_workqueue_normal_congested() check is moved to the caller and the
duplicated atomic_read() is removed.

Unless delayed_root->items gets modified in between, this should be fine.
In any case, the original code had nothing to protect the separate
atomic_read() calls either, so I don't think this will cause any new problem.

Reviewed-by: Qu Wenruo

Thanks,
Qu

> ---
>  fs/btrfs/delayed-inode.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> [...]
Re: [PATCH v4] btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
On 10/23/2017 12:58 AM, Nikolay Borisov wrote:
> btrfs_rm_dev_item calls several function under an activa transaction, however
                                                    ^^^^^^ active
> it fails to abort it if an error happens. Fix this by adding explicit
> btrfs_abort_transaction/btrfs_end_transaction calls
>
> Signed-off-by: Nikolay Borisov
> ---
> V4:
>  * Reorder the code a bit to prevent duplication of btrfs_free_path
>    invocation.
>
>  * Collapse the handling of btrfs_search_slot return value in a single if
>    branch rather than having it spread across 2 branches
>
> V3:
>  * The path needs to be freed before the the transaction is comitted
>    otherwise we will deadlock.
>
>  fs/btrfs/volumes.c | 20
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e8f16c305df..8b139d203f8c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1765,20 +1765,24 @@ static int btrfs_rm_dev_item(struct btrfs_fs_info *fs_info,
>  	key.offset = device->devid;
>
>  	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> -	if (ret < 0)
> -		goto out;
> -
> -	if (ret > 0) {
> -		ret = -ENOENT;
> +	if (ret) {
> +		if (ret > 0)
> +			ret = -ENOENT;
> +		btrfs_abort_transaction(trans, ret);
> +		btrfs_end_transaction(trans);
>  		goto out;
>  	}
>
>  	ret = btrfs_del_item(trans, root, path);
> -	if (ret)
> -		goto out;
> +	if (ret) {
> +		btrfs_abort_transaction(trans, ret);
> +		btrfs_end_transaction(trans);
> +	}
> +
>  out:
>  	btrfs_free_path(path);
> -	btrfs_commit_transaction(trans);
> +	if (!ret)
> +		ret = btrfs_commit_transaction(trans);
>  	return ret;
>  }
>

Perhaps slightly simpler (and the 'out:' label maybe goes away):

	.
	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
	if (ret > 0)
		ret = -ENOENT;
	else if (!ret)
		ret = btrfs_del_item(trans, root, path);

	if (ret) {
		btrfs_abort_transaction(trans, ret);
		btrfs_end_transaction(trans);
	}
out:
	.

Ed
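The control flow suggested above can be exercised with stubs. This is a minimal sketch under stated assumptions: `struct txn` and `rm_dev_item` are hypothetical, the search and delete results are passed in as plain integers instead of calling btrfs_search_slot()/btrfs_del_item(), and the path handling from the patch is left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical record of what happened to the transaction. */
struct txn {
	int aborted_with;	/* error passed to the abort step, 0 if none */
	int ended;		/* 1 if the transaction was ended after an abort */
	int committed;		/* 1 if the transaction was committed */
};

/*
 * search_ret: <0 error, 0 found, >0 not found (same convention as the patch).
 * del_ret:    result of the delete step, only consulted when the item exists.
 */
int rm_dev_item(struct txn *t, int search_ret, int del_ret)
{
	int ret = search_ret;

	if (ret > 0)
		ret = -ENOENT;	/* "not found" becomes a real error */
	else if (!ret)
		ret = del_ret;	/* only delete when the search succeeded */

	if (ret) {
		/* single error exit: abort, then end */
		t->aborted_with = ret;
		t->ended = 1;
	} else {
		t->committed = 1;
	}
	return ret;
}
```

All three failure cases (search error, item not found, delete error) funnel through one `if (ret)` block, which is what makes the separate branches in the quoted diff unnecessary.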
Re: [PATCH v8 0/6] Btrfs: populate heuristic with code
2017-10-22 16:44 GMT+03:00 Timofey Titovets :
> 2017-10-20 16:45 GMT+03:00 David Sterba :
>> On Fri, Oct 20, 2017 at 01:48:01AM +0300, Timofey Titovets wrote:
>>> 2017-10-19 18:39 GMT+03:00 David Sterba :
>>> > On Fri, Sep 29, 2017 at 06:22:00PM +0200, David Sterba wrote:
>>> >> On Thu, Sep 28, 2017 at 05:33:35PM +0300, Timofey Titovets wrote:
>>> >> > Compile tested, hand tested on live system
>>> >> >
>>> >> > Change v7 -> v8
>>> >> >   - All code moved to compression.c (again)
>>> >> >   - Heuristic workspaces inmplemented another way
>>> >> >     i.e. only share logic with compression workspaces
>>> >> >   - Some style fixes suggested by Devid
>>> >> >   - Move sampling function from heuristic code
>>> >> >     (I'm afraid of big functions)
>>> >> >   - Much more comments and explanations
>>> >>
>>> >> Thanks for the update, I went through the patches and they looked good
>>> >> enough to be put into for-next. I may have more comments about a few
>>> >> things, but nothing serious that would hinder testing.
>>> >
>>> > I did a final pass through the patches and edited comments wehre I was
>>> > not able to undrerstand them. Please check the updated patches in [1] if
>>> > I did not accidentally change the meaning.
>>>
>>> I don't see a link [1] in mail, may be you missed it?
>>
>> Yeah, sorry:
>> https://github.com/kdave/btrfs-devel/commits/ext/timofey/heuristic
>
> I did re-read updated comments, looks ok to me
> (i only found one typo, leave a comment).
>
> Thanks
> --
> Have a nice day,
> Timofey.

Can you please try the attached patch?

I have been thinking about the performance hit of the heuristic and how to
avoid sorting. The patch looks for the min/max counts in the bucket array
before sorting, and uses (max - min) to filter out the edge cases where the
byte core set size would be < 64 or > 200.

It's a bit of a hacky workaround =\, but on my data set it shows about the
same speedup as switching to radix sort (i.e. a 2x speedup).

Thanks.
--
Have a nice day,
Timofey.

From fb2a329828e64ad0e224a8cb97dbc17147149629 Mon Sep 17 00:00:00 2001
From: Timofey Titovets
Date: Mon, 23 Oct 2017 21:24:29 +0300
Subject: [PATCH] Btrfs: heuristic try avoid bucket sorting on edge data cases

The heap sort used in the kernel is too slow and costly, so let's make some
statistical assumptions about edge input data cases, based on the observed
difference between the min/max values in the bucket.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/compression.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0ca16909894e..56b67ec4fb5b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1310,8 +1310,46 @@ static int byte_core_set_size(struct heuristic_ws *ws)
 	u32 i;
 	u32 coreset_sum = 0;
 	const u32 core_set_threshold = ws->sample_size * 90 / 100;
+	struct bucket_item *max, *min;
+	struct bucket_item tmp;
 	struct bucket_item *bucket = ws->bucket;
 
+
+	/* Presort to find the min/max values */
+	max = &bucket[0];
+	min = &bucket[BUCKET_SIZE - 1];
+	for (i = 1; i < BUCKET_SIZE - 1; i++) {
+		if (bucket[i].count > max->count) {
+			tmp = *max;
+			*max = bucket[i];
+			bucket[i] = tmp;
+		}
+		if (bucket[i].count < min->count) {
+			tmp = *min;
+			*min = bucket[i];
+			bucket[i] = tmp;
+		}
+	}
+
+	/*
+	 * Hack to avoid sorting on edge data cases (sorting is too costly),
+	 * i.e. this quickly filters out easily compressible
+	 * and badly compressible data.
+	 * Based on observation of the number distribution on different data sets.
+	 *
+	 * Assumption 1: for badly compressible data, the spread between min/max
+	 * will be less than 0.6% of the sample size
+	 *
+	 * Assumption 2: for well compressible data, the spread between min/max
+	 * will be far bigger than 4% of the sample size
+	 */
+
+	if (max->count - min->count < ws->sample_size * 6 / 1000)
+		return BYTE_CORE_SET_HIGH + 1;
+
+	if (max->count - min->count > ws->sample_size * 4 / 100)
+		return BYTE_CORE_SET_LOW - 1;
+
 	/* Sort in reverse order */
 	sort(bucket, BUCKET_SIZE, sizeof(*bucket), &bucket_comp_rev, NULL);
 
-- 
2.14.2
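To make the idea concrete, here is a simplified user-space sketch of a min/max prefilter. It is not the patch's code: it uses a plain linear min/max scan instead of the swap-based presort, and `prefilter_sample`, `enum prefilter` and its values are hypothetical names. Only the two thresholds (0.6% and 4% of the sample size) are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUCKET_SIZE 256

enum prefilter {
	PREFILTER_INCOMPRESSIBLE,	/* counts nearly uniform: skip sorting */
	PREFILTER_COMPRESSIBLE,		/* counts very skewed: skip sorting */
	PREFILTER_NEED_SORT		/* undecided: fall back to the full check */
};

enum prefilter prefilter_sample(const uint8_t *sample, size_t len)
{
	uint32_t count[BUCKET_SIZE] = {0};
	uint32_t min, max;
	size_t i;

	assert(sample != NULL && len > 0);

	/* Build the byte-frequency histogram. */
	for (i = 0; i < len; i++)
		count[sample[i]]++;

	/* One linear pass for min and max, no sorting needed. */
	min = max = count[0];
	for (i = 1; i < BUCKET_SIZE; i++) {
		if (count[i] > max)
			max = count[i];
		if (count[i] < min)
			min = count[i];
	}

	/* Thresholds as in the patch: 0.6% and 4% of the sample size. */
	if (max - min < len * 6 / 1000)
		return PREFILTER_INCOMPRESSIBLE;
	if (max - min > len * 4 / 100)
		return PREFILTER_COMPRESSIBLE;

	return PREFILTER_NEED_SORT;
}
```

For a 4096-byte sample the thresholds work out to 24 and 163 (integer division), so a buffer of all zeros is classified as compressible and a buffer where every byte value occurs equally often is classified as incompressible. Only samples with a spread in between reach the sort.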
Re: Scrub doesn't correct corruption
On , Qu Wenruo wrote:
> > [27240.680874] perf: interrupt took too long (3952 > 3942), lowering kernel.perf_event_max_sample_rate to 50400
> > [30658.875802] BTRFS warning (device dm-12): checksum error at logical 37889245122560 on dev /dev/mapper/data7, sector 2743145096, root 23674, inode 206751, offset 762638336, length 4096, links 1 (path: アニメ/!waiting_for_better_quality/Gate: Jieitai Kanochi nite, Kaku Tatakaeri/GATE Jieitai Kanochi nite, Kaku Tatakaeri 05v2.mp4)
>
> Well, it's several seasons ago, and I think there are better BDrip raws now.
> (Yeah, I'm also an Otaku)
>
> Despite that, it's better to hide such personal info though.

Since downloading stuff from the internet is legal in my country, I don't
usually bother to hide things like this, but I will do so if it's an issue on
this mailing list.

> And, did you tried to scrub the corrupted device other than the whole fs?
> Btrfs default scrub will start threads to scrub all devices at the same
> time, maybe some concurrency caused the false alert.

Tbh I had no idea I could scrub just the device and not the whole filesystem.
Running it now (but the scrub on this drive takes about 12 hours, so I will
see tomorrow whether it helped).

> Also, it could be possible to check/repair it by using btrfs-progs.
> Although it's still out-of-tree.
>
> Could you please try the following branch and use "btrfs scrub start
> --offline /dev/mapper/data7" to check if it reports the corruption is
> fixable?
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub
>
> Offline scrub gives us a quite good reference on whether it's fixable,
> without the possible hassle in kernel.
> So it's worth trying.

If scrubbing just the device doesn't help in any way, I will give it a try.

> (But hey, there is better better BDrip raws already, so I don't think
> you're really interested in fixing the corruption)

True. Plus, since it's RAID1, no data was actually lost and everything is
working without problems. I'm mainly interested in knowing whether it's:

1) an issue with the HW
2) some hidden issue with the whole fs, and it's going to fall apart soon

Thanks for the tips so far :)

W.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Re: Scrub doesn't correct corruption
On , ein wrote:
> On 10/23/2017 10:39 AM, Wolf wrote:
> > [...]
> >
> > Is this and issue somewhere inside btrfs or is disk HW related problem?
>
> Highly unlikely hardware related. According to SMART and dmsg, there's
> no indication which would suggest disk failure.

That's my thinking too (and the reason why the disk is still in the array
instead of going back for warranty), but since the scrub failed to correct
the issue despite saying it did, I'm a bit curious about what's going on.

W.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[PATCH] fstests: btrfs/143: make test case more reliable
Currently drop_caches is used to invalidate file's page cache so that
buffered read can hit disk, but the problem is that it may also
invalidate metadata's page cache, so the test case may not get read
errors (and repair) if reading metadata has consumed the injected
faults.

This changes it to do 'fadvise -d' to firstly access all metadata it
needs to locate the file and then only drops the test file's page
cache. Also this changes it to read the file only if pid%2 == 1.

Reported-by: Nikolay Borisov
Signed-off-by: Liu Bo
---
 tests/btrfs/143 | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/btrfs/143 b/tests/btrfs/143
index da7bfd8..dabd03d 100755
--- a/tests/btrfs/143
+++ b/tests/btrfs/143
@@ -127,16 +127,16 @@ echo "step 3..repair the bad copy" >>$seqres.full
 # since raid1 consists of two copies, and the bad copy was put on stripe #1
 # while the good copy lies on stripe #0, the bad copy only gets access when the
 # reader's pid % 2 == 1 is true
-while true; do
-	# start_fail only fails the following buffered read so the repair is
-	# supposed to work.
-	echo 3 > /proc/sys/vm/drop_caches
-	start_fail
-	$XFS_IO_PROG -c "pread 0 4K" "$SCRATCH_MNT/foobar" > /dev/null &
-	pid=$!
-	wait
-	stop_fail
-	[ $((pid % 2)) == 1 ] && break
+while [[ -z ${result} ]]; do
+	# invalidate the page cache.
+	$XFS_IO_PROG -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar
+
+	start_fail
+	result=$(bash -c "
+	if [[ \$((\$\$ % 2)) -eq 1 ]]; then
+		exec $XFS_IO_PROG -c \"pread 0 4K\" \"$SCRATCH_MNT/foobar\"
+	fi");
+	stop_fail
 done
 
 _scratch_unmount
--
2.5.0
[PATCH v4] Btrfs: compress_file_range() change page dirty status once
We need to call extent_range_clear_dirty_for_io() on the compression range to
prevent applications from changing page content while the pages are being
compressed.

extent_range_clear_dirty_for_io() runs on each loop iteration, and
"(end - start)" can be much (up to 1024 times) bigger than the compression
range (BTRFS_MAX_UNCOMPRESSED). That produces extra calls to page management
code.

Fix that behaviour by calling extent_range_clear_dirty_for_io() only once.

v1 -> v2:
 - Make that more obvious and less error-prone

v2 -> v3:
 - Rebased on:
   Btrfs: compress_file_range() remove dead variable num_bytes
 - Update change log
 - Add comments

v3 -> v4:
 - Rebased on: kdave for-next
 - To avoid a change in dirty bit clear/set behaviour, call clear_bit once
   instead of per compression range

Signed-off-by: Timofey Titovets
---
 fs/btrfs/inode.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b93fe05a39c7..5816dd3cb6e6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -536,8 +536,10 @@ static noinline void compress_file_range(struct inode *inode,
 		 * If the compression fails for any reason, we set the pages
 		 * dirty again later on.
 		 */
-		extent_range_clear_dirty_for_io(inode, start, end);
-		redirty = 1;
+		if (!redirty) {
+			extent_range_clear_dirty_for_io(inode, start, end);
+			redirty = 1;
+		}
 
 		/* Compression level is applied here and only here */
 		ret = btrfs_compress_pages(
--
2.14.2
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
All three devices completed the 'long' SMART selftest without error:
# 1  Extended offline    Completed without error       00%

Here is the standard data that I forgot to include in my first message:

Running Arch Linux

$ uname -a
Linux HOSTNAME 4.9.56-1-lts #1 SMP Thu Oct 12 22:34:15 CEST 2017 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v4.13

$ sudo btrfs fi show
Label: 'CRUCIAL116'  uuid: 31c38558-c8c7-49c4-8fea-9d0730ee58a7
        Total devices 1 FS bytes used 7.77GiB
        devid    1 size 59.62GiB used 59.62GiB path /dev/sda2

Label: 'OfflineJ'  uuid: 88406942-e3e1-42c6-ad71-e23bb315caa7
        Total devices 3 FS bytes used 1.98TiB
        devid    1 size 1.82TiB used 679.00GiB path /dev/sdi
        devid    2 size 1.82TiB used 679.01GiB path /dev/sdh
        devid    3 size 1.82TiB used 679.01GiB path /dev/sdn

$ sudo btrfs fi df /mnt
Data, RAID0: total=1.98TiB, used=1.98TiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=3.00GiB, used=2.44GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ dmesg | grep BTRFS
[    5.262090] BTRFS: device label CRUCIAL116 devid 1 transid 98407 /dev/sda2
[   15.636475] BTRFS: device label OfflineJ devid 2 transid 612 /dev/sdh
[   15.646343] BTRFS: device label OfflineJ devid 1 transid 612 /dev/sdi
[   15.647194] BTRFS: device label OfflineJ devid 3 transid 612 /dev/sdn
[   15.754204] BTRFS info (device sda2): disk space caching is enabled
[   15.754206] BTRFS info (device sda2): has skinny extents
[   15.778659] BTRFS info (device sda2): detected SSD devices, enabling SSD mode
[   58.492530] BTRFS info (device sdn): disk space caching is enabled
[   58.492532] BTRFS info (device sdn): has skinny extents
[   61.243226] BTRFS info (device sdn): checking UUID tree
[  114.437424] BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum 1745651892 expected csum 3952841867
[  114.450699] BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum 1745651892 expected csum 3952841867
[38494.978379] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 876064455 expected csum 874979996
[38494.989301] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[38541.079264] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 876064455 expected csum 874979996
[38571.245421] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[39434.215600] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[73132.653297] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996
[73167.897106] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996

One thing I notice is that ino 4708 keeps returning a few different 'wrong'
csums. I can also confirm that one of those 'csum failed' messages gets
written each time I run
'$ sudo btrfs send /mnt/dataroot.2017.10.21/ | pv -i5 > /dev/null'

Does anyone know why scrub did not catch these errors that show up in dmesg?

On Mon, Oct 23, 2017 at 12:25 AM, Zak Kohler wrote:
> Was attempting my first btrfs send receive over ssh and continually
> received ioctl error at different points but always in the first 3
> minutes. The volume consists of three devices with only metadata
> duplication. I narrowed down the error to the send command by
> recreating the error while redirecting to /dev/null. Sometime it would
> happen after ~12Gib, or ~7.6Gib, right now rerunning multiple times it
> has stopped on exactly 3.76 multiple times.
>
> $ sudo btrfs send /mnt/dataroot.2017.10.21/ | pv -i5 > /dev/null
> At subvol /mnt/dataroot.2017.10.21/
> ERROR: send ioctl failed with -5: Input/output error]
> 3.76GiB 0:00:13 [ 290MiB/s] [ <=> ]
>
>
> First I checked the btrfs device stats, each of the 3 drives appear clean:
> $ sudo btrfs device stats /mnt
> [/dev/sdi].write_io_errs    0
> [/dev/sdi].read_io_errs     0
> [/dev/sdi].flush_io_errs    0
> [/dev/sdi].corruption_errs  0
> [/dev/sdi].generation_errs  0
> [/dev/sdh].write_io_errs    0
> [/dev/sdh].read_io_errs     0
> [/dev/sdh].flush_io_errs    0
> [/dev/sdh].corruption_errs  0
> [/dev/sdh].generation_errs  0
> [/dev/sdn].write_io_errs    0
> [/dev/sdn].read_io_errs     0
> [/dev/sdn].flush_io_errs    0
> [/dev/sdn].corruption_errs  0
> [/dev/sdn].generation_errs  0
>
> The next thing I tried was running and checking that SMART short
> selftest passed on each of three drives with no error.
> $ sudo smartctl -l selftest /dev/sdh
> # 1 Short offline Completed without error
>
>
> I read somewhere to check dmesg, which yielded some info:
> BTRFS warning (device sdn): csum failed ino 6407 off 7683907584 csum
> 1745651892 expected csum 3952841867
>
> But when I when to see if scrub could detect the errors, nothing was found:
> $ sudo btrfs scrub
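The observation that one inode keeps returning several different wrong checksums is easier to see once the warnings are grouped. The following is a minimal sketch, assuming the "csum failed ino N off N csum N expected csum N" wording shown in the dmesg output of this message; summarize_csum_failures is a hypothetical helper name and the output format is made up for illustration.

```shell
#!/bin/bash
# Sketch: group "csum failed" kernel warnings (read from stdin) by
# inode, file offset and the checksum that was actually read.
# summarize_csum_failures is a hypothetical helper name.
summarize_csum_failures() {
    # Reduce each warning to "ino off bad_csum", count identical triples,
    # then print one summary line per triple.
    sed -n 's/.*csum failed ino \([0-9]*\) off \([0-9]*\) csum \([0-9]*\) expected csum \([0-9]*\).*/\1 \2 \3/p' |
        sort |
        uniq -c |
        awk '{ printf "ino %s off %s bad_csum %s seen %d time(s)\n", $2, $3, $4, $1 }'
}
```

Run as `dmesg | summarize_csum_failures`. Several distinct bad_csum values at the same inode and offset mean that repeated reads of the same block returned different data.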
Re: [PATCH 1/2] btrfs-progs: fi: move dev_to_fsid() to cmds-fi-usage for later use
On 10/23/2017 12:44 PM, Misono, Tomohiro wrote: Move dev_to_fsid() from cmds-filesystem.c to cmds-fi-usage.c in order to call it from both "fi show" and "fi usage". Signed-off-by: Tomohiro Misono Reviewed-by: Anand Jain Thanks, Anand --- cmds-fi-usage.c | 29 + cmds-fi-usage.h | 1 + cmds-filesystem.c | 27 --- 3 files changed, 30 insertions(+), 27 deletions(-) diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c index 6c846c1..a72fb4e 100644 --- a/cmds-fi-usage.c +++ b/cmds-fi-usage.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "utils.h" #include "kerncompat.h" @@ -29,6 +30,7 @@ #include "string-table.h" #include "cmds-fi-usage.h" #include "commands.h" +#include "disk-io.h" #include "version.h" #include "help.h" @@ -506,6 +508,33 @@ static int cmp_device_info(const void *a, const void *b) ((struct device_info *)b)->path); } +int dev_to_fsid(const char *dev, __u8 *fsid) +{ + struct btrfs_super_block *disk_super; + char buf[BTRFS_SUPER_INFO_SIZE]; + int ret; + int fd; + + fd = open(dev, O_RDONLY); + if (fd < 0) { + ret = -errno; + return ret; + } + + disk_super = (struct btrfs_super_block *)buf; + ret = btrfs_read_dev_super(fd, disk_super, + BTRFS_SUPER_INFO_OFFSET, SBREAD_DEFAULT); + if (ret) + goto out; + + memcpy(fsid, disk_super->fsid, BTRFS_FSID_SIZE); + ret = 0; + +out: + close(fd); + return ret; +} + /* * This function loads the device_info structure and put them in an array */ diff --git a/cmds-fi-usage.h b/cmds-fi-usage.h index a399517..0e82951 100644 --- a/cmds-fi-usage.h +++ b/cmds-fi-usage.h @@ -50,5 +50,6 @@ void print_device_chunks(struct device_info *devinfo, struct chunk_info *chunks_info_ptr, int chunks_info_count, unsigned unit_mode); void print_device_sizes(struct device_info *devinfo, unsigned unit_mode); +int dev_to_fsid(const char *dev, __u8 *fsid); #endif diff --git a/cmds-filesystem.c b/cmds-filesystem.c index c39f2d1..3dc86a2 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -431,33 +431,6 @@ out: return !found; } -static 
int dev_to_fsid(const char *dev, __u8 *fsid) -{ - struct btrfs_super_block *disk_super; - char buf[BTRFS_SUPER_INFO_SIZE]; - int ret; - int fd; - - fd = open(dev, O_RDONLY); - if (fd < 0) { - ret = -errno; - return ret; - } - - disk_super = (struct btrfs_super_block *)buf; - ret = btrfs_read_dev_super(fd, disk_super, - BTRFS_SUPER_INFO_OFFSET, SBREAD_DEFAULT); - if (ret) - goto out; - - memcpy(fsid, disk_super->fsid, BTRFS_FSID_SIZE); - ret = 0; - -out: - close(fd); - return ret; -} - static void free_fs_devices(struct btrfs_fs_devices *fs_devices) { struct btrfs_fs_devices *cur_seed, *next_seed; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs-progs: fi: enable fi usage for filesystem top of seed device
On 10/23/2017 12:45 PM, Misono, Tomohiro wrote: Currently "fi usage" (and "dev usage") cannot run for the filesystem using seed device. This is because FS_INFO ioctl returns the number of devices excluding seeds, but load_device_info() tries to access valid device from devid 0 to max_id, and results in accessing seeds too (thus causing mismatching of number of devices). A long time back I tried to fix this by fixing the FS_INFO num_devs itself, but the concern was backward compatibility of the ioctl. However there is no such a concern here. I am ok with this approach. Since only the size of non-seed devices is matter, fix this by just skipping seed device by checking device's fsid and comparing it to the fsid obtained by FS_INFO ioctl. Signed-off-by: Tomohiro Misono Reviewed-by: Anand Jain Thanks, Anand --- cmds-fi-usage.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c index a72fb4e..50c7e51 100644 --- a/cmds-fi-usage.c +++ b/cmds-fi-usage.c @@ -545,6 +545,7 @@ static int load_device_info(int fd, struct device_info **device_info_ptr, struct btrfs_ioctl_fs_info_args fi_args; struct btrfs_ioctl_dev_info_args dev_info; struct device_info *info; + __u8 fsid[BTRFS_UUID_SIZE]; *device_info_count = 0; *device_info_ptr = NULL; @@ -568,6 +569,7 @@ static int load_device_info(int fd, struct device_info **device_info_ptr, if (ndevs >= fi_args.num_devices) { error("unexpected number of devices: %d >= %llu", ndevs, (unsigned long long)fi_args.num_devices); + error("if seed device is used, try run as root."); goto out; } memset(&dev_info, 0, sizeof(dev_info)); @@ -580,6 +582,19 @@ static int load_device_info(int fd, struct device_info **device_info_ptr, goto out; } + /* +* Skip seed device by cheking device's fsid (require root). +* Ignore EACCES since if seed is not used this function works +* correctly without root privilege. 
+*/ + ret = dev_to_fsid((const char *)dev_info.path, fsid); + if (ret != -EACCES) { + if (ret) + goto out; + if (memcmp(fi_args.fsid, fsid, BTRFS_FSID_SIZE) != 0) + continue; + } + info[ndevs].devid = dev_info.devid; if (!dev_info.path[0]) { strcpy(info[ndevs].path, "missing"); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
> Does anyone know why scrub did not catch these errors that show up in dmesg?

Can you try offline scrub from this repo
https://github.com/gujx2017/btrfs-progs/tree/offline_scrub and see whether it
detects the issue? "btrfs scrub start --offline "

Cheers,
Lakshmipathi.G
http://www.giis.co.in http://www.webminal.org
Re: [PATCH] fstests: btrfs/143: make test case more reliable
On 23.10.2017 23:57, Liu Bo wrote:
> Currently drop_caches is used to invalidate file's page cache so that
> buffered read can hit disk, but the problem is that it may also
> invalidate metadata's page cache, so the test case may not get read
> errors (and repair) if reading metadata has consumed the injected
> faults.
>
> This changes it to do 'fadvise -d' to firstly access all metadata it
> needs to locate the file and then only drops the test file's page
> cache. Also this changes it to read the file only if pid%2 == 1.
>
> Reported-by: Nikolay Borisov
> Signed-off-by: Liu Bo
> ---
>  tests/btrfs/143 | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/tests/btrfs/143 b/tests/btrfs/143
> index da7bfd8..dabd03d 100755
> --- a/tests/btrfs/143
> +++ b/tests/btrfs/143
> @@ -127,16 +127,16 @@ echo "step 3..repair the bad copy" >>$seqres.full
>  # since raid1 consists of two copies, and the bad copy was put on stripe #1
>  # while the good copy lies on stripe #0, the bad copy only gets access when the
>  # reader's pid % 2 == 1 is true
> -while true; do
> -	# start_fail only fails the following buffered read so the repair is
> -	# supposed to work.
> -	echo 3 > /proc/sys/vm/drop_caches
> -	start_fail
> -	$XFS_IO_PROG -c "pread 0 4K" "$SCRATCH_MNT/foobar" > /dev/null &
> -	pid=$!
> -	wait
> -	stop_fail
> -	[ $((pid % 2)) == 1 ] && break
> +while [[ -z ${result} ]]; do
> +	# invalidate the page cache.
> +	$XFS_IO_PROG -c "fadvise -d 0 128K" $SCRATCH_MNT/foobar

I'm a bit worried about what can be expected from the DONT_NEED advice. From
https://linux.die.net/man/2/posix_fadvise:

  The advice is not binding; it merely constitutes an expectation on behalf
  of the application.

This might very well be a moot point, but still.

> +
> +	start_fail
> +	result=$(bash -c "
> +	if [[ \$((\$\$ % 2)) -eq 1 ]]; then
> +		exec $XFS_IO_PROG -c \"pread 0 4K\" \"$SCRATCH_MNT/foobar\"
> +	fi");
> +	stop_fail
>  done
>
>  _scratch_unmount
>
Re: btrfs send yields "ERROR: send ioctl failed with -5: Input/output error"
Yes, it is finding much more than just one error.

From dmesg:
[89520.441354] BTRFS warning (device sdn): csum failed ino 4708 off 27529216 csum 2615801759 expected csum 874979996

$ sudo btrfs scrub start --offline --progress /dev/sdn
ERROR: data at bytenr 68431499264 mirror 1 csum mismatch, have 0x5aa0d40f expect 0xd4a15873
ERROR: extent 68431474688 len 14467072 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 83646357504 mirror 1 csum mismatch, have 0xfc0baabe expect 0x7f9cb681
ERROR: extent 83519741952 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 121936633856 mirror 1 csum mismatch, have 0x507016a5 expect 0x50609afe
ERROR: extent 121858334720 len 134217728 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 144872591360 mirror 1 csum mismatch, have 0x33964d73 expect 0xf9937032
ERROR: extent 144822386688 len 61231104 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 167961075712 mirror 1 csum mismatch, have 0xf43bd0e3 expect 0x5be589bb
ERROR: extent 167950999552 len 27537408 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 175643619328 mirror 1 csum mismatch, have 0x1e168ca1 expect 0xd413b1e0
ERROR: data at bytenr 175643754496 mirror 1 csum mismatch, have 0x6cfdc8ae expect 0xa6f8f5ef
ERROR: extent 175640539136 len 6381568 CORRUPTED, all mirror(s) corrupted, can't be repaired
ERROR: data at bytenr 183316750336 mirror 1 csum mismatch, have 0x145bdf76 expect 0x7390565e
... and the list goes on.

Questions:
1. Using "find /mnt -inum 4708" I can link the dmesg message to a specific
   file. Is there a way to link the --offline ERRORs above to an inode?
2. How could "btrfs device stats /mnt" and a normal full scrub fail to detect
   the csum errors?
3. Do these errors appear to be a hardware failure (despite pristine SMART),
   user error on volume creation/mounting, or an actual btrfs issue?

I feel that the need for question #1 indicates a problem with btrfs
regardless of whether there is a real hardware failure or not.

Next I will try an online scrub of only the sdn device, as before I was
running the full filesystem scrub.

On Tue, Oct 24, 2017 at 12:52 AM, Lakshmipathi.G wrote:
>> Does anyone know why scrub did not catch these errors that show up in dmesg?
>
> Can you try offline scrub from this repo
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub and see
> whether it
> detects the issue? "btrfs scrub start --offline "
>
>
>
> Cheers,
> Lakshmipathi.G
> http://www.giis.co.in http://www.webminal.org
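On question 1, one possible route is "btrfs inspect-internal logical-resolve", which maps a logical address to file paths on a mounted filesystem. The sketch below only prepares those commands from the scrub output. It rests on an unverified assumption: that the "bytenr" values printed by the out-of-tree offline scrub are logical addresses. resolve_cmds_from_scrub is a hypothetical helper name, and the mount point is whatever path you pass in.

```shell
#!/bin/bash
# Sketch: read offline-scrub ERROR lines on stdin and print one
# "logical-resolve" command per mismatching data block.
# Assumption (unverified): the reported bytenr is a logical address.
# resolve_cmds_from_scrub is a hypothetical helper name.
resolve_cmds_from_scrub() {
    local mnt=$1    # mount point of the filesystem, e.g. /mnt
    # Keep only the "data at bytenr N ... csum mismatch" lines, extract N,
    # sort numerically and drop duplicates. Nothing is executed here.
    sed -n 's/.*data at bytenr \([0-9]*\) mirror [0-9]* csum mismatch.*/\1/p' |
        sort -n -u |
        while read -r bytenr; do
            printf 'btrfs inspect-internal logical-resolve %s %s\n' "$bytenr" "$mnt"
        done
}
```

For example, save the scrub output to a file and run `resolve_cmds_from_scrub /mnt < scrub.log`; the printed commands need root and a mounted filesystem when they are actually run.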
[PATCH v3] Btrfs: free btrfs_device in place
It's pointless to defer it to a kthread helper as we're not under a
special context.

For reference, commit 1f78160ce1b1 ("Btrfs: using rcu lock in the
reader side of devices list") introduced RCU freeing for device
structures.

Signed-off-by: Liu Bo
Reviewed-by: Anand Jain
---
v3: - Enhance changelog with commit id which introduced this for future
      reference.
    - Now we can remove %rcu_work.

v2: - Clarify the lifetime of device and device->bdev respectively and
      clear the concern about raising the 'device is in use' problem.

 fs/btrfs/volumes.c | 14 ++------------
 fs/btrfs/volumes.h |  1 -
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d983cea..4a72c45 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -836,26 +836,16 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 	mutex_unlock(&uuid_mutex);
 }
 
-static void __free_device(struct work_struct *work)
+static void free_device(struct rcu_head *head)
 {
 	struct btrfs_device *device;
 
-	device = container_of(work, struct btrfs_device, rcu_work);
+	device = container_of(head, struct btrfs_device, rcu);
 	rcu_string_free(device->name);
 	bio_put(device->flush_bio);
 	kfree(device);
 }
 
-static void free_device(struct rcu_head *head)
-{
-	struct btrfs_device *device;
-
-	device = container_of(head, struct btrfs_device, rcu);
-
-	INIT_WORK(&device->rcu_work, __free_device);
-	schedule_work(&device->rcu_work);
-}
-
 static void btrfs_close_bdev(struct btrfs_device *device)
 {
 	if (device->bdev && device->writeable) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6108fdf..f60c535 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -133,7 +133,6 @@ struct btrfs_device {
 	struct btrfs_work work;
 	struct rcu_head rcu;
-	struct work_struct rcu_work;
 
 	/* readahead state */
 	spinlock_t reada_lock;
--
2.9.4
[PATCH v2] Btrfs: add write_flags for compression bio
Compression code path has only flagged bios with REQ_OP_WRITE no matter
where the bios come from, but it could be a sync write if fsync starts
this writeback or a normal writeback write if wb kthread starts a
periodic writeback.

It breaks the rule that sync writes and writeback writes need to be
differentiated from each other, because from the POV of block layer,
all bios need to be recognized by these flags in order to do some
management, e.g. throttling.

This passes writeback_control to compression write path so that it can
send bios with proper flags to block layer.

Signed-off-by: Liu Bo
---
v2: Enhance changelog with more background details.

 fs/btrfs/compression.c |  7 ++++---
 fs/btrfs/compression.h |  3 ++-
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/extent_io.h   |  3 ++-
 fs/btrfs/inode.c       | 15 +++++++++++----
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 280384b..3dae2f5 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -292,7 +292,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 				 unsigned long len, u64 disk_start,
 				 unsigned long compressed_len,
 				 struct page **compressed_pages,
-				 unsigned long nr_pages)
+				 unsigned long nr_pages,
+				 unsigned int write_flags)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct bio *bio = NULL;
@@ -324,7 +325,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	bdev = fs_info->fs_devices->latest_bdev;
 
 	bio = btrfs_bio_alloc(bdev, first_byte);
-	bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+	bio->bi_opf = REQ_OP_WRITE | write_flags;
 	bio->bi_private = cb;
 	bio->bi_end_io = end_compressed_bio_write;
 	refcount_set(&cb->pending_bios, 1);
@@ -371,7 +372,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio_put(bio);
 
 			bio = btrfs_bio_alloc(bdev, first_byte);
-			bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
 			bio_add_page(bio, page, PAGE_SIZE, 0);
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d2781ff..dc45b94 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -91,7 +91,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 				  unsigned long len, u64 disk_start,
 				  unsigned long compressed_len,
 				  struct page **compressed_pages,
-				  unsigned long nr_pages);
+				  unsigned long nr_pages,
+				  unsigned int write_flags);
 blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3e5bb0c..ea64ad0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3252,7 +3252,7 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
 					       delalloc_start,
 					       delalloc_end,
 					       &page_started,
-					       nr_written);
+					       nr_written, wbc);
 		/* File system has been set read-only */
 		if (ret) {
 			SetPageError(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index faffa28..a92fd98 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -116,7 +116,8 @@ struct extent_io_ops {
 	 */
 	int (*fill_delalloc)(void *private_data, struct page *locked_page,
 			     u64 start, u64 end, int *page_started,
-			     unsigned long *nr_written);
+			     unsigned long *nr_written,
+			     struct writeback_control *wbc);
 
 	int (*writepage_start_hook)(struct page *page, u64 start, u64 end);
 	void (*writepage_end_io_hook)(struct page *page, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 128f3e5..ee67773 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -367,6 +367,7 @@ struct async_cow {
 	struct page *locked_page;
 	u64 start;
 	u64 end;
+	unsigned int write_flags;
 	struct list_head extents;
 	struct btrfs_work work;
 };
@@ -846,7 +8