On 2017-12-21 15:56, Adam Bahe wrote:
> Alright, I have rebuilt kernel 4.14.8 and added the line of code you
> gave me. The kernel is installed and I have a full balance running.
> Right off the bat one thing I noticed is that the last time I ran a
> full balance, balance status showed something like "14 out of about
> 200 chunks balanced". I thought it was interesting that it was only
> trying to balance 200 chunks. With your change, a full balance status
> right now shows "34 out of 9770 chunks balanced". It has been running
> for about 10 minutes now. But sometimes it took a while to cause the
> filesystem to go read only. So we shall wait and see. A full balance
> across 21 devices and 120TB raw or thereabouts will take some time.

Well, if ENOSPC happens, there should be a kernel message along with
the line I added. It's recommended to monitor dmesg too.

Thanks,
Qu

>
> On Wed, Dec 20, 2017 at 4:13 PM, Adam Bahe <adamb...@gmail.com> wrote:
>> Yeah I had a hunch that it was something to do with the 2TB disks.
>> I've been slowly trying to replace them. But they're the remnants of
>> my old storage system so it has been slow going. When I get some time
>> I will try and compile the kernel with your patch. Thanks!
>>
>> On Wed, Dec 20, 2017 at 12:20 AM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>> Also, if you're OK to compile a kernel, would you please try the following
>>> diff to help us further enhance the btrfs chunk allocator?
>>>
>>> The chunk allocator itself is designed to handle your case, so it should
>>> pick up the remaining devices and allocate a new RAID10 chunk with the
>>> unallocated space.
>>>
>>> But in practice that is not what happens.
>>> I'm wondering if it's the devs_increment and sub_stripes calculation
>>> causing the problem.
>>>
>>> If you could help testing btrfs with the diff applied, dmesg would
>>> contain extra info when you try to do device remove.
>>>
>>> Thanks,
>>> Qu
>>>
>>> ================
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 49810b70afd3..851ff13f5c29 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -4732,6 +4732,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>>
>>>         if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
>>>                 ret = -ENOSPC;
>>> +               pr_info("ndevs=%d dev_increment=%d sub_stripes=%d devs_min=%d\n",
>>> +                       ndevs, devs_increment, sub_stripes, devs_min);
>>>                 goto error;
>>>         }
>>>
>>>
>>> On 2017-12-20 14:11, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2017-12-20 13:00, Adam Bahe wrote:
>>>>> I'm using raid10.
>>>>
>>>> Pretty much the same.
>>>>
>>>> RAID10 is RAID1 first, then RAID0.
>>>>
>>>> For your 20 disks (well, quite amazing) layout, btrfs will try to
>>>> allocate using all disks for RAID10.
>>>>
>>>> And unfortunately, devid 7, 9, 12, 13, 14, 15 are already full.
>>>>
>>>> Normally btrfs should exclude them in chunk allocation, but I think
>>>> there is some small unaligned unallocated space making btrfs choose
>>>> them for allocation.
>>>>
>>>> As a result, no new chunk can be allocated.
>>>>
>>>> And considering the size of your fs, the common method of adding a
>>>> temporary small USB disk to allow the convert doesn't work in your case.
>>>>
>>>>
>>>> You could try converting the profile from RAID10 to RAID1, which will
>>>> always use 2 disks to allocate space, so it would be OK to allocate new
>>>> chunks to start your convert.
>>>>
>>>> And a final suggestion: don't use any striped profile with so many
>>>> uneven disks.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>
>>>>> On Tue, Dec 19, 2017 at 10:51 PM, Qu Wenruo <quwenruo.bt...@gmx.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 2017-12-20 10:51, Adam Bahe wrote:
>>>>>>> Forgot to add, I should have plenty of space:
>>>>>>>
>>>>>>> Label: 'nas' uuid: 4fcd5725-b6c6-4d8a-9860-f2fc5474cbcb
>>>>>>> Total devices 20 FS bytes used 26.69TiB
>>>>>>> devid 1 size 3.64TiB used 3.28TiB path /dev/sdm
>>>>>>> devid 2 size 3.64TiB used 3.28TiB path /dev/sde
>>>>>>> devid 3 size 7.28TiB used 3.45TiB path /dev/sdt
>>>>>>> devid 4 size 9.03TiB used 2.51TiB path /dev/sdo
>>>>>>> devid 5 size 7.28TiB used 3.45TiB path /dev/sdi
>>>>>>> devid 6 size 7.28TiB used 3.45TiB path /dev/sdd
>>>>>>> devid 7 size 1.82TiB used 1.82TiB path /dev/sdp
>>>>>>> devid 9 size 1.82TiB used 1.82TiB path /dev/sdw
>>>>>>> devid 10 size 1.82TiB used 1.82TiB path /dev/sdk
>>>>>>> devid 11 size 9.03TiB used 1.82TiB path /dev/sdy
>>>>>>> devid 12 size 1.82TiB used 1.82TiB path /dev/sdg
>>>>>>> devid 13 size 1.82TiB used 1.82TiB path /dev/sdl
>>>>>>> devid 14 size 1.82TiB used 1.82TiB path /dev/sdr
>>>>>>> devid 15 size 1.82TiB used 1.82TiB path /dev/sdf
>>>>>>> devid 16 size 5.46TiB used 3.45TiB path /dev/sds
>>>>>>> devid 17 size 9.10TiB used 3.45TiB path /dev/sdn
>>>>>>> devid 18 size 9.10TiB used 3.45TiB path /dev/sdh
>>>>>>> devid 19 size 9.10TiB used 3.45TiB path /dev/sdc
>>>>>>> devid 20 size 9.10TiB used 3.45TiB path /dev/sdu
>>>>>>> devid 21 size 3.64TiB used 2.19TiB path /dev/sdj
>>>>>>
>>>>>> Depends on your profile.
>>>>>>
>>>>>> If using RAID0/5/6, the extra unallocated space means nothing if the
>>>>>> smallest disk is used up.
>>>>>>
>>>>>> Thanks,
>>>>>> Qu
>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 19, 2017 at 8:47 PM, Adam Bahe <adamb...@gmail.com> wrote:
>>>>>>>> I have been having ENOSPC errors on any btrfs device delete, btrfs
>>>>>>>> balance, btrfs device add actions for a while now. How do I fix this? I
>>>>>>>> need to be able to remove devices and balance my filesystem again.
>>>>>>>>
>>>>>>>> [Tue Dec 19 15:25:26 2017] BTRFS info (device sdc): relocating block
>>>>>>>> group 190774812082176 flags system|raid10
>>>>>>>> [Tue Dec 19 15:25:26 2017] BTRFS: Transaction aborted (error -27)
>>>>>>>> [Tue Dec 19 15:25:26 2017] ------------[ cut here ]------------
>>>>>>>> [Tue Dec 19 15:25:26 2017] WARNING: CPU: 1 PID: 17036 at
>>>>>>>> fs/btrfs/extent-tree.c:10151
>>>>>>>> btrfs_create_pending_block_groups+0x250/0x260 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] Modules linked in: dm_mod dax rpcrdma
>>>>>>>> ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi
>>>>>>>> ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm
>>>>>>>> ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core
>>>>>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
>>>>>>>> crypto_simd glue_helper cryptd ext4 intel_cstate jbd2 intel_rapl_perf
>>>>>>>> mbcache iTCO_wdt iTCO_vendor_support ses mei_me lpc_ich pcspkr joydev
>>>>>>>> input_leds i2c_i801 mfd_core mei enclosure ioatdma sg wmi shpchp
>>>>>>>> ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter nfsd
>>>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc ip_tables btrfs xor raid6_pq
>>>>>>>> mlx4_en sd_mod crc32c_intel ast i2c_algo_bit mlx4_core drm_kms_helper
>>>>>>>> ata_generic syscopyarea
>>>>>>>> [Tue Dec 19 15:25:26 2017] pata_acpi sysfillrect sysimgblt
>>>>>>>> fb_sys_fops ttm drm ata_piix ixgbe mdio ptp mpt3sas pps_core libata
>>>>>>>> raid_class dca scsi_transport_sas 8021q garp mrp
>>>>>>>> [Tue Dec 19 15:25:26 2017] CPU: 1 PID: 17036 Comm: btrfs Not tainted
>>>>>>>> 4.14.2-1.x86_64 #1
>>>>>>>> [Tue Dec 19 15:25:26 2017] Hardware name: Supermicro Super
>>>>>>>> Server/X10DRi-T4+, BIOS 2.0 12/17/2015
>>>>>>>> [Tue Dec 19 15:25:26 2017] task: ffff8804620416c0 task.stack:
>>>>>>>> ffffc90026f64000
>>>>>>>> [Tue Dec 19 15:25:26 2017] RIP:
>>>>>>>> 0010:btrfs_create_pending_block_groups+0x250/0x260 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] RSP: 0018:ffffc90026f679e8 EFLAGS: 00010246
>>>>>>>> [Tue Dec 19 15:25:26 2017] RAX: 0000000000000026 RBX: 00000000ffffffe5
>>>>>>>> RCX: 0000000000000000
>>>>>>>> [Tue Dec 19 15:25:26 2017] RDX: 0000000000000000 RSI: ffff88046f84e108
>>>>>>>> RDI: ffff88046f84e108
>>>>>>>> [Tue Dec 19 15:25:26 2017] RBP: ffffc90026f67a68 R08: 0000000000000000
>>>>>>>> R09: 0000000000003611
>>>>>>>> [Tue Dec 19 15:25:26 2017] R10: 0000000000000004 R11: 0000000000003610
>>>>>>>> R12: ffff8804679c38e8
>>>>>>>> [Tue Dec 19 15:25:26 2017] R13: ffff880108578000 R14: ffff8804679c3828
>>>>>>>> R15: ffff880108578128
>>>>>>>> [Tue Dec 19 15:25:26 2017] FS: 00007f46ad2e48c0(0000)
>>>>>>>> GS:ffff88046f840000(0000) knlGS:0000000000000000
>>>>>>>> [Tue Dec 19 15:25:26 2017] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>> 0000000080050033
>>>>>>>> [Tue Dec 19 15:25:26 2017] CR2: 00007f2daae8a000 CR3: 000000048be96000
>>>>>>>> CR4: 00000000001406e0
>>>>>>>> [Tue Dec 19 15:25:26 2017] DR0: 0000000000000000 DR1: 0000000000000000
>>>>>>>> DR2: 0000000000000000
>>>>>>>> [Tue Dec 19 15:25:26 2017] DR3: 0000000000000000 DR6: 00000000fffe0ff0
>>>>>>>> DR7: 0000000000000400
>>>>>>>> [Tue Dec 19 15:25:26 2017] Call Trace:
>>>>>>>> [Tue Dec 19 15:25:26 2017] do_chunk_alloc+0x278/0x2f0 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_force_chunk_alloc+0x30/0x40 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] relocate_block_group+0xd2/0x610 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_relocate_block_group+0x187/0x240
>>>>>>>> [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_relocate_chunk+0x3b/0xc0 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] __btrfs_balance+0x854/0xc20 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_balance+0x2f4/0x670 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_ioctl_balance+0x439/0x560 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] btrfs_ioctl+0xf03/0x20f0 [btrfs]
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? tty_write_unlock+0x31/0x40
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? tty_ldisc_deref+0x16/0x20
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? tty_write+0x1e4/0x2c0
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? process_echoes+0x70/0x70
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? __vfs_write+0x37/0x140
>>>>>>>> [Tue Dec 19 15:25:26 2017] do_vfs_ioctl+0xa7/0x5f0
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? getnstimeofday64+0xe/0x20
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? __audit_syscall_entry+0xb3/0xf0
>>>>>>>> [Tue Dec 19 15:25:26 2017] ? syscall_trace_enter+0x1d0/0x2b0
>>>>>>>> [Tue Dec 19 15:25:26 2017] SyS_ioctl+0x79/0x90
>>>>>>>> [Tue Dec 19 15:25:26 2017] do_syscall_64+0x67/0x150
>>>>>>>> [Tue Dec 19 15:25:26 2017] entry_SYSCALL64_slow_path+0x25/0x25
>>>>>>>> [Tue Dec 19 15:25:26 2017] RIP: 0033:0x7f46ac36b537
>>>>>>>> [Tue Dec 19 15:25:26 2017] RSP: 002b:00007ffcceffa448 EFLAGS: 00000206
>>>>>>>> ORIG_RAX: 0000000000000010
>>>>>>>> [Tue Dec 19 15:25:26 2017] RAX: ffffffffffffffda RBX: 0000000000000003
>>>>>>>> RCX: 00007f46ac36b537
>>>>>>>> [Tue Dec 19 15:25:26 2017] RDX: 00007ffcceffa4e0 RSI: 00000000c4009420
>>>>>>>> RDI: 0000000000000003
>>>>>>>> [Tue Dec 19 15:25:26 2017] RBP: 00007ffcceffa4e0 R08: 00007f46ac639a00
>>>>>>>> R09: 00007ffcceffa1c0
>>>>>>>> [Tue Dec 19 15:25:26 2017] R10: 00007ffcceffa1d0 R11: 0000000000000206
>>>>>>>> R12: 00007f46ac638870
>>>>>>>> [Tue Dec 19 15:25:26 2017] R13: 00007ffcceffc893 R14: 0000000000000000
>>>>>>>> R15: 0000000000000000
>>>>>>>> [Tue Dec 19 15:25:26 2017] Code: 79 ff ff ff 49 8b 7c 24 60 89 da 48
>>>>>>>> c7 c6 a8 55 57 a0 31 c0 e8 52 50 fe ff eb 9b 89 de 48 c7 c7 78 55 57
>>>>>>>> a0 31 c0 e8 be 8a cd e0 <0f> ff eb 87 66 90 66 2e 0f 1f 84 00 00 00 00
>>>>>>>> 00 0f 1f 44 00 00
>>>>>>>> [Tue Dec 19 15:25:26 2017] ---[ end trace 7b033857aa29250b ]---
>>>>>>>> [Tue Dec 19 15:25:26 2017] BTRFS: error (device sdc) in
>>>>>>>> btrfs_create_pending_block_groups:10151: errno=-27 unknown
>>>>>>>> [Tue Dec 19 15:25:26 2017] BTRFS info (device sdc): forced readonly
>>>>>>>> [Tue Dec 19 15:25:26 2017] BTRFS info (device sdc): 3388 enospc errors
>>>>>>>> during balance
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>> the body of a message to majord...@vger.kernel.org
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
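
As a rough cross-check of the numbers discussed in this thread, the sketch below parses the devid lines from the listing quoted above, counts the devices that still show unallocated space, and evaluates the same condition the diff instruments. The RAID10 parameters (devs_increment=2, sub_stripes=2, devs_min=4) are an assumption about what the 4.14 btrfs_raid_array contains and should be verified against the tree; the helper names are hypothetical and not part of btrfs-progs. Because the listing is rounded to 0.01TiB, it cannot show the small unaligned remainders Qu suspects.

```python
import re

# Device lines copied from the `btrfs filesystem show` output quoted above.
FI_SHOW = """\
devid 1 size 3.64TiB used 3.28TiB path /dev/sdm
devid 2 size 3.64TiB used 3.28TiB path /dev/sde
devid 3 size 7.28TiB used 3.45TiB path /dev/sdt
devid 4 size 9.03TiB used 2.51TiB path /dev/sdo
devid 5 size 7.28TiB used 3.45TiB path /dev/sdi
devid 6 size 7.28TiB used 3.45TiB path /dev/sdd
devid 7 size 1.82TiB used 1.82TiB path /dev/sdp
devid 9 size 1.82TiB used 1.82TiB path /dev/sdw
devid 10 size 1.82TiB used 1.82TiB path /dev/sdk
devid 11 size 9.03TiB used 1.82TiB path /dev/sdy
devid 12 size 1.82TiB used 1.82TiB path /dev/sdg
devid 13 size 1.82TiB used 1.82TiB path /dev/sdl
devid 14 size 1.82TiB used 1.82TiB path /dev/sdr
devid 15 size 1.82TiB used 1.82TiB path /dev/sdf
devid 16 size 5.46TiB used 3.45TiB path /dev/sds
devid 17 size 9.10TiB used 3.45TiB path /dev/sdn
devid 18 size 9.10TiB used 3.45TiB path /dev/sdh
devid 19 size 9.10TiB used 3.45TiB path /dev/sdc
devid 20 size 9.10TiB used 3.45TiB path /dev/sdu
devid 21 size 3.64TiB used 2.19TiB path /dev/sdj
"""

DEV_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)


def parse_devices(text):
    """Return a list of (devid, size_tib, used_tib, path) tuples."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)), m.group(4))
        for m in DEV_RE.finditer(text)
    ]


def full_devices(devs, eps=0.005):
    """Devids whose rounded size and used figures leave no unallocated space."""
    return [devid for devid, size, used, _path in devs if size - used < eps]


def alloc_check_fails(ndevs, devs_increment, sub_stripes, devs_min):
    """Same condition as in the diff; True corresponds to ret = -ENOSPC."""
    return ndevs < devs_increment * sub_stripes or ndevs < devs_min


devs = parse_devices(FI_SHOW)
full = full_devices(devs)
ndevs = len(devs) - len(full)  # devices that still show unallocated space

print("devices:", len(devs))
print("full devids:", full)
print("devices with unallocated space:", ndevs)
# Assumed RAID10 parameters; verify against btrfs_raid_array in your tree.
print("check fails:", alloc_check_fails(ndevs, devs_increment=2,
                                        sub_stripes=2, devs_min=4))
```

Going by the rounded figures, 13 of the 20 devices still show unallocated space, so this condition on its own would not be expected to return -ENOSPC. That fits Qu's remark that the allocator is designed to handle this layout. Devid 10 also reads as full here, in addition to the devids Qu lists.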