Re: corrupt leaf, bad key order on kernel 5.0
05.04.19 22:32, Hugo Mills wrote:
>> Yet another corruption of my root BTRFS filesystem happened today.
>> Didn't bother to run scrub, balance or check, just created a disk image
>> for future investigation and restored everything from backup.
>>
>> Here is what the corruption looks like:
>> [  274.241339] BTRFS info (device dm-0): disk space caching is enabled
>> [  274.241344] BTRFS info (device dm-0): has skinny extents
>> [  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
>> [  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268
>> block=42044719104 slot=123, bad key order, prev (1240717 108 41447424)
>> current (1240717 76 41451520)
>
> "Bad key order" is usually an indicator of faulty RAM -- a piece of
> metadata gets loaded into RAM for modification, a bit gets flipped in
> it (because the bit is stuck on one value), and then the csum is
> computed for the page (including the faulty bit) and written out to
> disk. In this case it's not obvious, but I'd suggest that the second
> field of the key has been flipped, as 108 is 0x6c and 76 is 0x4c --
> one bit away from each other.
>
> I recommend you check your hardware thoroughly before attempting to
> rebuild the FS.
>
> Hugo.

Hm... this might indeed be related to RAM being overclocked a bit too much. It worked fine for a long time, but apparently it is not 100% stable. I've rolled back the overclock; thanks for the suggestion!

Sincerely,
Nazar Mokrynskyi
github.com/nazar-pc
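Hugo's single-bit observation can be checked mechanically; a minimal sketch (plain Python, values taken from the log above):

```python
# Quick check of the single-bit-flip hypothesis: the key type changed
# from 108 (0x6c) to 76 (0x4c). If exactly one bit differs, the XOR of
# the two values has popcount 1.
prev_type, curr_type = 108, 76

diff = prev_type ^ curr_type
assert bin(diff).count("1") == 1  # exactly one flipped bit
assert diff == 0x20               # it is bit 5 that changed
print(f"{prev_type:#04x} -> {curr_type:#04x}, flipped bit mask {diff:#04x}")
```

Incidentally, 108 is a valid btrfs key type (EXTENT_DATA), which fits a flip from good metadata to a nonsensical key rather than the other way around.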
corrupt leaf, bad key order on kernel 5.0
NOTE: I do not need help with recovery; I have fully automated snapshot, backup and restoration mechanisms. The only purpose of this email is to help developers find the reason for yet another filesystem corruption and hopefully fix it.

Yet another corruption of my root BTRFS filesystem happened today. I didn't bother to run scrub, balance or check, just created a disk image for future investigation and restored everything from backup.

Here is what the corruption looks like:

[  274.241339] BTRFS info (device dm-0): disk space caching is enabled
[  274.241344] BTRFS info (device dm-0): has skinny extents
[  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
[  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)
[  310.449304] BTRFS critical (device dm-0): corrupt leaf: root=268 block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current (1240717 76 41451520)
[  310.449309] BTRFS: error (device dm-0) in btrfs_drop_snapshot:9250: errno=-5 IO failure
[  310.449311] BTRFS info (device dm-0): forced readonly
[  311.266789] BTRFS info (device dm-0): delayed_refs has NO entry
[  311.277088] BTRFS error (device dm-0): cleaner transaction attach returned -30

My system just froze when I was not looking at it, and this is the state it is in now. The filesystem survived from March 8th until April 5th, one of the fastest corruptions in my experience. It looks like this happened while sending an incremental snapshot to the other BTRFS filesystem, since the last snapshot on that one was not read-only, as it should have been otherwise.

I'm on Ubuntu 19.04 with Linux kernel 5.0.5 and btrfs-progs v4.20.2. My filesystem is on top of LUKS on an NVMe SSD (SM961). I have 3 snapshots created every 15 minutes from 3 subvolumes, with rotation of old snapshots (there can be from tens to hundreds of snapshots at any time).
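For context on the "bad key order" message: btrfs sorts leaf items by key (objectid, item type, offset), compared field by field, which Python tuple comparison models directly. A minimal sketch with the values from the log above (the helper name is made up for illustration):

```python
# Btrfs leaf items must be sorted by (objectid, item type, offset),
# compared lexicographically; Python tuple comparison models this.
# The values below are the ones from the "corrupt leaf" message above.
prev = (1240717, 108, 41447424)  # key in the preceding slot
curr = (1240717, 76, 41451520)   # key in slot 123

def keys_in_order(a, b):
    """True if key a may legally precede key b in a leaf."""
    return a < b

# The tree checker rejects this leaf because the previous key sorts
# *after* the current one (type 108 > type 76):
assert not keys_in_order(prev, curr)
```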
Mount options: compress=lzo,noatime,ssd

I have a full disk image of the corrupted filesystem and will create Qcow2 snapshots of it, so if you want me to run any experiments, including potentially destructive ones and usage of custom patches to btrfs-progs to find out the reason for the corruption, I would be happy to help as much as I can.

P.S. I'm riding the latest stable and rc kernels all the time, and during the last 6 months I've had about as many corruptions of different BTRFS filesystems as during the 3 years before that; really worrying if you ask me.

--
Sincerely,
Nazar Mokrynskyi
github.com/nazar-pc
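As a rough illustration of the snapshot rotation described above (snapshots every 15 minutes, old ones pruned): the logic boils down to keeping the newest N names per subvolume. The naming scheme and keep-count below are assumptions for illustration, not the poster's actual tooling:

```python
# Minimal sketch of time-based snapshot rotation, as described above.
# Snapshot names and the keep-count are invented for illustration.
def rotate(snapshots, keep=8):
    """Given snapshot names whose sort order is chronological,
    return (kept, to_delete), oldest-first in both lists."""
    snapshots = sorted(snapshots)
    return snapshots[-keep:], snapshots[:-keep]

snaps = [f"root-2019-04-05_{h:02d}:{m:02d}"
         for h in range(2) for m in (0, 15, 30, 45)]  # 8 snapshots
kept, doomed = rotate(snaps, keep=4)
assert len(kept) == 4 and kept[-1] == "root-2019-04-05_01:45"
```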
Another btrfs corruption (unable to find ref byte, kernel 5.0-rc4)
NOTE: I don't need assistance with data recovery; everything was restored shortly from fully automated backups. I only hope this information is useful for developers in some way.

So my primary BTRFS filesystem corrupted itself again.

Software:
Kernel 5.0-rc4
btrfs-progs v4.20.1
Ubuntu 19.04 (development branch)

It is running on an NVMe SSD on top of full-disk LUKS with the BFQ scheduler.
Mount options (multiple subvolumes like this): compress=lzo,noatime,ssd,subvol=/root
Also I'm using automated snapshots a lot, and there are a few directories with CoW disabled, if that matters.
The filesystem was created with an Ubuntu 18.10 Live USB (kernel 4.18.0-10-generic, btrfs-progs v4.16.1).

It corrupted itself and remounted in read-only state, so I had to hard reset it and run scrub from the Ubuntu 18.10 Live USB:

> [   49.674792] Btrfs loaded, crc32c=crc32c-intel
> [   49.679962] BTRFS: device fsid 5170aca4-061a-4c6c-ab00-bd7fc8ae6030 devid 1 transid 178701 /dev/dm-0
> [   52.199834] BTRFS info (device dm-0): disk space caching is enabled
> [   52.199839] BTRFS info (device dm-0): has skinny extents
> [   52.239346] BTRFS info (device dm-0): enabling ssd optimizations
> [   82.909833] WARNING: CPU: 14 PID: 6082 at fs/btrfs/extent-tree.c:6944 __btrfs_free_extent.isra.72+0x751/0xac0 [btrfs]
> [   82.909834] Modules linked in: btrfs zstd_compress libcrc32c xor raid6_pq dm_crypt algif_skcipher af_alg intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_intel kvm snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm irqbypass crct10dif_pclmul snd_seq_midi snd_seq_midi_event crc32_pclmul snd_rawmidi ghash_clmulni_intel uvcvideo pcbc snd_seq videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq_device snd_timer aesni_intel videodev snd aes_x86_64 crypto_simd cdc_acm soundcore media input_leds joydev cryptd mei_me glue_helper intel_wmi_thunderbolt intel_cstate mei intel_rapl_perf acpi_pad mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror dm_region_hash dm_log
> [   82.909854] hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 amdgpu nouveau chash gpu_sched ttm mxm_wmi drm_kms_helper syscopyarea sysfillrect sysimgblt nvme fb_sys_fops igb e1000e atlantic drm ahci dca i2c_algo_bit nvme_core libahci wmi video
> [   82.909864] CPU: 14 PID: 6082 Comm: btrfs-cleaner Not tainted 4.18.0-10-generic #11-Ubuntu
> [   82.909864] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Professional Gaming i7, BIOS P3.40 11/08/2018
> [   82.909871] RIP: 0010:__btrfs_free_extent.isra.72+0x751/0xac0 [btrfs]
> [   82.909871] Code: ff 75 18 ff 75 10 e8 7e a6 ff ff c6 85 6c ff ff ff 00 41 89 c5 58 5a 45 85 ed 0f 84 b9 f9 ff ff 41 83 fd fe 0f 85 6e fc ff ff <0f> 0b 49 8b 3c 24 e8 44 c8 00 00 ff 75 18 4c 8b 4d 10 4d 89 f8 48
> [   82.909885] RSP: 0018:a71bc6833bf8 EFLAGS: 00010246
> [   82.909885] RAX: 000c81bc RBX: 000f3e55e000 RCX:
> [   82.909886] RDX: 0017b000 RSI: RDI: 89cfd82d4f50
> [   82.909886] RBP: a71bc6833ca0 R08: a71bc6833b14 R09:
> [   82.909887] R10: 0102 R11: R12: 89cfd8338af0
> [   82.909887] R13: fffe R14: 000e8e9f8000 R15: 010b
> [   82.909888] FS: () GS:89d09e58() knlGS:
> [   82.909888] CS: 0010 DS: ES: CR0: 80050033
> [   82.909889] CR2: 564398e70760 CR3: 00076ce0a002 CR4: 003606e0
> [   82.909889] DR0: DR1: DR2:
> [   82.909890] DR3: DR6: fffe0ff0 DR7: 0400
> [   82.909890] Call Trace:
> [   82.909898] __btrfs_run_delayed_refs+0x20e/0x1010 [btrfs]
> [   82.909904] ? add_pinned_bytes+0x67/0x70 [btrfs]
> [   82.909909] ? btrfs_free_tree_block+0x167/0x2d0 [btrfs]
> [   82.909916] btrfs_run_delayed_refs+0x80/0x190 [btrfs]
> [   82.909923] btrfs_should_end_transaction+0x47/0x60 [btrfs]
> [   82.909928] btrfs_drop_snapshot+0x3d1/0x800 [btrfs]
> [   82.909936] btrfs_clean_one_deleted_snapshot+0xbb/0xf0 [btrfs]
> [   82.909942] cleaner_kthread+0x136/0x160 [btrfs]
> [   82.909944] kthread+0x120/0x140
> [   82.909950] ? btree_submit_bio_start+0x20/0x20 [btrfs]
> [   82.909951] ? kthread_bind+0x40/0x40
> [   82.909953] ret_from_fork+0x35/0x40
> [   82.909953] ---[ end trace 32964933c87d1d27 ]---
> [   82.909955] BTRFS info (device dm-0): leaf 656310272 gen 178703 total ptrs 117 free space 4483 owner 2
> [   82.909956]     item 0 key (65470308352 168 4096) itemoff 16233 itemsize 50
> [   82.909956]         extent refs 2 gen 170784 flags 1
> [   82.909957]         ref#0: shared data backref parent 62803951616 count 1
> [   82.909957]         ref#1: shared data backre
Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)
05.01.19 03:18, Qu Wenruo wrote:
> Please don't mount the fs RW, and copy your data out.

I have regular automated backups and wrote the initial message from the restored system already; that lesson was learned a long time ago. Next time I'll be smarter and will make a partition image prior to doing any operations on it; maybe `balance` did something to it.

It is quite unfortunate that the information provided is not useful this time. If there are any other ideas for tests to run, I'm willing to help.
Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)
04.01.19 03:15, Chris Murphy wrote:
> What do you get with 'btrfs check --mode=lowmem'? This is a different
> implementation of check, and might reveal some additional information
> useful to developers. It is wickedly slow, however.

root@nazarpc-Standard-PC-Q35-ICH9-2009:~# btrfs check --mode=lowmem /dev/vdb
warning, bad space info total_bytes 2155872256 used 2155876352
warning, bad space info total_bytes 3229614080 used 3229618176
warning, bad space info total_bytes 4303355904 used 430336
warning, bad space info total_bytes 5377097728 used 5377101824
warning, bad space info total_bytes 6450839552 used 6450843648
warning, bad space info total_bytes 7524581376 used 7524585472
warning, bad space info total_bytes 8598323200 used 8598327296
warning, bad space info total_bytes 9672065024 used 9672069120
warning, bad space info total_bytes 10745806848 used 10745810944
warning, bad space info total_bytes 11819548672 used 11819552768
warning, bad space info total_bytes 12893290496 used 12893294592
warning, bad space info total_bytes 13967032320 used 13967036416
warning, bad space info total_bytes 15040774144 used 15040778240
warning, bad space info total_bytes 16114515968 used 16114520064
warning, bad space info total_bytes 17188257792 used 17188261888
warning, bad space info total_bytes 18261999616 used 18262003712
warning, bad space info total_bytes 19335741440 used 19335745536
warning, bad space info total_bytes 20409483264 used 20409487360
warning, bad space info total_bytes 21483225088 used 21483229184
warning, bad space info total_bytes 22556966912 used 22556971008
warning, bad space info total_bytes 23630708736 used 23630712832
warning, bad space info total_bytes 24704450560 used 24704454656
warning, bad space info total_bytes 25778192384 used 25778196480
warning, bad space info total_bytes 26851934208 used 26851938304
warning, bad space info total_bytes 27925676032 used 27925680128
warning, bad space info total_bytes 28999417856 used 28999421952
warning, bad space info total_bytes 30073159680 used 30073163776
warning, bad space info total_bytes 31146901504 used 31146905600
warning, bad space info total_bytes 32220643328 used 32220647424
Checking filesystem on /dev/vdb
UUID: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
checking extents
checking free space cache
checking fs roots
ERROR: root 304 INODE REF[274921, 256895] name 25da95e3e893bb2fa69a2f0acd77bfe725626a1e filetype 1 missing
ERROR: root 304 EXTENT_DATA[910393 4096] gap exists, expected: EXTENT_DATA[910393 25]
ERROR: root 304 EXTENT_DATA[910393 8192] gap exists, expected: EXTENT_DATA[910393 4121]
ERROR: root 304 EXTENT_DATA[910393 16384] gap exists, expected: EXTENT_DATA[910393 12313]
ERROR: root 304 EXTENT_DATA[910400 4096] gap exists, expected: EXTENT_DATA[910400 25]
ERROR: root 304 EXTENT_DATA[910400 8192] gap exists, expected: EXTENT_DATA[910400 4121]
ERROR: root 304 EXTENT_DATA[910400 16384] gap exists, expected: EXTENT_DATA[910400 12313]
ERROR: root 304 EXTENT_DATA[910401 4096] gap exists, expected: EXTENT_DATA[910401 25]
ERROR: root 304 EXTENT_DATA[910401 8192] gap exists, expected: EXTENT_DATA[910401 4121]
ERROR: root 304 EXTENT_DATA[910401 16384] gap exists, expected: EXTENT_DATA[910401 12313]
ERROR: root 101721 INODE REF[274921, 256895] name 25da95e3e893bb2fa69a2f0acd77bfe725626a1e filetype 1 missing
ERROR: root 101721 EXTENT_DATA[910393 4096] gap exists, expected: EXTENT_DATA[910393 25]
ERROR: root 101721 EXTENT_DATA[910393 8192] gap exists, expected: EXTENT_DATA[910393 4121]
ERROR: root 101721 EXTENT_DATA[910393 16384] gap exists, expected: EXTENT_DATA[910393 12313]
ERROR: root 101721 EXTENT_DATA[910400 4096] gap exists, expected: EXTENT_DATA[910400 25]
ERROR: root 101721 EXTENT_DATA[910400 8192] gap exists, expected: EXTENT_DATA[910400 4121]
ERROR: root 101721 EXTENT_DATA[910400 16384] gap exists, expected: EXTENT_DATA[910400 12313]
ERROR: root 101721 EXTENT_DATA[910401 4096] gap exists, expected: EXTENT_DATA[910401 25]
ERROR: root 101721 EXTENT_DATA[910401 8192] gap exists, expected: EXTENT_DATA[910401 4121]
ERROR: root 101721 EXTENT_DATA[910401 16384] gap exists, expected: EXTENT_DATA[910401 12313]
ERROR: errors found in fs roots
found 39410126848 bytes used, error(s) found
total csum bytes: 35990412
total tree bytes: 196955471872
total fs tree bytes: 196809785344
total extent tree bytes: 96534528
btree space waste bytes: 33486070155
file data blocks allocated: 1705720172544
 referenced 2238568390656
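One pattern worth noting in the lowmem output above: every intact "bad space info" warning reports used exceeding total_bytes by exactly 4096 bytes, i.e. the accounting is off by one 4 KiB block in each block group. A quick check over a few of the pairs:

```python
# Each "bad space info" warning above reports used = total_bytes + 4096,
# i.e. the space accounting is off by exactly one 4 KiB block.
# (total_bytes, used) pairs copied from the check output:
pairs = [
    (2155872256, 2155876352),
    (3229614080, 3229618176),
    (5377097728, 5377101824),
    (32220643328, 32220647424),
]
assert all(used - total == 4096 for total, used in pairs)
```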
Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)
04.01.19 03:32, Qu Wenruo wrote:
> Please provide the dump of the following command:
>
> # btrfs ins dump-tree -t extent | grep 3114475520 -C 20

root@nazarpc-Standard-PC-Q35-ICH9-2009:~# btrfs ins dump-tree -t extent /dev/vdc | grep 3114475520 -C 20
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 146 key (3114242048 EXTENT_ITEM 36864) itemoff 10402 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 147 key (3114278912 EXTENT_ITEM 36864) itemoff 10365 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 148 key (3114315776 EXTENT_ITEM 36864) itemoff 10328 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 149 key (3114352640 EXTENT_ITEM 45056) itemoff 10291 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 150 key (3114397696 EXTENT_ITEM 40960) itemoff 10254 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 151 key (3114438656 EXTENT_ITEM 36864) itemoff 10217 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 152 key (3114475520 EXTENT_ITEM 4096) itemoff 10193 itemsize 24
        refs 2 gen 1701147 flags DATA
    item 153 key (3114475520 EXTENT_ITEM 36864) itemoff 10169 itemsize 24
        refs 1 gen 1712966 flags DATA
    item 154 key (3114475520 SHARED_DATA_REF 311408951296) itemoff 10165 itemsize 4
        shared data backref count 1
    item 155 key (3114475520 SHARED_DATA_REF 342561947648) itemoff 10161 itemsize 4
        shared data backref count 1
    item 156 key (3114475520 SHARED_DATA_REF 348547874816) itemoff 10157 itemsize 4
        shared data backref count 1
    item 157 key (3114508288 EXTENT_ITEM 4096) itemoff 10120 itemsize 37
        refs 1 gen 1713581 flags DATA
        shared data backref parent 311693983744 count 1
    item 158 key (3114512384 EXTENT_ITEM 45056) itemoff 10083 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 159 key (3114557440 EXTENT_ITEM 110592) itemoff 10046 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 160 key (3114668032 EXTENT_ITEM 102400) itemoff 10009 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 161 key (3114770432 EXTENT_ITEM 36864) itemoff 9972 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 162 key (3114807296 EXTENT_ITEM 40960) itemoff 9935 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 163 key (3114848256 EXTENT_ITEM 36864) itemoff 9898 itemsize 37
--
    item 194 key (311447486464 METADATA_ITEM 0) itemoff 13136 itemsize 33
        refs 1 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 348437413888
    item 195 key (311447502848 METADATA_ITEM 0) itemoff 13094 itemsize 42
        refs 2 gen 1713200 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 924172288
        shared block backref parent 871956480
    item 196 key (311447519232 METADATA_ITEM 0) itemoff 13043 itemsize 51
        refs 3 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 348542287872
        shared block backref parent 348542271488
        shared block backref parent 348542238720
    item 197 key (311447535616 METADATA_ITEM 0) itemoff 13001 itemsize 42
        refs 2 gen 1713311 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 311441817600
        shared block backref parent 773259264
    item 198 key (311447552000 METADATA_ITEM 0) itemoff 12959 itemsize 42
        refs 2 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 1068089344
        shared block backref parent 1067450368
    item 199 key (311447568384 METADATA_ITEM 0) itemoff 12917 itemsize 42
        refs 2 gen 1713499 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 310977855488
        shared block backref parent 941277184
    item 200 key (311447584768 METADATA_ITEM 0) itemoff 12884 itemsize 33
        refs 1 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 310736814080
    item 201 key (311447601152 METADATA_ITEM 0) itemoff 12842 itemsize 42
        refs 2 gen 1713499 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 3
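The visible inconsistency in this dump is items 152 and 153: two EXTENT_ITEMs start at the same bytenr 3114475520 with lengths 4096 and 36864, so their byte ranges overlap, which extent items must never do. A minimal sketch of the overlap check (plain Python, not btrfs code):

```python
# Items 152 and 153 above both start at bytenr 3114475520, with lengths
# 4096 and 36864; extent items must never overlap, so this pair is the
# visible inconsistency in the dump.
def overlaps(a, b):
    """True if two (start, length) byte ranges intersect."""
    (s1, l1), (s2, l2) = a, b
    return s1 < s2 + l2 and s2 < s1 + l1

item_152 = (3114475520, 4096)
item_153 = (3114475520, 36864)
item_158 = (3114512384, 45056)  # a later extent, for contrast

assert overlaps(item_152, item_153)      # the corruption
assert not overlaps(item_152, item_158)  # normal adjacent extents don't
```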
Unrecoverable btrfs corruption (backref bytes do not match extent backref)
If this seems at all important and you want me to run some commands to check what happened exactly, I can start VMs with this partition image connected and do whatever is needed. I can't send the image anywhere though, since it contains sensitive information.

NOTE: I don't need help with partition or data recovery; I'm used to these kinds of crashes and have backups, so no data were lost.

P.S. I really wish BTRFS would stop accidentally corrupting itself one day.

--
Sincerely,
Nazar Mokrynskyi
github.com/nazar-pc
Re: Linux 4.14 breaks btrfs filesystems (3 times already)
24.12.17 12:07, Nikolay Borisov wrote:
>
> On 24.12.2017 11:37, Nazar Mokrynskyi wrote:
>> Hi folks,
>>
>> I know this is a bold statement, but this is also exactly what I'm
>> experiencing.
>>
>> 2 filesystems that had worked perfectly since July 2015 and one freshly
>> created one crashed during the last 5 weeks, since Ubuntu 18.04 switched
>> from 4.13 to 4.14 (my current kernel is 4.14.0-11-generic).
>>
>> I wrote about the first case (backup partition) 5 weeks ago (the title was
>> "Unrecoverable scrub errors"), but eventually recreated the mentioned
>> corrupted filesystem, then scrubbed and checked the other filesystems;
>> everything was good, no errors and no warnings.
>>
>> 4 days ago I noticed that random files on my primary filesystem had become
>> corrupted in a very interesting way. Sometimes completely, sometimes only
>> partially (like when I was playing a game and it crashed at a certain
>> moment, when a particular piece of a data file was read). I've recreated
>> the primary filesystem too.
>>
>> This morning the primary filesystem crashed again, even harder than before.
>>
>> Scrub on the latest crashed filesystem:
>>
>> [ 1074.544160] [ cut here ]
>> [ 1074.544162] kernel BUG at /build/linux-XO_uEE/linux-4.13.0/fs/btrfs/ctree.h:1802!
>> [ 1074.544166] invalid opcode: [#1] SMP
>> [ 1074.544174] Modules linked in: btrfs xor raid6_pq dm_crypt algif_skcipher af_alg intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm_intel kvm snd_usb_audio snd_hda_intel snd_usbmidi_lib irqbypass snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep pcbc snd_seq_midi snd_seq_midi_event aesni_intel snd_seq snd_rawmidi snd_pcm snd_seq_device snd_timer snd cdc_acm soundcore joydev input_leds aes_x86_64 crypto_simd glue_helper serio_raw cryptd intel_cstate intel_rapl_perf lpc_ich mei_me mei shpchp mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage nouveau mxm_wmi video ttm drm_kms_helper igb syscopyarea sysfillrect sysimgblt dca fb_sys_fops
>> [ 1074.544232] ahci i2c_algo_bit drm ptp libahci nvme pps_core nvme_core wmi
>> [ 1074.544240] CPU: 8 PID: 5459 Comm: kworker/u24:0 Not tainted 4.13.0-16-generic #19-Ubuntu
>> [ 1074.544244] Hardware name: MSI MS-7885/X99A SLI Krait Edition (MS-7885), BIOS N.92 01/10/2017
>> [ 1074.544271] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
>> [ 1074.544276] task: 8d7eaecf5d00 task.stack: 9ab182ecc000
>> [ 1074.544292] RIP: 0010:btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs]
>> [ 1074.544296] RSP: 0018:9ab182ecfa98 EFLAGS: 00010297
>> [ 1074.544300] RAX: RBX: 00b6 RCX: 9ab182ecfa50
>> [ 1074.544303] RDX: 0001 RSI: 36a6 RDI:
>> [ 1074.544307] RBP: 9ab182ecfa98 R08: 36a7 R09: 9ab182ecfa60
>> [ 1074.544310] R10: R11: 0003 R12: 8d7e860c6348
>> [ 1074.544313] R13: R14: 36a6 R15: 36e5
>> [ 1074.544317] FS: () GS:8d7eef40() knlGS:
>> [ 1074.544321] CS: 0010 DS: ES: CR0: 80050033
>> [ 1074.544324] CR2: 7faeb402 CR3: 0003bb609000 CR4: 003406e0
>> [ 1074.544328] DR0: DR1: DR2:
>> [ 1074.544332] DR3: DR6: fffe0ff0 DR7: 0400
>> [ 1074.544335] Call Trace:
>> [ 1074.544348] lookup_inline_extent_backref+0x5a3/0x5b0 [btrfs]
>> [ 1074.544360] ? setup_inline_extent_backref+0x16e/0x260 [btrfs]
>> [ 1074.544371] insert_inline_extent_backref+0x50/0xe0 [btrfs]
>> [ 1074.544382] __btrfs_inc_extent_ref.isra.51+0x7e/0x260 [btrfs]
>> [ 1074.544396] ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
>> [ 1074.544408] __btrfs_run_delayed_refs+0xc52/0x1380 [btrfs]
>> [ 1074.544420] btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
>> [ 1074.544431] delayed_ref_async_start+0x98/0xb0 [btrfs]
>> [ 1074.55] btrfs_worker_helper+0x7a/0x2e0 [btrfs]
>> [ 1074.544458] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
>> [ 1074.544464] process_one_work+0x1e7/0x410
>> [ 1074.544467] worker_thread+0x4a/0x410
>> [ 1074.544471] kthread+0x125/0x140
>> [ 1074.544474] ? process_one_work+0x410/0x
Linux 4.14 breaks btrfs filesystems (3 times already)
+0x1202f)[0x55748d36202f]
btrfs check(+0x4d8cf)[0x55748d39d8cf]
btrfs check(+0x4f1c3)[0x55748d39f1c3]
btrfs check(+0x52a1c)[0x55748d3a2a1c]
btrfs check(+0x53265)[0x55748d3a3265]
btrfs check(+0x53d3d)[0x55748d3a3d3d]
btrfs check(cmd_check+0x1309)[0x55748d3a6fbc]
btrfs check(main+0x142)[0x55748d3686e9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fccb1f971c1]
btrfs check(_start+0x2a)[0x55748d36872a]

A simple ls in the root of the filesystem right after a fresh boot and mount resulted in the following:

[  106.573579] [ cut here ]
[  106.573582] kernel BUG at /build/linux-XO_uEE/linux-4.13.0/fs/btrfs/ctree.h:1802!
[  106.573589] invalid opcode: [#1] SMP
[  106.573602] Modules linked in: btrfs xor raid6_pq dm_crypt algif_skcipher af_alg intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_intel snd_usb_audio snd_hda_codec snd_hda_core snd_usbmidi_lib snd_hwdep snd_pcm irqbypass crct10dif_pclmul crc32_pclmul snd_seq_midi ghash_clmulni_intel snd_seq_midi_event pcbc aesni_intel snd_rawmidi snd_seq snd_seq_device cdc_acm snd_timer aes_x86_64 joydev input_leds snd crypto_simd glue_helper soundcore cryptd intel_cstate serio_raw intel_rapl_perf lpc_ich mei_me mei shpchp mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage nouveau mxm_wmi video igb ttm drm_kms_helper syscopyarea sysfillrect dca sysimgblt fb_sys_fops
[  106.573704] i2c_algo_bit ahci ptp libahci drm pps_core nvme nvme_core wmi
[  106.573720] CPU: 6 PID: 245 Comm: kworker/u24:4 Not tainted 4.13.0-16-generic #19-Ubuntu
[  106.573727] Hardware name: MSI MS-7885/X99A SLI Krait Edition (MS-7885), BIOS N.92 01/10/2017
[  106.573773] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  106.573785] task: 92b56227dd00 task.stack: a2fb8237c000
[  106.573817] RIP: 0010:btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs]
[  106.573825] RSP: 0018:a2fb8237fa98 EFLAGS: 00010297
[  106.573831] RAX: RBX: 00b6 RCX: a2fb8237fa50
[  106.573838] RDX: 0001 RSI: 36a6 RDI:
[  106.573845] RBP: a2fb8237fa98 R08: 36a7 R09: a2fb8237fa60
[  106.573852] R10: R11: 0003 R12: 92b52ac96460
[  106.573858] R13: R14: 36a6 R15: 36e5
[  106.573866] FS: () GS:92b56f38() knlGS:
[  106.573873] CS: 0010 DS: ES: CR0: 80050033
[  106.573879] CR2: 7f67cc017028 CR3: 000275009000 CR4: 003406e0
[  106.573886] DR0: DR1: DR2:
[  106.573893] DR3: DR6: fffe0ff0 DR7: 0400
[  106.573899] Call Trace:
[  106.573923] lookup_inline_extent_backref+0x5a3/0x5b0 [btrfs]
[  106.573946] ? setup_inline_extent_backref+0x16e/0x260 [btrfs]
[  106.573968] insert_inline_extent_backref+0x50/0xe0 [btrfs]
[  106.573990] __btrfs_inc_extent_ref.isra.51+0x7e/0x260 [btrfs]
[  106.574019] ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
[  106.574042] __btrfs_run_delayed_refs+0xc52/0x1380 [btrfs]
[  106.574052] ? __slab_free+0x14c/0x2d0
[  106.574075] btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
[  106.574097] delayed_ref_async_start+0x98/0xb0 [btrfs]
[  106.574126] btrfs_worker_helper+0x7a/0x2e0 [btrfs]
[  106.574151] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[  106.574160] process_one_work+0x1e7/0x410
[  106.574167] worker_thread+0x4a/0x410
[  106.574174] kthread+0x125/0x140
[  106.574181] ? process_one_work+0x410/0x410
[  106.574187] ? kthread_create_on_node+0x70/0x70
[  106.574195] ret_from_fork+0x25/0x30
[  106.574200] Code: 89 d1 4c 89 da e8 26 ae f4 ff 58 48 8b 45 c0 65 48 33 04 25 28 00 00 00 74 05 e8 81 a8 80 c1 c9 c3 55 48 89 e5 0f 0b 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 31
[  106.574276] RIP: btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs] RSP: a2fb8237fa98
[  106.578666] ---[ end trace bd9d2e91fa0ddda7 ]---

After this the kernel was also corrupted and not capable of running the system anymore, so I had to hard reset the system after collecting each piece above.
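The repeated "kernel BUG at fs/btrfs/ctree.h:1802" with btrfs_extent_inline_ref_size in the RIP is consistent with an inline backref carrying an unrecognized type byte, which the kernel handles with BUG(). A sketch of the dispatch pattern (the key-type constants are the btrfs on-disk values; the sizes are my own estimates of the inline layouts and should be treated as assumptions, not the kernel's exact struct sizes):

```python
# Sketch of the size-by-type dispatch that btrfs_extent_inline_ref_size
# performs. Sizes below are illustrative estimates of the inline layouts.
INLINE_REF_SIZES = {
    176: 9,   # TREE_BLOCK_REF_KEY: type byte + u64 root
    178: 29,  # EXTENT_DATA_REF_KEY: type byte + btrfs_extent_data_ref
    182: 9,   # SHARED_BLOCK_REF_KEY: type byte + u64 parent
    184: 13,  # SHARED_DATA_REF_KEY: type byte + u64 parent + u32 count
}

def inline_ref_size(ref_type: int) -> int:
    try:
        return INLINE_REF_SIZES[ref_type]
    except KeyError:
        # The kernel BUG()s here; a corrupted type byte lands in this branch.
        raise ValueError(f"unknown inline ref type {ref_type:#x}")
```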
Thankfully I'm doing backups every 15 minutes (after my initial btrfs experience) and the backup partition is fine (I did scrub and btrfsck on it), so I've quickly restored everything, but this is not funny anymore.

Here are the mount options for my primary filesystem (SSD > LUKS > BTRFS) and backup filesystem (HDD > LUKS > GPT > BTRFS):

compress=lzo,noatime,ssd,subvol=/root
compress=lzo,noatime,noexec,noauto

Has anyone noticed anything similar (I'm not subscribed to the mailing list)?

--
Sincerely,
Nazar Mokrynskyi
github.com/nazar-pc
Re: Unrecoverable scrub errors
This particular partition was initially created in July 2015. I've added/removed drives a few times when migrating from older to newer hardware, but never used RAID0 or any other RAID level beyond that.

Sincerely,
Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 22:39, Roy Sigurd Karlsbakk wrote:
> I guess not using RAID-0 would be a good start…
>
> Kind regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> Hew the good into stone, write the bad in snow.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unrecoverable scrub errors
Looks like it is not going to resolve nicely. After removing that problematic snapshot, the filesystem quickly becomes readonly, like so:

> [23552.839055] BTRFS error (device dm-2): cleaner transaction attach returned -30
> [23577.374390] BTRFS info (device dm-2): use lzo compression
> [23577.374391] BTRFS info (device dm-2): disk space caching is enabled
> [23577.374392] BTRFS info (device dm-2): has skinny extents
> [23577.506214] BTRFS info (device dm-2): bdev /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, flush 0, corrupt 24, gen 0
> [23795.026390] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.148193] BTRFS error (device dm-2): bad tree block start 56 470069542912
> [23795.148424] BTRFS warning (device dm-2): dm-2 checksum verify failed on 470069460992 wanted 54C49539 found FD171FBB level 0
> [23795.148526] BTRFS error (device dm-2): bad tree block start 0 470069493760
> [23795.150461] BTRFS error (device dm-2): bad tree block start 1459617832 470069477376
> [23795.639781] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655487] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655496] BTRFS: error (device dm-2) in btrfs_drop_snapshot:9244: errno=-5 IO failure
> [23795.655498] BTRFS info (device dm-2): forced readonly

Check and repair don't help either:

> nazar-pc@nazar-pc ~> sudo btrfs check -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
>  referenced 2283091746816
>
> nazar-pc@nazar-pc ~> sudo btrfs check --repair -p /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> enabling repair mode
> Checking filesystem on /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Fixed 0 roots.
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> tot
Re: Unrecoverable scrub errors
19.11.17 07:23, Chris Murphy wrote: > On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi > wrote: > >> That was eventually useful: >> >> * found some familiar file names (mangled eCryptfs file names from times >> when I used it for home directory) and decided to search for it in old >> snapshots of home directory (about 1/3 of snapshots on that partition) >> * file name was present in snapshots back to July of 2015, but during search >> through snapshot from 2016-10-26_18:47:04 I got an I/O error reported by the >> find command at one directory >> * tried to open directory in file manager - same error, fails to open >> * after removing this, let's call it "broken", snapshot I started a new scrub, >> hopefully it'll finish fine >> >> If it is not actually related to recent memory issues I'd be positively >> surprised. Not sure what happened towards the end of October 2016 though, >> especially since backups were on a different physical device back then. > Wrong csum computation during the transfer? Did you use btrfs send receive? Yes, I've used send/receive to copy snapshots from the primary SSD to the backup HDD. Not sure when the wrong csum computation happened, since the SSD contains only the most recent snapshots and only the HDD contains older snapshots. Even if the error happened on the SSD, those older snapshots were deleted a long time ago and there is no way to check this. Sincerely, Nazar Mokrynskyi github.com/nazar-pc -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unrecoverable scrub errors
19.11.17 06:33, Chris Murphy wrote: > On Sat, Nov 18, 2017 at 8:45 PM, Nazar Mokrynskyi > wrote: >> 19.11.17 05:19, Chris Murphy wrote: >>> On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi >>> wrote: >>>> I can assure you that the drive (it is an HDD) is perfectly functional with 0 >>>> SMART errors or warnings and doesn't have any problems. dmesg is clean in >>>> that regard too, the HDD itself can be excluded from potential causes. >>>> >>>> There were however some memory-related issues on my machine a few months >>>> ago, so there is a chance that data might have been written incorrectly >>>> to the drive back then (I didn't run scrub on the backup drive for a long >>>> time). >>>> >>>> How can I identify which files this metadata belongs to, so that I can >>>> replace or just remove them (the files)? >>> You might look through the archives about bad ram and btrfs check >>> --repair and include Hugo Mills in the search, I'm pretty sure there >>> is code in repair that can fix certain kinds of memory induced >>> corruption in metadata. But I have no idea if this is that type or if >>> repair can make things worse in this case. So I'd say you get >>> everything off this file system that you want, and then go ahead and >>> try --repair and see what happens. >> In this case I'm not sure if the data were written incorrectly or the checksum or >> both. So I'd like to first identify the files affected, check them manually >> and then decide what to do with them. Especially since there are not many errors yet. >> >>> One alternative is to just leave it alone. If you're not hitting these >>> leaves in day to day operation, they won't hurt anything. >> It was working for some time, but I have a suspicion that it occasionally >> causes spikes of disk activity because of these errors (which is why I ran >> the scrub initially). 
>>> Another alternative is to umount, and use btrfs-debug-tree -b on one >>> of the leaf/node addresses and see what you get (probably an error), >>> but it might still also show the node content so we have some idea >>> what's affected by the error. If it flat out refuses to show the node, >>> might be a feature request to get a flag that forces display of the >>> node such as it is... >> Here is what I've got: >> >>> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 >>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 >>> btrfs-progs v4.13.3 >>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 >>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 >>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 >>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 >>> Csum didn't match >>> ERROR: failed to read 470069460992 >> Looks like I indeed need a --force here. >> > Huh, seems overdue. But what do I know? > > You can use btrfs-map-logical -l to get a physical address for this > leaf, and then plug that into dd > > # dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C > > Gotcha of course is this is not translated into the more plain > language output by btrfs-debug-tree. And you're in the weeds with the > on disk format documentation. But maybe you'll see filenames on the > right hand side of the hexdump output and maybe that's enough... Or > maybe it's worth computing a csum on that leaf to check against the > csum for that leaf which is found in the first field of the leaf. I'd > expect the csum itself is what's wrong, because if you get memory > corruption in creating the node, the resulting csum will be *correct* > for that malformed node and there'd be no csum error, you'd just see > some other crazy faceplant. 
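The dd-plus-hexdump pipeline suggested above can also be sketched in a few lines of Python. This is only a convenience sketch, not part of btrfs-progs: the device path and offset are placeholders, and in practice the physical byte offset would come from `btrfs-map-logical -l <logical> <device>`; the default 16384 matches the usual btrfs nodesize but should be checked against the filesystem.

```python
# Rough equivalent of `dd if=<device> skip=<offset> bs=1 count=16384 | hexdump -C`:
# read one tree block at a physical byte offset and render it as a hexdump,
# so filenames embedded in the leaf show up on the right-hand side.

def hexdump(data, base=0):
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + i:08x}  {hexpart:<47}  |{text}|")
    return "\n".join(lines)

def dump_tree_block(device, phys_offset, nodesize=16384):
    # device is a placeholder, e.g. "/dev/mapper/luks-...-part1"
    with open(device, "rb") as f:
        f.seek(phys_offset)
        return hexdump(f.read(nodesize), base=phys_offset)
```

Reading raw bytes this way bypasses the checksum verification that makes btrfs-debug-tree refuse the block, which is the whole point of the exercise.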
That was eventually useful: * found some familiar file names (mangled eCryptfs file names from times when I used it for home directory) and decided to search for it in old snapshots of home directory (about 1/3 of snapshots on that partition) * file name was present in snapshots back to July of 2015, but during search through snapshot from 2016-10-26_18:47:04 I got an I/O error reported by the find command at one directory * tried to open directory in file manager - same error, fails to open * after removing this, let's call it "broken", snapshot I started a new scrub, hopefully it'll finish fine If it is not actually related to recent memory issues I'd be positively surprised. Not sure what happened towards the end of October 2016 though, especially since backups were on a different physical device back then. Sincerely, Nazar Mokrynskyi github.com/nazar-pc
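The manual hunt described above, running find over each snapshot and watching for I/O errors, could be automated along these lines. This is a hypothetical helper (find_broken_dirs is my own name), and it assumes that reading a directory backed by the corrupted leaf fails with EIO, as it did for find here:

```python
# Walk a snapshot tree and collect directories whose listing fails with
# EIO; those point at the snapshot(s) still referencing the bad leaf.
import os, errno

def find_broken_dirs(snapshot_root):
    broken = []

    def on_error(exc):
        # os.walk reports listing failures via this callback instead of raising.
        if isinstance(exc, OSError) and exc.errno == errno.EIO:
            broken.append(exc.filename)

    for _root, _dirs, _files in os.walk(snapshot_root, onerror=on_error):
        pass
    return broken
```

Running it over every snapshot directory in turn would narrow down which snapshots have to be deleted, without opening a file manager for each one.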
Re: Unrecoverable scrub errors
19.11.17 05:19, Chris Murphy wrote: > On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi > wrote: >> I can assure you that the drive (it is an HDD) is perfectly functional with 0 SMART >> errors or warnings and doesn't have any problems. dmesg is clean in that >> regard too, the HDD itself can be excluded from potential causes. >> >> There were however some memory-related issues on my machine a few months >> ago, so there is a chance that data might have been written incorrectly to >> the drive back then (I didn't run scrub on the backup drive for a long time). >> >> How can I identify which files this metadata belongs to, so that I can replace or just >> remove them (the files)? > You might look through the archives about bad ram and btrfs check > --repair and include Hugo Mills in the search, I'm pretty sure there > is code in repair that can fix certain kinds of memory induced > corruption in metadata. But I have no idea if this is that type or if > repair can make things worse in this case. So I'd say you get > everything off this file system that you want, and then go ahead and > try --repair and see what happens. In this case I'm not sure if the data were written incorrectly or the checksum or both. So I'd like to first identify the files affected, check them manually and then decide what to do with them. Especially since there are not many errors yet. > One alternative is to just leave it alone. If you're not hitting these > leaves in day to day operation, they won't hurt anything. It was working for some time, but I have a suspicion that it occasionally causes spikes of disk activity because of these errors (which is why I ran the scrub initially). > Another alternative is to umount, and use btrfs-debug-tree -b on one > of the leaf/node addresses and see what you get (probably an error), > but it might still also show the node content so we have some idea > what's affected by the error. 
If it flat out refuses to show the node, > might be a feature request to get a flag that forces display of the > node such as it is... Here is what I've got: > nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > btrfs-progs v4.13.3 > checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 > checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 > checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 > checksum verify failed on 470069460992 found FD171FBB wanted 54C49539 > Csum didn't match > ERROR: failed to read 470069460992 Looks like I indeed need a --force here.
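For the "compute a csum on that leaf" idea mentioned earlier in the thread: btrfs metadata checksums are CRC-32C (Castagnoli). As far as I understand the on-disk format, the stored checksum sits at the start of the 32-byte csum area at the beginning of each tree block and covers everything after that area; treat that layout as an assumption to verify against the format documentation. A minimal, dependency-free sketch (bit-by-bit, so slow, and leaf_csum_matches is my own helper name):

```python
# Bitwise CRC-32C over a byte string, reflected algorithm with the
# standard init and final XOR, matching what btrfs uses for metadata.
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial

def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def leaf_csum_matches(block: bytes) -> bool:
    # Assumed layout: first 4 bytes of the 32-byte csum area hold the
    # little-endian CRC-32C of bytes 32..end of the tree block.
    stored = int.from_bytes(block[0:4], "little")
    return crc32c(block[32:]) == stored
```

Recomputing the checksum over a dumped block would distinguish "the csum field got corrupted" from "the leaf payload got corrupted", which is exactly the question raised about the FD171FBB vs 54C49539 mismatch.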
Re: Unrecoverable scrub errors
I can assure you that the drive (it is an HDD) is perfectly functional with 0 SMART errors or warnings and doesn't have any problems. dmesg is clean in that regard too, the HDD itself can be excluded from potential causes. There were however some memory-related issues on my machine a few months ago, so there is a chance that data might have been written incorrectly to the drive back then (I didn't run scrub on the backup drive for a long time). How can I identify which files this metadata belongs to, so that I can replace or just remove them (the files)? Sincerely, Nazar Mokrynskyi github.com/nazar-pc 18.11.17 05:33, Adam Borowski wrote: > On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote: >> On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi >> wrote: >> >>>> [551049.038718] BTRFS warning (device dm-2): checksum error at logical >>>> 470069460992 on dev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector >>>> 942238048: metadata leaf (level 0) in tree 985 >>>> [551049.038720] BTRFS warning (device dm-2): checksum error at logical >>>> 470069460992 on dev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector >>>> 942238048: metadata leaf (level 0) in tree 985 >>>> [551049.038723] BTRFS error (device dm-2): bdev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd >>>> 0, flush 0, corrupt 1, gen 0 >>>> [551049.039634] BTRFS warning (device dm-2): checksum error at logical >>>> 470069526528 on dev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector >>>> 942238176: metadata leaf (level 0) in tree 985 >>>> [551049.039635] BTRFS warning (device dm-2): checksum error at logical >>>> 470069526528 on dev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector >>>> 942238176: metadata leaf (level 0) in tree 985 >>>> [551049.039637] BTRFS error (device dm-2): bdev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd >>>> 0, flush 0, corrupt 2, gen 0 >>>> 
[551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error >>>> at logical 470069460992 on dev >>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 >> These are metadata errors. Are there any other storage stack related >> errors in the previous 2-5 minutes, such as read errors (UNC) or SATA >> link reset messages? >> >>> Maybe I can find the snapshot that contains the file with the wrong checksum and >>> remove the corresponding snapshot or something like that? >> It's not a file. It's a metadata leaf. > Just for the record: had this been a data block (ie, a non-inline file > extent), the dmesg message would include one of the filenames that refer to that > extent. To clear the error, you'd need to remove all such files. > >>>> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup >>>> Data, single: total=879.01GiB, used=877.24GiB >>>> System, DUP: total=40.00MiB, used=128.00KiB >>>> Metadata, DUP: total=20.50GiB, used=18.96GiB >>>> GlobalReserve, single: total=512.00MiB, used=0.00B >> Metadata is DUP, but both copies have corruption. Kinda strange. But I >> don't know how close the DUP copies are to each other, if possibly a >> big enough media defect can explain this. > The original post mentioned SSD (but was unclear if _this_ filesystem is > backed by one). If so, DUP is nearly worthless as both copies will be > written to physical cells next to each other, no matter what positions the > FTL shows them at. > > > Meow!
Unrecoverable scrub errors
per/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > [551049.479989] BTRFS warning (device dm-2): checksum error at logical > 470069542912 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 942238208: metadata leaf (level 0) in tree 985 > [551049.479993] BTRFS warning (device dm-2): checksum error at logical > 470069542912 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 942238208: metadata leaf (level 0) in tree 985 > [551049.479997] BTRFS error (device dm-2): bdev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, > flush 0, corrupt 6, gen 0 > [551049.523539] BTRFS error (device dm-2): unable to fixup (regular) error at > logical 470069542912 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > [551051.672589] BTRFS warning (device dm-2): checksum error at logical > 470069460992 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286624: metadata leaf (level 0) in tree 985 > [551051.672593] BTRFS warning (device dm-2): checksum error at logical > 470069460992 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286624: metadata leaf (level 0) in tree 985 > [551051.672597] BTRFS error (device dm-2): bdev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, > flush 0, corrupt 7, gen 0 > [551051.820776] BTRFS error (device dm-2): unable to fixup (regular) error at > logical 470069460992 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > [551051.945310] BTRFS warning (device dm-2): checksum error at logical > 470069477376 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286656: metadata leaf (level 0) in tree 985 > [551051.945314] BTRFS warning (device dm-2): checksum error at logical > 470069477376 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286656: metadata leaf (level 0) in tree 985 > 
[551051.945318] BTRFS error (device dm-2): bdev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, > flush 0, corrupt 8, gen 0 > [551052.112245] BTRFS warning (device dm-2): checksum error at logical > 470069526528 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286752: metadata leaf (level 0) in tree 985 > [551052.112247] BTRFS warning (device dm-2): checksum error at logical > 470069526528 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286752: metadata leaf (level 0) in tree 985 > [551052.112248] BTRFS error (device dm-2): bdev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, > flush 0, corrupt 9, gen 0 > [551052.183671] BTRFS error (device dm-2): unable to fixup (regular) error at > logical 470069477376 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > [551052.253278] BTRFS error (device dm-2): unable to fixup (regular) error at > logical 470069526528 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > [551052.260305] BTRFS warning (device dm-2): checksum error at logical > 470069493760 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286688: metadata leaf (level 0) in tree 985 > [551052.260307] BTRFS warning (device dm-2): checksum error at logical > 470069493760 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector > 943286688: metadata leaf (level 0) in tree 985 > [551052.260308] BTRFS error (device dm-2): bdev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, > flush 0, corrupt 10, gen 0 > [551052.300024] BTRFS error (device dm-2): unable to fixup (regular) error at > logical 470069493760 on dev > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 This is an online backup partition and I have an offline backup partition with the same data, so not very concerned about losing any data here, but would 
like to repair it. Are there any better options before resorting to `btrfsck --repair`? Maybe I can find the snapshot that contains the file with the wrong checksum and remove the corresponding snapshot or something like that? > nazar-pc@nazar-pc ~> sudo btrfs filesystem show /media/Backup > Label: 'Backup' uuid: 82cfcb0f-0b80-4764-bed6-f529f2030ac5 > Total devices 1 FS bytes used 896.20GiB > devid 1 size 1.00TiB used 920.09GiB path > /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 > > nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup > Data, single: total=879.01GiB, used=877.24GiB > System, DUP: total=40.00MiB, used=128.00KiB > Metadata, DUP: total=20.50GiB, used
Re: [PATCH 1/2 v2] btrfs-progs: fix btrfs send & receive with -e flag
Hi, Sorry for the confusion, I've checked once again and the same issue happens in all cases. I didn't notice this because my regular backups are done automatically in a cron task + snapshots look fine despite the error, so I incorrectly assumed the error didn't happen there, but it actually did. I've clarified this in the last comment on Bugzilla. Sincerely, Nazar Mokrynskyi github.com/nazar-pc Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249 28.04.17 13:03, Lakshmipathi.G wrote: > I can take a look. What I'm wondering about is why it fails only in the HDD > to SSD case. If -ENODATA is returned with this patch it should mean that there > was no header data. So is the user sure that this doesn't indicate a valid > error? > > Christian
Subvolume copy fails with "ERROR: empty stream is not considered valid"
I've just reported a bug (https://bugzilla.kernel.org/show_bug.cgi?id=195597) that hit me after a recent update of btrfs-progs. It seems to be a false positive that resulted from the changes that aimed to fix another issue. The short version of it is the following: root@nazar-pc:~# /bin/btrfs send "/media/Backup/web/2017-04-04_14:30:06" | /bin/btrfs receive "/media/Backup_backup/web" At subvol /media/Backup/web/2017-04-04_14:30:06 At subvol 2017-04-04_14:30:06 ERROR: empty stream is not considered valid I've also added Stéphane Graber to CC as the author of the recent update to Ubuntu's btrfs-progs package. I'm not subscribed to the mailing list, so keep me in copy, please. -- Sincerely, Nazar Mokrynskyi github.com/nazar-pc Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249
Re: Major HDD performance degradation on btrfs receive
I've been running a newer version of just-backup-btrfs, which was configured to remove snapshots in batches ~ at least 3x100 at once (this is what I typically have in 1.5-2 days). Snapshot transferring became much faster, however when I delete 300 snapshots at once, well... you can imagine what happens, but I can afford this on a desktop. Seekwatcher fails to run on my system with the following error: ~> sudo seekwatcher -t find.trace -o find.png -p 'find /backup_hdd > /dev/null' -d /dev/sda1 Traceback (most recent call last): File "/usr/bin/seekwatcher", line 58, in from seekwatcher import rundata File "numpy.pxd", line 43, in seekwatcher.rundata (seekwatcher/rundata.c:7885) ValueError: numpy.dtype does not appear to be the correct type object I have no idea what it means, but generally I think if seeking because of fragmentation is a real cause of performance degradation, then this is something that BTRFS can improve, since I still have 65% of free space on the BTRFS partition that receives snapshots and fragmentation in this case seems weird. P.S. I've unsubscribed from the mailing list, cc me on answers, please. Sincerely, Nazar Mokrynskyi github.com/nazar-pc Skype: nazar-pc Diaspora: naza...@diaspora.mokrynskyi.com Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249 On 18.03.16 16:22, Nazar Mokrynskyi wrote: >> But seriously, what are you doing that you can't lose more than 15 >> minutes of? Couldn't it be even 20 minutes, or a half-hour or an hour >> with 15 minute snapshots only on the ssds (yes, I know the raid0 factor, >> but the question still applies)? > This is an artificial psychological limit. Sometimes when you're actively > coding, it is quite sad to lose even 5 minutes of work, since productivity > is not constant. This is why 15 minutes was chosen as something that is not > too critical. There is no other real reason behind this limit other than how > I feel about it. 
> >> And perhaps more importantly for your data, btrfs is still considered >> "stabilizing, not fully stable and mature". Use without backups is >> highly discouraged, but I'd suggest that btrfs in its current state might >> not be what you're looking for if you can't deal with loss of more than >> 15 minutes worth of changes anyway. >> >> Be that as it may... >> >> Btrfs is definitely not yet optimized. In many cases it reads or writes >> only one device at a time, for instance, even in RaidN configuration. >> And there are definitely snapshot scaling issues altho at your newer 500 >> snapshots total that shouldn't be /too/ bad. > As an (relatively) early adopter I'm fine using experimental stuff with extra > safeties like backups (hey, I've used it even without those while back:)). I > fully acknowledge what is current state of BTRFS and want to help make it > even better by stressing issues that me and other users encounter, searching > for solutions, etc. > >> Dealing with reality, regardless of how or why, you currently have a >> situation of intolerably slow receives that needs addressed. From a >> practical perspective you said an ssd for backups is ridiculous and I >> can't disagree, but there's another "throw hardware at it" solution that >> might be a bit more reasonable... >> >> Spinning rust hard drives are cheap. What about getting another one, and >> alternating your backup receives between them? That would halve the load >> to one every thirty minutes, without changing your 15-minute snapshot and >> backup policy at all. =:^) >> >> So that gives you two choices for halving the load to the spinning rust. >> Either decide you really can live with half-hour loss of data, or throw >> only a relatively small amount of money (well, as long as you have room >> to plug in another sata device anyway, otherwise...) at it for a second >> backup device, and alternate between them. 
> Yes, I'm leaning toward earning new hardware right now, fortunately, laptop > allows me to insert 2 x mSATA + 2 x 2.5 SATA drives, so I have exactly 2.5 > SATA slot free. > >> OTOH, since you mentioned possible coding, optimization might not be a >> bad thing, if you're willing to put in the time necessary to get up to >> speed with the code and can work with the other devs in terms of timing, >> etc. But that will definitely take significant time even if you do it, >> and the alternating backup solution can be put to use as soon as you can >> get another device plugged in and setup. =:^) > I'm not coding C/C++, so my capabilities to improve BTRFS itself are limited, > but
Re: "bad metadata" not fixed by btrfs repair
I have the same thing with kernel 4.5 and btrfs-progs 4.4. Wrote about it 2 weeks ago and didn't get any answer: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg51609.html However, despite those messages everything seems to work fine. Sincerely, Nazar Mokrynskyi github.com/nazar-pc Skype: nazar-pc Diaspora: naza...@diaspora.mokrynskyi.com Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249 On 28.03.16 21:42, Marc Haber wrote: On Mon, Mar 28, 2016 at 04:37:14PM +0200, Marc Haber wrote: I have a btrfs which btrfs check --repair doesn't fix: # btrfs check --repair /dev/mapper/fanbtr bad metadata [4425377054720, 4425377071104) crossing stripe boundary bad metadata [4425380134912, 4425380151296) crossing stripe boundary bad metadata [4427532795904, 4427532812288) crossing stripe boundary bad metadata [4568321753088, 4568321769472) crossing stripe boundary bad metadata [4568489656320, 4568489672704) crossing stripe boundary bad metadata [4571474493440, 4571474509824) crossing stripe boundary bad metadata [4571946811392, 4571946827776) crossing stripe boundary bad metadata [4572782919680, 4572782936064) crossing stripe boundary bad metadata [4573086351360, 4573086367744) crossing stripe boundary bad metadata [4574221041664, 4574221058048) crossing stripe boundary bad metadata [4574373412864, 4574373429248) crossing stripe boundary bad metadata [4574958649344, 4574958665728) crossing stripe boundary bad metadata [4575996018688, 4575996035072) crossing stripe boundary bad metadata [4580376772608, 4580376788992) crossing stripe boundary repaired damaged extent references Fixed 0 roots. 
checking free space cache checking fs roots checking csums checking root refs enabling repair mode Checking filesystem on /dev/mapper/fanbtr UUID: 90f8d728-6bae-4fca-8cda-b368ba2c008e cache and super generation don't match, space cache will be invalidated found 97171628230 bytes used err is 0 total csum bytes: 91734220 total tree bytes: 3021848576 total fs tree bytes: 2762784768 total extent tree bytes: 148570112 btree space waste bytes: 545440822 file data blocks allocated: 308328280064 referenced 177314340864 Mounting this filesystem gives: Mar 28 20:25:18 fan kernel: [ 20.979673] BTRFS error (device dm-16): could not find root 8 Mar 28 20:25:18 fan kernel: [ 20.979739] BTRFS error (device dm-16): could not find root 8 Mar 28 20:25:18 fan kernel: [ 20.980900] BTRFS error (device dm-16): could not find root 8 Mar 28 20:25:18 fan kernel: [ 20.980948] BTRFS error (device dm-16): could not find root 8 Mar 28 20:25:18 fan kernel: [ 20.981428] BTRFS error (device dm-16): could not find root 8 Mar 28 20:25:18 fan kernel: [ 20.981472] BTRFS error (device dm-16): could not find root 8 which is not detected by btrfs check. What is going on here? Greetings Marc
Re: Major HDD performance degradation on btrfs receive
But seriously, what are you doing that you can't lose more than 15 minutes of? Couldn't it be even 20 minutes, or a half-hour or an hour with 15 minute snapshots only on the ssds (yes, I know the raid0 factor, but the question still applies)? This is an artificial psychological limit. Sometimes when you're actively coding, it is quite sad to lose even 5 minutes of work, since productivity is not constant. This is why 15 minutes was chosen as something that is not too critical. There is no other real reason behind this limit other than how I feel about it. And perhaps more importantly for your data, btrfs is still considered "stabilizing, not fully stable and mature". Use without backups is highly discouraged, but I'd suggest that btrfs in its current state might not be what you're looking for if you can't deal with loss of more than 15 minutes worth of changes anyway. Be that as it may... Btrfs is definitely not yet optimized. In many cases it reads or writes only one device at a time, for instance, even in RaidN configuration. And there are definitely snapshot scaling issues altho at your newer 500 snapshots total that shouldn't be /too/ bad. As a (relatively) early adopter I'm fine using experimental stuff with extra safeties like backups (hey, I've used it even without those a while back:)). I fully acknowledge the current state of BTRFS and want to help make it even better by stressing issues that I and other users encounter, searching for solutions, etc. Dealing with reality, regardless of how or why, you currently have a situation of intolerably slow receives that needs addressed. From a practical perspective you said an ssd for backups is ridiculous and I can't disagree, but there's another "throw hardware at it" solution that might be a bit more reasonable... Spinning rust hard drives are cheap. What about getting another one, and alternating your backup receives between them? 
That would halve the load to one every thirty minutes, without changing your 15-minute snapshot and backup policy at all. =:^) So that gives you two choices for halving the load to the spinning rust. Either decide you really can live with half-hour loss of data, or throw only a relatively small amount of money (well, as long as you have room to plug in another sata device anyway, otherwise...) at it for a second backup device, and alternate between them. Yes, I'm leaning toward earning new hardware right now, fortunately, my laptop allows me to insert 2 x mSATA + 2 x 2.5 SATA drives, so I have exactly one 2.5 SATA slot free. OTOH, since you mentioned possible coding, optimization might not be a bad thing, if you're willing to put in the time necessary to get up to speed with the code and can work with the other devs in terms of timing, etc. But that will definitely take significant time even if you do it, and the alternating backup solution can be put to use as soon as you can get another device plugged in and setup. =:^) I'm not coding C/C++, so my capabilities to improve BTRFS itself are limited, but I'm always trying to find the reason and fix it instead of living with workarounds forever. I'll play with Seekwatcher and optimizing snapshot deletion and will post an update afterwards. Sincerely, Nazar Mokrynskyi github.com/nazar-pc Skype: nazar-pc Diaspora: naza...@diaspora.mokrynskyi.com Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249 On 17.03.16 09:00, Duncan wrote: Nazar Mokrynskyi posted on Wed, 16 Mar 2016 05:37:02 +0200 as excerpted: I'm not sure what you mean exactly by searching. My first SSD died during waking up from suspend mode, it worked perfectly till the last moment. It was not used for critical data at that time, but now I understand clearly that an SSD failure can happen at any time. Having RAID0 of 2 SSDs is 2 times riskier, so I'm not ready to lose anything beyond the 15-minute threshold. 
I'd rather end up having another HDD purely for backup purposes. I understand the raid0 N times the danger part, which is why I only ever used raid0 on stuff like the distro packages cache that I could easily redownload from the net, here. But seriously, what are you doing that you can't lose more than 15 minutes of? Couldn't it be even 20 minutes, or a half-hour or an hour with 15 minute snapshots only on the ssds (yes, I know the raid0 factor, but the question still applies)? What /would/ you do if you lost a whole hour's worth of work? Surely you could duplicate it in the next hour? Or are you doing securities trading or something, where you /can't/ recover work at all, because by then the market and your world have moved on? But in that case... And perhaps more importantly for your data, btrfs is still considered "stabilizing, not fully stable and mature". Use without backups is highly discouraged,
Re: Major HDD performance degradation on btrfs receive
Sounds like a really good idea! I'll try to implement it in my backup tool, but it might take some time to see real benefit from it (or no benefit :)). Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 16.03.16 06:18, Chris Murphy wrote: Very simplistically: visualizing Btrfs writes without file deletion, it's a contiguous write. There isn't much scatter, even accounting for metadata and data chunk writes happening in slightly different regions of platter space. (I'm thinking this slowdown happens overwhelmingly on HDDs.) If there are file deletions, holes appear, and now some later writes will fill those holes, but not exactly, which will lead to fragmentation and thus seek times. Seeks would go up by a lot the smaller the holes are. And the holes are smaller the fewer files are being deleted at once. If there's a snapshot, and then file deletions, holes don't appear. Everything is always copy on write and deleted files don't actually get deleted (they're still in another subvolume). So as soon as a file is reflinked or in a snapshotted subvolume, there's no fragmentation happening with file deletions. If there are many snapshots happening in a short time, such as once every 10 minutes, that means only 10 minutes worth of writes happening in a given subvolume. If that space is later released by deleting snapshots one at a time (like a rolling snapshot-and-delete strategy every 10 minutes), that means only small holes are opening up for later writes. It's maybe the worst case scenario for fragmenting Btrfs. A better way might be to delay snapshot deletion. Keep taking the snapshots, but delete old snapshots in batches. Delete maybe 10 or 100 (if we're talking thousands of snapshots) at once. This should free a lot more contiguous space for later writes and significantly reduce the chance of significant fragmentation.
Of course some fragmentation is going to happen no matter what, but I think the usage pattern described in a lot of these slowdown cases sounds to me like the worst-case scenario for CoW. Now, a less lazy person would actually test this hypothesis. Chris Murphy
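Chris's batching suggestion can be sketched as retention-policy code. This is a hypothetical sketch, not the thread's actual tooling: the snapshot naming, batch size, and retention window are assumptions, and real deletion would shell out to `btrfs subvolume delete` rather than run here.

```python
from datetime import datetime, timedelta

# Sketch of Chris Murphy's suggestion: keep taking frequent snapshots,
# but delete expired ones in large batches instead of one at a time,
# so the freed space opens up as big contiguous holes.
BATCH_SIZE = 100               # assumed batch size; tune for your snapshot count
RETENTION = timedelta(days=7)  # assumed retention window

def expired(snapshots, now):
    """Return snapshot names older than the retention window.

    `snapshots` maps snapshot name -> creation datetime.
    """
    return [name for name, ts in snapshots.items() if now - ts > RETENTION]

def deletion_batch(snapshots, now):
    """Only delete once a full batch has accumulated; otherwise wait.

    Waiting lets one big deletion free a large contiguous region
    instead of many 10-minute-sized holes.
    """
    old = sorted(expired(snapshots, now))
    return old if len(old) >= BATCH_SIZE else []

# In a real backup script, each returned name would be passed to
# `btrfs subvolume delete <path>` (deliberately not invoked here).
```

The policy logic is pure, so it can be unit-tested without touching a filesystem; only the final deletion step needs root and a real btrfs mount.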
bad metadata [125501440, 125517824) crossing stripe boundary
I was running btrfsck today and got many errors like these: bad metadata [125501440, 125517824) crossing stripe boundary bad metadata [131334144, 131350528) crossing stripe boundary bad metadata [142999552, 143015936) crossing stripe boundary bad metadata [153944064, 153960448) crossing stripe boundary bad metadata [281870336, 281886720) crossing stripe boundary bad metadata [528285696, 528302080) crossing stripe boundary bad metadata [661323776, 661340160) crossing stripe boundary bad metadata [986316800, 986333184) crossing stripe boundary bad metadata [987168768, 987185152) crossing stripe boundary bad metadata [1029111808, 1029128192) crossing stripe boundary bad metadata [1099169792, 1099186176) crossing stripe boundary I was able to find a message with a similar error on the mailing list from January, but didn't find any answer. I'm on kernel 4.5.0 stable and btrfs-tools 4.4. The filesystem was created at the beginning of July 2015. Here is btrfs scrub output: scrub status for 40b8240a-a0a2-4034-ae55-f8558c0343a8 scrub started at Wed Mar 16 04:13:54 2016 and finished after 00:52:51 total bytes scrubbed: 274.05GiB with 0 errors Looks like no metadata errors were found, so what do those "bad metadata" messages really mean? -- Sincerely, Nazar Mokrynskyi github.com/nazar-pc
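For intuition about what the checker is complaining about: as described in btrfs-progs changelogs of that era, this check flags a metadata extent whose byte range spans a stripe boundary (BTRFS_STRIPE_LEN, 64KiB). A minimal sketch of that condition follows; the exact in-tree check may differ by progs version, so treat this as an approximation. Notably, under this simple model a 16KiB node starting at a 64KiB-aligned offset (as in the ranges above) does not cross a boundary, which is worth comparing against the btrfs-progs version in use before trusting the report.

```python
STRIPE_LEN = 64 * 1024  # BTRFS_STRIPE_LEN: 64KiB

def crosses_stripe(start, end):
    """True if the half-open byte range [start, end) spans a 64KiB
    stripe boundary. Sketch of the 'crossing stripe boundary' test as
    commonly described; not copied from the btrfs-progs source."""
    return start // STRIPE_LEN != (end - 1) // STRIPE_LEN

# The first extent from the btrfsck output above: a 16KiB metadata node
# whose start offset happens to be 64KiB-aligned.
print(crosses_stripe(125501440, 125517824))  # False
# A hypothetical extent that straddles a boundary, for contrast:
print(crosses_stripe(61440, 77824))          # True
```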
Re: Major HDD performance degradation on btrfs receive
test log data structure revision number 0 Note: revision number not 1 implies that no selective self-test has ever been run SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 100 Completed [00% left] (0-65535) 200 Not_testing 300 Not_testing 400 Not_testing 500 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. gnome-disks says it has worked 15 days and 8 minutes I also almost never look at the backups, and when I do, indeed scanning through a 1000-snapshot fs on a spinning disk takes time. If a script does that every 15 mins, and the fs uses LZO compression and there is another active partition, then you will have to deal with the slowness. Well, it is not that bad and hard in reality. Every 15 minutes I'm transferring 3 diffs. Right now the HDD contains 453 subvolumes in total, with 34% of the 359 GiB partition space used. After writing the last message I decided to collect diffs for further analysis. The /home subvolume's diffs range from 6 to 270 MiB, typically 30-40 MiB. The /root subvolume's diffs range from 10 KiB to 380 MiB (during software updates), typically 40-80 KiB. The /web (source code here) subvolume's diffs range from bytes to 1 MiB, typically 150 KiB. So generally when I'm watching a movie or playing some game (not changing source code, not updating software and not doing anything that might cause significant changes in the /home subvolume) I'll get about 30 MiB of diff in total. This is not that much for a SATA3 HDD; it shouldn't stall for seconds at a time, so slow that video stops completely. Maybe BTRFS's construction requires this small diff to make a big party all over the HDD, I don't know, but there is some problem here for sure. You could adapt the script or backup method not to search every time, but to just write the next diff send|receive and only step back and search if this fails.
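Henk's "don't search every time" idea can be sketched as: remember the last snapshot that was successfully received, use it directly as the `-p` parent for `btrfs send`, and only fall back to scanning both sides for the newest common snapshot when that cached name is unusable. This is a hypothetical sketch; the function names and data shapes are assumptions, not taken from the actual just-backup-btrfs code.

```python
def newest_common(local, remote):
    """Fallback path: scan both sides for the newest snapshot on both.

    Assumes snapshot names sort chronologically (e.g. timestamped names).
    """
    common = set(local) & set(remote)
    return max(common) if common else None

def pick_parent(local, remote, last_sent=None):
    """Prefer the remembered parent; only search when it is unusable.

    `local`/`remote` are lists of snapshot names on each filesystem;
    `last_sent` is the cached name of the last successful transfer.
    """
    if last_sent and last_sent in local and last_sent in remote:
        return last_sent                 # cheap path: no scan of the slow HDD
    return newest_common(local, remote)  # expensive path, only on failure

# The chosen parent would then drive something like:
#   btrfs send -p /backup/<parent> /backup/<latest> | btrfs receive /backup_hdd
```

The design point is that the expensive enumeration of hundreds of subvolumes on the spinning disk happens only when the cached state is stale, not on every 15-minute cycle.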
Or keep more 15-min snapshots only on the SSD and lower the rate of send|receive to the HDD I'm not sure what you mean exactly by searching. My first SSD died during waking up from suspend mode; it worked perfectly till the last moment. It was not used for critical data at that time, but now I understand clearly that SSD failure can happen at any time. Having RAID0 of 2 SSDs is 2 times more risky, so I'm not ready to lose anything beyond the 15-minute threshold. I'd rather end up having another HDD purely for backup purposes. Interesting question: is there any tool to see the whole picture of how fragmented a btrfs partition is? I saw many tools for NTFS on Windows that show a nice picture, but not for Linux filesystems. Saw an answer on Stack Overflow about fsck, but btrfsck doesn't provide similar output. Also, I can't really run defragmentation anyway since all backups are read-only. Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 16.03.16 01:11, Henk Slager wrote: On Tue, Mar 15, 2016 at 1:47 AM, Nazar Mokrynskyi wrote: Some update since last time (few weeks ago). All filesystems are mounted with noatime, I've also added a mounting optimization - so there is no problem with remounting the filesystem every time, it is done only once. The remounting optimization helped by reducing 1 complete snapshot + send/receive cycle by some seconds, but otherwise it is still very slow when `btrfs receive` is active. OK, great that the umount+mount is gone. I think most time is unfortunately spent in seeks; I think over time and due to various factors, both free space and files are highly fragmented on your disk. It could also be that the disk is a bit older and has started, or is starting, to use its spare sectors.
I'm not considering bcache + btrfs as a potential setup because I do not currently have a free SSD for it, and basically spending an SSD besides the HDD for a backup partition feels like a bit of overkill (especially for desktop use). Yes I think so too; for backup, I am also a bit reluctant to use bcache. But the big difference is that you do a snapshot transfer every 15 minutes while I do that only every 24 hours. So I almost don't care how long the send|receive takes in the middle of the night. I also almost never look at the backups, and when I do, indeed scanning through a 1000-snapshot fs on a spinning disk takes time. If a script does that every 15 mins, and the fs uses LZO compression and there is another active partition, then you will have to deal with the slowness. And if the files are mostly small, like source trees, it gets even worse. So it is about 100x more creates+deletes of subvolumes. To be honest, it is just requiring too much from a HDD
Re: Major HDD performance degradation on btrfs receive
Some update since last time (few weeks ago). All filesystems are mounted with noatime, I've also added a mounting optimization - so there is no problem with remounting the filesystem every time, it is done only once. The remounting optimization helped by reducing 1 complete snapshot + send/receive cycle by some seconds, but otherwise it is still very slow when `btrfs receive` is active. I'm not considering bcache + btrfs as a potential setup because I do not currently have a free SSD for it, and basically spending an SSD besides the HDD for a backup partition feels like a bit of overkill (especially for desktop use). My current kernel is 4.5.0 stable, btrfs-tools still 4.4-1 from the Ubuntu 16.04 repository as of today. As I'm reading the mailing list, there are other folks having similar performance issues. So can we debug things to find the root cause and fix it at some point? My C/C++/kernel/BTRFS knowledge is scarce, which is why some assistance is needed here from someone more experienced. Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 25.02.16 03:04, Henk Slager wrote: On Wed, Feb 24, 2016 at 11:45 PM, Nazar Mokrynskyi wrote: Here is btrfs-show-super output: nazar-pc@nazar-pc ~> sudo btrfs-show-super /dev/sda1 superblock: bytenr=65536, device=/dev/sda1 - csum 0x1e3c6fb8 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 40b8240a-a0a2-4034-ae55-f8558c0343a8 label Backup generation 165491 root 143985360896 sys_array_size 226 chunk_root_generation 162837 root_level 1 chunk_root 247023583232 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 858993459200 bytes_used 276512202752 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4
cache_generation 165491 uuid_tree_generation 165491 dev_item.uuid 81eee7a6-774e-4bb5-8b72-cebb85a2f2ce dev_item.fsid 40b8240a-a0a2-4034-ae55-f8558c0343a8 [match] dev_item.type 0 dev_item.total_bytes 858993459200 dev_item.bytes_used 291072114688 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 It is sad that skinny metadata will only affect new data; probably I'll end up re-creating it :( Can I rebalance it or do something simple for this purpose? A balance won't help for that, and also your metadata does look quite compact already. But I think you should not expect so much of this skinny metadata on a PC with 16G RAM Those are quite typical values for an already heavily used btrfs on a HDD. Bad news, since I'm doing mounting/unmounting a few times during snapshot creation because of how BTRFS works (source code: https://github.com/nazar-pc/just-backup-btrfs/blob/master/just-backup-btrfs.php#L148) So if 10+20 seconds is typical, then in my case the HDD can be very busy during a minute or sometimes more; this is not good and basically part of, or even the real reason for, the initial question. Yes indeed! This mount/unmount every 15 minutes (or more times per 15 minutes) is killing performance IMO. At the moment I don't fully understand why you are bothered by the limitation you mention in the php source comments. I think it's definitely worth changing paths and/or your requirements in such a way that you can avoid the umount/mount. As a workaround, bcache with its cache device nicely filled over time will absolutely speed up the mount. But as you had some troubles with btrfs in the past and also use ext4 on the same disk because it is a more mature filesystem, you might not want bcache+btrfs for backup storage; it is up to you.
Re: Major HDD performance degradation on btrfs receive
Here is btrfs-show-super output: nazar-pc@nazar-pc ~> sudo btrfs-show-super /dev/sda1 superblock: bytenr=65536, device=/dev/sda1 - csum 0x1e3c6fb8 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 40b8240a-a0a2-4034-ae55-f8558c0343a8 label Backup generation 165491 root 143985360896 sys_array_size 226 chunk_root_generation 162837 root_level 1 chunk_root 247023583232 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 858993459200 bytes_used 276512202752 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 1 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4 cache_generation 165491 uuid_tree_generation 165491 dev_item.uuid 81eee7a6-774e-4bb5-8b72-cebb85a2f2ce dev_item.fsid 40b8240a-a0a2-4034-ae55-f8558c0343a8 [match] dev_item.type 0 dev_item.total_bytes 858993459200 dev_item.bytes_used 291072114688 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 It is sad that skinny metadata will only affect new data; probably I'll end up re-creating it :( Can I rebalance it or do something simple for this purpose? Those are quite typical values for an already heavily used btrfs on a HDD. Bad news, since I'm doing mounting/unmounting a few times during snapshot creation because of how BTRFS works (source code: https://github.com/nazar-pc/just-backup-btrfs/blob/master/just-backup-btrfs.php#L148) So if 10+20 seconds is typical, then in my case the HDD can be very busy during a minute or sometimes more; this is not good and basically part of, or even the real reason for, the initial question.
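The `incompat_flags 0x169` line in the dump above is just a bitmask of the feature names printed next to it. A small sketch decoding it, using the incompat bit values from the btrfs on-disk format (partial table, only the bits relevant here):

```python
# Incompat feature bits as defined in the btrfs on-disk format headers
# (subset: only the flags appearing in this superblock dump).
INCOMPAT_FLAGS = {
    0x001: "MIXED_BACKREF",
    0x008: "COMPRESS_LZO",
    0x020: "BIG_METADATA",
    0x040: "EXTENDED_IREF",
    0x100: "SKINNY_METADATA",
}

def decode_incompat(flags):
    """Return the feature names whose bits are set in incompat_flags."""
    return [name for bit, name in sorted(INCOMPAT_FLAGS.items()) if flags & bit]

print(decode_incompat(0x169))
# -> ['MIXED_BACKREF', 'COMPRESS_LZO', 'BIG_METADATA', 'EXTENDED_IREF',
#     'SKINNY_METADATA']
```

The sum 0x1 + 0x8 + 0x20 + 0x40 + 0x100 is exactly 0x169, which matches the names in the dump and confirms the point made in the thread: SKINNY_METADATA was already enabled, so `btrfstune -x` had nothing left to do.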
Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 24.02.16 23:32, Henk Slager wrote: On Tue, Feb 23, 2016 at 6:44 PM, Nazar Mokrynskyi wrote: Looks like btrfstune -x did nothing; probably it was already used at creation time. I'm using rcX versions of the kernel all the time and a rolling version of Ubuntu, so this is very likely to be the case. The command btrfs-show-super shows the features of the filesystem. You have 'dummy' single profiles on the HDD fs and that gives me a hint that you likely used older tools to create the fs. The current kernel does not set this feature flag on disk. If the flag was already set, then no difference in performance. If it was not set, then from now on, new metadata extents should be skinny, which saves on total memory size and processing for (the larger) filesystems. But for your existing data (snapshot subvolumes in your case) the metadata is then still non-skinny. So you won't notice an instant difference, only after all existing file blocks are re-written or removed. You will probably have a measurable difference if you equally fill 2 filesystems, one with and the other without the flag.
One thing I've noticed is much slower mount/umount on HDD than on SSD: nazar-pc@nazar-pc ~> time sudo umount /backup 0.00user 0.00system 0:00.01elapsed 36%CPU (0avgtext+0avgdata 7104maxresident)k 0inputs+0outputs (0major+784minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo mount /backup 0.00user 0.00system 0:00.03elapsed 23%CPU (0avgtext+0avgdata 7076maxresident)k 0inputs+0outputs (0major+803minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo umount /backup_hdd 0.00user 0.11system 0:01.04elapsed 11%CPU (0avgtext+0avgdata 7092maxresident)k 0inputs+15296outputs (0major+787minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo mount /backup_hdd 0.00user 0.02system 0:04.45elapsed 0%CPU (0avgtext+0avgdata 7140maxresident)k 14648inputs+0outputs (0major+795minor)pagefaults 0swaps It is especially long (tens of seconds with high HDD load) when called after some time, not consecutively. Once it took something like 20 seconds to unmount the filesystem and around 10 seconds to mount it. Those are quite typical values for an already heavily used btrfs on a HDD. About memory - 16 GiB of RAM should be enough I guess :) Can I measure somehow if seeking is a problem? I don't know a tool that can measure seek times and gather statistics over an extended period of time and relate that to filesystem-internal actions. It would be best if all this were done by the HDD firmware (under command of t
Re: Major HDD performance degradation on btrfs receive
Looks like btrfstune -x did nothing; probably it was already used at creation time. I'm using rcX versions of the kernel all the time and a rolling version of Ubuntu, so this is very likely to be the case. One thing I've noticed is much slower mount/umount on HDD than on SSD: nazar-pc@nazar-pc ~> time sudo umount /backup 0.00user 0.00system 0:00.01elapsed 36%CPU (0avgtext+0avgdata 7104maxresident)k 0inputs+0outputs (0major+784minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo mount /backup 0.00user 0.00system 0:00.03elapsed 23%CPU (0avgtext+0avgdata 7076maxresident)k 0inputs+0outputs (0major+803minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo umount /backup_hdd 0.00user 0.11system 0:01.04elapsed 11%CPU (0avgtext+0avgdata 7092maxresident)k 0inputs+15296outputs (0major+787minor)pagefaults 0swaps nazar-pc@nazar-pc ~> time sudo mount /backup_hdd 0.00user 0.02system 0:04.45elapsed 0%CPU (0avgtext+0avgdata 7140maxresident)k 14648inputs+0outputs (0major+795minor)pagefaults 0swaps It is especially long (tens of seconds with high HDD load) when called after some time, not consecutively. Once it took something like 20 seconds to unmount the filesystem and around 10 seconds to mount it. Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 22.02.16 20:58, Nazar Mokrynskyi wrote: On Tue, Feb 16, 2016 at 5:44 AM, Nazar Mokrynskyi wrote: > I have 2 SSD with BTRFS filesystem (RAID) on them and several subvolumes. > Each 15 minutes I'm creating read-only snapshot of subvolumes /root, /home > and /web inside /backup. > After this I'm searching for last common subvolume on /backup_hdd, sending > difference between latest common snapshot and simply latest snapshot to > /backup_hdd. > On top of all above there is snapshots rotation, so that /backup contains > much less snapshots than /backup_hdd.
> > I'm using this setup for last 7 months or so and this is luckily the longest > period when I had no problems with BTRFS at all. > However, last 2+ months btrfs receive command loads HDD so much that I can't > even get list of directories in it. > This happens even if diff between snapshots is really small. > HDD contains 2 filesystems - mentioned BTRFS and ext4 for other files, so I > can't even play mp3 file from ext4 filesystem while btrfs receive is > running. > Since I'm running everything each 15 minutes this is a real headache. > > My guess is that performance hit might be caused by filesystem fragmentation > even though there is more than enough empty space. But I'm not sure how to > properly check this and can't, obviously, run defragmentation on read-only > subvolumes. > > I'll be thankful for anything that might help to identify and resolve this > issue. > > ~> uname -a > Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 x86_64 > x86_64 x86_64 GNU/Linux > > ~> btrfs --version > btrfs-progs v4.4 > > ~> sudo btrfs fi show > Label: none uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030 > Total devices 2 FS bytes used 71.00GiB > devid 1 size 111.30GiB used 111.30GiB path /dev/sdb2 > devid 2 size 111.30GiB used 111.29GiB path /dev/sdc2 > > Label: 'Backup' uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8 > Total devices 1 FS bytes used 252.54GiB > devid 1 size 800.00GiB used 266.08GiB path /dev/sda1 > > ~> sudo btrfs fi df / > Data, RAID0: total=214.56GiB, used=69.10GiB > System, RAID1: total=8.00MiB, used=16.00KiB > System, single: total=4.00MiB, used=0.00B > Metadata, RAID1: total=4.00GiB, used=1.87GiB > Metadata, single: total=8.00MiB, used=0.00B > GlobalReserve, single: total=512.00MiB, used=0.00B > > ~> sudo btrfs fi df /backup_hdd > Data, single: total=245.01GiB, used=243.61GiB > System, DUP: total=32.00MiB, used=48.00KiB > System, single: total=4.00MiB, used=0.00B > Metadata, DUP: total=10.50GiB, used=8.93GiB > Metadata, single: total=8.00MiB, used=0.00B > GlobalReserve, single: total=512.00MiB, used=0.00B > > Relevant mount options: > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 / btrfs > compress=lzo,noatime,relatime,ssd,subvol=/root 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /home btrfs > compress=lzo,noatime,relatime,ssd,subvol=/home 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /backup btrfs > compress=lzo,noatime,relatime,ssd,subvol=/backup 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /web btrfs > compress=lzo,noatime,relatime,ssd,subvol=/web 0 1 > UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8 /backup_hdd btrfs > compress=lzo,noatime,relatime,noexec 0 1 As alrea
Re: Major HDD performance degradation on btrfs receive
Wow, this is interesting, didn't know it. I'll probably try noatime instead :) Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 23.02.16 18:29, Alexander Fougner wrote: 2016-02-23 18:18 GMT+01:00 Nazar Mokrynskyi : But why? I have the relatime option; it should not cause changes unless file contents are actually changed, if I understand this option correctly. *or* if it is older than 1 day. From the manpages: relatime Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than the current modify or change time. (Similar to noatime, but it doesn't break mutt or other applications that need to know if a file has been read since the last time it was modified.) Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime was specified), and the strictatime option is required to obtain traditional semantics. In addition, since Linux 2.6.30, the file's last access time is always updated if it is more than 1 day old. <<<<< Also, if you only use relatime, then you don't need to specify it; it's the default since 2.6.30 as mentioned above. On 23.02.16 18:05, Alexander Fougner wrote: 2016-02-23 17:55 GMT+01:00 Nazar Mokrynskyi : What is wrong with noatime,relatime? I'm using them for a long time as a good compromise in terms of performance. The one option ends up canceling the other, as they're both atime-related options that say to do different things.
I'd have to actually set up a test or do some research to be sure which one overrides the other (but someone here probably can say without further research), tho I'd /guess/ the latter one overrides the earlier one, which would effectively make them both pretty much useless, since relatime is the normal kernel default and thus doesn't need to be specified. Noatime is strongly recommended for btrfs, however, particularly with snapshots, as otherwise the changes between snapshots can consist mostly of generally useless atime changes. (FWIW, after over a decade of using noatime here (I first used it on the then-new reiserfs, after finding a recommendation for it on that), I got tired of specifying the option on nearly all my fstab entries, and nowadays carry a local kernel patch that changes the default to noatime, allowing me to drop specifying it everywhere. I don't claim to be a coder, let alone a kernel-level coder, but as a gentooer used to building from source for over a decade, I've found that I can often find the code behind some behavior I'd like to tweak, and given good enough comments, I can often create trivial patches to accomplish that tweak, even if it's not exactly the code a real C coder would choose to use, which is exactly what I've done here. So now, unless some other atime option is specified, my filesystems are all mounted noatime. =:^) Well, then I'll leave relatime on the root fs and noatime on the partition with snapshots, thanks. If you snapshot the root filesystem then the atime changes will still be there, and you'll be having a lot of unnecessary changes between each snapshot.
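The relatime rule quoted from the manpage boils down to a three-way test, and modeling it makes clear why relatime still dirties metadata that every snapshot diff then carries. A small sketch of that decision (timestamps as plain Unix seconds; this models the documented rule, not the kernel source):

```python
DAY = 24 * 60 * 60  # seconds

def relatime_updates_atime(atime, mtime, ctime, now):
    """Model of the relatime rule quoted from the manpage: atime is
    updated on read if it is older than mtime or ctime, or if it is
    more than 1 day old (the since-2.6.30 behavior)."""
    return atime < mtime or atime < ctime or now - atime > DAY

def noatime_updates_atime(atime, mtime, ctime, now):
    """noatime: reads never update atime, so snapshot diffs stay clean."""
    return False

# A file read again within a day, unmodified in between: relatime
# dirties nothing on the second read...
print(relatime_updates_atime(atime=1000, mtime=500, ctime=500, now=2000))  # False
# ...but reading it a day later updates atime anyway, which is exactly
# the churn Duncan warns ends up in every snapshot diff:
print(relatime_updates_atime(atime=1000, mtime=500, ctime=500, now=1000 + DAY + 1))  # True
```

The once-a-day clause is why "relatime on the root fs" still produces a steady trickle of metadata changes between 15-minute snapshots, while noatime produces none.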
Re: Major HDD performance degradation on btrfs receive
But why? I have the relatime option; it should not cause changes unless file contents are actually changed, if I understand this option correctly. Sincerely, Nazar Mokrynskyi github.com/nazar-pc On 23.02.16 18:05, Alexander Fougner wrote: 2016-02-23 17:55 GMT+01:00 Nazar Mokrynskyi : What is wrong with noatime,relatime? I'm using them for a long time as a good compromise in terms of performance. The one option ends up canceling the other, as they're both atime-related options that say to do different things. I'd have to actually set up a test or do some research to be sure which one overrides the other (but someone here probably can say without further research), tho I'd /guess/ the latter one overrides the earlier one, which would effectively make them both pretty much useless, since relatime is the normal kernel default and thus doesn't need to be specified. Noatime is strongly recommended for btrfs, however, particularly with snapshots, as otherwise the changes between snapshots can consist mostly of generally useless atime changes. (FWIW, after over a decade of using noatime here (I first used it on the then-new reiserfs, after finding a recommendation for it on that), I got tired of specifying the option on nearly all my fstab entries, and nowadays carry a local kernel patch that changes the default to noatime, allowing me to drop specifying it everywhere. I don't claim to be a coder, let alone a kernel-level coder, but as a gentooer used to building from source for over a decade, I've found that I can often find the code behind some behavior I'd like to tweak, and given good enough comments, I can often create trivial patches to accomplish that tweak, even if it's not exactly the code a real C coder would choose to use, which is exactly what I've done here.
So now, unless some other atime option is specified, my filesystems are all mounted noatime. =:^) Well, then I'll leave relatime on root fs and noatime on partition with snapshots, thanks. If you snapshot the root filesystem then the atime changes will still be there, and you'll be having a lot of unnecessary changes between each snapshot.
Re: Major HDD performance degradation on btrfs receive
> What is wrong with noatime,relatime? I'm using them for a long time as > good compromise in terms of performance. The one option ends up canceling the other, as they're both atime-related options that say to do different things. I'd have to actually set up a test or do some research to be sure which one overrides the other (but someone here probably can say without further research), tho I'd /guess/ the latter one overrides the earlier one, which would effectively make them both pretty much useless, since relatime is the normal kernel default and thus doesn't need to be specified. Noatime is strongly recommended for btrfs, however, particularly with snapshots, as otherwise the changes between snapshots can consist mostly of generally useless atime changes. (FWIW, after over a decade of using noatime here (I first used it on the then-new reiserfs, after finding a recommendation for it on that), I got tired of specifying the option on nearly all my fstab entries, and nowadays carry a local kernel patch that changes the default to noatime, allowing me to drop specifying it everywhere. I don't claim to be a coder, let alone a kernel-level coder, but as a gentooer used to building from source for over a decade, I've found that I can often find the code behind some behavior I'd like to tweak, and given good enough comments, I can often create trivial patches to accomplish that tweak, even if it's not exactly the code a real C coder would choose to use, which is exactly what I've done here. So now, unless some other atime option is specified, my filesystems are all mounted noatime. =:^) Well, then I'll leave relatime on root fs and noatime on partition with snapshots, thanks. Sincerely, Nazar Mokrynskyi github.com/nazar-pc
Re: Major HDD performance degradation on btrfs receive
On Tue, Feb 16, 2016 at 5:44 AM, Nazar Mokrynskyi wrote: > I have 2 SSD with BTRFS filesystem (RAID) on them and several subvolumes. > Each 15 minutes I'm creating read-only snapshot of subvolumes /root, /home > and /web inside /backup. > After this I'm searching for last common subvolume on /backup_hdd, sending > difference between latest common snapshot and simply latest snapshot to > /backup_hdd. > On top of all above there is snapshots rotation, so that /backup contains > much less snapshots than /backup_hdd. > > I'm using this setup for last 7 months or so and this is luckily the longest > period when I had no problems with BTRFS at all. > However, last 2+ months btrfs receive command loads HDD so much that I can't > even get list of directories in it. > This happens even if diff between snapshots is really small. > HDD contains 2 filesystems - mentioned BTRFS and ext4 for other files, so I > can't even play mp3 file from ext4 filesystem while btrfs receive is > running. > Since I'm running everything each 15 minutes this is a real headache. > > My guess is that performance hit might be caused by filesystem fragmentation > even though there is more than enough empty space. But I'm not sure how to > properly check this and can't, obviously, run defragmentation on read-only > subvolumes. > > I'll be thankful for anything that might help to identify and resolve this > issue. 
> > ~> uname -a > Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 x86_64 > x86_64 x86_64 GNU/Linux > > ~> btrfs --version > btrfs-progs v4.4 > > ~> sudo btrfs fi show > Label: none uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030 > Total devices 2 FS bytes used 71.00GiB > devid 1 size 111.30GiB used 111.30GiB path /dev/sdb2 > devid 2 size 111.30GiB used 111.29GiB path /dev/sdc2 > > Label: 'Backup' uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8 > Total devices 1 FS bytes used 252.54GiB > devid 1 size 800.00GiB used 266.08GiB path /dev/sda1 > > ~> sudo btrfs fi df / > Data, RAID0: total=214.56GiB, used=69.10GiB > System, RAID1: total=8.00MiB, used=16.00KiB > System, single: total=4.00MiB, used=0.00B > Metadata, RAID1: total=4.00GiB, used=1.87GiB > Metadata, single: total=8.00MiB, used=0.00B > GlobalReserve, single: total=512.00MiB, used=0.00B > > ~> sudo btrfs fi df /backup_hdd > Data, single: total=245.01GiB, used=243.61GiB > System, DUP: total=32.00MiB, used=48.00KiB > System, single: total=4.00MiB, used=0.00B > Metadata, DUP: total=10.50GiB, used=8.93GiB > Metadata, single: total=8.00MiB, used=0.00B > GlobalReserve, single: total=512.00MiB, used=0.00B > > Relevant mount options: > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 / btrfs > compress=lzo,noatime,relatime,ssd,subvol=/root 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /home btrfs > compress=lzo,noatime,relatime,ssd,subvol=/home 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /backup btrfs > compress=lzo,noatime,relatime,ssd,subvol=/backup 0 1 > UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /web btrfs > compress=lzo,noatime,relatime,ssd,subvol=/web 0 1 > UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8 /backup_hdd btrfs > compress=lzo,noatime,relatime,noexec 0 1 As already indicated by Duncan, the amount of snapshots might be just too much. The fragmentation on the HDD might have become very high. If there is a limited amount of RAM in the system (so limited caching), too much time is lost in seeks.
In addition: compress=lzo also increases the chance of scattered fragments and fragmentation.

noatime,relatime: I am not sure why you have both. Hopefully you have the actual mount listed as noatime.

You could use the principles of the tool/package called snapper to do a sort of non-linear snapshot thinning: the further back in time, the more widely spaced the snapshots over a given timeframe.

You could use skinny metadata (recreate the fs with newer tools, or use btrfstune -x on /dev/sda1). I think at the moment this flag is not enabled on /dev/sda1.

If you put just 1 btrfs fs on the HDD (so move all the content from the ext4 fs into the btrfs fs) you might get better overall performance. I assume the ext4 fs is on the second (slower) part of the HDD, and that is a disadvantage I think. But you probably have reasons why the setup is the way it is.

I've replied to Duncan's message about the number of snapshots; there is snapshot rotation and the number of snapshots is quite small, 491 in total. About memory - 16 GiB of RAM should be enough I guess :)

Can I measure somehow whether seeking is the problem?

What is wrong with noatime,relatime? I've been using them for a long time as a good compromise in terms of performance.
Re: Major HDD performance degradation on btrfs receive
you'll have 2-3 days' worth of hourly snapshots on LABEL=Backup, so up to 72 hourly snapshots per subvolume.

If after the third day you thin down to 2-hourly, 12/day, cutting out half, you'll have five days of 12/day/subvolume, 60 snapshots per subvolume, plus the 72, 132 snapshots per subvolume total, to 8 days out, so you can recover over a week's worth at at least 2-hourly, if needed.

If then on the 32nd day (giving you a month's worth of at least 2X/day) you thin those down to twice-a-day snapshots, that's 24 days of 2X/day or 48 snapshots per subvolume, plus the 132 from before, 180 snapshots per subvolume total, now.

If then on the 92nd day (giving you two more months of 2X/day, a quarter's worth of at least 2X/day) you thin every other one, to one per day, you have 60 days @ 2X/day or 120 snapshots per subvolume, plus the 180 we had already, 300 snapshots per subvolume, now.

OK, so we're already over our target 250/subvolume, so we could thin a bit more drastically. However, we're only snapshotting three subvolumes, so we can afford a bit of lenience on the per-subvolume cap, as that figure assumes 4-8 snapshotted subvolumes, and we're still well under our total-filesystem snapshot cap.

If you then keep another quarter's worth of daily snapshots, out to 183 days, that's 91 days of daily snapshots, 91 per subvolume, on top of the 300 we had, so now 391 snapshots per subvolume.

If you then thin to weekly snapshots, cutting 6/7, and keep them around for another 27 weeks (just over half a year, thus over a year total), that's 27 more snapshots per subvolume, plus the 391 we had, 418 snapshots per subvolume total.

418 snapshots per subvolume total, starting at 3-4X per hour to /backup and hourly to LABEL=Backup, thinning down gradually to weekly after six months and keeping that for the rest of the year. Given that you're snapshotting three subvolumes, that's 1254 snapshots total, still well within the 1000-2000 total snapshots per filesystem target cap.
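The running totals in that schedule can be checked mechanically. A minimal shell sketch, using the per-tier counts exactly as given above:

```shell
# Snapshots retained per subvolume under the thinning schedule above.
hourly=72        # 3 days of hourly snapshots
two_hourly=60    # 5 days at 12/day, out to 8 days
month=48         # 24 days at 2X/day, out to 32 days
quarter=120      # 60 days at 2X/day, out to 92 days
daily=91         # 91 days of daily snapshots, out to 183 days
weekly=27        # 27 more weeks of weekly snapshots

per_subvolume=$((hourly + two_hourly + month + quarter + daily + weekly))
total=$((per_subvolume * 3))   # three snapshotted subvolumes

echo "$per_subvolume per subvolume, $total total"
```

This reproduces the 418-per-subvolume and 1254-total figures, comfortably inside the 1000-2000 per-filesystem target cap mentioned below.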
During that year, if the data is worth it, you should have done an offsite or at least offline backup, we'll say quarterly. After that, keeping the local online backup around is merely for convenience, and with quarterly backups, after a year you have multiple copies and can simply delete the year-old snapshots, one a week, probably at the same time you thin the six-month-old daily snapshots down to weekly.

Compare that, just over 1200 snapshots, to the 60K+ snapshots you may have now, knowing that scaling over 10K snapshots is an issue particularly on spinning rust, and you should be able to appreciate the difference it's likely to make. =:^)

But at the same time, in practice it'll probably be much easier to actually retrieve something from a snapshot a few months old, because you won't have tens of thousands of effectively useless snapshots to sort thru, as you will have been regularly thinning them down! =:^)

> ~> uname [-r]
> 4.5.0-rc4-haswell
>
> ~> btrfs --version
> btrfs-progs v4.4

You're staying current with your btrfs versions. Kudos on that! =:^) And on including btrfs fi show and btrfs fi df, as they were useful, tho I'm snipping them here.

One more tip. Btrfs quotas are known to have scaling issues as well. If you're using them, they'll exacerbate the problem. And while I'm not sure about current 4.4 status, thru 4.3 at least they were buggy and not reliable anyway. So the recommendation is to leave quotas off on btrfs, and use some other more mature filesystem where they're known to work reliably if you really need them.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

First of all, sorry for the delay; for whatever reason I was not subscribed to the mailing list. You are right, the RAID is on 2 SSDs and backup_hdd (LABEL=Backup) is indeed a separate HDD. The example was simplified to give an overview, to not dig too deep into details.
I actually have proper backup rotation, so we are not talking about thousands of snapshots :)

Here is the tool I've created and am using right now: https://github.com/nazar-pc/just-backup-btrfs

I'm keeping all snapshots for the last day, up to 90 for the last month and up to 48 throughout the year. So as a result there are:
* 166 snapshots in /backup_hdd/root
* 166 snapshots in /backup_hdd/home
* 159 snapshots in /backup_hdd/web

I'm not using quotas, and there is nothing on this BTRFS partition besides the mentioned snapshots.

--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249
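Counts like these are easy to sanity-check, since `btrfs subvolume list -s` prints one line per snapshot under a path. A sketch (needs root on a live system; the `path root/` filter is a hypothetical example of narrowing to one source subvolume, and the helper below is just the generic counting step):

```shell
# On the live system this would be something like:
#   sudo btrfs subvolume list -s /backup_hdd | wc -l
# or, filtering on a (hypothetical) per-subvolume path prefix:
#   sudo btrfs subvolume list -s /backup_hdd | grep -c 'path root/'

# Generic stand-in for the counting step, usable on any listing:
count_snapshots() {
    grep -c .    # counts non-empty lines on stdin
}
```

The `-s` flag restricts the listing to snapshot subvolumes, so plain subvolumes on the same filesystem don't inflate the count.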
Major HDD performance degradation on btrfs receive
I have 2 SSDs with a BTRFS filesystem (RAID) on them and several subvolumes. Every 15 minutes I'm creating a read-only snapshot of the subvolumes /root, /home and /web inside /backup. After this I'm searching for the last common subvolume on /backup_hdd and sending the difference between the latest common snapshot and simply the latest snapshot to /backup_hdd. On top of all of the above there is snapshot rotation, so that /backup contains far fewer snapshots than /backup_hdd.

I've been using this setup for the last 7 months or so, and this is luckily the longest period in which I had no problems with BTRFS at all. However, for the last 2+ months the btrfs receive command loads the HDD so much that I can't even get a list of the directories on it. This happens even if the diff between snapshots is really small. The HDD contains 2 filesystems - the mentioned BTRFS one and ext4 for other files - so I can't even play an mp3 file from the ext4 filesystem while btrfs receive is running. Since I'm running everything every 15 minutes, this is a real headache.

My guess is that the performance hit might be caused by filesystem fragmentation, even though there is more than enough empty space. But I'm not sure how to properly check this and, obviously, can't run defragmentation on read-only subvolumes.

I'll be thankful for anything that might help to identify and resolve this issue.
~> uname -a
Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 x86_64 x86_64 x86_64 GNU/Linux

~> btrfs --version
btrfs-progs v4.4

~> sudo btrfs fi show
Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
    Total devices 2  FS bytes used 71.00GiB
    devid 1 size 111.30GiB used 111.30GiB path /dev/sdb2
    devid 2 size 111.30GiB used 111.29GiB path /dev/sdc2

Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
    Total devices 1  FS bytes used 252.54GiB
    devid 1 size 800.00GiB used 266.08GiB path /dev/sda1

~> sudo btrfs fi df /
Data, RAID0: total=214.56GiB, used=69.10GiB
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=4.00GiB, used=1.87GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

~> sudo btrfs fi df /backup_hdd
Data, single: total=245.01GiB, used=243.61GiB
System, DUP: total=32.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=10.50GiB, used=8.93GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Relevant mount options:
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /           btrfs compress=lzo,noatime,relatime,ssd,subvol=/root    0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /home       btrfs compress=lzo,noatime,relatime,ssd,subvol=/home    0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /backup     btrfs compress=lzo,noatime,relatime,ssd,subvol=/backup  0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030 /web        btrfs compress=lzo,noatime,relatime,ssd,subvol=/web     0 1
UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8 /backup_hdd btrfs compress=lzo,noatime,relatime,noexec              0 1

--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249
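The incremental send/receive cycle described in this thread can be sketched roughly as below. This is a hypothetical reconstruction, not the poster's actual script (his tool is linked elsewhere in the thread): the paths and snapshot names are made up, and snapshot names are assumed to sort chronologically and contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the 15-minute cycle: snapshot, find the newest
# snapshot present on BOTH filesystems, then send only the delta.

src_dir=/backup/home          # read-only snapshots on the SSD pool
dst_dir=/backup_hdd/home      # archived copies on the HDD

# Newest snapshot name that exists on both sides: the incremental parent.
find_parent() {
    for snap in $(ls "$1" 2>/dev/null | sort -r); do
        if [ -e "$2/$snap" ]; then
            echo "$snap"
            return 0
        fi
    done
    return 1
}

latest=$(ls "$src_dir" 2>/dev/null | sort | tail -n 1)
parent=$(find_parent "$src_dir" "$dst_dir")

# With a common parent only the difference is streamed; without one a
# full send is required (both need root, hence left commented here):
#   btrfs send -p "$src_dir/$parent" "$src_dir/$latest" | btrfs receive "$dst_dir"
#   btrfs send "$src_dir/$latest" | btrfs receive "$dst_dir"
```

The `-p` option names the parent snapshot, so `btrfs send` emits only the changes relative to it; this is why a common ancestor must exist on the receiving side before an incremental transfer is possible.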
Re: Btrfs wiki account
Thanks Dave, I was finally approved, but with the "Nazar Mokrynskyi2" username :)

Any chance to update the username to "Nazar Mokrynskyi" (without the "2" at the end)? I've already changed the Real name. I tried to reply to the admin's email, but it doesn't actually accept emails, so I have to ask here again.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 29.08.15 11:24, David Sterba wrote:
> On Sat, Aug 29, 2015 at 03:21:53AM +0200, Nazar Mokrynskyi wrote:
>> I wanted to add one more tool for incremental backups to the wiki, but
>> accidentally had a typo in my email at registration. Now, more than one
>> month later, I still can't register, though the registration request
>> should have expired already. Does anyone have access to fix that? I
>> can't find any contact for the person who supports the wiki.
> Forwarded your request to the mighty wiki admin.
Btrfs wiki account
I wanted to add one more tool for incremental backups to the wiki, but accidentally had a typo in my email at registration. Now, more than one month later, I still can't register, though the registration request should have expired already. Does anyone have access to fix that? I can't find any contact for the person who supports the wiki.

The accounts may be under the names "Nazar Mokrynskyi" and "Nazar Mokrynskyi2" (yes, second trial). Sorry for the slightly off-topic message.

--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249