Re: corrupt leaf, bad key order on kernel 5.0

2019-04-05 Thread Nazar Mokrynskyi
05.04.19 22:32, Hugo Mills wrote:
>> Yet another corruption of my root BTRFS filesystem happened today.
>> Didn't bother to run scrub, balance or check, just created disk image for 
>> future investigation and restored everything from backup.
>>
>> Here is what corruption looks like:
>> [  274.241339] BTRFS info (device dm-0): disk space caching is enabled
>> [  274.241344] BTRFS info (device dm-0): has skinny extents
>> [  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
>> [  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268 
>> block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) 
>> current (1240717 76 41451520)
>"Bad key order" is usually an indicator of faulty RAM -- a piece of
> metadata gets loaded into RAM for modification, a bit gets flipped in
> it (because the bit is stuck on one value), and then the csum is
> computed for the page (including the faulty bit), and written out to
> disk. In this case, it's not obvious, but I'd suggest that the second
> field of the key has been flipped, as 108 is 0x6c, and 76 is 0x4c --
> one bit away from each other.
>
>I recommend you check your hardware thoroughly before attempting to
> rebuild the FS.
>
>Hugo.
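
(For reference, the single-bit difference described above can be checked with plain
shell arithmetic: XORing the two key type values isolates the bits in which they differ.)

$ printf '0x%x\n' $((108 ^ 76))
0x20

0x20 has exactly one bit set, so the two values indeed differ in a single bit position.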

Hm... this might indeed be related to the RAM being overclocked a bit too much. It
worked fine for a long time, but apparently it is not 100% stable.
I've rolled back the overclock, thanks for the suggestion!

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc



corrupt leaf, bad key order on kernel 5.0

2019-04-05 Thread Nazar Mokrynskyi
NOTE: I do not need help with recovery; I have fully automated snapshots, backups
and restoration mechanisms. The only purpose of this email is to help developers
find the cause of yet another filesystem corruption and hopefully fix it.

Yet another corruption of my root BTRFS filesystem happened today.
I didn't bother to run scrub, balance or check; I just created a disk image for
future investigation and restored everything from backup.
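
(For the record, the image is just a raw dump of the LUKS-mapped device, taken from a
live environment before touching anything else; roughly the following, with placeholder
paths for the device and the output file:)

dd if=/dev/mapper/luks-root of=/mnt/scratch/corrupted-root.img bs=4M status=progress conv=fsync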

Here is what corruption looks like:
[  274.241339] BTRFS info (device dm-0): disk space caching is enabled
[  274.241344] BTRFS info (device dm-0): has skinny extents
[  274.283238] BTRFS info (device dm-0): enabling ssd optimizations
[  310.436672] BTRFS critical (device dm-0): corrupt leaf: root=268 
block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current 
(1240717 76 41451520)
[  310.449304] BTRFS critical (device dm-0): corrupt leaf: root=268 
block=42044719104 slot=123, bad key order, prev (1240717 108 41447424) current 
(1240717 76 41451520)
[  310.449309] BTRFS: error (device dm-0) in btrfs_drop_snapshot:9250: 
errno=-5 IO failure
[  310.449311] BTRFS info (device dm-0): forced readonly
[  311.266789] BTRFS info (device dm-0): delayed_refs has NO entry
[  311.277088] BTRFS error (device dm-0): cleaner transaction attach returned 
-30

My system just froze when I was not looking at it, and this is the state it is
in now.
The filesystem survived from March 8th until April 5th, one of the fastest
corruptions in my experience.

It looks like this happened while sending an incremental snapshot to the other BTRFS
filesystem, since the last snapshot on that one was not read-only, as it otherwise
would have been.

I'm on Ubuntu 19.04 with Linux kernel 5.0.5 and btrfs-progs v4.20.2.

My filesystem is on top of LUKS on an NVMe SSD (SM961). I have 3 snapshots created
every 15 minutes from 3 subvolumes, with rotation of old snapshots (there can be from
tens to hundreds of snapshots at any time).
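
(The snapshotting itself is nothing exotic; per subvolume it amounts to roughly the
following, with placeholder paths and an arbitrary retention policy:)

# every 15 minutes: take a read-only snapshot of the subvolume
btrfs subvolume snapshot -r /home /snapshots/home/$(date +%Y-%m-%d_%H:%M:%S)
# rotation: once the retention limit is exceeded, drop the oldest snapshots
btrfs subvolume delete /snapshots/home/2019-03-08_00:00:00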

Mount options: compress=lzo,noatime,ssd

I have a full disk image of the corrupted filesystem and will create Qcow2
snapshots of it, so if you want me to run any experiments, including potentially
destructive ones and ones using custom patches to btrfs-progs to find the cause of
the corruption, I would be happy to help as much as I can.
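
(The throwaway images are just qcow2 overlays backed by the raw dump, so each
experiment can be discarded without touching the original; roughly, with placeholder
file names:)

qemu-img create -f qcow2 -o backing_file=corrupted-root.img,backing_fmt=raw experiment-1.qcow2
qemu-img info experiment-1.qcow2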

P.S. I'm running the latest stable and rc kernels all the time, and during the last
6 months I've had about as many corruptions of different BTRFS filesystems as during
the 3 years before that, which is really worrying if you ask me.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc



Another btrfs corruption (unable to find ref byte, kernel 5.0-rc4)

2019-02-06 Thread Nazar Mokrynskyi
NOTE: I don't need assistance with data recovery; everything was restored
shortly afterwards from fully automated backups. I only hope this information is
useful to developers in some way.

So my primary BTRFS filesystem corrupted itself again.

Software:
Kernel 5.0-rc4
btrfs-progs v4.20.1
Ubuntu 19.04 (development branch)

It is running on NVMe SSD on top of full-disk LUKS with BFQ scheduler.

Mounting options (multiple subvolumes like this):
compress=lzo,noatime,ssd,subvol=/root

Also, I'm using automated snapshots a lot, and there are a few directories with
CoW disabled, if that matters.
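
(For context, disabling CoW per directory is typically done with chattr +C; the flag
only affects files created after it is set. The path below is just a placeholder:)

chattr +C /var/lib/example-nocow-dir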

Filesystem was created with Ubuntu 18.10 Live USB (kernel 4.18.0-10-generic, 
btrfs-progs v4.16.1).

It corrupted itself and remounted in a read-only state, so I had to hard reset the
machine and run scrub from an Ubuntu 18.10 Live USB:

> [   49.674792] Btrfs loaded, crc32c=crc32c-intel
> [   49.679962] BTRFS: device fsid 5170aca4-061a-4c6c-ab00-bd7fc8ae6030 devid 
> 1 transid 178701 /dev/dm-0
> [   52.199834] BTRFS info (device dm-0): disk space caching is enabled
> [   52.199839] BTRFS info (device dm-0): has skinny extents
> [   52.239346] BTRFS info (device dm-0): enabling ssd optimizations
> [   82.909833] WARNING: CPU: 14 PID: 6082 at fs/btrfs/extent-tree.c:6944 
> __btrfs_free_extent.isra.72+0x751/0xac0 [btrfs]
> [   82.909834] Modules linked in: btrfs zstd_compress libcrc32c xor raid6_pq 
> dm_crypt algif_skcipher af_alg intel_rapl x86_pkg_temp_thermal 
> intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_intel kvm 
> snd_usb_audio snd_hda_codec snd_usbmidi_lib snd_hda_core snd_hwdep snd_pcm 
> irqbypass crct10dif_pclmul snd_seq_midi snd_seq_midi_event crc32_pclmul 
> snd_rawmidi ghash_clmulni_intel uvcvideo pcbc snd_seq videobuf2_vmalloc 
> videobuf2_memops videobuf2_v4l2 videobuf2_common snd_seq_device snd_timer 
> aesni_intel videodev snd aes_x86_64 crypto_simd cdc_acm soundcore media 
> input_leds joydev cryptd mei_me glue_helper intel_wmi_thunderbolt 
> intel_cstate mei intel_rapl_perf acpi_pad mac_hid sch_fq_codel parport_pc 
> ppdev lp parport ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror 
> dm_region_hash dm_log
> [   82.909854]  hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 
> amdgpu nouveau chash gpu_sched ttm mxm_wmi drm_kms_helper syscopyarea 
> sysfillrect sysimgblt nvme fb_sys_fops igb e1000e atlantic drm ahci dca 
> i2c_algo_bit nvme_core libahci wmi video
> [   82.909864] CPU: 14 PID: 6082 Comm: btrfs-cleaner Not tainted 
> 4.18.0-10-generic #11-Ubuntu
> [   82.909864] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> O.E.M./Z370 Professional Gaming i7, BIOS P3.40 11/08/2018
> [   82.909871] RIP: 0010:__btrfs_free_extent.isra.72+0x751/0xac0 [btrfs]
> [   82.909871] Code: ff 75 18 ff 75 10 e8 7e a6 ff ff c6 85 6c ff ff ff 00 41 
> 89 c5 58 5a 45 85 ed 0f 84 b9 f9 ff ff 41 83 fd fe 0f 85 6e fc ff ff <0f> 0b 
> 49 8b 3c 24 e8 44 c8 00 00 ff 75 18 4c 8b 4d 10 4d 89 f8 48
> [   82.909885] RSP: 0018:a71bc6833bf8 EFLAGS: 00010246
> [   82.909885] RAX: 000c81bc RBX: 000f3e55e000 RCX: 
> 
> [   82.909886] RDX: 0017b000 RSI:  RDI: 
> 89cfd82d4f50
> [   82.909886] RBP: a71bc6833ca0 R08: a71bc6833b14 R09: 
> 
> [   82.909887] R10: 0102 R11:  R12: 
> 89cfd8338af0
> [   82.909887] R13: fffe R14: 000e8e9f8000 R15: 
> 010b
> [   82.909888] FS:  () GS:89d09e58() 
> knlGS:
> [   82.909888] CS:  0010 DS:  ES:  CR0: 80050033
> [   82.909889] CR2: 564398e70760 CR3: 00076ce0a002 CR4: 
> 003606e0
> [   82.909889] DR0:  DR1:  DR2: 
> 
> [   82.909890] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [   82.909890] Call Trace:
> [   82.909898]  __btrfs_run_delayed_refs+0x20e/0x1010 [btrfs]
> [   82.909904]  ? add_pinned_bytes+0x67/0x70 [btrfs]
> [   82.909909]  ? btrfs_free_tree_block+0x167/0x2d0 [btrfs]
> [   82.909916]  btrfs_run_delayed_refs+0x80/0x190 [btrfs]
> [   82.909923]  btrfs_should_end_transaction+0x47/0x60 [btrfs]
> [   82.909928]  btrfs_drop_snapshot+0x3d1/0x800 [btrfs]
> [   82.909936]  btrfs_clean_one_deleted_snapshot+0xbb/0xf0 [btrfs]
> [   82.909942]  cleaner_kthread+0x136/0x160 [btrfs]
> [   82.909944]  kthread+0x120/0x140
> [   82.909950]  ? btree_submit_bio_start+0x20/0x20 [btrfs]
> [   82.909951]  ? kthread_bind+0x40/0x40
> [   82.909953]  ret_from_fork+0x35/0x40
> [   82.909953] ---[ end trace 32964933c87d1d27 ]---
> [   82.909955] BTRFS info (device dm-0): leaf 656310272 gen 178703 total ptrs 
> 117 free space 4483 owner 2
> [   82.909956]     item 0 key (65470308352 168 4096) itemoff 16233 itemsize 50
> [   82.909956]         extent refs 2 gen 170784 flags 1
> [   82.909957]         ref#0: shared data backref parent 62803951616 count 1
> [   82.909957]         ref#1: shared data backre

Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)

2019-01-04 Thread Nazar Mokrynskyi
05.01.19 03:18, Qu Wenruo wrote:
> Please don't mount the fs RW, and copy your data out.

I have regular automated backups and already wrote the initial message from the
restored system; that lesson was learned a long time ago.

Next time I'll be smarter and will make a partition image prior to doing any
operations on it; maybe `balance` did something to it.

It is quite unfortunate that the information provided is not useful this time. If
there are any other ideas for tests to run, I'm willing to help.



Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)

2019-01-04 Thread Nazar Mokrynskyi
04.01.19 03:15, Chris Murphy wrote:
> What do you get with 'btrfs check --mode=lowmem' ? This is a different
> implementation of check, and might reveal some additional information
> useful to developers. It is wickedly slow however.
root@nazarpc-Standard-PC-Q35-ICH9-2009:~# btrfs check --mode=lowmem /dev/vdb
warning, bad space info total_bytes 2155872256 used 2155876352
warning, bad space info total_bytes 3229614080 used 3229618176
warning, bad space info total_bytes 4303355904 used 430336
warning, bad space info total_bytes 5377097728 used 5377101824
warning, bad space info total_bytes 6450839552 used 6450843648
warning, bad space info total_bytes 7524581376 used 7524585472
warning, bad space info total_bytes 8598323200 used 8598327296
warning, bad space info total_bytes 9672065024 used 9672069120
warning, bad space info total_bytes 10745806848 used 10745810944
warning, bad space info total_bytes 11819548672 used 11819552768
warning, bad space info total_bytes 12893290496 used 12893294592
warning, bad space info total_bytes 13967032320 used 13967036416
warning, bad space info total_bytes 15040774144 used 15040778240
warning, bad space info total_bytes 16114515968 used 16114520064
warning, bad space info total_bytes 17188257792 used 17188261888
warning, bad space info total_bytes 18261999616 used 18262003712
warning, bad space info total_bytes 19335741440 used 19335745536
warning, bad space info total_bytes 20409483264 used 20409487360
warning, bad space info total_bytes 21483225088 used 21483229184
warning, bad space info total_bytes 22556966912 used 22556971008
warning, bad space info total_bytes 23630708736 used 23630712832
warning, bad space info total_bytes 24704450560 used 24704454656
warning, bad space info total_bytes 25778192384 used 25778196480
warning, bad space info total_bytes 26851934208 used 26851938304
warning, bad space info total_bytes 27925676032 used 27925680128
warning, bad space info total_bytes 28999417856 used 28999421952
warning, bad space info total_bytes 30073159680 used 30073163776
warning, bad space info total_bytes 31146901504 used 31146905600
warning, bad space info total_bytes 32220643328 used 32220647424
Checking filesystem on /dev/vdb
UUID: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
checking extents
checking free space cache
checking fs roots
ERROR: root 304 INODE REF[274921, 256895] name 
25da95e3e893bb2fa69a2f0acd77bfe725626a1e filetype 1 missing
ERROR: root 304 EXTENT_DATA[910393 4096] gap exists, expected: 
EXTENT_DATA[910393 25]
ERROR: root 304 EXTENT_DATA[910393 8192] gap exists, expected: 
EXTENT_DATA[910393 4121]
ERROR: root 304 EXTENT_DATA[910393 16384] gap exists, expected: 
EXTENT_DATA[910393 12313]
ERROR: root 304 EXTENT_DATA[910400 4096] gap exists, expected: 
EXTENT_DATA[910400 25]
ERROR: root 304 EXTENT_DATA[910400 8192] gap exists, expected: 
EXTENT_DATA[910400 4121]
ERROR: root 304 EXTENT_DATA[910400 16384] gap exists, expected: 
EXTENT_DATA[910400 12313]
ERROR: root 304 EXTENT_DATA[910401 4096] gap exists, expected: 
EXTENT_DATA[910401 25]
ERROR: root 304 EXTENT_DATA[910401 8192] gap exists, expected: 
EXTENT_DATA[910401 4121]
ERROR: root 304 EXTENT_DATA[910401 16384] gap exists, expected: 
EXTENT_DATA[910401 12313]
ERROR: root 101721 INODE REF[274921, 256895] name 
25da95e3e893bb2fa69a2f0acd77bfe725626a1e filetype 1 missing
ERROR: root 101721 EXTENT_DATA[910393 4096] gap exists, expected: 
EXTENT_DATA[910393 25]
ERROR: root 101721 EXTENT_DATA[910393 8192] gap exists, expected: 
EXTENT_DATA[910393 4121]
ERROR: root 101721 EXTENT_DATA[910393 16384] gap exists, expected: 
EXTENT_DATA[910393 12313]
ERROR: root 101721 EXTENT_DATA[910400 4096] gap exists, expected: 
EXTENT_DATA[910400 25]
ERROR: root 101721 EXTENT_DATA[910400 8192] gap exists, expected: 
EXTENT_DATA[910400 4121]
ERROR: root 101721 EXTENT_DATA[910400 16384] gap exists, expected: 
EXTENT_DATA[910400 12313]
ERROR: root 101721 EXTENT_DATA[910401 4096] gap exists, expected: 
EXTENT_DATA[910401 25]
ERROR: root 101721 EXTENT_DATA[910401 8192] gap exists, expected: 
EXTENT_DATA[910401 4121]
ERROR: root 101721 EXTENT_DATA[910401 16384] gap exists, expected: 
EXTENT_DATA[910401 12313]
ERROR: errors found in fs roots
found 39410126848 bytes used, error(s) found
total csum bytes: 35990412
total tree bytes: 196955471872
total fs tree bytes: 196809785344
total extent tree bytes: 96534528
btree space waste bytes: 33486070155
file data blocks allocated: 1705720172544
 referenced 2238568390656



Re: Unrecoverable btrfs corruption (backref bytes do not match extent backref)

2019-01-04 Thread Nazar Mokrynskyi
04.01.19 03:32, Qu Wenruo wrote:
> Please provide the dump of the following command:
>
> # btrfs ins dump-tree -t extent | grep 3114475520 -C 20
root@nazarpc-Standard-PC-Q35-ICH9-2009:~# btrfs ins dump-tree -t extent 
/dev/vdc | grep 3114475520 -C 20
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 146 key (3114242048 EXTENT_ITEM 36864) itemoff 10402 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 147 key (3114278912 EXTENT_ITEM 36864) itemoff 10365 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 148 key (3114315776 EXTENT_ITEM 36864) itemoff 10328 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 149 key (3114352640 EXTENT_ITEM 45056) itemoff 10291 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 150 key (3114397696 EXTENT_ITEM 40960) itemoff 10254 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 151 key (3114438656 EXTENT_ITEM 36864) itemoff 10217 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 152 key (3114475520 EXTENT_ITEM 4096) itemoff 10193 itemsize 24
        refs 2 gen 1701147 flags DATA
    item 153 key (3114475520 EXTENT_ITEM 36864) itemoff 10169 itemsize 24
        refs 1 gen 1712966 flags DATA
    item 154 key (3114475520 SHARED_DATA_REF 311408951296) itemoff 10165 
itemsize 4
        shared data backref count 1
    item 155 key (3114475520 SHARED_DATA_REF 342561947648) itemoff 10161 
itemsize 4
        shared data backref count 1
    item 156 key (3114475520 SHARED_DATA_REF 348547874816) itemoff 10157 
itemsize 4
        shared data backref count 1
    item 157 key (3114508288 EXTENT_ITEM 4096) itemoff 10120 itemsize 37
        refs 1 gen 1713581 flags DATA
        shared data backref parent 311693983744 count 1
    item 158 key (3114512384 EXTENT_ITEM 45056) itemoff 10083 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 159 key (3114557440 EXTENT_ITEM 110592) itemoff 10046 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 160 key (3114668032 EXTENT_ITEM 102400) itemoff 10009 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 161 key (3114770432 EXTENT_ITEM 36864) itemoff 9972 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 162 key (3114807296 EXTENT_ITEM 40960) itemoff 9935 itemsize 37
        refs 1 gen 1712966 flags DATA
        shared data backref parent 311408951296 count 1
    item 163 key (3114848256 EXTENT_ITEM 36864) itemoff 9898 itemsize 37
--
    item 194 key (311447486464 METADATA_ITEM 0) itemoff 13136 itemsize 33
        refs 1 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 348437413888
    item 195 key (311447502848 METADATA_ITEM 0) itemoff 13094 itemsize 42
        refs 2 gen 1713200 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 924172288
        shared block backref parent 871956480
    item 196 key (311447519232 METADATA_ITEM 0) itemoff 13043 itemsize 51
        refs 3 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 348542287872
        shared block backref parent 348542271488
        shared block backref parent 348542238720
    item 197 key (311447535616 METADATA_ITEM 0) itemoff 13001 itemsize 42
        refs 2 gen 1713311 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 311441817600
        shared block backref parent 773259264
    item 198 key (311447552000 METADATA_ITEM 0) itemoff 12959 itemsize 42
        refs 2 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 1068089344
        shared block backref parent 1067450368
    item 199 key (311447568384 METADATA_ITEM 0) itemoff 12917 itemsize 42
        refs 2 gen 1713499 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 310977855488
        shared block backref parent 941277184
    item 200 key (311447584768 METADATA_ITEM 0) itemoff 12884 itemsize 33
        refs 1 gen 1713594 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 310736814080
    item 201 key (311447601152 METADATA_ITEM 0) itemoff 12842 itemsize 42
        refs 2 gen 1713499 flags TREE_BLOCK|FULL_BACKREF
        tree block skinny level 0
        shared block backref parent 3

Unrecoverable btrfs corruption (backref bytes do not match extent backref)

2019-01-03 Thread Nazar Mokrynskyi

If this seems anything important and you want me to run some commands to check 
what happened exactly, I can start VMs with this partition image connected and 
do whatever is needed. I can't send image anywhere though, since it contains 
sensitive information.

NOTE: I don't need help with partition or data recovery, I'm used to these 
kinds of crashes and have backups, so no data were lost.

P.S. I really wish BTRFS can stop accidentally corrupting itself one day.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc



Re: Linux 4.14 breaks btrfs filesystems (3 times already)

2017-12-24 Thread Nazar Mokrynskyi
24.12.17 12:07, Nikolay Borisov wrote:
>
> On 24.12.2017 11:37, Nazar Mokrynskyi wrote:
>> Hi folks,
>>
>> I know this is a bold statement, but this is also exactly what I'm 
>> experiencing.
>>
>> Two filesystems that had worked perfectly since July 2015, and one freshly created,
>> crashed during the last 5 weeks since Ubuntu 18.04 switched from 4.13 to 4.14
>> (my current kernel is 4.14.0-11-generic).
>>
>> I wrote about the first case (the backup partition) 5 weeks ago (the title was
>> "Unrecoverable scrub errors"), but eventually recreated the corrupted filesystem
>> mentioned there, then scrubbed and checked the other filesystems - everything was
>> good, no errors and no warnings.
>>
>> 4 days ago I noticed that random files on my primary filesystem had become
>> corrupted in a very interesting way. Sometimes completely, sometimes only
>> partially (for example, I was playing a game and it crashed at the moment a
>> particular piece of a data file was read). I've recreated the primary filesystem
>> too.
>>
>> This morning the primary filesystem crashed again, even harder than before.
>>
>> Scrub on latest crashed filesystem:
>>
>> [ 1074.544160] [ cut here ]
>> [ 1074.544162] kernel BUG at 
>> /build/linux-XO_uEE/linux-4.13.0/fs/btrfs/ctree.h:1802!
>> [ 1074.544166] invalid opcode:  [#1] SMP
>> [ 1074.544174] Modules linked in: btrfs xor raid6_pq dm_crypt algif_skcipher 
>> af_alg intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi 
>> coretemp kvm_intel kvm snd_usb_audio snd_hda_intel snd_usbmidi_lib irqbypass 
>> snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hda_core ghash_clmulni_intel 
>> snd_hwdep pcbc snd_seq_midi snd_seq_midi_event aesni_intel snd_seq 
>> snd_rawmidi snd_pcm snd_seq_device snd_timer snd cdc_acm soundcore joydev 
>> input_leds aes_x86_64 crypto_simd glue_helper serio_raw cryptd intel_cstate 
>> intel_rapl_perf lpc_ich mei_me mei shpchp mac_hid parport_pc ppdev lp 
>> parport ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror 
>> dm_region_hash dm_log hid_generic usbhid hid uas usb_storage nouveau mxm_wmi 
>> video ttm drm_kms_helper igb syscopyarea sysfillrect sysimgblt dca 
>> fb_sys_fops
>> [ 1074.544232]  ahci i2c_algo_bit drm ptp libahci nvme pps_core nvme_core wmi
>> [ 1074.544240] CPU: 8 PID: 5459 Comm: kworker/u24:0 Not tainted 
>> 4.13.0-16-generic #19-Ubuntu
>> [ 1074.544244] Hardware name: MSI MS-7885/X99A SLI Krait Edition (MS-7885), 
>> BIOS N.92 01/10/2017
>> [ 1074.544271] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
>> [ 1074.544276] task: 8d7eaecf5d00 task.stack: 9ab182ecc000
>> [ 1074.544292] RIP: 0010:btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs]
>> [ 1074.544296] RSP: 0018:9ab182ecfa98 EFLAGS: 00010297
>> [ 1074.544300] RAX:  RBX: 00b6 RCX: 
>> 9ab182ecfa50
>> [ 1074.544303] RDX: 0001 RSI: 36a6 RDI: 
>> 
>> [ 1074.544307] RBP: 9ab182ecfa98 R08: 36a7 R09: 
>> 9ab182ecfa60
>> [ 1074.544310] R10:  R11: 0003 R12: 
>> 8d7e860c6348
>> [ 1074.544313] R13:  R14: 36a6 R15: 
>> 36e5
>> [ 1074.544317] FS:  () GS:8d7eef40() 
>> knlGS:
>> [ 1074.544321] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 1074.544324] CR2: 7faeb402 CR3: 0003bb609000 CR4: 
>> 003406e0
>> [ 1074.544328] DR0:  DR1:  DR2: 
>> 
>> [ 1074.544332] DR3:  DR6: fffe0ff0 DR7: 
>> 0400
>> [ 1074.544335] Call Trace:
>> [ 1074.544348]  lookup_inline_extent_backref+0x5a3/0x5b0 [btrfs]
>> [ 1074.544360]  ? setup_inline_extent_backref+0x16e/0x260 [btrfs]
>> [ 1074.544371]  insert_inline_extent_backref+0x50/0xe0 [btrfs]
>> [ 1074.544382]  __btrfs_inc_extent_ref.isra.51+0x7e/0x260 [btrfs]
>> [ 1074.544396]  ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
>> [ 1074.544408]  __btrfs_run_delayed_refs+0xc52/0x1380 [btrfs]
>> [ 1074.544420]  btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
>> [ 1074.544431]  delayed_ref_async_start+0x98/0xb0 [btrfs]
>> [ 1074.55]  btrfs_worker_helper+0x7a/0x2e0 [btrfs]
>> [ 1074.544458]  btrfs_extent_refs_helper+0xe/0x10 [btrfs]
>> [ 1074.544464]  process_one_work+0x1e7/0x410
>> [ 1074.544467]  worker_thread+0x4a/0x410
>> [ 1074.544471]  kthread+0x125/0x140
>> [ 1074.544474]  ? process_one_work+0x410/0x

Linux 4.14 breaks btrfs filesystems (3 times already)

2017-12-24 Thread Nazar Mokrynskyi
+0x1202f)[0x55748d36202f]
btrfs check(+0x4d8cf)[0x55748d39d8cf]
btrfs check(+0x4f1c3)[0x55748d39f1c3]
btrfs check(+0x52a1c)[0x55748d3a2a1c]
btrfs check(+0x53265)[0x55748d3a3265]
btrfs check(+0x53d3d)[0x55748d3a3d3d]
btrfs check(cmd_check+0x1309)[0x55748d3a6fbc]
btrfs check(main+0x142)[0x55748d3686e9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fccb1f971c1]
btrfs check(_start+0x2a)[0x55748d36872a]

A simple ls in the root of the filesystem right after a fresh boot and mount
resulted in the following:

[  106.573579] [ cut here ]
[  106.573582] kernel BUG at 
/build/linux-XO_uEE/linux-4.13.0/fs/btrfs/ctree.h:1802!
[  106.573589] invalid opcode:  [#1] SMP
[  106.573602] Modules linked in: btrfs xor raid6_pq dm_crypt algif_skcipher 
af_alg intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp 
coretemp kvm_intel kvm snd_hda_intel snd_usb_audio snd_hda_codec snd_hda_core 
snd_usbmidi_lib snd_hwdep snd_pcm irqbypass crct10dif_pclmul crc32_pclmul 
snd_seq_midi ghash_clmulni_intel snd_seq_midi_event pcbc aesni_intel 
snd_rawmidi snd_seq snd_seq_device cdc_acm snd_timer aes_x86_64 joydev 
input_leds snd crypto_simd glue_helper soundcore cryptd intel_cstate serio_raw 
intel_rapl_perf lpc_ich mei_me mei shpchp mac_hid parport_pc ppdev lp parport 
ip_tables x_tables autofs4 overlay nls_iso8859_1 dm_mirror dm_region_hash 
dm_log hid_generic usbhid hid uas usb_storage nouveau mxm_wmi video igb ttm 
drm_kms_helper syscopyarea sysfillrect dca sysimgblt fb_sys_fops
[  106.573704]  i2c_algo_bit ahci ptp libahci drm pps_core nvme nvme_core wmi
[  106.573720] CPU: 6 PID: 245 Comm: kworker/u24:4 Not tainted 
4.13.0-16-generic #19-Ubuntu
[  106.573727] Hardware name: MSI MS-7885/X99A SLI Krait Edition (MS-7885), 
BIOS N.92 01/10/2017
[  106.573773] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  106.573785] task: 92b56227dd00 task.stack: a2fb8237c000
[  106.573817] RIP: 0010:btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs]
[  106.573825] RSP: 0018:a2fb8237fa98 EFLAGS: 00010297
[  106.573831] RAX:  RBX: 00b6 RCX: a2fb8237fa50
[  106.573838] RDX: 0001 RSI: 36a6 RDI: 
[  106.573845] RBP: a2fb8237fa98 R08: 36a7 R09: a2fb8237fa60
[  106.573852] R10:  R11: 0003 R12: 92b52ac96460
[  106.573858] R13:  R14: 36a6 R15: 36e5
[  106.573866] FS:  () GS:92b56f38() 
knlGS:
[  106.573873] CS:  0010 DS:  ES:  CR0: 80050033
[  106.573879] CR2: 7f67cc017028 CR3: 000275009000 CR4: 003406e0
[  106.573886] DR0:  DR1:  DR2: 
[  106.573893] DR3:  DR6: fffe0ff0 DR7: 0400
[  106.573899] Call Trace:
[  106.573923]  lookup_inline_extent_backref+0x5a3/0x5b0 [btrfs]
[  106.573946]  ? setup_inline_extent_backref+0x16e/0x260 [btrfs]
[  106.573968]  insert_inline_extent_backref+0x50/0xe0 [btrfs]
[  106.573990]  __btrfs_inc_extent_ref.isra.51+0x7e/0x260 [btrfs]
[  106.574019]  ? btrfs_merge_delayed_refs+0x62/0x550 [btrfs]
[  106.574042]  __btrfs_run_delayed_refs+0xc52/0x1380 [btrfs]
[  106.574052]  ? __slab_free+0x14c/0x2d0
[  106.574075]  btrfs_run_delayed_refs+0x6b/0x250 [btrfs]
[  106.574097]  delayed_ref_async_start+0x98/0xb0 [btrfs]
[  106.574126]  btrfs_worker_helper+0x7a/0x2e0 [btrfs]
[  106.574151]  btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[  106.574160]  process_one_work+0x1e7/0x410
[  106.574167]  worker_thread+0x4a/0x410
[  106.574174]  kthread+0x125/0x140
[  106.574181]  ? process_one_work+0x410/0x410
[  106.574187]  ? kthread_create_on_node+0x70/0x70
[  106.574195]  ret_from_fork+0x25/0x30
[  106.574200] Code: 89 d1 4c 89 da e8 26 ae f4 ff 58 48 8b 45 c0 65 48 33 04 
25 28 00 00 00 74 05 e8 81 a8 80 c1 c9 c3 55 48 89 e5 0f 0b 55 48 89 e5 <0f> 0b 
55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00 55 31
[  106.574276] RIP: btrfs_extent_inline_ref_size.part.38+0x4/0x6 [btrfs] RSP: 
a2fb8237fa98
[  106.578666] ---[ end trace bd9d2e91fa0ddda7 ]---

After this the kernel was also in a corrupted state and not capable of running the
system anymore, so I had to hard reset the machine after collecting each piece above.

Thankfully I'm doing backups every 15 minutes (after my initial btrfs experience)
and the backup partition is fine (I ran scrub and btrfsck on it), so I've quickly
restored everything, but this is not funny anymore.

Here are the mount options for my primary filesystem (SSD > LUKS > BTRFS) and
backup filesystem (HDD > LUKS > GPT > BTRFS):

compress=lzo,noatime,ssd,subvol=/root
compress=lzo,noatime,noexec,noauto

Has anyone noticed anything similar (I'm not subscribed to the mailing list)?

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc


Re: Unrecoverable scrub errors

2017-11-19 Thread Nazar Mokrynskyi
This particular partition was initially created in July 2015. I've 
added/removed drives a few times when migrating from older to newer hardware, 
but never used RAID0 or any other RAID level beyond that.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

19.11.17 22:39, Roy Sigurd Karlsbakk wrote:
> I guess not using RAID-0 would be a good start…
>
> Vennlig hilsen
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 98013356
> http://blogg.karlsbakk.net/
> GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
> --
> Hið góða skaltu í stein höggva, hið illa í snjó rita.


Re: Unrecoverable scrub errors

2017-11-19 Thread Nazar Mokrynskyi
It looks like this is not going to resolve nicely.

After removing that problematic snapshot, the filesystem quickly becomes read-only,
like so:

> [23552.839055] BTRFS error (device dm-2): cleaner transaction attach returned 
> -30
> [23577.374390] BTRFS info (device dm-2): use lzo compression
> [23577.374391] BTRFS info (device dm-2): disk space caching is enabled
> [23577.374392] BTRFS info (device dm-2): has skinny extents
> [23577.506214] BTRFS info (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 24, gen 0
> [23795.026390] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.148193] BTRFS error (device dm-2): bad tree block start 56 470069542912
> [23795.148424] BTRFS warning (device dm-2): dm-2 checksum verify failed on 
> 470069460992 wanted 54C49539 found FD171FBB level 0
> [23795.148526] BTRFS error (device dm-2): bad tree block start 0 470069493760
> [23795.150461] BTRFS error (device dm-2): bad tree block start 1459617832 
> 470069477376
> [23795.639781] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655487] BTRFS error (device dm-2): bad tree block start 0 470069510144
> [23795.655496] BTRFS: error (device dm-2) in btrfs_drop_snapshot:9244: 
> errno=-5 IO failure
> [23795.655498] BTRFS info (device dm-2): forced readonly
Check and repair don't help either:

> nazar-pc@nazar-pc ~> sudo btrfs check -p 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> Checking filesystem on 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 
> owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> total fs tree bytes: 17621073920
> total extent tree bytes: 1629323264
> btree space waste bytes: 3812167723
> file data blocks allocated: 21167059447808
>  referenced 2283091746816
>
> nazar-pc@nazar-pc ~> sudo btrfs check --repair -p 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> enabling repair mode
> Checking filesystem on 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> UUID: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
> Extent back ref already exists for 797694840832 parent 330760175616 root 0 
> owner 0 offset 0 num_refs 1
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> parent transid verify failed on 470072098816 wanted 1431 found 307965
> Ignoring transid failure
> leaf parent key incorrect 470072098816
> bad block 470072098816
>
> ERROR: errors found in extent allocation tree or chunk allocation
> Fixed 0 roots.
> There is no free space entry for 797694844928-797694808064
> There is no free space entry for 797694844928-797819535360
> cache appears valid but isn't 796745793536
> There is no free space entry for 814739984384-814739988480
> There is no free space entry for 814739984384-814999404544
> cache appears valid but isn't 813925662720
> block group 894456299520 has wrong amount of free space
> failed to load free space cache for block group 894456299520
> block group 922910457856 has wrong amount of free space
> failed to load free space cache for block group 922910457856
>
> ERROR: errors found in free space cache
> found 963515335717 bytes used, error(s) found
> total csum bytes: 921699896
> total tree bytes: 20361920512
> tot

Re: Unrecoverable scrub errors

2017-11-18 Thread Nazar Mokrynskyi
19.11.17 07:23, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 10:13 PM, Nazar Mokrynskyi  
> wrote:
>
>> That was eventually useful:
>>
>> * found some familiar file names (mangled eCryptfs file names from times 
>> when I used it for home directory) and decided to search for it in old 
>> snapshots of home directory (about 1/3 of snapshots on that partition)
>> * file name was present in snapshots back to July of 2015, but during search 
>> through snapshot from 2016-10-26_18:47:04 I've got I/O error reported by 
>> find command at one directory
>> * tried to open directory in file manager - same error, fails to open
>> * after removing this lets call it "broken" snapshot started new scrub, 
>> hopefully it'll finish fine
>>
>> If it is not actually related to recent memory issues I'd be positively 
>> surprised. Not sure what happened towards the end of October 2016 though, 
>> especially that backups were on different physical device back then.
> Wrong csum computation during the transfer? Did you use btrfs send receive?

Yes, I've used send/receive to copy snapshots from primary SSD to backup HDD.
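
(The transfer is plain incremental send/receive between the two filesystems;
schematically, with placeholder snapshot names:)

btrfs send -p /snapshots/home/2017-11-18_12:00:00 /snapshots/home/2017-11-18_12:15:00 \
    | btrfs receive /backup_hdd/home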

I'm not sure when the wrong csum computation happened, since the SSD contains only
the most recent snapshots and only the HDD contains the older ones. Even if the
error happened on the SSD, those older snapshots are long gone and there is no way
to check this.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc



Re: Unrecoverable scrub errors

2017-11-18 Thread Nazar Mokrynskyi
19.11.17 06:33, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 8:45 PM, Nazar Mokrynskyi  
> wrote:
>> 19.11.17 05:19, Chris Murphy wrote:
>>> On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi  
>>> wrote:
>>>> I can assure you that drive (it is HDD) is perfectly functional with 0 
>>>> SMART errors or warnings and doesn't have any problems. dmesg is clean in 
>>>> that regard too, HDD itself can be excluded from potential causes.
>>>>
>>>> There were however some memory-related issues on my machine a few months 
>>>> ago, so there is a chance that data might have being written incorrectly 
>>>> to the drive back then (I didn't run scrub on backup drive for a long 
>>>> time).
>>>>
>>>> How can I identify to which files these metadata belong to replace or just 
>>>> remove them (files)?
>>> You might look through the archives about bad ram and btrfs check
>>> --repair and include Hugo Mills in the search, I'm pretty sure there
>>> is code in repair that can fix certain kinds of memory induced
>>> corruption in metadata. But I have no idea if this is that type or if
>>> repair can make things worse in this case. So I'd say you get
>>> everything off this file system that you want, and then go ahead and
>>> try --repair and see what happens.
>> In this case I'm not sure if data were written incorrectly or checksum or 
>> both. So I'd like to first identify the files affected, check them manually 
>> and then decide what to do with it. Especially there not many errors yet.
>>
>>> One alternative is to just leave it alone. If you're not hitting these
>>> leaves in day to day operation, they won't hurt anything.
>> It was working for some time, but I have suspicion that occasionally it 
>> causes spikes of disk activity because of this errors (which is why I run 
>> scrub initially).
>>> Another alternative is to umount, and use btrfs-debug-tree -b  on one
>>> of the leaf/node addresses and see what you get (probably an error),
>>> but it might still also show the node content so we have some idea
>>> what's affected by the error. If it flat out refuses to show the node,
>>> might be a feature request to get a flag that forces display of the
>>> node such as it is...
>> Here is what I've got:
>>
>>> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 
>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
>>> btrfs-progs v4.13.3
>>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>>> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
>>> Csum didn't match
>>> ERROR: failed to read 470069460992
>> Looks like I indeed need a --force here.
>>
> Huh, seems overdue. But what do I know?
>
> You can use btrfs-map-logical -l to get a physical address for this
> leaf, and then plug that into dd
>
> # dd if=/dev/ skip= bs=1 count=16384 2>/dev/null | hexdump -C
>
> Gotcha of course is this is not translated into the more plain
> language output by btrfs-debug-tree. And you're in the weeds with the
> on disk format documentation. But maybe you'll see filenames on the
> right hand side of the hexdump output and maybe that's enough... Or
> maybe it's worth computing a csum on that leaf to check against the
> csum for that leaf which is found in the first field of the leaf. I'd
> expect the csum itself is what's wrong, because if you get memory
> corruption in creating the node, the resulting csum will be *correct*
> for that malformed node and there'd be no csum error, you'd just see
> some other crazy faceplant.
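
(Spelled out, the suggestion above amounts to something like the following; PHYS is a
placeholder for the physical byte offset that btrfs-map-logical prints:)

btrfs-map-logical -l 470069460992 /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
dd if=/dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 skip=PHYS bs=1 count=16384 2>/dev/null | strings -n 8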

That was eventually useful:

* I found some familiar file names (mangled eCryptfs file names from the times when I
used it for my home directory) and decided to search for them in old snapshots of
the home directory (about 1/3 of the snapshots on that partition)
* the file name was present in snapshots back to July 2015, but while searching
through the snapshot from 2016-10-26_18:47:04 I got an I/O error reported by the find
command in one directory
* I tried to open the directory in a file manager - same error, it fails to open
* after removing this, let's call it "broken", snapshot I started a new scrub;
hopefully it will finish fine

If this is not actually related to the recent memory issues, I'd be positively
surprised. I'm not sure what happened towards the end of October 2016 though,
especially since backups were on a different physical device back then.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc



Re: Unrecoverable scrub errors

2017-11-18 Thread Nazar Mokrynskyi
19.11.17 05:19, Chris Murphy wrote:
> On Sat, Nov 18, 2017 at 1:15 AM, Nazar Mokrynskyi  
> wrote:
>> I can assure you that drive (it is HDD) is perfectly functional with 0 SMART 
>> errors or warnings and doesn't have any problems. dmesg is clean in that 
>> regard too, HDD itself can be excluded from potential causes.
>>
>> There were however some memory-related issues on my machine a few months 
>> ago, so there is a chance that data might have being written incorrectly to 
>> the drive back then (I didn't run scrub on backup drive for a long time).
>>
>> How can I identify to which files these metadata belong to replace or just 
>> remove them (files)?
> You might look through the archives about bad ram and btrfs check
> --repair and include Hugo Mills in the search, I'm pretty sure there
> is code in repair that can fix certain kinds of memory induced
> corruption in metadata. But I have no idea if this is that type or if
> repair can make things worse in this case. So I'd say you get
> everything off this file system that you want, and then go ahead and
> try --repair and see what happens.

In this case I'm not sure whether the data was written incorrectly, the checksum, or
both. So I'd like to first identify the affected files, check them manually and then
decide what to do with them, especially since there are not many errors yet.

> One alternative is to just leave it alone. If you're not hitting these
> leaves in day to day operation, they won't hurt anything.
It was working for some time, but I have a suspicion that it occasionally causes
spikes of disk activity because of these errors (which is why I ran scrub in the
first place).
> Another alternative is to umount, and use btrfs-debug-tree -b  on one
> of the leaf/node addresses and see what you get (probably an error),
> but it might still also show the node content so we have some idea
> what's affected by the error. If it flat out refuses to show the node,
> might be a feature request to get a flag that forces display of the
> node such as it is...

Here is what I've got:

> nazar-pc@nazar-pc ~> sudo btrfs-debug-tree -b 470069460992 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> btrfs-progs v4.13.3
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> checksum verify failed on 470069460992 found FD171FBB wanted 54C49539
> Csum didn't match
> ERROR: failed to read 470069460992
Looks like I indeed need a --force here.



Re: Unrecoverable scrub errors

2017-11-18 Thread Nazar Mokrynskyi
I can assure you that the drive (it is an HDD) is perfectly functional with 0 SMART
errors or warnings and doesn't have any problems. dmesg is clean in that regard too,
so the HDD itself can be excluded from the potential causes.

There were, however, some memory-related issues on my machine a few months ago, so
there is a chance that data might have been written incorrectly to the drive back
then (I didn't run scrub on the backup drive for a long time).

How can I identify which files these metadata belong to, so that I can replace or
just remove them (the files)?

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc

18.11.17 05:33, Adam Borowski wrote:
> On Fri, Nov 17, 2017 at 08:19:11PM -0700, Chris Murphy wrote:
>> On Fri, Nov 17, 2017 at 8:41 AM, Nazar Mokrynskyi  
>> wrote:
>>
>>>> [551049.038718] BTRFS warning (device dm-2): checksum error at logical 
>>>> 470069460992 on dev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
>>>> 942238048: metadata leaf (level 0) in tree 985
>>>> [551049.038720] BTRFS warning (device dm-2): checksum error at logical 
>>>> 470069460992 on dev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
>>>> 942238048: metadata leaf (level 0) in tree 985
>>>> [551049.038723] BTRFS error (device dm-2): bdev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 
>>>> 0, flush 0, corrupt 1, gen 0
>>>> [551049.039634] BTRFS warning (device dm-2): checksum error at logical 
>>>> 470069526528 on dev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
>>>> 942238176: metadata leaf (level 0) in tree 985
>>>> [551049.039635] BTRFS warning (device dm-2): checksum error at logical 
>>>> 470069526528 on dev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
>>>> 942238176: metadata leaf (level 0) in tree 985
>>>> [551049.039637] BTRFS error (device dm-2): bdev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 
>>>> 0, flush 0, corrupt 2, gen 0
>>>> [551049.413114] BTRFS error (device dm-2): unable to fixup (regular) error 
>>>> at logical 470069460992 on dev 
>>>> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
>> These are metadata errors. Are there any other storage stack related
>> errors in the previous 2-5 minutes, such as read errors (UNC) or SATA
>> link reset messages?
>>
>>> Maybe I can find snapshot that contains file with wrong checksum and
>>> remove corresponding snapshot or something like that?
>> It's not a file. It's metadata leaf.
> Just for the record: had this been a data block (i.e., a non-inline file
> extent), the dmesg message would include one of the filenames that refer to that
> extent.  To clear the error, you'd need to remove all such files.
>
>>>> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
>>>> Data, single: total=879.01GiB, used=877.24GiB
>>>> System, DUP: total=40.00MiB, used=128.00KiB
>>>> Metadata, DUP: total=20.50GiB, used=18.96GiB
>>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> Metadata is DUP, but both copies have corruption. Kinda strange. But I
>> don't know how close the DUP copies are to each other, if possibly a
>> big enough media defect can explain this.
> The original post mentioned SSD (but was unclear if _this_ filesystem is
> backed by one).  If so, DUP is nearly worthless as both copies will be
> written to physical cells next to each other, no matter what positions the
> FTL shows them at.
>
>
> Meow!


Unrecoverable scrub errors

2017-11-17 Thread Nazar Mokrynskyi
per/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551049.479989] BTRFS warning (device dm-2): checksum error at logical 
> 470069542912 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 942238208: metadata leaf (level 0) in tree 985
> [551049.479993] BTRFS warning (device dm-2): checksum error at logical 
> 470069542912 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 942238208: metadata leaf (level 0) in tree 985
> [551049.479997] BTRFS error (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 6, gen 0
> [551049.523539] BTRFS error (device dm-2): unable to fixup (regular) error at 
> logical 470069542912 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551051.672589] BTRFS warning (device dm-2): checksum error at logical 
> 470069460992 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286624: metadata leaf (level 0) in tree 985
> [551051.672593] BTRFS warning (device dm-2): checksum error at logical 
> 470069460992 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286624: metadata leaf (level 0) in tree 985
> [551051.672597] BTRFS error (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 7, gen 0
> [551051.820776] BTRFS error (device dm-2): unable to fixup (regular) error at 
> logical 470069460992 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551051.945310] BTRFS warning (device dm-2): checksum error at logical 
> 470069477376 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286656: metadata leaf (level 0) in tree 985
> [551051.945314] BTRFS warning (device dm-2): checksum error at logical 
> 470069477376 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286656: metadata leaf (level 0) in tree 985
> [551051.945318] BTRFS error (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 8, gen 0
> [551052.112245] BTRFS warning (device dm-2): checksum error at logical 
> 470069526528 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286752: metadata leaf (level 0) in tree 985
> [551052.112247] BTRFS warning (device dm-2): checksum error at logical 
> 470069526528 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286752: metadata leaf (level 0) in tree 985
> [551052.112248] BTRFS error (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 9, gen 0
> [551052.183671] BTRFS error (device dm-2): unable to fixup (regular) error at 
> logical 470069477376 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551052.253278] BTRFS error (device dm-2): unable to fixup (regular) error at 
> logical 470069526528 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
> [551052.260305] BTRFS warning (device dm-2): checksum error at logical 
> 470069493760 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286688: metadata leaf (level 0) in tree 985
> [551052.260307] BTRFS warning (device dm-2): checksum error at logical 
> 470069493760 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1, sector 
> 943286688: metadata leaf (level 0) in tree 985
> [551052.260308] BTRFS error (device dm-2): bdev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1 errs: wr 0, rd 0, 
> flush 0, corrupt 10, gen 0
> [551052.300024] BTRFS error (device dm-2): unable to fixup (regular) error at 
> logical 470069493760 on dev 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
This is an online backup partition and I have an offline backup partition with the
same data, so I'm not very concerned about losing any data here, but I would like to
repair it.

Are there any better options before resorting to `btrfsck --repair`? Maybe I can
find the snapshot that contains the file with the wrong checksum and remove the
corresponding snapshot, or something like that?
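
(For data extents, a logical address from such a message can be mapped back to the
file paths that reference it; for metadata leaves, which is what the log above
reports, this does not apply. A sketch using the logical address from the log:)

btrfs inspect-internal logical-resolve 470069460992 /media/Backup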

> nazar-pc@nazar-pc ~> sudo btrfs filesystem show /media/Backup
> Label: 'Backup'  uuid: 82cfcb0f-0b80-4764-bed6-f529f2030ac5
>     Total devices 1 FS bytes used 896.20GiB
>     devid    1 size 1.00TiB used 920.09GiB path 
> /dev/mapper/luks-bd5dd3e7-ad80-405f-8dfd-752f2b870f93-part1
>
> nazar-pc@nazar-pc ~> sudo btrfs filesystem df /media/Backup
> Data, single: total=879.01GiB, used=877.24GiB
> System, DUP: total=40.00MiB, used=128.00KiB
> Metadata, DUP: total=20.50GiB, used

Re: [PATCH 1/2 v2] btrfs-progs: fix btrfs send & receive with -e flag

2017-04-28 Thread Nazar Mokrynskyi
Hi,

Sorry for the confusion, I've checked once again and the same issue happens in all
cases.

I didn't notice this because my regular backups are done automatically in a cron
task and the snapshots look fine despite the error, so I incorrectly assumed the
error didn't happen there, but it actually did.

I've clarified this in the last comment on Bugzilla.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

28.04.17 13:03, Lakshmipathi.G wrote:
> I can take a look. What I'm wondering about is why it fails only in the HDD
> to SSD case. If -ENODATA is returned with this patch it should mean that there
> was no header data. So is the user sure that this doesn't indicate a valid
> error?
>
> Christian



Subvolume copy fails with "ERROR: empty stream is not considered valid"

2017-04-27 Thread Nazar Mokrynskyi
I've just reported a bug (https://bugzilla.kernel.org/show_bug.cgi?id=195597)
that hit me after a recent update of btrfs-progs.

It seems to be a false positive that resulted from changes that aimed to fix
another issue.

The short version of it is the following:

root@nazar-pc:~# /bin/btrfs send  "/media/Backup/web/2017-04-04_14:30:06" | 
/bin/btrfs receive "/media/Backup_backup/web"
At subvol /media/Backup/web/2017-04-04_14:30:06
At subvol 2017-04-04_14:30:06
ERROR: empty stream is not considered valid

I've also added Stéphane Graber to CC as the author of the recent update to 
Ubuntu's btrfs-progs package.

I'm not subscribed to the mailing list, so keep me in copy, please.

-- 
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249



Re: Major HDD performance degradation on btrfs receive

2016-05-26 Thread Nazar Mokrynskyi
I've been running a newer version of just-backup-btrfs, which was configured to
remove snapshots in batches of at least 3x100 at once (this is what I typically
accumulate in 1.5-2 days).

Snapshot transferring has become much faster; however, when I delete 300 snapshots
at once, well... you can imagine what happens, but I can afford this on a desktop.
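
(Batch removal is straightforward because btrfs subvolume delete accepts several
paths in a single invocation, so a shell glob over the snapshot directories is
enough; placeholder paths:)

btrfs subvolume delete /backup_hdd/home/2016-05-* /backup_hdd/web/2016-05-*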

Seekwatcher fails to run on my system with the following error:

~> sudo seekwatcher -t find.trace -o find.png -p 'find /backup_hdd > /dev/null' 
-d /dev/sda1

Traceback (most recent call last):
  File "/usr/bin/seekwatcher", line 58, in 
from seekwatcher import rundata
  File "numpy.pxd", line 43, in seekwatcher.rundata (seekwatcher/rundata.c:7885)
ValueError: numpy.dtype does not appear to be the correct type object

I have no idea what it means, but generally I think that if seeking due to
fragmentation is the real cause of the performance degradation, then this is
something BTRFS can improve, since I still have 65% free space on the BTRFS
partition that receives snapshots, and fragmentation in this case seems weird.

P.S. I've unsubscribed from the mailing list, so CC me on answers, please.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 18.03.16 16:22, Nazar Mokrynskyi wrote:
>> But seriously, what are you doing that you can't lose more than 15
>> minutes of?  Couldn't it be even 20 minutes, or a half-hour or an hour
>> with 15 minute snapshots only on the ssds (yes, I know the raid0 factor,
>> but the question still applies)?
> This is an artificial psychological limit. Sometimes when you're actively
> coding, it is quite sad to lose even 5 minutes of work, since productivity
> is not constant. This is why 15 minutes was chosen as something that is not
> too critical. There is no other real reason behind this limit other than how
> I feel about it.
>
>> And perhaps more importantly for your data, btrfs is still considered
>> "stabilizing, not fully stable and mature".  Use without backups is
>> highly discouraged, but I'd suggest that btrfs in its current state might
>> not be what you're looking for if you can't deal with loss of more than
>> 15 minutes worth of changes anyway.
>>
>> Be that as it may...
>>
>> Btrfs is definitely not yet optimized.  In many cases it reads or writes
>> only one device at a time, for instance, even in RaidN configuration.
>> And there are definitely snapshot scaling issues altho at your newer 500
>> snapshots total that shouldn't be /too/ bad.
> As a (relatively) early adopter I'm fine using experimental stuff with extra 
> safeties like backups (hey, I even used it without those a while back :)). I 
> fully acknowledge the current state of BTRFS and want to help make it 
> even better by stressing the issues that I and other users encounter, searching 
> for solutions, etc.
>
>> Dealing with reality, regardless of how or why, you currently have a
>> situation of intolerably slow receives that needs addressed.  From a
>> practical perspective you said an ssd for backups is ridiculous and I
>> can't disagree, but there's another "throw hardware at it" solution that
>> might be a bit more reasonable...
>>
>> Spinning rust hard drives are cheap.  What about getting another one, and
>> alternating your backup receives between them?  That would halve the load
>> to one every thirty minutes, without changing your 15-minute snapshot and
>> backup policy at all. =:^)
>>
>> So that gives you two choices for halving the load to the spinning rust.
>> Either decide you really can live with half-hour loss of data, or throw
>> only a relatively small amount of money (well, as long as you have room
>> to plug in another sata device anyway, otherwise...) at it for a second
>> backup device, and alternate between them.
> Yes, I'm leaning toward getting new hardware right now; fortunately, the laptop 
> allows me to insert 2 x mSATA + 2 x 2.5" SATA drives, so I have exactly one 2.5" 
> SATA slot free.
>
>> OTOH, since you mentioned possible coding, optimization might not be a
>> bad thing, if you're willing to put in the time necessary to get up to
>> speed with the code and can work with the other devs in terms of timing,
>> etc.  But that will definitely take significant time even if you do it,
>> and the alternating backup solution can be put to use as soon as you can
>> get another device plugged in and setup. =:^)
> I'm not coding C/C++, so my capabilities to improve BTRFS itself are limited, 
> but

Re: "bad metadata" not fixed by btrfs repair

2016-03-28 Thread Nazar Mokrynskyi

I have the same thing with kernel 4.5 and btrfs-progs 4.4.

I wrote about it 2 weeks ago and didn't get any answer: 
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg51609.html


However, despite those messages everything seems to work fine.

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 28.03.16 21:42, Marc Haber wrote:

On Mon, Mar 28, 2016 at 04:37:14PM +0200, Marc Haber wrote:

I have a btrfs which btrfs check --repair doesn't fix:

# btrfs check --repair /dev/mapper/fanbtr
bad metadata [4425377054720, 4425377071104) crossing stripe boundary
bad metadata [4425380134912, 4425380151296) crossing stripe boundary
bad metadata [4427532795904, 4427532812288) crossing stripe boundary
bad metadata [4568321753088, 4568321769472) crossing stripe boundary
bad metadata [4568489656320, 4568489672704) crossing stripe boundary
bad metadata [4571474493440, 4571474509824) crossing stripe boundary
bad metadata [4571946811392, 4571946827776) crossing stripe boundary
bad metadata [4572782919680, 4572782936064) crossing stripe boundary
bad metadata [4573086351360, 4573086367744) crossing stripe boundary
bad metadata [4574221041664, 4574221058048) crossing stripe boundary
bad metadata [4574373412864, 4574373429248) crossing stripe boundary
bad metadata [4574958649344, 4574958665728) crossing stripe boundary
bad metadata [4575996018688, 4575996035072) crossing stripe boundary
bad metadata [4580376772608, 4580376788992) crossing stripe boundary
repaired damaged extent references
Fixed 0 roots.
checking free space cache
checking fs roots
checking csums
checking root refs
enabling repair mode
Checking filesystem on /dev/mapper/fanbtr
UUID: 90f8d728-6bae-4fca-8cda-b368ba2c008e
cache and super generation don't match, space cache will be invalidated
found 97171628230 bytes used err is 0
total csum bytes: 91734220
total tree bytes: 3021848576
total fs tree bytes: 2762784768
total extent tree bytes: 148570112
btree space waste bytes: 545440822
file data blocks allocated: 308328280064
  referenced 177314340864

Mounting this filesystem gives:
Mar 28 20:25:18 fan kernel: [   20.979673] BTRFS error (device dm-16): could 
not find root 8
Mar 28 20:25:18 fan kernel: [   20.979739] BTRFS error (device dm-16): could 
not find root 8
Mar 28 20:25:18 fan kernel: [   20.980900] BTRFS error (device dm-16): could 
not find root 8
Mar 28 20:25:18 fan kernel: [   20.980948] BTRFS error (device dm-16): could 
not find root 8
Mar 28 20:25:18 fan kernel: [   20.981428] BTRFS error (device dm-16): could 
not find root 8
Mar 28 20:25:18 fan kernel: [   20.981472] BTRFS error (device dm-16): could 
not find root 8

which is not detected by btrfs check.

What is going on here?

Greetings
Marc








Re: Major HDD performance degradation on btrfs receive

2016-03-19 Thread Nazar Mokrynskyi

But seriously, what are you doing that you can't lose more than 15
minutes of?  Couldn't it be even 20 minutes, or a half-hour or an hour
with 15 minute snapshots only on the ssds (yes, I know the raid0 factor,
but the question still applies)?
This is an artificial, psychological limit. Sometimes when you're actively 
coding, it is quite sad to lose even 5 minutes of work, since 
productivity is not constant. This is why 15 minutes was chosen as 
something that is not too critical. There is no other real reason behind 
this limit other than how it feels to me.



And perhaps more importantly for your data, btrfs is still considered
"stabilizing, not fully stable and mature".  Use without backups is
highly discouraged, but I'd suggest that btrfs in its current state might
not be what you're looking for if you can't deal with loss of more than
15 minutes worth of changes anyway.

Be that as it may...

Btrfs is definitely not yet optimized.  In many cases it reads or writes
only one device at a time, for instance, even in RaidN configuration.
And there are definitely snapshot scaling issues altho at your newer 500
snapshots total that shouldn't be /too/ bad.
As a (relatively) early adopter I'm fine using experimental stuff with 
extra safeties like backups (hey, I even used it without those a while 
back :)). I fully acknowledge the current state of BTRFS and want to 
help make it even better by stressing the issues that I and other users 
encounter, searching for solutions, etc.



Dealing with reality, regardless of how or why, you currently have a
situation of intolerably slow receives that needs addressed.  From a
practical perspective you said an ssd for backups is ridiculous and I
can't disagree, but there's another "throw hardware at it" solution that
might be a bit more reasonable...

Spinning rust hard drives are cheap.  What about getting another one, and
alternating your backup receives between them?  That would halve the load
to one every thirty minutes, without changing your 15-minute snapshot and
backup policy at all. =:^)

So that gives you two choices for halving the load to the spinning rust.
Either decide you really can live with half-hour loss of data, or throw
only a relatively small amount of money (well, as long as you have room
to plug in another sata device anyway, otherwise...) at it for a second
backup device, and alternate between them.
Yes, I'm leaning toward getting new hardware right now; fortunately, the 
laptop allows me to insert 2 x mSATA + 2 x 2.5" SATA drives, so I have 
exactly one 2.5" SATA slot free.



OTOH, since you mentioned possible coding, optimization might not be a
bad thing, if you're willing to put in the time necessary to get up to
speed with the code and can work with the other devs in terms of timing,
etc.  But that will definitely take significant time even if you do it,
and the alternating backup solution can be put to use as soon as you can
get another device plugged in and setup. =:^)
I don't code in C/C++, so my ability to improve BTRFS itself is 
limited, but I always try to find the root cause and fix it instead of 
living with workarounds forever.


I'll play with Seekwatcher and with optimizing snapshot deletion, and will 
post an update afterwards.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 17.03.16 09:00, Duncan wrote:

Nazar Mokrynskyi posted on Wed, 16 Mar 2016 05:37:02 +0200 as excerpted:


I'm not sure what you mean exactly by searching. My first SSD died
while waking up from suspend mode; it worked perfectly till the last
moment. It was not used for critical data at that time, but now I
understand clearly that SSD failure can happen at any time. Having RAID0
of 2 SSDs is 2 times more risky, so I'm not ready to lose anything
beyond the 15 minutes threshold. I'd rather end up having another HDD purely
for backup purposes.

I understand the raid0 N times the danger part, which is why I only ever
used raid0 on stuff like the distro packages cache that I could easily
redownload from the net, here.

But seriously, what are you doing that you can't lose more than 15
minutes of?  Couldn't it be even 20 minutes, or a half-hour or an hour
with 15 minute snapshots only on the ssds (yes, I know the raid0 factor,
but the question still applies)?

What /would/ you do if you lost a whole hour's worth of work?  Surely you
could duplicate it in the next hour?  Or are you doing securities trading
or something, where you /can't/ recover work at all, because by then the
market and your world have moved on?  But in that case...

And perhaps more importantly for your data, btrfs is still considered
"stabilizing, not fully stable and mature".  Use without backups is
highly discouraged, 

Re: Major HDD performance degradation on btrfs receive

2016-03-15 Thread Nazar Mokrynskyi

Sounds like a really good idea!

I'll try to implement it in my backup tool, but it might take some time 
to see a real benefit from it (or no benefit :)).


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora:naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 16.03.16 06:18, Chris Murphy wrote:

Very simplistically: visualizing Btrfs writes without file deletion,
it's a contiguous write. There isn't much scatter, even accounting for
metadata and data chunk writes happening in slightly different regions
of platter space. (I'm thinking this slow down happens overwhelmingly
on HDDs.)

If there are file deletions, holes appear, and now some later writes
will fill those holes, but not exactly, which will lead to
fragmentation and thus seek times. Seeks would go up by a lot the
smaller the holes are. And the holes are smaller the fewer files are
being deleted at once.

If there's a snapshot, and then file deletions, holes don't appear.
Everything is always copy on write and deleted files don't actually
get deleted (they're still in another subvolume). So as soon as a file
is reflinked or in a snapshotted subvolume there's no fragmentation
happening with file deletions.

If there's many snapshots happening in a short time, such as once
every 10 minutes, that means only 10 minutes worth of writes happening
in a given subvolume. If that space is later released by deleting
snapshots one at time (like a rolling snapshot and delete strategy
every 10 minutes) that means only small holes are opening up for later
writes. It's maybe the worst case scenario for fragmenting Btrfs.

A better way might be to delay snapshot deletion. Keep taking the
snapshots, but delete old snapshots in batches. Delete maybe 10 or 100
(if we're talking thousands of snapshots) at once. This should free a
lot more contiguous space for later writes and significantly reduce
the chance of significant fragmentation. Of course some fragmentation
is going to happen no matter what, but I think the usage pattern
described in a lot of these slowdown cases sounds to me like a worst-case
scenario for CoW.

Now, a less lazy person would actually test this hypothesis.


Chris Murphy







bad metadata [125501440, 125517824) crossing stripe boundary

2016-03-15 Thread Nazar Mokrynskyi

I was running btrfsck today and got many of such errors:


bad metadata [125501440, 125517824) crossing stripe boundary
bad metadata [131334144, 131350528) crossing stripe boundary
bad metadata [142999552, 143015936) crossing stripe boundary
bad metadata [153944064, 153960448) crossing stripe boundary
bad metadata [281870336, 281886720) crossing stripe boundary
bad metadata [528285696, 528302080) crossing stripe boundary
bad metadata [661323776, 661340160) crossing stripe boundary
bad metadata [986316800, 986333184) crossing stripe boundary
bad metadata [987168768, 987185152) crossing stripe boundary
bad metadata [1029111808, 1029128192) crossing stripe boundary
bad metadata [1099169792, 1099186176) crossing stripe boundary
I was able to find a message with a similar error on the mailing list from 
January, but didn't find any answer.


I'm on kernel 4.5.0 stable and btrfs-tools 4.4. The filesystem was created 
at the beginning of July 2015.


Here is btrfs scrub output:


scrub status for 40b8240a-a0a2-4034-ae55-f8558c0343a8
        scrub started at Wed Mar 16 04:13:54 2016 and finished after 00:52:51
        total bytes scrubbed: 274.05GiB with 0 errors
Looks like no metadata errors were found, so what do those "bad metadata" 
messages really mean?
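
(A hedged guess at a remedy, assuming these extents were simply laid out by an
older kernel/progs combination: rewriting the metadata with a balance and
re-running the check on the unmounted device should show whether the warnings
go away. This is an assumption, not a confirmed fix.)

# Rewrite all metadata block groups so their extents get re-allocated.
sudo btrfs balance start -m /backup_hdd
# Then unmount and re-check (btrfs check is read-only by default).
sudo umount /backup_hdd
sudo btrfs check /dev/sda1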


--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249






Re: Major HDD performance degradation on btrfs receive

2016-03-15 Thread Nazar Mokrynskyi
test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has 
ever been run

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
100  Completed [00% left] (0-65535)
200  Not_testing
300  Not_testing
400  Not_testing
500  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute 
delay.

gnome-disks says the disk has been powered on for 15 days and 8 minutes.


I also almost never look at the backups, and when I do, indeed scanning
through a 1000 snapshots fs on spinning disk takes time. If a script
does that every 15mins, and the fs uses LZO compression and there is
another active partition then you will have to deal with the slowness.
Well, it is not that bad and hard in reality. Every 15 minutes I'm 
transferring 3 diffs. Right now the HDD contains 453 subvolumes in total, with 
34% of the 359 GiB partition space used. After writing the last message I 
decided to collect diffs for further analysis.


So the /home subvolume's diffs range from 6 to 270 MiB, typically 30-40 MiB.

The /root subvolume's diffs range from 10 KiB to 380 MiB (during software 
updates), typically 40-80 KiB.

The /web (source code here) subvolume's diffs range from bytes to 1 MiB, 
typically 150 KiB.
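
(For anyone wanting to reproduce these numbers: the size of an incremental
stream can be measured by piping btrfs send into wc instead of btrfs receive;
the snapshot names below are made-up placeholders:)

# Bytes in the incremental stream between two consecutive snapshots.
sudo btrfs send -p /backup/home/2016-03-15_12:00:00 \
                   /backup/home/2016-03-15_12:15:00 | wc -c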


So generally, when I'm watching a movie or playing some game (not changing 
source code, not updating software and not doing anything that might 
cause significant changes in the /home subvolume) I'll get about 30 MiB of 
diffs in total. This is not that much for a SATA3 HDD; it shouldn't get stuck 
for seconds at a time, with everything so slow that video playback stops 
completely for a few seconds.


Maybe the way BTRFS is constructed turns this small diff into a big party 
all over the HDD, I don't know, but there is some problem here for sure.



You could adapt the script or backup method not to search every time,
but to just write the next diff send|receive and only step back and
search if this fails.

Or keep more 15-min snapshots only on the SSD and lower the rate at which
they are sent|received to the HDD.
I'm not sure what you mean exactly by searching. My first SSD died 
while waking up from suspend mode; it worked perfectly till the last 
moment. It was not used for critical data at that time, but now I 
understand clearly that SSD failure can happen at any time. Having RAID0 
of 2 SSDs is 2 times more risky, so I'm not ready to lose anything 
beyond the 15 minutes threshold. I'd rather end up having another HDD purely 
for backup purposes.



Interesting question: is there any tool to see the whole picture of how 
a btrfs partition is fragmented? I saw many tools for NTFS on Windows 
that show a nice picture, but not for Linux filesystems. I saw an answer on 
StackOverflow about fsck, but btrfsck doesn't provide similar output.
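
There doesn't seem to be a whole-partition picture for btrfs, but per-file
fragmentation can at least be sampled with filefrag; a rough sketch (the
snapshot path is a placeholder, and note that with compress=lzo every ~128 KiB
compressed chunk is reported as its own extent, so counts are inflated):

# Verbose extent map for one file inside a received snapshot.
sudo filefrag -v /backup_hdd/home/2016-03-15_12:00:00/some/large/file
# Rank the most fragmented files in a snapshot by extent count (slow on HDD).
sudo find /backup_hdd/home/2016-03-15_12:00:00 -type f -size +1M \
  -exec filefrag {} + | sort -t: -k2 -n | tail -n 20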


Also, I can't really run defragmentation anyway since all backups are 
read-only.
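
(Strictly speaking, the read-only flag on a snapshot can be toggled, as in the
sketch below with a made-up snapshot path, but defragmenting would break the
extent sharing between snapshots and inflate disk usage, and flipping received
snapshots to read-write is risky for future incremental receives, so it is
hardly worth it here:)

sudo btrfs property set -ts /backup_hdd/home/2016-03-15_12:00:00 ro false
sudo btrfs filesystem defragment -r /backup_hdd/home/2016-03-15_12:00:00
sudo btrfs property set -ts /backup_hdd/home/2016-03-15_12:00:00 ro true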


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 16.03.16 01:11, Henk Slager wrote:

On Tue, Mar 15, 2016 at 1:47 AM, Nazar Mokrynskyi  wrote:

Some update since last time (few weeks ago).

All filesystems are mounted with noatime, I've also added mounting
optimization - so there is no problem with remounting filesystem every time,
it is done only once.

Remounting optimization helped by reducing 1 complete snapshot +
send/receive cycle by some seconds, but otherwise it is still very slow when
`btrfs receive` is active.

OK, great that the umount+mount is gone. I think most time is
unfortunately spent in seeks; I think over time and due to various
factors, both free space and files are highly fragmented on your disk.
It could also be that the disk is bit older and has or is starting to
use its spare sectors.


I'm not considering bcache + btrfs as potential setup because I do not
currently have free SSD for it and basically spending SSD besides HDD for
backup partition feels like a bit of overkill (especially for desktop use).

Yes I think so too; For backup, I am also a bit reluctant to use
bcache. But the big difference is that you do a snapshot transfer
every 15minute while I do that only every 24hour. So I almost dont
care how long the send|receive takes in the middle of the night. I
also almost never look at the backups, and when I do, indeed scanning
through a 1000 snapshots fs on spinning disk takes time. If a script
does that every 15mins, and the fs uses LZO compression and there is
another active partition then you will have to deal with the slowness.
And if the files are mostly small, like source-trees, it gets even
worse. So it is about 100x more creates+deletes of subvolumes. To be
honest, it is just requiring too much from a HDD 

Re: Major HDD performance degradation on btrfs receive

2016-03-14 Thread Nazar Mokrynskyi

Some update since last time (few weeks ago).

All filesystems are mounted with noatime. I've also added a mounting 
optimization, so there is no problem with remounting the filesystem every 
time; it is done only once.


The remounting optimization helped by reducing one complete snapshot + 
send/receive cycle by a few seconds, but otherwise it is still very slow 
while `btrfs receive` is active.


I'm not considering bcache + btrfs as a potential setup because I do not 
currently have a free SSD for it, and spending an SSD in addition to the HDD 
for a backup partition feels like a bit of overkill (especially for 
desktop use).


My current kernel is 4.5.0 stable, btrfs-tools still 4.4-1 from Ubuntu 
16.04 repository as of today.


As I read the mailing list, there are other folks having similar 
performance issues. So can we debug things to find the root cause and 
fix it at some point?
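
(One cheap way to put numbers on the slowdown, assuming the backup HDD is
/dev/sda and the sysstat package is installed, is to watch extended device
statistics during one receive cycle:)

# 1-second extended stats for the backup HDD while btrfs receive runs;
# sustained ~100% utilization with high await suggests the disk is seek-bound.
iostat -x /dev/sda 1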


My C/C++/kernel/BTRFS knowledge is scarce, which is why some 
assistance from someone more experienced is needed here.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 25.02.16 03:04, Henk Slager wrote:

On Wed, Feb 24, 2016 at 11:45 PM, Nazar Mokrynskyi  wrote:

Here is btrfs-show-super output:


nazar-pc@nazar-pc ~> sudo btrfs-show-super /dev/sda1
superblock: bytenr=65536, device=/dev/sda1
-
csum0x1e3c6fb8 [match]
bytenr65536
flags0x1
 ( WRITTEN )
magic_BHRfS_M [match]
fsid40b8240a-a0a2-4034-ae55-f8558c0343a8
labelBackup
generation165491
root143985360896
sys_array_size226
chunk_root_generation162837
root_level1
chunk_root247023583232
chunk_root_level1
log_root0
log_root_transid0
log_root_level0
total_bytes858993459200
bytes_used276512202752
sectorsize4096
nodesize16384
leafsize16384
stripesize4096
root_dir6
num_devices1
compat_flags0x0
compat_ro_flags0x0
incompat_flags0x169
 ( MIXED_BACKREF |
   COMPRESS_LZO |
   BIG_METADATA |
   EXTENDED_IREF |
   SKINNY_METADATA )
csum_type0
csum_size4
cache_generation165491
uuid_tree_generation165491
dev_item.uuid81eee7a6-774e-4bb5-8b72-cebb85a2f2ce
dev_item.fsid40b8240a-a0a2-4034-ae55-f8558c0343a8 [match]
dev_item.type0
dev_item.total_bytes858993459200
dev_item.bytes_used291072114688
dev_item.io_align4096
dev_item.io_width4096
dev_item.sector_size4096
dev_item.devid1
dev_item.dev_group0
dev_item.seek_speed0
dev_item.bandwidth0
dev_item.generation0

It is sad that skinny metadata will only affect new data, probably, I'll end
up re-creating it:(

Can I rebalance it or something simple for this purpose?

A balance won't help for that and also your metadata does look quite
compact already. But I think you should not expect so much of this
skinny metadata on a PC with 16G RAM


Those are quite typical values for an already heavily used btrfs on a HDD.


Bad news, since I'm doing mounting/unmounting few times during snapshots
creation because of how BTRFS works (source code:
https://github.com/nazar-pc/just-backup-btrfs/blob/master/just-backup-btrfs.php#L148)

So if 10+20 seconds is typical, then in my case HDD can be very busy during
a minute or sometimes more, this is not good and basically part or even real
reason of initial question.

Yes indeed! This mount/unmount every 15 minutes (or more times per 15
minutes) is killing for performance IMO. At the moment I don't fully
understand why you are bothered by the limitation you mention in the
php source comments. I think it's definitely worth to change paths
and/or your requirements in such a way that you can avoid the
umount/mount.

As a workaround, bcache with its cache device nicely filled over time,
will absolutely speedup the mount. But as you had some troubles with
btrfs in the past and also you use ext4 on the same disk because it is
a more mature filesystem, you might not want bache+btrfs for backup
storage, it is up to you.
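
(For completeness, a bare-bones bcache sketch under the assumption that a spare
SSD partition ever becomes available; device names are placeholders, and
make-bcache reformats them, so the backup filesystem would have to be recreated
on /dev/bcache0:)

sudo make-bcache -C /dev/sdX1     # SSD partition as the cache device
sudo make-bcache -B /dev/sdY1     # HDD partition as the backing device -> /dev/bcache0
# Attach the cache set to the backing device (UUID reported by make-bcache -C).
echo <cache-set-uuid> | sudo tee /sys/block/bcache0/bcache/attach
# Create the backup filesystem on top of the cached device and mount that instead.
sudo mkfs.btrfs -L Backup /dev/bcache0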







Re: Major HDD performance degradation on btrfs receive

2016-02-24 Thread Nazar Mokrynskyi

Here is btrfs-show-super output:


nazar-pc@nazar-pc ~> sudo btrfs-show-super /dev/sda1
superblock: bytenr=65536, device=/dev/sda1
-
csum                    0x1e3c6fb8 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    40b8240a-a0a2-4034-ae55-f8558c0343a8
label                   Backup
generation              165491
root                    143985360896
sys_array_size          226
chunk_root_generation   162837
root_level              1
chunk_root              247023583232
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             858993459200
bytes_used              276512202752
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        165491
uuid_tree_generation    165491
dev_item.uuid           81eee7a6-774e-4bb5-8b72-cebb85a2f2ce
dev_item.fsid           40b8240a-a0a2-4034-ae55-f8558c0343a8 [match]
dev_item.type           0
dev_item.total_bytes    858993459200
dev_item.bytes_used     291072114688
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
It is sad that skinny metadata will only affect new data; I'll probably 
end up re-creating the filesystem :(

Can I rebalance it, or do something similarly simple, for this purpose?


Those are quite typical values for an already heavily used btrfs on a HDD.


Bad news, since I'm mounting/unmounting a few times during snapshot 
creation because of how BTRFS works (source code: 
https://github.com/nazar-pc/just-backup-btrfs/blob/master/just-backup-btrfs.php#L148)

So if 10+20 seconds is typical, then in my case the HDD can be very busy for 
a minute or sometimes more, which is not good and is basically part of, or 
even the real reason for, the initial question.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249


On 24.02.16 23:32, Henk Slager wrote:

On Tue, Feb 23, 2016 at 6:44 PM, Nazar Mokrynskyi  wrote:

Looks like btrfstune -x did nothing, probably, it was already used at
creation time, I'm using rcX versions of kernel all the time and rolling
version of Ubuntu, so this is very likely to be the case.

The commandbtrfs-show-super   shows the features of the
filesystem. You have a 'dummy' single profiles on the HDD fs and that
gives me a hint that you likely have used older tools to create the
fs. The current kernel does not set this feature flag on disk. If the
flag was already set, then no difference in performance.

If it was not set, then from now on, new metadata extents should be
skinny, which saves on total memory size and processing for (the
larger) filesystems. But for your existing data (snapshot subvolumes
in your  case) the metadata is then still non-skinny. So you won't
notice an instant difference only after all exiting fileblocks are
re-written or removed.
You will probably have a measurable difference if you equally fill 2
filesystems, one with and the other without the flag.
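
(A minimal way to check for the flag and, if it is absent, set it; this
assumes /dev/sda1 is the backup device and that the filesystem is unmounted
when btrfstune runs:)

# Look for SKINNY_METADATA among the incompat flags.
sudo btrfs-show-super /dev/sda1 | grep -A6 incompat_flags
# Enable skinny metadata extent refs on the unmounted filesystem.
sudo umount /backup_hdd
sudo btrfstune -x /dev/sda1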


One thing I've noticed is much slower mount/umount on HDD than on SSD:


nazar-pc@nazar-pc ~> time sudo umount /backup
0.00user 0.00system 0:00.01elapsed 36%CPU (0avgtext+0avgdata
7104maxresident)k
0inputs+0outputs (0major+784minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo mount /backup
0.00user 0.00system 0:00.03elapsed 23%CPU (0avgtext+0avgdata
7076maxresident)k
0inputs+0outputs (0major+803minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo umount /backup_hdd
0.00user 0.11system 0:01.04elapsed 11%CPU (0avgtext+0avgdata
7092maxresident)k
0inputs+15296outputs (0major+787minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo mount /backup_hdd
0.00user 0.02system 0:04.45elapsed 0%CPU (0avgtext+0avgdata
7140maxresident)k
14648inputs+0outputs (0major+795minor)pagefaults 0swaps

It is especially long (tens of seconds, with high HDD load) when called
after some time, not consecutively.

Once it took something like 20 seconds to unmount filesystem and around 10
seconds to mount it.

Those are quite typical values for an already heavily used btrfs on a HDD.


About memory - 16 GiB of RAM should be enough I guess:) Can I measure
somehow if seeking is a problem?

I don't know a tool that can measure seek times and gather statistics
over and extended period of time and relate that to filesystem
internal actions. It would be best if all this were done by the HDD
firmware (under command of t

Re: Major HDD performance degradation on btrfs receive

2016-02-23 Thread Nazar Mokrynskyi
Looks like btrfstune -x did nothing; probably it was already applied at 
creation time. I'm using rcX kernel versions all the time and a rolling 
version of Ubuntu, so this is very likely to be the case.


One thing I've noticed is much slower mount/umount on HDD than on SSD:


nazar-pc@nazar-pc ~> time sudo umount /backup
0.00user 0.00system 0:00.01elapsed 36%CPU (0avgtext+0avgdata 
7104maxresident)k

0inputs+0outputs (0major+784minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo mount /backup
0.00user 0.00system 0:00.03elapsed 23%CPU (0avgtext+0avgdata 
7076maxresident)k

0inputs+0outputs (0major+803minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo umount /backup_hdd
0.00user 0.11system 0:01.04elapsed 11%CPU (0avgtext+0avgdata 
7092maxresident)k

0inputs+15296outputs (0major+787minor)pagefaults 0swaps
nazar-pc@nazar-pc ~> time sudo mount /backup_hdd
0.00user 0.02system 0:04.45elapsed 0%CPU (0avgtext+0avgdata 
7140maxresident)k

14648inputs+0outputs (0major+795minor)pagefaults 0swaps
It is especially long (tens of seconds, with high HDD load) when called 
after some time, not consecutively.


Once it took something like 20 seconds to unmount filesystem and around 
10 seconds to mount it.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora:naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 22.02.16 20:58, Nazar Mokrynskyi wrote:
On Tue, Feb 16, 2016 at 5:44 AM, Nazar Mokrynskyi 
 wrote:
> I have 2 SSD with BTRFS filesystem (RAID) on them and several 
subvolumes.
> Each 15 minutes I'm creating read-only snapshot of subvolumes 
/root, /home

> and /web inside /backup.
> After this I'm searching for last common subvolume on /backup_hdd, 
sending
> difference between latest common snapshot and simply latest 
snapshot to

> /backup_hdd.
> On top of all above there is snapshots rotation, so that /backup 
contains

> much less snapshots than /backup_hdd.
>
> I'm using this setup for last 7 months or so and this is luckily 
the longest

> period when I had no problems with BTRFS at all.
> However, last 2+ months btrfs receive command loads HDD so much 
that I can't

> even get list of directories in it.
> This happens even if diff between snapshots is really small.
> HDD contains 2 filesystems - mentioned BTRFS and ext4 for other 
files, so I

> can't even play mp3 file from ext4 filesystem while btrfs receive is
> running.
> Since I'm running everything each 15 minutes this is a real headache.
>
> My guess is that performance hit might be caused by filesystem 
fragmentation
> even though there is more than enough empty space. But I'm not sure 
how to
> properly check this and can't, obviously, run defragmentation on 
read-only

> subvolumes.
>
> I'll be thankful for anything that might help to identify and 
resolve this

> issue.
>
> ~> uname -a
> Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 
2016 x86_64

> x86_64 x86_64 GNU/Linux
>
> ~> btrfs --version
> btrfs-progs v4.4
>
> ~> sudo btrfs fi show
> Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
> Total devices 2 FS bytes used 71.00GiB
> devid1 size 111.30GiB used 111.30GiB path /dev/sdb2
> devid2 size 111.30GiB used 111.29GiB path /dev/sdc2
>
> Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
> Total devices 1 FS bytes used 252.54GiB
> devid1 size 800.00GiB used 266.08GiB path /dev/sda1
>
> ~> sudo btrfs fi df /
> Data, RAID0: total=214.56GiB, used=69.10GiB
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=4.00GiB, used=1.87GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> ~> sudo btrfs fi df /backup_hdd
> Data, single: total=245.01GiB, used=243.61GiB
> System, DUP: total=32.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=10.50GiB, used=8.93GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Relevant mount options:
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/ btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/root0 1
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/home btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/home 01
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/backup btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/backup 01
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/web btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/web 01
> UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8/backup_hdd btrfs
> compress=lzo,noatime,relatime,noexec 01
As alrea

Re: Major HDD performance degradation on btrfs receive

2016-02-23 Thread Nazar Mokrynskyi

Wow, this is interesting, I didn't know that.

I'll probably try noatime instead:)
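
(For illustration, the backup filesystem's fstab entry would then look
something like the line below, which is just the existing entry with relatime
dropped; not tested, spacing is free-form:)

UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8  /backup_hdd  btrfs  compress=lzo,noatime,noexec  0 1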

Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 23.02.16 18:29, Alexander Fougner wrote:

2016-02-23 18:18 GMT+01:00 Nazar Mokrynskyi :

But why? I have the relatime option; it should not cause changes unless the
file contents actually change, if I understand this option correctly.


*or* if it is older than 1 day. From the manpages:

relatime
   Update inode access times relative to modify or change time.
   Access time is only updated if the previous access time was
   earlier than the current modify or change time.  (Similar to
   noatime, but it doesn't break mutt or other applications that
   need to know if a file has been read since the last time it
   was modified.)

   Since Linux 2.6.30, the kernel defaults to the behavior
   provided by this option (unless noatime was specified), and
   the strictatime option is required to obtain traditional

   semantics.  In addition, since Linux 2.6.30, the file's last

   access time is always updated if it is more than 1 day old. <<<<<

Also, if you only use relatime, then you don't need to specify it,
it's the default since 2.6.30 as mentioned above.



Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox:
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 23.02.16 18:05, Alexander Fougner wrote:

2016-02-23 17:55 GMT+01:00 Nazar Mokrynskyi :

What is wrong with noatime,relatime? I'm using them for a long time as
good compromise in terms of performance.

The one option ends up canceling the other, as they're both atime
related
options that say do different things.

I'd have to actually setup a test or do some research to be sure which
one overrides the other (but someone here probably can say without
further research), tho I'd /guess/ the latter one overrides the earlier
one, which would effectively make them both pretty much useless, since
relatime is the normal kernel default and thus doesn't need to be
specified.

Noatime is strongly recommended for btrfs, however, particularly with
snapshots, as otherwise, the changes between snapshots can consist
mostly
of generally useless atime changes.

(FWIW, after over a decade of using noatime here (I first used it on the
then new reiserfs, after finding a recommendation for it on that), I got
tired of specifying the option on nearly all my fstab entries, and now
days carry a local kernel patch that changes the default to noatime,
allowing me to drop specifying it everywhere.  I don't claim to be a
coder, let alone a kernel level coder, but as a gentooer used to
building
from source for over a decade, I've found that I can often find the code
behind some behavior I'd like to tweak, and given good enough comments,
I
can often create trivial patches to accomplish that tweak, even if it's
not exactly the code a real C coder would choose to use, which is
exactly
what I've done here.  So now, unless some other atime option is
specified, my filesystems are all mounted noatime.  =:^)

Well, then I'll leave relatime on root fs and noatime on partition with
snapshots, thanks.

If you snapshot the root filesystem then the atime changes will still
be there, and you'll be having a lot of unnecessary changes between
each snapshot.










Re: Major HDD performance degradation on btrfs receive

2016-02-23 Thread Nazar Mokrynskyi
But why? I have the relatime option; it should not cause changes unless the 
file contents actually change, if I understand this option correctly.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 23.02.16 18:05, Alexander Fougner wrote:

2016-02-23 17:55 GMT+01:00 Nazar Mokrynskyi :

What is wrong with noatime,relatime? I'm using them for a long time as
good compromise in terms of performance.

The one option ends up canceling the other, as they're both atime related
options that say do different things.

I'd have to actually setup a test or do some research to be sure which
one overrides the other (but someone here probably can say without
further research), tho I'd /guess/ the latter one overrides the earlier
one, which would effectively make them both pretty much useless, since
relatime is the normal kernel default and thus doesn't need to be
specified.

Noatime is strongly recommended for btrfs, however, particularly with
snapshots, as otherwise, the changes between snapshots can consist mostly
of generally useless atime changes.

(FWIW, after over a decade of using noatime here (I first used it on the
then new reiserfs, after finding a recommendation for it on that), I got
tired of specifying the option on nearly all my fstab entries, and now
days carry a local kernel patch that changes the default to noatime,
allowing me to drop specifying it everywhere.  I don't claim to be a
coder, let alone a kernel level coder, but as a gentooer used to building
from source for over a decade, I've found that I can often find the code
behind some behavior I'd like to tweak, and given good enough comments, I
can often create trivial patches to accomplish that tweak, even if it's
not exactly the code a real C coder would choose to use, which is exactly
what I've done here.  So now, unless some other atime option is
specified, my filesystems are all mounted noatime.  =:^)

Well, then I'll leave relatime on root fs and noatime on partition with
snapshots, thanks.

If you snapshot the root filesystem then the atime changes will still
be there, and you'll be having a lot of unnecessary changes between
each snapshot.






Re: Major HDD performance degradation on btrfs receive

2016-02-23 Thread Nazar Mokrynskyi

> What is wrong with noatime,relatime? I'm using them for a long time as
> good compromise in terms of performance.
The one option ends up canceling the other, as they're both atime related
options that say do different things.

I'd have to actually setup a test or do some research to be sure which
one overrides the other (but someone here probably can say without
further research), tho I'd /guess/ the latter one overrides the earlier
one, which would effectively make them both pretty much useless, since
relatime is the normal kernel default and thus doesn't need to be
specified.

Noatime is strongly recommended for btrfs, however, particularly with
snapshots, as otherwise, the changes between snapshots can consist mostly
of generally useless atime changes.

(FWIW, after over a decade of using noatime here (I first used it on the
then new reiserfs, after finding a recommendation for it on that), I got
tired of specifying the option on nearly all my fstab entries, and now
days carry a local kernel patch that changes the default to noatime,
allowing me to drop specifying it everywhere.  I don't claim to be a
coder, let alone a kernel level coder, but as a gentooer used to building
from source for over a decade, I've found that I can often find the code
behind some behavior I'd like to tweak, and given good enough comments, I
can often create trivial patches to accomplish that tweak, even if it's
not exactly the code a real C coder would choose to use, which is exactly
what I've done here.  So now, unless some other atime option is
specified, my filesystems are all mounted noatime.  =:^)
Well, then I'll leave relatime on the root fs and noatime on the partition with 
snapshots, thanks.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249






Re: Major HDD performance degradation on btrfs receive

2016-02-22 Thread Nazar Mokrynskyi

On Tue, Feb 16, 2016 at 5:44 AM, Nazar Mokrynskyi  wrote:
> I have 2 SSD with BTRFS filesystem (RAID) on them and several subvolumes.
> Each 15 minutes I'm creating read-only snapshot of subvolumes /root, /home
> and /web inside /backup.
> After this I'm searching for last common subvolume on /backup_hdd, sending
> difference between latest common snapshot and simply latest snapshot to
> /backup_hdd.
> On top of all above there is snapshots rotation, so that /backup contains
> much less snapshots than /backup_hdd.
>
> I'm using this setup for last 7 months or so and this is luckily the longest
> period when I had no problems with BTRFS at all.
> However, last 2+ months btrfs receive command loads HDD so much that I can't
> even get list of directories in it.
> This happens even if diff between snapshots is really small.
> HDD contains 2 filesystems - mentioned BTRFS and ext4 for other files, so I
> can't even play mp3 file from ext4 filesystem while btrfs receive is
> running.
> Since I'm running everything each 15 minutes this is a real headache.
>
> My guess is that performance hit might be caused by filesystem fragmentation
> even though there is more than enough empty space. But I'm not sure how to
> properly check this and can't, obviously, run defragmentation on read-only
> subvolumes.
>
> I'll be thankful for anything that might help to identify and resolve this
> issue.
>
> ~> uname -a
> Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 x86_64
> x86_64 x86_64 GNU/Linux
>
> ~> btrfs --version
> btrfs-progs v4.4
>
> ~> sudo btrfs fi show
> Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
> Total devices 2 FS bytes used 71.00GiB
> devid1 size 111.30GiB used 111.30GiB path /dev/sdb2
> devid2 size 111.30GiB used 111.29GiB path /dev/sdc2
>
> Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
> Total devices 1 FS bytes used 252.54GiB
> devid1 size 800.00GiB used 266.08GiB path /dev/sda1
>
> ~> sudo btrfs fi df /
> Data, RAID0: total=214.56GiB, used=69.10GiB
> System, RAID1: total=8.00MiB, used=16.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=4.00GiB, used=1.87GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> ~> sudo btrfs fi df /backup_hdd
> Data, single: total=245.01GiB, used=243.61GiB
> System, DUP: total=32.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=10.50GiB, used=8.93GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Relevant mount options:
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/ btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/root0 1
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/home btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/home 01
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/backup btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/backup 01
> UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030/web btrfs
> compress=lzo,noatime,relatime,ssd,subvol=/web 01
> UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8/backup_hdd btrfs
> compress=lzo,noatime,relatime,noexec 01
As already indicated by Duncan, the amount of snapshots might be just
too much. The fragmentation on the HDD might have become very high. If
there is limited amount of RAM in the system (so limited caching), too
much time is lost in seeks. In addition:

  compress=lzo
this also increases the chance of scattering fragments and fragmentation.

  noatime,relatime
I am not sure why you have this. Hopefully you have the actual mount
listed as   noatime

You could use the principles of the tool/package called snapper to
do a sort of non-linear snapshot thinning: further back in time you
keep a much coarser granularity of snapshots over a given
timeframe.
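
(For illustration, snapper drives this kind of thinning with timeline limits
in /etc/snapper/configs/<name>; the values below are arbitrary examples, not a
recommendation:)

TIMELINE_CREATE="yes"
TIMELINE_CLEANUP="yes"
TIMELINE_LIMIT_HOURLY="24"
TIMELINE_LIMIT_DAILY="14"
TIMELINE_LIMIT_WEEKLY="8"
TIMELINE_LIMIT_MONTHLY="6"
TIMELINE_LIMIT_YEARLY="1"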

You could use skinny metadata (recreate the fs with newer tools or use
btrfstune -x on /dev/sda1). I think at the moment this flag is not
enabled on /dev/sda1

If you put just 1 btrfs fs on the hdd (so move all the content from
the ext4 fs in the the btrfs fs) you might get better overall
performance. I assume the ext4 fs is on the second (slower part) of
the HDD and that is a disadvantage I think.
But you probably have reasons for why the setup is like it is.
I've replied to Duncan's message about the number of snapshots; there is 
snapshot rotation, and the number of snapshots is quite small, 491 in total.


About memory: 16 GiB of RAM should be enough, I guess :) Can I somehow measure 
whether seeking is the problem?


What is wrong with noatime,relatime? I'm using them for a long time as 
good compromise in terms of pe

Re: Major HDD performance degradation on btrfs receive

2016-02-22 Thread Nazar Mokrynskyi
you'll have 2-3 days' worth of hourly snapshots on LABEL=Backup, so up to
72 hourly snapshots per subvolume.  If on the 8th day you thin down to
six-hourly, 4/day, cutting out 2/3, you'll have five days of 12/day/
subvolume, 60 snapshots per subvolume, plus the 72, 132 snapshots per
subvolume total, to 8 days out so you can recover over a week's worth at
at least 2-hourly, if needed.

If then on the 32 day (giving you a month's worth of at least 4X/day),
you cut every other one, giving you twice a day snapshots, that's 24 days
of 2X/day or 48 snapshots per subvolume, plus the 132 from before, 180
snapshots per subvolume total, now.

If then on the 92 day (giving you two more months of 2X/day, a quarter's
worth of at least 2X/day) you again thin every other one, to one per day,
you have 60 days @ 2X/day or 120 snapshots per subvolume, plus the 180 we
had already, 300 snapshots per subvolume, now.

OK, so we're already over our target 250/subvolume, so we could thin a
bit more drastically.  However, we're only snapshotting three subvolumes,
so we can afford a bit of lenience on the per-subvolume cap as that's
assuming 4-8 snapshotted subvolumes, and we're still well under our total
filesystem snapshot cap.

If then you keep another quarter's worth of daily snapshots, out to 183
days, that's 91 days of daily snapshots, 91 per subvolume, on top of the
300 we had, so now 391 snapshots per subvolume.

If you then thin to weekly snapshots, cutting 6/7, and keep them around
another 27 weeks (just over half a year, thus over a year total), that's
27 more snapshots per subvolume, plus the 391 we had, 418 snapshots per
subvolume total.

418 snapshots per subvolume total, starting at 3-4X per hour to /backup
and hourly to LABEL=Backup, thinning down gradually to weekly after six
months and keeping that for the rest of the year.  Given that you're
snapshotting three subvolumes, that's 1254 snapshots total, still well
within the 1000-2000 total snapshots per filesystem target cap.

During that year, if the data is worth it, you should have done an offsite
or at least offline backup, we'll say quarterly.  After that, keeping the
local online backup around is merely for convenience, and with quarterly
backups, after a year you have multiple copies and can simply delete the
year-old snapshots, one a week, probably at the same time you thin down
the six-month-old daily snapshots to weekly.

Compare that just over 1200 snapshots to the 60K+ snapshots you may have
now, knowing that scaling over 10K snapshots is an issue particularly on
spinning rust, and you should be able to appreciate the difference it's
likely to make. =:^)

But at the same time, in practice it'll probably be much easier to
actually retrieve something from a snapshot a few months old, because you
won't have tens of thousands of effectively useless snapshots to sort
thru as you will be regularly thinning them down! =:^)

> ~> uname [-r]
> 4.5.0-rc4-haswell
>
> ~> btrfs --version
> btrfs-progs v4.4

You're staying current with your btrfs versions.  Kudos on that! =:^)

And on including btrfs fi show and btrfs fi df, as they were useful, tho
I'm snipping them here.

One more tip.  Btrfs quotas are known to have scaling issues as well.  If
you're using them, they'll exacerbate the problem.  And while I'm not
sure about current 4.4 status, thru 4.3 at least, they were buggy and not
reliable anyway.  So the recommendation is to leave quotas off on btrfs,
and use some other more mature filesystem where they're known to work
reliably if you really need them.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
First of all, sorry for the delay; for whatever reason I was not subscribed to 
the mailing list.


You are right, the RAID is on 2 SSDs and backup_hdd (LABEL=Backup) is 
a separate real HDD.


The example was simplified to give an overview without digging too deep into 
details. I actually have proper backup rotation, so we are not talking 
about thousands of snapshots :)
Here is tool I've created and using right now: 
https://github.com/nazar-pc/just-backup-btrfs
I'm keeping all snapshots for the last day, up to 90 for the last month and up 
to 48 throughout the year.

So as result there are:
* 166 snapshots in /backup_hdd/root
* 166 snapshots in /backup_hdd/home
* 159 snapshots in /backup_hdd/web

I'm not using quotas; there is nothing on this BTRFS partition besides 
the mentioned snapshots.


--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249






Major HDD performance degradation on btrfs receive

2016-02-15 Thread Nazar Mokrynskyi
I have 2 SSDs with a BTRFS filesystem (RAID) on them and several 
subvolumes. Every 15 minutes I'm creating read-only snapshots of the 
subvolumes /root, /home and /web inside /backup.
After that I'm searching for the last common snapshot on /backup_hdd and 
sending the difference between the latest common snapshot and the latest 
snapshot to /backup_hdd (a rough sketch of one such cycle follows below).
On top of all of the above there is snapshot rotation, so /backup 
contains far fewer snapshots than /backup_hdd.
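
A minimal bash sketch of one such cycle for a single subvolume (paths and 
naming are placeholders for illustration, not the actual just-backup-btrfs code):

#!/bin/bash
# One 15-minute backup cycle for a single subvolume.
NOW=$(date +%Y-%m-%d_%H:%M:%S)
SRC=/home                        # subvolume being backed up
LOCAL=/backup/home               # snapshot area on the SSD
REMOTE=/backup_hdd/home          # receive area on the HDD

# 1. Read-only snapshot on the SSD (send requires read-only sources).
sudo btrfs subvolume snapshot -r "$SRC" "$LOCAL/$NOW"
sync

# 2. Newest snapshot name present on both sides is the incremental parent.
PARENT=$(comm -12 <(ls -1 "$LOCAL") <(ls -1 "$REMOTE") | tail -n 1)

# 3. Incremental send if a common parent exists, full send otherwise.
if [ -n "$PARENT" ]; then
    sudo btrfs send -p "$LOCAL/$PARENT" "$LOCAL/$NOW" | sudo btrfs receive "$REMOTE"
else
    sudo btrfs send "$LOCAL/$NOW" | sudo btrfs receive "$REMOTE"
fi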


I've been using this setup for the last 7 months or so, and this is luckily the 
longest period in which I had no problems with BTRFS at all.
However, for the last 2+ months the btrfs receive command has been loading the 
HDD so much that I can't even get a listing of the directories on it.

This happens even if the diff between snapshots is really small.
The HDD contains 2 filesystems - the mentioned BTRFS one and ext4 for other 
files - so I can't even play an mp3 file from the ext4 filesystem while 
btrfs receive is running.

Since I'm running everything every 15 minutes this is a real headache.

My guess is that the performance hit might be caused by filesystem 
fragmentation, even though there is more than enough empty space. But I'm 
not sure how to properly check this, and I obviously can't run 
defragmentation on read-only subvolumes.


I'll be thankful for anything that might help to identify and resolve 
this issue.


~> uname -a
Linux nazar-pc 4.5.0-rc4-haswell #1 SMP Tue Feb 16 02:09:13 CET 2016 
x86_64 x86_64 x86_64 GNU/Linux


~> btrfs --version
btrfs-progs v4.4

~> sudo btrfs fi show
Label: none  uuid: 5170aca4-061a-4c6c-ab00-bd7fc8ae6030
Total devices 2 FS bytes used 71.00GiB
devid1 size 111.30GiB used 111.30GiB path /dev/sdb2
devid2 size 111.30GiB used 111.29GiB path /dev/sdc2

Label: 'Backup'  uuid: 40b8240a-a0a2-4034-ae55-f8558c0343a8
Total devices 1 FS bytes used 252.54GiB
devid1 size 800.00GiB used 266.08GiB path /dev/sda1

~> sudo btrfs fi df /
Data, RAID0: total=214.56GiB, used=69.10GiB
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=4.00GiB, used=1.87GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

~> sudo btrfs fi df /backup_hdd
Data, single: total=245.01GiB, used=243.61GiB
System, DUP: total=32.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=10.50GiB, used=8.93GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Relevant mount options:
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030  /            btrfs  compress=lzo,noatime,relatime,ssd,subvol=/root    0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030  /home        btrfs  compress=lzo,noatime,relatime,ssd,subvol=/home    0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030  /backup      btrfs  compress=lzo,noatime,relatime,ssd,subvol=/backup  0 1
UUID=5170aca4-061a-4c6c-ab00-bd7fc8ae6030  /web         btrfs  compress=lzo,noatime,relatime,ssd,subvol=/web     0 1
UUID=40b8240a-a0a2-4034-ae55-f8558c0343a8  /backup_hdd  btrfs  compress=lzo,noatime,relatime,noexec              0 1


--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249






Re: Btrfs wiki account

2015-08-30 Thread Nazar Mokrynskyi

Thanks Dave, I was finally approved, but with the username "Nazar Mokrynskyi2" :)

Any chance to update the username to Nazar Mokrynskyi (without the "2" at the end)?
I've already changed the real name.
I tried to reply to the admin's email, but it doesn't actually accept emails, 
so I have to ask here again.


Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

On 29.08.15 11:24, David Sterba wrote:

On Sat, Aug 29, 2015 at 03:21:53AM +0200, Nazar Mokrynskyi wrote:

I wanted to add one more tool for incremental backups to wiki, but
accidentally had typo in email at registration.
Now, more than one month after I still can't register, though
registration request should expire already.
Does anyone have access to fix that? Can't find any contacts of person
who supports wiki.

Forwarded your request to the mighty wiki admin.







Btrfs wiki account

2015-08-28 Thread Nazar Mokrynskyi
I wanted to add one more tool for incremental backups to the wiki, but 
accidentally had a typo in the email address at registration.
Now, more than a month later, I still can't register, though the 
registration request should have expired already.
Does anyone have access to fix that? I can't find any contacts for the person 
who maintains the wiki.
The accounts may be under the names "Nazar Mokrynskyi" and "Nazar Mokrynskyi2" 
(yes, a second attempt).

Sorry for a bit off-topic message.

--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: 
A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249





