Re: unable to handle kernel paging request - btrfs

2016-10-10 Thread Rich Freeman
Here is another trace, similar to the original issue, but I have a bit
more detail on this one, and it is available as text, which if nothing
else is more convenient, so I'll go ahead and paste it.  I don't
intend to keep pasting these unless I get something that looks
different.

I only posted the initial BUG.

Oct 10 05:11:15 hab nc[1250]: ip_tables ext4 crc16 mbcache jbd2 radeon
nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple
tuner_types tuner cx8800 cx8802 videobuf2_dma_sg videobuf2_memops
videobuf2_v4l2 cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit
dm_region_hash dm_log dm_mod
Oct 10 05:11:15 hab nc[1250]: [81346.935203] CPU: 3 PID: 29648 Comm:
kworker/u16:3 Not tainted 4.4.24 #1
Oct 10 05:11:15 hab nc[1250]: [81346.935317] Hardware name: Gigabyte
Technology Co., Ltd. GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
Oct 10 05:11:15 hab nc[1250]: [81346.935544] Workqueue: btrfs-endio
btrfs_endio_helper [btrfs]
Oct 10 05:11:15 hab nc[1250]: [81346.935657] task: 880415acae00
ti: 88019a584000 task.ti: 88019a584000
Oct 10 05:11:15 hab nc[1250]: [81346.935783] RIP:
0010:[]  [] __memcpy+0x12/0x20
Oct 10 05:11:15 hab nc[1250]: [81346.935930] RSP:
0018:88019a587c68  EFLAGS: 00010246
Oct 10 05:11:15 hab nc[1250]: [81346.936023] RAX: c90002ecfff8
RBX: 1000 RCX: 01ff
Oct 10 05:11:15 hab nc[1250]: [81346.936142] RDX: 
RSI: 88008c950008 RDI: c90002ed
Oct 10 05:11:15 hab nc[1250]: [81346.936262] RBP: 88019a587d30
R08: 41545345 R09: c90002ece000
Oct 10 05:11:15 hab nc[1250]: [81346.936382] R10: e8cc09e0
R11: 1000 R12: 88008c95
Oct 10 05:11:15 hab nc[1250]: [81346.936502] R13: 4154534d
R14:  R15: 8802b25b2798
Oct 10 05:11:15 hab nc[1250]: [81346.936623] FS:
7fe90a15d780() GS:880427cc()
knlGS:
Oct 10 05:11:15 hab nc[1250]: [81346.936756] CS:  0010 DS:  ES:
 CR0: 8005003b
Oct 10 05:11:16 hab nc[1250]: [81346.937182]  8800102c5720
0004 41545345 4154334d
Oct 10 05:11:16 hab nc[1250]: [81346.937347]  c90002ece000
1000 0002 003a
Oct 10 05:11:16 hab nc[1250]: [81346.937515] Call Trace:
Oct 10 05:11:16 hab nc[1250]: [81346.937621]  [] ?
lzo_decompress_biovec+0x1d1/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.944148]  []
end_compressed_bio_read+0x20c/0x2c0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.950610]  [] ?
resched_curr+0x60/0xc0
Oct 10 05:11:16 hab nc[1250]: [81346.957055]  []
bio_endio+0x3a/0x70
Oct 10 05:11:16 hab nc[1250]: [81346.963516]  []
end_workqueue_fn+0x37/0x40 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.970009]  []
normal_work_helper+0xae/0x2d0 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.976532]  []
btrfs_endio_helper+0xd/0x10 [btrfs]
Oct 10 05:11:16 hab nc[1250]: [81346.983010]  []
process_one_work+0x148/0x400
Oct 10 05:11:16 hab nc[1250]: [81346.989509]  []
worker_thread+0x46/0x430
Oct 10 05:11:16 hab nc[1250]: [81346.996013]  [] ?
rescuer_thread+0x2d0/0x2d0
Oct 10 05:11:16 hab nc[1250]: [81347.034423] Code: ff ff 48 8b 43 60
48 2b 43 50 88 43 4e 5b 5d f3 c3 90 90 90 90 90 90 90 90 00 48 89 f8
48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00
00 48 89 f8 48 89 d1 f3
Oct 10 05:11:16 hab nc[1250]: [81347.041852] RIP  []
__memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.048565]  RSP 
Oct 10 05:11:16 hab nc[1250]: [81347.055218] CR2: c90002ed
Oct 10 05:11:16 hab nc[1250]: [81347.104741] ---[ end trace
9a43c0b6d874fe31 ]---
Oct 10 05:11:16 hab nc[1250]: [81347.104752] BUG: unable to handle
kernel paging request at c90002c4a000
Oct 10 05:11:16 hab nc[1250]: [81347.104761] IP: []
__memcpy+0x12/0x20
Oct 10 05:11:16 hab nc[1250]: [81347.104767] PGD 417427067 PUD
417488067 PMD 410881067 PTE 0
Oct 10 05:11:16 hab nc[1250]: [81347.104771] Oops: 0002 [#2] SMP
Oct 10 05:11:16 hab nc[1250]: [81347.104825] Modules linked in:
netconsole configfs tun ipt_MASQUERADE nf_nat_masquerade_ipv4
xt_conntrack veth iptable_mangle iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter
ip_tables ext4 crc16 mbcache jbd2 radeon nxt200x cx88_dvb
cx88_vp3054_i2c videobuf2_dvb dvb_core tuner_simple tuner_types tuner
cx8800 cx8802 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2
cx88_alsa cx88xx mousedev fbcon videobuf2_core bitblit softcursor
tveeprom font tileblit drm_kms_helper kvm_amd rc_core kvm v4l2_common
cfbfillrect syscopyarea videodev cfbimgblt sysfillrect
snd_hda_codec_realtek snd_hda_codec_generic irqbypass i2c_algo_bit
sysimgblt fb_sys_fops snd_hda_intel k10temp cfbcopyarea ttm
snd_hda_codec snd_hwdep i2c_piix4 snd_hda_core drm hid_logitech_hidpp
snd_pcm r8169
[81347.104954] CR2: c90002c4a000 CR3:
cb2fb000 CR4: 06e0
Oct 10 05:11:16 hab nc[1250]: [81347.104955] Stack:
Oct 10 05:11:16 hab nc[1250]: [81347.104960]  a02ef741

Re: unable to handle kernel paging request - btrfs

2016-10-08 Thread Rich Freeman
I'm not sure if this is related to the same issue or not, but I just
started getting a new BUG, followed by a panic.  (I've also enabled
network console capture so that you won't have to squint at photos.)

Original BUG is:


[14740.444257] [ cut here ]
[14740.444293] kernel BUG at /usr/src/linux-stable/fs/btrfs/volumes.c:5509!
[14740.444323] invalid opcode:  [#1] SMP
[14740.444348] Modules linked in: nfsd auth_rpcgss oid_registry lockd
grace sunrpc it87 hwmon_vid netconsole configfs tun ipt_MASQUERADE
nf_nat_masquerade_ipv4 xt_conntrack veth iptable_mangle iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_filter ip_tables ext4 crc16 mbcache
jbd2 radeon nxt200x cx88_dvb cx88_vp3054_i2c videobuf2_dvb
dvb_core tuner_simple tuner_types tuner fbcon bitblit softcursor font
tileblit drm_kms_helper kvm_amd kvm cfbfillrect syscopyarea cfbimgblt
sysfillrect sysimgblt mousedev fb_sys_fops cfbcopyarea cx88_alsa ttm
cx8802 drm cx8800 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2
cx88xx snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel
videobuf2_core snd_hda_codec tveeprom rc_core irqbypass v4l2_common
videodev k10temp i2c_algo_bit
[14740.444799]  snd_hwdep i2c_piix4 snd_hda_core hid_logitech_hidpp
snd_pcm r8169 8250 snd_timer snd mii 8250_base backlight serial_core
soundcore evdev sch_fq_codel hid_logitech_dj hid_generic usbhid btrfs
firewire_ohci atkbd ata_generic pata_acpi
firewire_core crc_itu_t xor zlib_deflate ohci_pci pata_atiixp
raid6_pq ehci_pci ohci_hcd ehci_hcd usbcore usb_common dm_mirror
dm_region_hash dm_log dm_mod
[14740.445028] CPU: 1 PID: 3213 Comm: kworker/u16:2 Not tainted 4.4.24 #1
[14740.445056] Hardware name: Gigabyte Technology Co., Ltd.
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[14740.445116] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[14740.445143] task: 8803ff527300 ti: 8803e3c8c000 task.ti:
8803e3c8c000
[14740.445173] RIP: 0010:[]  []
__btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.445226] RSP: 0018:8803e3c8faa0  EFLAGS: 00010282
[14740.445248] RAX: cdf2f040 RBX: 0002 RCX: 0002
[14740.445277] RDX:  RSI: 21b27000 RDI: 8800cab4fb40
[14740.445306] RBP: 8803e3c8fb88 R08: 050743c0 R09: cdf2f040
[14740.445334] R10: 0001 R11: 1e4d R12: cdf2f03f
[14740.445363] R13: 9000 R14: 8803e3c8fbd0 R15: 0001
[14740.445391] FS:  7f9e2befc7c0() GS:880427c4()
knlGS:
[14740.445423] CS:  0010 DS:  ES:  CR0: 8005003b
[14740.445446] CR2: 7fc533bf7000 CR3: 0003e29e4000 CR4: 06e0
[14740.445474] Stack:
[14740.445484]  8803e3c8fab0 81084577 8112acf0
02011200
[14740.445526]  880410cacc60 880410cacc90 1e4e
8803ff527300
[14740.445565]   1e4e 880414e68ee8

[14740.445603] Call Trace:
[14740.445618]  [] ? __enqueue_entity+0x67/0x70
[14740.445644]  [] ? mempool_alloc_slab+0x10/0x20
[14740.445680]  [] btrfs_map_bio+0x71/0x320 [btrfs]
[14740.445707]  [] ? kmem_cache_alloc+0x190/0x1f0
[14740.445742]  [] ? btrfs_bio_wq_end_io+0x2e/0x80 [btrfs]
[14740.445780]  []
btrfs_submit_compressed_read+0x451/0x4a0 [btrfs]
[14740.445821]  [] btrfs_submit_bio_hook+0x1a0/0x1b0 [btrfs]
[14740.445860]  [] ? btrfs_io_bio_alloc+0x10/0x30 [btrfs]
[14740.445900]  [] ? btrfs_create_repair_bio+0xc3/0xe0 [btrfs]
[14740.445940]  [] end_bio_extent_readpage+0x44f/0x510 [btrfs]
[14740.445981]  [] ? btrfs_create_repair_bio+0xe0/0xe0 [btrfs]
[14740.446011]  [] bio_endio+0x3a/0x70
[14740.446042]  [] end_workqueue_fn+0x37/0x40 [btrfs]
[14740.446080]  [] normal_work_helper+0xae/0x2d0 [btrfs]
[14740.446118]  [] btrfs_endio_helper+0xd/0x10 [btrfs]
[14740.446145]  [] process_one_work+0x148/0x400
[14740.446170]  [] worker_thread+0x46/0x430
[14740.446193]  [] ? rescuer_thread+0x2d0/0x2d0
[14740.446217]  [] ? rescuer_thread+0x2d0/0x2d0
[14740.446241]  [] kthread+0xc4/0xe0
[14740.446262]  [] ? kthread_park+0x50/0x50
[14740.446286]  [] ret_from_fork+0x3f/0x70
[14740.446309]  [] ? kthread_park+0x50/0x50
[14740.446332] Code: 60 ff ff ff 48 63 d3 48 2b 4d c0 48 0f af c1 48
39 c2 48 0f 46 c2 48 89 45 90 89 d9 c7 85 70 ff ff ff 00 00 00 00
e9 f9 f3 ff ff <0f> 0b bb f4 ff ff ff e9 c7
fa ff ff be 6a 16 00 00 48 c7 c7 18
[14740.446672] RIP  [] __btrfs_map_block+0xdfd/0x1140 [btrfs]
[14740.446714]  RSP 
[14740.456756] ---[ end trace e349a675c6512569 ]---
[14740.456832] BUG: unable to handle kernel paging request at ffd8
[14740.456869] IP: [] kthread_data+0xb/0x20
[14740.456896] PGD 1a0a067 PUD 1a0c067 PMD 0


Re: unable to handle kernel paging request - btrfs

2016-10-07 Thread Rich Freeman
On Fri, Sep 30, 2016 at 8:38 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/30/16 5:07 PM, Rich Freeman wrote:
>> On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <je...@suse.com> wrote:
>>> This looks like a use-after-free on one of the pages used for
>>> compression.  Can you post the output of objdump -Dr
>>> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>>>
>>
>> Sure:
>> https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk
>>
>> I was impressed by just how large it was.
>>
>> I take it you're going to try to use the offsets in the oops to figure
>> out where it went wrong?  I really need to get kernel core dumping
>> working on this box...
>
> Yep.  What I think is happening is that we have workspace getting freed
> while it's in use.  The faulting address is in vmalloc space and it's
> also the first argument to memcpy, which makes it the destination.  In
> lzo_decompress_biovec, that means it's the workspace->cbuf.  Beyond that
> I'll have to dig a bit more.
>

I'll confess to not being much of a kernel hacker, but could this
error also be caused by a buffer overrun?  If working_bytes or
in_page_bytes_left are larger than the size of the buffer then the
memcpy would overrun the length of the buffer.  I don't know if that
generates a different error than the one reported.

What guarantee do we have that working_bytes is less than the size of
workspace->cbuf?  I'm just throwing stuff out there because as far as
I can tell the code never frees workspace (I'm guessing kunmap at the
very end might take care of it).
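
To make the question concrete, here's a rough userspace sketch of the
copy loop I have in mind (paraphrased from my reading of fs/btrfs/lzo.c
on 4.4 - this is not the kernel code, so treat the details with
suspicion; the cbuf sizing uses the usual lzo1x worst-case formula):

/* sketch.c - userspace model of the segment copy in lzo decompression */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
/* worst-case lzo1x expansion of one page, as cbuf is sized */
#define LZO1X_WORST(x) ((x) + (x) / 16 + 64 + 3)

static uint8_t cbuf[LZO1X_WORST(PAGE_SIZE)];

/* copy one compressed segment, page by page, into cbuf */
static void copy_segment(uint8_t **pages, size_t in_len, size_t in_offset)
{
    size_t working_bytes = in_len;     /* driven by on-disk data */
    size_t in_page_bytes_left = PAGE_SIZE - in_offset;
    size_t buf_offset = 0, page = 0;

    /* the guard I'm asking about: without it, a bogus in_len
     * larger than sizeof(cbuf) would walk memcpy() off the end */
    assert(in_len <= sizeof(cbuf));

    while (working_bytes) {
        size_t bytes = working_bytes < in_page_bytes_left ?
                       working_bytes : in_page_bytes_left;
        memcpy(cbuf + buf_offset, pages[page] + in_offset, bytes);
        buf_offset += bytes;
        working_bytes -= bytes;
        in_page_bytes_left -= bytes;
        in_offset += bytes;
        if (!in_page_bytes_left) {     /* advance to the next page */
            page++;
            in_offset = 0;
            in_page_bytes_left = PAGE_SIZE;
        }
    }
}

int main(void)
{
    static uint8_t p0[PAGE_SIZE], p1[PAGE_SIZE];
    uint8_t *pages[] = { p0, p1 };

    copy_segment(pages, 3000, 128);    /* fits in cbuf: fine */
    /* copy_segment(pages, 8192, 0);      would trip the assert */
    return 0;
}

If the on-disk length fields aren't validated against the cbuf
allocation somewhere before this point, that's the overrun I mean.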

--
Rich


Re: unable to handle kernel paging request - btrfs

2016-09-30 Thread Rich Freeman
On Fri, Sep 30, 2016 at 4:55 PM, Jeff Mahoney <je...@suse.com> wrote:
> This looks like a use-after-free on one of the pages used for
> compression.  Can you post the output of objdump -Dr
> /lib/modules/$(uname -r)/kernel/fs/btrfs/btrfs.ko somewhere?
>

Sure:
https://drive.google.com/open?id=0BwUDImviY_gcR3JfT0Z1cUlRVEk

I was impressed by just how large it was.

I take it you're going to try to use the offsets in the oops to figure
out where it went wrong?  I really need to get kernel core dumping
working on this box...

--
Rich


Re: unable to handle kernel paging request - btrfs

2016-09-30 Thread Rich Freeman
On Thu, Sep 22, 2016 at 1:41 PM, Jeff Mahoney <je...@suse.com> wrote:
> On 9/22/16 8:18 AM, Rich Freeman wrote:
>> I have been getting panics consistently after doing a btrfs replace
>> operation on a raid1 and rebooting.  I linked a photo of the panic; I
>> haven't been able to get a text capture of it.
>>
>> https://ibin.co/2vx0HhDeViu3.jpg
>>
>> I'm getting this error on the latest 4.4, 4.1, and even on an old
>> 3.18.26 kernel I had lying around.
>>
>> I tried the remove root_log_ctx from ctx list before btrfs_sync_log
>> returns patch on 4.1 and that did not solve my problem either.
>>
>> I'm able to boot into single-user mode and if I don't start any
>> processes the system seems fairly stable.  I am also able to start a
>> btrfs balance and run that for several hours without issue.  If I
>> start launching services the system will tend to panic, though how
>> many processes I can launch will vary.  I don't think that it is a
>> particular file being accessed that is triggering the issue since the
>> point where it fails varies.  I suspect it may be load-related.
>>
>> Mounting with compress=no doesn't seem to help either.  Granted, I see
>> lzo_decompress in the backtrace and that is probably a read operation.
>>
>> Any suggestions?  Google hasn't been helpful on this one...
>
> Can you boot with panic_on_oops=1, reproduce it, and capture that Oops?
> The trace in your photo is a secondary Oops (tainted D), which means
> that something else went wrong before that and now the system is
> tripping over it.  Secondary Oopses don't really help the debugging
> process because the system was already in a broken, undefined, state.
>

Ok, the system has been up for a week without issue, but just panicked
and rebooted right towards the end of a balance (it literally had
about 30 of 2500 chunks left).

After it came up (and waiting for it to fully mount as there were a
bunch of free space warnings/etc) I managed to capture an initial oops
when it happened again:

https://ibin.co/2wt0n2IaCOA3.jpg

This is on a system without swap, though my understanding is that the
paging system is used for other things.

Note that I've updated my kernel since my last post.  When it panicked
during the balance it was running 4.4.21, and for the oops I actually
captured it was on 4.4.23 (I was actually just waiting for the balance
to finish before rebooting with a new kernel).

--
Rich


Re: btrfs and containers

2016-03-09 Thread Rich Freeman
On Wed, Mar 9, 2016 at 4:45 PM, Marc MERLIN wrote:
> On Wed, Mar 09, 2016 at 02:21:26PM -0700, Chris Murphy wrote:
>> > I have a very stripped down docker image that actually mounts portion of
>> > of my root filesystem read only.
>> > While it's running out of a btrfs filesystem, you can't run btrfs
>> > commands against it:
>> > 05233e5c91f0:/# btrfs fi show
>> > 05233e5c91f0:/# btrfs subvol list /
>> > ERROR: can't perform the search - Operation not permitted
>> > 05233e5c91f0:/# btrfs subvol list .
>> > ERROR: can't perform the search - Operation not permitted
>> >
>> > I didn't do anything special, it's just working that way.
>>
>> Yep, you're not using --privileged in which case you can't list
>> things. But I'm not sure what the equivalent is off hand with
>> systemd-nspawn containers, I think those may always be privileged?
>
> Ok, cool. I just used docker out of the box, glad to know it errs on
> the secure side by default.
> (and I don't have systemd, so that may also help me there)
>

I'm sure the default capability list for systemd-nspawn and docker is
different.  I know that you can tune nspawn to give the container
whatever capabilities you want it to have.  In general, though, a word
of warning: Linux containers are still not quite 100% secure when root
is running inside.  Obviously the fewer capabilities you give them the
better, but the level of isolation isn't quite at VM levels.  It is
better than chroot levels, however.


Re: btrfs raid

2016-03-06 Thread Rich Freeman
On Sun, Mar 6, 2016 at 4:07 PM, Chris Murphy <li...@colorremedies.com> wrote:
> On Sun, Mar 6, 2016 at 5:01 AM, Rich Freeman <ri...@gentoo.org> wrote:
>
>> I think it depends on how you define "old."  I think that 3.18.28
>> would be fine as it is a supported longterm.
>
> For raid56? I disagree. There were substantial raid56 code changes in
> 3.19 that were not backported to 3.18.

Of course.  I was referring to raid1.  I wouldn't run raid56 without
an expectation of occasionally losing everything on any version of
linux.  :)  If I were just testing it or I could tolerate losing
everything occasionally I'd probably track the current stable, if not
mainline, depending on my goals.

-- 
Rich


Re: btrfs raid

2016-03-06 Thread Rich Freeman
On Tue, Mar 1, 2016 at 11:27 AM, Hugo Mills <h...@carfax.org.uk> wrote:
>
>Definitely don't use parity RAID on 3.19. It's not really something
> I'd trust, personally, even on 4.4, except for testing purposes.

++ - raid 5/6 are fairly unstable at this point.  Raid 1 should be just fine.

>TBH, I wouldn't really want to be running something as old as 3.19
> either. The actual problems of running older kernels are, IME,
> considerably worse than the perceived problems of upgrading.

I think it depends on how you define "old."  I think that 3.18.28
would be fine as it is a supported longterm.  I've just upgraded to
the 4.1 series, which I plan to track until a new longterm has been out
for a few months and things look quiet.

3.19 is very problematic though, as it is no longer supported.  I'd
sooner "downgrade" to 3.18.28 (which likely has more btrfs backports
unless your distro handles them).  Or, upgrade to 4.1.19.

If you are using highly experimental features like raid5 support on
btrfs then bleeding-edge is probably better, but I've found I've had
the fewest issues sticking with the previous longterm.  I've been
bitten by a few btrfs regressions over the years and I think 3.19 was
actually around the time I got hit by one of them.  Since I've
switched to just staying on a longterm once it hits the x.x.15 version
or so I've found things to be much more reliable.

-- 
Rich


Re: Deadlock after upgrade to 4.1

2015-12-29 Thread Rich Freeman
On Fri, Dec 25, 2015 at 11:34 PM, Chris Murphy <li...@colorremedies.com> wrote:
> I would then also try to reproduce with 4.2.8 or 4.3.3 because those
> have ~ 25% more backports than made it to 4.1.15, so there's an off chance
> it's fixed there.

I take it that those backports are in the queue though?  I was
actually thinking about updating to 4.1 over the holidays but this
thread is making me think that btrfs isn't quite ready in 4.1 yet.
3.18.25 is about the best experience with btrfs I've had so far, and I
guess I don't really have any reason to update until raid5 is stable
(which seems a long way off).

--
Rich


Re: btrfs autodefrag?

2015-10-18 Thread Rich Freeman
On Sat, Oct 17, 2015 at 12:36 PM, Xavier Gnata wrote:
> 2) Disabling copy-on-write for just the VM image directory.

Unless this has changed, doing this will also disable checksumming.  I
don't see any reason why it has to, but it does.  So, I avoid using
this at all costs.

--
Rich


Re: State of Dedup / Defrag

2015-10-15 Thread Rich Freeman
On Wed, Oct 14, 2015 at 10:47 PM, Zygo Blaxell wrote:
>
> I wouldn't describe dedup+defrag as unsafe.  More like insane.  You won't
> lose any data, but running both will waste a lot of time and power.
> Either one is OK without the other, or applied to non-overlapping sets
> of files, but they are operations with opposite results.

That is probably why I disabled it then.  I now recall past discussion
that defragging a file wasn't snapshot-aware, though I thought that
was fixed.

Obviously there is always a tradeoff since from a dedup perspective
you're best off arranging extents so that you're sharing as much as
possible, and from a defrag standpoint you want to just have each file
have a single extent even if two files differ by a single byte.

I've pretty much stopped running VMs on btrfs and I've adjusted my
journal settings to something more sane so the defrag isn't nearly as
important these days.

--
Rich


Re: RAID6 stable enough for production?

2015-10-15 Thread Rich Freeman
On Wed, Oct 14, 2015 at 9:47 PM, Chris Murphy <li...@colorremedies.com> wrote:
>
> For that matter, now that GlusterFS has checksums and snapshots...

Interesting - I haven't kept up with that.  Does it actually do
end-to-end checksums?  That is, compute the checksum at the time of
storage, store the checksum in the metadata somehow, and ensure the
checksum matches when data is retrieved?

I forget whether it was glusterfs or ceph I was looking at, but some
of those distributed filesystems will only checksum data while in
transit, not while it is at rest.  So, if a server claims it has a
copy of the file, then it is assumed to be a good copy, and you never
realize that, even though you have 5 copies of that file distributed
around, the copy on the server you ended up using differs from the other 4.

I'm also not sure if it supports an n+1/2 model like raid5/6, or if it
is just a 2*n model like raid1.  If I want to store 5TB of data with
redundancy, I'd prefer to not need 10TB worth of drives to do it,
regardless of how many systems they're spread across.

--
Rich


Re: RAID6 stable enough for production?

2015-10-14 Thread Rich Freeman
On Wed, Oct 14, 2015 at 4:53 PM, Donald Pearson wrote:
>
> Personally I would still recommend zfs on illumos in production,
> because it's nearly unshakeable and the creative things you can do to
> deal with problems are pretty remarkable.  The unfortunate reality is
> though that over time your system will probably grow and expand and
> zfs is very locked in to the original configuration.  Adding vdevs is
> a poor solution IMO.
>

This is the main thing that has kept me away from zfs - you can't
modify a vdev, like you can with an md array or btrfs.  I don't think
zfs makes use of all your space if you have mixed disk sizes in a
raid-z either - it works like mdadm.  I'm not sure whether btrfs will
be any better in that regard (if I have 2x3TB and 3x1TB drives in a
RAID5 I should get 6TB of usable space, not 4TB, without messing with
partitioning).

So, I am running raid1 btrfs in the hope that I'll be able to move to
something more efficient in the future.

However, I would not personally be using raid5/6 for anything but pure
experimentation on btrfs anytime soon.  I don't even trust the 4.1
kernel series for btrfs at all just yet, and you're not going to be
running older than that for raid5/6.

--
Rich


Re: State of Dedup / Defrag

2015-10-14 Thread Rich Freeman
On Wed, Oct 14, 2015 at 1:09 AM, Zygo Blaxell wrote:
>
> I wouldn't try to use dedup on a kernel older than v4.1 because of these
> fixes in 4.1 and later:

I would assume that these would be ported to the other longterm
kernels like 3.18 at some point?

> Do dedup a photo or video file collection.  Don't dedup
> a live database server on a filesystem with compression enabled...yet.

Likewise.  Typically I just dedup the entire filesystem, so it sounds
like we're not quite there yet.  Would it make sense to put this on
the wiki in the gotchas section?

> Using dedup and defrag at the same time is still a bad idea.  The features
> work against each other

You mentioned quite a bit about autodefrag.  I was thinking more in
terms of using explicit defrag, as was done by dedup in the past.  It
looks like duperemove doesn't actually do this, perhaps because it is
also considered unsafe these days.

Thanks, I was just trying to get a sense for where this was at.  It
sounds like we're getting to the point where it could be used in
general, but for now it is probably best to run it manually on stuff
that isn't too busy.

--
Rich


State of Dedup / Defrag

2015-10-13 Thread Rich Freeman
What is the current state of Dedup and Defrag in btrfs?  I seem to
recall there having been problems a few months ago and I've stopped
using it, but I haven't seen much news since.

I'm interested both in the 3.18 and subsequent kernel series.

--
Rich


Re: BTRFS as image store for KVM?

2015-10-05 Thread Rich Freeman
On Mon, Oct 5, 2015 at 7:16 AM, Lionel Bouton wrote:
> According to the bad performance -> unstable logic, md would then be the
> less stable RAID1 implementation which doesn't make sense to me.
>

The argument wasn't that bad performance meant that something was unstable.

The argument was that a lack of significant performance optimization
meant that the developers considered it unstable and not worth
investing time on optimizing.

So, the question isn't whether btrfs is or isn't faster than something
else.  The question is whether it is or isn't faster than it could be
if it were properly optimized.  That is, how does btrfs today perform
against the btrfs of 20 years from now, which obviously cannot be
benchmarked today.

That said, I'm not really convinced that the developers haven't fixed
this because they feel that it would need to be redone later after
major refactoring.  I think it is more likely that there are just very
few developers working on btrfs and load-balancing on raid just
doesn't rank high on their list of interests or possibly expertise.
If any are being paid to work on btrfs then most likely their
employers don't care too much about it either.

I did find the phoronix results interesting though.  The whole driver
for "layer-violation" is that with knowledge of the filesystem you can
better optimize what you do/don't read and write, and that may be
showing here.

--
Rich


Re: BTRFS as image store for KVM?

2015-10-04 Thread Rich Freeman
On Sun, Oct 4, 2015 at 8:03 AM, Lionel Bouton wrote:
>
> This focus on single reader RAID1 performance surprises me.
>
> 1/ AFAIK the kernel md RAID1 code behaves the same (last time I checked
> you need 2 processes to read from 2 devices at once) and I've never seen
> anyone arguing that the current md code is unstable.

Perhaps, but with btrfs it wouldn't be hard to get 1000 processes
reading from a raid1 in btrfs and have every single request directed
to the same disk with the other disk remaining completely idle.  I
believe the algorithm is just whether the pid is even or odd, and
doesn't take into account disk activity at all, let alone disk
performance or anything more sophisticated than that.

I'm sure md does a better job than that.
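
To illustrate the pathological case, here's a toy model of that policy
(not the kernel code - the real selection lives in fs/btrfs/volumes.c):

#include <stdio.h>

#define NUM_COPIES 2   /* raid1: two copies of every chunk */

static int pick_mirror(int pid)
{
    /* no queue depth, no device speed - just pid parity */
    return pid % NUM_COPIES;
}

int main(void)
{
    int hits[NUM_COPIES] = { 0, 0 };

    /* 1000 hypothetical readers that happen to share pid parity */
    for (int i = 0; i < 1000; i++)
        hits[pick_mirror(4000 + 2 * i)]++;

    printf("mirror 0: %d reads, mirror 1: %d\n", hits[0], hits[1]);
    return 0;   /* prints: mirror 0: 1000 reads, mirror 1: 0 */
}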

--
Rich


Re: fstrim silently does nothing on dev add/dev rem'd filesystem

2015-09-27 Thread Rich Freeman
On Sun, Sep 27, 2015 at 10:45 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> But I think part of reasoning behind the relatively low priority this
> issue has received is that it's a low visibility issue not really
> affecting most people running btrfs, either because they're not running
> on ssd or because they simply don't have a particularly high awareness of
> what trim does and thus about how it's failing to work here and what that
> means to them.  If we get a rash of people posting on-list that it's
> affecting them, that relative priority is likely to go up, and with it
> the patch testing and integration schedule for the affected patches.

I've never actually seen fstrim do anything on btrfs (0 bytes
trimmed).  I stopped using it a few months ago when the news came out
about all the issues with its implementation, and I believe my drive
is still blacklisted anyway.
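
For reference, all I'm going by is the verbose output, which looks
like this here (mountpoint is just an example):

# fstrim -v /mnt/btrfs
/mnt/btrfs: 0 bytes were trimmed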

It really should be fixed, but right now that goes all around - if
btrfs fixed it tomorrow I'd still be stuck until somebody figures out
how to reliably do it on a Samsung 850.

--
Rich


Re: Latest kernel to use?

2015-09-25 Thread Rich Freeman
On Fri, Sep 25, 2015 at 9:25 AM, Bostjan Skufca wrote:
>
> Similar here: I am sticking with 3.19.2 which has proven to work fine for me

I'd recommend still tracking SOME stable series.  I'm sure there were
fixes in 3.19 for btrfs (to say nothing of other subsystems) that
you're missing with that version.  3.19 is also unsupported at this
time.  You might want to consider moving to either 3.18.21 or 4.1.8
and tracking those series instead.  I doubt you'd give up much moving
back to 3.18 and there have been a bunch of btrfs fixes in that series
(though it seems to me that 3.18 has been slower to receive btrfs
patches than some of the other series).

I'm on the fence right now about making the move to 4.1.  Maybe in a
few releases I'll be there, depending on what the noise on the lists
sounds like.

There was a time when you were better off on bleeding-edge linux for
btrfs.  If you REALLY want to run btrfs raid5 or something like that
then I'd say that is still your best strategy.  However, if you stick
with features that have been around for a year the longterm kernels
seem a lot less likely to hit you with a regression, as long as you
don't switch to a new one the day it is declared as such.

--
Rich


Re: BTRFS as image store for KVM?

2015-09-25 Thread Rich Freeman
On Sat, Sep 19, 2015 at 9:26 PM, Jim Salter wrote:
>
> ZFS, by contrast, works like absolute gangbusters for KVM image storage.

I'd be interested in what allows ZFS to handle KVM image storage well,
and whether this could be implemented in btrfs.  I'd think that the
fragmentation issues would potentially apply to any COW filesystem,
and if ZFS has a solution for this then it would probably benefit
btrfs to implement the same solution, and not just for VM images.

--
Rich


Re: Latest kernel to use?

2015-09-25 Thread Rich Freeman
On Fri, Sep 25, 2015 at 7:20 AM, Austin S Hemmelgarn wrote:
> On 2015-09-24 17:07, Sjoerd wrote:
>>
>> Maybe a silly question for most of you, but the wiki states to always try
>> to
>> use the latest kernel with btrfs. Which one would be best:
>> - 4.2.1 (currently latest stable and matches the btrfs-progs versioning)
>> or
>> - the 4.3.x (mainline)?
>>
>> Stable sounds more stable to me(hence the name ;) ), but the mainline
>> kernel
>> seems to be in more active development?
>>
> Like Hugo said, 4.2.1 is what you want right now.  In general, go with the
> highest version number that isn't a -rc version (4.3 isn't actually released
> yet, IIRC they're up to 4.3-rc2 right now, and almost at -rc3) (we should
> probably be specific like this on the wiki).
>

I'll just say that my btrfs stability has gone WAY up when I stopped
following this advice and instead followed a recent longterm.  Right
now I'm following 3.18.  There were some really bad corruption issues
in 3.17/18/19 that burned me, and today while considering moving up to
4.1 I'm still seeing a lot of threads about issues during balance/etc.
I still run into the odd issue with 3.18, but not nearly to the degree
that I used to.

Now, I would stick with a recent longterm.  The older longterms go
back to a time when btrfs was far more experimental.  Even 3.16
probably has a lot of issues that are fixed in 3.18.

That said, if you do run into an issue on a longterm kernel nobody
around here is likely to be able to help you much unless you can
reproduce it on the most recent stable kernel.

Just tossing that out as an alternative opinion.  Right now I'm
sticking with 3.18, but I'm interested in making the 4.1 switch once
issues with that seem to have died down.

--
Rich


Re: FYIO: A rant about btrfs

2015-09-16 Thread Rich Freeman
On Wed, Sep 16, 2015 at 12:45 PM, Martin Tippmann wrote:
> From reading the list I understand that btrfs is still very much work
> in progress and performance is not a top priority at this stage but I
> don't see why it shouldn't perform at least equally good as ZFS/F2FS
> on the same workloads. Is looking at performance problems on the
> development roadmap?

My sense is that any shortcomings in comparison to ZFS just represent
a lack of maturity - there just hasn't been as much focus on performance.
I'm not aware of any fundamental design issues which are likely to
make btrfs perform worse than ZFS in the long-term.

F2FS is a fundamentally different beast.  It is a log-based filesystem
as far as I'm aware, and on flash that gives it some substantial
advantages, but it doesn't support snapshotting/etc as far as I'm
aware.  I'm sure that in the long term some operations are just going
to be faster on F2FS no matter what just due to its design, and other
operations will always be slower on F2FS.

To draw an analogy, imagine you have a 1TB ext4 filesystem and a 1TB
btrfs filesystem.  On each you create a 900GB file, and then proceed
to make millions of internal writes all over it.  The ext4 filesystem
is just going to completely outperform btrfs at this job, and I
suspect it would outperform zfs as well.  For such a use case you
don't really even need a filesystem - you might as well just be
reading/writing random blocks right off the disk, and ext4 is pretty
close to that in behavior when it comes to internal file
modifications.  The COW filesystems are going to be fragmenting the
living daylights out of the file and its metadata.  Of course, if you
pulled the plug in the middle of one of those operations the COW
filesystems are more likely to end up in a sane state if you care
about the order of file modifications, and if you're doing this on
RAID both zfs and btrfs will be immune to any write hole issues.
Also, if you go making reflink copies of large files on a btrfs
filesystem it will perform MUCH better than doing the equivalent on
ext4 (which requires copying all the data, at a cost of both time and
space).

In the end you have to look at your application, and not just
performance stats.  There are tradeoffs.  Personally, I've had enough
hard drive failures that btrfs is worth it to me just for the
assurance that when something goes wrong the filesystem knows what is
good and what isn't.  As drives get bigger this becomes more and more
important.

--
Rich


Re: raid1 on uneven-sized disks

2015-08-09 Thread Rich Freeman
On Sun, Aug 9, 2015 at 8:47 AM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Sun, Aug 09, 2015 at 02:29:53PM +0200, Jim MacBaine wrote:
>> Hi,
>>
>> How does btrfs handle raid1 on a bunch of uneven sized disks? Can I
>> just keep adding arbitrarily sized disks to an existing raid1 and
>> expect the file system to continue to keep two copies of everything,
>> so I could survive the loss of any single disk without data loss? Does
>> btrfs work this way?
>
> Yes, exactly.
>
> You may find that http://carfax.org.uk/btrfs-usage/ is helpful.


The key is that btrfs manages raid at the chunk level, not the
device level.  When btrfs needs more disk space it allocates a new
chunk from unallocated space on a device.  If it is in raid1 mode it
will allocate a pair of chunks from two different drives, storing the
same data in each.  The allocation algorithm is reasonably smart so if
you have 2x1TB drives and 1x3TB drive you'll end up with about 2TB of
data stored and not 1TB on each of the two 1TB drives and an empty
unusable 3TB drive.

This is also why you can switch between raid modes on the fly -
switching modes only affects newly-allocated chunks, and the old ones
operate in whatever mode they were previously in.  A balance operation
rewrites the existing data to new chunks which would force everything
to use the new mode.

This also lets you do things like add a disk to a raid5.  If you have
5 disks and add one more, existing chunks will be striped across 5
drives, and new chunks will be striped across 6, unless you balance
them.

That may be a bit oversimplified, and obviously others on the list
know all the details...
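
To make the chunk-level idea concrete, here's a toy simulation of that
allocation policy (just the gist - as I understand it the real
allocator in fs/btrfs/volumes.c sorts candidate devices by available
space and is considerably more involved):

#include <stdio.h>

#define NDEV  3
#define CHUNK 1ULL                  /* chunk size, counted in GiB */

int main(void)
{
    /* the example from above: 2x1TB drives plus 1x3TB drive */
    unsigned long long free_gib[NDEV] = { 1000, 1000, 3000 };
    unsigned long long stored = 0;

    for (;;) {
        /* pick the two devices with the most unallocated space */
        int a = 0, b = 1;
        if (free_gib[b] > free_gib[a]) { a = 1; b = 0; }
        for (int i = 2; i < NDEV; i++) {
            if (free_gib[i] > free_gib[a])      { b = a; a = i; }
            else if (free_gib[i] > free_gib[b]) { b = i; }
        }
        if (free_gib[b] < CHUNK)    /* raid1 needs two devices */
            break;
        free_gib[a] -= CHUNK;       /* one copy of the chunk... */
        free_gib[b] -= CHUNK;       /* ...on each of the two */
        stored += CHUNK;
    }
    printf("raid1 data capacity: ~%llu GiB\n", stored);
    return 0;
}

Run against the 2x1TB + 1x3TB example it settles at ~2000 GiB of raid1
data - half the raw total, which is the "reasonably smart" behavior
described above.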

--
Rich


Re: systemd : Timed out waiting for defice dev-disk-by…

2015-07-30 Thread Rich Freeman
On Mon, Jul 27, 2015 at 1:20 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Philip Seeger posted on Sun, 26 Jul 2015 22:39:04 +0200 as excerpted:
>
>>> Hi,
>>>
>>> 50% of the time when booting, the system goes into safe mode because
>>> my 12x 4TB RAID10 btrfs is taking too long to mount from fstab.
>>
>> This won't help, but I've seen this exact behavior too (some time ago).
>> Except that it wasn't 50% that it didn't work, more like almost
>> every time.
>> Commenting out the fstab entry fixed it; mounting using a cronjob
>> (@reboot) worked without a problem.
>>
>> (As far as I remember, options like x-systemd.device-timeout didn't
>> change anything.)
>>
>> If someone has the answer, I'd be interested too.
>
> You mean something like a custom systemd *.service unit file?  That's
> what I'd do here. =:^)

I'd have to play with it to work out the kinks, but I'm pretty sure
you'd be better off with a mount unit instead of basically reinventing
a mount unit using a service unit.

I'd also think that you could also use drop-ins to enhance the
auto-generated units created by the fstab generator, if you just
wanted to add a dependency or such to a mount unit.  However, I've
never tried to create a drop-in for a generated unit.

Mount units should take any setting in systemd.unit which includes all
the ordering/dependency/etc controls.
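
For example, something along these lines (untested - device,
mountpoint, and timeout are placeholders, not taken from Philip's
setup):

# /etc/systemd/system/mnt-raid.mount  (the name must match Where=)
[Unit]
Description=Large btrfs raid10 array

[Mount]
What=/dev/disk/by-uuid/PLACEHOLDER-UUID
Where=/mnt/raid
Type=btrfs
TimeoutSec=15min

[Install]
WantedBy=multi-user.target

A drop-in would just be the same [Mount]/[Unit] fragments in
/etc/systemd/system/mnt-raid.mount.d/override.conf, assuming drop-ins
apply to generator output the way I'd expect.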

--
Rich


Re: Please add 9c4f61f01d269815bb7c37be3ede59c5587747c6 to stable

2015-04-23 Thread Rich Freeman
On Mon, Apr 13, 2015 at 12:58 PM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Apr 13, 2015 at 07:28:38PM +0500, Roman Mamedov wrote:
>> On Thu, 2 Apr 2015 10:17:47 -0400
>> Chris Mason <c...@fb.com> wrote:
>>
>>> Hi stable friends,
>>>
>>> Can you please backport this one to 3.19.y.  It fixes a bug introduced
>>> by:
>>>
>>> 381cf6587f8a8a8e981bc0c18859b51dc756, which was tagged for stable
>>> 3.14+
>>>
>>> The symptoms of the bug are deadlocks during log replay after a crash.
>>> The patch wasn't intentionally fixing the deadlock, which is why we
>>> missed it when tagging fixes.
>>
>> Unfortunately still not fixed (no btrfs-related changes) in 3.14.38 and
>> 3.18.11 released today.
>
> I have a few hundred stable backports left to sort through, don't worry,
> this is still in the queue, it's not lost.

It looks like this still isn't in 3.18.12, though it looks like it is in 3.19.5.

--
Rich


Re: Upgrade to 3.19.2 Kernel fails to boot

2015-04-01 Thread Rich Freeman
On Wed, Apr 1, 2015 at 2:50 AM, Anand Jain <anand.j...@oracle.com> wrote:
>
>  Eric found something like this and has a fix within the email.
>  Sub: "I think btrfs: fix leak of path in btrfs_find_item broke stable
>  trees" ...
>

I don't mind trying this patch if the maintainers recommend it.  I'm
still getting panics every few days and 3.18.10 won't mount my root
filesystem, so I've been running on 3.18.8.

--
Rich


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-29 Thread Rich Freeman
On Sun, Mar 29, 2015 at 7:43 AM, Kai Krakow <hurikha...@gmail.com> wrote:
>
> With the planned performance improvements, I'm guessing the best way will
> become mounting the root subvolume (subvolid 0) and letting duperemove work
> on that as a whole - including crossing all fs boundaries.
>

Why cross filesystem boundaries by default?  If you scan from the root
subvolume you're guaranteed to traverse every file on the filesystem
(which is all that can be deduped) without crossing any filesystem
boundaries.  Even if you have btrfs on non-btrfs on btrfs there must
be some other path that reaches the same files when scanning from
subvolid 0.

--
Rich


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-26 Thread Rich Freeman
On Thu, Mar 26, 2015 at 8:07 PM, Martin <m_bt...@ml1.co.uk> wrote:
>
> Anyone with any comments on how well duperemove performs for TB-sized
> volumes?

Took many hours but less than a day for a few TB - I'm not sure
whether it is smart enough to take less time on subsequent scans like
bedup.


> Does it work across subvolumes? (Presumably not...)

As far as I can tell, yes.  Unless you pass a command-line option it
crosses filesystem boundaries and even scans non-btrfs filesystems
(like /proc, /dev, etc).  Obviously you'll want to avoid that since it
only wastes time and I can just imagine it trying to hash kcore and
such.

Other than being less-than-ideal intelligence-wise, it seemed
effective.  I can live with that in an early release like this.

--
Rich


Re: snapshot destruction making IO extremely slow

2015-03-25 Thread Rich Freeman
On Wed, Mar 25, 2015 at 6:55 AM, Marc Cousin <cousinm...@gmail.com> wrote:
> On 25/03/2015 02:19, David Sterba wrote:
>> as it reads the pre/post snapshots and deletes them if the diff is
>> empty. This adds some IO stress.
>
> I couldn't find a clear explanation in the documentation. Does it mean
> that when there is absolutely no difference between two snapshots, one
> of them is deleted ? And that snapper does a diff between them to
> determine that ?


It seems like there should be some supported way of doing a diff on
two btrfs subvolumes.  There should be no need to recursively scan
trees if the heads of those trees are shared.  If I change one file at
the bottom of a 10 layer directory hierarchy, it should only take a
small number of reads to determine this.

The problem is that we don't have any functionality in kernel space to
do this (that I'm aware of), and we don't expose the necessary
information to userspace for it to do this smartly (again, as far as
I'm aware).

Maybe there would be some way to do it using btrfs send and parsing the output.
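
Something like this, perhaps (untested; paths are placeholders, and
both snapshots would have to be read-only):

btrfs send --no-data -p /snaps/pre /snaps/post > changes.stream

The stream encodes the operations needed to turn 'pre' into 'post',
so a near-empty stream should mean the snapshots are effectively
identical - though you'd still have to parse it to be sure.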

--
Rich


Re: Upgrade to 3.19.2 Kernel fails to boot

2015-03-24 Thread Rich Freeman
On Tue, Mar 24, 2015 at 2:31 AM, Anand Jain <anand.j...@oracle.com> wrote:
> Do you have this fix ..
>
>  [PATCH] Btrfs: release path before starting transaction in can_nocow_extent
>
> could you try ?.

I believe I already have this patch.  3.18.9 contains this:

commit bdeeab62a611f1f7cd48fd285ce568e8dcd0455a
Merge: 797afdf 1bda19e
Author: Linus Torvalds <torva...@linux-foundation.org>
Date:   Fri Oct 18 16:46:21 2013 -0700

Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fix from Chris Mason:
 "Sage hit a deadlock with ceph on btrfs, and Josef tracked it down to a
  regression in our initial rc1 pull.  When doing nocow writes we were
  sometimes starting a transaction with locks held"

* 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: release path before starting transaction in can_nocow_extent


Re: btrfs dedup - available or experimental? Or yet to be?

2015-03-24 Thread Rich Freeman
On Mon, Mar 23, 2015 at 7:22 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Mon, Mar 23, 2015 at 11:10:46PM +0000, Martin wrote:
>> As titled:
>>
>>
>> Does btrfs have dedup (on raid1 multiple disks) that can be enabled?
>
> The current state of play is on the wiki:
>
> https://btrfs.wiki.kernel.org/index.php/Deduplication


I hadn't realized that bedup was deprecated.

This seems unfortunate since it seemed to be a lot smarter about
detecting what has and hasn't already been scanned, and it also
supported defragmenting files while de-duplicating them.

I'll give duperemove a shot.   I just packaged it on Gentoo.

--
Rich


Re: Upgrade to 3.19.2 Kernel fails to boot

2015-03-23 Thread Rich Freeman
On Mon, Mar 23, 2015 at 4:23 AM, Anand Jain <anand.j...@oracle.com> wrote:
>
> Do you still have the problem ? Can you pls confirm on the latest btrfs ?
> Since I am fixing the devices part of the btrfs, I am bit nervous.

I'm having a similar problem.  I'm getting some kind of btrfs
corruption that causes a panic/reboot, and then the initramfs won't
mount root for 3.18.9, but it will mount it for 3.18.8.

Running on 3.18.8 eventually caused the panic to repeat, so I'm not
sure that 3.18.9 is necessarily breaking things - it might just be
fussier about not mounting a dirty fs.

I did run a btrfs check --repair and it ended up moving some chromium
preferences from the user profile folder to lost+found.  That got the
system to run for about 8 hours, but it still panicked the next
morning.  I'm now running on 3.18.7 to see what happens.

Unfortunately I haven't been doing a good job about capturing logs.
I'll try to capture more the next time this happens.  I've been
running fine on 3.18 for a while now, so I'm not sure where all of
this is coming from.

--
Rich


Re: Upgrade to 3.19.2 Kernel fails to boot

2015-03-23 Thread Rich Freeman
On Mon, Mar 23, 2015 at 9:22 AM, Rich Freeman
<r-bt...@thefreemanclan.net> wrote:
>
> I'm having a similar problem.  I'm getting some kind of btrfs
> corruption that causes a panic/reboot, and then the initramfs won't
> mount root for 3.18.9, but it will mount it for 3.18.8.
>
> Running on 3.18.8 eventually caused the panic to repeat, so I'm not
> sure that 3.18.9 is necessarily breaking things - it might just be
> fussier about not mounting a dirty fs.
>

This continues to happen.  The filesystem won't mount with 3.18.9, but
will mount with 3.18.8.

Here is the dmesg output from dracut on 3.18.9:

[  240.765147] INFO: task mount:395 blocked for more than 120 seconds.
[  240.765224]   Not tainted 3.18.9-gentoo #1
[  240.765274] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  240.765809] mount   D 880427c51900 11800   395  1 0x0004
[  240.765927]  88040d2f76a8 0082 8804106170f0
00011900
[  240.766181]  88040d2f7fd8 00011900 88041593d6e0
8804106170f0
[  240.766373]  88040d2f76b8 8800cb505c70 8800cb505cf0
8800cb505cd8
[  240.766556] Call Trace:
[  240.766618]  [81504084] schedule+0x24/0x60
[  240.766719]  [a032fe9d] btrfs_tree_lock+0x4d/0x1c0 [btrfs]
[  240.766780]  [810882f0] ? prepare_to_wait_event+0x100/0x100
[  240.766859]  [a02d3859] btrfs_search_slot+0x6e9/0x9f0 [btrfs]
[  240.766939]  [a02d5503] btrfs_insert_empty_items+0x73/0xd0 [btrfs]
[  240.767017]  [a02ce495] ? btrfs_alloc_path+0x15/0x20 [btrfs]
[  240.767118]  [a033012a] btrfs_insert_orphan_item+0x5a/0x80 [btrfs]
[  240.767211]  [a03316c5] insert_orphan_item+0x65/0xa0 [btrfs]
[  240.767301]  [a0336589] replay_one_buffer+0x349/0x360 [btrfs]
[  240.767391]  [a0330ff5] walk_up_log_tree+0xc5/0x220 [btrfs]
[  240.767481]  [a03311eb] walk_log_tree+0x9b/0x1a0 [btrfs]
[  240.767572]  [a0338932] btrfs_recover_log_trees+0x262/0x4d0 [btrfs]
[  240.767662]  [a0336240] ? replay_one_extent+0x780/0x780 [btrfs]
[  240.767749]  [a02f4b9f] open_ctree+0x17ef/0x2100 [btrfs]
[  240.767827]  [a02cb876] btrfs_mount+0x766/0x900 [btrfs]
[  240.767886]  [81175bef] mount_fs+0x3f/0x1b0
[  240.767940]  [811331b0] ? __alloc_percpu+0x10/0x20
[  240.767997]  [8118fc53] vfs_kern_mount+0x63/0x100
[  240.768087]  [a02cb28b] btrfs_mount+0x17b/0x900 [btrfs]
[  240.768146]  [81132e8a] ? pcpu_alloc+0x35a/0x660
[  240.768201]  [81175bef] mount_fs+0x3f/0x1b0
[  240.768255]  [811331b0] ? __alloc_percpu+0x10/0x20
[  240.768311]  [8118fc53] vfs_kern_mount+0x63/0x100
[  240.768365]  [8119289c] do_mount+0x20c/0xaf0
[  240.768420]  [81118eb9] ? __get_free_pages+0x9/0x40
[  240.768474]  [81192555] ? copy_mount_options+0x35/0x150
[  240.768528]  [81193497] SyS_mount+0x97/0xf0
[  240.768582]  [81507ad2] system_call_fastpath+0x12/0x17
[  240.768638] INFO: task btrfs-transacti:435 blocked for more than 120 seconds.
[  240.768693]   Not tainted 3.18.9-gentoo #1
[  240.768742] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  240.768811] btrfs-transacti D 880427c11900 12424   435  2 0x
[  240.768928]  8800cfab7dc8 0046 880410f01a10
00011900
[  240.769119]  8800cfab7fd8 00011900 81a16460
880410f01a10
[  240.769302]  8800cfab7dd8 88040c7ab000 8800cb554000
8800cb5301a0
[  240.769485] Call Trace:
[  240.769540]  [81504084] schedule+0x24/0x60
[  240.769625]  [a02f73e5]
btrfs_commit_transaction+0x275/0xa40 [btrfs]
[  240.769698]  [810882f0] ? prepare_to_wait_event+0x100/0x100
[  240.769784]  [a02f305d] transaction_kthread+0x1ad/0x240 [btrfs]
[  240.769870]  [a02f2eb0] ?
btrfs_cleanup_transaction+0x530/0x530 [btrfs]
[  240.769942]  [8106aa04] kthread+0xc4/0xe0
[  240.769997]  [8106a940] ? kthread_create_on_node+0x190/0x190
[  240.770064]  [81507a2c] ret_from_fork+0x7c/0xb0
[  240.770119]  [8106a940] ? kthread_create_on_node+0x190/0x190
[  360.832426] INFO: task mount:395 blocked for more than 120 seconds.
[  360.832488]   Not tainted 3.18.9-gentoo #1
[  360.832539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  360.832609] mount   D 880427c51900 11800   395  1 0x0004
[  360.832727]  88040d2f76a8 0082 8804106170f0
00011900
[  360.832911]  88040d2f7fd8 00011900 88041593d6e0
8804106170f0
[  360.833093]  88040d2f76b8 8800cb505c70 8800cb505cf0
8800cb505cd8
[  360.833276] Call Trace:
[  360.833385]  [81504084] schedule+0x24/0x60
[  360.833495]  [a032fe9d] btrfs_tree_lock+0x4d/0x1c0 [btrfs]
[  360.833555]  [810882f0] ? prepare_to_wait_event+0x100/0x100
[  360.833634

btrfs raid5 with mixed disks

2015-02-09 Thread Rich Freeman
How does btrfs raid5 handle mixed-size disks?  The docs weren't
terribly clear on this.

Suppose I have 4x3TB and 1x1TB disks.  Using conventional lvm+mdadm in
raid5 mode I'd expect to be able to fit about 10TB of space on those
(2TB striped across 4 disks plus 1TB striped across 5 disks after
partitioning).  How much would btrfs be able to store in the same
configuration?  I did see something about being able to use fixed-size
stripes, and I'm not sure if this helps.  If it does, are there any
penalties, especially with future expansion of the array?

With raid1 mode btrfs is reasonably smart about mixed disk sizes, and
you usually end up with half of the total space available.

--
Rich


Re: scrub implies failing drive - smartctl blissfully unaware

2014-11-25 Thread Rich Freeman
On Tue, Nov 25, 2014 at 6:13 PM, Chris Murphy <li...@colorremedies.com> wrote:
> A few years ago companies including Western Digital started shipping
> large cheap drives, think of the green drives. These had very high
> TLER (Time Limited Error Recovery) settings, a.k.a. SCT ERC. Later
> they completely took out the ability to configure this error recovery
> timing so you only get the upward of 2 minutes to actually get a read
> error reported by the drive.

Why sell an $80 hard drive when you can change a few bytes in the
firmware and sell a crippled $80 drive and an otherwise-identical
non-crippled $130 drive?

--
Rich


Re: filesystem corruption

2014-10-30 Thread Rich Freeman
On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst <to...@tobby.eu> wrote:
> Addition:
> I found some posts here about a general file system corruption in 3.17
> and 3.17.1 - is this the cause?
> Additionally I am using ro-snapshots - maybe this is the cause, too?
>
> Anyway: Can I fix that or do I have to reinstall? Haven't touched the
> filesystem, just did a scrub (found 0 errors).
>

Yup - ro-snapshots is a big problem in 3.17.  You can probably recover now by:
1.  Update your kernel to 3.17.2 - that takes care of all the big
known 3.16/17 issues in general.
2.  Run btrfs check using btrfs-tools 3.17.  That can clean up the
broken snapshots in your filesystem.
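
Concretely, the second step would be something like (sdX is a
placeholder - do it from a rescue environment with the filesystem
unmounted, and run the read-only check before letting it write
anything):

btrfs check /dev/sdX            # read-only inspection first
btrfs check --repair /dev/sdX   # then the actual cleanup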

That is fairly likely to get your filesystem working normally again.
It worked for me.  I was getting some balance issues when trying to
add another device and I'm not sure if 3.17.2 totally fixed that - I
ended up cancelling the balance and it will be a while before I have
to balance this particular filesystem again, so I'll just hold off and
hope things stabilize.

--
Rich


Re: BTRFS balance segfault, where to go from here

2014-10-28 Thread Rich Freeman
On Tue, Oct 28, 2014 at 9:12 AM, E V eliven...@gmail.com wrote:
 I've seen dead locks on 3.16.3. Personally, I'm staying with 3.14
 until something newer stabilizes, haven't had any issues with it. You
 might want to try the latest 3.14, though I think there should be a
 new one pretty soon with quite a few btrfs patches.

Yeah, I forget what drove me to switch to a newer kernel, but I'm
wishing I had stuck with 3.14.  The last set of stable kernels has
been a pretty rough ride.  :)

My sense browsing the list is that the activity level has picked up a
bit, and that might be why 3.15-17 have been a bit more bug-ridden
than is normal.  For the long-term it is actually a good sign for the
vitality of btrfs.

But, I'll probably track 3.17 until a new longterm is announced and be
a bit more conservative.

--
Rich


Re: BTRFS balance segfault, where to go from here

2014-10-28 Thread Rich Freeman
On Tue, Oct 28, 2014 at 9:33 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Since it's not an option here I've not looked into it too closely
 personally, and don't know if it'll fit your needs, but if it does, it
 may well be simpler to substitute it into the existing backup setup
 without rewriting the WHOLE thing, than to do that full rewrite from
 scratch, without the btrfs/zfs features.  I'd at least look into it,
 assuming you haven't already.

I haven't researched zfs as thoroughly as btrfs and I'm not running
it, but you're certainly right that it is more mature (though I would
not say that zfs on linux is as mature as zfs on BSD or especially
Solaris).

Keep in mind that ZFS is marketed more towards enterprise workloads.
It isn't quite as dynamic as btrfs is intended to be, though in truth
many of those btrfs features like reshaping a raid5 aren't implemented
yet.  My sense is that you're going to need to plan ahead a bit more
with ZFS and making changes without doing a full backup/re-create is
going to be harder.  It also isn't designed for SSD (though it does
have features for SSD caching of the write log and I think also
read-caching, which is something that does not yet exist for btrfs).

From what I understand of both I'd say that btrfs actually has the
better overall design, but zfs just has a LOT more maturity.  I think
that btrfs will eventually overtake it, but just when that will happen
is anybody's guess, and it certainly isn't there today.

The one thing that zfs does have going for you is that you're very
unlikely to get BUGs and PANICs anytime you do something as simple as
running rsync on it.

I will also note that I rsync data off of my btrfs filesystem all the
time without issue.  I do not have experience with using rsync to
write TO a btrfs filesystem.  Right now I don't trust btrfs send
enough to rely on it - the whole purpose of using rsync right now is
to backup my btrfs data to an ext4 partition which lets me sleep well
at night while still getting to play around with btrfs and make use of
features like snapshots/etc.  :)
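For reference, that backup amounts to something along these lines
(paths hypothetical, and the exact flags are a matter of taste):

  # mirror the btrfs volume onto plain ext4, skipping the snapshots
  rsync -aHAX --delete --exclude='/.snapshots' /data/ /mnt/ext4-backup/data/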

If I were running a large (i.e. measured in tens of disks) storage system
I'd probably go with ZFS now.  In such a setup being limited to RAID6s
of maybe 7 drives each and having to add/remove drives 7 at a time
wouldn't be a big deal.  When you're running a system with 6 disks
total that is a much bigger limitation.  If you look at something like
Backblaze's storage pods, that is the perfect example of the kind of
situation ZFS was designed to handle.  On the other hand, btrfs aims
to eventually address that while being a decent default filesystem for
your smartphone.

--
Rich


Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-25 Thread Rich Freeman
On Mon, Oct 13, 2014 at 11:12 AM, Rich Freeman
r-bt...@thefreemanclan.net wrote:
 On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek jane...@ucw.cz wrote:

   I have trouble finishing btrfs balance on five disk raid10 fs.
 I added a disk to 4x3TB raid10 fs and run btrfs balance start
 /mnt/b3, which segfaulted after few hours, probably because of the BUG
 below. btrfs check does not find any errors, both before the balance
 and after reboot (the fs becomes un-umountable).

 [22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 
 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()

 [22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727!

 I am running into something similar. I just added a 3TB drive to my
 raid1 btrfs and started a balance.  The balance segfaulted, and I find
 this in dmesg:

I got another one of these crashes during a balance today, and this is
on 3.17.1 with the "Btrfs: race free update of commit root for ro
snapshots" patch.  So, there is something else in 3.17.1 that causes
this problem.  I did see mention of an extent error of some kind on
the lists and I don't have that patch - I believe it is planned for
3.17.2.

After the crash the filesystem became read-only.

I didn't have any way to easily capture the logs, but I got repeated
crashes when trying to re-mount the filesystem after rebooting.  The
dmesg log showed read errors from one of the devices (bdev /dev/sdb2
errs: wr 0, rd 1361, flush 0, corrupt 0, gen 0).  When I tried to
btrfs check the filesystem with btrfs-progs 3.17 it abruptly
terminated and output an error mentioning "could not find extent
items" followed by "root" and a really large number.

I finally managed to recover by mounting the device with skip_balance
- I suspect that it was crashing due to attempts to restart the
failing balance.  Then after letting the filesystem settle down I
unmounted it cleanly and rebooted and everything was back to normal.
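For anyone who hits the same crash loop, the sequence was roughly
(device and mountpoint hypothetical):

  mount -o skip_balance /dev/sdb2 /data   # mount without resuming the balance
  btrfs balance cancel /data              # or 'resume' once things settle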

However, I'm still getting "bdev /dev/sdb2 errs: wr 0, rd 1361, flush
0, corrupt 0, gen 0" in my dmesg logs.  I have tried scrubbing the
device with no errors found.

--
Rich


Re: device balance times

2014-10-24 Thread Rich Freeman
On Thu, Oct 23, 2014 at 10:35 PM, Zygo Blaxell
ce3g8...@umail.furryterror.org wrote:

 - single profile: we can tolerate zero missing disks,
 so we don't allow rw mounts even if degraded.


That seems like the wrong logic here.  By all means mount read-only by
default for safety, but there should be a way to force a read-write
mount on any filesystem, precisely because the RAID modes can be mixed
and even if you lose two devices on a RAID1 system not ALL the data is
lost if you have more than two drives.

By all means return an error when reading a file that is completely
missing.  By all means have an extra fsck mode that goes ahead and
deletes all the missing files (assuming it has metadata) or perhaps
moves them all to a new lost+notfound subvolume or something.

Indeed, if the lost device just happens to not actually contain any
data you might be lucky and not lose any data at all when losing a
single device in a filesystem that entirely uses the single profile.
That would be a bit of an edge case though, but one that is
automatically handled if you give the admin the ability to force
read-write/etc.
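For now the closest thing we have is the degraded mount option, which
under the quoted policy leaves a single-profile filesystem read-only
(names hypothetical):

  mount -o degraded /dev/sdX /mnt      # rw only when the profiles tolerate the loss
  mount -o ro,degraded /dev/sdX /mnt   # the read-only fallback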

--
Rich


Re: device balance times

2014-10-24 Thread Rich Freeman
On Fri, Oct 24, 2014 at 12:07 PM, Zygo Blaxell
ce3g8...@umail.furryterror.org wrote:

 We could also leave this as an option to the user mount -o
 degraded-and-I-want-to-lose-my-data, but in my opinion the use
 case is very, very exceptional.

Well, it is only exceptional if you never shut down during a
conversion to raid1 as far as I understand it.  :)


 IMHO the use case is common any time restoring the entire filesystem
 from backups is inconvenient.  That covers a *lot* of users.  I never
 have a machine with more than 50% of its raw disk space devoted to btrfs
 because I need raw space on the disk to do mkfs+rsync from the broken
 read-only btrfs filesystems.

The problem is that if you want btrfs raid1 and you ALSO want to have
an extra set of spares for copying your entire RAID1 to something
else, you're talking about a lot of extra disk space.  I really don't
want to maintain a SAN just in case I have a btrfs problem.  :)

I realize things are still somewhat experimental now, but we need to
at least think about how things will work long-term.  Copying all your
data to another filesystem and re-creating the btrfs filesystem isn't
really a good recovery mode.

Restoring from backups is also becoming increasingly difficult.  IO
bandwidth just has not kept pace with disk capacity.  It can take the
better part of a day to copy a multi-TB array, and if you need to copy
it two ways you have to double the time, not to mention having
multiple TB of disks lying around.

--
Rich


Re: Poll: time to switch skinny-metadata on by default?

2014-10-21 Thread Rich Freeman
On Tue, Oct 21, 2014 at 5:29 AM, Duncan 1i5t5.dun...@cox.net wrote:
 David Sterba posted on Mon, 20 Oct 2014 18:34:03 +0200 as excerpted:

 On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
 I'd like to make it default with the 3.17 release of btrfs-progs.
 Please let me know if you have objections.

 For the record, 3.17 will not change the defaults. The timing of the
 poll was very bad to get enough feedback before the release. Let's keep
 it open for now.

 FWIW my own results agree with yours, I've had no problem with skinny-
 metadata here, and it has been my default now for a couple backup-and-new-
 mkfs.btrfs generations, now.


How does one enable it for an existing filesystem?  Is it safe to just
run btrfstune -x?  Can this be done on a mounted filesystem?  Are
there any risks with converting?
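For the record, what I'm asking about would presumably be run as
(device hypothetical, filesystem unmounted):

  umount /mnt/data
  btrfstune -x /dev/sdX      # enable the skinny-metadata feature flag
  mount /dev/sdX /mnt/data   # only newly written metadata uses the new format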

--
Rich


Re: unexplainable corruptions 3.17.0

2014-10-20 Thread Rich Freeman
On Mon, Oct 20, 2014 at 10:04 AM, Zygo Blaxell zblax...@furryterror.org wrote:
 On Fri, Oct 17, 2014 at 08:17:37AM +, Hugo Mills wrote:  On Fri, Oct 17, 
 2014 at 10:10:09AM +0200, Tomasz Torcz wrote:
  On Fri, Oct 17, 2014 at 04:02:03PM +0800, Liu Bo wrote:
  Recently I've observed some corruptions to systemd's journal
files which are somewhat puzzling. This is especially worrying
as this is btrfs raid1 setup and I expected auto-healing.
   
  System details: 3.17.0-301.fc21.x86_64
btrfs: raid1 over 2x dm-crypted 6TB HDDs.
mount opts: rw,relatime,seclabel,compress=lzo,space_cache
  Reads with cat, hexdump fails with:
read(4, 0x1001000, 65536)   = -1 EIO (Input/output error)
   
   Does scrub work for you?
 
As there seem to be no way to scrub individual files, I've started
  scrub of full volume.  It will take some hours to finish.
 
Meanwhile, could you satisfy my curiosity what would scrub do that
  wouldn't be done by just reading the whole file?

It checks both copies. Reading the file will only read one of the
 copies of any given block (so if that's good and the other copy is
 bad, it won't fix anything).

 Really?  One of my earliest btrfs tests was to run a loop of 'sha1sum
 -c' on a gigabyte or two of files in one window while I used dd to
 write random data in random locations directly to one of the filesystem
 mirror partitions in the other.  I did this test *specifically* to
 watch the automatic checksumming and self-healing features of btrfs
 in action.  A complete 'sha1sum' verification of the filesystem contents
 passed even though the kernel log was showing checksum errors scrolling
 by faster than I could read, which strongly implies that read() normally
 does check both mirrors before returning EIO.

I think you misread the earlier post.  It sounds like the algorithm is:
1.  Receive request to read block from file.
2.  Determine which mirrored block to read it from (it sounds like
this is sub-optimal today, presumably you'd want to use the least busy
disk or disk with the head closest to the right cylinder to do it).
3.  Read the block.  Verify the checksum.  If it matches return the data.
4.  If not find another mirrored block to read it from if one exists.
Verify the checksum.  If it matches return the data and update all
other mirrored copies with it.
5.  Repeat step 4 until you run out of mirrored copies.  If so, return an error.

So, doing random reads will NOT be equivalent to scrubbing the disks,
because with a scrub you want to check that ALL copies are good, and
the algorithm above only determines that at least one copy is good.

When you used dd to overwrite blocks, you didn't get errors because
when the first copy failed the filesystem just read the second copy as
intended.  That isn't a scrub - it is a recovery.

An actual scrub isn't file-focused, but device-focused.  It starts
reading at the start of the device, and verifies each logical unit of
data sequentially.  This can be done asynchronously since btrfs stores
checksums, as opposed to a traditional RAID where the reads need to be
synchronous since the validity of a mirror/stripe can only be
ascertained by comparing it to all the other devices in that
mirror/stripe (and then unless you're using something like RAID6+ you
couldn't determine which copy is bad without a checksum).  In theory
I'd expect a scrub with btrfs to be less detrimental to performance as
a result - a read request could halt the scrub on one device without
delaying the scrub on the other devices.  Writes in RAID1 mode
necessarily disrupt two devices, but others would not be impacted.
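Concretely, checking every copy means kicking off a scrub rather than
reading files (mountpoint hypothetical):

  btrfs scrub start /data    # verifies checksums on all copies, per device
  btrfs scrub status /data   # progress plus corrected/uncorrectable counts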

--
Rich


Re: unexplainable corruptions 3.17.0

2014-10-17 Thread Rich Freeman
On Fri, Oct 17, 2014 at 8:53 AM, Chris Mason c...@fb.com wrote:
 This sounds like the problem fixed with some patches to our extent mapping
 code  that went in with the merge window.  I've cherry picked a few for
 stable and I'm running them through tests now.  They are in my stable-3.17
 branch, and I'll send to Greg once Linus grabs the revert for the last one.

Just for clarity - when can we expect to see these in the kernel?  I
wasn't sure which merge window you're referring to.  I take it that
3.17.1 is still unpatched (for this and the readonly snapshot issue -
which requires reverting 9c3b306e1c9e6be4be09e99a8fe2227d1005effc).


Re: Random file system corruption in 3.17 (not BTRFS related...?)

2014-10-15 Thread Rich Freeman
On Wed, Oct 15, 2014 at 10:30 AM, Josef Bacik jba...@fb.com wrote:
 We've found it, the Fedora guys are reverting the bad patch now, we'll get
 the fix sent back to stable shortly.  Sorry about that.

After reverting this commit, can the bad snapshots be
deleted/repaired/etc without wiping and restoring the entire
filesystem?  Copying 2.3TB of data isn't a particularly fast
operation...

--
Rich


Re: what is the best way to monitor raid1 drive failures?

2014-10-14 Thread Rich Freeman
On Tue, Oct 14, 2014 at 10:48 AM, Suman C schakr...@gmail.com wrote:

 The new drive shows up as sdb. btrfs fi show still prints drive missing.

 mounted the filesystem with ro,degraded

 tried adding the new sdb drive which results in the following error.
 (-f because the new drive has a fs from past)

 # btrfs device add -f /dev/sdb /mnt2/raid1pool
 /dev/sdb is mounted

 Unless I am missing something, this looks like a bug.


You need to first run btrfs device delete missing /mnt2/raid1pool I
believe (missing is a keyword for a missing device in the array - if
the device were still present you could specify it by /dev/sdX).
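Roughly, the sequence I'd expect to work is this sketch (names
hypothetical; the degraded mount has to be read-write, and whether
delete-missing or device add comes first has varied between versions):

  mount -o degraded /dev/sdc /mnt2/raid1pool    # surviving member
  btrfs device delete missing /mnt2/raid1pool   # drop the dead device
  btrfs device add -f /dev/sdb /mnt2/raid1pool  # then add the replacement
  btrfs balance start /mnt2/raid1pool           # restore full redundancy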

--
Rich


Re: What is the vision for btrfs fs repair?

2014-10-13 Thread Rich Freeman
On Sun, Oct 12, 2014 at 6:14 AM, Martin Steigerwald mar...@lichtvoll.de wrote:
 Am Freitag, 10. Oktober 2014, 10:37:44 schrieb Chris Murphy:
 On Oct 10, 2014, at 6:53 AM, Bob Marley bobmar...@shiftmail.org wrote:
  On 10/10/2014 03:58, Chris Murphy wrote:
  * mount -o recovery
 
Enable autorecovery attempts if a bad tree root is found at mount
time.
 
  I'm confused why it's not the default yet. Maybe it's continuing to
  evolve at a pace that suggests something could sneak in that makes
  things worse? It is almost an oxymoron in that I'm manually enabling an
  autorecovery
 
  If true, maybe the closest indication we'd get of btrfs stablity is the
  default enabling of autorecovery.
  No way!
  I wouldn't want a default like that.
 
  If you think at distributed transactions: suppose a sync was issued on
  both sides of a distributed transaction, then power was lost on one side,
  than btrfs had corruption. When I remount it, definitely the worst thing
  that can happen is that it auto-rolls-back to a previous known-good
  state.
 For a general purpose file system, losing 30 seconds (or less) of
 questionably committed data, likely corrupt, is a file system that won't
 mount without user intervention, which requires a secret decoder ring to
 get it to mount at all. And may require the use of specialized tools to
 retrieve that data in any case.

 The fail safe behavior is to treat the known good tree root as the default
 tree root, and bypass the bad tree root if it cannot be repaired, so that
 the volume can be mounted with default mount options (i.e. the ones in
 fstab). Otherwise it's a filesystem that isn't well suited for general
 purpose use as rootfs let alone for boot.

 To understand this a bit better:

 What can be the reasons a recent tree gets corrupted?

 I always thought with a controller and device and driver combination that
 honors fsync with BTRFS it would either be the new state of the last known
 good state *anyway*. So where does the need to rollback arise from?


In theory the recover option should never be necessary.  Btrfs makes
all the guarantees everybody wants it to - when the data is fsynced
then it will never be lost.

The question is what should happen when a corrupted tree root, which
should never happen, happens anyway.  The options are to refuse to
mount the filesystem by default, or mount it by default discarding
about 30-60s worth of writes.  And yes, when this situation happens
(whether it mounts by default or not) btrfs has broken its promise of
data being written after a successful fsync return.
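For reference, the manual knob under discussion - on kernels of this
era - is (device hypothetical):

  mount -o recovery /dev/sdX /mnt   # fall back to an older tree root if needed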

As has been pointed out, braindead drive firmware is the most likely
cause of this sort of issue.  However, there are a number of other
hardware and software errors that could cause it, including errors in
linux outside of btrfs, and of course bugs in btrfs as well.

In an ideal world no filesystem would need any kind of recovery/repair
tools.  They can often mean that the fsync promise was broken.  The
real question is, once that has happened, how do you move on?

I think the best default is to auto-recover, but to have better
facilities for reporting errors to the user.  Right now btrfs is very
quiet about failures - maybe a cryptic message in dmesg, and nobody
reads all of that unless they're looking for something.  If btrfs
could report significant issues that might mitigate the impact of an
auto-recovery.

Another thing to consider during recovery is whether the damaged
data could optionally be stored in a snapshot of some kind - maybe in
the way that the ext3/4 rollback data from a btrfs-convert run gets
stored in a snapshot.  My knowledge of the underlying structures is weak, but I'd
think that a corrupted tree root practically is a snapshot already,
and turning it into one might even be easier than cleaning it up.  Of
course, we would need to ensure the snapshot could be deleted without
further error.  Doing anything with the snapshot might require special
tools, but if people want to do disk scraping they could.

--
Rich


Re: btrfs balance segfault, kernel BUG at fs/btrfs/extent-tree.c:7727

2014-10-13 Thread Rich Freeman
On Thu, Oct 9, 2014 at 10:19 AM, Petr Janecek jane...@ucw.cz wrote:

   I have trouble finishing btrfs balance on five disk raid10 fs.
 I added a disk to 4x3TB raid10 fs and run btrfs balance start
 /mnt/b3, which segfaulted after few hours, probably because of the BUG
 below. btrfs check does not find any errors, both before the balance
 and after reboot (the fs becomes un-umountable).

 [22744.238559] WARNING: CPU: 0 PID: 4211 at fs/btrfs/extent-tree.c:876 
 btrfs_lookup_extent_info+0x292/0x30a [btrfs]()

 [22744.532378] kernel BUG at fs/btrfs/extent-tree.c:7727!

I am running into something similar. I just added a 3TB drive to my
raid1 btrfs and started a balance.  The balance segfaulted, and I find
this in dmesg:


[453046.291762] BTRFS info (device sde2): relocating block group
10367073779712 flags 17
[453062.494151] BTRFS info (device sde2): found 13 extents
[453069.283368] [ cut here ]
[453069.283468] kernel BUG at
/data/src/linux-3.17.0-gentoo/fs/btrfs/relocation.c:931!
[453069.283590] invalid opcode:  [#1] SMP
[453069.283666] Modules linked in: vhost_net vhost macvtap macvlan tun
ipt_MASQUERADE xt_conntrack veth nfsd auth_rpcgss oid_registry lockd
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables it87
hwmon_vid hid_logitech_dj nxt200x cx88_dvb videobuf_dvb dvb_core
cx88_vp3054_i2c tuner_simple tuner_types tuner mousedev hid_generic
usbhid cx88_alsa radeon cx8800 cx8802 cx88xx snd_hda_codec_realtek
btcx_risc snd_hda_codec_generic videobuf_dma_sg videobuf_core kvm_amd
tveeprom kvm rc_core v4l2_common cfbfillrect fbcon videodev cfbimgblt
snd_hda_intel bitblit snd_hda_controller cfbcopyarea softcursor font
tileblit i2c_algo_bit k10temp snd_hda_codec backlight drm_kms_helper
snd_hwdep i2c_piix4 ttm snd_pcm snd_timer drm snd soundcore 8250 evdev
[453069.285043]  serial_core ext4 crc16 jbd2 mbcache zram lz4_compress
zsmalloc ata_generic pata_acpi btrfs xor zlib_deflate atkbd raid6_pq
ohci_pci firewire_ohci firewire_core crc_itu_t pata_atiixp ehci_pci
ohci_hcd ehci_hcd usbcore usb_common r8169 mii sunrpc dm_mirror
dm_region_hash dm_log dm_mod
[453069.285552] CPU: 1 PID: 17270 Comm: btrfs Not tainted 3.17.0-gentoo #1
[453069.285657] Hardware name: Gigabyte Technology Co., Ltd.
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[453069.285806] task: 88040ec556e0 ti: 88010cf94000 task.ti:
88010cf94000
[453069.285925] RIP: 0010:[a02ddd62]  [a02ddd62]
build_backref_tree+0x1152/0x11b0 [btrfs]
[453069.286137] RSP: 0018:88010cf97848  EFLAGS: 00010206
[453069.286223] RAX: 8800ae67c800 RBX: 880122e94000 RCX:
880122e949c0
[453069.286336] RDX: 09270788d000 RSI: 880054c3fbc0 RDI:
8800ae67c800
[453069.286449] RBP: 88010cf97958 R08: 000159a0 R09:
880122e94000
[453069.286561] R10: 0003 R11:  R12:
8802da313000
[453069.286674] R13: 8802da313c60 R14: 880122e94780 R15:
88040c277000
[453069.286787] FS:  7f175ac51880() GS:880427c4()
knlGS:f7333b40
[453069.286913] CS:  0010 DS:  ES:  CR0: 8005003b
[453069.287005] CR2: 7f208de58000 CR3: 0003b0a9c000 CR4:
07e0
[453069.287116] Stack:
[453069.287151]  88010cf97868 880122e94000 01ff880122e94300
880342156060
[453069.287282]  880122e94780 8802da313c60 880122e94600
8800ae67c800
[453069.287412]  880122e947c0 8802da313000 88040c277120
88010005
[453069.287542] Call Trace:
[453069.287640]  [a02ddfa3] relocate_tree_blocks+0x1e3/0x630 [btrfs]
[453069.287796]  [a02e0550] relocate_block_group+0x3d0/0x650 [btrfs]
[453069.287951]  [a02e0958]
btrfs_relocate_block_group+0x188/0x2a0 [btrfs]
[453069.288113]  [a02b48f0]
btrfs_relocate_chunk.isra.61+0x70/0x780 [btrfs]
[453069.288276]  [a02c7fd0] ?
btrfs_set_lock_blocking_rw+0x70/0xc0 [btrfs]
[453069.288438]  [a02b0e79] ? free_extent_buffer+0x59/0xb0 [btrfs]
[453069.288590]  [a02b8e99] btrfs_balance+0x829/0xf40 [btrfs]
[453069.288738]  [a02bf80f] btrfs_ioctl_balance+0x1af/0x510 [btrfs]
[453069.288890]  [a02c59e4] btrfs_ioctl+0xa54/0x2950 [btrfs]
[453069.288995]  [8111d016] ?
lru_cache_add_active_or_unevictable+0x26/0x90
[453069.289119]  [8113a061] ? handle_mm_fault+0xbe1/0xdb0
[453069.289219]  [811ffdce] ? cred_has_capability+0x5e/0x100
[453069.289323]  [8104065c] ? __do_page_fault+0x1fc/0x4f0
[453069.289422]  [8117d80e] do_vfs_ioctl+0x7e/0x4f0
[453069.289513]  [811ff64f] ? file_has_perm+0x8f/0xa0
[453069.289606]  [8117dd09] SyS_ioctl+0x89/0xa0
[453069.289692]  [81040a1c] ? do_page_fault+0xc/0x10
[453069.289785]  [814f5752] system_call_fastpath+0x16/0x1b
[453069.289881] Code: ff ff 48 8b 9d 20 ff ff ff e9 11 ff ff ff 0f 0b
be ec 03 00 00 48 c7 c7 d0 f0 30 a0 e8 28 00 d7 e0 e9 06 f3 ff ff e8
c4 42 

Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!

2014-10-13 Thread Rich Freeman
On Thu, Oct 2, 2014 at 3:27 AM, Tomasz Chmielewski t...@virtall.com wrote:
 Got this when running balance with 3.17.0-rc7:

 [173475.410717] kernel BUG at fs/btrfs/relocation.c:931!

I just started a post on another thread with this exact same issue on
3.17.0. I started a balance after adding a new drive.

[453046.291762] BTRFS info (device sde2): relocating block group
10367073779712 flags 17
[453062.494151] BTRFS info (device sde2): found 13 extents
[453069.283368] [ cut here ]
[453069.283468] kernel BUG at
/data/src/linux-3.17.0-gentoo/fs/btrfs/relocation.c:931!
[453069.283590] invalid opcode:  [#1] SMP
[453069.283666] Modules linked in: vhost_net vhost macvtap macvlan tun
ipt_MASQUERADE xt_conntrack veth nfsd auth_rpcgss oid_registry lockd
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables it87
hwmon_vid hid_logitech_dj nxt200x cx88_dvb videobuf_dvb dvb_core
cx88_vp3054_i2c tuner_simple tuner_types tuner mousedev hid_generic
usbhid cx88_alsa radeon cx8800 cx8802 cx88xx snd_hda_codec_realtek
btcx_risc snd_hda_codec_generic videobuf_dma_sg videobuf_core kvm_amd
tveeprom kvm rc_core v4l2_common cfbfillrect fbcon videodev cfbimgblt
snd_hda_intel bitblit snd_hda_controller cfbcopyarea softcursor font
tileblit i2c_algo_bit k10temp snd_hda_codec backlight drm_kms_helper
snd_hwdep i2c_piix4 ttm snd_pcm snd_timer drm snd soundcore 8250 evdev
[453069.285043]  serial_core ext4 crc16 jbd2 mbcache zram lz4_compress
zsmalloc ata_generic pata_acpi btrfs xor zlib_deflate atkbd raid6_pq
ohci_pci firewire_ohci firewire_core crc_itu_t pata_atiixp ehci_pci
ohci_hcd ehci_hcd usbcore usb_common r8169 mii sunrpc dm_mirror
dm_region_hash dm_log dm_mod
[453069.285552] CPU: 1 PID: 17270 Comm: btrfs Not tainted 3.17.0-gentoo #1
[453069.285657] Hardware name: Gigabyte Technology Co., Ltd.
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[453069.285806] task: 88040ec556e0 ti: 88010cf94000 task.ti:
88010cf94000
[453069.285925] RIP: 0010:[a02ddd62]  [a02ddd62]
build_backref_tree+0x1152/0x11b0 [btrfs]
[453069.286137] RSP: 0018:88010cf97848  EFLAGS: 00010206
[453069.286223] RAX: 8800ae67c800 RBX: 880122e94000 RCX:
880122e949c0
[453069.286336] RDX: 09270788d000 RSI: 880054c3fbc0 RDI:
8800ae67c800
[453069.286449] RBP: 88010cf97958 R08: 000159a0 R09:
880122e94000
[453069.286561] R10: 0003 R11:  R12:
8802da313000
[453069.286674] R13: 8802da313c60 R14: 880122e94780 R15:
88040c277000
[453069.286787] FS:  7f175ac51880() GS:880427c4()
knlGS:f7333b40
[453069.286913] CS:  0010 DS:  ES:  CR0: 8005003b
[453069.287005] CR2: 7f208de58000 CR3: 0003b0a9c000 CR4:
07e0
[453069.287116] Stack:
[453069.287151]  88010cf97868 880122e94000 01ff880122e94300
880342156060
[453069.287282]  880122e94780 8802da313c60 880122e94600
8800ae67c800
[453069.287412]  880122e947c0 8802da313000 88040c277120
88010005
[453069.287542] Call Trace:
[453069.287640]  [a02ddfa3] relocate_tree_blocks+0x1e3/0x630 [btrfs]
[453069.287796]  [a02e0550] relocate_block_group+0x3d0/0x650 [btrfs]
[453069.287951]  [a02e0958]
btrfs_relocate_block_group+0x188/0x2a0 [btrfs]
[453069.288113]  [a02b48f0]
btrfs_relocate_chunk.isra.61+0x70/0x780 [btrfs]
[453069.288276]  [a02c7fd0] ?
btrfs_set_lock_blocking_rw+0x70/0xc0 [btrfs]
[453069.288438]  [a02b0e79] ? free_extent_buffer+0x59/0xb0 [btrfs]
[453069.288590]  [a02b8e99] btrfs_balance+0x829/0xf40 [btrfs]
[453069.288738]  [a02bf80f] btrfs_ioctl_balance+0x1af/0x510 [btrfs]
[453069.288890]  [a02c59e4] btrfs_ioctl+0xa54/0x2950 [btrfs]
[453069.288995]  [8111d016] ?
lru_cache_add_active_or_unevictable+0x26/0x90
[453069.289119]  [8113a061] ? handle_mm_fault+0xbe1/0xdb0
[453069.289219]  [811ffdce] ? cred_has_capability+0x5e/0x100
[453069.289323]  [8104065c] ? __do_page_fault+0x1fc/0x4f0
[453069.289422]  [8117d80e] do_vfs_ioctl+0x7e/0x4f0
[453069.289513]  [811ff64f] ? file_has_perm+0x8f/0xa0
[453069.289606]  [8117dd09] SyS_ioctl+0x89/0xa0
[453069.289692]  [81040a1c] ? do_page_fault+0xc/0x10
[453069.289785]  [814f5752] system_call_fastpath+0x16/0x1b
[453069.289881] Code: ff ff 48 8b 9d 20 ff ff ff e9 11 ff ff ff 0f 0b
be ec 03 00 00 48 c7 c7 d0 f0 30 a0 e8 28 00 d7 e0 e9 06 f3 ff ff e8
c4 42 02 00 0f 0b 3c b0 0f 84 72 f1 ff ff be 22 03 00 00 48 c7 c7 d0
f0 30
[453069.290429] RIP  [a02ddd62]
build_backref_tree+0x1152/0x11b0 [btrfs]
[453069.290591]  RSP 88010cf97848
[453069.316194] ---[ end trace 5fdc0af4cc62bf41 ]---

Re: btrfs send and kernel 3.17

2014-10-13 Thread Rich Freeman
On Sun, Oct 12, 2014 at 7:11 AM, David Arendt ad...@prnet.org wrote:
 This weekend I finally had time to try btrfs send again on the newly
 created fs. Now I am running into another problem:

 btrfs send returns: ERROR: send ioctl failed with -12: Cannot allocate
 memory

 In dmesg I see only the following output:

 parent transid verify failed on 21325004800 wanted 2620 found 8325


I'm not using send at all, but I've been running into "parent transid
verify failed" messages where the wanted is way smaller than the found
when trying to balance a raid1 after adding a new drive.  Originally I
had gotten a BUG, and after reboot the drive finished balancing
(interestingly enough without moving any chunks to the new drive -
just consolidating everything on the old drives), and then when I try
to do another balance I get:
[ 4426.987177] BTRFS info (device sdc2): relocating block group
10367073779712 flags 17
[ 4446.287998] BTRFS info (device sdc2): found 13 extents
[ 4451.330887] parent transid verify failed on 10063286579200 wanted
987432 found 993678
[ 4451.350663] parent transid verify failed on 10063286579200 wanted
987432 found 993678

The btrfs program itself outputs:
btrfs balance start -v /data
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x0): balancing
  METADATA (flags 0x0): balancing
  SYSTEM (flags 0x0): balancing
ERROR: error during balancing '/data' - Cannot allocate memory
There may be more info in syslog - try dmesg | tail

This is also on 3.17.  This may be completely unrelated, but it seemed
similar enough to be worth mentioning.

The filesystem otherwise seems to work fine, other than the new drive
not having any data on it:
Label: 'datafs'  uuid: cd074207-9bc3-402d-bee8-6a8c77d56959
Total devices 6 FS bytes used 2.16TiB
devid1 size 2.73TiB used 2.40TiB path /dev/sdc2
devid2 size 931.32GiB used 695.03GiB path /dev/sda2
devid3 size 931.32GiB used 700.00GiB path /dev/sdb2
devid4 size 931.32GiB used 700.00GiB path /dev/sdd2
devid5 size 931.32GiB used 699.00GiB path /dev/sde2
devid6 size 2.73TiB used 0.00 path /dev/sdf2

This is btrfs-progs-3.16.2.

--
Rich


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:27 PM, David Arendt ad...@prnet.org wrote:
 From my own experience and based on what other people are saying, I
 think there is a random btrfs filesystem corruption problem in kernel
 3.17 at least related to snapshots, therefore I decided to post using
 another subject to draw attention from people not concerned about btrfs
 send to it. More information can be found in the brtfs send posts.

 Did the filesystem you tried to balance contain snapshots ? Read only ones ?

The filesystem contains numerous subvolumes and snapshots, many of
which are read-only.  I'm managing many with snapper.

The similarity of the transid verify errors made me think this issue
is related, and the root cause may have nothing to do with btrfs send.

As far as I can tell these errors aren't having any effect on my data
- hopefully the system is catching the problems before there are
actual disk writes/etc.

--
Rich


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote:
 I think I just found a consistent simple way to trigger the problem
 (at least on my system). And, as I guessed before, it seems to be
 related just to readonly snapshots:

 1) I create a readonly snapshot
 2) I do some changes on the source subvolume for the snapshot (I'm not
 sure changes are strictly needed)
 3) reboot (or probably just unmount and remount. I reboot because the
 fs I've problems with contains my root subvolume)

 After the rebooting (or the remount) I consistently have the corruption
 with the usual multitude of these in dmesg
 parent transid verify failed on 902316032 wanted 2484 found 4101
 and the characteristic ls -la output

 drwxr-xr-x 1 root root  250 Oct 10 15:37 root
 d? ? ??   ?? root-b2
 drwxr-xr-x 1 root root  250 Oct 10 15:37 root-b3
 d? ? ??   ?? root-backup

 root-backup and root-b2 are both readonly whereas root-b3 is rw (and
 it didn't get corrupted).

 David, maybe you can try the same steps on one of your machines?


Look at that.  I didn't realize it, but indeed I have a corrupted snapshot:
/data/.snapshots/5338/:
ls: cannot access /data/.snapshots/5338/snapshot: Cannot allocate memory
total 4
drwxr-xr-x 1 root root  32 Oct 11 06:09 .
drwxr-x--- 1 root root  32 Oct 11 07:42 ..
-rw--- 1 root root 135 Oct 11 06:09 info.xml
d? ? ??  ?? snapshot

Several older snapshots are fine, and those predate my 3.17 upgrade.

I noticed that this corrupted snapshot isn't even listed in my snapper lists.

btrfs su delete /data/.snapshots/5338/snapshot
Transaction commit: none (default)
ERROR: error accessing '/data/.snapshots/5338/snapshot'

Removing them appears to be problematic as well.  I might just disable
compress=lzo and go back to 3.16 to see how that goes.

--
Rich


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 4:55 PM, Rich Freeman
r-bt...@thefreemanclan.net wrote:
 On Mon, Oct 13, 2014 at 4:48 PM, john terragon jterra...@gmail.com wrote:

 After the rebooting (or the remount) I consistently have the corruption
 with the usual multitude of these in dmesg
 parent transid verify failed on 902316032 wanted 2484 found 4101
 and the characteristic ls -la output

Sorry to double-reply, but I left this out.  I have a long string of
these early in boot as well that I never noticed before.

--
Rich


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread Rich Freeman
On Mon, Oct 13, 2014 at 5:22 PM, john terragon jterra...@gmail.com wrote:
 I'm using compress=no so compression doesn't seem to be related, at
 least in my case. Just read-only snapshots on 3.17 (although I haven't
 tried 3.16).

I was using lzo compression, and hence my comment about turning it off
before going back to 3.16 (not realizing that 3.16 has subsequently
been fixed).

Ironically enough I discovered this as I was about to migrate my ext4
backup drive into my btrfs raid1.  Maybe I'll go ahead and wait on
that and have an rsync backup of the filesystem handy (minus
snapshots) just in case.  :)

I'd switch to 3.16, but it sounds like there is no way to remove the
snapshots at the moment, and I can live for a while without the
ability to create new ones.

Interestingly enough it doesn't look like ALL snapshots are affected.
I checked and some of the snapshots I made last weekend while doing
system updates look accessible.  They are significantly smaller, and
the subvolumes they were made from are also fairly new - though I have
no idea if that is related.

The subvolumes do show up in btrfs su list.  They cannot be examined
using btrfs su show.

It would be VERY nice to have a way of cleaning this up without
blowing away the entire filesystem...

--
Rich


Re: 3.16 Managed to ENOSPC with 80% used

2014-09-26 Thread Rich Freeman
On Thu, Sep 25, 2014 at 5:21 PM, Holger Hoffstätte
holger.hoffstae...@googlemail.com wrote:
 That's why I mentioned adding a second device - that will immediately
 allow cleanup with headroom. An additional 8GB tmpfs volume can work
 wonders.


If you add a single 8GB tmpfs to a RAID1 btrfs array, is it safe to
assume that you'll still always have a redundant copy of everything on
a disk somewhere during the recovery?  Would only a single tmpfs
volume actually help in this case?  I get a bit nervous about doing a
cleanup that involves moving metadata to tmpfs of all places, since
some kind of deadlock/etc could result in unrecoverable data loss.

Doing the same thing with an actual hard drive would concern me less.
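For reference, the temporary-device trick being suggested is roughly
this sketch (sizes and paths hypothetical - and until the device is
removed again, anything balanced onto it is only as safe as your RAM):

  mount -t tmpfs -o size=8G tmpfs /mnt/tmp
  truncate -s 8G /mnt/tmp/spill.img
  losetup /dev/loop0 /mnt/tmp/spill.img
  btrfs device add /dev/loop0 /data
  btrfs balance start -dusage=5 /data    # reclaim nearly-empty chunks
  btrfs device delete /dev/loop0 /data   # migrate everything back off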

--
Rich


Re: Is it necessary to balance a btrfs raid1 array?

2014-09-10 Thread Rich Freeman
On Wed, Sep 10, 2014 at 9:06 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 Normally, you shouldn't need to run balance at all on most BTRFS
 filesystems, unless your usage patterns vary widely over time (I'm
 actually a good example of this, most of the files in my home directory
 are relatively small, except for when I am building a system with
 buildroot or compiling a kernel, and on occasion I have VM images that
 I'm working with).

Tend to agree, but I do keep a close eye on free space.  If I get to
the point where I'm over 90% allocated to chunks while lots of that
chunk space sits unused, I run a balance.  I tend to have the most problems
with my root/OS filesystem running on a 64GB SSD, likely because it is
so small.

Is there a big performance penalty running mixed chunks on an SSD?  I
believe this would get rid of the risk of ENOSPC issues if everything
gets allocated to chunks.  There are obviously no issues with random
access on an SSD, but there could be other problems (cache
utilization, etc).

I tend to watch btrfs fi show and if the total space used starts
getting high then I run a balance.  Usually I run with -dusage=30 or
-dusage=50, but sometimes I get to the point where I just need to do a
full balance.  Often it is helpful to run a series of balance commands
starting at -dusage=10 and moving up in increments.  This at least
prevents killing IO continuously for hours.  If we can get to a point
where balancing can operate at low IO priority that would be helpful.
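Something like this, stepping the usage filter up until enough chunks
have been reclaimed (mountpoint hypothetical):

  for u in 10 30 50; do
      btrfs balance start -dusage=$u /data
  done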

IO priority is a problem in btrfs in general.  Even tasks run at idle
scheduling priority can really block up a disk.  I've seen a lot of
hurry-and-wait behavior in btrfs.  It seems like the initial commit to
the log/etc is willing to accept a very large volume of data, and then
when all the trees get updated the system grinds to a crawl trying to
deal with all the data that was committed.  The problem is that you
have two queues, with the second queue being rate-limiting but the
first queue being the one that applies priority control.  What we
really need is for the log to have controls on how much it accepts so
that the updating of the trees/etc never is rate-limiting.   That will
limit the ability to have short IO write bursts, but it would prevent
low-priority writes from blocking high-priority read/writes.

--
Rich


Re: Distro vs latest kernel for BTRFS?

2014-08-22 Thread Rich Freeman
On Fri, Aug 22, 2014 at 8:04 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:

 I personally use Gentoo Unstable on all my systems, so I build all my
 kernels locally anyway, and stay pretty much in-line with the current
 stable Mainline kernel.

Gentoo Unstable probably means the testing version of gentoo-sources,
which follows the stable kernel branch - the most recent stable
release, not the long-term stable one.  The stable version of
gentoo-sources generally follows the most recent longterm stable
kernel (so 3.14 right now).  I'm not sure what the exact policy is,
but that is my sense of it.

So, you're still running a stable kernel most likely.  If you really
want mainline then you want git-sources.  That follows the most recent
mainline I believe.  Of course, if you're following it that closely
then you probably should think about just doing a git clone and
managing it yourself, since then you can handle patches/etc more
easily.

I think the best option for somebody running btrfs is to stick with a
stable kernel branch, either the current stable or a very recent
longterm.  I wouldn't go back into 3.2 land or anything like that.

But, yes, if you had stuck with 3.14 and not gone to the current
stable then you would have missed the compress=lzo deadlock.  So, pick
your poison.  :)

Rich


Re: Significance of high number of mails on this list?

2014-08-22 Thread Rich Freeman
On Fri, Aug 22, 2014 at 3:35 AM, Duncan 1i5t5.dun...@cox.net wrote:

 No claim to be a dev, btrfs or otherwise, here, but I believe in this
 case you /are/ being too paranoid.

 Both btrfs send and receive only deal with data/metadata they know how to
 deal with.  If it's corrupt in some way or if they don't understand it,
 they don't send/write it, they fail.

 IOW, if it works without error it's as guaranteed to be golden as these
 things get.  The problem is that it doesn't always work without error in
 the first place, sometimes it /does/ fail.  In that instance you can
 always try again as the existing data/metadata shouldn't be damaged, but
 if it keeps failing you may have to try something else, rsync, etc.

Well, my main use-case for rsync right now is "btrfs bug hoses my
filesystem", so it would be nice to have a daily full backup on
something other than btrfs so that it is unlikely to suffer the same
problem at the same time.

Using btrfs send with that backup would certainly be more efficient,
but it would defeat the purpose of the backup, which is to not be
btrfs.  I am already using mirroring in the event of drive failure,
and offsite cloud backups of critical data in the event of a larger
catastrophe.  Btrfs eating my data is a somewhat likely failure mode
in the grand scheme of things, so I protect against it so that I can
still have fun playing with btrfs without losing sleep.

I've actually restored from it once.  I suspect that I could have
fixed my ENOSPC problem without resorting to that, but the usual FAQ
solutions didn't work and I was running short on time, and that
particular filesystem was only 64GB anyway so it was a fast restore
(and that is why this filesystem is prone to ENOSPC in the first
place).

Oh, and I'm using rsnapshot, so I also get the benefit of a few days'
worth of backups - almost as good as snapper, though in reality
obviously not the same thing.

--
Rich


Re: [PATCH] Btrfs: fix task hang under heavy compressed write

2014-08-13 Thread Rich Freeman
On Wed, Aug 13, 2014 at 7:54 AM, Martin Steigerwald mar...@lichtvoll.de wrote:
 Am Dienstag, 12. August 2014, 15:44:59 schrieb Liu Bo:
 This has been reported and discussed for a long time, and this hang occurs
 in both 3.15 and 3.16.

 Liu, is this safe for testing yet?


I'm more than happy to test this and re-enable lzo (I've been running
fine on 3.16 with it disabled, but had numerous issues when it was
enabled on 3.15 and the rcs).  It would just be helpful to clarify
exactly what patch we should be testing, and what kernel we should
test it against to be most helpful.  No sense generating issue reports
that aren't useful.

Rich


Re: Blocked tasks on 3.15.1

2014-07-22 Thread Rich Freeman
On Tue, Jul 22, 2014 at 10:53 AM, Chris Mason c...@fb.com wrote:

 Thanks for the help in tracking this down everyone.  We'll get there!
 Are you all running multi-disk systems (from a btrfs POV, more than one
 device?)  I don't care how many physical drives this maps to, just does
 btrfs think there's more than one drive.

I've been away on vacation so I haven't been able to try your latest
patch, but I can try whatever is out there starting this weekend.

I was getting fairly consistent hangs during heavy IO (especially
rsync) on 3.15 with lzo enabled.  This is on raid1 across 5 drives,
directly against the partitions themselves (no dmcrypt, mdadm, lvm,
etc).  I disabled lzo and haven't had problems since.  I'm now running
on mainline without issue, but I think I did see the hang on mainline
when I tried enabling lzo again briefly.

Rich


Re: Blocked tasks on 3.15.1

2014-06-29 Thread Rich Freeman
On Fri, Jun 27, 2014 at 8:22 PM, Chris Samuel ch...@csamuel.org wrote:
 On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:

 If I'm not mistaken the fix for the 3.16 series bug was:

 ea4ebde02e08558b020c4b61bb9a4c0fcf63028e

 Btrfs: fix deadlocks with trylock on tree nodes.

 That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
 probably go to -stable for the next 3.15 release..

I can confirm that 3.15.2 definitely has the deadlock problem.  I
tried upgrading just to convince myself of this before patching it and
it only took a few hours before it stopped syncing with the usual
errors.

I applied the patch on Jun 28 around 20:00UTC.  I haven't had a
deadlock since, despite having the file system fairly active with a
few reboots, some deleted snapshots, being assimilated by the new
sysvinit replacement, etc.  That doesn't really prove anything though
- for all I know it will hang a week from now.

However, the patch seems stable so far on 3.15.2.

Rich


Re: Blocked tasks on 3.15.1

2014-06-27 Thread Rich Freeman
On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
 Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
 enough 3.16-rc2+ reports out there from folks experiencing issues with
 3.15 blocked tasks to rightfully say.

Any chance that it was backported to 3.15.2?  I'd rather not move to
mainline just for btrfs.

I got another block this morning and failed to capture a log before my
terminals gave out.  I switched back to 3.15.0 for the moment, and
we'll see if that fares any better.

Rich


Re: Blocked tasks on 3.15.1

2014-06-27 Thread Rich Freeman
On Fri, Jun 27, 2014 at 11:52 AM, Chris Murphy li...@colorremedies.com wrote:
 On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:


 I got another block this morning and failed to capture a log before my
 terminals gave out.  I switched back to 3.15.0 for the moment, and
 we'll see if that fares any better.

 Yeah I'd start going backwards. The idea of going forwards is to
 hopefully get you unstuck or extract data where otherwise you can't,
 it's not really a recommendation for production usage. It's also often
 useful if you can reproduce the block with a current rc kernel and
 issue sysrq+w and post that. Then do your regression with an older
 kernel.

So, obviously I'm getting my money's worth from the btrfs team, but
neither option is great, as neither involves me running a
stable kernel.  3.15.0 contains CVE-2014-4014, although I'm running a
version patched for that vulnerability.  If I go back any further I'd
probably have to backport it myself, and I only know about it because
my distro patched that CVE on 3.15.0 before moving to 3.15.1.

Running 3.16 doesn't bother me much from a btrfs standpoint, but it
means I'm getting unstable updates on all the other modules as well.
It is just more to deal with.

I might give 3.15.2 a shot and see what happens, and I can always fall
back to 3.15.0 again.

Rich


Blocked tasks on 3.15.1

2014-06-26 Thread Rich Freeman
I've been getting blocked tasks on 3.15.1 generally at times when the
filesystem is somewhat busy (such as doing a backup via scp/clonezilla
writing to the disk).

A week ago I had enabled snapper for a day which resulted in a daily
cleanup of about 8 snapshots at once, which might have contributed,
but I've been limping along since.

Here is a pastebin of my dmesg from the hung tasks and a subsequent Alt-SysRq-W:

http://pastebin.com/yYdcxFTE

When this happens the system remains somewhat stable, but no writes to
the disk succeed, and I start getting load averages in the dozens as
tasks start blocking.

On reboot the system generally works fine, though it can hang a day or
two later.

I'm happy to try patches, or try to capture any other output that is
helpful the next time this happens - the system is fairly stable as
long as I capture things someplace other than my btrfs file systems.
I didn't see anything quite like this on the list.  I updated my
kernel around the time this behavior started, and was on 3.15.0
previously (though I haven't tried reverting yet).
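For what it's worth, the capture itself is just (assuming sysrq is
enabled on the system):

  echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions
  echo w > /proc/sysrq-trigger      # dump blocked (uninterruptible) tasks
  dmesg > /tmp/blocked-tasks.txt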

Rich


Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-20 Thread Rich Freeman
On Sat, Mar 15, 2014 at 7:51 AM, Duncan 1i5t5.dun...@cox.net wrote:
 1) Does running the snapper cleanup command from that cron job manually
 trigger the problem as well?

As you can imagine I'm not too keen to trigger this often.  But yes, I
just gave it a shot on my SSD and cleaning a few days of timelines
triggered a panic.

 2) What about modifying the cron job to run hourly, or perhaps every six
 hours, so it's deleting only 2 or 12 instead of 48 at a time?  Does that
 help?

 If so then it's a thundering herd problem.  While definitely still a bug,
 you'll at least have a workaround until its fixed.

Definitely looks like a thundering herd problem.

I stopped the cron jobs (including the creation of snapshots based on
your later warning).  However, I am deleting my snapshots one at a
time at a rate of one every 5-30 minutes, and while that is creating
surprisingly high disk loads on my SSD and hard drives, I don't get
any panics.  I figured that having only one deletion pending per
checkpoint would eliminate locking risk.

I did get some blocked task messages in dmesg, like:
[105538.121239] INFO: task mysqld:3006 blocked for more than 120 seconds.
[105538.121251]   Not tainted 3.13.6-gentoo #1
[105538.121256] echo 0 > /proc/sys/kernel/hung_task_timeout_secs
disables this message.
[105538.121262] mysqld  D 880395f63e80  3432  3006  1 0x
[105538.121273]  88028b623d38 0086 88028b623dc8
81c10440
[105538.121283]  0200 88028b623fd8 880395f63b80
00012c40
[105538.121291]  00012c40 880395f63b80 532b7877
880410e7e578
[105538.121299] Call Trace:
[105538.121316]  [81623d73] schedule+0x6a/0x6c
[105538.121327]  [81623f52] schedule_preempt_disabled+0x9/0xb
[105538.121337]  [816251af] __mutex_lock_slowpath+0x155/0x1af
[105538.121347]  [812b9db0] ? radix_tree_tag_set+0x71/0xd4
[105538.121356]  [81625225] mutex_lock+0x1c/0x2e
[105538.121365]  [8123c168] btrfs_log_inode_parent+0x161/0x308
[105538.121373]  [8162466d] ? mutex_unlock+0x11/0x13
[105538.121382]  [8123cd37] btrfs_log_dentry_safe+0x39/0x52
[105538.121390]  [8121a0c9] btrfs_sync_file+0x1bc/0x280
[105538.121401]  [811339a3] vfs_fsync_range+0x13/0x1d
[105538.121409]  [811339c4] vfs_fsync+0x17/0x19
[105538.121416]  [81133c3c] do_fsync+0x30/0x55
[105538.121423]  [81133e40] SyS_fsync+0xb/0xf
[105538.121432]  [8162c2e2] system_call_fastpath+0x16/0x1b

I suspect that this may not be terribly helpful - it probably reflects
tasks waiting for a lock rather than whatever is holding it.  It was
more of a problem when I was trying to delete a snapshot per minute on
my SSD, or one every 5 min on the HDD.

Rich


Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-14 Thread Rich Freeman
On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
r-bt...@thefreemanclan.net wrote:
 On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik jba...@fb.com wrote:
 On 03/12/2014 08:56 AM, Rich Freeman wrote:

  After a number of reboots the system became stable, presumably
 whatever race condition btrfs was hitting followed a favorable
 path.

 I do have a 2GB btrfs-image pre-dating my application of this
 patch that was causing the issue last week.


 Uhm wow that's pretty epic.  I will talk to chris and figure out how
 we want to deal with that and send you a patch shortly.  Thanks,

 A tiny bit more background.

And some more background.  I had more reboots over the next two days,
at the same time each day, just after my crontab successfully
completed.  One of the last things it does is run the snapper
cleanups, which delete a bunch of snapshots.  After a reboot I checked
and there were a bunch of snapshots pending deletion; they disappeared
over the next 30-60 seconds before the panic, and then re-appeared on
the next reboot.

I disabled the snapper cron job and this morning had no issues at all.
One day isn't much to establish a trend, but I suspect that this is
the cause.  Obviously getting rid of old snapshots would be desirable
at some point, but I can wait for a patch.  Snapper was deleting
about 48 snapshots at the same time, since I create them hourly and
the cleanup runs daily on two different subvolumes of the same
filesystem.
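
For the arithmetic: 24 hourly snapshots a day on each of two
subvolumes makes 48 deletions in a single cleanup pass.  The relevant
bits of the config look roughly like this (the limits are from memory,
so treat them as an example rather than my exact settings):

# /etc/snapper/configs/root (similar for the second subvolume)
TIMELINE_CREATE="yes"       # take a snapshot every hour
TIMELINE_CLEANUP="yes"      # let the daily cleanup prune old ones
TIMELINE_LIMIT_HOURLY="24"  # keep a day of hourlies; ~24 pruned daily
TIMELINE_LIMIT_DAILY="7"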

Rich


Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-12 Thread Rich Freeman
On Thu, Mar 6, 2014 at 7:25 PM, Zach Brown z...@redhat.com wrote:
 On Thu, Mar 06, 2014 at 07:01:07PM -0500, Josef Bacik wrote:
 Zach found this deadlock that would happen like this


 And this fixes it.  It's run through a few times successfully.

I'm not sure if my issue is related to this or not - happy to start a
new thread if not.  I applied this patch because I was running into
lockups, but I am still hitting them.

See: http://picpaste.com/IMG_20140312_072458-KPH35pQ6.jpg

After a number of reboots the system became stable, presumably
whatever race condition btrfs was hitting followed a favorable path.

I do have a 2GB btrfs-image pre-dating my application of this patch
that was causing the issue last week.
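
For anyone wanting to capture the same thing, the image was grabbed
along these lines (the device path is a placeholder, and the flags are
from memory, so check the btrfs-image man page):

# metadata-only dump of the unmounted filesystem; -c9 is maximum
# zlib compression, -t4 uses four worker threads
btrfs-image -c9 -t4 /dev/sdb1 /var/tmp/btrfs-image.dump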

Rich


Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-12 Thread Rich Freeman
On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik jba...@fb.com wrote:
 On 03/12/2014 08:56 AM, Rich Freeman wrote:

  After a number of reboots the system became stable, presumably
 whatever race condition btrfs was hitting followed a favorable
 path.

 I do have a 2GB btrfs-image pre-dating my application of this
 patch that was causing the issue last week.


 Uhm wow that's pretty epic.  I will talk to chris and figure out how
 we want to deal with that and send you a patch shortly.  Thanks,

If you need any info from me at all beyond the capture let me know.

A tiny bit more background.  The system would boot normally, but panic
after about 30-90 seconds (usually long enough to log into KDE,
perhaps even fire up a browser).  In single-user mode I could mount
the filesystem read-only without issue.  If I mounted it read-write
(in recovery mode or normally) I'd get the panic after about 30-60
seconds.  On one occasion it seemed stable, but panicked when I
unmounted it.

I have to say that I'm impressed that it recovers at all.  I'd rather
have the filesystem not write anything unless it is sure it can write
it correctly, and that seems to be the effect here.  Just about all
the issues I've run into with btrfs have been lockup-type issues, not
silent corruption.

Rich