Re: Is btrfs on top of bcache stable now?

2015-04-20 Thread Fábio Pfeifer
I'm one of those that used to have problems with btrfs on top of bcache.
After some corruptions, I gave up this setup.

Recently (since February, I think) I gave it another shot, and I have
had no problems since.
I use bcache in writeback mode, with very good performance, and btrfs
has felt very stable in this setup.
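To confirm which mode a bcache device is actually running in, one can read its `cache_mode` attribute in sysfs, where the active mode is shown in square brackets (for example `writethrough [writeback] writearound none`). This is a minimal sketch. It assumes the device is `bcache0` and the attribute lives at `/sys/block/bcache0/bcache/cache_mode`, and the helper names are hypothetical:

```python
from pathlib import Path


def active_cache_mode(listing: str) -> str:
    """Return the entry bcache marks as active with [brackets]."""
    for token in listing.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {listing!r}")


def read_cache_mode(device: str = "bcache0") -> str:
    """Read the active cache mode of a bcache device (assumed sysfs layout)."""
    attr = Path("/sys/block") / device / "bcache" / "cache_mode"
    return active_cache_mode(attr.read_text())


if __name__ == "__main__":
    # Parse a sample listing; on a real system call read_cache_mode() instead.
    print(active_cache_mode("writethrough [writeback] writearound none"))
```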

Best Regards,

Fabio Pfeifer

2015-04-20 11:49 GMT-03:00 Marc MERLIN :
> On Mon, Apr 20, 2015 at 10:27:05AM +, Hugo Mills wrote:
>>See the first issue here: https://btrfs.wiki.kernel.org/index.php/Gotchas
>
> Hi Hugo, looking at the page again, I see
> "bcache + btrfs does not seem to be stable yet"
> linking to a thread more than 2 years old and btrfs kernels that
> wouldn't be stable without bcache anyway.
>
> I've seen others mention they switched to bcache recently, and I haven't seen
> new "it's broken" reports.
>
> So, is it ok
> 1) to assume bcache and btrfs play ok together now?
> 2) remove the warning from that gotchas page?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems
>    what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Any update on this?

I have exactly the same issue here. Kernel 3.12.5-1-ARCH, backing
device 500 GB IDE, cache 24 GB SSD => /dev/bcache0.
On /dev/bcache0 I also have 2 subvolumes, / and /home. I get lots of
messages in dmesg:

(...)
[   22.282469] BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439
[   22.282656] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370872] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370890] BTRFS info (device bcache0): csum failed ino 57765 off 106496 csum 3553846164 expected csum 1299185721
[   23.505238] incomplete page read in btrfs with offset 2560 and length 1536
[   23.505256] BTRFS info (device bcache0): csum failed ino 75922 off 172032 csum 1883678196 expected csum 1337496676
[   23.508535] incomplete page read in btrfs with offset 2560 and length 1536
[   23.508547] BTRFS info (device bcache0): csum failed ino 74368 off 237568 csum 2863587994 expected csum 2693116460
[   25.683059] incomplete page read in btrfs with offset 2560 and length 1536
[   25.683078] BTRFS info (device bcache0): csum failed ino 123709 off 57344 csum 1528117893 expected csum 2239543273
[   25.684339] incomplete page read in btrfs with offset 1024 and length 3072
[   26.622384] incomplete page read in btrfs with offset 1024 and length 3072
[   26.906718] incomplete page read in btrfs with offset 2560 and length 1536
[   27.823247] incomplete page read in btrfs with offset 1024 and length 3072
[   27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
[   27.823271] BTRFS info (device bcache0): csum failed ino 34587 off 16384 csum 1180114025 expected csum 474262911
[   28.490066] incomplete page read in btrfs with offset 2560 and length 1536
[   28.490085] BTRFS info (device bcache0): csum failed ino 65817 off 327680 csum 3065880108 expected csum 2663659117
[   29.413824] incomplete page read in btrfs with offset 1024 and length 3072
[   41.913857] incomplete page read in btrfs with offset 2560 and length 1536
[   55.761753] incomplete page read in btrfs with offset 1024 and length 3072
[   55.761771] BTRFS info (device bcache0): csum failed ino 72835 off 81920 csum 1511792656 expected csum 3733709121
[   69.636498] incomplete page read in btrfs with offset 2560 and length 1536
(...)

Should I be worried?
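With dozens of messages like the ones above, it helps to tally which inodes the checksum failures hit before deciding how worried to be. The following minimal sketch pulls the inode number and file offset out of each `csum failed` line. The regular expression is written against the message format quoted here and tolerates the line wrapping the mail archive introduced. The function names are hypothetical:

```python
import re
from collections import Counter

# Written against lines such as:
#   BTRFS info (device bcache0): csum failed ino 56193 off 212992 csum 519977505 expected csum 3166125439
# \s+ is used instead of literal spaces so mail-wrapped lines still match.
CSUM_RE = re.compile(
    r"csum failed ino\s+(?P<ino>\d+)\s+off\s+(?P<off>\d+)\s+"
    r"csum\s+(?P<got>\d+)\s+expected\s+csum\s+(?P<want>\d+)"
)


def csum_failures(log: str) -> list[tuple[int, int]]:
    """Return (inode, file offset) for every checksum failure in a dmesg dump."""
    return [(int(m["ino"]), int(m["off"])) for m in CSUM_RE.finditer(log)]


def failures_per_inode(log: str) -> Counter:
    """Count checksum failures per inode number."""
    return Counter(ino for ino, _ in csum_failures(log))
```

Inode numbers in btrfs are only unique within a subvolume. Mapping them to file names (for instance with `btrfs inspect-internal inode-resolve <ino> <path>`) therefore has to be tried against each mounted subvolume, / and /home here.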

thanks,

Fabio Pfeifer

2013/12/18 eb :
> I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as 
> follows:
>
> /dev/sdb3 - cache0 (80 GB Intel SSD)
> /dev/sdc1 - backing device (2 TB WD HDD)
>
> sdb3+sdc1 => /dev/bcache0
>
> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
> as / and /home. What's been bothering me are the following entries in
> my kernel log:
>
> [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
> [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
>
> The offset/length values are always either 1536/2560 or 3072/1024,
> they sum up nicely to 4K. There are 607 of those in there as I am
> writing this, the machine has been up 18 hours and been under no
> particular I/O strain (it's a desktop).
>
> Trying to fix this, I detached the cache (still using /dev/bcache0,
> but without /dev/sdb3 attached), causing these errors to disappear. As
> soon as I re-attached /dev/sdb3 they started again, so I am fairly
> sure it's an unfavorable interaction between bcache and btrfs.
>
> Is this something I should be worried about (they're only emitted with
> KERN_INFO?) or just an alignment problem? The underlying HDD is using
> 4K-Sectors, while the block_size of bcache seems to be 512, could that
> be the issue here?
>
> I've also encountered incomplete reads and a few csum errors, but I
> have not been able to trigger these regularly. I have a feeling that
> the error is more likely to be on the bcache end (I've mailed to that
> list as well), however any insight into the matter would be much
> appreciated.
>
> Thanks,
>
> - eb
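The 4K observation and the 512-byte block-size question can be checked together. Every offset/length pair reported in this thread (1536/2560, 3072/1024, 1024/3072, 2560/1536) starts on a 512-byte boundary and ends exactly at the end of a 4096-byte page, so each segment covered only the tail of a page. That is consistent with a block-size mismatch between bcache and btrfs, though it does not prove one. This is a minimal sketch. It assumes 4096-byte pages and the 512-byte bcache block size eb reports, and the names are hypothetical:

```python
import re

PAGE_SIZE = 4096   # assumed page size (x86-64)
BLOCK_SIZE = 512   # bcache block_size as reported in the message above

# "incomplete page read/write in btrfs with offset X and length Y"
SEGMENT_RE = re.compile(
    r"incomplete page (?P<op>read|write) in btrfs with offset\s+(?P<off>\d+)"
    r"\s+and length\s+(?P<len>\d+)"
)


def segments(log: str) -> list[tuple[str, int, int]]:
    """Return (op, offset, length) for each 'incomplete page' message."""
    return [(m["op"], int(m["off"]), int(m["len"])) for m in SEGMENT_RE.finditer(log)]


def describe(offset: int, length: int) -> dict:
    """Describe a page segment in units of BLOCK_SIZE."""
    return {
        "first_block": offset // BLOCK_SIZE,
        "blocks": length // BLOCK_SIZE,
        "block_aligned": offset % BLOCK_SIZE == 0 and length % BLOCK_SIZE == 0,
        "ends_at_page_end": offset + length == PAGE_SIZE,
    }


if __name__ == "__main__":
    log = (
        "[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560\n"
        "[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024\n"
    )
    for op, off, length in segments(log):
        print(op, off, length, describe(off, length))
```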


Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Forgot to mention: bcache is in writeback mode



Re: btrfs on bcache

2013-12-20 Thread Fábio Pfeifer
Hello,

I put a "WARN_ON(1);" after the printk lines (incomplete page read
and incomplete page write) in extent_io.c.

Here are some call traces:

[   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
[   19.509500] [ cut here ]
[   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P        W  O 3.12.5-1-ARCH #1
[   19.509580] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[   19.509581]  0009 880231a63cb0 814ee37b
[   19.509585]  880231a63ce8 81062bcd ea00085eaec0
[   19.509587]  8802320cc9c0  880233b0e000 880231a63cf8
[   19.509590] Call Trace:
[   19.509596]  [] dump_stack+0x54/0x8d
[   19.509601]  [] warn_slowpath_common+0x7d/0xa0
[   19.509603]  [] warn_slowpath_null+0x1a/0x20
[   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[   19.509617]  [] ? lock_timer_base.isra.35+0x2b/0x50
[   19.509619]  [] ? detach_if_pending+0x120/0x120
[   19.509623]  [] bio_endio+0x1d/0x30
[   19.509632]  [] end_workqueue_fn+0x37/0x40 [btrfs]
[   19.509642]  [] worker_loop+0x14e/0x560 [btrfs]
[   19.509646]  [] ? default_wake_function+0x12/0x20
[   19.509656]  [] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[   19.509672]  [] kthread+0xc0/0xd0
[   19.509677]  [] ? kthread_create_on_node+0x120/0x120
[   19.509680]  [] ret_from_fork+0x7c/0xb0
[   19.509683]  [] ? kthread_create_on_node+0x120/0x120
[   19.509687] ---[ end trace bbc8d0d088375446 ]---
[   25.592100] incomplete page read in btrfs with offset 2560 and length 1536
[   25.592105] [ cut here ]
[   25.592141] WARNING: CPU: 0 PID: 442 at fs/btrfs/extent_io.c:2441 end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[   25.592143] Modules linked in: cdc_acm fuse iTCO_wdt
iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[   25.592205] CPU: 0 PID: 442 Comm: btrfs-endio-met Tainted: P        W  O 3.12.5-1-ARCH #1
[   25.592208] Hardware name: System manufacturer System Product Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[   25.592211]  0009 880229773cb0 814ee37b
[   25.592216]  880229773ce8 81062bcd ea0002a20a80
[   25.592220]  88022d3ab180  88022d326000 880229773cf8
[   25.592225] Call Trace:
[   25.592234]  [] dump_stack+0x54/0x8d
[   25.592240]  [] warn_slowpath_common+0x7d/0xa0
[   25.592245]  [] warn_slowpath_null+0x1a/0x20
[   25.592262]  [] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[   25.592267]  [] ? try_to_del_timer_sync+0x4f/0x70
[   25.592271]  [] ? del_timer_sync+0x52/0x60
[   25.592275]  [] ? detach_if_pending+0x120/0x120
[   25.592280]  [] bio_endio+0x1d/0x30
[   25.592296]  [] end_workqueue_fn+0x37/0x40 [btrfs]
[   25.592312]  [] worker_loop+0x14e/0x560 [btrfs]
[   25.592318]  [] ? default_wake_function+0x12/0x20
[   25.592335]  [] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[   25.592350]  [] kthread+0xc0/0xd0
[   25.592353]  [] ? kthread_create_on_node+0x120/0x120
[   25.592356]  [] ret_from_fork+0x7c/0xb0
[   25.592359]  [] ? kthread_create_on_node+0x120/0x120
[   25.592360] ---[ end trace bbc8d0d088375447 ]---
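Both traces report the same site, `fs/btrfs/extent_io.c:2441` in `end_bio_extent_readpage`, which is where the added `WARN_ON(1)` fires. When many traces are collected, grouping the `WARNING:` header lines by location is a quick way to confirm they all come from one call site. This is a minimal sketch written against the header format shown above, and the names are hypothetical:

```python
import re
from collections import Counter

# Header format seen above, e.g.
#   WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
#   end_bio_extent_readpage+0x788/0xc20 [btrfs]()
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at (?P<file>\S+):(?P<line>\d+)"
    r"\s+(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def warning_sites(log: str) -> Counter:
    """Count WARNINGs per (source file, line, function, offset into function)."""
    return Counter(
        (m["file"], int(m["line"]), m["func"], int(m["off"], 16))
        for m in WARN_RE.finditer(log)
    )
```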

thanks,

Fabio Pfeifer

2013/12/19 Chris Mason :
> On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
>> I've recently set up a system (Kernel 3.12.5-1-ARCH) which is layered as 
>> follows:
>>
>> /dev/sdb3 - cache0 (80 GB Intel SSD)
>> /dev/sdc1 - backing device (2 TB WD HDD)
>>
>> sdb3+sdc1 => /dev/bcache0
>>
>> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
>> as / and /home. What's been bother

Re: btrfs on bcache

2013-12-24 Thread Fábio Pfeifer
(resent in plain text only)
Some more information about this issue.

I installed my system last November (Arch x86_64) with kernel 3.11.
At that time I didn't see any csum errors or "incomplete page read"
errors. Some time later these errors started to show up. I don't know
exactly whether it was in the 3.11 -> 3.12 upgrade or somewhere in the
3.12 cycle. I've been using bcache in writeback mode from the beginning.

I did some more testing:
  - tried bcache in writethrough, writearound and none modes;
  - tried Linux kernel 3.13-rc5.

The errors didn't go away (maybe because my filesystem is already
corrupted). I didn't have time to test with kernel 3.11 again.
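Mode switches like the ones described above are done at runtime by writing the mode name to the device's `cache_mode` attribute in sysfs (as root). The following minimal sketch validates the name before writing. The path assumes the device is `bcache0`, and the helper names are hypothetical:

```python
from pathlib import Path

# The four modes named in this thread.
CACHE_MODES = ("writethrough", "writeback", "writearound", "none")


def mode_request(mode: str) -> str:
    """Validate a bcache cache mode and return the text to write to sysfs."""
    if mode not in CACHE_MODES:
        raise ValueError(f"unknown bcache cache mode {mode!r}; expected one of {CACHE_MODES}")
    return mode + "\n"


def set_cache_mode(mode: str, device: str = "bcache0") -> None:
    """Switch the cache mode of a bcache device (needs root; assumed sysfs path)."""
    attr = Path("/sys/block") / device / "bcache" / "cache_mode"
    attr.write_text(mode_request(mode))
```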

But lately the errors increased, making my system unstable and then
unusable.
I had to reformat everything and restore from my backups.

I don't have my / and /home on btrfs over bcache anymore, but I can
run some tests on a spare HD and SSD I have here. I'll report back
after Christmas.

thanks,

Fabio

2013/12/20 Chris Mason :
> On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
>> [   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 [btrfs]
>
> This should mean that bcache is either failing to read some blocks
> properly or is fiddling with the bv_len/bv_offset fields.
>
> Could someone from bcache comment?
>
> -chris
>


Re: btrfs on bcache

2014-08-04 Thread Fábio Pfeifer
After completely losing my filesystem twice because of this bug, I gave
up using btrfs on top of bcache (also writeback). In my case, I used to
have some subvolumes and some snapshots of these subvolumes, but not many
of them. The btrfs mantra "backup, backup and backup" saved me.
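Backups only help if they are known to be good. The reply quoted below pairs btrfs scrubs with backup verifies using SHA1 hashes. The following minimal sketch covers that second check: it records a hash per file and later reports files whose contents changed or disappeared. It does not rely on btrfs' own checksums. Names are hypothetical, and SHA1 is used only because that is what the reply mentions:

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative path) to its SHA1."""
    return {
        str(p.relative_to(root)): sha1_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose SHA1 no longer matches."""
    return [
        rel
        for rel, digest in manifest.items()
        if not (root / rel).is_file() or sha1_of(root / rel) != digest
    ]


def demo() -> list[str]:
    """Build a manifest, silently alter one file, then verify."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_text("alpha")
        (root / "b.txt").write_text("beta")
        manifest = build_manifest(root)
        (root / "b.txt").write_text("bet4")  # simulate silent corruption
        return verify(root, manifest)


if __name__ == "__main__":
    print(demo())  # ['b.txt']
```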

Best regards,

Fábio Pfeifer

2014-07-30 20:01 GMT-03:00 Larkin Lowrey :
> I've been running two backup servers, with 25T and 20T of data, using
> btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
> scrubs and backup verifies (SHA1 hashes) and have never had a corruption
> issue.
>
> My use of btrfs is simple, though, with no subvolumes and no btrfs level
> raid. My bcache backing devices are LVM volumes that span multiple md
> raid6 arrays. So, either the bug has been fixed or my configuration is
> not susceptible.
>
> I'm running kernel 3.15.5-200.fc20.x86_64.
>
> --Larkin
>
> On 7/30/2014 5:04 PM, dptr...@arcor.de wrote:
>> Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
>> this "bug" still exist?
>>
>> Kernel 3.14
>> B: 2x HDD 1 TB
>> C: 1x SSD 256 GB
>>
>> # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
>> # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1
>>
>> I still have no "incomplete page write" messages in "dmesg | grep btrfs" and 
>> the checksums of some manually reviewed files are okay.
>>
>> Who has more experiences about this?
>>
>> Thanks,
>>
>> - dp