BCache

2016-11-18 Thread Heiri Müller
Hello,


I run a Btrfs RAID of 4 hard drives on top of bcache.
Normal use seems to work well; I have no problems.
I have the Btrfs RAID mounted through "/dev/bcache3".
But when I remove a disk (simulating that it has failed), btrfs doesn't inform me
about a "missing disk":

# btrfs fi show

Label: 'RAID'  uuid: d0e2e2eb-2df7-454f-8446-5213cec2de3c
    Total devices 4 FS bytes used 12.55GiB
    devid    1 size 465.76GiB used 6.00GiB path /dev/bcache3
    devid    2 size 931.51GiB used 6.00GiB path /dev/bcache1
    devid    3 size 596.17GiB used 6.00GiB path /dev/bcache2
    devid    4 size 465.76GiB used 6.00GiB path /dev/bcache0


One of these (namely "/dev/bcache3", i.e. "/dev/sde") should be reported as broken
or missing.
I can't run "btrfs device delete missing" either; it replies "ERROR: not a
btrfs filesystem:". It doesn't matter whether I use "/dev/bcacheX" or "/dev/sdX"
for that. And if I run "btrfs device delete missing /mnt/raid", it replies:
"ERROR: error removing device 'missing': no missing devices found to remove".
It looks to me as if bcache hides this information from btrfs. Is this
possible? What do you think?
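
For reference, one way to cross-check what bcache and btrfs each report after a drive is pulled is sketched below. This is only a sketch: the device names and mount point follow the setup above, the sysfs attributes are the standard bcache ones, and a degraded mount is the usual prerequisite before trying to remove a genuinely missing member.

# which physical device backs each bcache device
ls /sys/block/bcache3/slaves/

# backing-device state as bcache itself sees it (clean, dirty, inconsistent, no cache)
cat /sys/block/bcache3/bcache/state

# per-device I/O error counters as btrfs sees them
btrfs device stats /mnt/raid

# with a member truly gone, remount degraded before attempting the removal
umount /mnt/raid
mount -o degraded /dev/bcache0 /mnt/raid
btrfs device delete missing /mnt/raid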


Thanks



btrfs on bcache

2013-12-18 Thread eb
I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as follows:

/dev/sdb3 - cache0 (80 GB Intel SSD)
/dev/sdc1 - backing device (2 TB WD HDD)

sdb3+sdc1 => /dev/bcache0

On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
as / and /home. What's been bothering me are the following entries in
my kernel log:

[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

The offset/length values are always either 1536/2560 or 3072/1024,
they sum up nicely to 4K. There are 607 of those in there as I am
writing this, the machine has been up 18 hours and been under no
particular I/O strain (it's a desktop).

Trying to fix this, I detached the cache (still using /dev/bcache0,
but without /dev/sdb3 attached), causing these errors to disappear. As
soon as I re-attached /dev/sdb3 they started again, so I am fairly
sure it's an unfavorable interaction between bcache and btrfs.

Is this something I should be worried about (they're only emitted with
KERN_INFO?) or just an alignment problem? The underlying HDD is using
4K-Sectors, while the block_size of bcache seems to be 512, could that
be the issue here?
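
For what it's worth, the sector and block sizes in play can be read directly; a quick sketch using the device names above (blockdev and the queue sysfs attributes are generic block-layer tools; bcache-super-show ships with bcache-tools):

# logical and physical sector size of the backing HDD
blockdev --getss --getpbsz /dev/sdc

# what the composed bcache device advertises to the filesystem above it
cat /sys/block/bcache0/queue/logical_block_size
cat /sys/block/bcache0/queue/physical_block_size

# superblocks of the cache and backing devices, including the configured block size
bcache-super-show /dev/sdb3
bcache-super-show /dev/sdc1

As far as I know, the bcache block size is fixed when the devices are formatted with make-bcache, so changing it would mean re-creating the cache set.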

I've also encountered incomplete reads and a few csum errors, but I
have not been able to trigger these regularly. I have a feeling that
the error is more likely to be on the bcache end (I've mailed to that
list as well), however any insight into the matter would be much
appreciated.

Thanks,

- eb


btrfs on bcache

2014-04-30 Thread Felix Homann
Hi,
a couple of months ago there has been some discussion about issues
when using btrfs on bcache:

http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018

From looking at the mailing list archives I cannot tell whether or not
this issue has been resolved in current kernels from either bcache's
or btrfs' side.

Can anyone tell me what's the current state of this issue? Should it
be safe to use btrfs on bcache by now?

Thanks and kind regards,
Felix


btrfs on bcache

2014-07-30 Thread dptrash
Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
this "bug" still exist?

Kernel 3.14
B: 2x HDD 1 TB
C: 1x SSD 256 GB

# make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
# mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1

So far I have no "incomplete page write" messages in "dmesg | grep btrfs", and
the checksums of some manually reviewed files are okay.

Who has more experience with this?
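
Instead of spot-checking files by hand, a scrub verifies every checksum on the filesystem. A minimal sketch (the mount point is illustrative; -B keeps scrub in the foreground, -d prints per-device statistics):

# verify all data and metadata checksums on the mounted RAID
btrfs scrub start -B -d /mnt/btrfs_raid

# cumulative per-device error counters
btrfs device stats /mnt/btrfs_raid

# any kernel-side complaints from either layer
dmesg | grep -iE 'btrfs|bcache'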

Thanks,

- dp


Give up on bcache?

2017-09-26 Thread Ferry Toth
Looking at the Phoronix benchmark here:

https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
raid&num=2

I think it might be idle hope to think bcache can be used as an SSD cache 
for btrfs to significantly improve performance. True, the benchmark is 
using ext.

But the most important one (where btrfs always shows itself to be a little 
slow) would be the SQLite test. And with ext at least performance _degrades_ 
except for the Writeback mode, and even there it is nowhere near what the 
SSD is capable of.

I think with btrfs it will be even worse and that it is a fundamental 
problem: caching is complex and the cache cannot know how the data on the 
fs is used.

I think the original idea of hot data tracking has a much better chance 
of significantly improving performance. This is of course because the SSDs 
and HDDs would then be equal citizens and btrfs itself gets to decide on 
which drive the data is best stored.

With this implemented right, it would also finally silence the never-ending 
discussion of why not btrfs and why zfs, ext, xfs etc. instead, which would 
be a plus in its own right.



Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Any update on this?

I have here exactly the same issue. Kernel 3.12.5-1-ARCH, backing
device 500 GB IDE, cache 24 GB SSD => /dev/bcache0
On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
messages in dmesg:

(...)
[   22.282469] BTRFS info (device bcache0): csum failed ino 56193 off
212992 csum 519977505 expected csum 3166125439
[   22.282656] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370872] incomplete page read in btrfs with offset 1024 and length 3072
[   23.370890] BTRFS info (device bcache0): csum failed ino 57765 off
106496 csum 3553846164 expected csum 1299185721
[   23.505238] incomplete page read in btrfs with offset 2560 and length 1536
[   23.505256] BTRFS info (device bcache0): csum failed ino 75922 off
172032 csum 1883678196 expected csum 1337496676
[   23.508535] incomplete page read in btrfs with offset 2560 and length 1536
[   23.508547] BTRFS info (device bcache0): csum failed ino 74368 off
237568 csum 2863587994 expected csum 2693116460
[   25.683059] incomplete page read in btrfs with offset 2560 and length 1536
[   25.683078] BTRFS info (device bcache0): csum failed ino 123709 off
57344 csum 1528117893 expected csum 2239543273
[   25.684339] incomplete page read in btrfs with offset 1024 and length 3072
[   26.622384] incomplete page read in btrfs with offset 1024 and length 3072
[   26.906718] incomplete page read in btrfs with offset 2560 and length 1536
[   27.823247] incomplete page read in btrfs with offset 1024 and length 3072
[   27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
[   27.823271] BTRFS info (device bcache0): csum failed ino 34587 off
16384 csum 1180114025 expected csum 474262911
[   28.490066] incomplete page read in btrfs with offset 2560 and length 1536
[   28.490085] BTRFS info (device bcache0): csum failed ino 65817 off
327680 csum 3065880108 expected csum 2663659117
[   29.413824] incomplete page read in btrfs with offset 1024 and length 3072
[   41.913857] incomplete page read in btrfs with offset 2560 and length 1536
[   55.761753] incomplete page read in btrfs with offset 1024 and length 3072
[   55.761771] BTRFS info (device bcache0): csum failed ino 72835 off
81920 csum 1511792656 expected csum 3733709121
[   69.636498] incomplete page read in btrfs with offset 2560 and length 1536
(...)

should I be worried?

thanks,

Fabio Pfeifer

2013/12/18 eb :
> I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as 
> follows:
>
> /dev/sdb3 - cache0 (80 GB Intel SSD)
> /dev/sdc1 - backing device (2 TB WD HDD)
>
> sdb3+sdc1 => /dev/bcache0
>
> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
> as / and /home. What's been bothering me are the following entries in
> my kernel log:
>
> [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
> [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
>
> The offset/length values are always either 1536/2560 or 3072/1024,
> they sum up nicely to 4K. There are 607 of those in there as I am
> writing this, the machine has been up 18 hours and been under no
> particular I/O strain (it's a desktop).
>
> Trying to fix this, I unattached the cache (still using /dev/bcache0,
> but without /dev/sdb3 attached), causing these errors to disappear. As
> soon as I re-attached /dev/sdb3 they started again, so I am fairly
> sure it's an unfavorable interaction between bcache and btrfs.
>
> Is this something I should be worried about (they're only emitted with
> KERN_INFO?) or just an alignment problem? The underlying HDD is using
> 4K-Sectors, while the block_size of bcache seems to be 512, could that
> be the issue here?
>
> I've also encountered incomplete reads and a few csum errors, but I
> have not been able to trigger these regularly. I have a feeling that
> the error is more likely  o be on the bcache end (I've mailed to that
> list as well), however any insight into the matter would be much
> appreciated.
>
> Thanks,
>
> - eb


Re: btrfs on bcache

2013-12-19 Thread Fábio Pfeifer
Forgot to mention: bcache is in writeback mode

2013/12/19 Fábio Pfeifer :
> Any update on this?
>
> I have here exactly the same issue. Kernel 3.12.5-1-ARCH, backing
> device 500 GB IDE, cache 24 GB SSD => /dev/bcache0
> On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
> messages in dmesg:
>
> (...)
> [   22.282469] BTRFS info (device bcache0): csum failed ino 56193 off
> 212992 csum 519977505 expected csum 3166125439
> [   22.282656] incomplete page read in btrfs with offset 1024 and length 3072
> [   23.370872] incomplete page read in btrfs with offset 1024 and length 3072
> [   23.370890] BTRFS info (device bcache0): csum failed ino 57765 off
> 106496 csum 3553846164 expected csum 1299185721
> [   23.505238] incomplete page read in btrfs with offset 2560 and length 1536
> [   23.505256] BTRFS info (device bcache0): csum failed ino 75922 off
> 172032 csum 1883678196 expected csum 1337496676
> [   23.508535] incomplete page read in btrfs with offset 2560 and length 1536
> [   23.508547] BTRFS info (device bcache0): csum failed ino 74368 off
> 237568 csum 2863587994 expected csum 2693116460
> [   25.683059] incomplete page read in btrfs with offset 2560 and length 1536
> [   25.683078] BTRFS info (device bcache0): csum failed ino 123709 off
> 57344 csum 1528117893 expected csum 2239543273
> [   25.684339] incomplete page read in btrfs with offset 1024 and length 3072
> [   26.622384] incomplete page read in btrfs with offset 1024 and length 3072
> [   26.906718] incomplete page read in btrfs with offset 2560 and length 1536
> [   27.823247] incomplete page read in btrfs with offset 1024 and length 3072
> [   27.823265] btrfs_readpage_end_io_hook: 2 callbacks suppressed
> [   27.823271] BTRFS info (device bcache0): csum failed ino 34587 off
> 16384 csum 1180114025 expected csum 474262911
> [   28.490066] incomplete page read in btrfs with offset 2560 and length 1536
> [   28.490085] BTRFS info (device bcache0): csum failed ino 65817 off
> 327680 csum 3065880108 expected csum 2663659117
> [   29.413824] incomplete page read in btrfs with offset 1024 and length 3072
> [   41.913857] incomplete page read in btrfs with offset 2560 and length 1536
> [   55.761753] incomplete page read in btrfs with offset 1024 and length 3072
> [   55.761771] BTRFS info (device bcache0): csum failed ino 72835 off
> 81920 csum 1511792656 expected csum 3733709121
> [   69.636498] incomplete page read in btrfs with offset 2560 and length 1536
> (...)
>
> should I be worried?
>
> thanks,
>
> Fabio Pfeifer
>
> 2013/12/18 eb :
>> I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as 
>> follows:
>>
>> /dev/sdb3 - cache0 (80 GB Intel SSD)
>> /dev/sdc1 - backing device (2 TB WD HDD)
>>
>> sdb3+sdc1 => /dev/bcache0
>>
>> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
>> as / and /home. What's been bothering me are the following entries in
>> my kernel log:
>>
>> [13811.845540] incomplete page write in btrfs with offset 1536 and length 
>> 2560
>> [13870.326639] incomplete page write in btrfs with offset 3072 and length 
>> 1024
>>
>> The offset/length values are always either 1536/2560 or 3072/1024,
>> they sum up nicely to 4K. There are 607 of those in there as I am
>> writing this, the machine has been up 18 hours and been under no
>> particular I/O strain (it's a desktop).
>>
>> Trying to fix this, I unattached the cache (still using /dev/bcache0,
>> but without /dev/sdb3 attached), causing these errors to disappear. As
>> soon as I re-attached /dev/sdb3 they started again, so I am fairly
>> sure it's an unfavorable interaction between bcache and btrfs.
>>
>> Is this something I should be worried about (they're only emitted with
>> KERN_INFO?) or just an alignment problem? The underlying HDD is using
>> 4K-Sectors, while the block_size of bcache seems to be 512, could that
>> be the issue here?
>>
>> I've also encountered incomplete reads and a few csum errors, but I
>> have not been able to trigger these regularly. I have a feeling that
>> the error is more likely  o be on the bcache end (I've mailed to that
>> list as well), however any insight into the matter would be much
>> appreciated.
>>
>> Thanks,
>>
>> - eb


Re: btrfs on bcache

2013-12-19 Thread Chris Mason
On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
> I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as 
> follows:
> 
> /dev/sdb3 - cache0 (80 GB Intel SSD)
> /dev/sdc1 - backing device (2 TB WD HDD)
> 
> sdb3+sdc1 => /dev/bcache0
> 
> On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted
> as / and /home. What's been bothering me are the following entries in
> my kernel log:
> 
> [13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
> [13870.326639] incomplete page write in btrfs with offset 3072 and length 1024
> 
> The offset/length values are always either 1536/2560 or 3072/1024,
> they sum up nicely to 4K. There are 607 of those in there as I am
> writing this, the machine has been up 18 hours and been under no
> particular I/O strain (it's a desktop).

Btrfs shouldn't be setting the offset on the bios.  Are you able to add
a WARN_ON to the message that prints this so we can see the stack trace?

Could you please cc the bcache and btrfs list together?

-chris



Re: btrfs on bcache

2013-12-20 Thread eb
On Thu, Dec 19, 2013 at 8:59 PM, Chris Mason  wrote:
> On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
> Btrfs shouldn't be setting the offset on the bios.  Are you able to add
> a WARN_ON to the message that prints this so we can see the stack trace?

If you send me a patch - my experience on hacking on the kernel is
exactly 0 - I'll try to see if I can compile a custom kernel and get
it running.

> Could you please cc the bcache and btrfs list together?

Done.

I did some more testing - I copied an image of a 128GB drive over the
network (via netcat) onto the bcache/btrfs system and verified the
results twice using sha1sum. They're identical on both the source
system (which is *not* using bcache) and the bcache/btrfs setup. I've
gotten a lot of the incomplete write errors and a few csum errors in
dmesg, but apparently they haven't done any harm?

Not sure how remarkable this is, as these kinds of things are supposed
to bypass the cache anyway, but I assume they still have to go through
the subsystem.
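
For anyone wanting to repeat that kind of end-to-end check, the transfer-and-compare procedure described above looks roughly like this (host name, port and paths are illustrative; the tee/process-substitution form assumes bash, and some netcat variants don't take -p):

# on the receiving bcache/btrfs machine: listen and write the image to btrfs
nc -l -p 9000 > /mnt/btrfs/disk.img

# on the source machine: stream the drive, checksumming what goes over the wire
dd if=/dev/sdX bs=1M | tee >(sha1sum > /tmp/source.sha1) | nc receiver 9000

# afterwards on the receiver: checksum what landed and compare with the source
sha1sum /mnt/btrfs/disk.img
cat /tmp/source.sha1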


Re: btrfs on bcache

2013-12-20 Thread Fábio Pfeifer
Hello,

I put the "WARN_ON(1);" after the printk lines (incomplete page read
and incomplete page write) in extent_io.c.

here some call traces:

[   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
[   19.509500] [ cut here ]
[   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
W  O 3.12.5-1-ARCH #1
[   19.509580] Hardware name: System manufacturer System Product
Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[   19.509581]  0009 880231a63cb0 814ee37b

[   19.509585]  880231a63ce8 81062bcd ea00085eaec0

[   19.509587]  8802320cc9c0  880233b0e000
880231a63cf8
[   19.509590] Call Trace:
[   19.509596]  [] dump_stack+0x54/0x8d
[   19.509601]  [] warn_slowpath_common+0x7d/0xa0
[   19.509603]  [] warn_slowpath_null+0x1a/0x20
[   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[   19.509617]  [] ? lock_timer_base.isra.35+0x2b/0x50
[   19.509619]  [] ? detach_if_pending+0x120/0x120
[   19.509623]  [] bio_endio+0x1d/0x30
[   19.509632]  [] end_workqueue_fn+0x37/0x40 [btrfs]
[   19.509642]  [] worker_loop+0x14e/0x560 [btrfs]
[   19.509646]  [] ? default_wake_function+0x12/0x20
[   19.509656]  [] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[   19.509672]  [] kthread+0xc0/0xd0
[   19.509677]  [] ? kthread_create_on_node+0x120/0x120
[   19.509680]  [] ret_from_fork+0x7c/0xb0
[   19.509683]  [] ? kthread_create_on_node+0x120/0x120
[   19.509687] ---[ end trace bbc8d0d088375446 ]---
[   25.592100] incomplete page read in btrfs with offset 2560 and length 1536
[   25.592105] [ cut here ]
[   25.592141] WARNING: CPU: 0 PID: 442 at fs/btrfs/extent_io.c:2441
end_bio_extent_readpage+0x788/0xc20 [btrfs]()
[   25.592143] Modules linked in: cdc_acm fuse iTCO_wdt
iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
[   25.592205] CPU: 0 PID: 442 Comm: btrfs-endio-met Tainted: P
W  O 3.12.5-1-ARCH #1
[   25.592208] Hardware name: System manufacturer System Product
Name/P5WDG2 WS Pro, BIOS 090503/06/2008
[   25.592211]  0009 880229773cb0 814ee37b

[   25.592216]  880229773ce8 81062bcd ea0002a20a80

[   25.592220]  88022d3ab180  88022d326000
880229773cf8
[   25.592225] Call Trace:
[   25.592234]  [] dump_stack+0x54/0x8d
[   25.592240]  [] warn_slowpath_common+0x7d/0xa0
[   25.592245]  [] warn_slowpath_null+0x1a/0x20
[   25.592262]  [] end_bio_extent_readpage+0x788/0xc20 [btrfs]
[   25.592267]  [] ? try_to_del_timer_sync+0x4f/0x70
[   25.592271]  [] ? del_timer_sync+0x52/0x60
[   25.592275]  [] ? detach_if_pending+0x120/0x120
[   25.592280]  [] bio_endio+0x1d/0x30
[   25.592296]  [] end_workqueue_fn+0x37/0x40 [btrfs]
[   25.592312]  [] worker_loop+0x14e/0x560 [btrfs]
[   25.592318]  [] ? default_wake_function+0x12/0x20
[   25.592335]  [] ? btrfs_queue_worker+0x330/0x330 [btrfs]
[   25.592350]  [] kthread+0xc0/0xd0
[   25.592353]  [] ? kthread_create_on_node+0x120/0x120
[   25.592356]  [] ret_from_fork+0x7c/0xb0
[   25.592359]  [] ? kthread_create_on_node+0x120/0x120
[   25.592360] ---[ end trace bbc8d0d088375447 ]---

thanks,

Fabio Pfeifer

2013/12/19 Chris Mason :
> On Wed, 2013-12-18 at 18:17 +0100, eb wrote:
>> I've recently setup a system (Kernel 3.12.5-1-ARCH) which is layered as 
>> follows:
>>
>> /dev/sdb3 - cache0 (80 GB Intel SSD)
>> /dev/sdc1 - backing device (2 TB WD HDD)
>>
>> sdb3+sdc1 => /dev/bcache0
>>
>> On /dev/bcache0, there

Re: btrfs on bcache

2013-12-20 Thread Chris Mason
On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
> Hello,
> 
> I put the "WARN_ON(1);" after the printk lines (incomplete page read
> and incomplete page write) in extent_io.c.
> 
> here some call traces:
> 
> [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
> [   19.509500] [ cut here ]
> [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
> end_bio_extent_readpage+0x788/0xc20 [btrfs]()
> [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
> iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
> ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
> evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
> i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
> snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
> processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
> usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
> ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
> scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
> [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
> W  O 3.12.5-1-ARCH #1
> [   19.509580] Hardware name: System manufacturer System Product
> Name/P5WDG2 WS Pro, BIOS 090503/06/2008
> [   19.509581]  0009 880231a63cb0 814ee37b
> 
> [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
> 
> [   19.509587]  8802320cc9c0  880233b0e000
> 880231a63cf8
> [   19.509590] Call Trace:
> [   19.509596]  [] dump_stack+0x54/0x8d
> [   19.509601]  [] warn_slowpath_common+0x7d/0xa0
> [   19.509603]  [] warn_slowpath_null+0x1a/0x20
> [   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 
> [btrfs]

This should mean that bcache is either failing to read some blocks
properly or is fiddling with the bv_len/bv_offset fields.

Could someone from bcache comment?

-chris



Re: btrfs on bcache

2013-12-20 Thread Henry de Valence
On Thu, Dec 19, 2013 at 2:04 PM, Fábio Pfeifer  wrote:
> Any update on this?
>
> I have here exactly the same issue. Kernel 3.12.5-1-ARCH, backing
> device 500 GB IDE, cache 24 GB SSD => /dev/bcache0
> On /dev/bcache I also have 2 subvolumes, / and /home. I get lots of
> messages in dmesg:

I also have this issue.

Also, this afternoon I experienced data corruption on my btrfs device
(checksum errors), which might or might not be related. I don't really
know how to determine the cause, but if anyone has suggestions they'd
be appreciated.

Cheers,
Henry de Valence


Re: btrfs on bcache

2013-12-24 Thread Fábio Pfeifer
(resend in text only)
Some more information about this issue.

I installed my system last November (Arch x86_64) with kernel 3.11.
At that time I didn't see any csum errors or
"incomplete page read" errors. Some time later these errors started to
show up. I don't know exactly whether it was in
the 3.11 -> 3.12 upgrade or somewhere in the 3.12 cycle. I've been using
bcache in writeback mode from the beginning.

I did some more testing:
  - tried bcache in writethrough, writearound and none modes;
  - tried Linux kernel 3.13-rc5

The errors didn't go away (maybe because my filesystem is already
corrupted). I didn't have time to test with kernel 3.11 again.
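
For reference, the cache mode can be changed at runtime through sysfs, so testing writethrough, writearound or none does not require re-creating anything; a small sketch (bcache0 is illustrative):

# current mode is shown in brackets
cat /sys/block/bcache0/bcache/cache_mode

# switch modes on the fly
echo writethrough > /sys/block/bcache0/bcache/cache_mode

# or detach the cache completely, leaving the backing device usable on its own
echo 1 > /sys/block/bcache0/bcache/detach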

But lately the errors increased, and it started to make my system
unstable, and then unusable.
I had to reformat everything and recover my backups.

I don't have my / and /home in btrfs over bcache anymore, but I can
make some tests in a spare HD and SSD i have here. I'll report back
after Christmas.

thanks,

Fabio

2013/12/20 Chris Mason :
> On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
>> Hello,
>>
>> I put the "WARN_ON(1);" after the printk lines (incomplete page read
>> and incomplete page write) in extent_io.c.
>>
>> here some call traces:
>>
>> [   19.509497] incomplete page read in btrfs with offset 2560 and length 1536
>> [   19.509500] [ cut here ]
>> [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
>> end_bio_extent_readpage+0x788/0xc20 [btrfs]()
>> [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
>> iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
>> ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
>> evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
>> i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
>> snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
>> processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
>> usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
>> ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
>> scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
>> [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
>> W  O 3.12.5-1-ARCH #1
>> [   19.509580] Hardware name: System manufacturer System Product
>> Name/P5WDG2 WS Pro, BIOS 090503/06/2008
>> [   19.509581]  0009 880231a63cb0 814ee37b
>> 
>> [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
>> 
>> [   19.509587]  8802320cc9c0  880233b0e000
>> 880231a63cf8
>> [   19.509590] Call Trace:
>> [   19.509596]  [] dump_stack+0x54/0x8d
>> [   19.509601]  [] warn_slowpath_common+0x7d/0xa0
>> [   19.509603]  [] warn_slowpath_null+0x1a/0x20
>> [   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 
>> [btrfs]
>
> This should mean that bcache is either failing to read some blocks
> properly or is fiddling with the bv_len/bv_offset fields.
>
> Could someone from bcache comment?
>
> -chris
>


Re: btrfs on bcache

2014-01-06 Thread Kent Overstreet
On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote:
> On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
> > Hello,
> > 
> > I put the "WARN_ON(1);" after the printk lines (incomplete page read
> > and incomplete page write) in extent_io.c.
> > 
> > here some call traces:
> > 
> > [   19.509497] incomplete page read in btrfs with offset 2560 and length 
> > 1536
> > [   19.509500] [ cut here ]
> > [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
> > end_bio_extent_readpage+0x788/0xc20 [btrfs]()
> > [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
> > iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
> > ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
> > evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
> > i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
> > snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
> > processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
> > usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
> > ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
> > scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
> > [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
> > W  O 3.12.5-1-ARCH #1
> > [   19.509580] Hardware name: System manufacturer System Product
> > Name/P5WDG2 WS Pro, BIOS 090503/06/2008
> > [   19.509581]  0009 880231a63cb0 814ee37b
> > 
> > [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
> > 
> > [   19.509587]  8802320cc9c0  880233b0e000
> > 880231a63cf8
> > [   19.509590] Call Trace:
> > [   19.509596]  [] dump_stack+0x54/0x8d
> > [   19.509601]  [] warn_slowpath_common+0x7d/0xa0
> > [   19.509603]  [] warn_slowpath_null+0x1a/0x20
> > [   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 
> > [btrfs]
> 
> This should mean that bcache is either failing to read some blocks
> properly or is fiddling with the bv_len/bv_offset fields.
> 
> Could someone from bcache comment?

Oh man, I found this and then threw up my hands in despair.

Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the
biovec so it can retry a bio on error, if the biovecs weren't all whole pages,
otherwise it just passes the biovec down with the next bio to the underlying
cache/backing device.

What btrfs appears to be doing though - I couldn't believe that code actually
_worked_, Jens please jump in here but AFAIK bv_len/bv_offset are in practice
undefined after a bio's completed, they might have been updated if the driver
was using blk_update_request but for many drivers that just process the entire
bio all at once they just won't touch those fields - and that includes anything
that clones the bio (md/dm).

This is probably relevant to immutable biovecs here...

-

Ok, I looked again at the relevant btrfs code, I guess I can see how this printk
isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check
for here? And why is it using bv_offset and bv_len further down in
end_bio_extent_readpage()?


Re: btrfs on bcache

2014-01-08 Thread Chris Mason
On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote:
> On Fri, Dec 20, 2013 at 03:46:30PM +, Chris Mason wrote:
> > On Fri, 2013-12-20 at 10:42 -0200, Fábio Pfeifer wrote:
> > > Hello,
> > > 
> > > I put the "WARN_ON(1);" after the printk lines (incomplete page read
> > > and incomplete page write) in extent_io.c.
> > > 
> > > here some call traces:
> > > 
> > > [   19.509497] incomplete page read in btrfs with offset 2560 and length 
> > > 1536
> > > [   19.509500] [ cut here ]
> > > [   19.509528] WARNING: CPU: 2 PID: 220 at fs/btrfs/extent_io.c:2441
> > > end_bio_extent_readpage+0x788/0xc20 [btrfs]()
> > > [   19.509530] Modules linked in: cdc_acm fuse iTCO_wdt
> > > iTCO_vendor_support snd_hda_codec_analog coretemp kvm_intel kvm raid1
> > > ext4 crc16 md_mod mbcache jbd2 microcode nvidia(PO) psmouse pcspkr
> > > evdev serio_raw i2c_i801 lpc_ich i2c_core snd_hda_intel sky2 skge
> > > i82975x_edac button asus_atk0110 snd_hda_codec snd_hwdep shpchp
> > > snd_pcm snd_page_alloc snd_timer acpi_cpufreq snd edac_core soundcore
> > > processor vboxdrv(O) sr_mod cdrom ata_generic pata_acpi hid_generic
> > > usbhid hid usb_storage sd_mod pata_marvell firewire_ohci uhci_hcd ahci
> > > ehci_pci firewire_core ata_piix libahci crc_itu_t ehci_hcd libata
> > > scsi_mod usbcore usb_common btrfs crc32c libcrc32c xor raid6_pq bcache
> > > [   19.509578] CPU: 2 PID: 220 Comm: btrfs-endio-met Tainted: P
> > > W  O 3.12.5-1-ARCH #1
> > > [   19.509580] Hardware name: System manufacturer System Product
> > > Name/P5WDG2 WS Pro, BIOS 090503/06/2008
> > > [   19.509581]  0009 880231a63cb0 814ee37b
> > > 
> > > [   19.509585]  880231a63ce8 81062bcd ea00085eaec0
> > > 
> > > [   19.509587]  8802320cc9c0  880233b0e000
> > > 880231a63cf8
> > > [   19.509590] Call Trace:
> > > [   19.509596]  [] dump_stack+0x54/0x8d
> > > [   19.509601]  [] warn_slowpath_common+0x7d/0xa0
> > > [   19.509603]  [] warn_slowpath_null+0x1a/0x20
> > > [   19.509614]  [] end_bio_extent_readpage+0x788/0xc20 
> > > [btrfs]
> > 
> > This should mean that bcache is either failing to read some blocks
> > properly or is fiddling with the bv_len/bv_offset fields.
> > 
> > Could someone from bcache comment?
> 
> Oh man, I found this and then threw up my hands in despair.
> 
> Bcache isn't doing anything with the bv_len/bv_offset fields; it may clone the
> biovec so it can retry a bio on error, if the biovecs weren't all whole pages,
> otherwise it just passes the biovec down with the next bio to the underlying
> cache/backing device.
> 
> What btrfs appears to be doing though - I couldn't believe that code actually
> _worked_, Jens please jump in here but AFAIK bv_len/bv_offset are in practice
> undefined after a bio's completed, they might have been updated if the driver
> was using blk_update_request but for many drivers that just process the entire
> bio all at once they just won't touch those fields - and that includes 
> anything
> that clones the bio (md/dm).
> 
> This is probably relevant to immutable biovecs here...
> 
> -
> 
> Ok, I looked again at the relevant btrfs code, I guess I can see how this 
> printk
> isn't normally triggered. But Chris, _what on earth_ is btrfs trying to check
> for here? And why is it using bv_offset and bv_len further down in
> end_bio_extent_readpage()?

After the IO is done, we're recording the specific logical byte range
that covered the IO.  In practice it's always the full page, so we can
switch to just trusting PAGE_CACHE_SIZE.

-chris



Re: btrfs on bcache

2014-01-08 Thread Kent Overstreet
On Wed, Jan 08, 2014 at 07:35:32PM +, Chris Mason wrote:
> On Mon, 2014-01-06 at 15:37 -0800, Kent Overstreet wrote:
> > Ok, I looked again at the relevant btrfs code, I guess I can see how this 
> > printk
> > isn't normally triggered. But Chris, _what on earth_ is btrfs trying to 
> > check
> > for here? And why is it using bv_offset and bv_len further down in
> > end_bio_extent_readpage()?
> 
> After the IO is done, we're recording the specific logical byte range
> that covered the IO.  In practice its always the full page, we can
> switch to just trusting PAGE_CACHE_SIZE.

Yeah, the code already assumes it was doing PAGE_CACHE_SIZE reads; what
you're effectively checking is that the driver did the bvec all at once,
and that it didn't process half a bvec, update it, then process the rest
- which is a completely fine thing to do.

So for now - yeah, the correct thing to do is to just ignore
bv_offset/bv_len and go by PAGE_CACHE_SIZE. But - after immutable
biovecs is in, _then_ you'll be able to depend on bv_offset/bv_len
remaining unchanged (and you can get rid of your dependency on
PAGE_CACHE_SIZE bvecs).


Re: btrfs on bcache

2014-05-01 Thread Austin S Hemmelgarn
On 2014-04-30 14:16, Felix Homann wrote:
> Hi,
> a couple of months ago there has been some discussion about issues
> when using btrfs on bcache:
> 
> http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018
> 
> From looking at the mailing list archives I cannot tell whether or not
> this issue has been resolved in current kernels from either bcache's
> or btrfs' side.
> 
> Can anyone tell me what's the current state of this issue? Should it
> be safe to use btrfs on bcache by now?

In all practicality, I don't think anyone who frequents the list knows.
 I do know that there are a number of people (myself included) who avoid
bcache in general because of having issues with seemingly random kernel
OOPSes when it is linked in (either as a module or compiled in), even
when it isn't being used.  My advice would be to just test it with some
non-essential data (maybe set up a virtual machine?).
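
A throwaway test along those lines doesn't even need a virtual machine; loop devices are enough. A minimal sketch, assuming bcache-tools and btrfs-progs are installed (sizes and paths are arbitrary, and /dev/bcache0 assumes this is the first bcache device on the system):

# create two sparse files and expose them as block devices
truncate -s 8G /tmp/backing.img
truncate -s 1G /tmp/cache.img
BACKING=$(losetup -f --show /tmp/backing.img)
CACHE=$(losetup -f --show /tmp/cache.img)

# format backing and cache devices and attach them in one go
make-bcache -B "$BACKING" -C "$CACHE"

# if udev doesn't register loop devices automatically, do it by hand
echo "$BACKING" > /sys/fs/bcache/register || true
echo "$CACHE" > /sys/fs/bcache/register || true

# put a scratch btrfs on top and mount it
mkfs.btrfs /dev/bcache0
mkdir -p /mnt/bcache-test
mount /dev/bcache0 /mnt/bcache-test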


Re: btrfs on bcache

2014-07-30 Thread Larkin Lowrey
I've been running two backup servers, with 25T and 20T of data, using
btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
scrubs and backup verifies (SHA1 hashes) and have never had a corruption
issue.

My use of btrfs is simple, though, with no subvolumes and no btrfs level
raid. My bcache backing devices are LVM volumes that span multiple md
raid6 arrays. So, either the bug has been fixed or my configuration is
not susceptible.

I'm running kernel 3.15.5-200.fc20.x86_64.

--Larkin

On 7/30/2014 5:04 PM, dptr...@arcor.de wrote:
> Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
> this "bug" still exists?
>
> Kernel 3.14
> B: 2x HDD 1 TB
> C: 1x SSD 256 GB
>
> # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
> # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1
>
> I still have no "incomplete page write" messages in "dmesg | grep btrfs" and 
> the checksums of some manually reviewed files are okay.
>
> Who has more experiences about this?
>
> Thanks,
>
> - dp


Re: btrfs on bcache

2014-07-31 Thread Duncan
dptrash posted on Thu, 31 Jul 2014 17:35:44 +0200 as excerpted:

> Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018,
> does this "bug" still exists?
> 
> Kernel 3.14 B: 2x HDD 1 TB C: 1x SSD 256 GB
> 
> # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
> --cache_replacement_policy=lru
> # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1
> 
> I still have no "incomplete page write" messages in "dmesg | grep btrfs"
> and the checksums of some manually reviewed files are okay.
> 
> Who has more experiences about this?

See the reply (not mine) to your earlier post of the question:

http://permalink.gmane.org/gmane.linux.kernel.bcache.devel/2602

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs on bcache

2014-08-04 Thread Fábio Pfeifer
After completely losing my filesystem twice because of this bug, I gave
up using btrfs on top of bcache (also writeback). In my case, I used to
have some subvolumes and some snapshots of these subvolumes, but not many
of them. The btrfs mantra "backup, backup and backup" saved me.

Best regards,

Fábio Pfeifer

2014-07-30 20:01 GMT-03:00 Larkin Lowrey :
> I've been running two backup servers, with 25T and 20T of data, using
> btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
> scrubs and backup verifies (SHA1 hashes) and have never had a corruption
> issue.
>
> My use of btrfs is simple, though, with no subvolumes and no btrfs level
> raid. My bcache backing devices are LVM volumes that span multiple md
> raid6 arrays. So, either the bug has been fixed or my configuration is
> not susceptible.
>
> I'm running kernel 3.15.5-200.fc20.x86_64.
>
> --Larkin
>
> On 7/30/2014 5:04 PM, dptr...@arcor.de wrote:
>> Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
>> this "bug" still exists?
>>
>> Kernel 3.14
>> B: 2x HDD 1 TB
>> C: 1x SSD 256 GB
>>
>> # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
>> # mkfs.btrfs -d raid1 -m raid1 -L "BTRFS_RAID" /dev/bcache0 /dev/bcache1
>>
>> I still have no "incomplete page write" messages in "dmesg | grep btrfs" and 
>> the checksums of some manually reviewed files are okay.
>>
>> Who has more experiences about this?
>>
>> Thanks,
>>
>> - dp


Re: btrfs on bcache

2014-08-20 Thread raphead

Hi,
has this issue been resolved?
I would like to use the bcache + btrfs combo.
Thanks


Re: Give up on bcache?

2017-09-26 Thread Roman Mamedov
On Tue, 26 Sep 2017 16:50:00 + (UTC)
Ferry Toth  wrote:

> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
> raid&num=2
> 
> I think it might be idle hopes to think bcache can be used as a ssd cache 
> for btrfs to significantly improve performance..

My personal real-world experience shows that SSD caching -- with lvmcache --
does indeed significantly improve performance of a large Btrfs filesystem with
slowish base storage.
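
For comparison, attaching an lvmcache cache to an existing slow LV is roughly the following (VG/LV names, the SSD partition and sizes are illustrative; lvmcache(7) has the details):

# add the SSD to the volume group that holds the slow LV
vgextend vg0 /dev/nvme0n1p2

# create a cache pool on the SSD and attach it to the slow LV
lvcreate --type cache-pool -L 100G -n fastpool vg0 /dev/nvme0n1p2
lvconvert --type cache --cachepool vg0/fastpool vg0/slowdata

# later, to detach the cache again (dirty blocks are flushed first)
lvconvert --splitcache vg0/slowdata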

And that article, sadly, only demonstrates once again the generally mediocre
quality of Phoronix content: it is an astonishing oversight not to check out
lvmcache in the same setup, to at least try to draw some useful conclusion as
to whether it is bcache that is strangely deficient, or SSD caching as a
general concept that does not work well in the hardware setup utilized.

-- 
With respect,
Roman


Re: Give up on bcache?

2017-09-26 Thread Kai Krakow
On Tue, 26 Sep 2017 23:33:19 +0500, Roman Mamedov wrote:

> On Tue, 26 Sep 2017 16:50:00 + (UTC)
> Ferry Toth  wrote:
> 
> > https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
> > raid&num=2
> > 
> > I think it might be idle hopes to think bcache can be used as a ssd
> > cache for btrfs to significantly improve performance..  
> 
> My personal real-world experience shows that SSD caching -- with
> lvmcache -- does indeed significantly improve performance of a large
> Btrfs filesystem with slowish base storage.
> 
> And that article, sadly, only demonstrates once again the general
> mediocre quality of Phoronix content: it is an astonishing oversight
> to not check out lvmcache in the same setup, to at least try to draw
> some useful conclusion, is it Bcache that is strangely deficient, or
> SSD caching as a general concept does not work well in the hardware
> setup utilized.

Bcache is actually not meant to increase benchmark performance except
for very few corner cases. It is designed to improve interactivity and
perceived performance, reducing head movements. On the bcache homepage
there are actually tips on how to benchmark bcache correctly, including
a warm-up phase and turning on sequential caching. Phoronix doesn't do
that; they test default settings, which is IMHO a good thing, but you
should know the consequences and research how to turn the knobs.

Depending on the caching mode and cache size, the SQLite test may not
show real-world numbers. Also, you should optimize some btrfs options
to work correctly with bcache, e.g. force it to mount "nossd" as it
detects the bcache device as an SSD - which is wrong for some workloads, I
think especially desktop workloads and most server workloads.
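
Concretely, those knobs look something like this (bcache0 and the mount point are illustrative; sequential_cutoff, cache_mode and the stats directories are standard bcache sysfs attributes):

# let sequential I/O into the cache during warm-up (0 disables the sequential bypass)
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

# pick the caching mode explicitly
echo writeback > /sys/block/bcache0/bcache/cache_mode

# keep btrfs from applying its SSD heuristics to the cached device
mount -o nossd /dev/bcache0 /mnt/data

# see how well the cache is doing
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio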

Also, you may want to tune udev to correct some attributes so other
applications can do their detection and behavior correctly, too:

$ cat /etc/udev/rules.d/00-ssd-scheduler.rules
ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/iosched/slice_idle}="0"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="kyber"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

Take note: on a non-mq system you may want to use noop/deadline/cfq
instead of kyber/bfq.


I've been running bcache for over two years now and the performance
improvement is very high, with boot times going down to 30-40s from
3+ minutes previously, faster app startup times (almost instant, like
on an SSD), reduced noise thanks to fewer head movements, etc. Also, it is
easy to set up (no split metadata/data cache, and you can attach more than
one device to a single cache), and it has been rock solid even when the
system crashes.

Bcache learns by using LRU for caching: what you don't need will be
pushed out of the cache over time; what you use stays. This is actually a
lot like "hot data caching". Given a big enough cache, everything you
need daily stays in cache, easily achieving hit ratios
around 90%. Since sequential access is bypassed, you don't have to
worry about large copy operations flushing the cache.

My system uses a 512G SSD with 400G dedicated to bcache, attached to 3x
1TB HDD draid0 mraid1 btrfs, filled with 2TB of net data and daily
backups using borgbackup. Bcache runs in writeback mode, the backup
takes around 15 minutes each night to dig through all data and stores
it to an internal intermediate backup also on bcache (xfs, write-around
mode). Currently not implemented, this intermediate backup will later
be mirrored to external, off-site location.

Some of the rest of the SSD is EFI-ESP, some swap space, and
over-provisioned area to keep bcache performance high.

$ uptime && bcache-status
 21:28:44 up 3 days, 20:38,  3 users,  load average: 1,18, 1,44, 2,14
--- bcache ---
UUIDaacfbcd9-dae5-4377-92d1-6808831a4885
Block Size  4.00 KiB
Bucket Size 512.00 KiB
Congested?  False
Read Congestion 2.0ms
Write Congestion20.0ms
Total Cache Size400 GiB
Total Cache Used400 GiB (100%)
Total Cache Unused  0 B (0%)
Evictable Cache 396 GiB (99%)
Replacement Policy  [lru] fifo random
Cache Mode  (Various)
Total Hits  2364518 (89%)
Total Misses290764
Total Bypass Hits   4284468 (100%)
Total Bypass Misses 0
Total Bypassed  215 GiB


The bucket size and block size were chosen to best fit the Samsung TLC
arrangement. But this is pure theory; I never benchmarked it.

Re: Give up on bcache?

2017-09-26 Thread Austin S. Hemmelgarn

On 2017-09-26 12:50, Ferry Toth wrote:

> Looking at the Phoronix benchmark here:
>
> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
> raid&num=2
>
> I think it might be idle hopes to think bcache can be used as a ssd cache
> for btrfs to significantly improve performance.. True, the benchmark is
> using ext.

It's a benchmark.  They're inherently synthetic and workload specific, 
and therefore should not be trusted to represent things accurately for 
arbitrary use cases.

> But the most important one (where btrfs always shows to be a little slow)
> would be the SQLLite test. And with ext at least performance _degrades_
> except for the Writeback mode, and even there is nowhere near what the
> SSD is capable of.

And what makes you think it will be?  You're using it as a hot-data 
cache, not a dedicated write-back cache, and you have the overhead from 
bcache itself too.  Just some simple math based on examining the bcache 
code suggests you can't get better than about 98% of the SSD's 
performance if you're lucky, and I'd guess it's more like 80% most of 
the time.

> I think with btrfs it will be even worse and that it is a fundamental
> problem: caching is complex and the cache can not how how the data on the
> fs is used.

Actually, the improvement from using bcache with BTRFS is higher 
proportionate to the baseline of not using it by a small margin than it 
is when used with ext4.  BTRFS does a lot more with the disk, so you 
have a lot more time spent accessing the disk, and thus more time that 
can be reduced by improving disk performance.  While the CoW nature of 
BTRFS does somewhat mitigate the performance improvement from using 
bcache, it does not completely negate it.

> I think the original idea of hot data tracking has a much better chance
> to significantly improve performance. This of course as the SSD's and
> HDD's then will be equal citizens and btrfs itself gets to decide on
> which drive the data is best stored.

First, the user needs to decide, not BTRFS (at least, by default, BTRFS 
should not be involved in the decision).  Second, tiered storage (that's 
what that's properly called) is mostly orthogonal to caching (though 
bcache and dm-cache behave like tiered storage once the cache is warmed).

> With this implemented right, it would also finally silence the never
> ending discussion why not btrfs and why zfs, ext, xfs etc. Which would be
> a plus by its own right.

Even with this, there would still be plenty of reasons to pick one of 
those filesystems over BTRFS.  There would however be one more reason to 
pick BTRFS over ext or XFS (but necessarily not ZFS, it already has 
caching built in).




Re: Give up on bcache?

2017-09-26 Thread Adam Borowski
On Tue, Sep 26, 2017 at 11:33:19PM +0500, Roman Mamedov wrote:
> On Tue, 26 Sep 2017 16:50:00 + (UTC)
> Ferry Toth  wrote:
> 
> > https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
> > raid&num=2
> > 
> > I think it might be idle hopes to think bcache can be used as a ssd cache 
> > for btrfs to significantly improve performance..
> 
> My personal real-world experience shows that SSD caching -- with lvmcache --
> does indeed significantly improve performance of a large Btrfs filesystem with
> slowish base storage.
> 
> And that article, sadly, only demonstrates once again the general mediocre
> quality of Phoronix content: it is an astonishing oversight to not check out
> lvmcache in the same setup, to at least try to draw some useful conclusion, is
> it Bcache that is strangely deficient, or SSD caching as a general concept
> does not work well in the hardware setup utilized.

Also, it looks as if Phoronix' tests don't stress metadata at all.  Btrfs is
all about metadata, speeding it up greatly helps most workloads.

A pipe-dream wishlist would be:
* store and access master copy of metadata on SSD only
* pin all data blocks referenced by generations not yet mirrored
* slowly copy over metadata to HDD

-- 
⢀⣴⠾⠻⢶⣦⠀ We domesticated dogs 36000 years ago; together we chased
⣾⠁⢰⠒⠀⣿⡁ animals, hung out and licked or scratched our private parts.
⢿⡄⠘⠷⠚⠋⠀ Cats domesticated us 9500 years ago, and immediately we got
⠈⠳⣄ agriculture, towns then cities. -- whitroth on /.


Re: Give up on bcache?

2017-09-26 Thread Ferry Toth
On Tue, 26 Sep 2017 15:52:44 -0400, Austin S. Hemmelgarn wrote:

> On 2017-09-26 12:50, Ferry Toth wrote:
>> Looking at the Phoronix benchmark here:
>> 
>> https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
>> raid&num=2
>> 
>> I think it might be idle hopes to think bcache can be used as a ssd
>> cache for btrfs to significantly improve performance.. True, the
>> benchmark is using ext.
> It's a benchmark.  They're inherently synthetic and workload specific,
> and therefore should not be trusted to represent things accurately for
> arbitrary use cases.

So what. A decent benchmark tries to measure a specific aspect of the fs.

I think you agree that applications doing lots of fsyncs (databases, 
dpkg) are slow on btrfs especially on hdd's, whatever way you measure 
that (it feels slow, it measures slow, it really is slow).

On a ssd the problem is less.

So if you can fix that by using a ssd cache or a hybrid solution, how 
would you like to compare that? It _feels_ faster?

>> But the most important one (where btrfs always shows to be a little
>> slow)
>> would be the SQLLite test. And with ext at least performance _degrades_
>> except for the Writeback mode, and even there is nowhere near what the
>> SSD is capable of.
> And what makes you think it will be?  You're using it as a hot-data
> cache, not a dedicated write-back cache, and you have the overhead from
> bcache itself too.  Just some simple math based on examining the bcache
> code suggests you can't get better than about 98% of the SSD's
> performance if you're lucky, and I'd guess it's more like 80% most of
> the time.
>> 
>> I think with btrfs it will be even worse and that it is a fundamental
>> problem: caching is complex and the cache can not how how the data on
>> the fs is used.
> Actually, the improvement from using bcache with BTRFS is higher
> proportionate to the baseline of not using it by a small margin than it
> is when used with ext4.  BTRFS does a lot more with the disk, so you
> have a lot more time spent accessing the disk, and thus more time that
> can be reduced by improving disk performance.  While the CoW nature of
> BTRFS does somewhat mitigate the performance improvement from using
> bcache, it does not completely negate it.

I would like to reverse this: taking btrfs on an SSD as the baseline, how 
much degradation do you suffer with btrfs on a mixed SSD/HDD system?

IMHO you are hoping to get SSD performance at HDD cost.

>> I think the original idea of hot data tracking has a much better chance
>> to significantly improve performance. This of course as the SSD's and
>> HDD's then will be equal citizens and btrfs itself gets to decide on
>> which drive the data is best stored.
> First, the user needs to decide, not BTRFS (at least, by default, BTRFS
> should not be involved in the decision).  Second, tiered storage (that's
> what that's properly called) is mostly orthogonal to caching (though
> bcache and dm-cache behave like tiered storage once the cache is
> warmed).

So, on your desktop you really are going to search for all sqlite, mysql 
and psql files, dpkg files etc. and move them to the ssd? You can already 
do that. Go ahead! 

The big win would be if the file system does that automatically for you.

>> With this implemented right, it would also finally silence the never
>> ending discussion why not btrfs and why zfs, ext, xfs etc. Which would
>> be a plus by its own right.
> Even with this, there would still be plenty of reasons to pick one of
> those filesystems over BTRFS.  There would however be one more reason to
> pick BTRFS over ext or XFS (but not necessarily ZFS, which already has
> caching built in).

Exactly, one more advantage of btrfs and one less of zfs.



Re: Give up on bcache?

2017-09-27 Thread Austin S. Hemmelgarn

On 2017-09-26 18:46, Ferry Toth wrote:

Op Tue, 26 Sep 2017 15:52:44 -0400, schreef Austin S. Hemmelgarn:


On 2017-09-26 12:50, Ferry Toth wrote:

Looking at the Phoronix benchmark here:

https://www.phoronix.com/scan.php?page=article&item=linux414-bcache-
raid&num=2

I think it might be idle hopes to think bcache can be used as a ssd
cache for btrfs to significantly improve performance.. True, the
benchmark is using ext.

It's a benchmark.  They're inherently synthetic and workload specific,
and therefore should not be trusted to represent things accurately for
arbitrary use cases.


So what. A decent benchmark tries to measure a specific aspect of the fs.
Yes, and it usually measures it using a ridiculously unrealistic 
workload.  Some of the benchmarks in iozone are a good example of this, 
like the backwards read one (there is nearly nothing that it provides 
any useful data for).  For a benchmark to be meaningful, you have to 
test what you actually intend to use, and from a practical perspective, 
that article is primarily testing throughput, which is not something you 
should be using SSD caching for.


I think you agree that applications doing lots of fsyncs (databases,
dpkg) are slow on btrfs especially on hdd's, whatever way you measure
that (it feels slow, it measures slow, it really is slow).
Yes, but they're also slow on _everything_.  fsync() is slow.  Period. 
It's just more of an issue on BTRFS because it's a CoW filesystem _and_ 
it's slower than ext4 even with that CoW layer bypassed.
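
To put a rough number on that claim, a minimal sketch (not from this thread): an fsync-per-write fio job, run once on the HDD-backed btrfs and once on an SSD, makes the gap obvious. The directory and sizes below are made-up examples.

    # assumes fio is installed; /mnt/test is a placeholder for the fs under test
    # --fsync=1 issues fsync() after every 4k write, mimicking dpkg or a database
    fio --name=fsync-test --directory=/mnt/test --ioengine=sync \
        --rw=write --bs=4k --size=256m --fsync=1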


On a ssd the problem is less.
And most of that is a result of the significantly higher bulk throughput 
on the SSD, which is not something that SSD caching replicates.


So if you can fix that by using a ssd cache or a hybrid solution, how
would you like to compare that? It _feels_ faster?
That depends.  If it's on a desktop, then that actually is one of the 
best ways to test it, since user perception is your primary quality 
metric (you can make the fastest system in the world, but if the user 
can't tell, you've gained nothing).  If you're on anything else, you 
test the actual workload if possible, and a benchmark that tries to 
replicate the workload if not.  Put another way, if you're building a 
PGSQL server, you should be benchmarking things with a PGSQL 
benchmarking tool, not some arbitrary benchmark that likely won't replicate a 
PGSQL workload.
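
For example, a minimal sketch assuming PostgreSQL and its bundled pgbench are installed (database name, scale factor and run length are arbitrary):

    # create and populate a test database at scale factor 50,
    # then run a 60 second mixed read/write load with 10 clients
    createdb pgbench_test
    pgbench -i -s 50 pgbench_test
    pgbench -c 10 -j 2 -T 60 pgbench_test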



But the most important one (where btrfs always shows to be a little
slow)
would be the SQLite test. And with ext at least performance _degrades_
except for the Writeback mode, and even there it is nowhere near what the
SSD is capable of.

And what makes you think it will be?  You're using it as a hot-data
cache, not a dedicated write-back cache, and you have the overhead from
bcache itself too.  Just some simple math based on examining the bcache
code suggests you can't get better than about 98% of the SSD's
performance if you're lucky, and I'd guess it's more like 80% most of
the time.


I think with btrfs it will be even worse and that it is a fundamental
problem: caching is complex and the cache cannot know how the data on
the fs is used.

Actually, the improvement from using bcache with BTRFS is higher
proportionate to the baseline of not using it by a small margin than it
is when used with ext4.  BTRFS does a lot more with the disk, so you
have a lot more time spent accessing the disk, and thus more time that
can be reduced by improving disk performance.  While the CoW nature of
BTRFS does somewhat mitigate the performance improvement from using
bcache, it does not completely negate it.


I would like to reverse this, how much degradation do you suffer from
btrfs on a ssd as baseline compared to btrfs on a mixed ssd/hdd system.
Performance-wise?  It's workload dependent, but in most cases it's a hit 
regardless of whether you're using BTRFS or some other filesystem.


If instead you're asking what the difference in device longevity is, you 
can probably expect the SSD to wear out faster in the second case. 
Unless you have a reasonably big SSD and are using write-around caching, 
every write will hit the SSD too, and you'll end up with lots of 
rewrites on the SSD.
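
For reference, the caching mode is a runtime sysfs knob, so a sketch of checking and changing it looks like this (bcache0 is just the usual example device name):

    # the active mode is printed in [brackets]
    cat /sys/block/bcache0/bcache/cache_mode
    # writearound sends writes straight to the backing device,
    # so only reads populate (and wear) the SSD
    echo writearound > /sys/block/bcache0/bcache/cache_mode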


IMHO you are hoping to get ssd performance at hdd cost.
Then you're looking at the wrong tool.  The primary use cases for SSD 
caching are smoothing latency and improving interactivity by reducing 
head movement.  Any other measure of performance is pretty much 
guaranteed to be worse with SSD caching than just using an SSD, and bulk 
throughput is often just as bad as, if not worse than, using a regular 
HDD by itself.


If you are that desperate for SSD-like performance, quit whining 
about cost and just buy an SSD.  Decent ones are down to less than 0.40 
USD per GB depending on the brand (search 'Crucial MX300' on Amazon if 
you want an example), so the co

Recovering BTRFS from bcache failure.

2015-04-07 Thread Dan Merillat
Bcache failures are nasty, because they leave a mix of old and new
data on the disk.  In this case, there was very little dirty data, but
of course the tree roots were dirty and out-of-sync.

fileserver:/usr/src/btrfs-progs# ./btrfs --version
Btrfs v3.18.2

kernel version 3.18

[  572.573566] BTRFS info (device bcache0): enabling auto recovery
[  572.573619] BTRFS info (device bcache0): disk space caching is enabled
[  574.266055] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.276952] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277008] BTRFS: failed to read tree root on bcache0
[  574.277187] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277356] BTRFS (device bcache0): parent transid verify failed on
7567956930560 wanted 613690 found 613681
[  574.277398] BTRFS: failed to read tree root on bcache0
[  574.285955] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 613694
[  574.298741] BTRFS (device bcache0): parent transid verify failed on
7567965720576 wanted 613689 found 610499
[  574.298804] BTRFS: failed to read tree root on bcache0
[  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
[  575.111495] BTRFS (device bcache0): parent transid verify failed on
7567954464768 wanted 613688 found 613685
[  575.111559] BTRFS: failed to read tree root on bcache0
[  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
[  575.131803] BTRFS (device bcache0): parent transid verify failed on
7567954214912 wanted 613687 found 613680
[  575.131866] BTRFS: failed to read tree root on bcache0
[  575.180101] BTRFS: open_ctree failed

all the btrfs tools throw up their hands with similar errors:
ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Could not open root, trying backup super


fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
--init-extent-tree
enabling repair mode
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Couldn't open file system

Annoyingly:
# ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
parent transid verify failed on 7567956930560 wanted 613690 found 613681
Ignoring transid failure
Couldn't setup extent tree
Open ctree failed
create failed (Success)

So I can't even send an image for people to look at.


Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Any ideas on where to start with this?  I did flush the cache out to
disk before I made changes to the bcache configuration, so there
shouldn't be anything completely missing, just some bits of stale
metadata.  If I can get the tools to take the closest match and run
with it, it would probably recover nearly everything.

At worst, is there a way to scan the metadata blocks and rebuild from
found extent-trees?




On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat  wrote:
> Bcache failures are nasty, because they leave a mix of old and new
> data on the disk.  In this case, there was very little dirty data, but
> of course the tree roots were dirty and out-of-sync.
>
> fileserver:/usr/src/btrfs-progs# ./btrfs --version
> Btrfs v3.18.2
>
> kernel version 3.18
>
> [  572.573566] BTRFS info (device bcache0): enabling auto recovery
> [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
> [  574.266055] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.276952] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277008] BTRFS: failed to read tree root on bcache0
> [  574.277187] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277356] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277398] BTRFS: failed to read tree root on bcache0
> [  574.285955] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 613694
> [  574.298741] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 610499
> [  574.298804] BTRFS: failed to read tree root on bcache0
> [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
> [  575.111495] BTRFS (device bcache0): parent transid verify failed on
> 7567954464768 wanted 613688 found 613685
> [  575.111559] BTRFS: failed to read tree root on bcache0
> [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
> [  575.131803] BTRFS (device bcache0): parent transid verify failed on
> 7567954214912 wanted 613687 found 613680
> [  575.131866] BTRFS: failed to read tree root on bcache0
> [  575.180101] BTRFS: open_ctree failed
>
> all the btrfs tools throw up their hands with similar errors:
> ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
>
>
> fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
> --init-extent-tree
> enabling repair mode
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Couldn't open file system
>
> Annoyingly:
> # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn'

Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
It's a known bug with bcache and enabling discard: it was discarding
sections containing data it wanted.  After a reboot bcache refused to
accept the cache data, and of course it was dirty because I'm frankly
too stupid to breathe sometimes.

So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
rescue the btrfs data that it trashed.
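
(For anyone reading this later: discard is a per-cache sysfs toggle, so a hedged sketch of checking and disabling it looks roughly like the following; <cset-uuid> is a placeholder for your cache set UUID and the exact path can vary by kernel version.)

    # check whether the cache issues discards before reusing buckets
    cat /sys/fs/bcache/<cset-uuid>/cache0/discard
    # turn it off
    echo 0 > /sys/fs/bcache/<cset-uuid>/cache0/discard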


On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas  wrote:
> Hello,
>
> I had some luck in the past with btrfs restore using the -r option. I don't
> recall how I determined the roots... Maybe I tried random numbers? I was
> able to recover nearly all of my data from a bcache related crash from over
> a year ago.
>
> What kind of bcache failure did you see? I've been doing some testing
> recently and ran into 2 bcache failures. With both of these failures, I had
> a ' bad btree header at bucket' error message (which is entirely different
> from the crash I had over a year back). I'm currently trying a different SSD
> to see if that alleviates the issue. The error makes me think that it's a
> bcache specific issue that's unrelated to btrfs or possibly (in my case) an
> issue with the previous SSD.
>
> Did you encounter this same error?
>
> With my 2 most recent crashes, I didn't try to recover very hard (or even
> try 'btrfs recover; at all) as I've been taking daily backups. I did try
> btrfsck, and not only would it fail, it would segfault.
>
> -Cameron
>
>
> On 04/08/2015 11:07 AM, Dan Merillat wrote:
>
> Any ideas on where to start with this?  I did flush the cache out to
> disk before I made changes to the bcache configuration, so there
> shouldn't be anything completely missing, just some bits of stale
> metadata.  If I can get the tools to take the closest match and run
> with it it would probably recover nearly everything.
>
> At worst, is there a way to scan the metadata blocks and rebuild from
> found extent-trees?
>
>
>
>
> On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat 
> wrote:
>
> Bcache failures are nasty, because they leave a mix of old and new
> data on the disk.  In this case, there was very little dirty data, but
> of course the tree roots were dirty and out-of-sync.
>
> fileserver:/usr/src/btrfs-progs# ./btrfs --version
> Btrfs v3.18.2
>
> kernel version 3.18
>
> [  572.573566] BTRFS info (device bcache0): enabling auto recovery
> [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
> [  574.266055] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.276952] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277008] BTRFS: failed to read tree root on bcache0
> [  574.277187] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277356] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277398] BTRFS: failed to read tree root on bcache0
> [  574.285955] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 613694
> [  574.298741] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 610499
> [  574.298804] BTRFS: failed to read tree root on bcache0
> [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
> [  575.111495] BTRFS (device bcache0): parent transid verify failed on
> 7567954464768 wanted 613688 found 613685
> [  575.111559] BTRFS: failed to read tree root on bcache0
> [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
> [  575.131803] BTRFS (device bcache0): parent transid verify failed on
> 7567954214912 wanted 613687 found 613680
> [  575.131866] BTRFS: failed to read tree root on bcache0
> [  575.180101] BTRFS: open_ctree failed
>
> all the btrfs tools throw up their hands with similar errors:
> ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent tran

Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Dan Merillat
Sorry I pressed send before I finished my thoughts.

btrfs restore gets nowhere with any options.  btrfs-recover says the
superblocks are fine, and chunk recover does nothing after a few hours
of reading.

Everything else bails out with the errors I listed above.

On Wed, Apr 8, 2015 at 2:36 PM, Dan Merillat  wrote:
> It's a known bug with bcache and enabling discard, it was discarding
> sections containing data it wanted.  After a reboot bcache refused to
> accept the cache data, and of course it was dirty because I'm frankly
> too stupid to breathe sometimes.
>
> So yes, it's a bcache issue, but that's unresolvable.  I'm trying to
> rescue the btrfs data that it trashed.
>
>
> On Wed, Apr 8, 2015 at 2:27 PM, Cameron Berkenpas  wrote:
>> Hello,
>>
>> I had some luck in the past with btrfs restore using the -r option. I don't
>> recall how I determined the roots... Maybe I tried random numbers? I was
>> able to recover nearly all of my data from a bcache related crash from over
>> a year ago.
>>
>> What kind of bcache failure did you see? I've been doing some testing
>> recently and ran into 2 bcache failures. With both of these failures, I had
>> a ' bad btree header at bucket' error message (which is entirely different
>> from the crash I had over a year back). I'm currently trying a different SSD
>> to see if that alleviates the issue. The error makes me think that it's a
>> bcache specific issue that's unrelated to btrfs or possibly (in my case) an
>> issue with the previous SSD.
>>
>> Did you encounter this same error?
>>
>> With my 2 most recent crashes, I didn't try to recover very hard (or even
>> try 'btrfs recover; at all) as I've been taking daily backups. I did try
>> btrfsck, and not only would it fail, it would segfault.
>>
>> -Cameron
>>
>>
>> On 04/08/2015 11:07 AM, Dan Merillat wrote:
>>
>> Any ideas on where to start with this?  I did flush the cache out to
>> disk before I made changes to the bcache configuration, so there
>> shouldn't be anything completely missing, just some bits of stale
>> metadata.  If I can get the tools to take the closest match and run
>> with it it would probably recover nearly everything.
>>
>> At worst, is there a way to scan the metadata blocks and rebuild from
>> found extent-trees?
>>
>>
>>
>>
>> On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat 
>> wrote:
>>
>> Bcache failures are nasty, because they leave a mix of old and new
>> data on the disk.  In this case, there was very little dirty data, but
>> of course the tree roots were dirty and out-of-sync.
>>
>> fileserver:/usr/src/btrfs-progs# ./btrfs --version
>> Btrfs v3.18.2
>>
>> kernel version 3.18
>>
>> [  572.573566] BTRFS info (device bcache0): enabling auto recovery
>> [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
>> [  574.266055] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  574.276952] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  574.277008] BTRFS: failed to read tree root on bcache0
>> [  574.277187] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  574.277356] BTRFS (device bcache0): parent transid verify failed on
>> 7567956930560 wanted 613690 found 613681
>> [  574.277398] BTRFS: failed to read tree root on bcache0
>> [  574.285955] BTRFS (device bcache0): parent transid verify failed on
>> 7567965720576 wanted 613689 found 613694
>> [  574.298741] BTRFS (device bcache0): parent transid verify failed on
>> 7567965720576 wanted 613689 found 610499
>> [  574.298804] BTRFS: failed to read tree root on bcache0
>> [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
>> [  575.111495] BTRFS (device bcache0): parent transid verify failed on
>> 7567954464768 wanted 613688 found 613685
>> [  575.111559] BTRFS: failed to read tree root on bcache0
>> [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
>> [  575.131803] BTRFS (device bcache0): parent transid verify failed on
>> 7567954214912 wanted 613687 found 613680
>> [  575.131866] BTRFS: failed to read tree root on bcache0
>> [  575.180101] BTRFS: open_ctree failed
>>
>> all the btrfs tools throw up their hands with similar errors:
>> ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
>> parent transid verify fail

Re: Recovering BTRFS from bcache failure.

2015-04-08 Thread Kai Krakow
Dan Merillat  schrieb:

> Bcache failures are nasty, because they leave a mix of old and new
> data on the disk.  In this case, there was very little dirty data, but
> of course the tree roots were dirty and out-of-sync.
> 
> fileserver:/usr/src/btrfs-progs# ./btrfs --version
> Btrfs v3.18.2
> 
> kernel version 3.18
> 
> [  572.573566] BTRFS info (device bcache0): enabling auto recovery
> [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
> [  574.266055] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.276952] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277008] BTRFS: failed to read tree root on bcache0
> [  574.277187] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277356] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277398] BTRFS: failed to read tree root on bcache0
> [  574.285955] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 613694
> [  574.298741] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 610499
> [  574.298804] BTRFS: failed to read tree root on bcache0
> [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
> [  575.111495] BTRFS (device bcache0): parent transid verify failed on
> 7567954464768 wanted 613688 found 613685
> [  575.111559] BTRFS: failed to read tree root on bcache0
> [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
> [  575.131803] BTRFS (device bcache0): parent transid verify failed on
> 7567954214912 wanted 613687 found 613680
> [  575.131866] BTRFS: failed to read tree root on bcache0
> [  575.180101] BTRFS: open_ctree failed
> 
> all the btrfs tools throw up their hands with similar errors:
> ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> 
> 
> fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
> --init-extent-tree
> enabling repair mode
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Couldn't open file system
> 
> Annoyingly:
> # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Open ctree failed
> create failed (Success)
> 
> So I can't even send an image for people to look at.

There's always last resort (LAST RESORT!) btrfs-zero-log. It may destroy 
some of your data, however, and can make things even worse if other repairs 
could've helped before. So here are some pointers:

  * btrfs-find-root: find a working tree-root (no idea how to set it, t
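
Roughly, the workflow those tools allow looks like the sketch below (no guarantees; <bytenr> is whichever candidate btrfs-find-root reports and /mnt/rescue is a placeholder destination):

    # list candidate tree roots and their generations
    btrfs-find-root /dev/bcache0
    # try a read-only restore of files using one of the reported roots
    btrfs restore -t <bytenr> /dev/bcache0 /mnt/rescue
    # absolute last resort, only after imaging the device
    btrfs-zero-log /dev/bcache0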

Re: Recovering BTRFS from bcache failure.

2015-04-09 Thread Dan Merillat
On Tue, Apr 7, 2015 at 11:40 PM, Dan Merillat  wrote:
> Bcache failures are nasty, because they leave a mix of old and new
> data on the disk.  In this case, there was very little dirty data, but
> of course the tree roots were dirty and out-of-sync.
>
> fileserver:/usr/src/btrfs-progs# ./btrfs --version
> Btrfs v3.18.2
>
> kernel version 3.18
>
> [  572.573566] BTRFS info (device bcache0): enabling auto recovery
> [  572.573619] BTRFS info (device bcache0): disk space caching is enabled
> [  574.266055] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.276952] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277008] BTRFS: failed to read tree root on bcache0
> [  574.277187] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277356] BTRFS (device bcache0): parent transid verify failed on
> 7567956930560 wanted 613690 found 613681
> [  574.277398] BTRFS: failed to read tree root on bcache0
> [  574.285955] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 613694
> [  574.298741] BTRFS (device bcache0): parent transid verify failed on
> 7567965720576 wanted 613689 found 610499
> [  574.298804] BTRFS: failed to read tree root on bcache0
> [  575.047079] BTRFS (device bcache0): bad tree block start 0 7567954464768
> [  575.111495] BTRFS (device bcache0): parent transid verify failed on
> 7567954464768 wanted 613688 found 613685
> [  575.111559] BTRFS: failed to read tree root on bcache0
> [  575.121749] BTRFS (device bcache0): bad tree block start 0 7567954214912
> [  575.131803] BTRFS (device bcache0): parent transid verify failed on
> 7567954214912 wanted 613687 found 613680
> [  575.131866] BTRFS: failed to read tree root on bcache0
> [  575.180101] BTRFS: open_ctree failed
>
> all the btrfs tools throw up their hands with similar errors:
> ileserver:/usr/src/btrfs-progs# btrfs restore /dev/bcache0 -l
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Could not open root, trying backup super
>
>
> fileserver:/usr/src/btrfs-progs# ./btrfsck --repair /dev/bcache0
> --init-extent-tree
> enabling repair mode
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Couldn't open file system
>
> Annoyingly:
> # ./btrfs-image -c9 -t4 -s -w /dev/bcache0 /tmp/test.out
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> parent transid verify failed on 7567956930560 wanted 613690 found 613681
> Ignoring transid failure
> Couldn't setup extent tree
> Open ctree failed
> create failed (Success)
>
> So I can't even send an image for people to look at.

CCing some more people on this one; while this filesystem isn't
important, I'd like to know that "restore from backup" isn't the only
option for BTRFS corruption.  All of the tools simply throw up their
hands and bail when confronted with this filesystem, even btrfs-image.


[PATCH 07/10] bcache: optimize continue_at_nobarrier()

2018-05-18 Thread Kent Overstreet
Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/closure.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/closure.h b/drivers/md/bcache/closure.h
index 3b9dfc9962..2392a46bcd 100644
--- a/drivers/md/bcache/closure.h
+++ b/drivers/md/bcache/closure.h
@@ -244,7 +244,7 @@ static inline void closure_queue(struct closure *cl)
 != offsetof(struct work_struct, func));
if (wq) {
INIT_WORK(&cl->work, cl->work.func);
-   BUG_ON(!queue_work(wq, &cl->work));
+   queue_work(wq, &cl->work);
} else
cl->fn(cl);
 }
@@ -337,8 +337,13 @@ do {   \
  */
 #define continue_at_nobarrier(_cl, _fn, _wq)   \
 do {   \
-   set_closure_fn(_cl, _fn, _wq);  \
-   closure_queue(_cl); \
+   closure_set_ip(_cl);\
+   if (_wq) {  \
+   INIT_WORK(&(_cl)->work, (void *) _fn);  \
+   queue_work((_wq), &(_cl)->work);\
+   } else {\
+   (_fn)(_cl); \
+   }   \
 } while (0)
 
 /**
-- 
2.17.0



Migrate to bcache: A few questions

2013-12-29 Thread Kai Krakow
Hello list!

I'm planning to buy a small SSD (around 60GB) and use it for bcache in front 
of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back caching. Btrfs 
is my root device, thus the system must be able to boot from bcache using 
init ramdisk. My /boot is a separate filesystem outside of btrfs and will be 
outside of bcache. I am using Gentoo as my system.

I have a few questions:

* How stable is it? I've read about some csum errors lately...

* I want to migrate my current storage to bcache without replaying a backup.
  Is it possible?

* Did others already use it? What is the perceived performance for desktop
  workloads in comparison to not using bcache?

* How well does bcache handle power outages? Btrfs does handle them very
  well since many months.

* How well does it play with dracut as initrd? Is it as simple as telling it
  the new device nodes or is there something complicated to configure?

* How does bcache handle a failing SSD when it starts to wear out in a few
  years?

* Is it worth waiting for hot-relocation support in btrfs to natively use
  a SSD as cache?

* Would you recommend going with a bigger/smaller SSD? I'm planning to use
  only 75% of it for bcache so wear-leveling can work better, maybe use
  another part of it for hibernation (suspend to disk).

Regards,
Kai



Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
+btrfs mailing list, see below why

On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> On Mon, 27 Nov 2016, Coly Li wrote:
> > 
> > Yes, too many work queues... I guess the locking might be caused by some
> > very obscure reference of closure code. I cannot have any clue if I
> > cannot find a stable procedure to reproduce this issue.
> > 
> > Hmm, if there is a tool to clone all the meta data of the back end cache
> > and whole cached device, there might be a method to replay the oops much
> > easier.
> > 
> > Eric, do you have any hint ?
> 
> Note that the backing device doesn't have any metadata, just a superblock. 
> You can easily dd that off onto some other volume without transferring the 
> data. By default, data starts at 8k, or whatever you used in `make-bcache 
> -w`.

Ok, Linus helped me find a workaround for this problem:
https://lkml.org/lkml/2016/11/29/667
namely:
   echo 2 > /proc/sys/vm/dirty_ratio
   echo 1 > /proc/sys/vm/dirty_background_ratio
(it's a 24GB system, so the defaults of 20 and 10 were creating too many
requests in the buffers)
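
(To keep that workaround across reboots, a sketch of the equivalent sysctl configuration; the file name is arbitrary, and on big-memory machines the *_bytes variants give finer, absolute control than the percentage knobs:)

    # persist the workaround (example file name)
    echo 'vm.dirty_ratio = 2' >> /etc/sysctl.d/90-writeback.conf
    echo 'vm.dirty_background_ratio = 1' >> /etc/sysctl.d/90-writeback.conf
    sysctl --system
    # or absolute limits instead of percentages (mutually exclusive with the above):
    # sysctl -w vm.dirty_bytes=536870912 vm.dirty_background_bytes=268435456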

Note that this is only a workaround, not a fix.

When I did this and retried my big copy again, I still got 100+ kernel
work queues, but apparently the underlying swraid5 was able to unblock
and satisfy the write requests before too many accumulated and crashed
the kernel.

I'm not a kernel coder, but seems to me that bcache needs a way to
throttle incoming requests if there are too many so that it does not end
up in a state where things blow up due to too many piled up requests.

You should be able to reproduce this by taking 5 spinning rust drives,
putting raid5 on top, then dmcrypt, bcache and hopefully any filesystem (although
I used btrfs), and sending lots of requests.
Actually to be honest, the problems have mostly been happening when I do
btrfs scrub and btrfs send/receive which both generate I/O from within
the kernel instead of user space.
So here, btrfs may be a contributor to the problem too, but while btrfs
still trashes my system if I remove the caching device on bcache (and
with the default dirty ratio values), it doesn't crash the kernel.
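
Spelled out, that stack would be assembled roughly as below (a sketch only; all device names are placeholders and these commands destroy existing data on them):

    # 5 HDDs -> md raid5 -> dm-crypt -> bcache (SSD cache) -> btrfs
    mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]
    cryptsetup luksFormat /dev/md0
    cryptsetup luksOpen /dev/md0 cryptmd
    # /dev/sda4 stands in for an SSD partition used as the cache
    make-bcache -C /dev/sda4 -B /dev/mapper/cryptmd
    mkfs.btrfs /dev/bcache0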

I'll start another separate thread with the btrfs folks on how much
pressure is put on the system, but on your side it would be good to help
ensure that bcache doesn't crash the system altogether if too many
requests are allowed to pile up.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 08:46:46AM -0800, Marc MERLIN wrote:
> +btrfs mailing list, see below why
> 
> On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> > On Mon, 27 Nov 2016, Coly Li wrote:
> > > 
> > > Yes, too many work queues... I guess the locking might be caused by some
> > > very obscure reference of closure code. I cannot have any clue if I
> > > cannot find a stable procedure to reproduce this issue.
> > > 
> > > Hmm, if there is a tool to clone all the meta data of the back end cache
> > > and whole cached device, there might be a method to replay the oops much
> > > easier.
> > > 
> > > Eric, do you have any hint ?
> > 
> > Note that the backing device doesn't have any metadata, just a superblock. 
> > You can easily dd that off onto some other volume without transferring the 
> > data. By default, data starts at 8k, or whatever you used in `make-bcache 
> > -w`.
> 
> Ok, Linus helped me find a workaround for this problem:
> https://lkml.org/lkml/2016/11/29/667
> namely:
>echo 2 > /proc/sys/vm/dirty_ratio
>echo 1 > /proc/sys/vm/dirty_background_ratio
> (it's a 24GB system, so the defaults of 20 and 10 were creating too many
> requests in th buffers)
> 
> Note that this is only a workaround, not a fix.

Actually, I'm even more worried about the general bcache situation when
caching is enabled. In the message above, Linus wrote:

"One situation where I've seen something like this happen is

 (a) lots and lots of dirty data queued up
 (b) horribly slow storage
 (c) filesystem that ends up serializing on writeback under certain
circumstances

The usual case for (b) in the modern world is big SSD's that have bad
worst-case behavior (ie they may do gbps speeds when doing well, and
then they come to a screeching halt when their buffers fill up and
they have to do rewrites, and their gbps throughput drops to mbps or
lower).

Generally you only find that kind of really nasty SSD in the USB stick
world these days."

Well, come to think of it, this is _exactly_ what bcache will create, by
design. It'll swallow up a lot of IO cached to the SSD, until the SSD
buffers fill up and then things will hang while bcache struggles to
write it all to slower spinning rust storage.

Looks to me like bcache and dirty_ratio need to be synced somehow, or
things will fall over reliably.

What do you think?

Thanks,
Marc


> When I did this and re tried my big copy again, I still got 100+ kernel
> work queues, but apparently the underlying swraid5 was able to unblock
> and satisfy the write requests before too many accumulated and crashed
> the kernel.
> 
> I'm not a kernel coder, but seems to me that bcache needs a way to
> throttle incoming requests if there are too many so that it does not end
> up in a state where things blow up due to too many piled up requests.
> 
> You should be able to reproduce this by taking 5 spinning rust drives,
> put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although
> I used btrfs) and send lots of requests.
> Actually to be honest, the problems have mostly been happening when I do
> btrfs scrub and btrfs send/receive which both generate I/O from within
> the kernel instead of user space.
> So here, btrfs may be a contributor to the problem too, but while btrfs
> still trashes my system if I remove the caching device on bcache (and
> with the default dirty ratio values), it doesn't crash the kernel.
> 
> I'll start another separate thread with the btrfs folks on how much
> pressure is put on the system, but on your side it would be good to help
> ensure that bcache doesn't crash the system altogether if too many
> requests are allowed to pile up.
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
> cooking
> Home page: http://marc.merlins.org/ | PGP 
> 1024R/763BE901

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Eric Wheeler
On Wed, 30 Nov 2016, Marc MERLIN wrote:
> +btrfs mailing list, see below why
> 
> On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
> > On Mon, 27 Nov 2016, Coly Li wrote:
> > > 
> > > Yes, too many work queues... I guess the locking might be caused by some
> > > very obscure reference of closure code. I cannot have any clue if I
> > > cannot find a stable procedure to reproduce this issue.
> > > 
> > > Hmm, if there is a tool to clone all the meta data of the back end cache
> > > and whole cached device, there might be a method to replay the oops much
> > > easier.
> > > 
> > > Eric, do you have any hint ?
> > 
> > Note that the backing device doesn't have any metadata, just a superblock. 
> > You can easily dd that off onto some other volume without transferring the 
> > data. By default, data starts at 8k, or whatever you used in `make-bcache 
> > -w`.
> 
> Ok, Linus helped me find a workaround for this problem:
> https://lkml.org/lkml/2016/11/29/667
> namely:
>echo 2 > /proc/sys/vm/dirty_ratio
>echo 1 > /proc/sys/vm/dirty_background_ratio
> (it's a 24GB system, so the defaults of 20 and 10 were creating too many
> requests in th buffers)
> 
> Note that this is only a workaround, not a fix.
> 
> When I did this and re tried my big copy again, I still got 100+ kernel
> work queues, but apparently the underlying swraid5 was able to unblock
> and satisfy the write requests before too many accumulated and crashed
> the kernel.
> 
> I'm not a kernel coder, but seems to me that bcache needs a way to
> throttle incoming requests if there are too many so that it does not end
> up in a state where things blow up due to too many piled up requests.
> 
> You should be able to reproduce this by taking 5 spinning rust drives,
> put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although
> I used btrfs) and send lots of requests.
> Actually to be honest, the problems have mostly been happening when I do
> btrfs scrub and btrfs send/receive which both generate I/O from within
> the kernel instead of user space.
> So here, btrfs may be a contributor to the problem too, but while btrfs
> still trashes my system if I remove the caching device on bcache (and
> with the default dirty ratio values), it doesn't crash the kernel.
> 
> I'll start another separate thread with the btrfs folks on how much
> pressure is put on the system, but on your side it would be good to help
> ensure that bcache doesn't crash the system altogether if too many
> requests are allowed to pile up.


Try BFQ.  It is AWESOME and helps reduce the congestion problem with bulk 
writes at the request queue on its way to the spinning disk or SSD:
http://algo.ing.unimo.it/people/paolo/disk_sched/

use the latest BFQ git here, merge it into v4.8.y:
https://github.com/linusw/linux-bfq/commits/bfq-v8

This doesn't completely fix the dirty_ratio problem, but it is far better 
than CFQ or deadline in my opinion (and experience).
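
(For reference, once a BFQ-enabled kernel is booted, switching a device over is a plain sysfs write; sdX below is a placeholder for the disk behind bcache.)

    # available schedulers; the active one is shown in [brackets]
    cat /sys/block/sdX/queue/scheduler
    echo bfq > /sys/block/sdX/queue/scheduler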

-Eric



--
Eric Wheeler


> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
> cooking
> Home page: http://marc.merlins.org/ | PGP 
> 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Marc MERLIN
On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote:
> > I'll start another separate thread with the btrfs folks on how much
> > pressure is put on the system, but on your side it would be good to help
> > ensure that bcache doesn't crash the system altogether if too many
> > requests are allowed to pile up.
> 
> Try BFQ.  It is AWESOME and helps reduce the congestion problem with bulk 
> writes at the request queue on its way to the spinning disk or SSD:
>   http://algo.ing.unimo.it/people/paolo/disk_sched/
> 
> use the latest BFQ git here, merge it into v4.8.y:
>   https://github.com/linusw/linux-bfq/commits/bfq-v8
> 
> This doesn't completely fix the dirty_ration problem, but it is far better 
> than CFQ or deadline in my opinion (and experience).

That's good to know thanks.
But in my uninformed opinion, is there anything bcache can do to throttle
incoming requests if they are piling up, or are they coming from producers
upstream so that bcache has no choice but to try and process them as quickly as
possible, without a way to block the sender if too many are coming?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: 4.8.8, bcache deadlock and hard lockup

2016-11-30 Thread Chris Murphy
On Wed, Nov 30, 2016 at 4:57 PM, Eric Wheeler  wrote:
> On Wed, 30 Nov 2016, Marc MERLIN wrote:
>> +btrfs mailing list, see below why
>>
>> On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:
>> > On Mon, 27 Nov 2016, Coly Li wrote:
>> > >
>> > > Yes, too many work queues... I guess the locking might be caused by some
>> > > very obscure reference of closure code. I cannot have any clue if I
>> > > cannot find a stable procedure to reproduce this issue.
>> > >
>> > > Hmm, if there is a tool to clone all the meta data of the back end cache
>> > > and whole cached device, there might be a method to replay the oops much
>> > > easier.
>> > >
>> > > Eric, do you have any hint ?
>> >
>> > Note that the backing device doesn't have any metadata, just a superblock.
>> > You can easily dd that off onto some other volume without transferring the
>> > data. By default, data starts at 8k, or whatever you used in `make-bcache
>> > -w`.
>>
>> Ok, Linus helped me find a workaround for this problem:
>> https://lkml.org/lkml/2016/11/29/667
>> namely:
>>echo 2 > /proc/sys/vm/dirty_ratio
>>echo 1 > /proc/sys/vm/dirty_background_ratio
>> (it's a 24GB system, so the defaults of 20 and 10 were creating too many
>> requests in th buffers)
>>
>> Note that this is only a workaround, not a fix.
>>
>> When I did this and re tried my big copy again, I still got 100+ kernel
>> work queues, but apparently the underlying swraid5 was able to unblock
>> and satisfy the write requests before too many accumulated and crashed
>> the kernel.
>>
>> I'm not a kernel coder, but seems to me that bcache needs a way to
>> throttle incoming requests if there are too many so that it does not end
>> up in a state where things blow up due to too many piled up requests.
>>
>> You should be able to reproduce this by taking 5 spinning rust drives,
>> put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although
>> I used btrfs) and send lots of requests.
>> Actually to be honest, the problems have mostly been happening when I do
>> btrfs scrub and btrfs send/receive which both generate I/O from within
>> the kernel instead of user space.
>> So here, btrfs may be a contributor to the problem too, but while btrfs
>> still trashes my system if I remove the caching device on bcache (and
>> with the default dirty ratio values), it doesn't crash the kernel.
>>
>> I'll start another separate thread with the btrfs folks on how much
>> pressure is put on the system, but on your side it would be good to help
>> ensure that bcache doesn't crash the system altogether if too many
>> requests are allowed to pile up.
>
>
> Try BFQ.  It is AWESOME and helps reduce the congestion problem with bulk
> writes at the request queue on its way to the spinning disk or SSD:
> http://algo.ing.unimo.it/people/paolo/disk_sched/
>
> use the latest BFQ git here, merge it into v4.8.y:
> https://github.com/linusw/linux-bfq/commits/bfq-v8
>
> This doesn't completely fix the dirty_ration problem, but it is far better
> than CFQ or deadline in my opinion (and experience).

There are several threads over the past year with users having
problems no one else had previously reported, and they were using BFQ.
But there's no evidence whether BFQ was the cause, or whether it was exposing
some existing bug that another scheduler doesn't hit. Anyway, I'd say using an
out-of-tree scheduler means a higher burden of testing and skepticism.


-- 
Chris Murphy


Re: 4.8.8, bcache deadlock and hard lockup

2016-12-01 Thread Austin S. Hemmelgarn

On 2016-11-30 19:48, Chris Murphy wrote:

On Wed, Nov 30, 2016 at 4:57 PM, Eric Wheeler  wrote:

On Wed, 30 Nov 2016, Marc MERLIN wrote:

+btrfs mailing list, see below why

On Tue, Nov 29, 2016 at 12:59:44PM -0800, Eric Wheeler wrote:

On Mon, 27 Nov 2016, Coly Li wrote:


Yes, too many work queues... I guess the locking might be caused by some
very obscure reference of closure code. I cannot have any clue if I
cannot find a stable procedure to reproduce this issue.

Hmm, if there is a tool to clone all the meta data of the back end cache
and whole cached device, there might be a method to replay the oops much
easier.

Eric, do you have any hint ?


Note that the backing device doesn't have any metadata, just a superblock.
You can easily dd that off onto some other volume without transferring the
data. By default, data starts at 8k, or whatever you used in `make-bcache
-w`.


Ok, Linus helped me find a workaround for this problem:
https://lkml.org/lkml/2016/11/29/667
namely:
   echo 2 > /proc/sys/vm/dirty_ratio
   echo 1 > /proc/sys/vm/dirty_background_ratio
(it's a 24GB system, so the defaults of 20 and 10 were creating too many
requests in th buffers)

Note that this is only a workaround, not a fix.

When I did this and re tried my big copy again, I still got 100+ kernel
work queues, but apparently the underlying swraid5 was able to unblock
and satisfy the write requests before too many accumulated and crashed
the kernel.

I'm not a kernel coder, but seems to me that bcache needs a way to
throttle incoming requests if there are too many so that it does not end
up in a state where things blow up due to too many piled up requests.

You should be able to reproduce this by taking 5 spinning rust drives,
put raid5 on top, dmcrypt, bcache and hopefully any filesystem (although
I used btrfs) and send lots of requests.
Actually to be honest, the problems have mostly been happening when I do
btrfs scrub and btrfs send/receive which both generate I/O from within
the kernel instead of user space.
So here, btrfs may be a contributor to the problem too, but while btrfs
still trashes my system if I remove the caching device on bcache (and
with the default dirty ratio values), it doesn't crash the kernel.

I'll start another separate thread with the btrfs folks on how much
pressure is put on the system, but on your side it would be good to help
ensure that bcache doesn't crash the system altogether if too many
requests are allowed to pile up.



Try BFQ.  It is AWESOME and helps reduce the congestion problem with bulk
writes at the request queue on its way to the spinning disk or SSD:
http://algo.ing.unimo.it/people/paolo/disk_sched/

use the latest BFQ git here, merge it into v4.8.y:
https://github.com/linusw/linux-bfq/commits/bfq-v8

This doesn't completely fix the dirty_ration problem, but it is far better
than CFQ or deadline in my opinion (and experience).


There are several threads over the past year with users having
problems no one else had previously reported, and they were using BFQ.
But there's no evidence whether BFQ was the cause, or exposing some
existing bug that another scheduler doesn't. Anyway, I'd say using an
out of tree scheduler means higher burden of testing and skepticism.
Normally I'd agree on this, but BFQ is a bit of a different situation 
from usual because:
1. 90% of the reason that BFQ isn't in mainline is that the block 
maintainers have declared the legacy (non-blk-mq) code deprecated and 
refuse to take anything new there, despite there being absolutely zero 
I/O scheduling in blk-mq.
2. It's been around for years with hundreds of thousands of users over 
the years who have had no issues with it.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.8.8, bcache deadlock and hard lockup

2016-12-01 Thread Eric Wheeler
On Wed, 30 Nov 2016, Marc MERLIN wrote:
> On Wed, Nov 30, 2016 at 03:57:28PM -0800, Eric Wheeler wrote:
> > > I'll start another separate thread with the btrfs folks on how much
> > > pressure is put on the system, but on your side it would be good to help
> > > ensure that bcache doesn't crash the system altogether if too many
> > > requests are allowed to pile up.
> > 
> > Try BFQ.  It is AWESOME and helps reduce the congestion problem with bulk 
> > writes at the request queue on its way to the spinning disk or SSD:
> > http://algo.ing.unimo.it/people/paolo/disk_sched/
> > 
> > use the latest BFQ git here, merge it into v4.8.y:
> > https://github.com/linusw/linux-bfq/commits/bfq-v8
> > 
> > This doesn't completely fix the dirty_ratio problem, but it is far better 
> > than CFQ or deadline in my opinion (and experience).
> 
> That's good to know thanks.
> But in my uninformed opinion, is there anything bcache can do to throttle
> incoming requests if they are piling up, or are they coming from producers
> upstream so that bcache has no choice but to try to process them as quickly as
> possible, without a way to block the sender if too many are coming?

Not really.  The congestion isn't in bcache; it's at the disk queue beyond 
bcache, but userspace processes are blocked by the (huge) pagecache dirty 
writeback which happens before bcache gets it and must complete before 
userspace may proceed: 

fs -> pagecache -> bcache -> {ssd,disk}  

The real issue is that the dirty page cache gets really big, flushes, 
waits for downstream devices (bcache->ssd,disk) to finish, and then 
returns to userspace.  The only way to limit the dirty cache is via those 
options that Linus mentioned.

BFQ can help processes not tied to the flush because it may re-order 
other processes' requests ahead of the big flush---so even though a big flush 
is happening and the flushing process is stalled, others might proceed without 
delay.
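
For what it's worth, the dirty/writeback page counts involved can be watched
through the standard /proc interface while such a flush is running; a small
sketch:

   watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'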

See this thread, too:

https://groups.google.com/forum/#!msg/bfq-iosched/M2M_UhbC05A/hf6Ni9JbAQAJ

--
Eric Wheeler



> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
> cooking
> Home page: http://marc.merlins.org/  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-29 Thread Chris Murphy

On Dec 29, 2013, at 2:11 PM, Kai Krakow  wrote:

> 
> * How stable is it? I've read about some csum errors lately…

Seems like bcache devs are still looking into the recent btrfs csum issues.

> 
> * I want to migrate my current storage to bcache without replaying a backup.
>  Is it possible?
> 
> * Did others already use it? What is the perceived performance for desktop
>  workloads in comparision to not using bcache?
> 
> * How well does bcache handle power outages? Btrfs does handle them very
>  well since many months.
> 
> * How well does it play with dracut as initrd? Is it as simple as telling it
>  the new device nodes or is there something complicate to configure?
> 
> * How does bcache handle a failing SSD when it starts to wear out in a few
>  years?

I think most of these questions are better suited for the bcache list. I think 
there are still many uncertainties about the behavior of SSDs during power 
failures when they aren't explicitly designed with power failure protection in 
mind. At best I'd hope for a rollback involving data loss, but hopefully not a 
corrupt file system. I'd rather lose the last minute of data supposedly written 
to the drive than have to do a full restore from backup.

> 
> * Is it worth waiting for hot-relocation support in btrfs to natively use
>  a SSD as cache?

I haven't read anything about it. Don't see it listed in project ideas.

> 
> * Would you recommend going with a bigger/smaller SSD? I'm planning to use
>  only 75% of it for bcache so wear-leveling can work better, maybe use
>  another part of it for hibernation (suspend to disk).

I think that depends greatly on workload. If you're writing or reading a lot of 
disparate files, or a lot of small file random writes (mail server), I'd go 
bigger. By default sequential IO isn't cached. So I think you can get a big 
boost in responsiveness with a relatively small bcache size.
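
For reference, that sequential cutoff is tunable per bcache device through
sysfs; a small sketch, with bcache0 as a placeholder:

   cat /sys/block/bcache0/bcache/sequential_cutoff      # 4.0M by default
   echo 0 > /sys/block/bcache0/bcache/sequential_cutoff # cache sequential IO too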


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-29 Thread Kai Krakow
Chris Murphy  schrieb:

> I think most of these questions are better suited for the bcache list.

Ah yes, you are right. I will repost the non-btrfs-related questions to the 
bcache list. But actually I am most interested in using bcache together with 
btrfs, so getting a general picture of the current state of this combination 
would be nice - so these questions may be partially appropriate here.

> I
> think there are still many uncertainties about the behavior of SSDs during
> power failures when they aren't explicitly designed with power failure
> protection in mind. At best I'd hope for a rollback involving data loss,
> but hopefully not a corrupt file system. I'd rather lose the last minute
> of data supposedly written to the drive, than have to do a fuil restore
> from backup.

These thoughts are actually quite interesting. So you are saying that data 
may not be fully written to the SSD although the kernel thinks it is? This is 
probably very dangerous. The bcache module could then not ensure coherence 
between its backing devices and its own contents - data loss would occur 
and probably destroy important file system structures.

I understand your words as "data may only be partially written". This, of 
course, may happen with HDDs as well. But usually a file system works with 
transactions, so the last incomplete transaction can simply be thrown away. I 
hope bcache implements the same architecture. But what does that mean for the 
stacked write-back architecture?

As I understand it, bcache may use write-through for sequential writes but 
write-back for random writes. In that case, part of the data may have hit 
the backing device while other data exists only in the bcache. If the last 
transaction is not closed due to power loss, and is then thrown away, we have 
part of the transaction already written to the backing device that the 
filesystem does not know about after resume.
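
For reference, the caching mode itself is selectable per bcache device via
sysfs, so write-back can be avoided entirely if that risk is a concern; a
sketch, with bcache0 as a placeholder:

   cat /sys/block/bcache0/bcache/cache_mode    # current mode shown in brackets
   echo writethrough > /sys/block/bcache0/bcache/cache_mode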

I'd appreciate some thoughts about it but this topic is probably also best 
moved over to the bcache list.

Thanks,
Kai 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-29 Thread Chris Murphy

On Dec 29, 2013, at 6:22 PM, Kai Krakow  wrote:

> So you are saying that data 
> may not be fully written to SSD although the kernel thinks so?

Drives shouldn't lie when asked to flush to disk, but they do. An older article 
at LWN is a decent primer on the subject of write barriers.

http://lwn.net/Articles/283161/

> This is 
> probably very dangerous. The bcache module could not ensure coherence 
> between its backing devices and its own contents - and data loss will occur 
> and probably destroy important file system structures.

I don't know the details; there's more on lkml.org and the bcache list. My 
impression is that, short of bugs, it should be much safer than you describe. 
It's not like a linear/concat md or LVM device failure scenario. There's good info 
in the bcache.h file:

http://lxr.free-electrons.com/source/drivers/md/bcache/bcache.h

If anything, once the kinks are worked out, under heavy random write IO I'd 
expect bcache to improve the likelihood that data isn't lost. The SSD's higher speed 
means we get a faster commit of the data to stable media. Also, bcache assumes 
the cache is always dirty on startup, no matter whether the shutdown was clean 
or dirty, so the code is explicitly designed to resolve the state of the cache 
relative to the backing device. It's actually pretty fascinating work.

It may not be required, but I'd expect we'd want the write cache on the backing 
device disabled. It should still honor write barriers but it kinda seems 
unnecessary and riskier to have it enabled (which is the default with consumer 
drives).
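
For ATA backing disks, that volatile write cache can usually be queried and
disabled with hdparm; a sketch, with sdX as a placeholder:

   hdparm -W /dev/sdX    # show the current write-cache setting
   hdparm -W0 /dev/sdX   # turn the drive's write cache off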


> As I understand, bcache may use write-through for sequential writes, but 
> write-back for random writes. In this case, part of the data may have hit 
> the backing device, other data does only exist in the bcache. If that last 
> transaction is not closed due to power-loss, and then thrown away, we have 
> part of the transaction already written to the backing device that the 
> filesystem does not know of after resume.

In the write through case we should be no worse off than the bare drive in a 
power loss. In the write back case the SSD should have committed more data than 
the HDD could have in the same situation. I don't understand the details of how 
partially successful writes to the backing media are handled when the system 
comes back up. Since bcache is also COW, SSD blocks aren't reused until data is 
committed to the backing device.


Chris Murphy--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-29 Thread Duncan
Kai Krakow posted on Sun, 29 Dec 2013 22:11:16 +0100 as excerpted:

> Hello list!
> 
> I'm planning to buy a small SSD (around 60GB) and use it for bcache in
> front of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back
> caching. Btrfs is my root device, thus the system must be able to boot
> from bcache using init ramdisk. My /boot is a separate filesystem
> outside of btrfs and will be outside of bcache. I am using Gentoo as my
> system.

Gentooer here too. =:^)

> I have a few questions:
> 
> * How stable is it? I've read about some csum errors lately...

FWIW, both bcache and btrfs are new and still-developing technology.  
While I'm using btrfs here, I have tested-usable backups (which for root 
means either directly bootable, or that you have tested booting to a 
recovery image and restoring from there; I do the former here), 
as STRONGLY recommended for btrfs in its current state, but haven't had 
to use them.

And I considered bcache previously and might otherwise be using it, but 
at least personally, I'm not willing to try BOTH of them at once, since 
neither one is mature yet and if there are problems as there very well 
might be, I'd have the additional issue of figuring out which one was the 
problem, and I'm personally not prepared to deal with that.

Instead, at this point I'd recommend choosing /either/ bcache /or/ btrfs, 
and using bcache with a more mature filesystem like ext4 or (what I used 
for years previous and still use for spinning rust) reiserfs.

And as I said, keep your backups as current as you're willing to deal 
with losing what's not backed up, and tested usable and (for root) either 
bootable or restorable from alternate boot, because while at least btrfs 
is /reasonably/ stable for /ordinary/ daily use, there remain corner-
cases and you never know when your case is going to BE a corner-case!

> * I want to migrate my current storage to bcache without replaying a
> backup.  Is it possible?

Since I've not actually used bcache, I won't try to answer some of these, 
but will answer based on what I've seen on the list where I can...  I 
don't know on this one.

> * Did others already use it? What is the perceived performance for
> desktop workloads in comparision to not using bcache?

Others are indeed already using it.  I've seen some btrfs/bcache problems 
reported on this list, but as mentioned above, when both are in use that 
means figuring out which is the problem, and at least from the btrfs side 
I've not seen a lot of resolution in that regard.  From here it /looks/ 
like that's simply being punted at this time, as there are still more 
easily traceable problems to work on first, without the additional bcache 
variable.  But it's quite possible the bcache list is actively tackling 
btrfs/bcache combination problems, as I'm not subscribed there.

So I can't answer the desktop performance comparison question directly, 
but given that I /am/ running btrfs on SSD, I /can/ say I'm quite happy 
with that. =:^)

Keep in mind...

We're talking storage cache here.  Given the cost of memory and common 
system configurations these days, 4-16 gig of memory on a desktop isn't 
unusual or cost prohibitive, and a common desktop working set should well 
fit.

I suspect my desktop setup, 16 gigs memory backing a 6-core AMD fx6100 
(bulldozer-1) @ 3.6 GHz, is probably a bit toward the high side even for 
a gentooer, but not inordinately so.  Based on my usage...

Typical app memory usage runs 1-2 GiB (that's with KDE 4.12.49. from 
the gentoo/kde overlay, but USE=-semantic-desktop, etc).  Buffer memory 
runs a few MiB but isn't normally significant, so it can fold into that 
same 1-2 GiB too.

That leaves a full 14 GiB for cache.  But at least with /my/ usage, 
normal non-update cache memory usage tends to be below ~6 GiB too, so 
total apps/buffer/cache memory usage tends to be below 8 GiB as well.

When I'm doing multi-job builds or working with big media files, I'll 
sometimes go above 8 gig usage, and that occasional cache-spill was why I 
upgraded to 16 gig.  But in practice, 10 gig would take care of that most 
of the time, and were it not for the "accident" of powers-of-two meaning 
16 gig is the notch above 8 gig, 10 or 12 gig would be plenty.  Truth be 
told, I so seldom use that last 4 gig that it's almost embarrassing.

* Tho if I ran multi-GiB VMs that'd use up that extra memory real fast!  
But while that /is/ becoming more common, I'm not exactly sure I'd 
classify 4 gigs plus of VM usage as "desktop" usage just yet.  
Workstation, yes, and definitely server, but not really desktop.

All that as background to this...

* Cache works only after first access.  If you only access something 
occasionally, it may not be worth caching at all.

Re: Migrate to bcache: A few questions

2013-12-30 Thread Marc MERLIN
On Mon, Dec 30, 2013 at 02:22:55AM +0100, Kai Krakow wrote:
> These thought are actually quite interesting. So you are saying that data 
> may not be fully written to SSD although the kernel thinks so? This is 

That, and worse.

Incidentally, I have just posted on my G+ about this:
https://plus.google.com/106981743284611658289/posts/Us8yjK9SPs6

which is mostly links to
http://lkcl.net/reports/ssd_analysis.html
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

After you read those, you'll never think twice about SSDs and data loss
anymore :-/
(I kind of found that out myself over time too, but these have much more
data than I got myself empirically on a couple of SSDs)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-30 Thread Austin S Hemmelgarn
On 12/29/2013 04:11 PM, Kai Krakow wrote:
> Hello list!
> 
> I'm planning to buy a small SSD (around 60GB) and use it for bcache in front 
> of my 3x 1TB HDD btrfs setup (mraid1+draid0) using write-back caching. Btrfs 
> is my root device, thus the system must be able to boot from bcache using 
> init ramdisk. My /boot is a separate filesystem outside of btrfs and will be 
> outside of bcache. I am using Gentoo as my system.
> 
> I have a few questions:
> 
> * How stable is it? I've read about some csum errors lately...
> 
> * I want to migrate my current storage to bcache without replaying a backup.
>   Is it possible?
> 
> * Did others already use it? What is the perceived performance for desktop
>   workloads in comparision to not using bcache?
> 
> * How well does bcache handle power outages? Btrfs does handle them very
>   well since many months.
> 
> * How well does it play with dracut as initrd? Is it as simple as telling it
>   the new device nodes or is there something complicate to configure?
> 
> * How does bcache handle a failing SSD when it starts to wear out in a few
>   years?
> 
> * Is it worth waiting for hot-relocation support in btrfs to natively use
>   a SSD as cache?
> 
> * Would you recommend going with a bigger/smaller SSD? I'm planning to use
>   only 75% of it for bcache so wear-leveling can work better, maybe use
>   another part of it for hibernation (suspend to disk).
I've actually tried a similar configuration myself a couple of times
(also using Gentoo, in fact), and I can tell you from experience that
unless things have changed greatly since kernel 3.12.1, it really isn't
worth the headaches.  Setting it up on an already-installed system is a
serious pain because the backing device has to be reformatted with a
bcache super-block.  In addition, every kernel that I have tried that
had bcache compiled in or loaded as a module had issues: I would see a
kernel OOPS on average once a day from the bcache code, usually followed
shortly by a panic from some other unrelated subsystem.  I didn't get
any actual data corruption, but I wasn't using btrfs at the time for any
of my filesystems.

As an alternative to using bcache, you might try something similar to
the following:
64G SSD with /boot, /, and /usr
Other HDD with /var, /usr/portage, /usr/src, and /home
tmpfs or ramdisk for /tmp and /var/tmp
This is essentially what I use now, and I have found that it
significantly improves system performance.
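
As a sketch of the tmpfs part of such a layout (sizes are only examples and
should be chosen to fit the machine's RAM):

   printf 'tmpfs /tmp     tmpfs size=4G,mode=1777 0 0\n' >> /etc/fstab
   printf 'tmpfs /var/tmp tmpfs size=8G,mode=1777 0 0\n' >> /etc/fstab
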
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-30 Thread Kai Krakow
Marc MERLIN  schrieb:

> On Mon, Dec 30, 2013 at 02:22:55AM +0100, Kai Krakow wrote:
>> These thought are actually quite interesting. So you are saying that data
>> may not be fully written to SSD although the kernel thinks so? This is
> 
> That, and worse.
> 
> Incidently, I have just posted on my G+ about this:
> https://plus.google.com/106981743284611658289/posts/Us8yjK9SPs6
> 
> which is mostly links to
> http://lkcl.net/reports/ssd_analysis.html
> https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault
> 
> After you read those, you'll never think twice about SSDs and data loss
> anymore :-/
> (I kind of found that out myself over time too, but these have much more
> data than I got myself empirically on a couple of SSDs)

The bad thing here is: Even battery-backed RAID controllers won't help you 
here. I start to understand why I still don't trust this new technology 
entirely.

Thanks,
Kai

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2013-12-30 Thread Kai Krakow
Duncan <1i5t5.dun...@cox.net> schrieb:

[ spoiler: tldr ;-) ]

>> * How stable is it? I've read about some csum errors lately...
> 
> FWIW, both bcache and btrfs are new and still developing technology.
> While I'm using btrfs here, I have tested usable (which for root means
> either means directly bootable or that you have tested booting to a
> recovery image and restoring from there, I do the former, here) backups,
> as STRONGLY recommended for btrfs in its current state, but haven't had
> to use them.
> 
> And I considered bcache previously and might otherwise be using it, but
> at least personally, I'm not willing to try BOTH of them at once, since
> neither one is mature yet and if there are problems as there very well
> might be, I'd have the additional issue of figuring out which one was the
> problem, and I'm personally not prepared to deal with that.

I mostly trust btrfs by now. Don't get me wrong: I still have my 
nightly backup job syncing the complete system to an external drive - 
nothing beats a good backup. But btrfs has reliably survived multiple 
power losses, kernel panics/freezes, unreliable USB connections, ... It 
looks very stable from that point of view. Yes, it may have bugs that could 
introduce errors fatal to the filesystem structure. But generally, under 
usual workloads, it has proven stable for me. At least for desktop workloads.
 
> Instead, at this point I'd recommend choosing /either/ bcache /or/ btrfs,
> and using bcache with a more mature filesystem like ext4 or (what I used
> for years previous and still use for spinning rust) reiserfs.

I used reiserfs for several years a long time ago. But it absolutely does 
not scale well for parallel/threaded workloads, which is a show stopper for 
servers. Still, it always survived even the worst failure scenarios 
(like the SCSI bus going offline for some RAID members), and the tools 
distributed with it were able to recover all data even when the FS was damaged 
beyond anything you would normally try to fix once it no longer mounts. 
I was on Ext3 before that, and more than once a simple 
power loss during high server workload destroyed the filesystem beyond 
repair, with fsck only making it worse.

Since reiserfs did not scale well and the ext* filesystems had annoyed me more 
than once, we decided to go with XFS. While it tends to wipe some data after a 
power loss and leaves you with zero-filled files, it has proven extremely reliable 
even in the situations mentioned above, like a dying SCSI bus. Not to the 
extent reiserfs did, but still very satisfying. The big plus: it scales 
extremely well with parallel workloads and can be optimized for the stripe 
configuration of the underlying RAID layer. So I made it my default 
filesystem for the desktop, too. With the above-mentioned annoying "feature" of 
zeroing out recently touched files when the system crashes. But well, we 
all have proven backups, right? Yep, I also learned that lesson... *sigh*

But btrfs, when it was first announced and while I was already jealously looking 
at ZFS, seemed to be the FS of my choice, giving me flexible RAID setups, 
snapshots... I'm quite happy with it, although it feels slow sometimes. I 
simply threw more RAM at it - now it is okay.


> And as I said, keep your backups as current as you're willing to deal
> with losing what's not backed up, and tested usable and (for root) either
> bootable or restorable from alternate boot, because while at least btrfs
> is /reasonably/ stable for /ordinary/ daily use, there remain corner-
> cases and you never know when your case is going to BE a corner-case!

I've got a small rescue system I can boot which has btrfs-tools and a recent 
kernel, so I can flexibly repair, restore, or do whatever I want with my backup. 
My backup itself is not bootable (although it probably could be, if I changed 
some configuration files).

>> * I want to migrate my current storage to bcache without replaying a
>> backup.  Is it possible?
> 
> Since I've not actually used bcache, I won't try to answer some of these,
> but will answer based on what I've seen on the list where I can...  I
> don't know on this one.

I remember someone created some Python scripts to make it possible - wrt 
btrfs especially. Can't remember the link; maybe I'm able to dig it up. But 
at least I read it as: there's no support for that migration path 
in bcache itself. I had hoped otherwise...

>> * Did others already use it? What is the perceived performance for
>> desktop workloads in comparision to not using bcache?
> 
> Others are indeed already using it.  I've seen some btrfs/bcache problems
> reported on this list, but as mentioned above, when both are in use that
> means figuring out which is the

Re: Migrate to bcache: A few questions

2014-01-01 Thread Duncan
Austin S Hemmelgarn posted on Mon, 30 Dec 2013 11:02:21 -0500 as
excerpted:

> I've actually tried a simmilar configuration myself a couple of times
> (also using Gentoo in-fact), and I can tell you from experience that
> unless things have changed greatly since kernel 3.12.1, it really isn't
> worth the headaches.

Basically what I posted, but "now with added real experience!" (TM) =:^)

> As an alternative to using bcache, you might try something simmilar to
> the following:
> 64G SSD with /boot, /, and /usr Other HDD with /var, /usr/portage,
> /usr/src, and /home tmpfs or ramdisk for /tmp and /var/tmp
> This is essentially what I use now, and I have found that it
> significantly improves system performance.

Again, very similar to my own recommendation.  Nice to see others saying 
the same thing. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2014-01-01 Thread Austin S Hemmelgarn
On 12/30/2013 11:02 AM, Austin S Hemmelgarn wrote:
> 
> As an alternative to using bcache, you might try something simmilar to
> the following:
> 64G SSD with /boot, /, and /usr
> Other HDD with /var, /usr/portage, /usr/src, and /home
> tmpfs or ramdisk for /tmp and /var/tmp
> This is essentially what I use now, and I have found that it
> significantly improves system performance.
> 
On this specific note, I would actually suggest against putting the 
portage tree on btrfs: it makes syncing go ridiculously slow, 
and it seems to slow down emerge as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2014-01-02 Thread Duncan
Austin S Hemmelgarn posted on Wed, 01 Jan 2014 15:12:40 -0500 as
excerpted:

> On 12/30/2013 11:02 AM, Austin S Hemmelgarn wrote:
>> 
>> As an alternative to using bcache, you might try something simmilar to
>> the following:
>> 64G SSD with /boot, /, and /usr Other HDD with /var, /usr/portage,
>> /usr/src, and /home tmpfs or ramdisk for /tmp and /var/tmp
>> This is essentially what I use now, and I have found that it
>> significantly improves system performance.
>> 
> On this specific note, I would actually suggest against putting the
> portage tree on btrfs, it makes syncing go ridiculously slow,
> and it also seems to slow down emerge as well.

Interesting observation.

I had not seen it here (with the gentoo tree and overlays on btrfs), but 
that's very likely because all my btrfs are on SSD, as I upgraded to both 
at the same time, since my previous default filesystem choice, 
reiserfs, isn't well suited to SSDs due to the excessive writing caused by 
its journaling.

I do know slow syncs and portage dep-calculations were one of the reasons 
I switched to SSD (and thus btrfs), however.  That was getting pretty 
painful on spinning rust, at least with reiserfs.  And I imagine btrfs on 
single-device spinning rust would if anything be worse at least for 
syncs, due to the default dup metadata, meaning at least three writes 
(and three seeks) for each file, once for the data, twice for the 
metadata.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2014-01-02 Thread Austin S Hemmelgarn
On 2014-01-02 03:49, Duncan wrote:
> Austin S Hemmelgarn posted on Wed, 01 Jan 2014 15:12:40 -0500 as
> excerpted:
> 
>> On 12/30/2013 11:02 AM, Austin S Hemmelgarn wrote:
>>>
>>> As an alternative to using bcache, you might try something simmilar to
>>> the following:
>>> 64G SSD with /boot, /, and /usr Other HDD with /var, /usr/portage,
>>> /usr/src, and /home tmpfs or ramdisk for /tmp and /var/tmp
>>> This is essentially what I use now, and I have found that it
>>> significantly improves system performance.
>>>
>> On this specific note, I would actually suggest against putting the
>> portage tree on btrfs, it makes syncing go ridiculously slow,
>> and it also seems to slow down emerge as well.
> 
> Interesting observation.
> 
> I had not see it here (with the gentoo tree and overlays on btrfs), but 
> that's very likely because all my btrfs are on SSD, as I upgraded to both 
> at the same time, because my previous default filesystem choice, 
> reiserfs, isn't well suited to SSD due to excessive writing due to the 
> journaling.
> 
> I do know slow syncs and portage dep-calculations were one of the reasons 
> I switched to SSD (and thus btrfs), however.  That was getting pretty 
> painful on spinning rust, at least with reiserfs.  And I imagine btrfs on 
> single-device spinning rust would if anything be worse at least for 
> syncs, due to the default dup metadata, meaning at least three writes 
> (and three seeks) for each file, once for the data, twice for the 
> metadata.
> 
I think the triple seek+write is probably the biggest offender in my
case, although COW and autodefrag probably don't help either.  I'm kind
of hesitant to put stuff that gets changed daily on an SSD, so I've ended
up putting portage on ext4 with no journaling (which out-performs every
other filesystem I have tested WRT write performance).  As for the
dep-calculations, I have 16G of RAM, so I just use a script to read the
entire tree into the page cache after each sync.
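
As a hypothetical sketch of such a warm-up script (the tree path is only an
example), something along these lines would do:

   #!/bin/sh
   # read every file in the portage tree so it lands in the page cache
   find /usr/portage -type f -print0 | xargs -0 cat > /dev/null
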
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrate to bcache: A few questions

2014-01-02 Thread Duncan
Austin S Hemmelgarn posted on Thu, 02 Jan 2014 07:36:06 -0500 as
excerpted:

> I think the triple seek+write is probably the biggest offender in my
> case, although COW and autodefrag probably don't help either.  I'm kind
> of hesitant to put stuff that gets changed daily on a SSD, so I've ended
> up putting portage on ext4 with no journaling (which out-performs every
> other filesystem I have tested WRT write performance).

I went ahead with the gentoo tree and overlays on SSD, because... well, 
they need the fast access that SSD provides, and if I can't use the SSD 
for its good points, why did I buy it in the first place?

It's also worth noting that only a few files change on a day to day 
basis.  Most of the tree remains as it is.  Similarly with the git pack 
files behind the overlays (and live-git sources) -- once they reach a 
certain point they stop changing and all the changes go into a new file.

Further, most reports I've seen suggest that daily changes on some 
reasonably small part of an SSD aren't a huge problem... given wear-
leveling and an estimated normal lifetime of say three to five years 
before they're replaced with new hardware anyway, daily changes simply 
shouldn't be an issue.  It's worth keeping limited-write-cycles in mind 
and minimizing them where possible, but it's not quite the big thing a 
lot of people make it out to be.

Additionally, I'm near 100% overprovisioned, giving the SSDs lots of room 
to do that wear-leveling, so...

Meanwhile, are you using tail packing on that ext4?  The idea of wasting 
all that space due to all those small files has always been a downer for 
me and others, and is the reason many of us used reiserfs for many 
years.  I guess ext4 now does have tail packing or some similar solution, 
but I do wonder how much that tail packing affects performance.

Of course it'd also be possible to run reiserfs without tail packing, and 
even without journaling.  But somehow I always thought what was the point 
of running reiser, if those were disabled.

Anyway, I'd find it interesting to benchmark what the effect of 
tailpacking (or whatever ext4 calls it) on no-journal ext4 for the gentoo 
tree actually was.  If you happen to know, or happen to be inspired to 
run those benchmarks now, I'd be interested...

> As for the
> dep-calculations, I have 16G of ram, so I just use a script to read the
> entire tree into the page cache after each sync.

With 16 gig RAM, won't the sync have pulled everything into page-cache 
already?  That has always seemed to be the case here.  Running an emerge 
--deep --upgrade --newuse @world here after a sync shows very little disk 
activity and takes far less time than trying the same thing after an 
unmount/remount, thus cold-cache.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/35] bcache: set bi_op to REQ_OP

2016-02-24 Thread mchristi
From: Mike Christie 

This patch has bcache set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

This patch is compile tested only

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/debug.c |  2 ++
 drivers/md/bcache/io.c|  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 52b6bcf..8df9e66 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bch_bio_map(bio, sorted);
 
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
+   check->bi_op = REQ_OP_READ;
check->bi_rw |= READ_SYNC;
 
if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio->bi_op)
? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+   bio->bi_op  = REQ_OP_WRITE;
+   bio->bi_rw  = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
    bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
moving_

[PATCH 21/42] bcache: set bi_op to REQ_OP

2016-04-13 Thread mchristi
From: Mike Christie 

This patch has bcache use bio->bi_op for REQ_OPs and rq_flag_bits
to bio->bi_rw.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/debug.c |  2 ++
 drivers/md/bcache/io.c|  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 52b6bcf..8df9e66 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bch_bio_map(bio, sorted);
 
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
+   check->bi_op = REQ_OP_READ;
    check->bi_rw |= READ_SYNC;
 
if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio->bi_op)
    ? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+   bio->bi_op  = REQ_OP_WRITE;
+       bio->bi_rw  = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
    bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
moving_i

[PATCH 21/42] bcache: set bi_op to REQ_OP

2016-04-15 Thread mchristi
From: Mike Christie 

This patch has bcache use bio->bi_op for REQ_OPs and rq_flag_bits
to bio->bi_rw.

Signed-off-by: Mike Christie 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/debug.c |  2 ++
 drivers/md/bcache/io.c|  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 52b6bcf..8df9e66 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bch_bio_map(bio, sorted);
 
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
+   check->bi_op = REQ_OP_READ;
    check->bi_rw |= READ_SYNC;
 
if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio->bi_op)
    ? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+   bio->bi_op  = REQ_OP_WRITE;
+       bio->bi_rw  = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
    bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struc

[PATCH 21/35] bcache: set bi_op to REQ_OP

2016-01-05 Thread mchristi
From: Mike Christie 

This patch has bcache set the bio bi_op to a REQ_OP, and rq_flag_bits
to bi_rw.

This patch is compile tested only.

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/debug.c |  2 ++
 drivers/md/bcache/io.c|  2 +-
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   |  9 +
 drivers/md/bcache/super.c | 26 +++---
 drivers/md/bcache/writeback.c |  4 ++--
 8 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 22b9e34..752a44f 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index db68562..4c48783 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,6 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  |= REQ_META|READ_SYNC;
bch_bio_map(bio, sorted);
 
@@ -114,6 +115,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
+   check->bi_op = REQ_OP_READ;
check->bi_rw |= READ_SYNC;
 
if (bio_alloc_pages(check, GFP_NOIO))
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..f10a9a0 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio->bi_op)
? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..68fa0f0 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,7 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
@@ -452,7 +452,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,7 +626,8 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
+   bio->bi_op  = REQ_OP_WRITE;
+   bio->bi_rw  = REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
    bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..f33860a 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
moving_

Is btrfs on top of bcache stable now?

2015-04-20 Thread Marc MERLIN
On Mon, Apr 20, 2015 at 10:27:05AM +, Hugo Mills wrote:
>See the first issue here: https://btrfs.wiki.kernel.org/index.php/Gotchas

Hi Hugo, looking at the page again, I see 
"bcache + btrfs does not seem to be stable yet"
linking to a thread more than 2 years old, about btrfs kernels that
wouldn't have been stable even without bcache anyway.

I've seen others mention they switched to bcache recently and not seen
new "it's broken" reports.

So, is it ok
1) to assume bcache and btrfs play ok together now?
2) to remove the warning from that gotchas page?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 25/45] bcache: use bio op accessors

2016-06-05 Thread mchristi
From: Mike Christie 

Separate the op from the rq_flag_bits and have bcache
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/btree.c |  4 ++--
 drivers/md/bcache/debug.c |  4 ++--
 drivers/md/bcache/journal.c   |  7 ---
 drivers/md/bcache/movinggc.c  |  2 +-
 drivers/md/bcache/request.c   | 14 +++---
 drivers/md/bcache/super.c | 24 +---
 drivers/md/bcache/writeback.c |  4 ++--
 7 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index eab505e..76f7534 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -294,10 +294,10 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);
 
bio = bch_bbio_alloc(b->c);
-   bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
bio->bi_private = &cl;
+   bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC);
 
bch_bio_map(bio, b->keys.set[0].data);
 
@@ -396,8 +396,8 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
-   b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
+   bio_set_op_attrs(b->bio, REQ_OP_WRITE, REQ_META|WRITE_SYNC|REQ_FUA);
    bch_bio_map(b->bio, i);
 
/*
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 52b6bcf..c28df164 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -52,7 +52,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_bdev= PTR_CACHE(b->c, &b->key, 0)->bdev;
bio->bi_iter.bi_sector  = PTR_OFFSET(&b->key, 0);
bio->bi_iter.bi_size= KEY_SIZE(&v->key) << 9;
-   bio->bi_rw  = REQ_META|READ_SYNC;
+   bio_set_op_attrs(bio, REQ_OP_READ, REQ_META|READ_SYNC);
bch_bio_map(bio, sorted);
 
submit_bio_wait(bio);
@@ -114,7 +114,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
check = bio_clone(bio, GFP_NOIO);
if (!check)
return;
-   check->bi_rw |= READ_SYNC;
+   bio_set_op_attrs(check, REQ_OP_READ, READ_SYNC);
 
if (bio_alloc_pages(check, GFP_NOIO))
goto out_put;
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index af3f9f7..a3c3b30 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,11 +54,11 @@ reread: left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = READ;
bio->bi_iter.bi_size= len << 9;
 
bio->bi_end_io  = journal_read_endio;
bio->bi_private = &cl;
+   bio_set_op_attrs(bio, REQ_OP_READ, 0);
bch_bio_map(bio, data);
 
closure_bio_submit(bio, &cl);
@@ -449,10 +449,10 @@ static void do_journal_discard(struct cache *ca)
atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT);
 
bio_init(bio);
+   bio_set_op_attrs(bio, REQ_OP_DISCARD, 0);
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
bio->bi_iter.bi_size= bucket_bytes(ca);
@@ -626,11 +626,12 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
-   bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
bio->bi_iter.bi_size = sectors << 9;
 
bio->bi_end_io  = journal_write_endio;
bio->bi_private = w;
+   bio_set_op_attrs(bio, REQ_OP_WRITE,
+REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA);
bch_bio_map(bio, w->data);
 
trace_bcache_journal_write(bio);
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..1881319 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,7 +163,7 @@ static void read_moving(struct cache_set *c)
   

[PATCH 08/10] bcache: move closures to lib/

2018-05-18 Thread Kent Overstreet
Prep work for bcachefs - being a fork of bcache it also uses closures

Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/Kconfig  | 10 +-
 drivers/md/bcache/Makefile |  6 +++---
 drivers/md/bcache/bcache.h |  2 +-
 drivers/md/bcache/super.c  |  1 -
 drivers/md/bcache/util.h   |  3 +--
 {drivers/md/bcache => include/linux}/closure.h | 17 -
 lib/Kconfig|  3 +++
 lib/Kconfig.debug  |  9 +
 lib/Makefile   |  2 ++
 {drivers/md/bcache => lib}/closure.c   | 17 -
 10 files changed, 36 insertions(+), 34 deletions(-)
 rename {drivers/md/bcache => include/linux}/closure.h (97%)
 rename {drivers/md/bcache => lib}/closure.c (95%)

diff --git a/drivers/md/bcache/Kconfig b/drivers/md/bcache/Kconfig
index 4d200883c5..45f1094c08 100644
--- a/drivers/md/bcache/Kconfig
+++ b/drivers/md/bcache/Kconfig
@@ -1,6 +1,7 @@
 
 config BCACHE
tristate "Block device as cache"
+   select CLOSURES
---help---
Allows a block device to be used as cache for other devices; uses
a btree for indexing and the layout is optimized for SSDs.
@@ -15,12 +16,3 @@ config BCACHE_DEBUG
 
Enables extra debugging tools, allows expensive runtime checks to be
turned on.
-
-config BCACHE_CLOSURES_DEBUG
-   bool "Debug closures"
-   depends on BCACHE
-   select DEBUG_FS
-   ---help---
-   Keeps all active closures in a linked list and provides a debugfs
-   interface to list them, which makes it possible to see asynchronous
-   operations that get stuck.
diff --git a/drivers/md/bcache/Makefile b/drivers/md/bcache/Makefile
index d26b351958..2b790fb813 100644
--- a/drivers/md/bcache/Makefile
+++ b/drivers/md/bcache/Makefile
@@ -2,8 +2,8 @@
 
 obj-$(CONFIG_BCACHE)   += bcache.o
 
-bcache-y   := alloc.o bset.o btree.o closure.o debug.o extents.o\
-   io.o journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o\
-   util.o writeback.o
+bcache-y   := alloc.o bset.o btree.o debug.o extents.o io.o\
+   journal.o movinggc.o request.o stats.o super.o sysfs.o trace.o util.o\
+   writeback.o
 
 CFLAGS_request.o   += -Iblock
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 12e5197f18..d954dc44dd 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -180,6 +180,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,7 +192,6 @@
 
 #include "bset.h"
 #include "util.h"
-#include "closure.h"
 
 struct bucket {
atomic_t    pin;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f2273143b3..5f1ac8e0a3 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2148,7 +2148,6 @@ static int __init bcache_init(void)
mutex_init(&bch_register_lock);
init_waitqueue_head(&unregister_wait);
register_reboot_notifier(&reboot);
-   closure_debug_init();
 
bcache_major = register_blkdev(0, "bcache");
    if (bcache_major < 0) {
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index a6763db7f0..a75523ed0d 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -4,6 +4,7 @@
 #define _BCACHE_UTIL_H
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -12,8 +13,6 @@
 #include 
 #include 
 
-#include "closure.h"
-
 #define PAGE_SECTORS   (PAGE_SIZE / 512)
 
 struct closure;
diff --git a/drivers/md/bcache/closure.h b/include/linux/closure.h
similarity index 97%
rename from drivers/md/bcache/closure.h
rename to include/linux/closure.h
index 2392a46bcd..1072bf2c13 100644
--- a/drivers/md/bcache/closure.h
+++ b/include/linux/closure.h
@@ -154,7 +154,7 @@ struct closure {
 
atomic_tremaining;
 
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
 #define CLOSURE_MAGIC_DEAD 0xc054dead
 #define CLOSURE_MAGIC_ALIVE0xc054a11e
 
@@ -183,15 +183,13 @@ static inline void closure_sync(struct closure *cl)
__closure_sync(cl);
 }
 
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
 
-void closure_debug_init(void);
 void closure_debug_create(struct closure *cl);
 void closure_debug_destroy(struct closure *cl);
 
 #else
 
-static inline void closure_debug_init(void) {}
 static inline void closure_debug_create(struct closure *cl) {}
 static inline void closure_debug_destroy(struct closure *cl) {}
 
@@ -199,21 +197,21 @@ static inline void closure_debug_destroy(struct closure 
*cl) {}
 
 static inline void closure_set_ip(struct closure *cl)
 {
-#ifdef CONFIG_BCACHE_CLOSURES_DEBUG
+#ifdef CONFIG_DEBUG_CLOSURES
cl->ip = _THIS_IP_;
 #endif
 }

[PATCH 05/12] bcache: convert to bioset_init()/mempool_init()

2018-05-20 Thread Kent Overstreet
Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/bcache.h  | 10 +-
 drivers/md/bcache/bset.c| 13 -
 drivers/md/bcache/bset.h|  2 +-
 drivers/md/bcache/btree.c   |  4 ++--
 drivers/md/bcache/io.c  |  4 ++--
 drivers/md/bcache/request.c | 18 +-
 drivers/md/bcache/super.c   | 38 ++---
 7 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 3a0cfb237a..3050438761 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -269,7 +269,7 @@ struct bcache_device {
atomic_t*stripe_sectors_dirty;
unsigned long   *full_dirty_stripes;
 
-   struct bio_set  *bio_split;
+   struct bio_set  bio_split;
 
unsigneddata_csum:1;
 
@@ -528,9 +528,9 @@ struct cache_set {
struct closure  sb_write;
struct semaphoresb_write_mutex;
 
-   mempool_t   *search;
-   mempool_t   *bio_meta;
-   struct bio_set  *bio_split;
+   mempool_t   search;
+   mempool_t   bio_meta;
+   struct bio_set  bio_split;
 
/* For the btree cache */
struct shrinker shrink;
@@ -655,7 +655,7 @@ struct cache_set {
 * A btree node on disk could have too many bsets for an iterator to fit
 * on the stack - have to dynamically allocate them
 */
-   mempool_t   *fill_iter;
+   mempool_t   fill_iter;
 
struct bset_sort_state  sort;
 
diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
index 579c696a5f..f3403b45bc 100644
--- a/drivers/md/bcache/bset.c
+++ b/drivers/md/bcache/bset.c
@@ -1118,8 +1118,7 @@ struct bkey *bch_btree_iter_next_filter(struct btree_iter 
*iter,
 
 void bch_bset_sort_state_free(struct bset_sort_state *state)
 {
-   if (state->pool)
-   mempool_destroy(state->pool);
+   mempool_exit(&state->pool);
 }
 
 int bch_bset_sort_state_init(struct bset_sort_state *state, unsigned 
page_order)
@@ -1129,11 +1128,7 @@ int bch_bset_sort_state_init(struct bset_sort_state 
*state, unsigned page_order)
state->page_order = page_order;
state->crit_factor = int_sqrt(1 << page_order);
 
-   state->pool = mempool_create_page_pool(1, page_order);
-   if (!state->pool)
-   return -ENOMEM;
-
-   return 0;
+   return mempool_init_page_pool(&state->pool, 1, page_order);
 }
 EXPORT_SYMBOL(bch_bset_sort_state_init);
 
@@ -1191,7 +1186,7 @@ static void __btree_sort(struct btree_keys *b, struct 
btree_iter *iter,
 
BUG_ON(order > state->page_order);
 
-   outp = mempool_alloc(state->pool, GFP_NOIO);
+   outp = mempool_alloc(&state->pool, GFP_NOIO);
out = page_address(outp);
used_mempool = true;
order = state->page_order;
@@ -1220,7 +1215,7 @@ static void __btree_sort(struct btree_keys *b, struct 
btree_iter *iter,
}
 
if (used_mempool)
-   mempool_free(virt_to_page(out), state->pool);
+   mempool_free(virt_to_page(out), &state->pool);
else
    free_pages((unsigned long) out, order);
 
diff --git a/drivers/md/bcache/bset.h b/drivers/md/bcache/bset.h
index 0c24280f3b..b867f22004 100644
--- a/drivers/md/bcache/bset.h
+++ b/drivers/md/bcache/bset.h
@@ -347,7 +347,7 @@ static inline struct bkey *bch_bset_search(struct 
btree_keys *b,
 /* Sorting */
 
 struct bset_sort_state {
-   mempool_t   *pool;
+   mempool_t   pool;
 
unsignedpage_order;
    unsigned    crit_factor;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 17936b2dc7..2a0968c04e 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -204,7 +204,7 @@ void bch_btree_node_read_done(struct btree *b)
struct bset *i = btree_bset_first(b);
struct btree_iter *iter;
 
-   iter = mempool_alloc(b->c->fill_iter, GFP_NOIO);
+   iter = mempool_alloc(&b->c->fill_iter, GFP_NOIO);
iter->size = b->c->sb.bucket_size / b->c->sb.block_size;
iter->used = 0;
 
@@ -271,7 +271,7 @@ void bch_btree_node_read_done(struct btree *b)
bch_bset_init_next(&b->keys, write_block(b),
   bset_magic(&b->c->sb));
 out:
-   mempool_free(iter, b->c->fill_iter);
+   mempool_free(iter, &b->c->fill_iter);
return;
 err:
set_btree_node_io_error(b);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 2ddf8515e6..9612873afe 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -17

Re: [RFC PATCH 0/7] bcache: md conversion

2012-05-18 Thread Alex Elsayed
Dan Williams wrote:

> The consensus from LSF was that bcache need not invent a new interface
> when md and dm can both do the job.  As mentioned in patch 7 this series
> aims to be a minimal conversion.  Other refactoring items like
> deprecating register_lock for mddev->reconfig_mutex are deferred.
> 
> This supports assembly of an already established cache array:
> 
> mdadm -A /dev/md/bcache /dev/sd[ab]
> 
> ...will create the /dev/md/bcache container and a subarray representing
> the cache volume.  "Flash-only", or backing-device only volumes were not
> tested.  "Create" support and hot-add/hot-remove come later.
> 
> Note:
> * When attempting to test with small loopback devices (100MB), assembly
>   soft locks in bcache_journal_read().  That hang went away with larger
>   devices, so there seems to be minimum component device size that needs
>   to be considered in the tooling.

Is there any plan to separate the on-disk layout (per-device headers, etc) 
from the logic for the purpose of reuse? I can think of at least one case 
where this would be extremely useful: integration in BtrFS.

BtrFS already has its own methods for making sure a group of devices are all 
present when the filesystem is mounted, so it doesn't really need the 
formatting of the backing device bcache does to prevent it from being 
mounted solo. Putting bcache under BtrFS would be silly in the same way as 
putting it under a raid array, but bcache can't be put on top of BtrFS.

Logically, in looking at BtrFS' architecture, a cache would likely fit best 
at the 'block group' level, which IIUC would be roughly equivalent to the 
recommended 'over raid, under lvm' method of using bcache.



Re: Is btrfs on top of bcache stable now?

2015-04-20 Thread Fábio Pfeifer
I'm one of those who used to have problems with btrfs on top of bcache.
After some corruption, I gave up on that setup.

Recently (since February, I think) I gave it another shot, and I have
had no problems since. I use bcache in writeback mode, with very good
performance, and btrfs feels very stable in this setup.
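For reference, the writeback mode mentioned here is bcache's
per-backing-device cache_mode setting, switchable at runtime, e.g. (a
sketch with a placeholder device name):

  echo writeback > /sys/block/bcache0/bcache/cache_mode
  cat /sys/block/bcache0/bcache/cache_mode    # brackets mark the active mode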

Best Regards,

Fabio Pfeifer

2015-04-20 11:49 GMT-03:00 Marc MERLIN :
> On Mon, Apr 20, 2015 at 10:27:05AM +, Hugo Mills wrote:
>>See the first issue here: https://btrfs.wiki.kernel.org/index.php/Gotchas
>
> Hi Hugo, looking at the page again, I see
> "bcache + btrfs does not seem to be stable yet"
> linking to a thread more than 2 years old and btrfs kernels that
> wouldn't be stable without bcache anyway.
>
> I've seen others mention they switched to bcache recently and not seen
> new "it's broken" reports.
>
> So, is it ok
> 1) to assume bcache and btrfs play ok together now?
> 2) remove the warning from that gotchas page?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
> cooking
> Home page: http://marc.merlins.org/ | PGP 
> 1024R/763BE901


Re: [PATCH 25/45] bcache: use bio op accessors

2016-06-05 Thread Hannes Reinecke
On 06/05/2016 09:32 PM, mchri...@redhat.com wrote:
> From: Mike Christie 
> 
> Separate the op from the rq_flag_bits and have bcache
> set/get the bio using bio_set_op_attrs/bio_op.
> 
> Signed-off-by: Mike Christie 
> ---
>  drivers/md/bcache/btree.c |  4 ++--
>  drivers/md/bcache/debug.c     |  4 ++--
>  drivers/md/bcache/journal.c   |  7 ---
>  drivers/md/bcache/movinggc.c  |  2 +-
>  drivers/md/bcache/request.c   | 14 +++---
>  drivers/md/bcache/super.c | 24 +++++---
>  drivers/md/bcache/writeback.c |  4 ++--
>  7 files changed, 31 insertions(+), 28 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[RFC PATCH 19/37] bcache: use bio_init_fields in super

2021-01-18 Thread Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni 
---
 drivers/md/bcache/super.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a4752ac410dc..b4ced138a0c0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -312,9 +312,7 @@ void bch_write_bdev_super(struct cached_dev *dc, struct 
closure *parent)
closure_init(cl, parent);
 
bio_init(bio, dc->sb_bv, 1);
-   bio_set_dev(bio, dc->bdev);
-   bio->bi_end_io  = write_bdev_super_endio;
-   bio->bi_private = dc;
+   bio_init_fields(bio, dc->bdev, 0, dc, write_bdev_super_endio, 0, 0);
 
closure_get(cl);
/* I/O request sent to backing device */
@@ -356,9 +354,7 @@ void bcache_write_super(struct cache_set *c)
ca->sb.version = version;
 
bio_init(bio, ca->sb_bv, 1);
-   bio_set_dev(bio, ca->bdev);
-   bio->bi_end_io  = write_super_endio;
-   bio->bi_private = ca;
+   bio_init_fields(bio, ca->bdev, 0, ca, write_super_endio, 0, 0);
 
closure_get(cl);
__write_super(&ca->sb, ca->sb_disk, bio);
@@ -402,9 +398,7 @@ static void uuid_io(struct cache_set *c, int op, unsigned 
long op_flags,
 
bio->bi_opf = REQ_SYNC | REQ_META | op_flags;
bio->bi_iter.bi_size = KEY_SIZE(k) << 9;
-
-   bio->bi_end_io  = uuid_endio;
-   bio->bi_private = cl;
+   bio_init_fields(bio, NULL, 0, cl, uuid_endio, 0, 0);
bio_set_op_attrs(bio, op, REQ_SYNC|REQ_META|op_flags);
bch_bio_map(bio, c->uuids);
 
@@ -566,12 +560,9 @@ static void prio_io(struct cache *ca, uint64_t bucket, int 
op,
 
closure_init_stack(cl);
 
-   bio->bi_iter.bi_sector  = bucket * ca->sb.bucket_size;
-   bio_set_dev(bio, ca->bdev);
bio->bi_iter.bi_size= meta_bucket_bytes(&ca->sb);
-
-   bio->bi_end_io  = prio_endio;
-   bio->bi_private = ca;
+   bio_init_fields(bio, ca->bdev, bucket * ca->sb.bucket_size, ca,
+   prio_endio, 0, 0);
bio_set_op_attrs(bio, op, REQ_SYNC|REQ_META|op_flags);
bch_bio_map(bio, ca->disk_buckets);
 
-- 
2.22.1



[RFC PATCH 20/37] bcache: use bio_init_fields in writeback

2021-01-18 Thread Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni 
---
 drivers/md/bcache/writeback.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index a129e4d2707c..e2b769bbdb14 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -358,10 +358,8 @@ static void write_dirty(struct closure *cl)
if (KEY_DIRTY(&w->key)) {
dirty_init(w);
bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
-   io->bio.bi_iter.bi_sector = KEY_START(&w->key);
-   bio_set_dev(&io->bio, io->dc->bdev);
-   io->bio.bi_end_io   = dirty_endio;
-
+   bio_init_fields(&io->bio, io->dc->bdev, KEY_START(&w->key), 
NULL,
+   dirty_endio, 0, 0);
/* I/O request sent to backing device */
closure_bio_submit(io->dc->disk.c, &io->bio, cl);
}
@@ -471,10 +469,10 @@ static void read_dirty(struct cached_dev *dc)
 
dirty_init(w);
bio_set_op_attrs(&io->bio, REQ_OP_READ, 0);
-   io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
-   bio_set_dev(&io->bio,
-   PTR_CACHE(dc->disk.c, &w->key, 0)->bdev);
-   io->bio.bi_end_io   = read_dirty_endio;
+   bio_init_fields(&io->bio,
+   PTR_CACHE(dc->disk.c, &w->key, 0)->bdev,
+   PTR_OFFSET(&w->key, 0), NULL,
+   read_dirty_endio, 0, 0);
 
if (bch_bio_alloc_pages(&io->bio, GFP_KERNEL))
goto err_free;
-- 
2.22.1



[RFC PATCH 18/37] bcache: use bio_init_fields in journal

2021-01-18 Thread Chaitanya Kulkarni
Signed-off-by: Chaitanya Kulkarni 
---
 drivers/md/bcache/journal.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index aefbdb7e003b..0aabcb5cf2ad 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,12 +54,10 @@ reread: left = ca->sb.bucket_size - offset;
len = min_t(unsigned int, left, PAGE_SECTORS << JSET_BITS);
 
bio_reset(bio);
-   bio->bi_iter.bi_sector  = bucket + offset;
-   bio_set_dev(bio, ca->bdev);
+   bio_init_fields(bio, ca->bdev, bucket + offset,
+   &cl, journal_read_endio, 0, 0);
bio->bi_iter.bi_size= len << 9;
 
-   bio->bi_end_io  = journal_read_endio;
-   bio->bi_private = &cl;
bio_set_op_attrs(bio, REQ_OP_READ, 0);
bch_bio_map(bio, data);
 
@@ -588,6 +586,7 @@ static void do_journal_discard(struct cache *ca)
 {
struct journal_device *ja = &ca->journal;
struct bio *bio = &ja->discard_bio;
+   sector_t sect;
 
if (!ca->discard) {
ja->discard_idx = ja->last_idx;
@@ -613,12 +612,10 @@ static void do_journal_discard(struct cache *ca)
 
bio_init(bio, bio->bi_inline_vecs, 1);
bio_set_op_attrs(bio, REQ_OP_DISCARD, 0);
-   bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
-   ca->sb.d[ja->discard_idx]);
-   bio_set_dev(bio, ca->bdev);
bio->bi_iter.bi_size= bucket_bytes(ca);
-   bio->bi_end_io  = journal_discard_endio;
-
+   sect = bucket_to_sector(ca->set, ca->sb.d[ja->discard_idx]);
+   bio_init_fields(bio, ca->bdev, sect, NULL,
+   journal_discard_endio, 0, 0);
closure_get(&ca->set->cl);
INIT_WORK(&ja->discard_work, journal_discard_work);
queue_work(bch_journal_wq, &ja->discard_work);
@@ -774,12 +771,10 @@ static void journal_write_unlocked(struct closure *cl)
atomic_long_add(sectors, &ca->meta_sectors_written);
 
bio_reset(bio);
-   bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
-   bio_set_dev(bio, ca->bdev);
bio->bi_iter.bi_size = sectors << 9;
 
-   bio->bi_end_io  = journal_write_endio;
-   bio->bi_private = w;
+   bio_init_fields(bio, ca->bdev, PTR_OFFSET(k, i), w,
+   journal_write_endio, 0, 0);
bio_set_op_attrs(bio, REQ_OP_WRITE,
 REQ_SYNC|REQ_META|REQ_PREFLUSH|REQ_FUA);
bch_bio_map(bio, w->data);
-- 
2.22.1



corruption with multi-device btrfs + single bcache, won't mount

2019-02-09 Thread STEVE LEUNG
Hi all,

I decided to try something a bit crazy, and try multi-device raid1 btrfs on
top of dm-crypt and bcache.  That is:

  btrfs -> dm-crypt -> bcache -> physical disks

I have a single cache device in front of 4 disks.  Maybe this wasn't
that good of an idea, because the filesystem went read-only a few
days after setting it up, and now it won't mount.  I'd been running
btrfs on top of 4 dm-crypt-ed disks for some time without any
problems, and only added bcache (taking one device out at a time,
converting it over, adding it back) recently.
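For context, converting one member at a time generally follows the
shape sketched below. The dm-crypt steps here are assumptions, since
the post doesn't spell them out, and device names are placeholders:

  btrfs device remove /dev/mapper/crypt-sdb /mnt/raid   # shrink the raid1 by one member
  cryptsetup close crypt-sdb
  make-bcache -B /dev/sdb                               # raw disk becomes a bcache backing device
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
  cryptsetup luksFormat /dev/bcache0                    # recreate dm-crypt on top of bcache
  cryptsetup open /dev/bcache0 crypt-sdb
  btrfs device add /dev/mapper/crypt-sdb /mnt/raid      # add it back
  btrfs balance start /mnt/raid                         # rebalance onto the new member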

This was on Arch Linux x86-64, kernel 4.20.1.

dmesg from a mount attempt (using -o usebackuproot,nospace_cache,clear_cache):

  [  267.355024] BTRFS info (device dm-5): trying to use backup root at mount 
time
  [  267.355027] BTRFS info (device dm-5): force clearing of disk cache
  [  267.355030] BTRFS info (device dm-5): disabling disk space caching
  [  267.355032] BTRFS info (device dm-5): has skinny extents
  [  271.446808] BTRFS error (device dm-5): parent transid verify failed on 
13069706166272 wanted 4196588 found 4196585
  [  271.447485] BTRFS error (device dm-5): parent transid verify failed on 
13069706166272 wanted 4196588 found 4196585
  [  271.447491] BTRFS error (device dm-5): failed to read block groups: -5
  [  271.455868] BTRFS error (device dm-5): open_ctree failed

btrfs check:

  parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
  parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
  parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
  parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
  Ignoring transid failure
  ERROR: child eb corrupted: parent bytenr=13069708722176 item=7 parent level=2 
child level=0
  ERROR: cannot open file system

Any simple fix for the filesystem?  It'd be nice to recover the data
that's hopefully still intact.  I have some backups that I can dust
off if it really comes down to it, but it's more convenient to
recover the data in-place.

This is complete speculation, but I do wonder if having the single
cache device for multiple btrfs disks triggered the problem.

Thanks for any assistance.

Steve


Re: [PATCH 08/10] bcache: move closures to lib/

2018-05-18 Thread Christoph Hellwig
On Fri, May 18, 2018 at 03:49:13AM -0400, Kent Overstreet wrote:
> Prep work for bcachefs - being a fork of bcache it also uses closures

Hell no.  This code needs to go away and not actually be promoted to
lib/.


Re: [PATCH 05/12] bcache: convert to bioset_init()/mempool_init()

2018-05-20 Thread Coly Li
On 2018/5/21 6:25 AM, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet 

Hi Kent,

This change looks good to me,

Reviewed-by: Coly Li 

Thanks.

Coly Li

> ---
>  drivers/md/bcache/bcache.h  | 10 +-
>  drivers/md/bcache/bset.c| 13 -
>  drivers/md/bcache/bset.h|  2 +-
>  drivers/md/bcache/btree.c   |  4 ++--
>  drivers/md/bcache/io.c  |  4 ++--
>  drivers/md/bcache/request.c | 18 +-----
>  drivers/md/bcache/super.c   | 38 ++---
>  7 files changed, 37 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 3a0cfb237a..3050438761 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -269,7 +269,7 @@ struct bcache_device {
>   atomic_t*stripe_sectors_dirty;
>   unsigned long   *full_dirty_stripes;
>  
> - struct bio_set  *bio_split;
> + struct bio_set  bio_split;
>  
>   unsigneddata_csum:1;
>  
> @@ -528,9 +528,9 @@ struct cache_set {
>   struct closure  sb_write;
>   struct semaphoresb_write_mutex;
>  
> - mempool_t   *search;
> - mempool_t   *bio_meta;
> - struct bio_set  *bio_split;
> + mempool_t   search;
> + mempool_t   bio_meta;
> + struct bio_set  bio_split;
>  
>   /* For the btree cache */
>   struct shrinker shrink;
> @@ -655,7 +655,7 @@ struct cache_set {
>* A btree node on disk could have too many bsets for an iterator to fit
>* on the stack - have to dynamically allocate them
>*/
> - mempool_t   *fill_iter;
> + mempool_t   fill_iter;
>  
>   struct bset_sort_state  sort;
>  
> diff --git a/drivers/md/bcache/bset.c b/drivers/md/bcache/bset.c
> index 579c696a5f..f3403b45bc 100644
> --- a/drivers/md/bcache/bset.c
> +++ b/drivers/md/bcache/bset.c
> @@ -1118,8 +1118,7 @@ struct bkey *bch_btree_iter_next_filter(struct 
> btree_iter *iter,
>  
>  void bch_bset_sort_state_free(struct bset_sort_state *state)
>  {
> - if (state->pool)
> - mempool_destroy(state->pool);
> + mempool_exit(&state->pool);
>  }
>  
>  int bch_bset_sort_state_init(struct bset_sort_state *state, unsigned 
> page_order)
> @@ -1129,11 +1128,7 @@ int bch_bset_sort_state_init(struct bset_sort_state 
> *state, unsigned page_order)
>   state->page_order = page_order;
>   state->crit_factor = int_sqrt(1 << page_order);
>  
> - state->pool = mempool_create_page_pool(1, page_order);
> - if (!state->pool)
> - return -ENOMEM;
> -
> - return 0;
> + return mempool_init_page_pool(&state->pool, 1, page_order);
>  }
>  EXPORT_SYMBOL(bch_bset_sort_state_init);
>  
> @@ -1191,7 +1186,7 @@ static void __btree_sort(struct btree_keys *b, struct 
> btree_iter *iter,
>  
>   BUG_ON(order > state->page_order);
>  
> - outp = mempool_alloc(state->pool, GFP_NOIO);
> + outp = mempool_alloc(&state->pool, GFP_NOIO);
>   out = page_address(outp);
>   used_mempool = true;
>   order = state->page_order;
> @@ -1220,7 +1215,7 @@ static void __btree_sort(struct btree_keys *b, struct 
> btree_iter *iter,
>   }
>  
>   if (used_mempool)
> - mempool_free(virt_to_page(out), state->pool);
> + mempool_free(virt_to_page(out), &state->pool);
>   else
>   free_pages((unsigned long) out, order);
>  
> diff --git a/drivers/md/bcache/bset.h b/drivers/md/bcache/bset.h
> index 0c24280f3b..b867f22004 100644
> --- a/drivers/md/bcache/bset.h
> +++ b/drivers/md/bcache/bset.h
> @@ -347,7 +347,7 @@ static inline struct bkey *bch_bset_search(struct 
> btree_keys *b,
>  /* Sorting */
>  
>  struct bset_sort_state {
> - mempool_t   *pool;
> + mempool_t   pool;
>  
>   unsignedpage_order;
>   unsignedcrit_factor;
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 17936b2dc7..2a0968c04e 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -204,7 +204,7 @@ void bch_btree_node_read_done(struct btree *b)
>   struct bset *i = btree_bset_first(b);
>   struct btree_iter *iter;
>  
> - iter = mempool_alloc(b->c->fill_iter, GFP_NOIO);
> + iter = mempool_alloc(&b->c->fill_iter, GFP_NOIO);
>   iter->size = b->c-&g

[PATCH 8/9] bcache: generic_make_request() handles large bios now

2013-11-04 Thread Kent Overstreet
So we get to delete our hacky workaround.

Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/bcache.h|  18 
 drivers/md/bcache/io.c| 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c |  33 ++
 drivers/md/bcache/util.h  |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 include/linux/bio.h   |  12 -
 8 files changed, 19 insertions(+), 173 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 964353c..8f65331 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -241,19 +241,6 @@ struct keybuf {
DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-   struct bio_set  *bio_split;
-   mempool_t   *bio_split_hook;
-};
-
-struct bio_split_hook {
-   struct closure  cl;
-   struct bio_split_pool   *p;
-   struct bio  *bio;
-   bio_end_io_t*bi_end_io;
-   void*bi_private;
-};
-
 struct bcache_device {
struct closure  cl;
 
@@ -286,8 +273,6 @@ struct bcache_device {
int (*cache_miss)(struct btree *, struct search *,
  struct bio *, unsigned);
int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct io {
@@ -465,8 +450,6 @@ struct cache {
atomic_long_t   meta_sectors_written;
atomic_long_t   btree_sectors_written;
atomic_long_t   sectors_written;
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct gc_stat {
@@ -901,7 +884,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, 
const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, 
unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include 
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-   struct bio_vec bv;
-   struct bvec_iter iter;
-   unsigned ret = 0, seg = 0;
-
-   if (bio->bi_rw & REQ_DISCARD)
-   return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-   bio_for_each_segment(bv, bio, iter) {
-   struct bvec_merge_data bvm = {
-   .bi_bdev= bio->bi_bdev,
-   .bi_sector  = bio->bi_iter.bi_sector,
-   .bi_size= ret << 9,
-   .bi_rw  = bio->bi_rw,
-   };
-
-   if (seg == min_t(unsigned, BIO_MAX_PAGES,
-queue_max_segments(q)))
-   break;
-
-   if (q->merge_bvec_fn &&
-   q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-   break;
-
-   seg++;
-   ret += bv.bv_len >> 9;
-   }
-
-   ret = min(ret, queue_max_sectors(q));
-
-   WARN_ON(!ret);
-   ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-   return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   s->bio->bi_end_io = s->bi_end_io;
-   s->bio->bi_private = s->bi_private;
-   bio_endio_nodec(s->bio, 0);
-
-   closure_debug_destroy(&s->cl);
-   mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-   struct closure *cl = bio->bi_private;
-   struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-   if (error)
-   clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-   bio_put(bio);
-   closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-   struct bio_split_hook *s;
-   struct bio *n;
-
-   if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-   goto submit;
-
-   if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-   goto submit;
-
-   s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-   closure_init(&s->cl, NULL);
-
-   s->bio  = bio;
-   s->p= p;
-   s->bi_end_io= bio->bi_end_io;
-   s->bi_private   = bio->bi_private;
-

bcache with SSD instead of battery powered raid cards

2012-03-13 Thread Kiran Patil
Hi,

Is anybody using bcache with SSD instead of battery powered raid cards
with Btrfs ?

Hard drives are cheap and big, SSDs are fast but small and expensive.
Wouldn't it be nice if you could transparently get the advantages of
both? With Bcache, you can have your cake and eat it too.

Bcache is a patch for the Linux kernel to use SSDs to cache other
block devices. It's analogous to L2Arc for ZFS, but Bcache also does
writeback caching, and it's filesystem agnostic. It's designed to be
switched on with a minimum of effort, and to work well without
configuration on any setup. By default it won't cache sequential IO,
just the random reads and writes that SSDs excel at. It's meant to be
suitable for desktops, servers, high end storage arrays, and perhaps
even embedded.
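To make that concrete, a minimal bcache setup looks roughly like the
sketch below (commands from bcache-tools; device names and the UUID are
placeholders, and the writeback step is optional):

  make-bcache -B /dev/sdb        # backing HDD, shows up as /dev/bcache0
  make-bcache -C /dev/sdc        # SSD cache set
  echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach
  echo writeback > /sys/block/bcache0/bcache/cache_mode   # default is writethrough
  mkfs.btrfs /dev/bcache0        # filesystem goes on the cached device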

http://bcache.evilpiepirate.org/

http://news.gmane.org/gmane.linux.kernel.bcache.devel

Thanks,
Kiran.


Kernel 4.14 RAID5 multi disk array on bcache not mounting

2017-11-21 Thread Andy Leadbetter
I have a 4 disk array on top of 120GB bcache setup, arranged as follows

dev/sda1: UUID="42AE-12E3" TYPE="vfat" PARTLABEL="EFI System"
PARTUUID="d337c56a-fb0f-4e87-8d5f-a89122c81167"
/dev/sda2: UUID="06e3ce52-f34a-409a-a143-3c04f1d334ff" TYPE="ext4"
PARTLABEL="Linux filesystem"
PARTUUID="d2d3fa93-eebf-41ab-8162-d81722bf47ec"
/dev/sda4: UUID="b729c490-81f0-461f-baa2-977af9a7b6d9" TYPE="bcache"
PARTLABEL="Linux filesystem"
PARTUUID="84548857-f504-440a-857f-c0838c1eb83d"
/dev/sdb1: UUID="6016277c-143d-46b4-ae4e-8565ffc8158f" TYPE="swap"
PARTLABEL="Linux swap" PARTUUID="8692bf67-7271-4bf6-a623-b79d74093f2c"
/dev/sdb2: UUID="bc93c5e2-705a-4cbe-bcd9-7be1181163b2" TYPE="bcache"
PARTLABEL="Linux filesystem"
PARTUUID="662a450b-3592-4929-9647-8e8a1dedae69"
/dev/sdc1: UUID="9df21d4e-de02-4000-b684-5fb95d4d0492" TYPE="swap"
PARTLABEL="Linux swap" PARTUUID="ed9d7b8e-5480-4e70-b983-1a350ecae38a"
/dev/sdc2: UUID="7d8feaf6-aa6a-4b13-af49-0ad1bd1efb64" TYPE="bcache"
PARTLABEL="Linux filesystem"
PARTUUID="d343e23a-39ed-4061-80a2-55b66e20ecc1"
/dev/sdd1: UUID="18defba2-594b-402e-b3b2-8e38035c624d" TYPE="swap"
PARTLABEL="Linux swap" PARTUUID="fed9ffd6-0480-4496-8e6d-02d263d719b7"
/dev/sdd2: UUID="be0f0381-0d7e-46c9-ad04-01415bfc6f61" TYPE="bcache"
PARTLABEL="Linux filesystem"
PARTUUID="8f56de8a-105f-4d56-b699-59e1215b3c6b"
/dev/bcache32: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75"
UUID_SUB="731c31f1-51dd-477a-9bd1-fac73d0e6f69" TYPE="btrfs"
/dev/sde: UUID="05514ad3-d90a-4e90-aa11-7c6d34515ca2" TYPE="bcache"
/dev/bcache16: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75"
UUID_SUB="79cbcaf1-40b9-4954-a977-537ed3310e76" TYPE="btrfs"
/dev/bcache0: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75"
UUID_SUB="42d3a0dd-fbec-4318-9a5b-6d96aa1f6328" TYPE="btrfs"
/dev/bcache48: UUID="38d5de43-28fb-40a9-a535-dbf17ff52e75"
UUID_SUB="cb7018d6-a27d-493e-b41f-e45c64f6873a" TYPE="btrfs"
/dev/sda3: PARTUUID="d9fa3100-5044-4e10-9f2f-f8037786a43f"


ubuntu 17.10 with PPA Kernels up to 4.13.x all mount this array
perfectly, and the performance of the cache is as expected.

Upgraded today to 4.14.1 from their PPA, and now running btrfs dev scan
finds the btrfs filesystem devices bcache16 and bcache32, but bcache0
and bcache48 are not recognised, and thus the file system will not
mount.

According to bcache, all devices are present and attached to the cache
device correctly.

btrfs fi show on kernel 4.13 gives

Label: none  uuid: 38d5de43-28fb-40a9-a535-dbf17ff52e75
Total devices 4 FS bytes used 2.03TiB
devid1 size 1.82TiB used 1.07TiB path /dev/bcache16
devid2 size 1.82TiB used 1.07TiB path /dev/bcache32
devid3 size 1.82TiB used 1.07TiB path /dev/bcache0

devid4 size 1.82TiB used 1.07TiB path /dev/bcache48


Where do I start in debugging this?

btrfs-progs v4.12

btrfs fi df /

Data, RAID5: total=3.20TiB, used=2.02TiB
System, RAID5: total=192.00MiB, used=288.00KiB
Metadata, RAID5: total=6.09GiB, used=3.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

There are no errors in dmesg that I can see from the btrfs scan; the
two devices are simply not found.
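As a starting point, it may help to confirm what bcache itself reports
for each backing device and what btrfs sees after a fresh scan (a
sketch; device names are placeholders):

  cat /sys/block/bcache0/bcache/state    # should be "clean" or "dirty", not "no cache"
  bcache-super-show /dev/sdb2            # dump the bcache superblock (bcache-tools)
  btrfs device scan
  btrfs filesystem show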


Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-10 Thread Thiago Ramon
On Sun, Feb 10, 2019 at 5:07 AM STEVE LEUNG  wrote:
>
> Hi all,
>
> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
> top of dm-crypt and bcache.  That is:
>
>   btrfs -> dm-crypt -> bcache -> physical disks
>
> I have a single cache device in front of 4 disks.  Maybe this wasn't
> that good of an idea, because the filesystem went read-only a few
> days after setting it up, and now it won't mount.  I'd been running
> btrfs on top of 4 dm-crypt-ed disks for some time without any
> problems, and only added bcache (taking one device out at a time,
> converting it over, adding it back) recently.
>
> This was on Arch Linux x86-64, kernel 4.20.1.
>
> dmesg from a mount attempt (using -o usebackuproot,nospace_cache,clear_cache):
>
>   [  267.355024] BTRFS info (device dm-5): trying to use backup root at mount 
> time
>   [  267.355027] BTRFS info (device dm-5): force clearing of disk cache
>   [  267.355030] BTRFS info (device dm-5): disabling disk space caching
>   [  267.355032] BTRFS info (device dm-5): has skinny extents
>   [  271.446808] BTRFS error (device dm-5): parent transid verify failed on 
> 13069706166272 wanted 4196588 found 4196585
>   [  271.447485] BTRFS error (device dm-5): parent transid verify failed on 
> 13069706166272 wanted 4196588 found 4196585
>   [  271.447491] BTRFS error (device dm-5): failed to read block groups: -5
>   [  271.455868] BTRFS error (device dm-5): open_ctree failed
>
> btrfs check:
>
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   Ignoring transid failure
>   ERROR: child eb corrupted: parent bytenr=13069708722176 item=7 parent 
> level=2 child level=0
>   ERROR: cannot open file system
>
> Any simple fix for the filesystem?  It'd be nice to recover the data
> that's hopefully still intact.  I have some backups that I can dust
> off if it really comes down to it, but it's more convenient to
> recover the data in-place.
>
> This is complete speculation, but I do wonder if having the single
> cache device for multiple btrfs disks triggered the problem.
No, having a single cache device with multiple backing devices is the
most common way to use bcache. I've used a setup similar to yours for
a couple of years without problems (until it broke down recently due
to other issues).

Your current filesystem is probably too damaged to properly repair
right now (some other people here might be able to help with that),
but you probably haven't lost much of what's in there. You can dump
the files out with "btrfs restore", or you can use a patch to allow
you to mount the damaged filesystem read-only
(https://patchwork.kernel.org/patch/10738583/).

But before you try to restore anything, can you go back through your
kernel logs and check for errors? Either one of your devices is
failing, you have physical link issues, or you have bad memory. Even
with a complex setup like this you shouldn't be getting random
corruption.
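For example, something along these lines (a sketch; adjust device
names to your system):

  journalctl -k | grep -iE 'ata[0-9]|i/o error|btrfs'   # link resets, I/O errors, btrfs complaints
  smartctl -a /dev/sda                                  # per-drive SMART health (smartmontools)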

>
> Thanks for any assistance.
>
> Steve


Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-10 Thread Qu Wenruo


On 2019/2/10 下午2:56, STEVE LEUNG wrote:
> Hi all,
> 
> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
> top of dm-crypt and bcache.  That is:
> 
>   btrfs -> dm-crypt -> bcache -> physical disks
> 
> I have a single cache device in front of 4 disks.  Maybe this wasn't
> that good of an idea, because the filesystem went read-only a few
> days after setting it up, and now it won't mount.  I'd been running
> btrfs on top of 4 dm-crypt-ed disks for some time without any
> problems, and only added bcache (taking one device out at a time,
> converting it over, adding it back) recently.
> 
> This was on Arch Linux x86-64, kernel 4.20.1.
> 
> dmesg from a mount attempt (using -o usebackuproot,nospace_cache,clear_cache):
> 
>   [  267.355024] BTRFS info (device dm-5): trying to use backup root at mount 
> time
>   [  267.355027] BTRFS info (device dm-5): force clearing of disk cache
>   [  267.355030] BTRFS info (device dm-5): disabling disk space caching
>   [  267.355032] BTRFS info (device dm-5): has skinny extents
>   [  271.446808] BTRFS error (device dm-5): parent transid verify failed on 
> 13069706166272 wanted 4196588 found 4196585
>   [  271.447485] BTRFS error (device dm-5): parent transid verify failed on 
> 13069706166272 wanted 4196588 found 4196585

When this happens, there is no good way to completely recover (btrfs
check pass after the recovery) the fs.

We should enhance btrfs-progs to handle it, but it will take some time.

>   [  271.447491] BTRFS error (device dm-5): failed to read block groups: -5
>   [  271.455868] BTRFS error (device dm-5): open_ctree failed
> 
> btrfs check:
> 
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>   Ignoring transid failure
>   ERROR: child eb corrupted: parent bytenr=13069708722176 item=7 parent 
> level=2 child level=0
>   ERROR: cannot open file system
> 
> Any simple fix for the filesystem?  It'd be nice to recover the data
> that's hopefully still intact.  I have some backups that I can dust
> off if it really comes down to it, but it's more convenient to
> recover the data in-place.

However there is a patch to address this kinda "common" corruption scenario.

https://lwn.net/Articles/777265/

In that patchset, there is a new rescue=bg_skip mount option (needs to
be used with ro), which should allow you to access whatever you still
have from the fs.
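For concreteness, with that patchset applied the mount attempt would
look something like this (ro is mandatory together with rescue=bg_skip;
the device name follows the report above and is otherwise a placeholder):

  mount -o ro,rescue=bg_skip /dev/dm-5 /mnt/recovery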

From other reporters, such corruption is mainly related to extent tree,
thus data damage should be pretty small.

Thanks,
Qu

> 
> This is complete speculation, but I do wonder if having the single
> cache device for multiple btrfs disks triggered the problem.
> 
> Thanks for any assistance.
> 
> Steve
> 





Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-10 Thread STEVE LEUNG



- Original Message -
> From: "Thiago Ramon" 
> On Sun, Feb 10, 2019 at 5:07 AM STEVE LEUNG  wrote:
>>
>> Hi all,
>>
>> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
>> top of dm-crypt and bcache.  That is:
>>
>>   btrfs -> dm-crypt -> bcache -> physical disks
>>
>> I have a single cache device in front of 4 disks.  Maybe this wasn't
>> that good of an idea, because the filesystem went read-only a few
>> days after setting it up, and now it won't mount.  I'd been running
>> btrfs on top of 4 dm-crypt-ed disks for some time without any
>> problems, and only added bcache (taking one device out at a time,
>> converting it over, adding it back) recently.
>>
>> This is complete speculation, but I do wonder if having the single
>> cache device for multiple btrfs disks triggered the problem.
> 
> But before you try to restore anything, can you go back in your kernel
> logs and check for errors? Either one of your devices is failing, you
> might have physical link issues or bad memory. Even with a complex
> setup like this you shouldn't be getting random corruption like this.

Indeed, it looks like plugging in the 5th device for caching may have
destabilized things (maybe I'm drawing too much power from the power
supply or something), as I've observed some spurious ATA errors
trying to boot from rescue media.  Things seem to go back to normal
if I take the cache device out.

This hardware is old, but has seemed reliable enough. That said, this
is the second btrfs corruption I've run into (fortunately with no data
lost), so maybe the hardware is not as solid as I'd thought.

I guess I should have given it more of a shakedown before rolling out
bcache everywhere.  :)  Thanks for the insight.

Steve


Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-10 Thread STEVE LEUNG



- Original Message -
> From: "Qu Wenruo" 
> On 2019/2/10 下午2:56, STEVE LEUNG wrote:
>> Hi all,
>> 
>> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
>> top of dm-crypt and bcache.  That is:
>> 
>>   btrfs -> dm-crypt -> bcache -> physical disks
>> 
>> I have a single cache device in front of 4 disks.  Maybe this wasn't
>> that good of an idea, because the filesystem went read-only a few
>> days after setting it up, and now it won't mount.  I'd been running
>> btrfs on top of 4 dm-crypt-ed disks for some time without any
>> problems, and only added bcache (taking one device out at a time,
>> converting it over, adding it back) recently.
>> 
> However there is a patch to address this kinda "common" corruption scenario.
> 
> https://lwn.net/Articles/777265/
> 
> In that patchset, there is a new rescue=bg_skip mount option (needs to
> be used with ro), which should allow you to access whatever you still
> have from the fs.
> 
> From other reporters, such corruption is mainly related to extent tree,
> thus data damage should be pretty small.

I can also report that this patch has allowed me to recover the data.
The devices were apparently flaky with the addition of the cache
device to the system, which explains why the filesystem got
corrupted.

Thanks very much for the help!

Steve


Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-11 Thread Steve Leung



- Original Message -
> From: "Qu Wenruo" 
> To: "STEVE LEUNG" , linux-btrfs@vger.kernel.org
> Sent: Sunday, February 10, 2019 6:52:23 AM
> Subject: Re: corruption with multi-device btrfs + single bcache, won't mount

> - Original Message -
> From: "Qu Wenruo" 
> On 2019/2/10 下午2:56, STEVE LEUNG wrote:
>> Hi all,
>> 
>> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
>> top of dm-crypt and bcache.  That is:
>> 
>>   btrfs -> dm-crypt -> bcache -> physical disks
>> 
>> I have a single cache device in front of 4 disks.  Maybe this wasn't
>> that good of an idea, because the filesystem went read-only a few
>> days after setting it up, and now it won't mount.  I'd been running
>> btrfs on top of 4 dm-crypt-ed disks for some time without any
>> problems, and only added bcache (taking one device out at a time,
>> converting it over, adding it back) recently.
>> 
>> This was on Arch Linux x86-64, kernel 4.20.1.
>> 
>> dmesg from a mount attempt (using -o 
>> usebackuproot,nospace_cache,clear_cache):
>> 
>>   [  267.355024] BTRFS info (device dm-5): trying to use backup root at 
>> mount time
>>   [  267.355027] BTRFS info (device dm-5): force clearing of disk cache
>>   [  267.355030] BTRFS info (device dm-5): disabling disk space caching
>>   [  267.355032] BTRFS info (device dm-5): has skinny extents
>>   [  271.446808] BTRFS error (device dm-5): parent transid verify failed on
>>   13069706166272 wanted 4196588 found 4196585
>>   [  271.447485] BTRFS error (device dm-5): parent transid verify failed on
>>   13069706166272 wanted 4196588 found 4196585
> 
> When this happens, there is no good way to completely recover (btrfs
> check pass after the recovery) the fs.
> 
> We should enhance btrfs-progs to handle it, but it will take some time.
> 
>>   [  271.447491] BTRFS error (device dm-5): failed to read block groups: -5
>>   [  271.455868] BTRFS error (device dm-5): open_ctree failed
>> 
>> btrfs check:
>> 
>>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>>   parent transid verify failed on 13069706166272 wanted 4196588 found 4196585
>>   Ignoring transid failure
>>   ERROR: child eb corrupted: parent bytenr=13069708722176 item=7 parent 
>> level=2
>>   child level=0
>>   ERROR: cannot open file system
>> 
>> Any simple fix for the filesystem?  It'd be nice to recover the data
>> that's hopefully still intact.  I have some backups that I can dust
>> off if it really comes down to it, but it's more convenient to
>> recover the data in-place.
> 
> However there is a patch to address this kinda "common" corruption scenario.
> 
> https://lwn.net/Articles/777265/
> 
> In that patchset, there is a new rescue=bg_skip mount option (needs to
> be used with ro), which should allow you to access whatever you still
> have from the fs.
> 
> From other reporters, such corruption is mainly related to extent tree,
> thus data damage should be pretty small.

Ok I think I spoke too soon.  Some files are recoverable, but many
cannot be read.  Userspace gets back an I/O error, and the kernel log
reports similar parent transid verify failed errors, with what seem
to be similar generation numbers to what I saw in my original mount
error.

i.e. wants 4196588, found something that's off by usually 2 or 3.
Occasionally there's one that's off by about 1300.

There are multiple snapshots on this filesystem (going back a few
days), and the same file in each snapshot seems to be equally
affected, even if the file hasn't changed in many months.

Metadata seems to be intact - I can stat every file in one of the
snapshots and I don't get any errors back.

Any other ideas?  It kind of seems like "btrfs restore" would be
suitable here, but it sounds like it would need to be taught about
rescue=bg_skip first.

Thanks for all the help.  Even a partial recovery is a lot better
than what I was facing before.

Steve


Re: corruption with multi-device btrfs + single bcache, won't mount

2019-02-11 Thread Qu Wenruo


On 2019/2/12 下午2:22, Steve Leung wrote:
> 
> 
> - Original Message -
>> From: "Qu Wenruo" 
>> To: "STEVE LEUNG" , linux-btrfs@vger.kernel.org
>> Sent: Sunday, February 10, 2019 6:52:23 AM
>> Subject: Re: corruption with multi-device btrfs + single bcache, won't mount
> 
>> - Original Message -
>> From: "Qu Wenruo" 
>> On 2019/2/10 下午2:56, STEVE LEUNG wrote:
>>> Hi all,
>>>
>>> I decided to try something a bit crazy, and try multi-device raid1 btrfs on
>>> top of dm-crypt and bcache.  That is:
>>>
>>>   btrfs -> dm-crypt -> bcache -> physical disks
>>>
>>> I have a single cache device in front of 4 disks.  Maybe this wasn't
>>> that good of an idea, because the filesystem went read-only a few
>>> days after setting it up, and now it won't mount.  I'd been running
>>> btrfs on top of 4 dm-crypt-ed disks for some time without any
>>> problems, and only added bcache (taking one device out at a time,
>>> converting it over, adding it back) recently.
>>>
>>> This was on Arch Linux x86-64, kernel 4.20.1.
>>>
>>> dmesg from a mount attempt (using -o 
>>> usebackuproot,nospace_cache,clear_cache):
>>>
>>>   [  267.355024] BTRFS info (device dm-5): trying to use backup root at 
>>> mount time
>>>   [  267.355027] BTRFS info (device dm-5): force clearing of disk cache
>>>   [  267.355030] BTRFS info (device dm-5): disabling disk space caching
>>>   [  267.355032] BTRFS info (device dm-5): has skinny extents
>>>   [  271.446808] BTRFS error (device dm-5): parent transid verify failed on
>>>   13069706166272 wanted 4196588 found 4196585
>>>   [  271.447485] BTRFS error (device dm-5): parent transid verify failed on
>>>   13069706166272 wanted 4196588 found 4196585
>>
>> When this happens, there is no good way to completely recover (btrfs
>> check pass after the recovery) the fs.
>>
>> We should enhance btrfs-progs to handle it, but it will take some time.
>>
>>>   [  271.447491] BTRFS error (device dm-5): failed to read block groups: -5
>>>   [  271.455868] BTRFS error (device dm-5): open_ctree failed
>>>
>>> btrfs check:
>>>
>>>   parent transid verify failed on 13069706166272 wanted 4196588 found 
>>> 4196585
>>>   parent transid verify failed on 13069706166272 wanted 4196588 found 
>>> 4196585
>>>   parent transid verify failed on 13069706166272 wanted 4196588 found 
>>> 4196585
>>>   parent transid verify failed on 13069706166272 wanted 4196588 found 
>>> 4196585
>>>   Ignoring transid failure
>>>   ERROR: child eb corrupted: parent bytenr=13069708722176 item=7 parent 
>>> level=2
>>>   child level=0
>>>   ERROR: cannot open file system
>>>
>>> Any simple fix for the filesystem?  It'd be nice to recover the data
>>> that's hopefully still intact.  I have some backups that I can dust
>>> off if it really comes down to it, but it's more convenient to
>>> recover the data in-place.
>>
>> However there is a patch to address this kinda "common" corruption scenario.
>>
>> https://lwn.net/Articles/777265/
>>
>> In that patchset, there is a new rescue=bg_skip mount option (needs to
>> be used with ro), which should allow you to access whatever you still
>> have from the fs.
>>
>> From other reporters, such corruption is mainly related to extent tree,
>> thus data damage should be pretty small.
> 
> Ok I think I spoke too soon.  Some files are recoverable, but many
> cannot be read.  Userspace gets back an I/O error, and the kernel log
> reports similar parent transid verify failed errors, with what seem
> to be similar generation numbers to what I saw in my original mount
> error.
> 
> i.e. wants 4196588, found something that's off by usually 2 or 3.
> Occasionally there's one that's off by about 1300.

That's more or less expected for such transid corruption; the fs is
already screwed up.

The lowest generation you find among all these error messages could
indicate when the first corruption happened.
(And that may date back a long way.)

> 
> There are multiple snapshots on this filesystem (going back a few
> days), and the same file in each snapshot seems to be equally
> affected, even if the file hasn't changed in many months.
>
> Metadata seems to be intact - I can stat every file in one of the
> snapshots and I don't get any errors back.
> 
> Any other ideas?  It kind of seems like "btrfs restore" would be
> suitable here, but it sounds like it would need to be taught about
> rescue=bg_skip first.

Since btrfs-progs v4.16.1, btrfs restore should be able to ignore the
extent tree completely, so you can try btrfs restore.

btrfs restore may also do a little better here, since it can ignore
csum errors completely.
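A sketch of that, with placeholder paths (-i keeps restore going past
errors; add -s to also pull snapshots, -D for a dry run):

  btrfs restore -v -i /dev/dm-5 /mnt/backup-target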

Thanks,
Qu

> 
> Thanks for all the help.  Even a partial recovery is a lot better
> than what I was facing before.
> 
> Steve
> 





Re: bcache with SSD instead of battery powered raid cards

2012-03-20 Thread Justin Sharp

On 03/13/2012 04:06 AM, Kiran Patil wrote:

Hi,

Is anybody using bcache with SSD instead of battery powered raid cards
with Btrfs ?

Hard drives are cheap and big, SSDs are fast but small and expensive.
Wouldn't it be nice if you could transparently get the advantages of
both? With Bcache, you can have your cake and eat it too.

Bcache is a patch for the Linux kernel to use SSDs to cache other
block devices. It's analogous to L2Arc for ZFS, but Bcache also does
writeback caching, and it's filesystem agnostic. It's designed to be
switched on with a minimum of effort, and to work well without
configuration on any setup. By default it won't cache sequential IO,
just the random reads and writes that SSDs excel at. It's meant to be
suitable for desktops, servers, high end storage arrays, and perhaps
even embedded.

http://bcache.evilpiepirate.org/

Did you ever experiment with this? What results did you find?

There is also something similar called flashcache, developed by Facebook
engineers, that I'm interested in trying. They are supposedly using it
to speed up MySQL+InnoDB. It is out-of-tree code though, and I don't
think there is much of an effort to get it into mainline. It supports
writeback, writethrough and writearound (blocks are never cached on
write, only on read) caching. It uses the device mapper to combine your
'cache block' device with your 'slow spinning block' device, and then
you put your filesystem on top of that dm device.
https://github.com/facebook/flashcache


Regards,
--Justin


[PATCH 07/45] bcache: use op_is_write instead of checking for REQ_WRITE

2016-06-05 Thread mchristi
From: Mike Christie 

We currently set REQ_WRITE/WRITE for all non-READ IOs
like discard, flush, writesame, etc. In the next patches, where we
no longer set up the op as a bitmap, we will not be able to
detect the direction of an operation like writesame by testing whether
REQ_WRITE is set.

This has bcache use the op_is_write helper which will do the right
thing.

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/io.c  | 2 +-
 drivers/md/bcache/request.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..fd885cc 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct cache *ca = PTR_CACHE(c, &b->key, 0);
 
-   unsigned threshold = bio->bi_rw & REQ_WRITE
+   unsigned threshold = op_is_write(bio_op(bio))
? c->congested_write_threshold_us
: c->congested_read_threshold_us;
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 25fa844..6b85a23 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -383,7 +383,7 @@ static bool check_should_bypass(struct cached_dev *dc, 
struct bio *bio)
 
if (mode == CACHE_MODE_NONE ||
(mode == CACHE_MODE_WRITEAROUND &&
-(bio->bi_rw & REQ_WRITE)))
+op_is_write(bio_op(bio))))
goto skip;
 
if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
@@ -404,7 +404,7 @@ static bool check_should_bypass(struct cached_dev *dc, 
struct bio *bio)
 
if (!congested &&
mode == CACHE_MODE_WRITEBACK &&
-   (bio->bi_rw & REQ_WRITE) &&
+   op_is_write(bio_op(bio)) &&
(bio->bi_rw & REQ_SYNC))
goto rescale;
 
@@ -657,7 +657,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->cache_miss   = NULL;
s->d= d;
s->recoverable  = 1;
-   s->write= (bio->bi_rw & REQ_WRITE) != 0;
+   s->write= op_is_write(bio_op(bio));
s->read_dirty_data  = 0;
s->start_time   = jiffies;
 
-- 
2.7.2



[PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-15 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Acked-by: Coly Li 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-btrfs@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-de...@redhat.com
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..8517aebcda2d 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



[PATCH V11 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-20 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Acked-by: Coly Li 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



[PATCH V12 14/20] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-25 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Acked-by: Coly Li 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



Re: Kernel 4.14 RAID5 multi disk array on bcache not mounting

2017-11-21 Thread Lionel Bouton
On 21/11/2017 at 23:04, Andy Leadbetter wrote:
> I have a 4 disk array on top of 120GB bcache setup, arranged as follows
[...]
> Upgraded today to 4.14.1 from their PPA and the

4.14 and 4.14.1 have a nasty bug affecting bcache users. See for example:
https://www.reddit.com/r/linux/comments/7eh2oz/serious_regression_in_linux_414_using_bcache_can/

Lionel


Re: Kernel 4.14 RAID5 multi disk array on bcache not mounting

2017-11-22 Thread Holger Hoffstätte
On 11/21/17 23:22, Lionel Bouton wrote:
> On 21/11/2017 at 23:04, Andy Leadbetter wrote:
>> I have a 4 disk array on top of 120GB bcache setup, arranged as follows
> [...]
>> Upgraded today to 4.14.1 from their PPA and the
> 
> 4.14 and 4.14.1 have a nasty bug affecting bcache users. See for example:
> https://www.reddit.com/r/linux/comments/7eh2oz/serious_regression_in_linux_414_using_bcache_can/

4.14.2 (just out as rc1) will have the fix.

-h


[PATCH V13 13/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2019-01-11 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Acked-by: Coly Li 
Reviewed-by: Omar Sandoval 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



[PATCH V14 12/18] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2019-01-21 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Acked-by: Coly Li 
Reviewed-by: Omar Sandoval 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



[PATCH V15 12/18] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2019-02-15 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given it is the only case of this
kind, open-code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multipage bvecs.

Acked-by: Coly Li 
Reviewed-by: Omar Sandoval 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



Suggestion on reducing short kernel hangs from my btrfs filesystems: bcache?

2014-11-14 Thread Marc MERLIN
I have a server which runs zoneminder (video recording which is CPU and
disk IO intensive) while also doing a bunch of I/O over serial ports.

I have a dual-core
Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz
(4 virtual CPUs in /proc/cpuinfo)

It's pretty clear that when zoneminder is doing more work, my programs
that talk to serial ports start failing due to delays on the kernel side
and desynchronization, causing serial port protocol errors (I'm using
USB serial adapters, and use 12 of them).
I'm pretty sure it's because of delays in the kernel more than user
space, but can't prove that easily.

I have a preempt kernel, kernel 3.16.3:
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_DEBUG_PREEMPT=y

From what I can tell, things did get worse after I upgraded from ext4 to
btrfs (not counting times when I resync the software raid5 underneath
or run a btrfs scrub).

I may try to see if PREEMPT_VOLUNTARY might work better, but I'm thinking
putting an SSD in front of that mdadm RAID5 array will help by relieving
the IO load and hopefully giving the CPU more time to handle serial
port requests.
I'm actually not sure if my issue is btrfs interrupting serial port
connections due to PREEMPT, or if serial port connections aren't being
serviced quickly enough because the kernel is busy with btrfs and PREEMPT
hasn't kicked in yet.

From reading the list, bcache may work with btrfs, but before I try
that, I was curious whether there are other or better ways to use an SSD
to reduce the impact btrfs has on my server?
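
If bcache turns out to be the way to go, a rough sketch for fronting the md
array with an SSD (device names are placeholders; make-bcache reformats the
backing device, so the btrfs filesystem would have to be recreated on
/dev/bcache0 and restored from backup):

# make-bcache -C /dev/sdf1 -B /dev/md0
# echo writeback > /sys/block/bcache0/bcache/cache_mode
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff

sequential_cutoff keeps big streaming writes (such as the video recordings)
on the spinning array, leaving the SSD for the random IO that seems to be
causing the latency.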

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: [PATCH 07/45] bcache: use op_is_write instead of checking for REQ_WRITE

2016-06-05 Thread Hannes Reinecke
On 06/05/2016 09:31 PM, mchri...@redhat.com wrote:
> From: Mike Christie 
> 
> We currently set REQ_WRITE/WRITE for all non-READ IOs
> like discard, flush, writesame, etc. In the next patches, where we
> no longer set up the op as a bitmap, we will not be able to
> detect the direction of an operation like writesame by testing whether
> REQ_WRITE is set.
> 
> This has bcache use the op_is_write helper which will do the right
> thing.
> 
> Signed-off-by: Mike Christie 
> ---
>  drivers/md/bcache/io.c  | 2 +-
>  drivers/md/bcache/request.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
(Could probably be folded together with the two previous patches)

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke    Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 04:44:02PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:58PM +0800, Ming Lei wrote:
> > bch_bio_alloc_pages() is always called on a new bio, so it is safe
> > to access the bvec table directly. Given it is the only case of this
> > kind, open-code the bvec table access, since bio_for_each_segment_all()
> > will be changed to support iterating over multipage bvecs.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Acked-by: Coly Li 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-btrfs@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-de...@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  drivers/md/bcache/util.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> > index 20eddeac1531..8517aebcda2d 100644
> > --- a/drivers/md/bcache/util.c
> > +++ b/drivers/md/bcache/util.c
> > @@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
> > int i;
> > struct bio_vec *bv;
> >  
> > -   bio_for_each_segment_all(bv, bio, i) {
> > +   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
> 
> This is missing an i++.

Good catch, will fix it in next version.

thanks,
Ming


Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:46:45PM +0100, Christoph Hellwig wrote:
> > -   bio_for_each_segment_all(bv, bio, i) {
> > +   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
> 
> This really needs a comment.  Otherwise it looks fine to me.

OK, will do it in next version.

Thanks,
Ming


Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-15 Thread Omar Sandoval
On Thu, Nov 15, 2018 at 04:52:58PM +0800, Ming Lei wrote:
> bch_bio_alloc_pages() is always called on a new bio, so it is safe
> to access the bvec table directly. Given it is the only case of this
> kind, open-code the bvec table access, since bio_for_each_segment_all()
> will be changed to support iterating over multipage bvecs.
> 
> Cc: Dave Chinner 
> Cc: Kent Overstreet 
> Acked-by: Coly Li 
> Cc: Mike Snitzer 
> Cc: dm-de...@redhat.com
> Cc: Alexander Viro 
> Cc: linux-fsde...@vger.kernel.org
> Cc: Shaohua Li 
> Cc: linux-r...@vger.kernel.org
> Cc: linux-er...@lists.ozlabs.org
> Cc: David Sterba 
> Cc: linux-btrfs@vger.kernel.org
> Cc: Darrick J. Wong 
> Cc: linux-...@vger.kernel.org
> Cc: Gao Xiang 
> Cc: Christoph Hellwig 
> Cc: Theodore Ts'o 
> Cc: linux-e...@vger.kernel.org
> Cc: Coly Li 
> Cc: linux-bca...@vger.kernel.org
> Cc: Boaz Harrosh 
> Cc: Bob Peterson 
> Cc: cluster-de...@redhat.com
> Signed-off-by: Ming Lei 
> ---
>  drivers/md/bcache/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index 20eddeac1531..8517aebcda2d 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
>   int i;
>   struct bio_vec *bv;
>  
> - bio_for_each_segment_all(bv, bio, i) {
> + for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {

This is missing an i++.


Re: [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-16 Thread Christoph Hellwig
> - bio_for_each_segment_all(bv, bio, i) {
> + for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {

This really needs a comment.  Otherwise it looks fine to me.


Re: [PATCH V11 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-21 Thread Christoph Hellwig
On Wed, Nov 21, 2018 at 11:23:19AM +0800, Ming Lei wrote:
> bch_bio_alloc_pages() is always called on a new bio, so it is safe
> to access the bvec table directly. Given it is the only case of this
> kind, open-code the bvec table access, since bio_for_each_segment_all()
> will be changed to support iterating over multipage bvecs.

Looks good,

Reviewed-by: Christoph Hellwig 


  1   2   >