question about creating a raid10
Hello,

if I create a raid10 it looks like this:

mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

But if I have different JBODs and I want every mirror of the raid10 to be on a different JBOD, how can I achieve that? In ZFS it looks like this:

zpool create -o ashift=12 nc_storage mirror j1d03-hdd j2d03-hdd mirror j1d04-hdd j2d04-hdd

zpool status
  pool: nc_storage
 state: ONLINE
  scan: scrub repaired 0B in 1h23m with 0 errors on Tue Jan 15 05:38:54 2019
config:

        NAME           STATE     READ WRITE CKSUM
        nc_storage     ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            j1d03-hdd  ONLINE       0     0     0
            j2d03-hdd  ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            j1d04-hdd  ONLINE       0     0     0
            j2d04-hdd  ONLINE       0     0     0

How can I be sure that btrfs does the same?

best regards
Stefan
Re: question about creating a raid10
:(

That means when one JBOD fails there is no guarantee that it keeps working, like in ZFS? Well, that sucks. Didn't anyone think to program it that way?

On Wednesday, January 16, 2019 2:42:08 PM CET Hugo Mills wrote:
> On Wed, Jan 16, 2019 at 03:36:25PM +0100, Stefan K wrote:
> > Hello,
> >
> > if I create a raid10 it looks like that:
> > mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
> >
> > but if I've different jbods and I want that every mirror of a raid10 is on
> > a different jbod how can I archive that? in zfs it looks like that:
> [snip]
> > how can I be sure that is btrfs the same?
>
>    I'm afraid you can't. It would take modifications of the chunk
> allocator to achieve this (and you'd also need to store the metadata
> somewhere as to which devices were in which failure domain).
>
>    Hugo.
roadmap for btrfs
Hello,

is there a roadmap or something like a "what to do first/next"? I saw the project ideas [1] and there are a lot of interesting things in it (like read/write caches, per-subvolume mount options, block devices, etc.), but there is no plan or ordering of the ideas. Does btrfs have something like that?

[1] https://btrfs.wiki.kernel.org/index.php/Project_ideas

best regards
Stefan
Re: question about creating a raid10
> Btrfs raid10 really should not be called raid10. It sets up the wrong
> user expectation entirely. It's more like raid0+1, except even that is
> deceptive because in theory a legit raid0+1 you can lose multiple
> drives on one side of the mirror (but not both); but with Btrfs raid10
> you really can't lose more than one drive. And therefore it does not
> scale. The probability of downtime increases as drives are added;
> whereas with a real raid10 downtime doesn't change.

WTF?! Really, so with btrfs raid10 I can't lose more than one drive? That sucks - that's the whole advantage of raid10! And the crazy thing is that this is not documented, neither in the manpage nor in the btrfs wiki, and it is very important. That's unbelievable...

> In your case you're better off with raid0'ing the two drives in each
> enclosure (whether it's a feature of the enclosure or doing it with
> mdadm or LVM). And then using Btrfs raid1 on top of the resulting
> virtual block devices. Or do mdadm/LVM raid10, and format it Btrfs.

mdadm, LVM... btrfs was supposed to be a reason not to use those programs, but since btrfs does not have a 'real' raid10 but rather a raid0+1 it does not fit our use case - plus I can't configure which disk ends up in which mirror.

On Wednesday, January 16, 2019 11:15:02 AM CET Chris Murphy wrote:
> On Wed, Jan 16, 2019 at 7:58 AM Stefan K wrote:
> >
> > :(
> > that means when one jbod fail its there is no guarantee that it works fine?
> > like in zfs? well that sucks
> > Didn't anyone think to program it that way?
>
> The mirroring is a function of the block group, not the block device.
> And yes that's part of the intentional design and why it's so
> flexible. A real raid10 isn't as flexible, so to enforce the
> allocation of specific block group stripes to specific block devices
> would add complexity to the allocator while reducing flexibility. It's
> not impossible, it'd just come with caveats like no three device
> raid10 like now; and you'd have to figure out what to do if the user
> adds one new device instead of two at a time, and what if any new
> device isn't the same size as existing devices or if you add two
> devices that aren't the same size. Do you refuse to add such devices?
> What limitations do we run into when rebalancing? It's way more
> complicated.
>
> Btrfs raid10 really should not be called raid10. It sets up the wrong
> user expectation entirely. It's more like raid0+1, except even that is
> deceptive because in theory a legit raid0+1 you can lose multiple
> drives on one side of the mirror (but not both); but with Btrfs raid10
> you really can't lose more than one drive. And therefore it does not
> scale. The probability of downtime increases as drives are added;
> whereas with a real raid10 downtime doesn't change.
>
> In your case you're better off with raid0'ing the two drives in each
> enclosure (whether it's a feature of the enclosure or doing it with
> mdadm or LVM). And then using Btrfs raid1 on top of the resulting
> virtual block devices. Or do mdadm/LVM raid10, and format it Btrfs. Or
> yeah, use ZFS.
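For what it's worth, the layered setup Chris describes could be sketched roughly like this (an untested illustration only; the md array numbers and the per-JBOD device names are made up, and two disks per enclosure are assumed):

    # one raid0 per JBOD (two drives each)
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/j1d03 /dev/j1d04
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/j2d03 /dev/j2d04
    # then mirror across the JBODs with btrfs
    mkfs.btrfs -m raid1 -d raid1 /dev/md0 /dev/md1

With this layout the failure domains are the md devices, so losing a whole JBOD only takes out one side of the btrfs raid1.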
kernel calltraces with btrfs and bonnie++
Hello,

if I run 'bonnie++ -c4' the system is unusable and hangs, and I also get some call traces in my syslog. Is that normal behavior?

My system is:

uname -a
Linux tani 4.19.0-0.bpo.1-amd64 #1 SMP Debian 4.19.12-1~bpo9+1 (2018-12-30) x86_64 GNU/Linux

btrfs fi sh
Label: none  uuid: 24be286b-ece6-4481-aa48-af255e96e5bd
        Total devices 2 FS bytes used 128.89GiB
        devid    1 size 219.84GiB used 131.03GiB path /dev/sdb2
        devid    2 size 219.84GiB used 131.03GiB path /dev/sde2

Both are new SSDs:

smartctl -i /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.19.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7LM240HMHQ-5
Serial Number:    S2TWNX0KA02412
LU WWN Device Id: 5 002538 c40b988bf
Firmware Version: GXT5404Q
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jan 25 08:37:49 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If I run this I get the following output in my /var/log/syslog:

Jan 25 08:19:20 tani kernel: [  480.733545] WARNING: CPU: 8 PID: 8564 at /build/linux-Ut6wTa/linux-4.19.12/fs/btrfs/ctree.h:1588 btrfs_update_device+0x1b2/0x1c0 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733546] Modules linked in: intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr dm_service_time zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) ast ttm drm_kms_helper drm mei_me ipmi_si ioatdma ipmi_devintf iTCO_wdt sg joydev i2c_algo_bit evdev iTCO_vendor_support lpc_ich mei dca wmi ipmi_msghandler acpi_power_meter acpi_pad pcc_cpufreq button dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip_tables x_tables autofs4 btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic ses enclosure sd_mod hid_generic usbhid hid crc32c_intel ahci mpt3sas libahci aesni_intel xhci_pci aes_x86_64 libata
Jan 25 08:19:20 tani kernel: [  480.733582]  raid_class crypto_simd xhci_hcd scsi_transport_sas cryptd glue_helper i40e scsi_mod usbcore i2c_i801 usb_common
Jan 25 08:19:20 tani kernel: [  480.733591] CPU: 8 PID: 8564 Comm: bonnie++ Tainted: P OE 4.19.0-0.bpo.1-amd64 #1 Debian 4.19.12-1~bpo9+1
Jan 25 08:19:20 tani kernel: [  480.733591] Hardware name: Supermicro Super Server/X11DPH-i, BIOS 2.1 06/15/2018
Jan 25 08:19:20 tani kernel: [  480.733605] RIP: 0010:btrfs_update_device+0x1b2/0x1c0 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733606] Code: 89 f7 45 31 c0 ba 10 00 00 00 4c 89 ee e8 e6 27 ff ff 4c 89 f7 e8 fe f7 fc ff e9 de fe ff ff 41 bc f4 ff ff ff e9 db fe ff ff <0f> 0b eb b7 e8 25 d5 89 e7 0f 1f 44 00 00 0f 1f 44 00 00 41 55 41
Jan 25 08:19:20 tani kernel: [  480.733607] RSP: 0018:b6be475bfab0 EFLAGS: 00010206
Jan 25 08:19:20 tani kernel: [  480.733608] RAX: 0fff RBX: 96b769786bd0 RCX: 0036f60ffc00
Jan 25 08:19:20 tani kernel: [  480.733609] RDX: 1000 RSI: 3f5c RDI: 96b6d6b76f50
Jan 25 08:19:20 tani kernel: [  480.733609] RBP: 96b729a2f000 R08: b6be475bfa60 R09: b6be475bfa68
Jan 25 08:19:20 tani kernel: [  480.733610] R10: 0003 R11: 3000 R12:
Jan 25 08:19:20 tani kernel: [  480.733611] R13: 3f3c R14: 96b6d6b76f50 R15: fff4
Jan 25 08:19:20 tani kernel: [  480.733612] FS: 7f3451aeb740() GS:96b77fc0() knlGS:
Jan 25 08:19:20 tani kernel: [  480.733613] CS: 0010 DS: ES: CR0: 80050033
Jan 25 08:19:20 tani kernel: [  480.733614] CR2: 565195079000 CR3: 003f25eec001 CR4: 007606e0
Jan 25 08:19:20 tani kernel: [  480.733615] DR0: DR1: DR2:
Jan 25 08:19:20 tani kernel: [  480.733616] DR3: DR6: fffe0ff0 DR7: 0400
Jan 25 08:19:20 tani kernel: [  480.733616] PKRU: 5554
Jan 25 08:19:20 tani kernel: [  480.733617] Call Trace:
Jan 25 08:19:20 tani kernel: [  480.733632]  btrfs_finish_chunk_alloc+0x12d/0x4b0 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733643]  ? btrfs_create_pending_block_groups+0xec/0x240 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733652]  btrfs_create_pending_block_groups+0xec/0x240 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733664]  __btrfs_end_transaction+0x87/0x2c0 [btrfs]
Jan 25 08:19:20 tani kernel: [  480.733673]  btrfs_alloc_data_chunk_ondemand+0xf8/0x300 [btrfs]
Re: kernel calltraces with btrfs and bonnie++
So a simple 'btrfs fi resize -4k /' will do the trick?

On Friday, January 25, 2019 3:51:12 PM CET Qu Wenruo wrote:
>
> On 2019/1/25 3:44 PM, Stefan K wrote:
> > since it is my /-root FS its not possible to do that online?
> >
> >
> >
> >> You could resize the fs by -4K and it should make the warning disappear.
> And since you have 2 devices, you need to resize each partition by -4K.
>
> Thanks,
> Qu
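Since the filesystem has two devices, Qu's suggestion presumably means shrinking each device individually; a minimal sketch, assuming devids 1 and 2 (as shown by 'btrfs fi sh') and the filesystem mounted at /:

    # shrink each member device of the mounted filesystem by 4 KiB
    btrfs filesystem resize 1:-4k /
    btrfs filesystem resize 2:-4k /

Resize works on a mounted filesystem, so this should be possible online even for the root filesystem.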
btrfs as / filesystem in RAID1
Hello,

I've installed my Debian Stretch with / on btrfs in raid1 across 2 SSDs. Today I wanted to test whether it actually works. It works fine as long as the server is running when the SSD breaks - I can replace it - but it looks like it does not work if the SSD has already failed before a restart. I got the error that one of the disks can't be read and was dropped into an initramfs prompt; I expected it to keep running like mdraid and just report that something is missing.

My question is: can btrfs/fstab/grub be configured so that it still boots? (That is what I expect from a RAID1.)

best regards
Stefan
Re: btrfs as / filesystem in RAID1
Thanks, with 'degraded' as a kernel parameter and also in the fstab it works as expected.

That should be the normal behaviour, because a server must be up and running, and I don't care about a device loss - that's why I use a RAID1. The device-loss problem I can fix later, but it's important that the server is up and running. I get informed at boot time and in the log files that a device is missing, and I also see it if I use a monitoring program. So please change the default behavior.

On Friday, February 1, 2019 7:13:16 PM CET Hans van Kranenburg wrote:
> Hi Stefan,
>
> On 2/1/19 11:28 AM, Stefan K wrote:
> >
> > I've installed my Debian Stretch to have / on btrfs with raid1 on 2
> > SSDs. Today I want test if it works, it works fine until the server
> > is running and the SSD get broken and I can change this, but it looks
> > like that it does not work if the SSD fails until restart. I got the
> > error, that one of the Disks can't be read and I got a initramfs
> > prompt, I expected that it still runs like mdraid and said something
> > is missing.
> >
> > My question is, is it possible to configure btrfs/fstab/grub that it
> > still boot? (that is what I expected from a RAID1)
>
> Yes. I'm not the expert in this area, but I see you haven't got a reply
> today yet, so I'll try.
>
> What you see happening is correct. This is the default behavior.
>
> To be able to boot into your system with a missing disk, you can add...
>   rootflags=degraded
> ...to the linux kernel command line by editing it on the fly when you
> are in the GRUB menu.
>
> This allows the filesystem to start in 'degraded' mode this one time.
> The only thing you should be doing when the system is booted is have a
> new disk present already in place and fix the btrfs situation. This
> means things like cloning the partition table of the disk that's still
> working, doing whatever else is needed in your situation and then
> running btrfs replace to replace the missing disk with the new one, and
> then making sure you don't have "single" block groups left (using btrfs
> balance), which might have been created for new writes when the
> filesystem was running in degraded mode.
>
> --
> Hans van Kranenburg
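For reference, a setup like the one Stefan describes would look roughly like this (a sketch only; the UUID is a placeholder and 'update-grub' assumes Debian). Note that later replies in this thread argue against leaving 'degraded' enabled permanently:

    # /etc/fstab - allow a degraded mount of the btrfs root
    UUID=<filesystem-uuid>  /  btrfs  defaults,degraded  0  0

    # /etc/default/grub - pass the same flag to the kernel
    GRUB_CMDLINE_LINUX="rootflags=degraded"

    # regenerate the grub configuration afterwards
    update-grub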
Re: btrfs as / filesystem in RAID1
> * Normal desktop users _never_ look at the log files or boot info, and
> rarely run monitoring programs, so they as a general rule won't notice
> until it's already too late. BTRFS isn't just a server filesystem, so
> it needs to be safe for regular users too.

I guess a normal desktop user wouldn't create a RAID1 or any other RAID setup, right? So an admin takes care of a RAID and monitors it (no matter whether it's a hardware raid, mdraid, a zfs raid or whatever), and 'degraded' only matters for RAID setups - it's not relevant for single-disk usage, right?

> Also, LVM and MD have the exact same issue, it's just not as significant
> because they re-add and re-sync missing devices automatically when they
> reappear, which makes such split-brain scenarios much less likely.

Why doesn't btrfs do that?

On Thursday, February 7, 2019 2:39:34 PM CET Austin S. Hemmelgarn wrote:
> On 2019-02-07 13:53, waxhead wrote:
> >
> >
> > Austin S. Hemmelgarn wrote:
> >> On 2019-02-07 06:04, Stefan K wrote:
> >>> Thanks, with degraded as kernel parameter and also ind the fstab it
> >>> works like expected
> >>>
> >>> That should be the normal behaviour, cause a server must be up and
> >>> running, and I don't care about a device loss, thats why I use a
> >>> RAID1. The device-loss problem can I fix later, but its important
> >>> that a server is up and running, i got informed at boot time and also
> >>> in the logs files that a device is missing, also I see that if you
> >>> use a monitoring program.
> >> No, it shouldn't be the default, because:
> >>
> >> * Normal desktop users _never_ look at the log files or boot info, and
> >> rarely run monitoring programs, so they as a general rule won't notice
> >> until it's already too late. BTRFS isn't just a server filesystem, so
> >> it needs to be safe for regular users too.
> >
> > I am willing to argue that whatever you refer to as normal users don't
> > have a clue how to make a raid1 filesystem, nor do they care about what
> > underlying filesystem their computer runs. I can't quite see how a
> > limping system would be worse than a failing system in this case.
> > Besides "normal" desktop users use Windows anyway, people that run on
> > penguin powered stuff generally have at least some technical knowledge.
> Once you get into stuff like Arch or Gentoo, yeah, people tend to have
> enough technical knowledge to handle this type of thing, but if you're
> talking about the big distros like Ubuntu or Fedora, not so much. Yes,
> I might be a bit pessimistic here, but that pessimism is based on
> personal experience over many years of providing technical support for
> people.
>
> Put differently, human nature is to ignore things that aren't
> immediately relevant. Kernel logs don't matter until you see something
> wrong. Boot messages don't matter unless you happen to see them while
> the system is booting (and most people don't). Monitoring is the only
> way here, but most people won't invest the time in proper monitoring
> until they have problems. Even as a seasoned sysadmin, I never look at
> kernel logs until I see any problem, I rarely see boot messages on most
> of the systems I manage (because I'm rarely sitting at the console when
> they boot up, and when I am I'm usually handling startup of a dozen or
> so systems simultaneously after a network-wide outage), and I only
> monitor things that I know for certain need to be monitored.
> >
> >> * It's easily possible to end up mounting degraded by accident if one
> >> of the constituent devices is slow to enumerate, and this can easily
> >> result in a split-brain scenario where all devices have diverged and
> >> the volume can only be repaired by recreating it from scratch.
> >
> > Am I wrong or would not the remaining disk have the generation number
> > bumped on every commit? would it not make sense to ignore (previously)
> > stale disks and require a manual "re-add" of the failed disks. From a
> > users perspective with some C coding knowledge this sounds to me (in
> > principle) like something as quite simple.
> > E.g. if the superblock UUID match for all devices and one (or more)
> > devices has a lower generation number than the other(s) then the disk(s)
> > with the newest generation number should be considered good and the
> > other disks with a lower generation number should be marked as failed.
> The problem is that i
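As an aside, the per-device generation numbers waxhead mentions can be inspected from userspace; a minimal sketch (the device paths are just examples):

    # compare the superblock generation of each member device
    btrfs inspect-internal dump-super /dev/sda1 | grep '^generation'
    btrfs inspect-internal dump-super /dev/sdb1 | grep '^generation'

A device whose generation lags behind the others presumably missed some commits while it was absent.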
Re: btrfs as / filesystem in RAID1
> However the raid1 term only describes replication. It doesn't describe
> any policy.

Yep, you're right, but most sysadmins expect certain 'policies'. If I use RAID1 I expect that if one drive fails I can still boot _without_ boot issues, just some warnings etc., because I use raid1 precisely to tolerate the failure of one device (which can happen). I can check/monitor the BTRFS RAID status with 'btrfs fi sh' (or with 'btrfs dev stat'). I also expect that if a device comes back it will sync automatically, and that if I replace a device it will automatically rebalance the raid1 (which btrfs does, so far). I think a lot of sysadmins feel the same way.

On Thursday, February 7, 2019 3:19:01 PM CET Chris Murphy wrote:
> On Thu, Feb 7, 2019 at 10:37 AM Martin Steigerwald wrote:
> >
> > Chris Murphy - 07.02.19, 18:15:
> > > > So please change the normal behavior
> > >
> > > In the case of no device loss, but device delay, with 'degraded' set
> > > in fstab you risk a non-deterministic degraded mount. And there is no
> > > automatic balance (sync) after recovering from a degraded mount. And
> > > as far as I know there's no automatic transition from degraded to
> > > normal operation upon later discovery of a previously missing device.
> > > It's just begging for data loss. That's why it's not the default.
> > > That's why it's not recommended.
> >
> > Still the current behavior is not really user-friendly. And does not
> > meet expectations that users usually have about how RAID 1 works. I know
> > BTRFS RAID 1 is no RAID 1, although it is called like this.
>
> I mentioned the user experience is not good, in both my Feb 2 and Feb
> 5 responses, compared to mdadm and lvm raid1 in the same situation.
>
> However the raid1 term only describes replication. It doesn't describe
> any policy. And whether to fail to mount or mount degraded by default,
> is a policy. Whether and how to transition from degraded to normal
> operation when a formerly missing device reappears, is a policy. And
> whether, and how, and when to rebuild data after resuming normal
> operation is a policy. A big part of why these policies are MIA is
> because they require features that just don't exist yet. And perhaps
> don't even belong in btrfs kernel code or user space tools; but rather
> a system service or daemon that manages such policies. However, none
> of that means Btrfs raid1 is not raid1. There's a wrong assumption
> being made about policies and features in mdadm and LVM, that they are
> somehow attached to the definition of raid1, but they aren't.
>
> > I also somewhat get that with the current state of BTRFS the current
> > behavior of not allowing a degraded mount may be better… however… I see
> > clearly room for improvement here. And there very likely will be
> > discussions like this on this list… until BTRFS acts in a more user
> > friendly way here.
>
> And it's completely appropriate if someone wants to update the Btrfs
> status page to make more clear what features/behaviors/policies apply
> to Btrfs raid of all types, or to have a page that summarizes their
> differences among mdadm and/or LVM raid levels, so users can better
> assess their risk taking, and choose the best Linux storage technology
> for their use case.
>
> But at least developers know this is the case.
>
> And actually, you could mitigate some decent amount of Btrfs missing
> features with server monitoring tools; including parsing kernel
> messages. Because right now you aren't even informed of read or write
> errors, device or csums mismatches or fixups, unless you're checking
> kernel messages. Where mdadm has the option for emailing notifications
> to an admin for such things, and lvm has a monitor that I guess does
> something, I haven't used it. Literally Btrfs will only complain about
> failed writes that would cause immediate ejection of the device by md.
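Until such policies exist, one crude way to approximate mdadm's mail notifications is to poll the btrfs error counters from cron; a minimal sketch, assuming the filesystem is mounted at /, a working 'mail' command, and a btrfs-progs new enough to support the --check flag of 'btrfs device stats' (it makes the command exit non-zero when any counter is non-zero; older progs may lack it):

    #!/bin/sh
    # mail the error counters to root if any of them is non-zero
    out=$(btrfs device stats --check / 2>&1)
    if [ $? -ne 0 ]; then
        echo "$out" | mail -s "btrfs errors on $(hostname)" root
    fi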
Re: List of known BTRFS Raid 5/6 Bugs?
sorry to disturb this discussion,

are there any plans/dates to fix the raid5/6 issue? Is somebody working on it? For me this is one of the most important things for a fileserver; with a raid1 config I lose too much disk space.

best regards
Stefan
Re: List of known BTRFS Raid 5/6 Bugs?
wow, holy shit, thanks for this extended answer!

> The first thing to point out here again is that it's not btrfs-specific.

So that means every RAID implementation (with parity) has such a bug? Looking around a bit, it seems ZFS does not have a write hole. And it _only_ happens when the server has an ungraceful shutdown, caused by a power outage? So that means if I run btrfs raid5/6 and have no power outages, I have no problem?

> it's possible to specify data as raid5/6 and metadata as raid1

Does someone run this in production? ZFS, by the way, keeps 2 copies of metadata by default; maybe that would also be an option for btrfs? In this case, do you think 'btrfs fi balance start -mconvert=raid1 -dconvert=raid5 /path' is safe at the moment?

> That means small files and modifications to existing files, the ends of large
> files, and much of the metadata, will be written twice, first to the log,
> then to the final location.

That sounds like performance will go down? As far as I can see btrfs can't beat ext4 nor zfs as it is, and this would make it even slower?

thanks in advance!

best regards
Stefan

On Saturday, September 8, 2018 8:40:50 AM CEST Duncan wrote:
> Stefan K posted on Fri, 07 Sep 2018 15:58:36 +0200 as excerpted:
>
> > sorry for disturb this discussion,
> >
> > are there any plans/dates to fix the raid5/6 issue? Is somebody working
> > on this issue? Cause this is for me one of the most important things for
> > a fileserver, with a raid1 config I loose to much diskspace.
>
> There's a more technically complete discussion of this in at least two
> earlier threads you can find on the list archive, if you're interested,
> but here's the basics (well, extended basics...) from a btrfs-using-
> sysadmin perspective.
>
> "The raid5/6 issue" can refer to at least three conceptually separate
> issues, with different states of solution maturity:
>
> 1) Now generally historic bugs in btrfs scrub, etc, that are fixed (thus
> the historic) in current kernels and tools. Unfortunately these will
> still affect for some time many users of longer-term stale^H^Hble distros
> who don't update using other sources for some time, as because the raid56
> feature wasn't yet stable at the lock-in time for whatever versions they
> stabilized on, they're not likely to get the fixes as it's new-feature
> material.
>
> If you're using a current kernel and tools, however, this issue is
> fixed. You can look on the wiki for the specific versions, but with the
> 4.18 kernel current latest stable, it or 4.17, and similar tools versions
> since the version numbers are synced, are the two latest release series,
> with the two latest release series being best supported and considered
> "current" on this list.
>
> Also see...
>
> 2) General feature maturity: While raid56 mode should be /reasonably/
> stable now, it remains one of the newer features and simply hasn't yet
> had the testing of time that tends to flush out the smaller and corner-
> case bugs, that more mature features such as raid1 have now had the
> benefit of.
>
> There's nothing to do for this but test, report any bugs you find, and
> wait for the maturity that time brings.
>
> Of course this is one of several reasons we so strongly emphasize and
> recommend "current" on this list, because even for reasonably stable and
> mature features such as raid1, btrfs itself remains new enough that they
> still occasionally get latent bugs found and fixed, and while /some/ of
> those fixes get backported to LTS kernels (with even less chance for
> distros to backport tools fixes), not all of them do and even when they
> do, current still gets the fixes first.
>
> 3) The remaining issue is the infamous parity-raid write-hole that
> affects all parity-raid implementations (not just btrfs) unless they take
> specific steps to work around the issue.
>
> The first thing to point out here again is that it's not btrfs-specific.
> Between that and the fact that it *ONLY* affects parity-raid operating in
> degraded mode *WITH* an ungraceful-shutdown recovery situation, it could
> be argued not to be a btrfs issue at all, but rather one inherent to
> parity-raid mode and considered an acceptable risk to those choosing
> parity-raid because it's only a factor when operating degraded, if an
> ungraceful shutdown does occur.
>
> But btrfs' COW nature along with a couple technical implementation
> factors (the read-modify-write cycle for incomplete stripe widths and how
> that risks existing metadata when new metadata is written) does amplify
> the risk som
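For completeness, the mixed profile Stefan asks about (raid5 data, raid1 metadata) would be set up or converted to roughly like this (a sketch; the device names and mount point are placeholders, and the caveats about raid56 maturity discussed above still apply):

    # at mkfs time
    mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

    # or converting an existing filesystem
    btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/data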
btrfs filesystem show takes a long time
Dear Maintainer,

the command 'btrfs fi show' takes far too much time:

time btrfs fi show
Label: none  uuid: 513dc574-e8bc-4336-b181-00d1e9782c1c
        Total devices 2 FS bytes used 2.34GiB
        devid    1 size 927.79GiB used 4.03GiB path /dev/sdv2
        devid    2 size 927.79GiB used 4.03GiB path /dev/sdar2

real    12m59.763s
user    0m0.008s
sys     0m0.044s

time btrfs fi show
Label: none  uuid: 513dc574-e8bc-4336-b181-00d1e9782c1c
        Total devices 2 FS bytes used 2.34GiB
        devid    1 size 927.79GiB used 4.03GiB path /dev/sdv2
        devid    2 size 927.79GiB used 4.03GiB path /dev/sdar2

real    6m22.498s
user    0m0.012s
sys     0m0.024s

time btrfs fi show
Label: none  uuid: 513dc574-e8bc-4336-b181-00d1e9782c1c
        Total devices 2 FS bytes used 2.34GiB
        devid    1 size 927.79GiB used 4.03GiB path /dev/sdv2
        devid    2 size 927.79GiB used 4.03GiB path /dev/sdar2

real    6m19.796s
user    0m0.012s
sys     0m0.024s

Maybe it is related to the number of hard disks in the system:

ls /dev/disk/by-path/ | grep -v part | wc -l
44

This is also a known Debian bug, #891717.

-- System Information:
Debian Release: 9.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-5-amd64 (SMP w/20 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages btrfs-progs depends on:
ii  e2fslibs    1.43.4-2
ii  libblkid1   2.29.2-1
ii  libc6       2.24-11+deb9u1
ii  libcomerr2  1.43.4-2
ii  liblzo2-2   2.08-1.2+b2
ii  libuuid1    2.29.2-1
ii  zlib1g      1:1.2.8.dfsg-5
Re: btrfs problems
> If your primary concern is to make the fs as stable as possible, then
> keep snapshots to a minimal amount, avoid any functionality you won't
> use, like qgroup, routinely balance, RAID5/6.
>
> And keep the necessary btrfs specific operations to minimal, like
> subvolume/snapshot (and don't keep too many snapshots, say over 20),
> shrink, send/receive.

hehe, that sounds like "hey, use btrfs, it's cool, but please - don't use any btrfs-specific feature" ;)

best
Stefan
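If you want to stay under the roughly-20-snapshots guideline Qu mentions, counting them is straightforward; a minimal sketch, assuming the filesystem is mounted at /mnt:

    # list only snapshot subvolumes below the mount point and count them
    btrfs subvolume list -s /mnt | wc -l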
Re: btrfs filesystem show takes a long time
Hi,

> You may try to run the show command under strace to see where it blocks.

Any recommendations for strace options?

On Friday, September 14, 2018 1:25:30 PM CEST David Sterba wrote:
> Hi,
>
> thanks for the report, I've forwarded it to the issue tracker
> https://github.com/kdave/btrfs-progs/issues/148
>
> The show command uses the information provided by blkid, that presumably
> caches that. The default behaviour of 'fi show' is to skip mount checks,
> so the delays are likely caused by blkid, but that's not the only
> possible reason.
>
> You may try to run the show command under strace to see where it blocks.
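Something along these lines is a reasonable starting point (a sketch, not a recommendation from the thread); timestamps and per-syscall durations make the blocking call easy to spot:

    # -f follow children, -tt wall-clock timestamps, -T time spent in each syscall
    strace -f -tt -T -o /tmp/btrfs-show.trace btrfs fi show
    # syscalls that block show a large duration in angle brackets at the end of the line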
raid1 mount as read only
Hello,

I've played a little bit with raid1. My steps were:

1. create a raid1 with btrfs (add device; balance start -mconvert=raid1 -dconvert=raid1 /)
2. after finishing, shut down the server, remove a device and start it again
3. it works (I used the degraded option in fstab)
4. shut down the server, add the 'old' device, start it and run a btrfs balance
5. until here everything is fine
6. shut down the server, remove the other harddisk, and start it
7. it works
8. shut down the server, add the 'old' device, start it and try a btrfs balance
9. but now the FS is mounted read-only

I got the following messages in dmesg:

[    6.401740] BTRFS: device fsid b997e926-ab95-46d0-a9be-da52aa09203d devid 1 transid 3570 /dev/sda1
[    6.403079] BTRFS info (device sda1): allowing degraded mounts
[    6.403084] BTRFS info (device sda1): disk space caching is enabled
[    6.403086] BTRFS info (device sda1): has skinny extents
[    6.405878] BTRFS warning (device sda1): devid 2 uuid 09a7c9f6-9a13-4852-bca6-2d9120f388d4 missing
[    6.409108] BTRFS info (device sda1): detected SSD devices, enabling SSD mode
[    6.652411] BTRFS info (device sda1): allowing degraded mounts
[    6.652414] BTRFS info (device sda1): disk space caching is enabled
[    6.652416] BTRFS warning (device sda1): too many missing devices, writeable remount is not allowed

but all devices are available:

btrfs fi sh
Label: none  uuid: b997e926-ab95-46d0-a9be-da52aa09203d
        Total devices 2 FS bytes used 895.41MiB
        devid    1 size 223.57GiB used 4.06GiB path /dev/sda1
        devid    2 size 223.57GiB used 2.00GiB path /dev/sdb1

mount | grep btrfs
/dev/sda1 on / type btrfs (ro,relatime,degraded,ssd,space_cache,subvolid=5,subvol=/)

If I try to remount it, I get the following in dmesg:

mount -o remount,rw b997e926-ab95-46d0-a9be-da52aa09203d /

[  320.048563] BTRFS info (device sda1): disk space caching is enabled
[  356.200552] BTRFS: error (device sda1) in write_all_supers:3752: errno=-5 IO failure (errors while submitting device barriers.)
[  356.200628] BTRFS info (device sda1): forced readonly
[  356.200632] BTRFS warning (device sda1): Skipping commit of aborted transaction.
[  356.200633] ------------[ cut here ]------------
[  356.200677] WARNING: CPU: 0 PID: 1914 at /build/linux-EbeuWA/linux-4.9.130/fs/btrfs/transaction.c:1850 cleanup_transaction+0x1f3/0x2e0 [btrfs]
[  356.200678] BTRFS: Transaction aborted (error -5)
[  356.200679] Modules linked in: intel_rapl skx_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mgag200 ttm drm_kms_helper intel_uncore sg drm dcdbas mei_me i2c_algo_bit intel_rapl_perf pcspkr joydev iTCO_wdt lpc_ich iTCO_vendor_support mfd_core shpchp mei evdev ipmi_si ipmi_msghandler acpi_power_meter button ip_tables x_tables autofs4 ses enclosure scsi_transport_sas sd_mod hid_generic usbhid hid btrfs crc32c_generic xor raid6_pq crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ahci xhci_pci libahci tg3 xhci_hcd ptp megaraid_sas libata pps_core i2c_i801 usbcore bnxt_en i2c_smbus libphy scsi_mod usb_common
[  356.200764] CPU: 0 PID: 1914 Comm: btrfs-transacti Not tainted 4.9.0-8-amd64 #1 Debian 4.9.130-2
[  356.200766] Hardware name: Dell Inc. PowerEdge R840/08XR9M, BIOS 1.2.4 10/18/2018
[  356.200768]  9ef33d74 b9535d70fd50
[  356.200774]  9ec7a59e 8ea977b261f8 b9535d70fda8 8e9971cdd500
[  356.200778]  fffb 8ea97114c980 9ec7a61f
[  356.200783] Call Trace:
[  356.200796]  [] ? dump_stack+0x5c/0x78
[  356.200801]  [] ? __warn+0xbe/0xe0
[  356.200803]  [] ? warn_slowpath_fmt+0x5f/0x80
[  356.200828]  [] ? cleanup_transaction+0x1f3/0x2e0 [btrfs]
[  356.200835]  [] ? prepare_to_wait_event+0xf0/0xf0
[  356.200858]  [] ? btrfs_commit_transaction+0x298/0xa10 [btrfs]
[  356.200879]  [] ? start_transaction+0x96/0x480 [btrfs]
[  356.200900]  [] ? transaction_kthread+0x1dc/0x200 [btrfs]
[  356.200919]  [] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[  356.200926]  [] ? kthread+0xd9/0xf0
[  356.200933]  [] ? __switch_to_asm+0x34/0x70
[  356.200937]  [] ? kthread_park+0x60/0x60
[  356.200941]  [] ? ret_from_fork+0x57/0x70
[  356.200943] ---[ end trace 17ece06f94583f5a ]---
[  356.200946] BTRFS: error (device sda1) in cleanup_transaction:1850: errno=-5 IO failure
[  356.200997] BTRFS info (device sda1): delayed_refs has NO entry

My question is: what happened, and how can I fix this?

uname -a
Linux yamazaki-06 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux

btrfs version
btrfs-progs v4.7.3