Re: hierarchical, tree-like structure of snapshots
Although I'm glad that a bug has been uncovered, maybe it's best if I stick with good old rsync for backups. It would be kind of ironic if the first data loss I experienced in many years of btrfs use were caused by an ancillary backup tool.

On Thu, Dec 31, 2020 at 10:36 PM Zygo Blaxell wrote:
>
> On Thu, Dec 31, 2020 at 09:48:54PM +0100, john terragon wrote:
> > On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov wrote:
> > >
> > > How exactly you create subvolume with the same content? There are many possible interpretations.
> > >
> >
> > Zygo wrote that any subvol could be used with -p. So, out of curiosity, I did the following
> >
> > 1) btrfs sub create X
> > 2) I unpacked some source (linux kernel) in X
> > 3) btrfs sub create W
> > 4) I unpacked the same source in W (so X and W have the same content but they are independent)
> > 5) btrfs sub snap -r X X_RO
> > 6) btrfs sub snap -r W W_RO
> > 7) btrfs send W_RO | btrfs receive /mnt/btrfs2
> > 8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2
> >
> > And this is the exact output of 8)
> >
> > At subvol X_RO
> > At snapshot X_RO
> > ERROR: chown o257-1648413-0 failed: No such file or directory
>
> Yeah, I only checked that send completed without error and produced a smaller stream.
>
> I just dumped the send metadata stream from the incremental snapshot now, and it's more or less garbage at the start:
>
> # btrfs sub create A
> # btrfs sub create B
> # date > A/date
> # date > B/date
> # mkdir A/t B/u
> # btrfs sub snap -r A A_RO
> # btrfs sub snap -r B B_RO
> # btrfs send A_RO | btrfs receive --dump
> At subvol A_RO
> subvol ./A_RO uuid=995adde4-00ac-5e49-8c6f-f01743def072 transid=7329268
> chown ./A_RO/ gid=0 uid=0
> chmod ./A_RO/ mode=755
> utimes ./A_RO/ atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
> mkfile ./A_RO/o257-7329268-0
> rename ./A_RO/o257-7329268-0 dest=./A_RO/date
> utimes ./A_RO/ atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
> write ./A_RO/date offset=0 len=29
> chown ./A_RO/date gid=0 uid=0
> chmod ./A_RO/date mode=644
> utimes ./A_RO/date atime=2020-12-31T15:51:38-0500 mtime=2020-12-31T15:51:38-0500 ctime=2020-12-31T15:51:38-0500
> mkdir ./A_RO/o258-7329268-0
> rename ./A_RO/o258-7329268-0 dest=./A_RO/t
> utimes ./A_RO/ atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
> chown ./A_RO/tgid=0 uid=0
> chmod ./A_RO/tmode=755
> utimes ./A_RO/t atime=2020-12-31T15:51:48-0500 mtime=2020-12-31T15:51:48-0500 ctime=2020-12-31T15:51:48-0500
> # btrfs send B_RO -p A_RO | btrfs receive --dump
> At subvol B_RO
> snapshot./B_RO uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb transid=7329268 parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072 parent_transid=7329268
> utimes ./B_RO/ atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
> link./B_RO/date dest=date
> unlink ./B_RO/date
> utimes ./B_RO/ atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
> write ./B_RO/date offset=0 len=29
> utimes ./B_RO/date atime=2020-12-31T15:51:41-0500 mtime=2020-12-31T15:51:41-0500 ctime=2020-12-31T15:51:41-0500
> rename ./B_RO/tdest=./B_RO/u
> utimes ./B_RO/ atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 ctime=2020-12-31T15:51:52-0500
> utimes ./B_RO/u atime=2020-12-31T15
Re: hierarchical, tree-like structure of snapshots
On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov wrote:
>
> How exactly you create subvolume with the same content? There are many possible interpretations.
>

Zygo wrote that any subvol could be used with -p. So, out of curiosity, I did the following

1) btrfs sub create X
2) I unpacked some source (linux kernel) in X
3) btrfs sub create W
4) I unpacked the same source in W (so X and W have the same content but they are independent)
5) btrfs sub snap -r X X_RO
6) btrfs sub snap -r W W_RO
7) btrfs send W_RO | btrfs receive /mnt/btrfs2
8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2

And this is the exact output of 8):

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory
Re: hierarchical, tree-like structure of snapshots
On Thu, Dec 31, 2020 at 6:28 PM Zygo Blaxell wrote:
>
> I think your confusion is that you are thinking of these as a tree. There is no tree, each subvol is an equal peer in the filesystem.
>
> "send -p A B" just walks over subvol A and B and sends a diff of the parts of B not in A. You can pick any subvol with -p as long as it's read-only and present on the receiving side. Obviously it's much more efficient if the two subvols have a lot of shared extents (e.g. because B and A were both snapshots made at different times of some other subvol C), but this is not required.

Can you really use ANY subvol with -p? Because if I

1) create a subvol X
2) create a subvol W with the exact same content as X (but created independently)
3) do a RO snap X_RO of X
4) do a RO snap W_RO of W
5) send W_RO to the other FS
6) send -p W_RO X_RO to the other FS

I get this:

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory

Any idea?
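The failing sequence above as a runnable sketch (the mountpoints /mnt/btrfs1 and /mnt/btrfs2 and the tarball path are placeholders, not from the thread; any identical payload unpacked into both subvols will do):

cd /mnt/btrfs1
btrfs subvolume create X
btrfs subvolume create W
tar -xf /tmp/linux.tar -C X    # same content as W...
tar -xf /tmp/linux.tar -C W    # ...but fully independent extents
btrfs subvolume snapshot -r X X_RO
btrfs subvolume snapshot -r W W_RO
btrfs send W_RO | btrfs receive /mnt/btrfs2           # full send: works
btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2   # incremental against an unrelated parent: fails with the chown error above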
Re: hierarchical, tree-like structure of snapshots
On Thu, Dec 31, 2020 at 8:05 AM Andrei Borzenkov wrote:
>
> > OK, but then could I use Y as parent of the rw snapshot, let's call it W, in a send?
>
> No

Of course I didn't mean to use Y as the parent of W itself, but as the parent of a readonly snapshot of W, whenever I want to send it to the second FS. And I just tried the following steps and they worked:

1) created subvol X
2) created readonly snap Y of X
3) sent Y to second FS
4) modified X
5) created readonly snap X1 of X
6) sent -p Y X1 to second FS
7) created readwrite snap Y1 of Y
8) modified Y1
9) created readonly snap Y1_RO of Y1
10) sent -p Y Y1_RO to second FS

So, as you can see:

- in 6) I used the RO snap Y of X as the parent of X1 to send X1 to the second FS
- in 10) I did the opposite: Y is still used as the parent, but this time I sent the RO snap of a subvol that is itself a snap of Y.

So it seems to work both ways.
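The ten steps above, condensed into a transcript (a sketch; /mnt/fs1 and /mnt/fs2 are placeholder mountpoints):

cd /mnt/fs1
btrfs sub create X                               # 1
btrfs sub snap -r X Y                            # 2
btrfs send Y | btrfs receive /mnt/fs2            # 3
date > X/change                                  # 4: any modification of X
btrfs sub snap -r X X1                           # 5
btrfs send -p Y X1 | btrfs receive /mnt/fs2      # 6: Y as parent of X1
btrfs sub snap Y Y1                              # 7: read-write snap of Y (no -r)
date > Y1/change                                 # 8
btrfs sub snap -r Y1 Y1_RO                       # 9
btrfs send -p Y Y1_RO | btrfs receive /mnt/fs2   # 10: Y as parent again, in the other direction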
Re: hierarchical, tree-like structure of snapshots
On Wed, Dec 30, 2020 at 6:24 PM sys wrote:
>
> [...]
> You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only' snapshot (Y) that is part of your backup/send scheme. Do not modify read-only snapshots to be rw.

OK, but then could I use Y as parent of the rw snapshot, let's call it W, in a send? So I would have this tree where Y is still the root:

Y-W
 \
  Z-X

Can I do a send -p Y W? Because I thought it was the other way around, that is, I do a readonly snapshot W of Y and that will be the base for incrementally sending the future modified Y to another FS (provided of course W is already there).
Re: hierarchical, tree-like structure of snapshots
Sorry, that ascii tree came out awful and it looks like Z is the child of Y instead of Y1. I hope this one below looks better.

Y1-Y
  \
   Z-X

On Wed, Dec 30, 2020 at 5:56 PM john terragon wrote:
>
> Hi.
> I would like to maintain a tree-like hierarchical structure of snapshots. Let me try to explain what I mean by that.
>
> Let's say I have a btrfs fs with just one subvolume X, and let's say that I make a readonly snapshot Y of X. As far as I understand there is a parent-child relation between Y (the parent) and X (the child).
>
> Now let's say that after some time and modifications of X I do another snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is now the "child" of Z and Z is now the "child" of Y. The structure is a path, which is a special case of a tree.
>
> Now let's suppose that I want to start modifying Y but I still want to have a parent of Z which I might use as a point of reference for Z in a send to somewhere. That is, I want to still be able to do a send -p Y Z to another btrfs filesystem where there is a previously sent copy of Y (which, remember, as of this point has been readonly, and I'm just now wanting to start to modify it).
> The only thing I think I can do would be to make a readonly snapshot Y1 of Y and make Y writeable (so that I can start modifying it). At that point the structure would be
>
> Y1-Y
>   \
>    Z-X
>
> (yes my ascii art is atrocious...) which is a "proper" tree where Y1 is the root with two children (Y and Z), Z has one child (X), and Y and X are leaves.
> Now, my question is: would Y1 still be usable in send -p Y1 Z, just like Y was before becoming writeable and being modified? I would say that Y1 would be just as good a parent for Z in a send as the readonly original Y was. But maybe there is some implementation detail that escapes me that prevents Y1 from being a perfect replacement for the original Y.
> I hope I was clear enough.
> Thanks
> John
hierarchical, tree-like structure of snapshots
Hi.
I would like to maintain a tree-like hierarchical structure of snapshots. Let me try to explain what I mean by that.

Let's say I have a btrfs fs with just one subvolume X, and let's say that I make a readonly snapshot Y of X. As far as I understand there is a parent-child relation between Y (the parent) and X (the child).

Now let's say that after some time and modifications of X I do another snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is now the "child" of Z and Z is now the "child" of Y. The structure is a path, which is a special case of a tree.

Now let's suppose that I want to start modifying Y but I still want to have a parent of Z which I might use as a point of reference for Z in a send to somewhere. That is, I want to still be able to do a send -p Y Z to another btrfs filesystem where there is a previously sent copy of Y (which, remember, as of this point has been readonly, and I'm just now wanting to start to modify it).
The only thing I think I can do would be to make a readonly snapshot Y1 of Y and make Y writeable (so that I can start modifying it). At that point the structure would be

Y1-Y
  \
   Z-X

(yes my ascii art is atrocious...) which is a "proper" tree where Y1 is the root with two children (Y and Z), Z has one child (X), and Y and X are leaves.
Now, my question is: would Y1 still be usable in send -p Y1 Z, just like Y was before becoming writeable and being modified? I would say that Y1 would be just as good a parent for Z in a send as the readonly original Y was. But maybe there is some implementation detail that escapes me that prevents Y1 from being a perfect replacement for the original Y.
I hope I was clear enough.
Thanks
John
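The proposed maneuver expressed as commands (a sketch; whether the final send is accepted by the receiver is exactly the question posed here, and it assumes a btrfs-progs with the `btrfs property` subcommand and /backup as a placeholder destination):

btrfs sub snap -r Y Y1              # preserve a read-only copy of Y first
btrfs property set -ts Y ro false   # then make Y itself writable
# later, send Z incrementally using Y1 where the read-only Y used to serve as parent:
btrfs send -p Y1 Z | btrfs receive /backup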
btrfs send on top level subvolumes that contain other subvolumes
Hi.
Let's say I have a top-level subvolume /sub and that inside /sub I have another subvolume, say /sub/X/Y/subsub. If I make a snapshot (both ro and rw give the same results) of /sub, say /sub-snap, right now what I get is this:

1) /sub-snap/X/Y/subsub is present (and empty, and that's OK as snapshots are not recursive) but it doesn't seem to be either
a) an empty subvolume (because btrfs sub list doesn't list it), or
b) a directory, because, for example, lsattr -d subsub gives me this result:
"lsattr: Inappropriate ioctl for device While reading flags on subsub"

2) if /sub-snap is ro and I send it somewhere, then in the destination sub-snap, subsub is not present at all (which wouldn't be illogical, given the non-recursive nature of snapshots).

So I'm wondering if all of this is the intended outcome when snapshotting and sending a subvolume that has internally defined subvolumes, or if perhaps it's a bug. I'm using kernel 3.17.1 patched for the recent ro snapshot corruption bug and btrfs-progs from the 3.17.x branch in git.

Thanks
John
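A sketch reproducing the observation (assuming a scratch btrfs mounted at /mnt and a second one at /mnt2):

cd /mnt
btrfs sub create sub
mkdir -p sub/X/Y
btrfs sub create sub/X/Y/subsub
btrfs sub snap -r sub sub-snap
btrfs sub list /mnt                         # sub-snap/X/Y/subsub is not listed
lsattr -d sub-snap/X/Y/subsub               # "Inappropriate ioctl for device"
btrfs send sub-snap | btrfs receive /mnt2   # subsub is absent on the destination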
Re: [PATCH] Revert "Btrfs: race free update of commit root for ro snapshots"
-It's not a brand new fs. It was created four or five days ago with btrfs-progs 3.16.2 (in fact it was created because of the dead unremovable ro snapshots in the previous fs)
-the snapshot in question has been created after applying the patch (and it has not become corrupted so far)
-not an incremental send
-no warnings in dmesg
-btrfs check segfaults (as it did before the patch)
-there are in fact dead unremovable ro snapshots in the filesystem (it was used before the patch). But the filesystem seems functional as long as the dead ro snapshots aren't touched. If one of them is accessed with ls -l I get the usual "parent transid verify failed on X wanted Y found Z". But as I said no warnings of that kind (or any kind) appear in dmesg when I do the send on the freshly created ro snapshot.

thanks
john

On Thu, Oct 16, 2014 at 1:05 AM, Filipe David Manana wrote:
> On Wed, Oct 15, 2014 at 11:42 PM, john terragon wrote:
>> Hi.
>>
>> I applied the patch to 3.17.1 but although I haven't seen any corrupted ro snapshot yet it's still impossible to do btrfs send. As soon as I start btrfs send I still get
>>
>> ERROR: send ioctl failed with -12: Cannot allocate memory
>>
>> even if I redirect btrfs send's output to a file (instead of involving btrfs receive)
>>
>> Maybe this time it's actually a btrfs-progs bug?
>
> Not enough information to tell.
>
> Is it a brand new fs? If not, is it a snapshot created after applying the patch or before? Does a btrfsck report any issues with the fs? Is it an incremental (using -p) or a full send? Do you see any warnings (traces, errors) in syslog (dmesg)?
>
> Either an issue in send or, if it's an fs created/used with unpatched 3.17.0/1, it can be a side effect of the corruption.
>
> thanks
>
>> Thanks
>> John
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men."
Re: [PATCH] Revert "Btrfs: race free update of commit root for ro snapshots"
Hi.

I applied the patch to 3.17.1 but although I haven't seen any corrupted ro snapshot yet it's still impossible to do btrfs send. As soon as I start btrfs send I still get

ERROR: send ioctl failed with -12: Cannot allocate memory

even if I redirect btrfs send's output to a file (instead of involving btrfs receive).

Maybe this time it's actually a btrfs-progs bug?

Thanks
John
Re: btrfs random filesystem corruption in kernel 3.17
And another worrying thing I didn't notice before. Two snapshots have dates that do not make sense. root-b3 and root-b4 were created on Oct 14th (and btw root's modification time was also on Oct the 14th). So why do they show Oct 10th? And root-prov was actually created on Oct 10 15:37, as it correctly shows, so it's like btrfs sub snap picks up old stale data from who knows where or when or for what reason. Moreover, root-b4 was created with 3.16.5... not good.

drwxrwsr-x 1 root staff 30 Sep 11 16:15 home
d? ? ?? ?? home-backup
drwxr-xr-x 1 root root 250 Oct 14 03:02 root
d? ? ?? ?? root-b2
drwxr-xr-x 1 root root 250 Oct 10 15:37 root-b3
drwxr-xr-x 1 root root 250 Oct 10 15:37 root-b4
drwxr-xr-x 1 root root 250 Oct 14 03:02 root-b5
drwxr-xr-x 1 root root 250 Oct 14 03:02 root-b6
d? ? ?? ?? root-backup
drwxr-xr-x 1 root root 250 Oct 10 15:37 root-prov
drwxr-xr-x 1 root root 88 Sep 15 16:02 vms

On Tue, Oct 14, 2014 at 1:18 AM, Rich Freeman wrote:
> On Mon, Oct 13, 2014 at 5:22 PM, john terragon wrote:
>> I'm using "compress=no" so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16).
>
> I was using lzo compression, and hence my comment about turning it off before going back to 3.16 (not realizing that 3.16 has subsequently been fixed).
>
> Ironically enough I discovered this as I was about to migrate my ext4 backup drive into my btrfs raid1. Maybe I'll go ahead and wait on that and have an rsync backup of the filesystem handy (minus snapshots) just in case. :)
>
> I'd switch to 3.16, but it sounds like there is no way to remove the snapshots at the moment, and I can live for a while without the ability to create new ones.
>
> Interestingly enough it doesn't look like ALL snapshots are affected. I checked and some of the snapshots I made last weekend while doing system updates look accessible. They are significantly smaller, and the subvolumes they were made from are also fairly new - though I have no idea if that is related.
>
> The subvolumes do show up in btrfs su list. They cannot be examined using btrfs su show.
>
> It would be VERY nice to have a way of cleaning this up without blowing away the entire filesystem...
>
> --
> Rich
Re: btrfs random filesystem corruption in kernel 3.17
I'm using "compress=no" so compression doesn't seem to be related, at least in my case. Just read-only snapshots on 3.17 (although I haven't tried 3.16). John -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs random filesystem corruption in kernel 3.17
I think I just found a consistent, simple way to trigger the problem (at least on my system). And, as I guessed before, it seems to be related just to readonly snapshots:

1) I create a readonly snapshot
2) I do some changes on the source subvolume for the snapshot (I'm not sure changes are strictly needed)
3) reboot (or probably just unmount and remount; I reboot because the fs I have problems with contains my root subvolume)

After rebooting (or remounting) I consistently have the corruption, with the usual multitude of these in dmesg:

"parent transid verify failed on 902316032 wanted 2484 found 4101"

and the characteristic ls -la output:

drwxr-xr-x 1 root root 250 Oct 10 15:37 root
d? ? ?? ?? root-b2
drwxr-xr-x 1 root root 250 Oct 10 15:37 root-b3
d? ? ?? ?? root-backup

root-backup and root-b2 are both readonly whereas root-b3 is rw (and it didn't get corrupted).

David, maybe you can try the same steps on one of your machines?

John
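The three steps as commands (a sketch; /mnt/scratch and /dev/sdX are placeholders, and on an affected 3.17 kernel this is destructive, so only try it on a throwaway filesystem):

cd /mnt/scratch
btrfs sub create src
date > src/file
btrfs sub snap -r src src-ro      # 1) read-only snapshot
date >> src/file                  # 2) change the source subvolume
cd /
umount /mnt/scratch               # 3) remount (or reboot)
mount /dev/sdX /mnt/scratch
ls -la /mnt/scratch               # a corrupted ro snapshot shows up as a "d?" entry
dmesg | grep 'parent transid verify failed'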
Re: btrfs send and kernel 3.17
Actually it seems strange that a send operation could corrupt the source subvolume or fs. Why would the send modify the source subvolume in any significant way?

The only way I can find to reconcile your observations with mine is that maybe the snapshots get corrupted not by the send operation itself but when they are generated with -r (readonly, as is needed to send them). Are the corrupted snapshots you have on machine 2 (the one on which send was never used) readonly?
Re: btrfs send and kernel 3.17
Hi. I just wanted to "confirm David's story" so to speak :)

- kernel 3.17-rc7 (didn't bother to compile 3.17 as there weren't any btrfs fixes, I think)
- btrfs-progs 3.16.2 (also compiled from source, so no distribution-specific patches)
- fresh fs
- I get the same two errors David got (first I got the I/O error one and then the memory allocation one)
- plus now when I ls -la the fs top volume this is what I get:

drwxrwsr-x 1 root staff 30 Sep 11 16:15 home
d? ? ?? ?? home-backup
drwxr-xr-x 1 root root 250 Oct 10 15:37 root
d? ? ?? ?? root-backup
drwxr-xr-x 1 root root 88 Sep 15 16:02 vms
drwxr-xr-x 1 root root 88 Sep 15 16:02 vms-backup

yes, the question marks on those two *-backup snapshots are really there. I can't access the snapshots, I can't delete them, I can't do anything with them.
- btrfs check segfaults
- the events that led to this situation are these:

1) btrfs su snap -r root root-backup
2) send|receive (the entire root-backup, not an incremental send): immediate I/O error
3) move on to home: btrfs su snap -r home home-backup
4) send|receive (again not an incremental send): everything goes well (!)
5) retry with root: btrfs su snap -r root root-backup
6) send|receive and it goes seemingly well
7) apt-get dist-upgrade just to modify root and try an incremental send
8) reboot after the dist-upgrade
9) ls -la the fs top volume: first I get the memory allocation error and after that any ls -la gives the output I pasted above. (Notice that besides the ls -la, the two snapshots were not touched in any way since the two send|receive.)

A few final notes. I haven't tried send/receive in a while (they were unreliable) so I can't tell which is the last version in which they worked for me (well, no version actually :) ). I've never had any problem with just snapshots. I make them regularly, I use them, I modify them, and I've never had one problem (with 3.17 too; it's just send/receive that murders them).

Best regards
John
Re: BTRFS critical (device dm-0): invalid dir item name len: 45389
Everyone knows what raid0 entails. Moreover, with btrfs being an experimental fs, not having backups would obviously be pure idiocy. I wrote that it was "pretty serious" because the situation came out of nowhere on a low-traffic fs on which the most exciting thing that can happen is an occasional snapshot once in a while when I do a heavy update with apt-get (a snapshot that always gets removed right after the update, which invariably goes well, and my paranoia fades).

The problem seems to have happened right after a hard lock, probably due to 3.17.0-rc3 (and before you explain to me what that rc3 stands for, let me tell you that I'm not complaining, I knew what I was doing). I had to power off "brutally" and right after that the problem occurred. I'm pretty sure about that because, for obvious reasons, I rsync the hell out of that filesystem every chance I get. Rsync obviously does a traversal of the fs, so the "critical" (btrfs's words, not mine) problem would have shown up in kmsg (another place that I watch like a hawk, because of the raid0 + experimental fs thing).

I don't know if you are a btrfs developer, but that "pretty serious" was not meant to offend them nor to complain. Actually I've been a pretty happy customer up until now (and I still am) because I have never been bitten by any big bug even with such a complex fs. I just have this zombie directory that can't be rm'd, but I mv'd it out of the way and everything is fine. It'll get sorted when I do the next wipe-and-restore iteration (again, it being experimental, I don't let the fs become too "old"). So, the "pretty serious" was more due to the surprise than anything else.

John
Re: BTRFS critical (device dm-0): invalid dir item name len: 45389
Some more details about this problem:

- the directory involved is /lib/modules/3.17.0-rc3-cu3/kernel/drivers/iio/gyro
- in that dir there should be a kernel object named hid-sensor-gyro-3d.ko but there's no trace of it
- that dir cannot be removed or overwritten. rm -rf fails saying that the dir cannot be removed because it's not empty (?, even with -rf?), and trying to reinstall the .deb package for that kernel image (thus overwriting that dir) ends up in a segfault

The only workaround is to mv that dir (well, I simply mv the whole 3.17.0-rc3-cu3 dir, but it should work also for the gyro subdir) and reinstall the deb package.

So, it's pretty serious because there's actual loss of data (even though I was lucky: I just lost a .ko I don't use).

John
BTRFS critical (device dm-0): invalid dir item name len: 45389
Hi.

When I traverse one of my btrfs, for example with a simple "find /", I get the following in kmsg:

BTRFS critical (device dm-0): invalid dir item name len: 45389

The message appears just one time (so I guess it involves just one file/dir). dm-0 is the first dmcrypt device of a pair on which I have btrfs in RAID0 (btrfs native raid). Though I can't be 100% sure, this seems to be a very recent problem (I would have noticed something "critical" in kmsg if it had happened before). Everything else seems to work fine.

So, should I be worried? Is there a way to fix this? (I assume that a scrub would not do any good, since it seems to be related to btrfs data structures more than actual file data.) Is there at least a way to know which file/dir is involved? Maybe a verbose debug mode? Or maybe I should just add some printk in the verify_dir_item function that seems to generate the message.

Thanks
John
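One way to narrow down which file/dir is involved without a verbose debug mode: walk one top-level directory at a time and check kmsg after each pass. A sketch (needs root, since dmesg -c empties the ring buffer; the grep pattern matches the message above):

for d in /*; do
    dmesg -c > /dev/null              # clear the buffer before each pass
    find "$d" > /dev/null 2>&1
    dmesg | grep -q 'invalid dir item' && echo "offender somewhere under: $d"
done

Descending into the reported directory and repeating narrows it down to the exact entry.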
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I wasn't sure which device you meant, so I dd'd all three possible cases:

1) here's the dmcrypt device on which I mkfs.btrfs:
2097152000 bytes (2.1 GB) copied, 487.265 s, 4.3 MB/s

2) here's the partition of the usb stick (which has another partition containing /boot) on top of which the dmcrypt device is created:
2097152000 bytes (2.1 GB) copied, 449.693 s, 4.7 MB/s

3) here's the whole usb stick device:
2097152000 bytes (2.1 GB) copied, 448.003 s, 4.7 MB/s

It's a usb2 device but doesn't it seem kind of slow?

Thanks
John

On Wed, Sep 3, 2014 at 2:36 PM, Chris Mason wrote:
> On 09/02/2014 09:31 PM, john terragon wrote:
>> Rsync finished. FWIW in the end it reported an average speed of about 900K/sec. Without autodefrag there have been no messages about hung kworkers even though rsync seemingly keeps getting hung for several minutes throughout the whole execution.
>
> So lets take a step back and figure out how fast the usb stick actually is. This will erase your usb stick, but give us an idea of its performance:
>
> dd if=/dev/zero of=/dev/<device> bs=20M oflag=direct count=100
>
> Note again, the above command will erase your usb stick ;) Use whatever device name you've been sending to mkfs.btrfs
>
> The kernel will allow a pretty significant amount of ram to be dirtied before forcing writeback, which is why you're seeing rsync stall at seemingly strange intervals. In the case of btrfs with compression, we add some worker threads between rsync and the device, and these may be turning the writeback into a somewhat more bursty operation.
>
> -chris
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I tried the same routine on 32GB usb sticks. Same exact problems. 32GB seems a bit much for a --mixed btrfs. I haven't tried ssd_spread; maybe it's beneficial. However, as I wrote above, disabling autodefrag completely gets rid of the "INFO: hung task" messages, but even though the kernel doesn't complain about blocked kworkers, the rsync process still blocks for several minutes throughout the whole copy.

On Wed, Sep 3, 2014 at 4:44 AM, Chris Murphy wrote:
>
> On Sep 2, 2014, at 12:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>
>> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; now days it defaults to 16 KiB as that's far better for most usage. I wonder if USB sticks are an exception...
>
> USB sticks > 1 GB get 16KB nodesize also. At <= 1 GB, mixed-bg is default as is 4KB nodesize. Probably because queue/rotational is 1 for USB sticks, they mount without ssd or ssd_spread which may be unfortunate (I haven't benchmarked it but I suspect ssd_spread would work well for USB sticks).
>
> It was suggested a while ago that maybe mixed-bg should apply to larger volumes, maybe up to 8GB or 16GB?
>
> Chris Murphy
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
Rsync finished. FWIW in the end it reported an average speed of about 900K/sec. Without autodefrag there have been no messages about hung kworkers, even though rsync seemingly keeps getting hung for several minutes throughout the whole execution.

Thanks
John

On Tue, Sep 2, 2014 at 10:48 PM, john terragon wrote:
> OK, so I'm using 3.17-rc3, same test on a flash usb drive, no autodefrag. The situation is even stranger. The rsync is clearly stuck, it's trying to write the same file for much more than 120 secs. However dmesg is clean, no "INFO: task kworker/u16:11:1763 blocked for more than 120 seconds" or anything. df is responsive but shows no increase in used space. Consider that with autodefrag this bug is completely "reliable", the hung-task info starts to show up almost immediately.
>
> Oh wait (I'm live...) now rsync is unstuck, files are being written and df shows an increase in used space. BUT, still no hung-task message in the kernel log, even though rsync was actually stuck for several minutes.
>
> So, to summarize, same conditions except no autodefrag. Result: process stuck for way more than 120 secs but this time no complaints in the kernel log.
>
> Thanks
> John
>
> On Tue, Sep 2, 2014 at 10:23 PM, john terragon wrote:
>> I don't know what to tell you about the ENOSPC code being heavily involved. At this point I'm using this simple test to see if things improve:
>>
>> - freshly created btrfs on dmcrypt,
>> - rsync some stuff (since the fs is empty I could just use cp but I keep the test the same as it was when I had the problem for the first time)
>> - note: the rsynced stuff is about the size of the volume but with compression I always end up with 1/2 to 3/4 free space
>>
>> I'm not sure how I even get close to involving the ENOSPC code but probably I'm not fully aware of the inner workings of btrfs.
>>
>>> Can you try flipping off autodefrag?
>>
>> As soon as the damn unkillable rsync decides to obey the kill -9...
>>
>> Thanks
>>
>> John
>>
>> On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason wrote:
>>> On 09/02/2014 03:56 PM, john terragon wrote:
>>>> Nice...now I get the hung task even with 3.14.17. And I tried with 4K for node and leaf size...same result. And to top it all off, today I've been bitten by the bug also on my main root fs (which is on two fast ssds), although with 3.16.1.
>>>>
>>>> Is it at least safe for the data? I mean, as long as the hung process terminates and no other error shows up, can I at least be sure that the data written is correct?
>>>
>>> Your traces are a little different. The ENOSPC code is throttling things to make sure you have enough room for the writes you're doing. The code we have in 3.17-rc3 (or my for-linus branch) are the best choices right now. You can pull that down to 3.16 if you want all the fixes on a more stable kernel.
>>>
>>> Nailing down the ENOSPC code is going to be a little different, I think autodefrag probably isn't interacting well with being short on space and encryption. This is leading to much more IO than we'd normally do, and dm-crypt makes it fairly intensive.
>>>
>>> Can you try flipping off autodefrag?
>>>
>>> -chris
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
OK, so I'm using 3.17-rc3, same test on a flash usb drive, no autodefrag. The situation is even stranger. The rsync is clearly stuck, it's trying to write the same file for much more than 120 secs. However dmesg is clean, no "INFO: task kworker/u16:11:1763 blocked for more than 120 seconds" or anything. df is responsive but shows no increase in used space. Consider that with autodefrag this bug is completely "reliable", the hung-task info starts to show up almost immediately.

Oh wait (I'm live...) now rsync is unstuck, files are being written and df shows an increase in used space. BUT, still no hung-task message in the kernel log, even though rsync was actually stuck for several minutes.

So, to summarize, same conditions except no autodefrag. Result: process stuck for way more than 120 secs but this time no complaints in the kernel log.

Thanks
John

On Tue, Sep 2, 2014 at 10:23 PM, john terragon wrote:
> I don't know what to tell you about the ENOSPC code being heavily involved. At this point I'm using this simple test to see if things improve:
>
> - freshly created btrfs on dmcrypt,
> - rsync some stuff (since the fs is empty I could just use cp but I keep the test the same as it was when I had the problem for the first time)
> - note: the rsynced stuff is about the size of the volume but with compression I always end up with 1/2 to 3/4 free space
>
> I'm not sure how I even get close to involving the ENOSPC code but probably I'm not fully aware of the inner workings of btrfs.
>
>> Can you try flipping off autodefrag?
>
> As soon as the damn unkillable rsync decides to obey the kill -9...
>
> Thanks
>
> John
>
> On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason wrote:
>> On 09/02/2014 03:56 PM, john terragon wrote:
>>> Nice...now I get the hung task even with 3.14.17. And I tried with 4K for node and leaf size...same result. And to top it all off, today I've been bitten by the bug also on my main root fs (which is on two fast ssds), although with 3.16.1.
>>>
>>> Is it at least safe for the data? I mean, as long as the hung process terminates and no other error shows up, can I at least be sure that the data written is correct?
>>
>> Your traces are a little different. The ENOSPC code is throttling things to make sure you have enough room for the writes you're doing. The code we have in 3.17-rc3 (or my for-linus branch) are the best choices right now. You can pull that down to 3.16 if you want all the fixes on a more stable kernel.
>>
>> Nailing down the ENOSPC code is going to be a little different, I think autodefrag probably isn't interacting well with being short on space and encryption. This is leading to much more IO than we'd normally do, and dm-crypt makes it fairly intensive.
>>
>> Can you try flipping off autodefrag?
>>
>> -chris
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I don't know what to tell you about the ENOSPC code being heavily involved. At this point I'm using this simple test to see if things improve:

- freshly created btrfs on dmcrypt,
- rsync some stuff (since the fs is empty I could just use cp but I keep the test the same as it was when I had the problem for the first time)
- note: the rsynced stuff is about the size of the volume but with compression I always end up with 1/2 to 3/4 free space

I'm not sure how I even get close to involving the ENOSPC code but probably I'm not fully aware of the inner workings of btrfs.

> Can you try flipping off autodefrag?

As soon as the damn unkillable rsync decides to obey the kill -9...

Thanks

John

On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason wrote:
> On 09/02/2014 03:56 PM, john terragon wrote:
>> Nice...now I get the hung task even with 3.14.17. And I tried with 4K for node and leaf size...same result. And to top it all off, today I've been bitten by the bug also on my main root fs (which is on two fast ssds), although with 3.16.1.
>>
>> Is it at least safe for the data? I mean, as long as the hung process terminates and no other error shows up, can I at least be sure that the data written is correct?
>
> Your traces are a little different. The ENOSPC code is throttling things to make sure you have enough room for the writes you're doing. The code we have in 3.17-rc3 (or my for-linus branch) are the best choices right now. You can pull that down to 3.16 if you want all the fixes on a more stable kernel.
>
> Nailing down the ENOSPC code is going to be a little different, I think autodefrag probably isn't interacting well with being short on space and encryption. This is leading to much more IO than we'd normally do, and dm-crypt makes it fairly intensive.
>
> Can you try flipping off autodefrag?
>
> -chris
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
Nice... now I get the hung task even with 3.14.17. And I tried with 4K for node and leaf size... same result. And to top it all off, today I've been bitten by the bug also on my main root fs (which is on two fast ssds), although with 3.16.1.

Is it at least safe for the data? I mean, as long as the hung process terminates and no other error shows up, can I at least be sure that the data written is correct?

Thanks
John

On Tue, Sep 2, 2014 at 8:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> john terragon posted on Tue, 02 Sep 2014 08:12:36 +0200 as excerpted:
>
>> I will definitely try the latest 3.14.x (never had any problem of this kind with it). And I'll look into the other possibilities you pointed out. However what I can tell you right now is this:
>>
>> -the filesystem was "new". I've been bitten by this bug with 3.15 and 3.16 and I kept trying to do the same thing (create the fs, rsync or cp the same stuff) to see if it got better.
>
> OK. I had read your post as implying that the filesystem had been around since before 3.14, in which case the firmware shuffling could well have been a factor. If it was a brand new filesystem, then likely not, as mkfs.btrfs tries to do a trim of the whole filesystem range before it sets up.
>
> But that does remind me of one other possibility I had thought to mention and then forgot... that's even more likely now that it's known to be a new filesystem...
>
> I don't recall the btrfs-progs version, but somewhere along the line one other thing of potential interest changed:
>
> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; now days it defaults to 16 KiB as that's far better for most usage. I wonder if USB sticks are an exception...
>
> Since this is set at mkfs time, trying a 3.14 series kernel with current mkfs.btrfs defaults shouldn't change things; if the 16 KiB nodesize is why it's slow, it should still be slow with the 3.14 series kernel.
>
> Conversely, if this is the problem, specifically creating the filesystem with --nodesize 4k should fix it, and it should stay fixed regardless of what kernel you use with it, 3.14, 3.16, 3.17-rcX, shouldn't matter.
>
> And that'd be a very useful thing to put on the wiki as well, should it be found to be the case. So please test and post if it helps (and feel free to put it on the wiki too if it works)! =:^)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I will definitely try the latest 3.14.x (never had any problem of this kind with it). And I'll look into the other possibilities you pointed out. However what I can tell you right now is this:

- the filesystem was "new". I've been bitten by this bug with 3.15 and 3.16 and I kept trying to do the same thing (create the fs, rsync or cp the same stuff) to see if it got better.
- there does not seem to be a problem with space, because the volume is about 14G and in the end about 8G are usually occupied (when the process terminates). I always used compression one way or another, either forced or not and either lzo or zlib. Maybe I should try without compression.
- it's not one specific usb flash drive. I tried several ones and I always get the same behaviour.
- the process freezes for several minutes. It's completely frozen, no I/O. So even if the firmware of the usb key is shuffling things around blocking everything, it shouldn't take all that time for a small amount of data. Also, as I mentioned, I tried ext4 and xfs and the data seems to be written in a continuous way, without any big lock (even though I realize that ext4 and xfs have very different writing patterns than a cow filesystem, so I can't be sure it's significant).

Thanks
John

On Tue, Sep 2, 2014 at 7:20 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> john terragon posted on Mon, 01 Sep 2014 18:36:49 +0200 as excerpted:
>
>> I was trying it again and it seems to have completed, albeit very slowly (even for a usb flash drive). Was the 3.14 series the last one immune from this problem? Should I try the latest 3.14.x?
>
> The 3.14 series was before the switch to generic kworker threads, while btrfs still had its own custom work-queue threads. There was known to be a very specific problem with the kworker threads, but in 3.17-rc3 that should be fixed.
>
> So it may well be a problem with btrfs in general, at least as it exists today and historically, in which case 3.14.x won't help you much if at all.
>
> But I'd definitely recommend trying it. If 3.14 is significantly faster and it's repeatedly so, then there's obviously some other regression, either with kworker threads or with something else, since then. If not, then at least we know for sure kworker threads aren't a factor, since 3.14 was previous to them entering the picture.
>
> The other possibility I'm aware of would be erase-block related. I see you're using autodefrag so it shouldn't be direct file fragmentation, but particularly if the filesystem has been used for some time, it might be the firmware trying to shuffle things around and having trouble due to having already used up all the known-free erase blocks, so it's having to stop and free one by shifting things around every time it needs another one, and that's what's taking the time.
>
> What does btrfs fi show say about free space (the device line (lines, for multi-device btrfs) size vs. used, not the top line, is the interesting bit)? What does btrfs fi df say for data and metadata (total vs. used)?
>
> For btrfs fi df ideally your data/metadata spread between used and total shouldn't be too large (a few gig for data and a gig or so for metadata isn't too bad, assuming a large enough device, of course). If it is, a balance may be in order, perhaps using the -dusage=20 and/or -musage=20 style options to keep it from rebalancing everything (read up on the wiki and choose your number, 5 might be good if there's plenty of room, you might need 50 or higher if you're close to full, more than about 80 and you might as well just use -d or -m and forget the usage bit).
>
> Similarly, for btrfs fi show, you want as much space as possible left, several gigs at least if your device isn't too small for that to be practical. Again, if btrfs fi df is out of balance it'll use more space in show as well, and a balance should retrieve some of it.
>
> Once you have some space to work with (or before the balance if you suspect your firmware is SERIOUSLY out of space and shuffling, as that'll slow the balance down too, and again after), try running fstrim on the device. It may or may not work on that device, but if it does and the firmware /was/ out of space and having to shuffle hard, it could improve performance *DRAMATICALLY*. The reason being that on devices where it works, fstrim will tell the firmware what blocks are free, allowing it more flexibility in erase-block shuffling.
>
> If that makes a big difference, you can /try/ the discard mount option. Tho doing the trim/discard as part of normal operations can slow them down some too.
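Duncan's balance/trim suggestions as commands (a sketch, assuming the stick is mounted at /mnt/usb; the usage= threshold makes balance rewrite only mostly-empty chunks):

btrfs fi show /mnt/usb        # device line: size vs used
btrfs fi df /mnt/usb          # data/metadata: total vs used
btrfs balance start -dusage=20 -musage=20 /mnt/usb
fstrim -v /mnt/usb            # may or may not be supported by the stick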
Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
I was trying it again and it seems to have completed, albeit very slowly (even for a usb flash drive). Was the 3.14 series the last one immune from this problem? Should I try the latest 3.14.x?

Thanks
John

On Mon, Sep 1, 2014 at 6:02 PM, Chris Mason wrote:
> On 09/01/2014 09:33 AM, john terragon wrote:
>> Hi.
>>
>> I'm not sure if this is related to the hung task problem that I've been seeing in this ml for a while. But I've been having this seemingly related problem with 3.15, 3.16 and now 3.17-rc3 (which, if I'm not mistaken, should have a fix for the hung task problem). So here it is: I have a usb flash drive with btrfs (on top of dmcrypt) usually mounted with these options
>>
>> rw,noatime,compress-force=zlib,ssd,space_cache,autodefrag
>>
>> When I try to rsync the usb flash drive I get a truck-load of "INFO: task rsync:2524 blocked for more than 120 seconds" as you can see below. The rsync process crawls into an almost complete stop and I can't even kill it. I know the usb key is OK because I've tried the same thing with ext4 and xfs and everything went fine.
>
> This does have all of our fixes for hangs. Does the rsync eventually complete? Or do we just sit there forever?
>
> -chris
kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
Hi.

I'm not sure if this is related to the hung task problem that I've been seeing in this ml for a while. But I've been having this seemingly related problem with 3.15, 3.16 and now 3.17-rc3 (which, if I'm not mistaken, should have a fix for the hung task problem). So here it is: I have a usb flash drive with btrfs (on top of dmcrypt) usually mounted with these options

rw,noatime,compress-force=zlib,ssd,space_cache,autodefrag

When I try to rsync the usb flash drive I get a truck-load of "INFO: task rsync:2524 blocked for more than 120 seconds" as you can see below. The rsync process crawls into an almost complete stop and I can't even kill it. I know the usb key is OK because I've tried the same thing with ext4 and xfs and everything went fine.

Any ideas?

Thanks
John

[ 2763.077502] INFO: task rsync:2524 blocked for more than 120 seconds.
[ 2763.077513] Not tainted 3.17.0-rc3-cu3 #1
[ 2763.077516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2763.077521] rsync D 880347b63840 0 2524 2523 0x
[ 2763.077531] 880347b633f0 0082 00013200 880347b2bfd8
[ 2763.077540] 00013200 880347b633f0 8803f73af660 880347b2baa0
[ 2763.077546] 8803f73af664 880347b633f0 8803f73af668
[ 2763.077554] Call Trace:
[ 2763.077573] [] ? schedule_preempt_disabled+0x20/0x60
[ 2763.077582] [] ? __mutex_lock_slowpath+0x14b/0x1d0
[ 2763.077593] [] ? del_timer_sync+0x4a/0x60
[ 2763.077601] [] ? mutex_lock+0x16/0x25
[ 2763.077656] [] ? btrfs_wait_ordered_roots+0x3e/0x1f0 [btrfs]
[ 2763.077682] [] ? flush_space+0x1ea/0x4b0 [btrfs]
[ 2763.077706] [] ? get_alloc_profile+0x85/0x1c0 [btrfs]
[ 2763.077730] [] ? can_overcommit+0x81/0xe0 [btrfs]
[ 2763.077755] [] ? reserve_metadata_bytes+0x1c0/0x3d0 [btrfs]
[ 2763.077780] [] ? btrfs_block_rsv_add+0x28/0x50 [btrfs]
[ 2763.077811] [] ? start_transaction+0x442/0x500 [btrfs]
[ 2763.077839] [] ? btrfs_check_dir_item_collision+0x74/0x100 [btrfs]
[ 2763.077871] [] ? btrfs_rename2+0x15f/0x6d0 [btrfs]
[ 2763.077880] [] ? capable_wrt_inode_uidgid+0x4b/0x60
[ 2763.077887] [] ? cap_validate_magic+0x100/0x100
[ 2763.077897] [] ? vfs_rename+0x5a1/0x790
[ 2763.077905] [] ? follow_managed+0x2a0/0x2b0
[ 2763.077913] [] ? SYSC_renameat2+0x483/0x530
[ 2763.077922] [] ? notify_change+0x2cd/0x380
[ 2763.077927] [] ? __sb_end_write+0x28/0x60
[ 2763.077937] [] ? lockref_put_or_lock+0x48/0x80
[ 2763.077943] [] ? dput+0xad/0x170
[ 2763.077951] [] ? path_put+0xd/0x20
[ 2763.077958] [] ? SyS_chmod+0x41/0x90
[ 2763.077966] [] ? system_call_fastpath+0x16/0x1b
[ 2883.203005] INFO: task kworker/u16:11:1617 blocked for more than 120 seconds.
[ 2883.203017] Not tainted 3.17.0-rc3-cu3 #1
[ 2883.203020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.203024] kworker/u16:11 D 8804185d1740 0 1617 2 0x
[ 2883.203085] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 2883.203091] 8804185d12f0 0046 00013200 880435f3bfd8
[ 2883.203099] 00013200 8804185d12f0 8803b86bef00 8803e9a761f0
[ 2883.203106] 8803e9a761f0 0001 8803aece6520
[ 2883.203113] Call Trace:
[ 2883.203149] [] ? wait_current_trans.isra.22+0x97/0xf0 [btrfs]
[ 2883.203161] [] ? prepare_to_wait_event+0xf0/0xf0
[ 2883.203190] [] ? start_transaction+0x2a8/0x500 [btrfs]
[ 2883.203221] [] ? btrfs_finish_ordered_io+0x250/0x5c0 [btrfs]
[ 2883.203230] [] ? __switch_to+0x119/0x580
[ 2883.203261] [] ? normal_work_helper+0xaf/0x190 [btrfs]
[ 2883.203272] [] ? process_one_work+0x167/0x380
[ 2883.203280] [] ? worker_thread+0x114/0x480
[ 2883.203288] [] ? rescuer_thread+0x2b0/0x2b0
[ 2883.203294] [] ? kthread+0xb8/0xd0
[ 2883.203301] [] ? kthread_create_on_node+0x170/0x170
[ 2883.203309] [] ? ret_from_fork+0x7c/0xb0
[ 2883.203315] [] ? kthread_create_on_node+0x170/0x170
[ 2883.203332] INFO: task btrfs-transacti:2126 blocked for more than 120 seconds.
[ 2883.203336] Not tainted 3.17.0-rc3-cu3 #1
[ 2883.203338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2883.203341] btrfs-transacti D 8803e98a2860 0 2126 2 0x
[ 2883.203348] 8803e98a2410 0046 00013200 8803e9af3fd8
[ 2883.203355] 00013200 8803e98a2410 88045fa13af0 8803e9af3b48
[ 2883.203361] 88045fdb2928 0002 8149d750 8803e9af3c08
[ 2883.203368] Call Trace:
[ 2883.203378] [] ? bit_wait+0x40/0x40
[ 2883.203386] [] ? io_schedule+0x94/0x120
[ 2883.203394] [] ? bit_wait_io+0x23/0x40
[ 2883.203402] [] ? __wait_on_bit+0x55/0x80
[ 2883.203410] [] ? wait_on_page_bit+0x6e/0x80
[ 2883.203418] [] ? autoremove_wake_function+0x30/0x30
[ 2883.203425] [] ? filemap_fdatawait_range+0xd0/0x160
[ 2883.203459] [] ? btrfs_wait_ordered_range+0x62/0x120 [btrfs]
[ 2883.203490] []
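For reference, the two configurations being compared in this subthread, as mount commands (the device and mountpoint names are placeholders):

mount -o rw,noatime,compress-force=zlib,ssd,space_cache,autodefrag /dev/mapper/usbcrypt /mnt/usb   # hangs with hung-task warnings
mount -o rw,noatime,compress-force=zlib,ssd,space_cache /dev/mapper/usbcrypt /mnt/usb              # no warnings, but rsync still stalls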
Re: is it safe to change BTRFS_STRIPE_LEN?
Yes, the btrfs-tools would have to be recompiled too (BTRFS_STRIPE_LEN is defined in a volumes.h in there too). And yes, a kernel and tools modified this way would certainly kill any raid0 btrfs fs and maybe any other multidevice kind of setting.

On Sat, May 24, 2014 at 9:07 PM, Austin S Hemmelgarn wrote:
> On 05/24/2014 12:44 PM, john terragon wrote:
>> Hi.
>>
>> I'm playing around with (software) raid0 on SSDs and since I remember I read somewhere that intel recommends 128K stripe size for HDD arrays but only 16K stripe size for SSD arrays, I wanted to see how a small(er) stripe size would work on my system. Obviously with btrfs on top of md-raid I could use the stripe size I want. But if I'm not mistaken the stripe size with the native raid0 in btrfs is fixed to 64K in BTRFS_STRIPE_LEN (volumes.h).
>> So I was wondering if it would be reasonably safe to just change that to 16K (and duck and wait for the explosion ;) ).
>>
>> Can anyone adept at the inner workings of the btrfs raid0 code confirm if that would be the right way to proceed? (obviously without absolutely any blame to be placed on anyone other than myself if things should go badly :) )
>
> I personally can't render an opinion on whether changing it would make things break or not, but I do know that it would need to be changed both in the kernel and the tools, and the resultant kernel and tools would not be entirely compatible with filesystems produced by the regular tools and kernel, possibly to the point of corrupting any filesystem they touch.
>
> As for the 64k default stripe size, that sounds correct, and is probably because that's the largest block that the I/O schedulers on Linux will dispatch as a single write to the underlying device.
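A sketch of the change being discussed, patching both trees in one go (the exact "(64 * 1024)" literal is an assumption about that era's sources, hence the grep before and after; the resulting kernel and tools must only ever touch throwaway filesystems):

grep -n 'define BTRFS_STRIPE_LEN' linux/fs/btrfs/volumes.h btrfs-progs/volumes.h
sed -i '/define BTRFS_STRIPE_LEN/s/64 \* 1024/16 * 1024/' linux/fs/btrfs/volumes.h btrfs-progs/volumes.h
grep -n 'define BTRFS_STRIPE_LEN' linux/fs/btrfs/volumes.h btrfs-progs/volumes.h   # confirm the edit took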
is it safe to change BTRFS_STRIPE_LEN?
Hi.

I'm playing around with (software) raid0 on SSDs and since I remember I read somewhere that intel recommends 128K stripe size for HDD arrays but only 16K stripe size for SSD arrays, I wanted to see how a small(er) stripe size would work on my system. Obviously with btrfs on top of md-raid I could use the stripe size I want. But if I'm not mistaken the stripe size with the native raid0 in btrfs is fixed to 64K in BTRFS_STRIPE_LEN (volumes.h). So I was wondering if it would be reasonably safe to just change that to 16K (and duck and wait for the explosion ;) ).

Can anyone adept at the inner workings of the btrfs raid0 code confirm if that would be the right way to proceed? (obviously without absolutely any blame to be placed on anyone other than myself if things should go badly :) )

Thanks
john
Re: btrfs on software RAID0
Just one last doubt: why do you use --align-payload=1024? (or 8192)

The cryptsetup man page says that the default for the payload alignment is 2048 (512-byte sectors). So it's already aligned by default to 4K-byte physical sectors (if that was your concern). Am I missing something?

John

On Mon, May 5, 2014 at 11:25 PM, Marc MERLIN wrote:
> On Mon, May 05, 2014 at 10:51:46PM +0200, john terragon wrote:
>> Hi.
>> I'm about to try btrfs on a RAID0 md device (to be precise there will be dm-crypt in between the md device and btrfs). If I used ext4 I would set the stride and stripe_width extended options. Is there anything similar I should be doing with mkfs.btrfs? Or maybe some mount options beneficial to this kind of setting.
>
> This is not directly an answer to your question, so far I haven't used a special option like this with btrfs on my arrays although my understanding is that it's not as important as with ext4.
>
> That said, please read
> http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html
>
> 1) use align-payload=1024 on cryptsetup instead of something bigger like 8192. This will reduce write amplification (if you're not on an SSD).
>
> 2) you don't need md0 in the middle, crypt each device and then use btrfs built in raid0 which will be faster (and is stable, at least as far as we know :) ).
>
> Then use /etc/crypttab or a script like this
> http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
> to decrypt all your devices in one swoop and mount btrfs.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
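For reference, a quick way to see what a given --align-payload actually does (units are 512-byte sectors, so 1024 = 512 KiB and the 2048 default = 1 MiB; /dev/sdX1 is a placeholder and luksFormat destroys it):

cryptsetup luksFormat --align-payload=1024 /dev/sdX1
cryptsetup luksDump /dev/sdX1 | grep -i 'payload offset'   # reported in 512-byte sectors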
Re: btrfs on software RAID0
On Mon, May 5, 2014 at 11:25 PM, Marc MERLIN wrote:
> This is not directly an answer to your question, so far I haven't used a special option like this with btrfs on my arrays although my understanding is that it's not as important as with ext4.
>
> That said, please read
> http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html
>
> 1) use align-payload=1024 on cryptsetup instead of something bigger like 8192. This will reduce write amplification (if you're not on an SSD).
>
> 2) you don't need md0 in the middle, crypt each device and then use btrfs built in raid0 which will be faster (and is stable, at least as far as we know :) ).
>
> Then use /etc/crypttab or a script like this
> http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
> to decrypt all your devices in one swoop and mount btrfs.

I know about btrfs's native raid capabilities, but to be honest most of the time I see people having "scary" problems with btrfs is when they use it with multiple devices. So far my experience with btrfs has been pretty smooth (always with btrfs on top of a single device) and I wanted to let that part of btrfs maybe mature a little bit more. But maybe I'm wrong, so maybe I'll give both approaches a try.

About unlocking all the dm-crypt devices in one swoop, there's this script too

https://github.com/gebi/keyctl_keyscript

which uses the kernel keyring to temporarily store the passphrase. I was thinking about using it in a dm-crypt->md-raid->btrfs setting to have one thread for each dm-crypt device, but probably aesni instructions are fast enough not to cause the single dm-crypt thread in a md-raid->dm-crypt->btrfs setting to become a bottleneck (at least with hdds; with ssds it might be a different story).

John
btrfs on software RAID0
Hi.

I'm about to try btrfs on a RAID0 md device (to be precise, there will be dm-crypt in between the md device and btrfs). If I used ext4 I would set the stride and stripe_width extended options. Is there anything similar I should be doing with mkfs.btrfs? Or maybe some mount options beneficial to this kind of setting?

Thanks
John
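For comparison, the ext4 tuning mentioned above, as it might look for a 2-disk raid0 with a 512 KiB chunk (the numbers are illustrative: stride = chunk/blocksize = 512/4 = 128, stripe_width = stride * 2 disks = 256):

mkfs.ext4 -E stride=128,stripe_width=256 /dev/md0
# mkfs.btrfs has no equivalent knob; it probes the underlying device, so
# a plain invocation over the dm-crypt device is the usual answer:
mkfs.btrfs /dev/mapper/cryptomd0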