Re: hierarchical, tree-like structure of snapshots

2020-12-31 Thread john terragon
Although I'm glad that a bug has been uncovered, maybe it's best if I
stick with good old rsync for backups.
It would be kind of ironic if the first data loss that I experienced
in many years of btrfs use were caused by an ancillary backup
tool.

On Thu, Dec 31, 2020 at 10:36 PM Zygo Blaxell
 wrote:
>
> On Thu, Dec 31, 2020 at 09:48:54PM +0100, john terragon wrote:
> > On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov  
> > wrote:
> > >
> >
> > >
> > > How exactly do you create a subvolume with the same content? There are many
> > > possible interpretations.
> > >
> >
> > Zygo wrote that any subvol could be used with -p. So, out of
> > curiosity, I did the following
> >
> > 1) btrfs sub create X
> > 2) I unpacked some source (linux kernel) in X
> > 3) btrfs sub create W
> > 4) I unpacked the same source in W (so X and W have the same content
> > but they are independent)
> > 5) btrfs sub snap -r X X_RO
> > 6) btrfs sub snap -r W W_RO
> > 7) btrfs send W_RO | btrfs receive /mnt/btrfs2
> > 8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2
> >
> > And this is the exact output of step 8:
> >
> > At subvol X_RO
> > At snapshot X_RO
> > ERROR: chown o257-1648413-0 failed: No such file or directory
>
> Yeah, I only checked that send completed without error and produced a
> smaller stream.
>
> I just dumped the send metadata stream from the incremental snapshot now,
> and it's more or less garbage at the start:
>
> # btrfs sub create A
> # btrfs sub create B
> # date > A/date
> # date > B/date
> # mkdir A/t B/u
> # btrfs sub snap -r A A_RO
> # btrfs sub snap -r B B_RO
> # btrfs send A_RO | btrfs receive --dump
> At subvol A_RO
> subvol  ./A_RO  
> uuid=995adde4-00ac-5e49-8c6f-f01743def072 transid=7329268
> chown   ./A_RO/ gid=0 uid=0
> chmod   ./A_RO/ mode=755
> utimes  ./A_RO/ 
> atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 
> ctime=2020-12-31T15:51:48-0500
> mkfile  ./A_RO/o257-7329268-0
> rename  ./A_RO/o257-7329268-0   dest=./A_RO/date
> utimes  ./A_RO/ 
> atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 
> ctime=2020-12-31T15:51:48-0500
> write   ./A_RO/date offset=0 len=29
> chown   ./A_RO/date gid=0 uid=0
> chmod   ./A_RO/date mode=644
> utimes  ./A_RO/date 
> atime=2020-12-31T15:51:38-0500 mtime=2020-12-31T15:51:38-0500 
> ctime=2020-12-31T15:51:38-0500
> mkdir   ./A_RO/o258-7329268-0
> rename  ./A_RO/o258-7329268-0   dest=./A_RO/t
> utimes  ./A_RO/ 
> atime=2020-12-31T15:51:31-0500 mtime=2020-12-31T15:51:48-0500 
> ctime=2020-12-31T15:51:48-0500
> chown   ./A_RO/t    gid=0 uid=0
> chmod   ./A_RO/t    mode=755
> utimes  ./A_RO/t
> atime=2020-12-31T15:51:48-0500 mtime=2020-12-31T15:51:48-0500 
> ctime=2020-12-31T15:51:48-0500
> # btrfs send B_RO -p A_RO | btrfs receive --dump
> At subvol B_RO
> snapshot ./B_RO  
> uuid=4aa7db26-b219-694e-9b3c-f8f737a46bdb transid=7329268 
> parent_uuid=995adde4-00ac-5e49-8c6f-f01743def072 parent_transid=7329268
> utimes  ./B_RO/ 
> atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 
> ctime=2020-12-31T15:51:52-0500
> link    ./B_RO/date dest=date
> unlink  ./B_RO/date
> utimes  ./B_RO/ 
> atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 
> ctime=2020-12-31T15:51:52-0500
> write   ./B_RO/date offset=0 len=29
> utimes  ./B_RO/date 
> atime=2020-12-31T15:51:41-0500 mtime=2020-12-31T15:51:41-0500 
> ctime=2020-12-31T15:51:41-0500
> rename  ./B_RO/t    dest=./B_RO/u
> utimes  ./B_RO/ 
> atime=2020-12-31T15:51:33-0500 mtime=2020-12-31T15:51:52-0500 
> ctime=2020-12-31T15:51:52-0500
> utimes  ./B_RO/u
> atime=2020-12-31T15

Re: hierarchical, tree-like structure of snapshots

2020-12-31 Thread john terragon
On Thu, Dec 31, 2020 at 8:42 PM Andrei Borzenkov  wrote:
>

>
> How exactly do you create a subvolume with the same content? There are many
> possible interpretations.
>

Zygo wrote that any subvol could be used with -p. So, out of
curiosity, I did the following

1) btrfs sub create X
2) I unpacked some source (linux kernel) in X
3) btrfs sub create W
4) I unpacked the same source in W (so X and W have the same content
but they are independent)
5) btrfs sub snap -r X X_RO
6) btrfs sub snap -r W W_RO
7) btrfs send W_RO | btrfs receive /mnt/btrfs2
8) btrfs send -p W_RO X_RO | btrfs receive /mnt/btrfs2

And this is the exact output of step 8:

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory


Re: hierarchical, tree-like structure of snapshots

2020-12-31 Thread john terragon
On Thu, Dec 31, 2020 at 6:28 PM Zygo Blaxell
 wrote:

>
> I think your confusion is that you are thinking of these as a tree.
> There is no tree, each subvol is an equal peer in the filesystem.
>
> "send -p A B" just walks over subvol A and B and sends a diff of the
> parts of B not in A.  You can pick any subvol with -p as long as it's
> read-only and present on the receiving side.  Obviously it's much more
> efficient if the two subvols have a lot of shared extents (e.g. because
> B and A were both snapshots made at different times of some other subvol
> C), but this is not required.

Can you really use ANY subvol with -p? Because if I

1) create a subvol X
2) create a subvol W with the exact same content as X (but created
independently)
3) do a RO snap X_RO of X
4) do a RO snap W_RO of W
5) send W_RO to the other FS
6) send -p W_RO X_RO to the other FS

I get this:

At subvol X_RO
At snapshot X_RO
ERROR: chown o257-1648413-0 failed: No such file or directory

Any idea?


Re: hierarchical, tree-like structure of snapshots

2020-12-31 Thread john terragon
On Thu, Dec 31, 2020 at 8:05 AM Andrei Borzenkov  wrote:
>

> >
> > OK, but then could I use Y as the parent of the rw snapshot, let's call it
> > W, in a send?
>
> No
>

Of course I didn't mean to use Y as the parent of W itself, but as the
parent of a readonly snapshot of W, whenever I want to send it to the
second FS.

And I just tried the following steps and they worked:

1) created subvol X
2) created readonly snap Y of X
3) sent Y to second FS
4) modified X
5) created readonly snap X1 of X
6) sent -p Y X1 to second FS
7) created readwrite snap Y1 of Y
8) modified Y1
9) created readonly snap Y1_RO of Y1
10) sent -p Y Y1_RO to second FS

So, as you can see:

-in 6) I've used the RO snap Y of X as the parent of X1 (and X) to
send X1 to the second FS

-in 10) I did the opposite: Y is still used as the parent, but this
time I've sent the RO snap of a subvol that is itself a snap of Y.

So it seems to work both ways.
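
Spelled out as commands (the second filesystem's mount point /mnt/fs2
and the test modifications are just examples), the sequence above is
roughly:

btrfs sub create X                                 # 1)
btrfs sub snap -r X Y                              # 2)
btrfs send Y | btrfs receive /mnt/fs2              # 3) full send of Y
date > X/somefile                                  # 4) modify X
btrfs sub snap -r X X1                             # 5)
btrfs send -p Y X1 | btrfs receive /mnt/fs2        # 6) Y as parent of X1
btrfs sub snap Y Y1                                # 7) readwrite snap of Y
date > Y1/somefile                                 # 8) modify Y1
btrfs sub snap -r Y1 Y1_RO                         # 9)
btrfs send -p Y Y1_RO | btrfs receive /mnt/fs2     # 10) Y as parent again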


Re: hierarchical, tree-like structure of snapshots

2020-12-30 Thread john terragon
On Wed, Dec 30, 2020 at 6:24 PM sys  wrote:
>
>
>
[...]
> You should simply make a 'read-write' snapshot (Y-rw) of the 'read-only'
> snapshot (Y) that is part of your backup/send scheme. Do not modify
> read-only snapshots to be rw.
>

OK, but then could I use Y as the parent of the rw snapshot, let's call it
W, in a send?
So I would have this tree where Y is still the root.

Y-W
 \
  Z-X

Can I do a send -p Y W?
Because I thought it was the other way around: that is, I do a readonly
snapshot W of Y and that will be the base for incrementally sending
the future modified Y to another FS (provided of course W is already
there).


Re: hierarchical, tree-like structure of snapshots

2020-12-30 Thread john terragon
Sorry, that ascii tree came out awful and it looks like Z is the child
of Y instead of Y1. I hope this one below looks better.

Y1-Y
 \
  Z-X

On Wed, Dec 30, 2020 at 5:56 PM john terragon  wrote:
>
> Hi.
> I would like to maintain a tree-like hierarchical structure of
> snapshots. Let me try to explain what I mean by that.
>
> Let's say I have a btrfs fs with just one subvolume X, and let's say
> that I make a readonly snapshot Y of X. As far as I understand there
> is a parent-child relation between Y (the parent) and X (the child).
>
> Now let's say that after some time and modifications of X I do another
> snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is
> now the "child" of Z and Z is now the "child" of Y. The structure is a
> path which is a special case of a tree.
>
> Now let's suppose that I want to start modifying Y but I still want to be
> able to have a parent of Z which I might use as a point of reference
> for Z in a
> send to somewhere. That is I want to be able to still do a send -p Y Z
> to another btrfs filesystem where there is a previously sent copy of Y
> (which, remember, as of this point has been readonly and I'm just now
> wanting to start to modify it).
> The only thing I think I can do would be to make a readonly snapshot
> Y1 of Y and make Y writeable (so that I can start modifying it). At that
> point the structure would be
>
> Y1-Y
> \
>   Z-X
>
> (yes my ascii art is atrocious...) which is a "proper" tree where Y1
> is the root with two children (Y and Z), Z has one child (X) and Y and
> X are leaves.
> Now, my question is, would Y1 still be usable in send -p Y1 Z, just
> like Y was before becoming writeable and being modified? I would say
> that Y1 would be just as good as the readonly original Y was as a
> parent for Z in a send. But maybe there is some implementation detail
> that escapes me and prevents Y1 from being used as a perfect
> replacement for the original Y.
> I hope I was clear enough.
> Thanks
> John


hierarchical, tree-like structure of snapshots

2020-12-30 Thread john terragon
Hi.
I would like to maintain a tree-like hierarchical structure of
snapshots. Let me try to explain what I mean by that.

Let's say I have a btrfs fs with just one subvolume X, and let's say
that I make a readonly snapshot Y of X. As far as I understand there
is a parent-child relation between Y (the parent) and X (the child).

Now let's say that after some time and modifications of X I do another
snapshot Z of X. Now the "temporal" structure would be Y-Z-X. So X is
now the "child" of Z and Z is now the "child" of Y. The structure is a
path which is a special case of a tree.

Now let's suppose that I want to start modifying Y but I still want to be
able to have a parent of Z which I might use as a point of reference
for Z in a
send to somewhere. That is I want to be able to still do a send -p Y Z
to another btrfs filesystem where there is a previously sent copy of Y
(which, remember, as of this point has been readonly and I'm just now
wanting to start to modify it).
The only thing I think I can do would be to make a readonly snapshot
Y1 of Y and make Y writeable (so that I can start modifying it). At that
point the structure would be

Y1-Y
\
  Z-X

(yes my ascii art is atrocious...) which is a "proper" tree where Y1
is the root with two children (Y and Z), Z has one child (X) and Y and
X are leaves.
Now, my question is, would Y1 still be usable in send -p Y1 Z, just
like Y was before becoming writeable and being modified? I would say
that Y1 would be just as good as the readonly original Y was as a
parent for Z in a send. But maybe there is some implementation detail
that escapes me and prevents Y1 from being used as a perfect
replacement for the original Y.
I hope I was clear enough.
Thanks
John
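
In command form, the Y1 idea would be something like this (a minimal
sketch; /mnt/fs2 is a hypothetical second filesystem, and Y1 gets its
own full send first so that it is present on the receiving side, as -p
requires):

btrfs sub snap -r Y Y1                        # readonly copy of Y as it is now
btrfs property set -ts Y ro false             # make Y writeable
btrfs send Y1 | btrfs receive /mnt/fs2        # Y1 must exist on the other side
btrfs send -p Y1 Z | btrfs receive /mnt/fs2   # later: Y1 as the reference for Z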


btrfs send on top level subvolumes that contain other subvolumes

2014-10-19 Thread john terragon
Hi.

Let's say I have a top-level subvolume /sub and that inside /sub I
have another subvolume say /sub/X/Y/subsub.

If I make a snapshot (both ro and rw give the same results) of /sub,
say /sub-snap, right now what I get is this

1) the /sub-snap/X/Y/subsub is present (and empty, and that's OK as
snapshots are not recursive) but it seems to be neither
   a) an empty subvolume (because btrfs sub list doesn't list it), nor
   b) a directory (because, for example, lsattr -d subsub gives "lsattr:
Inappropriate ioctl for device While reading flags on subsub")

2) if /sub-snap is ro and I send it somewhere, then in the destination
sub-snap, subsub is not present at all (which wouldn't be illogical,
given the non-recursive nature of snapshots).

So I'm wondering if all of this is the intended outcome when
snapshotting and sending a subvolume that has internally defined
subvolumes, or if perhaps it's a bug.
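
For reference, the whole thing can be reproduced with something like
this (the destination mount point /mnt/dest is just an example):

btrfs sub create /sub
mkdir -p /sub/X/Y
btrfs sub create /sub/X/Y/subsub
btrfs sub snap -r /sub /sub-snap
btrfs sub list /                     # subsub under /sub-snap is not listed
lsattr -d /sub-snap/X/Y/subsub       # "Inappropriate ioctl for device"
btrfs send /sub-snap | btrfs receive /mnt/dest   # subsub absent at destination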

I'm using kernel 3.17.1 patched for the recent ro snapshot corruption
bug and btrfs-progs from the 3.17.x branch in git.

Thanks
John


Re: [PATCH] Revert "Btrfs: race free update of commit root for ro snapshots"

2014-10-15 Thread john terragon
-It's not a brand new fs. It was created four or five days ago
with btrfs-progs 3.16.2 (in fact, it was created because of the dead
unremovable ro snapshots in the previous fs)

-the snapshot in question has been created after applying the patch
(and it has not become corrupted so far)

-not an incremental send

-no warnings in dmesg

-btrfs check segfaults (as it did before the patch)

-there are in fact dead unremovable ro snapshots in the filesystem (it
was used before the patch). But the filesystem seems functional
as long as the dead ro snapshots aren't touched. If one of them is
accessed with ls -l I get the usual "parent transid verify failed on
X wanted Y found Z". But as I said, no warnings of that kind (or any
kind) appear in dmesg when I do the send on the freshly created ro
snapshot.

thanks
john


On Thu, Oct 16, 2014 at 1:05 AM, Filipe David Manana  wrote:
> On Wed, Oct 15, 2014 at 11:42 PM, john terragon  wrote:
>> Hi.
>>
>> I applied the patch to 3.17.1 but although I haven't seen any
>> corrupted ro snapshot yet it's still impossible to do btrfs send. As
>> soon as I start btrfs send I still get
>>
>> ERROR: send ioctl failed with -12: Cannot allocate memory
>>
>> even if I redirect btrfs send's output to a file (instead of involving
>> btrfs receive)
>>
>> Maybe this time it's actually a btrfs-progs bug?
>
> Not enough information to tell.
>
> Is it a brand new fs? If not, is it a snapshot created after applying
> the patch or before? Does a btrfsck reports any issues with the fs?
> Is it an incremental (using -p ) or a full send? Do
> you see any warning (traces, errors) in syslog (dmesg)?
>
> Either an issue in send or, if it's an fs created/used with unpatched
> 3.17.0/1, it can be a side effect of the corruption.
>
> thanks
>
>>
>> Thanks
>> John
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."


Re: [PATCH] Revert "Btrfs: race free update of commit root for ro snapshots"

2014-10-15 Thread john terragon
Hi.

I applied the patch to 3.17.1 but although I haven't seen any
corrupted ro snapshot yet it's still impossible to do btrfs send. As
soon as I start btrfs send I still get

ERROR: send ioctl failed with -12: Cannot allocate memory

even if I redirect btrfs send's output to a file (instead of involving
btrfs receive)

Maybe this time it's actually a btrfs-progs bug?

Thanks
John


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
And another worrying thing I didn't notice before. Two snapshots have
dates that do not make sense. root-b3 and root-b4 were created on
Oct 14th (and btw root's modification time was also Oct 14th).
So why do they show Oct 10th? And root-prov was actually created
on Oct 10 15:37, as it correctly shows, so it's like btrfs sub snap
picks up old stale data from who knows where, when, or why.
Moreover, root-b4 was created with 3.16.5... not good.

drwxrwsr-x 1 root staff  30 Sep 11 16:15 home
d? ? ??   ?? home-backup
drwxr-xr-x 1 root root  250 Oct 14 03:02 root
d? ? ??   ?? root-b2
drwxr-xr-x 1 root root  250 Oct 10 15:37 root-b3
drwxr-xr-x 1 root root  250 Oct 10 15:37 root-b4
drwxr-xr-x 1 root root  250 Oct 14 03:02 root-b5
drwxr-xr-x 1 root root  250 Oct 14 03:02 root-b6
d? ? ??   ?? root-backup
drwxr-xr-x 1 root root  250 Oct 10 15:37 root-prov
drwxr-xr-x 1 root root   88 Sep 15 16:02 vms

On Tue, Oct 14, 2014 at 1:18 AM, Rich Freeman
 wrote:
> On Mon, Oct 13, 2014 at 5:22 PM, john terragon  wrote:
>> I'm using "compress=no" so compression doesn't seem to be related, at
>> least in my case. Just read-only snapshots on 3.17 (although I haven't
>> tried 3.16).
>
> I was using lzo compression, and hence my comment about turning it off
> before going back to 3.16 (not realizing that 3.16 has subsequently
> been fixed).
>
> Ironically enough I discovered this as I was about to migrate my ext4
> backup drive into my btrfs raid1.  Maybe I'll go ahead and wait on
> that and have an rsync backup of the filesystem handy (minus
> snapshots) just in case.  :)
>
> I'd switch to 3.16, but it sounds like there is no way to remove the
> snapshots at the moment, and I can live for a while without the
> ability to create new ones.
>
> Interestingly enough, it doesn't look like ALL snapshots are affected.
> I checked and some of the snapshots I made last weekend while doing
> system updates look accessible.  They are significantly smaller, and
> the subvolumes they were made from are also fairly new - though I have
> no idea if that is related.
>
> The subvolumes do show up in btrfs su list.  They cannot be examined
> using btrfs su show.
>
> It would be VERY nice to have a way of cleaning this up without
> blowing away the entire filesystem...
>
> --
> Rich


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
I'm using "compress=no" so compression doesn't seem to be related, at
least in my case. Just read-only snapshots on 3.17 (although I haven't
tried 3.16).

John


Re: btrfs random filesystem corruption in kernel 3.17

2014-10-13 Thread john terragon
I think I just found a consistent simple way to trigger the problem
(at least on my system). And, as I guessed before, it seems to be
related just to readonly snapshots:

1) I create a readonly snapshot
2) I do some changes on the source subvolume for the snapshot (I'm not
sure changes are strictly needed)
3) reboot (or probably just unmount and remount. I reboot because the
fs I have problems with contains my root subvolume)

After rebooting (or remounting) I consistently get the corruption,
with the usual multitude of these in dmesg:
"parent transid verify failed on 902316032 wanted 2484 found 4101"
and the characteristic ls -la output:

drwxr-xr-x 1 root root  250 Oct 10 15:37 root
d? ? ??   ?? root-b2
drwxr-xr-x 1 root root  250 Oct 10 15:37 root-b3
d? ? ??   ?? root-backup

root-backup and root-b2 are both readonly whereas root-b3 is rw (and
it didn't get corrupted).
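
In command form (the modified file and the mount point are just
placeholders), the trigger is:

btrfs sub snap -r root root-backup      # 1) readonly snapshot
date > root/somefile                    # 2) change something in the source
reboot                                  # 3) or unmount and remount the fs
ls -la /mnt                             # the ro snapshot now shows as "d? ..."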

David, maybe you can try the same steps on one of your machines?

John


Re: btrfs send and kernel 3.17

2014-10-13 Thread john terragon
Actually it seems strange that a send operation could corrupt the
source subvolume or fs. Why would send modify the source subvolume
in any significant way? The only way I can find to reconcile your
observations with mine is that maybe the snapshots get corrupted not
by the send operation itself but when they are generated with -r
(readonly, as is needed to send them). Are the corrupted snapshots
you have on machine 2 (the one on which send was never used) readonly?


Re: btrfs send and kernel 3.17

2014-10-12 Thread john terragon
Hi.

I just wanted to "confirm David's story" so to speak :)

-kernel 3.17-rc7 (didn't bother to compile 3.17 as there weren't any
btrfs fixes, I think)

-btrfs-progs 3.16.2 (also compiled from source, so no
distribution-specific patches)

-fresh fs

-I get the same two errors David got (first I got the I/O error one
and then the memory allocation one)

-plus now when I ls -la the fs top volume this is what I get

drwxrwsr-x 1 root staff  30 Sep 11 16:15 home
d? ? ??   ?? home-backup
drwxr-xr-x 1 root root  250 Oct 10 15:37 root
d? ? ??   ?? root-backup
drwxr-xr-x 1 root root   88 Sep 15 16:02 vms
drwxr-xr-x 1 root root   88 Sep 15 16:02 vms-backup

yes, the question marks on those two *-backup snapshots are really
there. I can't access the snapshots, I can't delete them, I can't do
anything with them.

-btrfs check segfaults

-the events that led to this situation are these:
 1) btrfs su snap -r root root-backup
 2) send|receive (the entire root-backup, not an incremental send)
 immediate I/O error
 3) move on to home: btrfs su snap -r home home-backup
 4) send|receive (again not an incremental send)
 everything goes well (!)
 5) retry with root: btrfs su snap -r root root-backup
 6) send|receive
 and it goes seemingly well
 7) apt-get dist-upgrade just to modify root and try an incremental send
 8) reboot after the dist-upgrade
 9) ls -la the fs top volume: first I get the memory allocation error and
    after that any ls -la gives the output I pasted above (notice that
    besides the ls -la, the two snapshots were not touched in any way since
    the two send|receive)
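
In command form (the destination mount point /mnt/dest and the top
volume mount point /mnt are placeholders), the sequence was roughly:

btrfs su snap -r root root-backup                  # 1)
btrfs send root-backup | btrfs receive /mnt/dest   # 2) immediate I/O error
btrfs su snap -r home home-backup                  # 3)
btrfs send home-backup | btrfs receive /mnt/dest   # 4) goes well
btrfs su snap -r root root-backup                  # 5) retry
btrfs send root-backup | btrfs receive /mnt/dest   # 6) seemingly goes well
apt-get dist-upgrade                               # 7) modify root
reboot                                             # 8)
ls -la /mnt                                        # 9) ENOMEM, then "?" entries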

A few final notes. I haven't tried send/receive in a while (they were
unreliable) so I can't tell the last version in which they worked for
me (well, no version actually :) ).
I've never had any problem with snapshots alone. I make them regularly,
I use them, I modify them and I've never had one problem (with 3.17
too; it's just send/receive that murders them).

Best regards

John


Re: BTRFS critical (device dm-0): invalid dir item name len: 45389

2014-09-04 Thread john terragon
Everyone knows what raid0 entails. Moreover, with btrfs being an
experimental fs, not having backups would obviously be pure idiocy.

I wrote that it was "pretty serious" because the situation came out of
nowhere on a low-traffic fs on which the most exciting thing that can
happen is an occasional snapshot once in a while when I do a heavy
update with apt-get (a snapshot that always gets removed right after
the update, which invariably goes well, and my paranoia fades).
The problem seems to have happened right after a hard lock probably due
to 3.17.0-rc3 (and before you explain to me what that rc3 stands for,
let me tell you that I'm not complaining, I knew what I was doing). I
had to power off "brutally" and right after that the problem occurred.
I'm pretty sure about that because for obvious reasons I rsync the
hell out of that filesystem every chance I get. Rsync does a full
traversal of the fs, so the "critical" (btrfs's word, not mine)
problem would have shown up in kmsg (another place that I watch like a
hawk, because of the raid0+experimental-fs thing).

I don't know if you are a btrfs developer but that "pretty serious"
was not meant to offend them nor to complain. Actually I've been a
pretty happy customer up until now (and I still am) because I have
never been bitten by any big bug even with such a complex fs. I just
have this zombie directory that can't be rm'd, but I mv'd it out of the
way and everything is fine. It'll get sorted when I do the next
wipe-and-restore iteration (again, being experimental, I don't let the
fs become too "old").
So, the "pretty serious" was more due to the surprise than anything else.

John


Re: BTRFS critical (device dm-0): invalid dir item name len: 45389

2014-09-04 Thread john terragon
Some more details about this problem:

-the directory involved is /lib/modules/3.17.0-rc3-cu3/kernel/drivers/iio/gyro
-in that dir there should be a kernel object named hid-sensor-gyro-3d.ko
 but there's no trace of it
-that dir cannot be removed or overwritten. rm -rf fails saying that the
 dir cannot be removed because it's not empty (?, even with -rf ?), and
 trying to reinstall the .deb package for that kernel image (thus
 overwriting that dir) ends up in a segfault

The only workaround is to mv that dir (well, I simply mv the whole
3.17.0-rc3-cu3 dir, but it should also work for just the gyro subdir) and
reinstall the deb package.
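
In other words, something like this (the package name is a guess, since
it's a custom kernel):

mv /lib/modules/3.17.0-rc3-cu3 /lib/modules/3.17.0-rc3-cu3.broken
apt-get install --reinstall linux-image-3.17.0-rc3-cu3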

So, it's pretty serious because there's actual loss of data (even
though I was lucky and just lost a .ko I don't use).

John


BTRFS critical (device dm-0): invalid dir item name len: 45389

2014-09-03 Thread john terragon
Hi.

When I traverse one of my btrfs, for example with a simple "find /", I
get the following in kmsg

BTRFS critical (device dm-0): invalid dir item name len: 45389

The message appears just one time (so I guess it involves just one
file/dir). dm-0 is the first dmcrypt device of a pair on which I have
btrfs in RAID0 (btrfs native raid). Though I can't be 100% sure, this
seems to be a very recent problem (I would have noticed something
"critical" in kmsg if it happened before). Everything else seems to
work fine.

So, should I be worried? Is there a way to fix this? (I assume that a
scrub would not do any good since it seems to be related to btrfs data
structures more than actual file data). Is there at least a way to
know which file/dir is involved? Maybe a verbose debug mode? Or maybe
I should just add some printk in the verify_dir_item function that
seems to generate the message.

Thanks
John


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-03 Thread john terragon
I wasn't sure what you meant with  so I dd'd
all three possible cases:

1) here's the dmcrypt device on which I mkfs.btrfs

   2097152000 bytes (2.1 GB) copied, 487.265 s, 4.3 MB/s

2) here's the partition of the usb stick (which has another partition
containing /boot) on top of which the dmcrypt device is created

  2097152000 bytes (2.1 GB) copied, 449.693 s, 4.7 MB/s

3) here's the whole usb stick device

  2097152000 bytes (2.1 GB) copied, 448.003 s, 4.7 MB/s

It's a usb2 device but doesn't it seem kind of slow?

Thanks
John


On Wed, Sep 3, 2014 at 2:36 PM, Chris Mason  wrote:
> On 09/02/2014 09:31 PM, john terragon wrote:
>> Rsync finished. FWIW in the end it reported an average speed of about
>>  900K/sec. Without autodefrag there have been no messages about hung
>> kworkers even though rsync seemingly keeps getting hung for several
>> minutes throughout the whole execution.
>
> So lets take a step back and figure out how fast the usb stick actually is.
> This will erase your usb stick, but give us an idea of its performance:
>
> dd if=/dev/zero of=/dev/ bs=20M oflag=direct count=100
>
> Note again, the above command will erase your usb stick ;)  Use whatever
> device name you've been sending to mkfs.btrfs
>
> The kernel will allow a pretty significant amount of ram to be dirtied before
> forcing writeback, which is why you're seeing rsync stall at seemingly strange
> intervals.  In the case of btrfs with compression, we add some worker threads
> between rsync and the device, and these may be turning the writeback into a
> somewhat more bursty operation.
>
> -chris


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread john terragon
I tried the same routine on 32GB usb sticks. Same exact problems. 32GB
seems a bit much for a --mixed btrfs.
I haven't tried ssd_spread; maybe it's beneficial. However, as I wrote
above, disabling autodefrag gets rid of the "INFO: hung task" messages
completely, but even though the kernel doesn't complain about blocked
kworkers, the rsync process still blocks for several minutes
throughout the whole copy.
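
For anyone wanting to try the ssd_spread idea, the setup would look
something like this (device name is a placeholder; above 1GB, --mixed
has to be requested explicitly at mkfs time):

mkfs.btrfs --mixed /dev/sdX1            # mixed data+metadata block groups
mount -o ssd_spread /dev/sdX1 /mnt/usb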


On Wed, Sep 3, 2014 at 4:44 AM, Chris Murphy  wrote:
>
> On Sep 2, 2014, at 12:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>
>> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; nowadays it defaults
>> to 16 KiB as that's far better for most usage.  I wonder if USB sticks
>> are an exception...
>
> USB sticks > 1 GB get 16KB nodesize also. At <= 1 GB, mixed-bg is default as 
> is 4KB nodesize. Probably because queue/rotational is 1 for USB sticks, they 
> mount without ssd or ssd_spread which may be unfortunate (I haven't 
> benchmarked it but I suspect ssd_spread would work well for USB sticks).
>
> It was suggested a while ago that maybe mixed-bg should apply to larger 
> volumes, maybe up to 8GB or 16GB?
>
>
> Chris Murphy
>


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread john terragon
Rsync finished. FWIW in the end it reported an average speed of about
900K/sec. Without autodefrag there have been no messages about hung
kworkers, even though rsync seemingly kept getting hung for several
minutes throughout the whole execution.

Thanks
John



On Tue, Sep 2, 2014 at 10:48 PM, john terragon  wrote:
> OK, so I'm using 3.17-rc3, same test on a flash usb drive, no
> autodefrag. The situation is even stranger. The rsync is clearly
> stuck, it's trying to write the same file for much more than 120 secs.
> However dmesg is clean, no "INFO: task kworker/u16:11:1763 blocked for
> more than 120 seconds" or anything.
> df is responsive but shows no increase in used space.
> Consider that with autodefrag this bug is completely "reliable", the
> hung-task info starts to show up almost immediately.
>
> Oh wait (I'm live...) now rsync is unstuck, files are being written
> and df shows an increase in used space. BUT, still no hung-task
> message in the kernel log, even though rsync was actually stuck for
> several minutes.
>
> So, to summarize, same conditions except no autodefrag. Result:
> process stuck for way more than 120 secs but this time no complaints
> in the kernel log.
>
> Thanks
> John
>
>
>
> On Tue, Sep 2, 2014 at 10:23 PM, john terragon  wrote:
>> I don't know what to tell you about the ENOSPC code being heavily
>> involved. At this point I'm using this simple test to see if things
>> improve:
>>
>> -freshly created btrfs on dmcrypt,
>> -rsync some stuff (since the fs is empty I could just use cp but I
>> keep the test the same as it was when I had the problem for the first
>> time)
>> -note: the rsynced stuff is about the size of the volume but with
>> compression I always end up with 1/2 to 3/4 free space
>>
>> I'm not sure how I even get close to involving the ENOSPC code but
>> probably I'm not fully aware of the inner workings of btrfs.
>>
>>> Can you try flipping off autodefrag?
>>
>> As soon as the damn unkillable rsync decides to obey the kill -9...
>>
>> Thanks
>>
>> John
>>
>> On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason  wrote:
>>>> On 09/02/2014 03:56 PM, john terragon wrote:
>>>> Nice...now I get the hung task even with 3.14.17... And I tried with
>>>> 4K for node and leaf size...same result. And to top it all off, today
>>>> I've been bitten by the bug also on my main root fs (which is on two
>>>> fast ssd), although with 3.16.1.
>>>>
>>>> Is it at least safe for the data? I mean, as long as the hung process
>>>> terminates and no other error shows up, can I at least be sure that
>>>> the data written is correct?
>>>
>>> Your traces are a little different.  The ENOSPC code is throttling
>>> things to make sure you have enough room for the writes you're doing.
>>> The code we have in 3.17-rc3 (or my for-linus branch) are the best
>>> choices right now.  You can pull that down to 3.16 if you want all the
>>> fixes on a more stable kernel.
>>>
>>> Nailing down the ENOSPC code is going to be a little different, I think
>>> autodefrag probably isn't interacting well with being short on space and
>>> encryption.  This is leading to much more IO than we'd normally do, and
>>> dm-crypt makes it fairly intensive.
>>>
>>> Can you try flipping off autodefrag?
>>>
>>> -chris


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread john terragon
OK, so I'm using 3.17-rc3, same test on a flash usb drive, no
autodefrag. The situation is even stranger. The rsync is clearly
stuck, it's trying to write the same file for much more than 120 secs.
However dmesg is clean, no "INFO: task kworker/u16:11:1763 blocked for
more than 120 seconds" or anything.
df is responsive but shows no increase in used space.
Consider that with autodefrag this bug is completely "reliable", the
hung-task info starts to show up almost immediately.

Oh wait (I'm live...) now rsync is unstuck, files are being written
and df shows an increase in used space. BUT, still no hung-task
message in the kernel log, even though rsync was actually stuck for
several minutes.

So, to summarize, same conditions except no autodefrag. Result:
process stuck for way more than 120 secs but this time no complaints
in the kernel log.

Thanks
John



On Tue, Sep 2, 2014 at 10:23 PM, john terragon  wrote:
> I don't know what to tell you about the ENOSPC code being heavily
> involved. At this point I'm using this simple test to see if things
> improve:
>
> -freshly created btrfs on dmcrypt,
> -rsync some stuff (since the fs is empty I could just use cp but I
> keep the test the same as it was when I had the problem for the first
> time)
> -note: the rsynced stuff is about the size of the volume but with
> compression I always end up with 1/2 to 3/4 free space
>
> I'm not sure how I even get close to involving the ENOSPC code but
> probably I'm not fully aware of the inner workings of btrfs.
>
>> Can you try flipping off autodefrag?
>
> As soon as the damn unkillable rsync decides to obey the kill -9...
>
> Thanks
>
> John
>
> On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason  wrote:
>>> On 09/02/2014 03:56 PM, john terragon wrote:
>>> Nice...now I get the hung task even with 3.14.17... And I tried with
>>> 4K for node and leaf size...same result. And to top it all off, today
>>> I've been bitten by the bug also on my main root fs (which is on two
>>> fast ssd), although with 3.16.1.
>>>
>>> Is it at least safe for the data? I mean, as long as the hung process
>>> terminates and no other error shows up, can I at least be sure that
>>> the data written is correct?
>>
>> Your traces are a little different.  The ENOSPC code is throttling
>> things to make sure you have enough room for the writes you're doing.
>> The code we have in 3.17-rc3 (or my for-linus branch) are the best
>> choices right now.  You can pull that down to 3.16 if you want all the
>> fixes on a more stable kernel.
>>
>> Nailing down the ENOSPC code is going to be a little different, I think
>> autodefrag probably isn't interacting well with being short on space and
>> encryption.  This is leading to much more IO than we'd normally do, and
>> dm-crypt makes it fairly intensive.
>>
>> Can you try flipping off autodefrag?
>>
>> -chris


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread john terragon
I don't know what to tell you about the ENOSPC code being heavily
involved. At this point I'm using this simple test to see if things
improve:

-freshly created btrfs on dmcrypt,
-rsync some stuff (since the fs is empty I could just use cp but I
keep the test the same as it was when I had the problem for the first
time)
-note: the rsynced stuff is about the size of the volume but with
compression I always end up with 1/2 to 3/4 free space
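
Spelled out (device name, mapping name and source path are
placeholders), the test is:

cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX testcrypt
mkfs.btrfs /dev/mapper/testcrypt
mount -o compress-force=zlib,autodefrag /dev/mapper/testcrypt /mnt/test
rsync -a /some/stuff/ /mnt/test/        # roughly the size of the volume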

I'm not sure how I even get close to involving the ENOSPC code but
probably I'm not fully aware of the inner workings of btrfs.

> Can you try flipping off autodefrag?

As soon as the damn unkillable rsync decides to obey the kill -9...

Thanks

John

On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason  wrote:
>> On 09/02/2014 03:56 PM, john terragon wrote:
>> Nice...now I get the hung task even with 3.14.17... And I tried with
>> 4K for node and leaf size...same result. And to top it all off, today
>> I've been bitten by the bug also on my main root fs (which is on two
>> fast ssd), although with 3.16.1.
>>
>> Is it at least safe for the data? I mean, as long as the hung process
>> terminates and no other error shows up, can I at least be sure that
>> the data written is correct?
>
> Your traces are a little different.  The ENOSPC code is throttling
> things to make sure you have enough room for the writes you're doing.
> The code we have in 3.17-rc3 (or my for-linus branch) are the best
> choices right now.  You can pull that down to 3.16 if you want all the
> fixes on a more stable kernel.
>
> Nailing down the ENOSPC code is going to be a little different, I think
> autodefrag probably isn't interacting well with being short on space and
> encryption.  This is leading to much more IO than we'd normally do, and
> dm-crypt makes it fairly intensive.
>
> Can you try flipping off autodefrag?
>
> -chris


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-02 Thread john terragon
Nice...now I get the hung task even with 3.14.17... And I tried with
4K for node and leaf size...same result. And to top it all off, today
I've been bitten by the bug also on my main root fs (which is on two
fast ssd), although with 3.16.1.

Is it at least safe for the data? I mean, as long as the hung process
terminates and no other error shows up, can I at least be sure that
the data written is correct?

Thanks
John


On Tue, Sep 2, 2014 at 8:40 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> john terragon posted on Tue, 02 Sep 2014 08:12:36 +0200 as excerpted:
>
>> I will definitely try the latest 3.14.x (never had any problem of this
>> kind with it). And I'll look into the other possibilities you pointed
>> out. However what I can tell you right now is this:
>>
>> -the filesystem was "new". I've been bitten by this bug with 3.15 and
>> 3.16 and I kept trying to do the same thing (create the fs, rsync or
>> cp the same stuff) to see if it got better.
>
> OK.  I had read your post as implying that the filesystem had been around
> since before 3.14, in which case the firmware shuffling could well have
> been a factor.  If it was a brand new filesystem, then likely not, as
> mkfs.btrfs tries to do a trim of the whole filesystem range before it
> sets up.
>
> But that does remind me of one other possibility I had thought to mention
> and then forgot... that's even more likely now that it's known to be a
> new filesystem...
>
> I don't recall the btrfs-progs version, but somewhere along the line one
> other thing of potential interest changed:
>
> Mkfs.btrfs used to default to 4 KiB node/leaf sizes; nowadays it defaults
> to 16 KiB as that's far better for most usage.  I wonder if USB sticks
> are an exception...
>
> Since this is set at mkfs time, trying a 3.14 series kernel with current
> mkfs.btrfs defaults shouldn't change things; if the 16 KiB nodesize is
> why it's slow, it should still be slow with the 3.14 series kernel.
>
> Conversely, if this is the problem, specifically creating the filesystem
> with --nodesize 4k should fix it, and it should stay fixed regardless of
> what kernel you use with it, 3.14, 3.16, 3.17-rcX, shouldn't matter.
>
> And that'd be a very useful thing to put on the wiki as well, should it
> be found to be the case.  So please test and post if it helps (and feel
> free to put it on the wiki too if it works)!  =:^)
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>


Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-01 Thread john terragon
I will definitely try the latest 3.14.x (never had any problem of this
kind with it). And I'll look into the other possibilities you pointed
out. However what I can tell you right now is this:

-the filesystem was "new". I've been bitten by this bug with 3.15 and
3.16 and I kept trying to do the same thing (create the fs, rsync or
cp the same stuff) to see if it got better.

-there does not seem to be a problem of space because the volume is
about 14G and in the end about 8G are usually occupied (when the
process terminates). I always used compression one way or another,
either forced or not and either lzo or zlib. Maybe I should try
without compression.

-it's not one specific usb flash drive. I tried several ones and I
always get the same behaviour.

-The process freezes for several minutes. It's completely frozen, no
I/O. So even if the firmware of the usb key is shuffling things around
blocking everything, it shouldn't take all that time for a small
amount of data. Also, as I mentioned, I tried ext4 and xfs and the
data seems to be written in a continuous way, without any big lock
(even though I realize that ext4 and xfs have very different writing
patterns than a cow filesystem, so I can't be sure it's significant).

Thanks
John






On Tue, Sep 2, 2014 at 7:20 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> john terragon posted on Mon, 01 Sep 2014 18:36:49 +0200 as excerpted:
>
>> I was trying it again and it seems to have completed, albeit very slowly
>> (even for a usb flash drive). Was the 3.14 series the last one immune
>> from this problem? Should I try the latest 3.14.x?
>
> The 3.14 series was before the switch to generic kworker threads, while
> btrfs still had its own custom work-queue threads.  There was known to be
> a very specific problem with the kworker threads, but in 3.17-rc3 that
> should be fixed.
>
> So it may well be a problem with btrfs in general, at least as it exists
> today and historically, in which case 3.14.x won't help you much if at
> all.
>
> But I'd definitely recommend trying it.  If 3.14 is significantly faster
> and it's repeatedly so, then there's obviously some other regression,
> either with kworker threads or with something else, since then.  If not,
> then at least we know for sure kworker threads aren't a factor, since
> 3.14 was previous to them entering the picture.
>
>
> The other possibility I'm aware of would be erase-block related.  I see
> you're using autodefrag so it shouldn't be direct file fragmentation, but
> particularly if the filesystem has been used for some time, it might be
> the firmware trying to shuffle things around and having trouble due to
> having already used up all the known-free erase blocks so it's having to
> stop and free one by shifting things around every time it needs another
> one, and that's what's taking the time.
>
> What does btrfs fi show say about free space (the device line (lines, for
> multi-device btrfs) size vs. used, not the top line, is the interesting
> bit)?  What does btrfs fi df say for data and metadata (total vs. used)?
>
> For btrfs fi df ideally your data/metadata spread between used and total
> shouldn't be too large (a few gig for data and a gig or so for metadata
> isn't too bad, assuming a large enough device, of course).  If it is, a
> balance may be in order, perhaps using the -dusage=20 and/or -musage=20
> style options to keep it from rebalancing everything (read up on the wiki
> and choose your number, 5 might be good if there's plenty of room, you
> might need 50 or higher if you're close to full, more than about 80 and
> you might as well just use -d or -m and forget the usage bit).
>
> Similarly, for btrfs fi show, you want as much space as possible left,
> several gigs at least if your device isn't too small for that to be
> practical.  Again, if btrfs fi df is out of balance it'll use more space
> in show as well, and a balance should retrieve some of it.
>
> Once you have some space to work with (or before the balance if you
> suspect your firmware is SERIOUSLY out of space and shuffling, as that'll
> slow the balance down too, and again after), try running fstrim on the
> device.  It may or may not work on that device, but if it does and the
> firmware /was/ out of space and having to shuffle hard, it could improve
> performance *DRAMATICALLY*.  The reason being that on devices where it
> works, fstrim will tell the firmware what blocks are free, allowing it
> more flexibility in erase-block shuffling.
>
> If that makes a big difference, you can /try/ the discard mount option.
> Tho doing the trim/discard as part of normal operations can slow them
> down some too. 

Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-01 Thread john terragon
I was trying it again and it seems to have completed, albeit very
slowly (even for a usb flash drive). Was the 3.14 series the last one
immune from this problem? Should I try the latest 3.14.x?

Thanks
John

On Mon, Sep 1, 2014 at 6:02 PM, Chris Mason  wrote:
> On 09/01/2014 09:33 AM, john terragon wrote:
>> Hi.
>>
>> I'm not sure if this is related to the hung task problem that I've
>> been seeing in this ml for a while. But  I've been having this
>> seemingly related problem with 3.15, 3.16 and now 3.17-rc3 (which, if
>> I'm not mistaken, should have a fix for the hung task problem). So
>> here it is: I have a usb flash drive with btrfs (on top of dmcrypt)
>> usually mounted with these options
>>
>> rw,noatime,compress-force=zlib,ssd,space_cache,autodefrag
>>
>> When I try to rsync the usb flash drive I get a truck-load of "INFO:
>> task rsync:2524 blocked for more than 120 seconds" as you can see
>> below.
>> The rsync process crawls to an almost complete stop and I can't even
>> kill it. I know the usb key is OK because I've tried the same thing
>> with ext4 and xfs and everything went fine.
>
> This does have all of our fixes for hangs.  Does the rsync eventually
> complete?  Or do we just sit there forever?
>
> -chris


kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds

2014-09-01 Thread john terragon
Hi.

I'm not sure if this is related to the hung task problem that I've
been seeing in this ml for a while. But  I've been having this
seemingly related problem with 3.15, 3.16 and now 3.17-rc3 (which, if
I'm not mistaken, should have a fix for the hung task problem). So
here it is: I have a usb flash drive with btrfs (on top of dmcrypt)
usually mounted with these options

rw,noatime,compress-force=zlib,ssd,space_cache,autodefrag

When I try to rsync the usb flash drive I get a truck-load of "INFO:
task rsync:2524 blocked for more than 120 seconds" as you can see
below.
The rsync process crawls to an almost complete stop and I can't even
kill it. I know the usb key is OK because I've tried the same thing
with ext4 and xfs and everything went fine.

Any ideas?

Thanks
John


[ 2763.077502] INFO: task rsync:2524 blocked for more than 120 seconds.
[ 2763.077513]   Not tainted 3.17.0-rc3-cu3 #1
[ 2763.077516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2763.077521] rsync   D 880347b63840 0  2524   2523 0x
[ 2763.077531]  880347b633f0 0082 00013200
880347b2bfd8
[ 2763.077540]  00013200 880347b633f0 8803f73af660
880347b2baa0
[ 2763.077546]  8803f73af664 880347b633f0 
8803f73af668
[ 2763.077554] Call Trace:
[ 2763.077573]  [] ? schedule_preempt_disabled+0x20/0x60
[ 2763.077582]  [] ? __mutex_lock_slowpath+0x14b/0x1d0
[ 2763.077593]  [] ? del_timer_sync+0x4a/0x60
[ 2763.077601]  [] ? mutex_lock+0x16/0x25
[ 2763.077656]  [] ?
btrfs_wait_ordered_roots+0x3e/0x1f0 [btrfs]
[ 2763.077682]  [] ? flush_space+0x1ea/0x4b0 [btrfs]
[ 2763.077706]  [] ? get_alloc_profile+0x85/0x1c0 [btrfs]
[ 2763.077730]  [] ? can_overcommit+0x81/0xe0 [btrfs]
[ 2763.077755]  [] ?
reserve_metadata_bytes+0x1c0/0x3d0 [btrfs]
[ 2763.077780]  [] ? btrfs_block_rsv_add+0x28/0x50 [btrfs]
[ 2763.077811]  [] ? start_transaction+0x442/0x500 [btrfs]
[ 2763.077839]  [] ?
btrfs_check_dir_item_collision+0x74/0x100 [btrfs]
[ 2763.077871]  [] ? btrfs_rename2+0x15f/0x6d0 [btrfs]
[ 2763.077880]  [] ? capable_wrt_inode_uidgid+0x4b/0x60
[ 2763.077887]  [] ? cap_validate_magic+0x100/0x100
[ 2763.077897]  [] ? vfs_rename+0x5a1/0x790
[ 2763.077905]  [] ? follow_managed+0x2a0/0x2b0
[ 2763.077913]  [] ? SYSC_renameat2+0x483/0x530
[ 2763.077922]  [] ? notify_change+0x2cd/0x380
[ 2763.077927]  [] ? __sb_end_write+0x28/0x60
[ 2763.077937]  [] ? lockref_put_or_lock+0x48/0x80
[ 2763.077943]  [] ? dput+0xad/0x170
[ 2763.077951]  [] ? path_put+0xd/0x20
[ 2763.077958]  [] ? SyS_chmod+0x41/0x90
[ 2763.077966]  [] ? system_call_fastpath+0x16/0x1b
[ 2883.203005] INFO: task kworker/u16:11:1617 blocked for more than 120 seconds.
[ 2883.203017]   Not tainted 3.17.0-rc3-cu3 #1
[ 2883.203020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2883.203024] kworker/u16:11  D 8804185d1740 0  1617  2 0x
[ 2883.203085] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 2883.203091]  8804185d12f0 0046 00013200
880435f3bfd8
[ 2883.203099]  00013200 8804185d12f0 8803b86bef00
8803e9a761f0
[ 2883.203106]  8803e9a761f0 0001 
8803aece6520
[ 2883.203113] Call Trace:
[ 2883.203149]  [] ?
wait_current_trans.isra.22+0x97/0xf0 [btrfs]
[ 2883.203161]  [] ? prepare_to_wait_event+0xf0/0xf0
[ 2883.203190]  [] ? start_transaction+0x2a8/0x500 [btrfs]
[ 2883.203221]  [] ?
btrfs_finish_ordered_io+0x250/0x5c0 [btrfs]
[ 2883.203230]  [] ? __switch_to+0x119/0x580
[ 2883.203261]  [] ? normal_work_helper+0xaf/0x190 [btrfs]
[ 2883.203272]  [] ? process_one_work+0x167/0x380
[ 2883.203280]  [] ? worker_thread+0x114/0x480
[ 2883.203288]  [] ? rescuer_thread+0x2b0/0x2b0
[ 2883.203294]  [] ? kthread+0xb8/0xd0
[ 2883.203301]  [] ? kthread_create_on_node+0x170/0x170
[ 2883.203309]  [] ? ret_from_fork+0x7c/0xb0
[ 2883.203315]  [] ? kthread_create_on_node+0x170/0x170
[ 2883.203332] INFO: task btrfs-transacti:2126 blocked for more than
120 seconds.
[ 2883.203336]   Not tainted 3.17.0-rc3-cu3 #1
[ 2883.203338] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 2883.203341] btrfs-transacti D 8803e98a2860 0  2126  2 0x
[ 2883.203348]  8803e98a2410 0046 00013200
8803e9af3fd8
[ 2883.203355]  00013200 8803e98a2410 88045fa13af0
8803e9af3b48
[ 2883.203361]  88045fdb2928 0002 8149d750
8803e9af3c08
[ 2883.203368] Call Trace:
[ 2883.203378]  [] ? bit_wait+0x40/0x40
[ 2883.203386]  [] ? io_schedule+0x94/0x120
[ 2883.203394]  [] ? bit_wait_io+0x23/0x40
[ 2883.203402]  [] ? __wait_on_bit+0x55/0x80
[ 2883.203410]  [] ? wait_on_page_bit+0x6e/0x80
[ 2883.203418]  [] ? autoremove_wake_function+0x30/0x30
[ 2883.203425]  [] ? filemap_fdatawait_range+0xd0/0x160
[ 2883.203459]  [] ?
btrfs_wait_ordered_range+0x62/0x120 [btrfs]
[ 2883.203490]  [] 

Re: is it safe to change BTRFS_STRIPE_LEN?

2014-05-24 Thread john terragon
Yes, the btrfs-tools would have to be recompiled too (BTRFS_STRIPE_LEN
is defined in a volumes.h in there too).
And yes, such a kernel and tools would certainly kill any raid0 btrfs fs
and maybe any other multi-device kind of setting.


On Sat, May 24, 2014 at 9:07 PM, Austin S Hemmelgarn
 wrote:
> On 05/24/2014 12:44 PM, john terragon wrote:
>> Hi.
>>
>> I'm playing around with (software) raid0 on SSDs and since I remember
>> I read somewhere that Intel recommends 128K stripe size for HDD arrays
>> but only 16K stripe size for SSD arrays, I wanted to see how a
>> small(er) stripe size would work on my system. Obviously with btrfs on
>> top of md-raid I could use the stripe size I want. But if I'm not
>> mistaken the stripe size with the native raid0 in btrfs is fixed to
>> 64K in BTRFS_STRIPE_LEN (volumes.h).
>> So I was wondering if it would be reasonably safe to just change that
>> to 16K (and duck and wait for the explosion ;) ).
>>
>> Can anyone adept at the inner workings of the btrfs raid0 code confirm if
>> that would be the right way to proceed? (obviously without absolutely
>> any blame to be placed on anyone other than myself if things should go
>> badly :) )
> I personally can't render an opinion on whether changing it would make
> things break or not, but I do know that it would need to be changed both
> in the kernel and the tools, and the resultant kernel and tools would
> not be entirely compatible with filesystems produced by the regular
> tools and kernel, possibly to the point of corrupting any filesystem
> they touch.
>
> As for the 64k default stripe size, that sounds correct, and is probably
> because that's the largest block that the I/O schedulers on Linux will
> dispatch as a single write to the underlying device.
>


is it safe to change BTRFS_STRIPE_LEN?

2014-05-24 Thread john terragon
Hi.

I'm playing around with (software) raid0 on SSDs and since I remember
I read somewhere that intel recommends 128K stripe size for HDD arrays
but only 16K stripe size for SSD arrays, I wanted to see how a
small(er) stripe size would work on my system. Obviously with btrfs on
top of md-raid I could use the stripe size I want. But if I'm not
mistaken the stripe size with the native raid0 in btrfs is fixed to
64K in BTRFS_STRIPE_LEN (volumes.h).
So I was wondering if it would be reasonably safe to just change that
to 16K (and duck and wait for the explosion ;) ).

Can anyone adept at the inner workings of the btrfs raid0 code confirm
whether that would be the right way to proceed? (obviously with
absolutely no blame to be placed on anyone other than myself if things
should go badly :) )
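
For concreteness, the change I have in mind is just this (sketched from
memory; the same define lives in the kernel tree and in btrfs-progs):

grep -n BTRFS_STRIPE_LEN fs/btrfs/volumes.h
#   #define BTRFS_STRIPE_LEN    (64 * 1024)
# change 64 to 16 in both trees, then rebuild the kernel and btrfs-progs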

Thanks

john


Re: btrfs on software RAID0

2014-05-06 Thread john terragon
Just one last doubt:

why do you use --align-payload=1024? (or 8192)
The cryptsetup man page says that the default payload alignment is 2048
(512-byte sectors). So, it's already aligned by default to 4K-byte
physical sectors (if that was your concern). Am I missing something?
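
For concreteness (device name is a placeholder), the alignments in
512-byte sectors work out to:

cryptsetup luksFormat --align-payload=1024 /dev/sdX   # 1024*512B = 512KiB
cryptsetup luksFormat --align-payload=8192 /dev/sdX   # 8192*512B = 4MiB
# default 2048 sectors = 1MiB, already a multiple of the 4KiB sector size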

John

On Mon, May 5, 2014 at 11:25 PM, Marc MERLIN  wrote:
> On Mon, May 05, 2014 at 10:51:46PM +0200, john terragon wrote:
>> Hi.
>> I'm about to try btrfs on a RAID0 md device (to be precise there will
>> be dm-crypt in between the md device and btrfs). If I used ext4 I
>> would set the stride and stripe_width extended options. Is there
>> anything similar I should be doing with mkfs.btrfs? Or maybe some
>> mount options beneficial to this kind of setting.
>
> This is not directly an answer to your question; so far I haven't used a
> special option like this with btrfs on my arrays, although my
> understanding is that it's not as important as with ext4.
>
> That said, please read
> http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html
>
> 1) use align-payload=1024 on cryptsetup instead of something bigger like
> 8192. This will reduce write amplification (if you're not on an SSD).
>
> 2) you don't need md0 in the middle, crypt each device and then use
> btrfs built-in raid0, which will be faster (and is stable, at least as
> far as we know :) ).
>
> Then use /etc/crypttab or a script like this
> http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
> to decrypt all your devices in one swoop and mount btrfs.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: btrfs on software RAID0

2014-05-05 Thread john terragon
On Mon, May 5, 2014 at 11:25 PM, Marc MERLIN  wrote:
> This is not directly an answer to your question; so far I haven't used a
> special option like this with btrfs on my arrays, although my
> understanding is that it's not as important as with ext4.
>
> That said, please read
> http://marc.merlins.org/perso/btrfs/post_2014-04-27_Btrfs-Multi-Device-Dmcrypt.html
>
> 1) use align-payload=1024 on cryptsetup instead of something bigger like
> 8192. This will reduce write amplification (if you're not on an SSD).
>
> 2) you don't need md0 in the middle, crypt each device and then use
> btrfs built-in raid0, which will be faster (and is stable, at least as
> far as we know :) ).
>
> Then use /etc/crypttab or a script like this
> http://marc.merlins.org/linux/scripts/start-btrfs-dmcrypt
> to decrypt all your devices in one swoop and mount btrfs.


I know about btrfs's native raid capabilities, but to be honest most of
the time when I see people having "scary" problems with btrfs it's when
they use it with multiple devices. So far my experience with btrfs has
been pretty smooth (always with btrfs on top of a single device) and I
wanted to let that part of btrfs mature a little bit more.
But maybe I'm wrong, so maybe I'll give both approaches a try.

About unlocking all the dm-crypt devices in one swoop, there's this script too:

https://github.com/gebi/keyctl_keyscript

which uses the kernel keyring to temporarily store the passphrase.
I was thinking about using it in a dm-crypt->md-raid->btrfs setting to
have one thread for each dm-crypt device, but the AES-NI instructions
are probably fast enough not to make the single dm-crypt thread in a
md-raid->dm-crypt->btrfs setting a bottleneck (at least with hdds; with
ssds it might be a different story).
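
As a sketch of the one-passphrase-for-all idea (device names and the
shared label are hypothetical; Debian's stock decrypt_keyctl keyscript
works along the same lines as the script above):

# /etc/crypttab
crypt0  /dev/sda2  cryptlabel  luks,keyscript=decrypt_keyctl
crypt1  /dev/sdb2  cryptlabel  luks,keyscript=decrypt_keyctl
# entries sharing the same third-field label ask for the passphrase once
# and reuse the copy cached in the kernel keyring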

John


btrfs on software RAID0

2014-05-05 Thread john terragon
Hi.
I'm about to try btrfs on a RAID0 md device (to be precise there will
be dm-crypt in between the md device and btrfs). If I used ext4 I
would set the stride and stripe_width extended options. Is there
anything similar I should be doing with mkfs.btrfs? Or maybe some
mount options beneficial to this kind of setting.

Thanks
John