Re: hitting BUG_ON on troublesome FS

2014-02-03 Thread Duncan
Remco Hosman - Yerf-it.com posted on Mon, 03 Feb 2014 21:51:26 +0100 as
excerpted:

> Anything i can do to resolve / debug the issue?

I see from the trace you're running kernel 3.13.0.  (FWIW 3.13.1 is out, 
but there weren't any btrfs commits therein, as they weren't upstream in 
3.14-pre yet at that time.)

You might try kernel 3.14-rc1 now that it's out.  There's a big btrfs git 
pull in that, including a number of btrfs send/receive fixes.  FWIW, 
btrfs send/receive seems to be a big focus right now, and I see a lot of 
patches floating by on the list even after that pull, so there's more 
where those came from.

I'm not a dev (just another btrfs user and list regular for several 
kernel cycles now) and haven't followed the patches closely enough to 
know if your particular balance problem is covered in one of them, but as 
I said, since 3.14-rc1 is out now, you might as well try it...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs raid5 unmountable

2014-02-03 Thread Duncan
Tetja Rediske posted on Mon, 03 Feb 2014 17:12:24 +0100 as excerpted:

[...]

> What happened before:
> 
> One disk was faulty, I added a new one and removed the old one, followed
> by a balance.
> 
> So far so good.
> 
> Some days after this I accidentally removed a SATA Power Connector from
> another drive, without noticing it at first. Worked about an hour on the
> system, building new Kernel on another Filesystem. Rebooted with my new
> Kernel and the FS was not mountable. I noticed the "missing" disk and
> reattached the power.
> 
> So far I tried:
> 
> mount -o recovery
> btrfs check
> (after google) btrfs-zero-log
> 
> Sadly no luck. However, I can get my files with btrfs restore. The
> filesystem contains mainly media files, so it is not so bad if they were
> lost, but restoring them from backups and sources will need at least
> about a week. (Most of the files are mirrored on a private server, but
> even with 100MBit this takes a lot of time ; )
> 
> Any idea how to recover this FS?

[As a btrfs user and list regular, /not/ a dev...]

That filesystem is very likely toast. =:(  Tho there's one thing you 
didn't mention trying yet that's worth the try.  See below...

You can read the list archives for the details if you like, but 
basically, the raid5/6 recovery code simply isn't complete yet and is not 
recommended for actual deployment in any way, shape or form.  In practice 
at present it's a fancy raid0 that calculates and writes a bunch of extra 
parity, and can be run-time tested and even in some cases recover from 
online-device-loss (as you noted), but throw a shutdown in there along 
with the bad device, and like a raid0, you might as well consider the 
filesystem lost... at least until the recovery code is complete, at which 
point if the filesystem is still around you may well be able to recover 
it, since the parity is all there, the code to actually recover from it 
just isn't all there yet.

FWIW, single-device btrfs is what I'd call almost-stable now altho you're 
still strongly encouraged to keep current and tested backups as there are 
still occasional corner-cases, and stay on current kernels and btrfs-
tools as potentially data-risking bugs still are getting fixed.  Multi-
device btrfs in single/raid0/1/10 modes are also closing in on stable 
now, tho not /quite/ as stable as single device, but quite usable as long 
as you do have tested backups -- unless you're unlucky you won't actually 
have to use them (I haven't had to use mine), but definitely keep 'em 
just in case.  But raid5/6, no-go, with the exception of pure testing 
data that you really are prepared to throw away, because recovery for it 
really is still incomplete and thus known-broken.

The one thing I didn't see you mention that's worth a try if you haven't 
already, is the degraded mount option.  See
$KERNELSRC/Documentation/filesystems/btrfs.txt.  Tho really that should 
have been the first thing you tried for mounting once you realized you 
were down a device.
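
As a rough sketch of what that attempt might look like (the device and 
mount point are placeholders, not values from your report, and adding ro 
is my own assumption, to keep the attempt from writing to the damaged 
filesystem), the helper below only prints the command so it can be 
reviewed before being run as root:

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a degraded, read-only mount command.
# /dev/sdX and /mnt/recovery are placeholders, not values from this thread.
degraded_mount_cmd() {
    dev=$1
    mnt=$2
    # "degraded" allows mounting with a device missing; "ro" is an added
    # precaution (an assumption here) so nothing is written during the attempt.
    printf 'mount -o degraded,ro %s %s\n' "$dev" "$mnt"
}

degraded_mount_cmd /dev/sdX /mnt/recovery
```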

But with a bit of luck...

Also, if you've run btrfs check with the --repair option (you didn't say, 
if you didn't, you should be fine as without --repair it's only a read-
only diagnostic), you may have made things worse, as that's really 
intended to be a last resort.

Of course if you'd been following the list as btrfs testers really should 
still be doing at this point, you'd have seen all this covered before.  
And of course, if you had done pre-deployment testing before you stuck 
valuable data on that btrfs raid5, you'd have noted the problems, even 
without reading about it on-list or on the wiki.  But of course hindsight 
is 20/20, as they say, and at least you DO have backups, even if they'll 
take a while to restore.  =:^) That's already vastly better than a lot of 
the reports we unfortunately get here. =:^\

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



[PATCH] btrfs-progs: there is devid 0 when replace is running

2014-02-03 Thread Anand Jain
 As of now, when a replace is running, the replacement disk is added
 to the device list with devid 0. We fail to obtain details of devid 0
 since we don't query devid 0 at all, as shown below.

---
 btrfs rep start /dev/sdb /dev/sdf /btrfs

 btrfs fi show
 Label: none  uuid: f8fb9819-16c8-47b7-b62f-0ff90f8c56cd
	Total devices 3 FS bytes used 1.94GiB
	devid    1 size 1.10GiB used 1.10GiB path /dev/sdb
	devid    2 size 1.10GiB used 1.08GiB path /dev/sdc
	devid    0 size 0.00 used 0.00 path
---

  This patch fixes that by querying devid 0 as well.
-
btrfs repl start /dev/sdb /dev/sdf /btrfs
btrfs fi show /btrfs
Label: none  uuid: f8fb9819-16c8-47b7-b62f-0ff90f8c56cd
	Total devices 3 FS bytes used 1.94GiB
	devid    0 size 1.10GiB used 1.10GiB path /dev/sdf
	devid    1 size 1.10GiB used 1.10GiB path /dev/sdb
	devid    2 size 1.10GiB used 1.08GiB path /dev/sdc
-

 It's fine to query devid 0 when there is no replace activity
 as well, because we just skip the ENODEV error.


btrfs fi show /btrfs
Label: none  uuid: f8fb9819-16c8-47b7-b62f-0ff90f8c56cd
	Total devices 2 FS bytes used 1.94GiB
	devid    1 size 1.10GiB used 1.10GiB path /dev/sdf
	devid    2 size 1.10GiB used 1.08GiB path /dev/sdc


Signed-off-by: Anand Jain 
---
 utils.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/utils.c b/utils.c
index de513b6..a045ffd 100644
--- a/utils.c
+++ b/utils.c
@@ -1696,7 +1696,7 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 		goto out;
 	}
 
-	for (; i <= fi_args->max_id; ++i) {
+	for (i = 0; i <= fi_args->max_id; ++i) {
 		BUG_ON(ndevs >= fi_args->num_devices);
 		ret = get_device_info(fd, i, &di_args[ndevs]);
 		if (ret == -ENODEV)
-- 
1.7.1



Re: [PATCH] Btrfs: disable snapshot aware defrag for now

2014-02-03 Thread Roger Binns

On 03/02/14 09:27, Josef Bacik wrote:
> It is so totally broken that I don't want it being turned on by anybody
> who can't edit this and change it themselves.

The symptoms I saw are huge amounts of kernel memory consumption, possibly
till exhaustion of swap.  Are there other ways in which it is broken (e.g.
corruption)?

Also is this patch making its way to the various stables?

Roger



Re: Receive on same subvolume

2014-02-03 Thread Matthew Lai

On 03/02/2014 4:34 PM, Chris Murphy wrote:
> On Feb 3, 2014, at 3:53 PM, Matthew Lai  wrote:
>
>> On 03/02/2014 11:26 AM, Chris Murphy wrote:
>>> On Feb 3, 2014, at 11:19 AM, Matthew Lai  wrote:
>>>
>>>> Thanks. I should clarify what I'm trying to do.
>>>>
>>>> I'm trying to use btrfs send for backup, without having another btrfs
>>>> volume.
>>>>
>>>> So the initial backup is a complete send, piped to Amazon Glacier (so my
>>>> machine never has the whole file, and doesn't have space for one).
>>> OK so you've used btrfs send piped to Glacier which creates a *file*, I'll
>>> call it "initial", not a navigable directory of files? Right?
>> That is correct.
>>>> It looks like the problem now is the sent file can't be applied to the
>>>> original volume (for restore).
>>> I'm counting two sent files: initial, increment1. I'm not sure which one
>>> you're applying. If you have the exact same read-only snapshot that the
>>> btrfs send file "initial" is based on, then you'd apply the increment1 to
>>> that read-only snapshot which will cause a new read-only snapshot to be
>>> created with the incremental data applied to it. The error you're getting
>>> sounds like the parent read-only snapshot isn't available?
>>>
>> That is also correct. There are 2 sent files. I am trying to apply
>> increment1, on a snapshot of the parent (that was used to create increment1).
>
> I don't understand how you can apply increment1 to the snapshot of increment1;
> and also I don't understand how the parent is also increment1.
>
>> I added -vv. Here is the test script for reproducing this entire setup.
>>
>> ---
>> #!/bin/sh
>>
>> btrfs subvolume create data
>> btrfs subvolume snapshot -r data first_backup
>> touch data/a
>> btrfs subvolume snapshot -r data second_backup
>> btrfs send -p first_backup second_backup > increment
>> btrfs subvolume snapshot first_backup first_backup_rw
>> btrfs receive -vv first_backup_rw < increment
>> ---
>>
>> Output:
>> ---
>> Create subvolume './data'
>> Create a readonly snapshot of 'data' in './first_backup'
>> Create a readonly snapshot of 'data' in './second_backup'
>> At subvol second_backup
>> Create a snapshot of 'first_backup' in './first_backup_rw'
>> At snapshot second_backup
>> receiving snapshot second_backup uuid=e6159a2a-3430-344a-a23d-b9bb83851a63,
>> ctransid=28 parent_uuid=20c4ff66-a9ec-fc44-93c6-2c12637e95e6,
>> parent_ctransid=26
>> ERROR: could not find parent subvolume
>> ---
>>
>> I would think applying the "patch" to first_backup_rw should succeed, because
>> it's exactly the same as first_backup, which is the parent for the send, but
>> it doesn't.
>
> btrfs sub snap -r subvol.1 subvol
> btrfs send subvol.1 -f /subvol.1.btrfs
> #write some more files to subvol
> btrfs sub snap -r subvol.2 subvol
> btrfs send -p subvol.1 subvol.2 -f /subvol.2.btrfs
>
> #To make subvol.1 into subvol.2 by applying subvol.2.btrfs to subvol.1, the
> actual original subvol.1 must be present first or you need to "receive" it
> from subvol.1.btrfs first. And also, I'm pretty sure you can't have subvol.2
> already present because receive must create it.
>
> Again, I haven't tried > and < so I don't know if they work. Have you tried
> -f to point to the files?

According to the manpage, -f is the same as output redirection.
"Output is normally written to stdout. To write to a file, use this 
option.  An alternative would be to use pipes."


The reason why I can't use something like your sequence of commands 
(assuming the order of arguments for snap should be reversed) is that I 
want to be able to verify that the diff is correct, since there are 
still integrity problems with send/receive.


I was planning to do that by applying the "patch" to a snapshot of the 
parent right away, and make sure the patched volume is equal to the 
current snapshot (by trying another send, and making sure the output is 0).


Thanks
Matthew


Re: Receive on same subvolume

2014-02-03 Thread Chris Murphy

On Feb 3, 2014, at 3:53 PM, Matthew Lai  wrote:

> On 03/02/2014 11:26 AM, Chris Murphy wrote:
>> On Feb 3, 2014, at 11:19 AM, Matthew Lai  wrote:
>> 
>>> Thanks. I should clarify what I'm trying to do.
>>> 
>>> I'm trying to use btrfs send for backup, without having another btrfs 
>>> volume.
>>> 
>>> So the initial backup is a complete send, piped to Amazon Glacier (so my 
>>> machine never has the whole file, and doesn't have space for one).
>> OK so you've used btrfs send piped to Glacier which creates a *file*, I'll 
>> call it "initial", not a navigable directory of files? Right?
> That is correct.
>>> It looks like the problem now is the sent file can't be applied to the 
>>> original volume (for restore).
>> I'm counting two sent files: initial, increment1. I'm not sure which one 
>> you're applying. If you have the exact same read-only snapshot that the 
>> btrfs send file "initial" is based on, then you'd apply the increment1 to 
>> that read-only snapshot which will cause a new read-only snapshot to be 
>> created with the incremental data applied to it. The error you're getting 
>> sounds like the parent read-only snapshot isn't available?
>> 
> That is also correct. There are 2 sent files. I am trying to apply 
> increment1, on a snapshot of the parent (that was used to create increment1).

I don't understand how you can apply increment1 to the snapshot of increment1; 
and also I don't understand how the parent is also increment1.




> 
> I added -vv. Here is the test script for reproducing this entire setup.
> 
> ---
> #!/bin/sh
> 
> btrfs subvolume create data
> btrfs subvolume snapshot -r data first_backup
> touch data/a
> btrfs subvolume snapshot -r data second_backup
> btrfs send -p first_backup second_backup > increment
> btrfs subvolume snapshot first_backup first_backup_rw
> btrfs receive -vv first_backup_rw < increment
> ---
> 
> Output:
> ---
> Create subvolume './data'
> Create a readonly snapshot of 'data' in './first_backup'
> Create a readonly snapshot of 'data' in './second_backup'
> At subvol second_backup
> Create a snapshot of 'first_backup' in './first_backup_rw'
> At snapshot second_backup
> receiving snapshot second_backup uuid=e6159a2a-3430-344a-a23d-b9bb83851a63, 
> ctransid=28 parent_uuid=20c4ff66-a9ec-fc44-93c6-2c12637e95e6, 
> parent_ctransid=26
> ERROR: could not find parent subvolume
> ---
> 
> I would think applying the "patch" to first_backup_rw should succeed, because 
> it's exactly the same as first_backup, which is the parent for the send, but 
> it doesn't.

btrfs sub snap -r subvol.1 subvol
btrfs send subvol.1 -f /subvol.1.btrfs
#write some more files to subvol
btrfs sub snap -r subvol.2 subvol
btrfs send -p subvol.1 subvol.2 -f /subvol.2.btrfs

#To make subvol.1 into subvol.2 by applying subvol.2.btrfs to subvol.1, the 
actual original subvol.1 must be present first or you need to "receive" it from 
subvol.1.btrfs first. And also, I'm pretty sure you can't have subvol.2 already 
present because receive must create it.

Again, I haven't tried > and < so I don't know if they work. Have you tried -f to 
point to the files?
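
For what it's worth, here is a dry-run sketch of that sequence end to end, 
with the snapshot arguments in source-then-destination order. The /restore 
target directory and the run() wrapper are placeholders made up for 
illustration; the wrapper only prints each command unless DRY_RUN=0 is set, 
and none of this has been tested against a real filesystem.

```shell
#!/bin/sh
# Dry-run sketch of a full + incremental send and a later restore.
# subvol, subvol.1 and subvol.2 follow the names used above; /restore is a
# hypothetical directory on the btrfs filesystem being restored to.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Backup side: read-only snapshots, then a full and an incremental stream.
run btrfs subvolume snapshot -r subvol subvol.1
run btrfs send -f /subvol.1.btrfs subvol.1
# ...write some more files to subvol here...
run btrfs subvolume snapshot -r subvol subvol.2
run btrfs send -p subvol.1 -f /subvol.2.btrfs subvol.2

# Restore side: the parent stream is received first, then the increment.
run btrfs receive -f /subvol.1.btrfs /restore
run btrfs receive -f /subvol.2.btrfs /restore
```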



Chris Murphy



Re: [PATCH] Btrfs: add regression test for running snapshot and send concurrently

2014-02-03 Thread Dave Chinner
On Mon, Feb 03, 2014 at 11:22:36PM +0800, Wang Shilong wrote:
> From: Wang Shilong 
> 
> Btrfs would fail to send if snapshot run concurrently, this test is to make
> sure we have fixed the bug.
> 
Couple of comments below.

> +_scratch_mkfs > /dev/null 2>&1
> +_scratch_mount
> +
> +
> +touch $SCRATCH_MNT/foo
> +
> +# get file with fragments by using backwards writes.
> +for i in `seq 10240 -1 1`; do
> + $XFS_IO_PROG -f -d -c "pwrite $(($i * 4096)) 4096" \
> + $SCRATCH_MNT/foo > /dev/null | _filter_xfs_io

Indentation.

> +done
> +
> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
> + $SCRATCH_MNT/snap_1 >> $seqres.full 2>&1
> +
> +$BTRFS_UTIL_PROG send -f $SCRATCH_MNT/send_file \
> + $SCRATCH_MNT/snap_1 >> $seqres.full 2>&1 &
> +
> +pid=$!
> +
> +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/snap_1 \
> + $SCRATCH_MNT/snap_2 >> $seqres.full 2>&1
> +
> +wait $pid || echo "Failed to send, see dmesg"

This seems kind of racy. It assumes that the send command
doesn't complete before the wait $pid call is made. If $pid doesn't
exist at this time because it has completed, wait will return 127
and the test will fail.

Also, why would a failure to send result in meaningful information in
dmesg? Shouldn't the userspace command output information to tell
you why there was a failure into $seqres.full?
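
A minimal sketch of reporting the exit status directly rather than
pointing at dmesg. fake_send and the message text are made up for
illustration; in the test itself the $BTRFS_UTIL_PROG send command would
take its place.

```shell
#!/bin/sh
# Stand-in for the background "btrfs send"; it returns non-zero to
# simulate a failed send. Hypothetical, for illustration only.
fake_send() {
    sleep 1
    return 3
}

fake_send &
pid=$!

# Stand-in for the concurrent snapshot command.
:

# Capture the background job's status once so the message can report it.
status=0
wait "$pid" || status=$?
if [ "$status" -ne 0 ]; then
    echo "send failed with status $status"
fi
```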

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-03 Thread Johannes Hirte
On Mon, 3 Feb 2014 16:08:08 -0500
Josef Bacik  wrote:

> 
> On 02/03/2014 01:28 PM, Johannes Hirte wrote:
> > On Thu, 23 Jan 2014 13:07:52 -0500
> > Josef Bacik  wrote:
> >
> >> On one of our gluster clusters we noticed some pretty big lag
> >> spikes.  This turned out to be because our transaction commit was
> >> taking like 3 minutes to complete.  This is because we have like 30
> >> gigs of metadata, so our global reserve would end up being the max
> >> which is like 512 mb.  So our throttling code would allow a
> >> ridiculous amount of delayed refs to build up and then they'd all
> >> get run at transaction commit time, and for a cold mounted file
> >> system that could take up to 3 minutes to run.  So fix the
> >> throttling to be based on both the size of the global reserve and
> >> how long it takes us to run delayed refs. This patch tracks the
> >> time it takes to run delayed refs and then only allows 1 seconds
> >> worth of outstanding delayed refs at a time.  This way it will
> >> auto-tune itself from cold cache up to when everything is in
> >> memory and it no longer has to go to disk.  This makes our
> >> transaction commits take much less time to run. Thanks,
> >>
> >> Signed-off-by: Josef Bacik 
> > This one breaks my system. Shortly after boot the btrfs-freespace
> > thread goes up to 100% CPU usage and the system is nearly
> > unresponsive. I've seen it first with the full pull request for
> > 3.14-rc1 and was able to track it down to this patch.
> Could you turn on the softlockup timer and see if you can get a 
> backtrace of where it is stuck?  In the meantime I will go through
> and see if I can pinpoint where it may be happening.  Thanks,
> 
> Josef

This is what I've got with

CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
# CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_PANIC_TIMEOUT=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
CONFIG_DEBUG_PREEMPT=y

[  203.610758] perf samples too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 5
[  360.625822] INFO: task btrfs-endio-wri:1075 blocked for more than 120 seconds.
[  360.625826]   Not tainted 3.14.0-rc1 #19
[  360.625828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.625829] btrfs-endio-wri D 880137c12d00 0  1075  2 0x
[  360.625833]  8800b6b10950 0002 00012d00 
8800b6b10950
[  360.625837]  8801325b3fd8 8800a2dcc000 8801325719e8 

[  360.625840]   880132571800 8800b635ba00 
81256192
[  360.625844] Call Trace:
[  360.625854]  [] ? wait_current_trans.isra.19+0xbb/0xdf
[  360.625858]  [] ? finish_wait+0x65/0x65
[  360.625860]  [] ? start_transaction+0x2f1/0x4e3
[  360.625864]  [] ? btrfs_finish_ordered_io+0x44c/0x7b2
[  360.625869]  [] ? try_to_del_timer_sync+0x53/0x5e
[  360.625871]  [] ? del_timer_sync+0x26/0x43
[  360.625875]  [] ? schedule_timeout+0xeb/0x104
[  360.625877]  [] ? rcu_read_unlock_sched_notrace+0x11/0x11
[  360.625882]  [] ? worker_loop+0x162/0x4c3
[  360.625884]  [] ? btrfs_queue_worker+0x275/0x275
[  360.625888]  [] ? kthread+0xa3/0xab
[  360.625893]  [] ? trace_preempt_on+0xd/0x2a
[  360.625895]  [] ? freeze_workqueues_begin+0x8/0x11e
[  360.625897]  [] ? __kthread_parkme+0x5a/0x5a
[  360.625901]  [] ? ret_from_fork+0x7c/0xb0
[  360.625903]  [] ? __kthread_parkme+0x5a/0x5a
[  360.625906] INFO: task btrfs-transacti:1084 blocked for more than 120 seconds.
[  360.625908]   Not tainted 3.14.0-rc1 #19
[  360.625909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.625910] btrfs-transacti D 880137c52d00 0  1084  2 0x
[  360.625912]  880132428950 0002 00012d00 
880132428950
[  360.625915]  8800b5a35fd8 8801331a5a70 8801331a5ae8 

[  360.625918]  8800aba981b8 00015000 0001 
8126b986
[  360.625921] Call Trace:
[  360.625925]  [] ? btrfs_start_ordered_extent+0x91/0xdf
[  360.625928]  [] ? finish_wait+0x65/0x65
[  360.625931]  [] ? btrfs_wait_ordered_range+0xab/0x10a
[  360.625934]  [] ? __btrfs_write_out_cache+0x43c/0x67f
[  360.625939]  [] ? kmem_cache_free+0x66/0x10d
[  360.625942]  [] ? btrfs_update_inode_item+0xb9/0xcd
[  360.625944]  [] ? __btrfs_prealloc_file_range+0x276/0x2db
[  360.625947]  [] ? btrfs_write_out_ino_cache+0x3f/0x5e
[  360.625950]  [] ? btrfs_save_ino_cache+0x269/0x2ea
[  360.625952]  [] ? commit_fs_roots.isra.17+0xa6/0x148
[  360.625954]  [] ? trace_preempt_on+0xd/0x2a
[  360.625958]  [] ? preempt_count_sub+0xbd/0xc9
[  36

Re: Receive on same subvolume

2014-02-03 Thread Matthew Lai

On 03/02/2014 11:26 AM, Chris Murphy wrote:
> On Feb 3, 2014, at 11:19 AM, Matthew Lai  wrote:
>
>> Thanks. I should clarify what I'm trying to do.
>>
>> I'm trying to use btrfs send for backup, without having another btrfs
>> volume.
>>
>> So the initial backup is a complete send, piped to Amazon Glacier (so my
>> machine never has the whole file, and doesn't have space for one).
> OK so you've used btrfs send piped to Glacier which creates a *file*, I'll
> call it "initial", not a navigable directory of files? Right?

That is correct.

>> It looks like the problem now is the sent file can't be applied to the
>> original volume (for restore).
> I'm counting two sent files: initial, increment1. I'm not sure which one
> you're applying. If you have the exact same read-only snapshot that the
> btrfs send file "initial" is based on, then you'd apply the increment1 to
> that read-only snapshot which will cause a new read-only snapshot to be
> created with the incremental data applied to it. The error you're getting
> sounds like the parent read-only snapshot isn't available?

That is also correct. There are 2 sent files. I am trying to apply 
increment1, on a snapshot of the parent (that was used to create 
increment1).


I added -vv. Here is the test script for reproducing this entire setup.

---
#!/bin/sh

btrfs subvolume create data
btrfs subvolume snapshot -r data first_backup
touch data/a
btrfs subvolume snapshot -r data second_backup
btrfs send -p first_backup second_backup > increment
btrfs subvolume snapshot first_backup first_backup_rw
btrfs receive -vv first_backup_rw < increment
---

Output:
---
Create subvolume './data'
Create a readonly snapshot of 'data' in './first_backup'
Create a readonly snapshot of 'data' in './second_backup'
At subvol second_backup
Create a snapshot of 'first_backup' in './first_backup_rw'
At snapshot second_backup
receiving snapshot second_backup 
uuid=e6159a2a-3430-344a-a23d-b9bb83851a63, ctransid=28 
parent_uuid=20c4ff66-a9ec-fc44-93c6-2c12637e95e6, parent_ctransid=26

ERROR: could not find parent subvolume
---

I would think applying the "patch" to first_backup_rw should succeed, 
because it's exactly the same as first_backup, which is the parent for 
the send, but it doesn't.


Thanks
Matthew


Re: lost with degraded RAID1

2014-02-03 Thread Johan Kröckel
State is: I won't use this filesystem again. I have a backup. So I am
interested in giving the necessary information for debugging it and
afterwards formatting it and creating a new one. I already did fscks and
btrfsck --repair and pushed the output to txt-files but they are more
than 4 MB in size.

So I will post excerpts:

=== file: btrfsck.out ===
Checking filesystem on /dev/mapper/bunkerA
UUID: 7f954a85-7566-4251-832c-44f2d3de2211
42
parent transid verify failed on 1887688011776 wanted 121037 found 88533
parent transid verify failed on 1888518615040 wanted 121481 found 90267
parent transid verify failed on 1681394102272 wanted 110919 found 91024
parent transid verify failed on 1888522838016 wanted 121486 found 90270
parent transid verify failed on 1888398331904 wanted 121062 found 89987
leaf parent key incorrect 1887867330560
bad block 1887867330560
leaf parent key incorrect 188812032
bad block 188812032
leaf parent key incorrect 1888124637184
bad block 1888124637184
leaf parent key incorrect 1888131444736
bad block 1888131444736

[...and so on for 4MB]

bad block 1888513552384
leaf parent key incorrect 1888513642496
bad block 1888513642496
leaf parent key incorrect 1888513654784
bad block 1888513654784
leaf parent key incorrect 1888514023424
bad block 1888514023424
btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.

=== file: smartctl-before-btrfschk-repair ===
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       172055696
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   030    Pre-fail  Always       -       9085642
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2769
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       7
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   083   083   000    Old_age   Always       -       17
190 Airflow_Temperature_Cel 0x0022   077   071   045    Old_age   Always       -       23 (Min/Max 22/23)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 20 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

=== file: btrfsck-repair.out ===
enabling repair mode
Checking filesystem on /dev/mapper/bunkerA
UUID: 7f954a85-7566-4251-832c-44f2d3de2211
parent transid verify failed on 1887688011776 wanted 121037 found 88533
parent transid verify failed on 1888518615040 wanted 121481 found 90267
parent transid verify failed on 1681394102272 wanted 110919 found 91024
parent transid verify failed on 1888522838016 wanted 121486 found 90270
parent transid verify failed on 1888398331904 wanted 121062 found 89987
leaf parent key incorrect 1887867330560
bad block 1887867330560

[...and so on for 4MB]

bad block 1888513642496
leaf parent key incorrect 1888513654784
bad block 1888513654784
leaf parent key incorrect 1888514023424
bad block 1888514023424
btrfsck: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.

=== file: smartctl-after-btrfschk-repair ===
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.12-0.bpo.1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE
UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f   118   099   0

Re: [PATCH] Btrfs: convert to add transaction protection for btrfs send

2014-02-03 Thread Josef Bacik

On 01/31/2014 11:37 AM, Wang Shilong wrote:
> Hello Josef,
>
>> On 2014-1-31, at 12:23 AM, Josef Bacik  wrote:
>>
>>> On 01/30/2014 11:20 AM, Wang Shilong wrote:
>>>> Hello Josef,
>>>>
>>>>> On 01/30/2014 04:42 AM, Wang Shilong wrote:
>>>>>> Hi Josef,
>>>>>>
>>>>>>> On 01/29/2014 10:32 AM, Wang Shilong wrote:
>>>>>>>> From: Wang Shilong 
>>>>>>>>
>>>>>>>> I sent a patch to kick off transaction from btrfs send, however it gets
>>>>>>>> a regression that btrfs send try to search extent commit root without
>>>>>>>> transaction protection.
>>>>>>>>
>>>>>>>> To fix this regression, we have two ideas:
>>>>>>>>
>>>>>>>> 1. don't use extent commit root for sending.
>>>>>>>>
>>>>>>>> 2. add transaction protection to use extent commit root safely.
>>>>>>>>
>>>>>>>> Both approaches need transaction actually, however, the first approach
>>>>>>>> will add extent tree lock contention, so we'd better adopt the second
>>>>>>>> approach.
>>>>>>>>
>>>>>>>> Luckily, now we only need transaction protection when iterating
>>>>>>>> extent root, the protection's *range* is smaller than before.
>>>>>>> So what is the problem exactly?  How does it show up and what are you 
>>>>>>> doing to make it happen?  I'd really like to kill the transaction 
>>>>>>> taking completely in the send path so I'd like to know what is going 
>>>>>>> wrong so we can either take the extent commit semaphore and be 
>>>>>>> satisfied that is ok or come up with a different solution.  Thanks,
>>>>>> See in find_extent_clone(), we have to walk backrefs  while we have to 
>>>>>> search extent tree!
>>>>>> i was thinking to kick off transaction for initial  full send, however, 
>>>>>> we need to consider ref links even
>>>>>> in the initial send.
>>>>>>
>>>>>> It is easy to trigger problems like the following steps:
>>>>>>
>>>>>> # mkfs.btrfs -f /dev/sda8
>>>>>> # mount /dev/sda8 /mnt
>>>>>> # dd if=/dev/zero of=/mnt/data bs=4k count=102400 oflag=direct
>>>>>> # btrfs sub snapshot -r /mnt /mnt/snap
>>>>>> # btrfs send /mnt/snap -f /mnt/send_file &
>>>>>> # btrfs sub snapshot /mnt/snap /mnt/snap_1
>>>>>>
>>>>>> Feel free to correct me if i miss something here^_^(As i sometimes made 
>>>>>> some mistakes).
>>>>>>
>>>>> Ok so this is a lot of broken things, but it's not really the extent 
>>>>> root, cause like I said before nothings going to change that matters for 
>>>>> the snapshots bytes.
>>>>>
>>>>> What _does_ matter is the actual commit root for the actual fs root, and 
>>>>> that requires quite a bit of manoeuvring to get right.  So I'll send a 
>>>>> patch in a few minutes when I'm happy with what I have to fix this.  In 
>>>>> the meantime would you rig this example up into an xfstest so we can make 
>>>>> sure we don't have this problem in the future? Thanks,
>>>> I am a little confused that we don't need protect extent commit root 
>>>> anyway, it is really safe to search extent commit  root without any 
>>>> transaction protection^_^….
>>>> And i am ok to send a xfstest case for this..
>>>>
>>> Sorry I didn't say that quite right.  We definitely need to protect the 
>>> commit root for the extent root because we could easily swap it out and 
>>> then write over blocks as we search down it, which would break things.  But 
>>> that's not what was screwing up here, we are cow'ing the root for /mnt/snap 
>>> and swapping out the commit root out from under us which is screwing us up 
>>> because we end up with a different root level than what we are expecting.
>>>
>>> So we need to use extent_commit_sem anywhere we search the commit root for 
>>> the extent tree, but we also need to do the same for searching the fs 
>>> roots.  Thanks,
> By some debugging, i found snapshots  will cow src root(this is a little 
> strange...), we need do the same thing
> for searching fs roots. Really thanks for looking into issue, and correct me, 
>  waiting for your fix.^_^ ^_^
>
So I've figured it out. We definitely need to protect the commit roots,
but that's not what is screwing us. Say we have commit root for snap at
block 1 and we search down the extent tree and see that it is at 1. Then
we go to do the search down to level on the root for that block, but in
the meantime we've snapshotted and switched the commit root for that
fs_tree to block 2. We go to search down and don't find our bytenr we
were looking for and we exit out without finding our original subvolume.

So there are a few things we can do here

1) Only switch the commit roots for the fs_root _after_ we switch the
extent root commit root. This works out well because we'd need to hold
the extent_commit_sem for the entirety of this operation so we'd end up
with a consistent view of everything. The drawback of this is that we
have to process the fs_roots twice, once to update the root items and
then again to swap the commit roots.

2) Remove the per-root rwsem for the commit root and just make one big
rwsem that covers all commit root switching. This way everybody who
wants to search with the commit root can just

hitting BUG_ON on troublesome FS

2014-02-03 Thread Remco Hosman - Yerf-it.com
First, a bit of history of the filesystem:
it used to be 6 disks, now 5. Partially raid1 / raid10. I've been migrating back and 
forth a few times.
At some point, a balance would not complete and would end with 164 ENOSPC errors, 
while there was plenty of unallocated space on each disk.

I scanned for extents larger than 1 gig and found a few, so I ran a recursive 
balance of the entire FS.

I decided to empty the filesystem and format it.

I pulled most files off it, some via btrfs send/receive, some via rsync, but 1 
subvol wouldn’t send. I don’t remember the exact error, but it was that an 
extent could not be found on 1 of the disks.

With only a few 100 gig of data left, I decided to balance some remaining empty 
space before doing a `btrfs dev del`, so I'd have another disk to store more data 
on.
But I'm hitting a snag: I hit a BUG_ON when doing a 
`btrfs bal start -dusage=2 /mountpoint`:

[ 3327.678329] btrfs: found 198 extents
[ 3328.117274] btrfs: relocating block group 84473084968960 flags 17
[ 3329.278521] btrfs: found 103 extents
[ 3331.907931] btrfs: found 103 extents
[ 3332.386172] btrfs: relocating block group 84466642518016 flags 17
[ .536595] btrfs: found 86 extents
[ 3335.982967] btrfs: found 86 extents
[ 3336.599555] btrfs (4746) used greatest stack depth: 2744 bytes left
[ 3379.073464] btrfs: relocating block group 89878368419840 flags 17
[ 3381.608948] btrfs: found 499 extents
[ 3383.884696] [ cut here ]
[ 3383.884720] kernel BUG at fs/btrfs/relocation.c:3405!
[ 3383.884731] invalid opcode:  [#1] SMP 
[ 3383.884742] Modules linked in:
[ 3383.884753] CPU: 0 PID: 5663 Comm: btrfs Not tainted 3.13.0 #1
[ 3383.884763] Hardware name: System manufacturer System Product Name/E45M1-I 
DELUXE, BIOS 0405 08/08/2012
[ 3383.884778] task: 8802360eae80 ti: 88010dcaa000 task.ti: 
88010dcaa000
[ 3383.884790] RIP: 0010:[]  [] 
__add_tree_block+0x1c5/0x1e0
[ 3383.884811] RSP: 0018:88010dcaba38  EFLAGS: 00010202
[ 3383.884821] RAX: 0001 RBX: 880039f18000 RCX: 
[ 3383.884832] RDX:  RSI:  RDI: 
[ 3383.884843] RBP: 88010dcaba90 R08: 88010dcab9f4 R09: 88010dcab930
[ 3383.884854] R10:  R11: 047f R12: 1000
[ 3383.884865] R13: 88023489c630 R14:  R15: 528d112e4000
[ 3383.884876] FS:  7f8e27e74880() GS:88023ec0() 
knlGS:
[ 3383.884888] CS:  0010 DS:  ES:  CR0: 8005003b
[ 3383.884897] CR2: 7f60d89f35a8 CR3: 0001b5ada000 CR4: 07f0
[ 3383.884907] Stack:
[ 3383.884941]  88010dcabb28 4000812bde34 00a8528d112e 
0010
[ 3383.885012]  1000 1000 0f3a 
8802348d6990
[ 3383.885082]  88001cbf5a00 880039f18000 00b8 
88010dcabb00
[ 3383.885153] Call Trace:
[ 3383.885192]  [] add_data_references+0x244/0x2e0
[ 3383.885232]  [] relocate_block_group+0x56b/0x640
[ 3383.885272]  [] btrfs_relocate_block_group+0x1a2/0x2f0
[ 3383.885313]  [] btrfs_relocate_chunk.isra.27+0x6a/0x740
[ 3383.885355]  [] ? btrfs_set_path_blocking+0x31/0x70
[ 3383.885432]  [] ? btrfs_search_slot+0x386/0x960
[ 3383.885473]  [] ? free_extent_buffer+0x47/0xa0
[ 3383.885513]  [] btrfs_balance+0x90b/0xea0
[ 3383.885553]  [] btrfs_ioctl_balance+0x162/0x520
[ 3383.885592]  [] btrfs_ioctl+0xcbd/0x25c0
[ 3383.885632]  [] ? __do_page_fault+0x1dc/0x520
[ 3383.885673]  [] do_vfs_ioctl+0x2c8/0x490
[ 3383.885712]  [] SyS_ioctl+0x81/0xa0
[ 3383.885752]  [] tracesys+0xdd/0xe2
[ 3383.885787] Code: ff 48 8b 4d a8 48 8d 75 b6 4c 89 ea 48 89 df e8 42 e7 ff 
ff 4c 89 ef 89 45 a8 e8 c7 0f f9 ff 8b 45 a8 e9 69 ff ff ff 85 c0 74 d6 <0f> 0b 
66 0f 1f 84 00 00 00 00 00 b8 f4 ff ff ff e9 50 ff ff ff 
[ 3383.886001] RIP  [] __add_tree_block+0x1c5/0x1e0
[ 3383.886042]  RSP 
[ 3383.886359] ---[ end trace 075209044ce10da3 ]---
Anything i can do to resolve / debug the issue?

Remco


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-03 Thread Josef Bacik


On 02/03/2014 01:28 PM, Johannes Hirte wrote:

> On Thu, 23 Jan 2014 13:07:52 -0500
> Josef Bacik wrote:
>
>> On one of our gluster clusters we noticed some pretty big lag
>> spikes.  This turned out to be because our transaction commit was
>> taking like 3 minutes to complete.  This is because we have like 30
>> gigs of metadata, so our global reserve would end up being the max
>> which is like 512 mb.  So our throttling code would allow a
>> ridiculous amount of delayed refs to build up and then they'd all get
>> run at transaction commit time, and for a cold mounted file system
>> that could take up to 3 minutes to run.  So fix the throttling to be
>> based on both the size of the global reserve and how long it takes us
>> to run delayed refs. This patch tracks the time it takes to run
>> delayed refs and then only allows 1 seconds worth of outstanding
>> delayed refs at a time.  This way it will auto-tune itself from cold
>> cache up to when everything is in memory and it no longer has to go
>> to disk.  This makes our transaction commits take much less time to
>> run. Thanks,
>>
>> Signed-off-by: Josef Bacik 
>
> This one breaks my system. Shortly after boot the btrfs-freespace
> thread goes up to 100% CPU usage and the system is nearly unresponsive.
> I've seen it first with the full pull request for 3.14-rc1 and was able
> to track it down to this patch.

Could you turn on the softlockup timer and see if you can get a 
backtrace of where it is stuck?  In the meantime I will go through and 
see if I can pinpoint where it may be happening.  Thanks,


Josef


Re: lost with degraded RAID1

2014-02-03 Thread Chris Murphy

On Feb 3, 2014, at 1:55 PM, Johan Kröckel  wrote:

> 2014-01-30 Chris Murphy :
>> 
>> On Jan 30, 2014, at 10:58 AM, Hugo Mills  wrote:
>> 
>>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>>> You're doing an online conversion of a degraded raid1 volume into single? 
>>>> Does anyone know if this is expected or intended to work?
>>> 
>>>  I don't see why not. One suggested method of recovering RAID from a
>>> degraded situation is to rebalance over just the remaining devices
>>> (space permitting, of course).
>> 
>> Right but that's not a conversion. That's a regular balance on a degraded 
>> mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, 
>> mount -o degraded, delete missing, then balance will replicate any missing 
>> 2nd copies onto three drives.
>> 
>> The bigger problem at the moment is that -o degraded isn't working for 
>> Johan. The too many missing devices message seems like a bug and with 
>> limited information it may even be whatever that bug is, that cause the 
>> conversion to fail. Some 11GB were converted prior to the failure.
> Which usefull information can provide. On the weekend I was at the
> server and found out, that the vanishing of the drive at reboot was
> strange behavior of the bios. So the drive is online again. but the
> filesystem is still showing strange behavior, but now I can mount it
> rw.

I'd like to see btrfs fi df results for the volume, and a new btrfs check. 
Then a backup if needed, and then a scrub to see if that fixes anything broken 
between them. I'm not sure what scrub will do if a new generation object is 
broken and the old generation is OK. Maybe it just reports it. If you want, you 
could do a btrfs scrub start -r, which is read-only and just reports what the 
problems are.

You also have an incomplete balance, right? So it's possible some things might 
not be fixable if the conversion to single was successful. You'll need to 
decide if you want to reconvert back to data/metadata raid1/raid1 from whatever 
you're at now.


Chris Murphy


Re: lost with degraded RAID1

2014-02-03 Thread Johan Kröckel
2014-01-30 Chris Murphy :
>
> On Jan 30, 2014, at 10:58 AM, Hugo Mills  wrote:
>
>> On Thu, Jan 30, 2014 at 10:33:21AM -0700, Chris Murphy wrote:
>>> You're doing an online conversion of a degraded raid1 volume into single? 
>>> Does anyone know if this is expected or intended to work?
>>
>>   I don't see why not. One suggested method of recovering RAID from a
>> degraded situation is to rebalance over just the remaining devices
>> (space permitting, of course).
>
> Right but that's not a conversion. That's a regular balance on a degraded 
> mount, with multiple remaining devices: e.g. a 4 disk raid1, drive fails, 
> mount -o degraded, delete missing, then balance will replicate any missing 
> 2nd copies onto three drives.
>
> The bigger problem at the moment is that -o degraded isn't working for Johan. 
> The too many missing devices message seems like a bug and with limited 
> information it may even be whatever that bug is, that cause the conversion to 
> fail. Some 11GB were converted prior to the failure.
Which useful information can I provide? On the weekend I was at the
server and found out that the vanishing of the drive at reboot was
strange behavior of the BIOS. So the drive is online again. The
filesystem is still showing strange behavior, but now I can mount it
rw.
> Chris Murphy


Re: Receive on same subvolume

2014-02-03 Thread Chris Murphy

On Feb 3, 2014, at 11:19 AM, Matthew Lai  wrote:

> Thanks. I should clarify what I'm trying to do.
> 
> I'm trying to use btrfs send for backup, without having another btrfs volume.
> 
> So the initial backup is a complete send, piped to Amazon Glacier (so my 
> machine never has the whole file, and doesn't have space for one).

OK so you've used btrfs send piped to Glacier, which creates a *file* (I'll call 
it "initial"), not a navigable directory of files? Right?

> 
> At the same time I'm keeping a snapshot of the current volume.
> 
> On the next incremental backup, I would use the first snapshot as the parent, 
> and send the differences to Glacier again (without having the entire file on 
> the system at any time).

That's fine as long as the stdout from btrfs send ends up as a self contained 
file on Glacier. I'll call this "increment1"

> 
> It looks like the problem now is the sent file can't be applied to the 
> original volume (for restore).

I'm counting two sent files: initial, increment1. I'm not sure which one you're 
applying. If you have the exact same read-only snapshot that the btrfs send 
file "initial" is based on, then you'd apply the increment1 to that read-only 
snapshot which will cause a new read-only snapshot to be created with the 
incremental data applied to it. The error you're getting sounds like the parent 
read-only snapshot isn't available?

Have you tried the -vv flag to get more verbose error information when using 
btrfs receive?

Chris Murphy


Re: [PATCH] Btrfs: throttle delayed refs better

2014-02-03 Thread Johannes Hirte
On Thu, 23 Jan 2014 13:07:52 -0500
Josef Bacik  wrote:

> On one of our gluster clusters we noticed some pretty big lag
> spikes.  This turned out to be because our transaction commit was
> taking like 3 minutes to complete.  This is because we have like 30
> gigs of metadata, so our global reserve would end up being the max
> which is like 512 mb.  So our throttling code would allow a
> ridiculous amount of delayed refs to build up and then they'd all get
> run at transaction commit time, and for a cold mounted file system
> that could take up to 3 minutes to run.  So fix the throttling to be
> based on both the size of the global reserve and how long it takes us
> to run delayed refs. This patch tracks the time it takes to run
> delayed refs and then only allows 1 seconds worth of outstanding
> delayed refs at a time.  This way it will auto-tune itself from cold
> cache up to when everything is in memory and it no longer has to go
> to disk.  This makes our transaction commits take much less time to
> run. Thanks,
> 
> Signed-off-by: Josef Bacik 

This one breaks my system. Shortly after boot the btrfs-freespace
thread goes up to 100% CPU usage and the system is nearly unresponsive.
I've seen it first with the full pull request for 3.14-rc1 and was able
to track it down to this patch.

regards,
  Johannes


Re: Receive on same subvolume

2014-02-03 Thread Matthew Lai

Thanks. I should clarify what I'm trying to do.

I'm trying to use btrfs send for backup, without having another btrfs 
volume.


So the initial backup is a complete send, piped to Amazon Glacier (so my 
machine never has the whole file, and doesn't have space for one).


At the same time I'm keeping a snapshot of the current volume.

On the next incremental backup, I would use the first snapshot as the 
parent, and send the differences to Glacier again (without having the 
entire file on the system at any time).


It looks like the problem now is the sent file can't be applied to the 
original volume (for restore).


Thanks
Matthew

On 03/02/2014 9:30 AM, Chris Murphy wrote:

> On Jan 29, 2014, at 2:26 PM, Matthew Lai wrote:
>
>> Hello,
>>
>> Is this supposed to work? (/data is the root volume, /data/a is a subvolume)
>>
>> btrfs subvolume snapshot /data/a /data/b
>> # make some changes in b
>> btrfs send -p /data/a /data/b > delta
>> btrfs receive /data/a < delta
>>
>> I'm getting "ERROR: could not find parent subvolume" on receive.
>> What I'm trying to do is to back up using send/receive, but I don't have 50% 
>> free space, and (please correct me if I'm wrong) since receive doesn't do 
>> deduplication, I want to use snapshot to do the initial bootstrapping, instead 
>> of send/receive without a parent.
>
> I think you've oversimplified your commands, because it looks like you're using 
> send/receive on the same file system. But if it's a backup, necessarily you'd 
> have to be sending the subvolume(s) to another file system on another disk 
> (either on the same system or remotely). So that needs some clarification.
>
> Also, btrfs send requires subvolumes to be read only. Are they?
>
> And btrfs incremental receive expects the identical parent already on the 
> destination. Is it?
>
> Also, while I'm not certain it matters, man page says to use -f to specify files. I 
> haven't tested < and >. But then also the step where you create this 
> intermediate snapshot file isn't necessary, just combine the send receive commands 
> through pipe.
>
>
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
>
>
> Chris Murphy



[PATCH 2/6] btrfs: send: remove virtual_mem member from fs_path

2014-02-03 Thread David Sterba
We don't need to keep track of that, it's available via is_vmalloc_addr.

Signed-off-by: David Sterba 
---
 fs/btrfs/send.c |8 ++--
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 524086a882f9..ea427624e842 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -55,7 +55,6 @@ struct fs_path {
char *buf;
int buf_len;
unsigned int reversed:1;
-   unsigned int virtual_mem:1;
char inline_buf[];
};
char pad[PAGE_SIZE];
@@ -241,7 +240,6 @@ static struct fs_path *fs_path_alloc(void)
if (!p)
return NULL;
p->reversed = 0;
-   p->virtual_mem = 0;
p->buf = p->inline_buf;
p->buf_len = FS_PATH_INLINE_SIZE;
fs_path_reset(p);
@@ -265,7 +263,7 @@ static void fs_path_free(struct fs_path *p)
if (!p)
return;
if (p->buf != p->inline_buf) {
-   if (p->virtual_mem)
+   if (is_vmalloc_addr(p->buf))
vfree(p->buf);
else
kfree(p->buf);
@@ -299,13 +297,12 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
tmp_buf = vmalloc(len);
if (!tmp_buf)
return -ENOMEM;
-   p->virtual_mem = 1;
}
memcpy(tmp_buf, p->buf, p->buf_len);
p->buf = tmp_buf;
p->buf_len = len;
} else {
-   if (p->virtual_mem) {
+   if (is_vmalloc_addr(p->buf)) {
tmp_buf = vmalloc(len);
if (!tmp_buf)
return -ENOMEM;
@@ -319,7 +316,6 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
return -ENOMEM;
memcpy(tmp_buf, p->buf, p->buf_len);
kfree(p->buf);
-   p->virtual_mem = 1;
}
}
p->buf = tmp_buf;
-- 
1.7.9



[PATCH 4/6] btrfs: send: lower memory requirements in common case

2014-02-03 Thread David Sterba
The fs_path structure uses an inline buffer and falls back to a chain of
allocations, but vmalloc is not necessary because PATH_MAX fits into
PAGE_SIZE.

The size of fs_path has been reduced to 256 bytes from PAGE_SIZE,
usually 4k. Experimental measurements show that most paths on a single
filesystem do not exceed 200 bytes, and these get stored into the inline
buffer directly, which is now 230 bytes. Longer paths are kmalloced when
needed.

Signed-off-by: David Sterba 
---
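
As an illustration of the scheme in the changelog above (small inline buffer, heap fallback for longer paths), here is a minimal userspace sketch. It is not the kernel code: malloc/realloc stand in for kmalloc/krealloc, every sketch_* name is hypothetical, and the buffer length is simply the requested length rather than the rounded-up size the patch obtains from ksize().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_SIZE 256
#define SKETCH_INLINE_SIZE \
	(SKETCH_PATH_SIZE - sizeof(char *) - 2 * sizeof(size_t))

/* Hypothetical stand-in for struct fs_path: header plus inline storage. */
struct sketch_path {
	char *buf;		/* points at inline_buf or at heap memory */
	size_t buf_len;		/* usable bytes behind buf */
	size_t used;		/* bytes stored, including the trailing NUL */
	char inline_buf[SKETCH_INLINE_SIZE];
};

static struct sketch_path *sketch_path_alloc(void)
{
	struct sketch_path *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->buf = p->inline_buf;
	p->buf_len = SKETCH_INLINE_SIZE;
	p->buf[0] = '\0';
	p->used = 1;
	return p;
}

static void sketch_path_free(struct sketch_path *p)
{
	if (!p)
		return;
	if (p->buf != p->inline_buf)
		free(p->buf);
	free(p);
}

/* Make sure at least len bytes are available behind p->buf. */
static int sketch_path_ensure_buf(struct sketch_path *p, size_t len)
{
	char *tmp;

	if (p->buf_len >= len)
		return 0;	/* fast path: the current buffer suffices */

	if (p->buf == p->inline_buf) {
		/* First spill: allocate and carry the inline contents over. */
		tmp = malloc(len);
		if (!tmp)
			return -ENOMEM;
		memcpy(tmp, p->buf, p->used);
	} else {
		/* Already on the heap; on failure the old buffer stays valid. */
		tmp = realloc(p->buf, len);
		if (!tmp)
			return -ENOMEM;
	}
	p->buf = tmp;
	p->buf_len = len;
	return 0;
}

/* Append name, separated by '/' when the path is not empty. */
static int sketch_path_add(struct sketch_path *p, const char *name)
{
	size_t name_len = strlen(name);
	size_t cur = p->used - 1;	/* current string length */
	size_t sep = cur ? 1 : 0;
	int ret;

	ret = sketch_path_ensure_buf(p, cur + sep + name_len + 1);
	if (ret < 0)
		return ret;
	if (sep)
		p->buf[cur++] = '/';
	memcpy(p->buf + cur, name, name_len + 1);
	p->used = cur + name_len + 1;
	assert(p->used <= p->buf_len);
	return 0;
}
```

Short paths never leave the inline buffer, so the common case costs one small allocation for the whole object. Only a path longer than the inline area triggers the second allocation.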
 fs/btrfs/send.c |  103 ++-
 1 files changed, 34 insertions(+), 69 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index cb12c2ec37dc..4e3a3d413417 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -57,7 +57,12 @@ struct fs_path {
unsigned short reversed:1;
char inline_buf[];
};
-   char pad[PAGE_SIZE];
+   /*
+* Average path length does not exceed 200 bytes, we'll have
+* better packing in the slab and higher chance to satisfy
+* a allocation later during send.
+*/
+   char pad[256];
};
 };
 #define FS_PATH_INLINE_SIZE \
@@ -262,12 +267,8 @@ static void fs_path_free(struct fs_path *p)
 {
if (!p)
return;
-   if (p->buf != p->inline_buf) {
-   if (is_vmalloc_addr(p->buf))
-   vfree(p->buf);
-   else
-   kfree(p->buf);
-   }
+   if (p->buf != p->inline_buf)
+   kfree(p->buf);
kfree(p);
 }
 
@@ -287,40 +288,28 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
if (p->buf_len >= len)
return 0;
 
-   path_len = p->end - p->start;
-   old_buf_len = p->buf_len;
-   len = PAGE_ALIGN(len);
-
+   /*
+* First time the inline_buf does not suffice
+*/
if (p->buf == p->inline_buf) {
-   tmp_buf = kmalloc(len, GFP_NOFS | __GFP_NOWARN);
-   if (!tmp_buf) {
-   tmp_buf = vmalloc(len);
-   if (!tmp_buf)
-   return -ENOMEM;
-   }
-   memcpy(tmp_buf, p->buf, p->buf_len);
-   p->buf = tmp_buf;
-   p->buf_len = len;
+   p->buf = kmalloc(len, GFP_NOFS);
+   if (!p->buf)
+   return -ENOMEM;
+   /*
+* The real size of the buffer is bigger, this will let the
+* fast path happen most of the time
+*/
+   p->buf_len = ksize(p->buf);
} else {
-   if (is_vmalloc_addr(p->buf)) {
-   tmp_buf = vmalloc(len);
-   if (!tmp_buf)
-   return -ENOMEM;
-   memcpy(tmp_buf, p->buf, p->buf_len);
-   vfree(p->buf);
-   } else {
-   tmp_buf = krealloc(p->buf, len, GFP_NOFS);
-   if (!tmp_buf) {
-   tmp_buf = vmalloc(len);
-   if (!tmp_buf)
-   return -ENOMEM;
-   memcpy(tmp_buf, p->buf, p->buf_len);
-   kfree(p->buf);
-   }
-   }
-   p->buf = tmp_buf;
-   p->buf_len = len;
+   p->buf = krealloc(p->buf, len, GFP_NOFS);
+   if (!p->buf)
+   return -ENOMEM;
+   p->buf_len = ksize(p->buf);
}
+
+   path_len = p->end - p->start;
+   old_buf_len = p->buf_len;
+
if (p->reversed) {
tmp_buf = p->buf + old_buf_len - path_len - 1;
p->end = p->buf + p->buf_len - 1;
@@ -911,9 +900,7 @@ static int iterate_dir_item(struct btrfs_root *root, struct 
btrfs_path *path,
struct btrfs_dir_item *di;
struct btrfs_key di_key;
char *buf = NULL;
-   char *buf2 = NULL;
-   int buf_len;
-   int buf_virtual = 0;
+   const int buf_len = PATH_MAX;
u32 name_len;
u32 data_len;
u32 cur;
@@ -923,7 +910,6 @@ static int iterate_dir_item(struct btrfs_root *root, struct 
btrfs_path *path,
int num;
u8 type;
 
-   buf_len = PAGE_SIZE;
buf = kmalloc(buf_len, GFP_NOFS);
if (!buf) {
ret = -ENOMEM;
@@ -945,30 +931,12 @@ static int iterate_dir_item(struct btrfs_root *root, 
struct btrfs_path *path,
type = btrfs_dir_type(eb, di);
btrfs_dir_item_key_to_cpu(eb, di, &di_key);
 
+   /*
+* Path too long
+*/
if (name_len + data_len > buf_len) {
-   buf_len = PAGE_ALIGN(name_len + data_len);
-   if (buf_virtual) {
-  

[PATCH 3/6] btrfs: send: squeeze bitfilelds in fs_path

2014-02-03 Thread David Sterba
We know that buf_len is at most PATH_MAX, 4k, and can merge it with the
reversed member. This saves 3 bytes in favor of inline_buf.

Signed-off-by: David Sterba 
---
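
To check the byte count in the changelog, a small userspace sketch (not part of the patch) can compare where inline_buf starts in the two layouts. The struct names are hypothetical copies of the relevant members. Bit-field layout is implementation-defined, so the exact saving depends on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Layout before the patch: a full int plus a 1-bit unsigned int field. */
struct sketch_fs_path_before {
	char *start;
	char *end;
	char *buf;
	int buf_len;
	unsigned int reversed:1;
	char inline_buf[];
};

/* Layout after the patch: both fields share one 16-bit unit. */
struct sketch_fs_path_after {
	char *start;
	char *end;
	char *buf;
	unsigned short buf_len:15;
	unsigned short reversed:1;
	char inline_buf[];
};

/* Header bytes given back to inline_buf by the new layout. */
static size_t sketch_saved_bytes(void)
{
	size_t before = offsetof(struct sketch_fs_path_before, inline_buf);
	size_t after = offsetof(struct sketch_fs_path_after, inline_buf);

	assert(after <= before);
	return before - after;
}

/* Returns 1 if len survives a round trip through a 15-bit field. */
static int sketch_fits_in_buf_len(unsigned int len)
{
	struct { unsigned short buf_len:15; } f;

	f.buf_len = len;	/* values above 32767 are truncated */
	return f.buf_len == len;
}
```

A 15-bit field holds values up to 32767, so a PATH_MAX of 4096 fits. Anything that could exceed that limit would be silently truncated by the assignment.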
 fs/btrfs/send.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index ea427624e842..cb12c2ec37dc 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -53,8 +53,8 @@ struct fs_path {
char *end;
 
char *buf;
-   int buf_len;
-   unsigned int reversed:1;
+   unsigned short buf_len:15;
+   unsigned short reversed:1;
char inline_buf[];
};
char pad[PAGE_SIZE];
-- 
1.7.9



[PATCH 1/6] btrfs: send: remove prepared member from fs_path

2014-02-03 Thread David Sterba
The member is used only to return value back from
fs_path_prepare_for_add, we can do it locally and save 8 bytes for the
inline_buf path.

Signed-off-by: David Sterba 
---
 fs/btrfs/send.c |   26 +-
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 04c07ed51df5..524086a882f9 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -51,7 +51,6 @@ struct fs_path {
struct {
char *start;
char *end;
-   char *prepared;
 
char *buf;
int buf_len;
@@ -338,7 +337,8 @@ static int fs_path_ensure_buf(struct fs_path *p, int len)
return 0;
 }
 
-static int fs_path_prepare_for_add(struct fs_path *p, int name_len)
+static int fs_path_prepare_for_add(struct fs_path *p, int name_len,
+  char **prepared)
 {
int ret;
int new_len;
@@ -354,11 +354,11 @@ static int fs_path_prepare_for_add(struct fs_path *p, int 
name_len)
if (p->start != p->end)
*--p->start = '/';
p->start -= name_len;
-   p->prepared = p->start;
+   *prepared = p->start;
} else {
if (p->start != p->end)
*p->end++ = '/';
-   p->prepared = p->end;
+   *prepared = p->end;
p->end += name_len;
*p->end = 0;
}
@@ -370,12 +370,12 @@ out:
 static int fs_path_add(struct fs_path *p, const char *name, int name_len)
 {
int ret;
+   char *prepared;
 
-   ret = fs_path_prepare_for_add(p, name_len);
+   ret = fs_path_prepare_for_add(p, name_len, &prepared);
if (ret < 0)
goto out;
-   memcpy(p->prepared, name, name_len);
-   p->prepared = NULL;
+   memcpy(prepared, name, name_len);
 
 out:
return ret;
@@ -384,12 +384,12 @@ out:
 static int fs_path_add_path(struct fs_path *p, struct fs_path *p2)
 {
int ret;
+   char *prepared;
 
-   ret = fs_path_prepare_for_add(p, p2->end - p2->start);
+   ret = fs_path_prepare_for_add(p, p2->end - p2->start, &prepared);
if (ret < 0)
goto out;
-   memcpy(p->prepared, p2->start, p2->end - p2->start);
-   p->prepared = NULL;
+   memcpy(prepared, p2->start, p2->end - p2->start);
 
 out:
return ret;
@@ -400,13 +400,13 @@ static int fs_path_add_from_extent_buffer(struct fs_path 
*p,
  unsigned long off, int len)
 {
int ret;
+   char *prepared;
 
-   ret = fs_path_prepare_for_add(p, len);
+   ret = fs_path_prepare_for_add(p, len, &prepared);
if (ret < 0)
goto out;
 
-   read_extent_buffer(eb, p->prepared, off, len);
-   p->prepared = NULL;
+   read_extent_buffer(eb, prepared, off, len);
 
 out:
return ret;
-- 
1.7.9



[PATCH 0/6] Btrfs send updates - reduce memory consumption

2014-02-03 Thread David Sterba
[Sorry if you see this twice, first attempt hasn't appeared in the list yet]

This reduces the size of the path buffer in the common case. It has been tested
with xfstests, but at the moment v3.13 blows with or without this patchset, so
I'm sending it anyway.

Based on current btrfs-next/master.

David Sterba (6):
  btrfs: send: remove prepared member from fs_path
  btrfs: send: remove virtual_mem member from fs_path
  btrfs: send: squeeze bitfilelds in fs_path
  btrfs: send: lower memory requirements in common case
  btrfs: send: remove BUG from process_all_refs
  btrfs: send: remove BUG_ON from name_cache_delete

 fs/btrfs/send.c |  153 ++
 1 files changed, 62 insertions(+), 91 deletions(-)

-- 
1.7.9



[PATCH 5/6] btrfs: send: remove BUG from process_all_refs

2014-02-03 Thread David Sterba
There are only 2 static callers, so the BUG would normally never be
reached, but let's be nice.

Signed-off-by: David Sterba 
---
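
The pattern used here, reporting an unexpected value and returning an error instead of halting, can be shown in a userspace sketch. The command codes, callbacks and function names below are hypothetical, and fprintf stands in for btrfs_err.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical command codes for the sketch. */
enum sketch_cmd {
	SKETCH_CMD_NEW_REF = 1,
	SKETCH_CMD_DELETED_REF = 2,
};

typedef int (*sketch_ref_cb)(int ref);

static int sketch_record_new_ref(int ref)
{
	return ref;
}

static int sketch_record_deleted_ref(int ref)
{
	return -ref;
}

/* Pick a callback by command; unknown commands become -EINVAL. */
static int sketch_process_all_refs(int cmd, int ref, int *out)
{
	sketch_ref_cb cb;

	assert(out != NULL);
	if (cmd == SKETCH_CMD_NEW_REF) {
		cb = sketch_record_new_ref;
	} else if (cmd == SKETCH_CMD_DELETED_REF) {
		cb = sketch_record_deleted_ref;
	} else {
		/* Where a BUG() would stop everything, report and bail out. */
		fprintf(stderr, "Wrong command %d in process_all_refs\n", cmd);
		return -EINVAL;
	}
	*out = cb(ref);
	return 0;
}
```

The caller sees a negative errno and can unwind normally, and *out is left untouched on the error path.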
 fs/btrfs/send.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4e3a3d413417..b0bf4ff40b5b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -3568,7 +3568,10 @@ static int process_all_refs(struct send_ctx *sctx,
root = sctx->parent_root;
cb = __record_deleted_ref;
} else {
-   BUG();
+   btrfs_err(sctx->send_root->fs_info,
+   "Wrong command %d in process_all_refs", cmd);
+   ret = -EINVAL;
+   goto out;
}
 
key.objectid = sctx->cmp_key->objectid;
-- 
1.7.9



[PATCH 6/6] btrfs: send: remove BUG_ON from name_cache_delete

2014-02-03 Thread David Sterba
If cleaning the name cache fails, we could try to proceed at the cost of
some memory leak. This is not expected to happen often.

Signed-off-by: David Sterba 
---
 fs/btrfs/send.c |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index b0bf4ff40b5b..7b17b778eaf7 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1849,13 +1849,20 @@ static void name_cache_delete(struct send_ctx *sctx,
 
nce_head = radix_tree_lookup(&sctx->name_cache,
(unsigned long)nce->ino);
-   BUG_ON(!nce_head);
+   if (!nce_head) {
+   btrfs_err(sctx->send_root->fs_info,
+ "name_cache_delete lookup failed ino %llu cache size %d, leaking 
memory",
+   nce->ino, sctx->name_cache_size);
+   }
 
list_del(&nce->radix_list);
list_del(&nce->list);
sctx->name_cache_size--;
 
-   if (list_empty(nce_head)) {
+   /*
+* We may not get to the final release of nce_head if the lookup fails
+*/
+   if (nce_head && list_empty(nce_head)) {
radix_tree_delete(&sctx->name_cache, (unsigned long)nce->ino);
kfree(nce_head);
}
-- 
1.7.9



Re: [GIT PULL] Btrfs

2014-02-03 Thread Chris Mason

On Mon 03 Feb 2014 12:54:05 PM EST, David Sterba wrote:

> On Thu, Jan 30, 2014 at 04:52:54PM -0500, Chris Mason wrote:
>> Chris Mason (3) commits (+64/-32):
>>  Btrfs: setup inode location during btrfs_init_inode_locked (+9/-9)
>>  Btrfs: don't use ram_bytes for uncompressed inline items (+52/-22)
>
> The patches are CC: stable, but haven't gone through the mailinglist.
> Are they still going to be picked by stable?


We do need both in -stable, I'll help with backports.

-chris



[PATCH v3] btrfs: add simple debugfs interface

2014-02-03 Thread David Sterba
Help debugging by exporting various interesting information and
tunables without the need for extra mount options or ioctls.

Usage:
* declare your variable in sysfs.h, and include where you need it
* define the variable in sysfs.c and make it visible via
  debugfs_create_TYPE

Depends on CONFIG_DEBUG_FS.

Signed-off-by: David Sterba 
---

v3:
- fix typo in changelog

v2:
- added missing return to btrfs_init_debugfs
- updated error handling to btrfs_init_sysfs, the cleanup
  is done in btrfs_exit_sysfs
- removed #ifdef in btrfs_exit_sysfs,

 fs/btrfs/sysfs.c |   33 +++--
 1 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 782374d8fd19..b725e4574448 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "disk-io.h"
@@ -593,6 +594,12 @@ static int add_device_membership(struct btrfs_fs_info 
*fs_info)
 /* /sys/fs/btrfs/ entry */
 static struct kset *btrfs_kset;
 
+/* /sys/kernel/debug/btrfs */
+static struct dentry *btrfs_debugfs_root_dentry;
+
+/* Debugging tunables and exported data */
+u64 btrfs_debugfs_test;
+
 int btrfs_sysfs_add_one(struct btrfs_fs_info *fs_info)
 {
    int error;
@@ -636,27 +643,41 @@ failure:
    return error;
 }
 
+static int btrfs_init_debugfs(void)
+{
+#ifdef CONFIG_DEBUG_FS
+   btrfs_debugfs_root_dentry = debugfs_create_dir("btrfs", NULL);
+   if (!btrfs_debugfs_root_dentry)
+       return -ENOMEM;
+
+   debugfs_create_u64("test", S_IRUGO | S_IWUGO, btrfs_debugfs_root_dentry,
+           &btrfs_debugfs_test);
+#endif
+   return 0;
+}
+
 int btrfs_init_sysfs(void)
 {
    int ret;
+
    btrfs_kset = kset_create_and_add("btrfs", NULL, fs_kobj);
    if (!btrfs_kset)
        return -ENOMEM;
 
-   init_feature_attrs();
+   ret = btrfs_init_debugfs();
+   if (ret)
+       return ret;
 
+   init_feature_attrs();
    ret = sysfs_create_group(&btrfs_kset->kobj, &btrfs_feature_attr_group);
-   if (ret) {
-       kset_unregister(btrfs_kset);
-       return ret;
-   }
 
-   return 0;
+   return ret;
 }
 
 void btrfs_exit_sysfs(void)
 {
    sysfs_remove_group(&btrfs_kset->kobj, &btrfs_feature_attr_group);
    kset_unregister(btrfs_kset);
+   debugfs_remove_recursive(btrfs_debugfs_root_dentry);
 }
 
-- 
1.7.9
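
For anyone who wants to poke at the new file from userspace, here is a
minimal sketch. It assumes the patch above is applied, CONFIG_DEBUG_FS is
enabled and debugfs is mounted at /sys/kernel/debug. Only the btrfs/test
path comes from the patch; the helper names and the BTRFS_DEBUGFS_DIR
override are made up for illustration. Files created with
debugfs_create_u64 read and write as plain decimal text.

```shell
#!/bin/sh
# Sketch only. BTRFS_DEBUGFS_DIR is a hypothetical override so the helpers
# can be pointed at a scratch directory instead of the real debugfs.
BTRFS_DEBUGFS_DIR=${BTRFS_DEBUGFS_DIR:-/sys/kernel/debug/btrfs}

# Print the current value of a tunable, e.g. "test".
btrfs_debug_get() {
    cat "$BTRFS_DEBUGFS_DIR/$1"
}

# Set a tunable to a new decimal value (needs root on a real debugfs).
btrfs_debug_set() {
    echo "$2" > "$BTRFS_DEBUGFS_DIR/$1"
}
```

With the patch loaded, "btrfs_debug_set test 42" followed by
"btrfs_debug_get test" should print the value back.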



Re: [GIT PULL] Btrfs

2014-02-03 Thread David Sterba
On Thu, Jan 30, 2014 at 04:52:54PM -0500, Chris Mason wrote:
> Chris Mason (3) commits (+64/-32):
> Btrfs: setup inode location during btrfs_init_inode_locked (+9/-9)
> Btrfs: don't use ram_bytes for uncompressed inline items (+52/-22)

The patches are CC: stable, but haven't gone through the mailing list.
Are they still going to be picked by stable?

The commit ids:
90d3e592e99b8e374ead2b45148abf506493a959
514ac8ad8793a097c0c9d89202c642479d6dfa34

but unfortunately neither applies directly to anything 3.10+


david


Re: Receive on same subvolume

2014-02-03 Thread Chris Murphy

On Jan 29, 2014, at 2:26 PM, Matthew Lai  wrote:

> Hello,
> 
> Is this supposed to work? (/data is the root volume, /data/a is a subvolume)
> 
> btrfs subvolume snapshot /data/a /data/b
> # make some changes in b
> btrfs send -p /data/a /data/b > delta
> btrfs receive /data/a < delta
> 
> I'm getting "ERROR: could not find parent subvolume" on receive.

> What I'm trying to do is to back up using send/receive, but I don't have 50%
> free space, and (please correct me if I'm wrong) since receive doesn't do
> deduplication, I want to use snapshot to do the initial bootstrapping,
> instead of send/receive without a parent.

I think you've oversimplified your commands, because it looks like you're
using send/receive on the same file system. But if it's a backup, you'd
necessarily be sending the subvolume(s) to another file system on another
disk (either on the same system or remotely). So that needs some
clarification.

Also, btrfs send requires subvolumes to be read-only. Are they?

And an incremental btrfs receive expects the identical parent to already be
on the destination. Is it?

Also, while I'm not certain it matters, the man page says to use -f to
specify files; I haven't tested < and >. In any case, the step where you
create this intermediate file isn't necessary: just combine the send and
receive commands through a pipe.


https://btrfs.wiki.kernel.org/index.php/Incremental_Backup


Chris Murphy
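
Putting those points together (read-only snapshots, parent already on the
destination, send piped straight into receive), a rough sketch might look
like the following. The paths /data/a.1, /data/a.2 and /backup are
hypothetical, the function names and the DRY_RUN switch are only for
illustration, and /backup is assumed to be a directory on a different
btrfs file system.

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the commands instead of running them.

# Take a read-only snapshot; btrfs send refuses read-write subvolumes.
# usage: snapshot_ro <subvolume> <snapshot-path>
snapshot_ro() {
    if [ -n "$DRY_RUN" ]; then
        echo "btrfs subvolume snapshot -r $1 $2"
    else
        btrfs subvolume snapshot -r "$1" "$2"
    fi
}

# Full (bootstrap) transfer of one read-only snapshot.
# usage: send_full <snapshot> <destination-dir>
send_full() {
    if [ -n "$DRY_RUN" ]; then
        echo "btrfs send $1 | btrfs receive $2"
    else
        btrfs send "$1" | btrfs receive "$2"
    fi
}

# Incremental transfer; <parent> must already exist on the destination.
# usage: send_incremental <parent> <snapshot> <destination-dir>
send_incremental() {
    if [ -n "$DRY_RUN" ]; then
        echo "btrfs send -p $1 $2 | btrfs receive $3"
    else
        btrfs send -p "$1" "$2" | btrfs receive "$3"
    fi
}

# Typical sequence (hypothetical paths):
#   snapshot_ro /data/a /data/a.1 && send_full /data/a.1 /backup
#   ... changes happen in /data/a ...
#   snapshot_ro /data/a /data/a.2 && send_incremental /data/a.1 /data/a.2 /backup
```

Note this still needs one full transfer to the backup file system first;
it does not address the free-space concern in the original question.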



Re: [PATCH] Btrfs: disable snapshot aware defrag for now

2014-02-03 Thread Josef Bacik


On 02/03/2014 09:48 AM, David Sterba wrote:

> On Wed, Jan 29, 2014 at 04:05:30PM -0500, Josef Bacik wrote:
> > It's just broken and it's taking a lot of effort to fix it, so for now just
> > disable it so people can defrag in peace.  Thanks,
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Josef Bacik
> > ---
> >  fs/btrfs/inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 3b65987..8c0bc31 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -2628,7 +2628,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
> >   EXTENT_DEFRAG, 1, cached_state);
> >   if (ret) {
> >   u64 last_snapshot = btrfs_root_last_snapshot(&root->root_item);
> > - if (last_snapshot >= BTRFS_I(inode)->generation)
> > + if (0 && last_snapshot >= BTRFS_I(inode)->generation)
>
> That's not very flexible, how are we supposed to test that in the
> meantime? Editing sources is not the preferred way.


Well since I'm the only one currently working on fixing it I'm not 
worried about it.  If anybody else wants to fix it they can easily 
change it themselves.  It is so totally broken that I don't want it 
being turned on by anybody who can't edit this and change it 
themselves.  Thanks,


Josef


[PATCH] btrfs: send: replace check with an assert in gen_unique_name

2014-02-03 Thread David Sterba
The buffer passed to snprintf can hold the fully expanded format string,
64 = 3x largest ULL + 3x char + trailing null.  I don't think that removing the
check entirely is a good idea, hence the ASSERT.

Signed-off-by: David Sterba 
---


 fs/btrfs/send.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 730dce395858..f65355dfc882 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -1408,11 +1408,7 @@ static int gen_unique_name(struct send_ctx *sctx,
while (1) {
len = snprintf(tmp, sizeof(tmp), "o%llu-%llu-%llu",
ino, gen, idx);
-   if (len >= sizeof(tmp)) {
-   /* should really not happen */
-   ret = -EOVERFLOW;
-   goto out;
-   }
+   ASSERT(len < sizeof(tmp));
 
di = btrfs_lookup_dir_item(NULL, sctx->send_root,
path, BTRFS_FIRST_FREE_OBJECTID,
-- 
1.7.9
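
The buffer arithmetic from the changelog can be double-checked from
userspace. This is only a sketch of the calculation, not kernel code: it
formats the largest possible values with the same "o%llu-%llu-%llu" layout
and counts the characters. The variable names are illustrative.

```shell
#!/bin/sh
# Sketch only: verify "64 = 3x largest ULL + 3x char + trailing null".
max_ull=18446744073709551615   # 2^64 - 1, the largest unsigned long long

# Same layout as the format string in gen_unique_name().
name=$(printf 'o%s-%s-%s' "$max_ull" "$max_ull" "$max_ull")

name_len=${#name}              # what snprintf() would return
buf_needed=$((name_len + 1))   # plus the trailing null byte

echo "longest name: $name_len chars, buffer needed: $buf_needed bytes"
```

This prints 63 and 64, so with a 64-byte tmp the snprintf return value is
always below sizeof(tmp).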



[PATCH RESEND] xfstests: add test for btrfs incremental send data corruption

2014-02-03 Thread Filipe David Borba Manana
Btrfs incremental send had an issue where it would detect a non-existent
file hole and then overwrite the file section that hole covers with zeroes,
overriding file data that it shouldn't.

The respective btrfs kernel patch that fixed this issue is titled:

   Btrfs: fix send file hole detection leading to data corruption
   (https://patchwork.kernel.org/patch/3544831/)

Signed-off-by: Filipe David Borba Manana 
Reviewed-by: Josef Bacik 
---

This is a patch resend, without any changes to the test, since Dave Chinner
said in his last e-mail to resend any patches that he might have missed in the
last patch merge party.

 tests/btrfs/034 |  101 +++
 tests/btrfs/034.out |6 +++
 tests/btrfs/group   |1 +
 3 files changed, 108 insertions(+)
 create mode 100755 tests/btrfs/034
 create mode 100644 tests/btrfs/034.out

diff --git a/tests/btrfs/034 b/tests/btrfs/034
new file mode 100755
index 000..db792de
--- /dev/null
+++ b/tests/btrfs/034
@@ -0,0 +1,101 @@
+#! /bin/bash
+# FS QA Test No. btrfs/034
+#
+# Test for a btrfs incremental send data corruption issue due to
+# bad detection of file holes.
+#
+#---
+# Copyright (c) 2014 Filipe Manana.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=`mktemp -d`
+
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# Create a file such that its file extent items span at least 3 btree leaves.
+# This is necessary to trigger a btrfs incremental send bug where file hole
+# detection was not correct, leading to data corruption by overwriting the
+# latest data regions of a file with zeroes.
+
+run_check $XFS_IO_PROG -f -c "truncate 104857600" $SCRATCH_MNT/foo
+
+for ((i = 0; i < 940; i++))
+do
+   OFFSET=$((32768 + i * 8192))
+   LEN=$((OFFSET + 8192))
+   run_check $XFS_IO_PROG -c "falloc -k $OFFSET $LEN" $SCRATCH_MNT/foo
+   run_check $XFS_IO_PROG -c "pwrite -S 0xf0 $OFFSET 4096" $SCRATCH_MNT/foo
+done
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+$SCRATCH_MNT/mysnap1
+
+run_check $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
+run_check $XFS_IO_PROG -c "truncate 3882008" $SCRATCH_MNT/foo
+
+run_check $BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+$SCRATCH_MNT/mysnap2
+
+run_check $BTRFS_UTIL_PROG send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
+run_check $BTRFS_UTIL_PROG send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
+-f $tmp/2.snap
+
+md5sum $SCRATCH_MNT/foo | _filter_scratch
+md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch
+md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/1.snap
+md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch
+
+run_check $BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $tmp/2.snap
+md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+
+_scratch_unmount
+_check_btrfs_filesystem $SCRATCH_DEV
+
+status=0
+exit
diff --git a/tests/btrfs/034.out b/tests/btrfs/034.out
new file mode 100644
index 000..808e6b4
--- /dev/null
+++ b/tests/btrfs/034.out
@@ -0,0 +1,6 @@
+QA output created by 034
+9023ed93111c422d82e9cd54043a6fb0  SCRATCH_MNT/foo
+8e58ce8749d203f29f6b8f6990da722f  SCRATCH_MNT/mysnap1/foo
+9023ed93111c422d82e9cd54043a6fb0  SCRATCH_MNT/mysnap2/foo
+8e58ce8749d203f29f6b8f6990da722f  SCRATCH_MNT/mysnap1/foo
+9023ed93111c422d82e9cd54043a6fb0  SCRATCH_MNT/mysnap2/foo
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b29236c..f9f062f 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -36,3 +36,4 @@
 031 auto quick
 032 auto quick
 033 auto quick
+034 auto quick
-- 
1.7.9.5
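
To see where the second truncate lands relative to the written regions,
the offset arithmetic of the loop can be redone outside xfstests. This is
only a sketch: the constants are taken from the test above, while the
function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: redo the test's offset arithmetic.

# Start offset of the i-th 4096-byte pwrite in the loop.
write_offset() {
    echo $((32768 + $1 * 8192))
}

# Number of the 940 writes that end at or below a given file size.
writes_below() {
    size=$1
    i=0
    count=0
    while [ "$i" -lt 940 ]; do
        end=$((32768 + i * 8192 + 4096))
        if [ "$end" -le "$size" ]; then
            count=$((count + 1))
        fi
        i=$((i + 1))
    done
    echo "$count"
}

echo "last write starts at $(write_offset 939)"
echo "writes kept after truncate: $(writes_below 3882008)"
```

The last write starts at 7725056, and 470 of the 940 written regions lie
entirely below the 3882008-byte truncate point, so the truncate cuts the
file roughly in half between two written regions.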


btrfs raid5 unmountable

2014-02-03 Thread Tetja Rediske
Hi,

Since Freenode is doomed today, I'll ask here directly.

The following filesystem:

Label: 'data'  uuid: 3a6fd6d7-5943-4cad-b56f-2e6dcabff453
    Total devices 6 FS bytes used 7.02TiB
    devid    1 size 1.82TiB used 1.82TiB path /dev/sda3
    devid    2 size 2.73TiB used 2.48TiB path /dev/sdc3
    devid    3 size 931.38GiB used 931.38GiB path /dev/sdd3
    devid    5 size 931.51GiB used 931.51GiB path /dev/sde1
    devid    6 size 931.51GiB used 931.51GiB path /dev/sdf1
    devid    7 size 2.73TiB used 2.48TiB path /dev/sdb3

Btrfs v3.12-dirty

If I try to mount it, dmesg shows:

[30644.681210] parent transid verify failed on 32059176910848 wanted 259627 found 259431
[30644.681307] parent transid verify failed on 32059176910848 wanted 259627 found 259431
[30644.681399] btrfs bad tree block start 0 32059176910848
[30644.681407] Failed to read block groups: -5
[30644.776879] btrfs: open_ctree failed

btrfs check aborts with the following (preceded by many lines like the
first ones):

[...]
Ignoring transid failure
parent transid verify failed on 32059196616704 wanted 259627 found 259432
parent transid verify failed on 32059196616704 wanted 259627 found 259432
Check tree block failed, want=32059196616704, have=32059196747776

parent transid verify failed on 32059196616704 wanted 259627 found 259432
Ignoring transid failure
parent transid verify failed on 32059196616704 wanted 259627 found 259432
Ignoring transid failure
parent transid verify failed on 32059177230336 wanted 259627 found 259431
Ignoring transid failure
parent transid verify failed on 32059196620800 wanted 259627 found 259432
parent transid verify failed on 32059196620800 wanted 259627 found 259432
Check tree block failed, want=32059196620800, have=1983699371120445514
Check tree block failed, want=32059196620800, have=1983699371120445514
Check tree block failed, want=32059196620800, have=1983699371120445514
read block failed check_tree_block
btrfs: cmds-check.c:2212: check_owner_ref: Assertion `!(rec->is_root)' failed.
Aborted

What happened before:

One disk was faulty, so I added a new one and removed the old one,
followed by a balance.

So far so good.

Some days after this I accidentally pulled the SATA power connector from
another drive, without noticing it at first. I worked on the system for
about an hour, building a new kernel on another filesystem. After
rebooting with the new kernel, the FS was not mountable. I noticed the
"missing" disk and reattached the power.

So far I tried:

mount -o recovery
btrfs check
(after google) btrfs-zero-log

Sadly no luck. However, I can get my files with btrfs restore. The
filesystem contains mainly media files, so it is not so bad if they
were lost, but restoring them from backups and sources will take
at least about a week. (Most of the files are mirrored on a private
server, but even with 100MBit this takes a lot of time ; )

Any idea how to recover this FS?

Kind Regards
Tetja Rediske
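
To get an overview of how far behind the affected blocks are, the
"parent transid verify failed" messages can be boiled down to one line per
block. This is only a sketch: the function name is made up, and it expects
one message per line, so mail-wrapped log lines need to be joined first.
It prints the block address and the difference between the wanted and the
found generation.

```shell
#!/bin/sh
# Sketch only: summarise "parent transid verify failed" messages.
# Reads log text on stdin and prints "<block> <generations behind>",
# once per distinct block/generation combination.
transid_lag() {
    sed -n 's/.*parent transid verify failed on \([0-9][0-9]*\) wanted \([0-9][0-9]*\) found *\([0-9][0-9]*\).*/\1 \2 \3/p' |
        sort -u |
        awk '{ print $1, $2 - $3 }'
}

# Usage, e.g.:
#   dmesg | transid_lag
```

For the messages quoted above (wanted 259627, found 259431 or 259432) this
works out to blocks that are 196 and 195 generations behind.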






[PATCH] Btrfs: add regression test for running snapshot and send concurrently

2014-02-03 Thread Wang Shilong
From: Wang Shilong 

Btrfs send would fail if a snapshot was taken concurrently; this test makes
sure we have fixed the bug.

Signed-off-by: Wang Shilong 
---
 tests/btrfs/034 | 75 +
 tests/btrfs/034.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 78 insertions(+)
 create mode 100644 tests/btrfs/034
 create mode 100644 tests/btrfs/034.out

diff --git a/tests/btrfs/034 b/tests/btrfs/034
new file mode 100644
index 000..e27e3cf
--- /dev/null
+++ b/tests/btrfs/034
@@ -0,0 +1,75 @@
+#!/bin/bash
+# FS QA Test No. btrfs/034
+#
+# Regression test for running snapshots and send concurrently.
+#
+#---
+# Copyright (c) 2014 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_scratch_mkfs > /dev/null 2>&1
+_scratch_mount
+
+
+touch $SCRATCH_MNT/foo
+
+# get file with fragments by using backwards writes.
+for i in `seq 10240 -1 1`; do
+   $XFS_IO_PROG -f -d -c "pwrite $(($i * 4096)) 4096" \
+   $SCRATCH_MNT/foo > /dev/null | _filter_xfs_io
+done
+
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/snap_1 >> $seqres.full 2>&1
+
+$BTRFS_UTIL_PROG send -f $SCRATCH_MNT/send_file \
+   $SCRATCH_MNT/snap_1 >> $seqres.full 2>&1 &
+
+pid=$!
+
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/snap_1 \
+   $SCRATCH_MNT/snap_2 >> $seqres.full 2>&1
+
+wait $pid || echo "Failed to send, see dmesg"
+
+echo "Silence is golden"
+status=0 ; exit
diff --git a/tests/btrfs/034.out b/tests/btrfs/034.out
new file mode 100644
index 000..4c8873c
--- /dev/null
+++ b/tests/btrfs/034.out
@@ -0,0 +1,2 @@
+QA output created by 034
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index b29236c..f9f062f 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -36,3 +36,4 @@
 031 auto quick
 032 auto quick
 033 auto quick
+034 auto quick
-- 
1.8.4



Re: [PATCH] Btrfs: disable snapshot aware defrag for now

2014-02-03 Thread David Sterba
On Wed, Jan 29, 2014 at 04:05:30PM -0500, Josef Bacik wrote:
> It's just broken and it's taking a lot of effort to fix it, so for now just
> disable it so people can defrag in peace.  Thanks,
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3b65987..8c0bc31 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2628,7 +2628,7 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
>   EXTENT_DEFRAG, 1, cached_state);
>   if (ret) {
>   u64 last_snapshot = btrfs_root_last_snapshot(&root->root_item);
> - if (last_snapshot >= BTRFS_I(inode)->generation)
> + if (0 && last_snapshot >= BTRFS_I(inode)->generation)

That's not very flexible, how are we supposed to test that in the
meantime? Editing sources is not the preferred way.

I was thinking about adding a config option that would cover any
experimental/broken features, this one being the first, as we currently
have no other way to disable it. I'd rather avoid adding a temporary
mount option.


Re: Receive on same subvolume

2014-02-03 Thread Felix Blanke
Hi Matthew,

I'm not sure what you are trying to achieve. Couldn't you simply take
another snapshot of the subvolume? To be honest, I don't understand why
you want to use send/receive on the same subvolume.

Regards,
Felix

On Wed, Jan 29, 2014 at 10:26 PM, Matthew Lai  wrote:
> Hello,
>
> Is this supposed to work? (/data is the root volume, /data/a is a subvolume)
>
> btrfs subvolume snapshot /data/a /data/b
> # make some changes in b
> btrfs send -p /data/a /data/b > delta
> btrfs receive /data/a < delta
>
> I'm getting "ERROR: could not find parent subvolume" on receive.
>
> What I'm trying to do is to back up using send/receive, but I don't have 50%
> free space, and (please correct me if I'm wrong) since receive doesn't do
> deduplication, I want to use snapshot to do the initial bootstrapping,
> instead of send/receive without a parent.
>
> Thanks!
> Matthew