Re: [PATCH 2/4 v3] fiemap: add EXTENT_DATA_COMPRESSED flag

2014-07-17 Thread Andreas Dilger
David,
any progress on this patch series?

I never saw an updated version of this patch series after the last round of
reviews, but it would be great to move it forward.  I have filefrag patches
in my e2fsprogs tree waiting for an updated version of your patch.

I recall the main changes were:
- add FIEMAP_EXTENT_PHYS_LENGTH flag to indicate if fe_phys_length was valid
- rename fe_length to fe_logi_length and #define fe_length fe_logi_length
- always fill in fe_phys_length (= fe_logi_length for uncompressed files)
  and set FIEMAP_EXTENT_PHYS_LENGTH whether the extent is compressed or not
- add WARN_ONCE() in fiemap_fill_next_extent() as described below

I don't know if there was any clear statement about whether there should be
separate FIEMAP_EXTENT_PHYS_LENGTH and FIEMAP_EXTENT_DATA_COMPRESSED flags,
or if the latter should be implicit?  Probably makes sense to have separate
flags.  It should be fine to use:

#define FIEMAP_EXTENT_PHYS_LENGTH   0x0010

since this flag was never used.

Cheers, Andreas

On Dec 12, 2013, at 5:02 PM, Andreas Dilger adil...@dilger.ca wrote:
 On Dec 12, 2013, at 4:24 PM, Dave Chinner da...@fromorbit.com wrote:
 On Thu, Dec 12, 2013 at 04:25:59PM +0100, David Sterba wrote:
 This flag was not accepted when fiemap was proposed [2] due to lack of
 in-kernel users. Btrfs has compression for a long time and we'd like to
 see that an extent is compressed in the output of 'filefrag' utility
 once it's taught about it.
 
 For that purpose, a reserved field from fiemap_extent is used to let the
 filesystem store the physical extent length alongside when the flag is set.
 This keeps compatibility with applications that use FIEMAP.
 
 I'd prefer to just see the new physical length field always filled
 out, regardless of whether it is a compressed extent or not. In
 terms of backwards compatibility to userspace, it makes no
 difference because the value of reserved/unused fields is undefined
 by the API. Yes, the implementation zeros them, but there's nothing
 in the documentation that says reserved fields must be zero.
 Hence I think we should just set it for every extent.
 
 I'd actually thought the same thing while reading the patch, but I figured
 people would object because it implies that old kernels will return a
 physical length of 0 bytes (which might be valid) and badly-written tools
 will not work correctly on older kernels.  That said, applications _should_
 be checking the FIEMAP_EXTENT_DATA_COMPRESSED flag, and I suspect fewer
 developers will be confused going forward if fe_phys_length == fe_length.
 
 If the initial tools get it right (in particular filefrag), then hopefully
 others will get it correct also.
 
 From the point of view of the kernel API (fiemap_fill_next_extent),
 passing the physical extent size in the len parameter for normal
 extents, then passing 0 for the physical length makes absolutely
 no sense.
 
 IOWs, what you have created is a distinction between the extent's
 logical length and its physical length. For uncompressed
 extents they are equal, and they should both be passed to
 fiemap_fill_next_extent as the same value. Only extents that are
 encoded may report different values.
 Perhaps fiemap_fill_next_extent() should check and warn about
 mismatches when they differ and the relevant flags are not set...
 
 Seems reasonable to have a WARN_ONCE() in that case.  That would catch bugs
 in the filesystem code as well:
 
   WARN_ONCE(phys_len != lgcl_len &&
	     !(flags & FIEMAP_EXTENT_DATA_COMPRESSED),
	     "physical len %llu != logical length %llu without DATA_COMPRESSED\n",
	     phys_len, lgcl_len);
 
 diff --git a/include/uapi/linux/fiemap.h b/include/uapi/linux/fiemap.h
 index 93abfcd..0e32cae 100644
 --- a/include/uapi/linux/fiemap.h
 +++ b/include/uapi/linux/fiemap.h
 @@ -19,7 +19,9 @@ struct fiemap_extent {
 __u64 fe_physical; /* physical offset in bytes for the start
 * of the extent from the beginning of the disk */
 __u64 fe_length;   /* length in bytes for this extent */
 -   __u64 fe_reserved64[2];
 +   __u64 fe_phys_length; /* physical length in bytes, undefined if
 +  * DATA_COMPRESSED not set */
 +   __u64 fe_reserved64;
 __u32 fe_flags;/* FIEMAP_EXTENT_* flags for this extent */
 __u32 fe_reserved[3];
 };
 
 The comment for fe_length needs to change, too, because it needs to
 indicate that it is the logical extent length and that it may be
 different to the fe_phys_length depending on the flags that are set
 on the extent.
 
 Would it make sense to rename fe_length to fe_logi_length (or something,
 I'm open to suggestions), and have a compat macro:
 
 #define fe_length fe_logi_length
 
 around for older applications?  That way, new developers would start to
 use the new name, old applications would still compile for both newer and
 older interfaces, 


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] xfstests/btrfs: _devmgt_add() to check if the device is back online

2014-07-17 Thread Anand Jain
btrfs/003 removes a device as part of the test case, and after the test
completes the removed device is added back to the system. However, on
certain (slower) systems the device comes back online a bit later, so a
later sub-test within btrfs/003 fails.

This patch adds a script to wait and test whether the device is back online,
and reports the result to the full log.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 common/rc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/common/rc b/common/rc
index 2c83340..4a6511f 100644
--- a/common/rc
+++ b/common/rc
@@ -2054,6 +2054,31 @@ _devmgt_add()
tdl=`echo ${1} | cut -d: -f 2-|sed 's/:/ /g'`
 
	echo "${tdl}" > /sys/class/scsi_host/host${h}/scan || _fail "Add disk failed"
+
+	# ensure the device comes online
+	dev_back_online=0
+	for i in `seq 1 10`; do
+		if [ -d /sys/class/scsi_device/${1}/device/block ]; then
+			dev=`ls /sys/class/scsi_device/${1}/device/block`
+			for j in `seq 1 10`;
+			do
+				stat /dev/$dev > /dev/null 2>&1
+				if [ $? -eq 0 ]; then
+					dev_back_online=1
+					break
+				fi
+				sleep 1
+			done
+			break
+		else
+			sleep 1
+		fi
+	done
+	if [ $dev_back_online -eq 0 ]; then
+		echo "/dev/$dev online failed" >> $seqres.full
+	else
+		echo "/dev/$dev is back online" >> $seqres.full
+	fi
 }
 
 _require_fstrim()
-- 
2.0.0.153.g79d



Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?

2014-07-17 Thread Sebastian Ochmann

Hello,

I need to clarify, I'm _not_ sharing a drive between multiple computers 
at the _same_ time. It's a portable device which I use at different 
locations with different computers. I just wanted to give a rationale 
for mounting the whole drive to some mountpoint and then also part of 
that drive (a subvolume) to the respective computer's /home mountpoint. 
So it's controlled by the same kernel in the same computer, it's just 
that part of the filesystem is mounted at multiple mountpoints, much 
like a bind-mount, but I'm interested in mounting a subvolume of the 
already-mounted volume to some other mountpoint. Sorry for the confusion.


Best regards
Sebastian


On 17.07.2014 01:18, Chris Murphy wrote:


On Jul 16, 2014, at 4:18 PM, Sebastian Ochmann ochm...@informatik.uni-bonn.de 
wrote:


Hello,

I'm sharing a btrfs-formatted drive between multiple computers and each of the 
machines has a separate home directory on that drive.


2+ computers writing to the same block device? I don't see how this is safe. 
Seems possibly a bug that the 1st mount event isn't setting some metadata so 
that another kernel instance knows not to allow another mount.


Chris Murphy




btrfs fi df shows unknown ?

2014-07-17 Thread Swâmi Petaramesh
Hi there,

For a few days now, I have noticed that 'btrfs fi df /' displays an entry about
unknown used space, and I can see this on several Fedora machines, so it is
not an issue specific to a given system...

Does anybody know what this unknown data is?

i.e:

# btrfs fi df /
Data, single: total=106.00GiB, used=88.28GiB
System, DUP: total=32.00MiB, used=24.00KiB
Metadata, DUP: total=1.00GiB, used=520.36MiB
unknown, single: total=176.00MiB, used=0.00

# btrfs --version
Btrfs v3.14.2

# uname -r
3.15.5-200.fc20.x86_64

TIA, kind regards.

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E



Re: [RFC v2 0/2] vfs / btrfs: add support for ustat()

2014-07-17 Thread Christoph Hellwig
On Wed, Jul 16, 2014 at 02:37:56PM -0700, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 This makes the implementation simpler by stuffing the struct on
 the driver and just letting the driver insert it into and remove it
 from the sb list. This avoids the kzalloc() completely.

Again, NAK.  Make btrfs report the proper anon dev_t in stat and
everything will just work.



[PATCH v3] Btrfs: fix abnormal long waiting in fsync

2014-07-17 Thread Liu Bo
xfstests generic/127 detected this problem.

With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, fsync now only flushes
data within the passed range.  This is the cause of the above problem:
btrfs's fsync has a stage called 'sync log' which waits for all the
ordered extents it has recorded to finish.

In xfstests/generic/127, with mixed operations such as truncate, fallocate,
punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will
mmap and then msync.  I found that msync waits for quite a long time
(about 20s in my case); thanks to ftrace, it turns out that the previous
fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the
range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants,
some ordered extents can be created without their corresponding pages getting
flushed. They are then left in memory until we fsync, which runs into the
'sync log' stage, and fsync simply waits for the system writeback thread
to flush those pages and finish the ordered extents, so the latency is
inevitable.

This adds a flush similar to btrfs_start_ordered_extent() in
btrfs_wait_logged_extents() to fix that.

Reviewed-by: Miao Xie mi...@cn.fujitsu.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
v3:
   Add a check for IO_DONE flag to avoid unnecessary flush.
v2: 
   Move flush part into btrfs_wait_logged_extents() to get the flush range
   more precise.

 fs/btrfs/ordered-data.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index e12441c..7187b14 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -484,8 +484,19 @@ void btrfs_wait_logged_extents(struct btrfs_root *log, u64 transid)
 					   log_list);
 		list_del_init(&ordered->log_list);
 		spin_unlock_irq(&log->log_extents_lock[index]);
+
+		if (!test_bit(BTRFS_ORDERED_IO_DONE, &ordered->flags) &&
+		    !test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) {
+			struct inode *inode = ordered->inode;
+			u64 start = ordered->file_offset;
+			u64 end = ordered->file_offset + ordered->len - 1;
+
+			WARN_ON(!inode);
+			filemap_fdatawrite_range(inode->i_mapping, start, end);
+		}
 		wait_event(ordered->wait, test_bit(BTRFS_ORDERED_IO_DONE,
 						   &ordered->flags));
+
 		btrfs_put_ordered_extent(ordered);
 		spin_lock_irq(&log->log_extents_lock[index]);
 	}
-- 
1.8.1.4



Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 12:18:37AM +0200, Sebastian Ochmann wrote:
 I'm sharing a btrfs-formatted drive between multiple computers and each of
 the machines has a separate home directory on that drive. The root of the
 drive is mounted at /mnt/tray and the home directory for machine {hostname}
 is under /mnt/tray/Homes/{hostname}. Up until now, I have mounted /mnt/tray
 like a normal volume and then did an additional bind-mount of
 /mnt/tray/Homes/{hostname} to /home.

   You've said you're not sharing it concurrently, which is good -- as
long as you've only got one machine accessing it at the same time,
you're fine there.

 Now I have a new drive and wanted to do things a bit more advanced by
 creating subvolumes for each of the machines' home directories so that I can
 also do independent snapshotting. I guess I could use the bind-mount method
 like before but my question is if it is considered safe to do an additional,
 regular mount of one of the subvolumes to /home instead, like
 
 mount /dev/sdxN /mnt/tray
 mount -o subvol=/Homes/{hostname} /dev/sdxN /home
 
 When I experimented with such additional mounts of subvolumes of
 already-mounted volumes, I noticed that the mount options of the additional
 subvolume mount might differ from the original mount. For instance, the
 root volume might be mounted with noatime while the subvolume mount may
 have relatime.
 
 So my questions are: Is mounting a subvolume of an already mounted volume
 considered safe

   Yes, absolutely:

hrm@amelia:~$ mount | grep btrfs
/dev/sda2 on /boot type btrfs (rw,noatime,space_cache)
/dev/sda2 on /home type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/video type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/pipeline type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/snarf type btrfs (rw,noatime,space_cache)
/dev/sda2 on /media/audio type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/home type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/video type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/testing type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/pipeline type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/audio type btrfs (rw,noatime,space_cache)
/dev/sda2 on /srv/nfs/nadja type btrfs (rw,noatime,space_cache)

 and are there any combinations of possibly conflicting mount
 options one should be aware of (compression, autodefrag, cache clearing)? Is
 it advisable to use the same mount options for all mounts pointing to the
 same physical device?

   If you assume that the first mount options are the ones used for
everything, regardless of any different options provided in subsequent
mounts, then you probably won't go far wrong. It's not quite true:
some options do work on a per-mount basis, but most are
per-filesystem. I'm sure there was a list of them on the wiki at some
point, but I can't seem to track it down right now.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  




Re: btrfs fi df shows unknown ?

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 10:02:01AM +0200, Swâmi Petaramesh wrote:
 Hi there,
 
 For a few days now, I have noticed that 'btrfs fi df /' displays an entry about
 unknown used space, and I can see this on several Fedora machines, so it is
 not an issue specific to a given system...
 
 Does anybody know what this unknown data is?

   It's the block reserve, which used to be part of metadata, but is
now split out to its own type. An updated userspace should be able to
show it properly.

   Hugo.

 i.e:
 
 # btrfs fi df /
 Data, single: total=106.00GiB, used=88.28GiB
 System, DUP: total=32.00MiB, used=24.00KiB
 Metadata, DUP: total=1.00GiB, used=520.36MiB
 unknown, single: total=176.00MiB, used=0.00
 
 # btrfs --version
 Btrfs v3.14.2
 
 # uname -r
 3.15.5-200.fc20.x86_64
 
 TIA, kind regards.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Try everything once,  except incest and folk-dancing. ---  




Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?

2014-07-17 Thread Qu Wenruo


-------- Original Message --------
Subject: Re: Is it safe to mount subvolumes of already-mounted volumes 
(even with different options)?

From: Sebastian Ochmann ochm...@informatik.uni-bonn.de
To: Chris Murphy li...@colorremedies.com, zhe.zhang.resea...@gmail.com
Date: 2014-07-17 15:58

Hello,

I need to clarify, I'm _not_ sharing a drive between multiple 
computers at the _same_ time. It's a portable device which I use at 
different locations with different computers. I just wanted to give a 
rationale for mounting the whole drive to some mountpoint and then 
also part of that drive (a subvolume) to the respective computer's 
/home mountpoint. So it's controlled by the same kernel in the same 
computer, it's just that part of the filesystem is mounted at multiple 
mountpoints, much like a bind-mount, but I'm interested in mounting a 
subvolume of the already-mounted volume to some other mountpoint. 
Sorry for the confusion.


Best regards
Sebastian

If you mean something like the following use case:
# mount /dev/sdb1 -o subvolid=257 /home
# mount /dev/sdb1 -o subvolid=5 /some/other/place

that is completely OK.

But when it comes to different mount options, especially different ro/rw 
options: although it works in 3.16-rc*, the ro/rw mount option is still 
under discussion, and the current rc implementation will cause a kernel 
warning when mounting a subvolume rw after it was first mounted ro.


So in short:
1) Mounting subvolumes when the btrfs fs is already mounted:
completely OK.

2) Different mount options for different subvolumes in one btrfs fs:
for most mount options, including ro/rw, no.

Thanks,
Qu



On 17.07.2014 01:18, Chris Murphy wrote:


On Jul 16, 2014, at 4:18 PM, Sebastian Ochmann 
ochm...@informatik.uni-bonn.de wrote:



Hello,

I'm sharing a btrfs-formatted drive between multiple computers and 
each of the machines has a separate home directory on that drive.


2+ computers writing to the same block device? I don't see how this 
is safe. Seems possibly a bug that the 1st mount event isn't setting 
some metadata so that another kernel instance knows not to allow 
another mount.



Chris Murphy




[PATCH] btrfs-progs: fix xfstest btrfs/023 random failure

2014-07-17 Thread Anand Jain
xfstest btrfs/023 which does the following tests

create_group_profile raid0
check_group_profile RAID0

create_group_profile raid1
check_group_profile RAID1

create_group_profile raid10
check_group_profile RAID10

create_group_profile raid5
check_group_profile RAID5

create_group_profile raid6
check_group_profile RAID6

fails randomly with the error below:

 ERROR: device scan failed '/dev/sde' - Invalid argument

Since the failure occurs at a random group profile, it indicates to me that
the btrfs kernel did not see the newly created btrfs on the device.

To note: I have the following patches on the kernel which are
not yet integrated, but they are not related to this bug:

btrfs: RFC: code optimize use btrfs_get_bdev_and_sb() at btrfs_scan_one_device
btrfs: looping 'mkfs.btrfs -f dev' may fail with EBUSY
btrfs: check generation as replace duplicates devid+uuid

This patch calls fsync() at btrfs_prepare_device().

With this, btrfs/023 has not failed in several long
iterations.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 utils.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/utils.c b/utils.c
index fbc5bde..e144dfd 100644
--- a/utils.c
+++ b/utils.c
@@ -741,6 +741,8 @@ int btrfs_prepare_device(int fd, char *file, int zero_end, u64 *block_count_ret,
 	}
 	*block_count_ret = block_count;
 
+	fsync(fd);
+
 zero_dev_error:
 	if (ret < 0) {
 		fprintf(stderr, "ERROR: failed to zero device '%s' - %s\n",
-- 
2.0.0.153.g79d



[PATCH] Btrfs: fix wrong manpage of defrag command

2014-07-17 Thread Liu Bo
'btrfs filesystem defrag' has an option '-t', whose manpage says

Any extent bigger than threshold given by -t option, will be
considered already defragged. Use 0 to take the kernel default, and
use 1 to say every single extent must be rewritten.

Here 'use 0' still works; it refers to the default value (256K). However,
'use 1' is an obvious typo: it should be -1, which means the largest
possible value.

Right now we use parse_size(), which no longer allows the value '-1', so in
order to keep the manpage correct, this updates it to keep only the value '0'.

If you want to make sure every single extent is rewritten, use a fairly
large size, say 1G.

Reported-by: Sebastian Ochmann ochm...@informatik.uni-bonn.de
Signed-off-by: Liu Bo bo.li@oracle.com
---
 Documentation/btrfs-filesystem.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/btrfs-filesystem.txt 
b/Documentation/btrfs-filesystem.txt
index 0ee79cb..c9c0b00 100644
--- a/Documentation/btrfs-filesystem.txt
+++ b/Documentation/btrfs-filesystem.txt
@@ -41,8 +41,7 @@ The start position and the number of bytes to defragment can 
be specified by
 start and len using '-s' and '-l' options below.
 Any extent bigger than threshold given by '-t' option, will be considered
 already defragged.
-Use 0 to take the kernel default, and use 1 to
-say every single extent must be rewritten.
+Use 0 to take the kernel default.
 You can also turn on compression in defragment operations.
 +
 `Options`
-- 
1.8.1.4



Re: NFS FILE ID not unique when exporting many brtfs subvolumes

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 10:40:14AM +, philippe.simo...@swisscom.com wrote:
 I have a problem using btrfs/nfs to store my vmware images.
[snip]
 - vmware is basing its NFS files locks on the nfs fileid field returned from 
 a NFS GETATTR request for the file being locked
   
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909
vmware assumes that these nfs fileid are unique per storage.
 
 - it seemed that these nfs fileids are only unique 'per-subvolume', but 
 because my nfs export contains many subvolumes, the nfs export ends up with 
 files (in different subvolumes) sharing the same nfs fileid.
 
 - no problem when I start each machine alone, but when 2 machines are running 
 at the same time, vmware seems to mix up its lock file references and 
 sometimes kills one vm.
 
   in esx server, following messages : /var/log/vmkwarning.log : 
 
   2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315: Inode 
 (Dup: 260 Orig: 260) has been recycled by server, freeing lock info for 
 .lck-0401
   2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348: Unable to 
 remove lockfile .invalid, not found
   2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348: Unable to 
 remove lockfile .invalid, not found
 
   and in machine log : 
   Message from sncubeesx02: The lock protecting vm-w7-sysp.vmdk 
 has been lost, 
   possibly due to underlying storage issues. If this virtual 
 machine is configured to be highly 
   available, ensure that the virtual machine is running on some 
 other host before clicking OK. 
   
 - vmware tries to make its own file locking for the following file types: 
   
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051
 
   VMNAME.vswp 
   DISKNAME-flat.vmdk 
   DISKNAME-ITERATION-delta.vmdk 
   VMNAME.vmx 
   VMNAME.vmxf 
   vmware.log
 
 Is there a way to deal with this problem? Is it a bug? 

   Add an arbitrary and unique fsid=0x12345 value to the exports
declaration. For example, my server exports a number of subvolumes
from the same FS with:

/srv/nfs/nadja    -rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
                  10.0.0.20 fe80::20
/srv/nfs/home     -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
                  fe80::/64
/srv/nfs/video    -ro,async,fsid=0x1731,no_subtree_check \
                  10.0.0.0/24 fe80::/64

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You can get more with a kind word and a two-by-four than you ---   
   can with just a kind word.




Re: [PATCH v3] Btrfs: fix abnormal long waiting in fsync

2014-07-17 Thread Chris Mason
On 07/17/2014 04:08 AM, Liu Bo wrote:
 xfstests generic/127 detected this problem.
 
 With commit 7fc34a62ca4434a79c68e23e70ed26111b7a4cf8, now fsync will only 
 flush
 data within the passed range.  This is the cause of the above problem:
 btrfs's fsync has a stage called 'sync log' which waits for all the
 ordered extents it has recorded to finish.
 
 In xfstests/generic/127, with mixed operations such as truncate, fallocate,
 punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will
 mmap, and then msync.  And I find that msync will wait for quite a long time
 (about 20s in my case), thanks to ftrace, it turns out that the previous
 fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the
 range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants,
 there can be some ordered extents created but not getting corresponding pages
 flushed, then they're left in memory until we fsync which runs into the
 stage 'sync log', and fsync will just wait for the system writeback thread
 to flush those pages and get ordered extents finished, so the latency is
 inevitable.
 
 This adds a flush similar to btrfs_start_ordered_extent() in
 btrfs_wait_logged_extents() to fix that.

I was able to trigger the stalls with plain fsx as well.  Thanks!

-chris


RE: NFS FILE ID not unique when exporting many brtfs subvolumes

2014-07-17 Thread Philippe.Simonet
Hi Hugo

 -Original Message-
 From: Hugo Mills [mailto:h...@carfax.org.uk]
 Sent: Thursday, July 17, 2014 1:13 PM
 To: Simonet Philippe, INI-ON-FIT-NW-IPE
 Cc: linux-btrfs@vger.kernel.org
 Subject: Re: NFS FILE ID not unique when exporting many brtfs subvolumes
 
 On Thu, Jul 17, 2014 at 10:40:14AM +, philippe.simo...@swisscom.com
 wrote:
  I have a problem using btrfs/nfs to store my vmware images.
 [snip]
  - vmware is basing its NFS files locks on the nfs fileid field returned 
  from a NFS
 GETATTR request for the file being locked
 
   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909
 vmware assumes that these nfs fileid are unique per storage.
 
  - it seemed that these nfs fileid are only unique 'per-subvolume', but 
  because
 my nfs export contains many subvolumes,
  the nfs export has then my files (in different subvolume) with the same nfs
 fileid.
 
  - no problem when I start all machine alone, but when 2 machines are running
 at the same time, vmware seems to mix its reference to lock file and
  sometimes kills one vm.
 
  in esx server, following messages : /var/log/vmkwarning.log :
 
  2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315:
 Inode (Dup: 260 Orig: 260) has been recycled by server, freeing lock info for
 .lck-0401
  2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348:
 Unable to remove lockfile .invalid, not found
  2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348:
 Unable to remove lockfile .invalid, not found
 
  and in machine log :
  Message from sncubeesx02: The lock protecting vm-w7-
 sysp.vmdk has been lost,
  possibly due to underlying storage issues. If this virtual 
  machine
 is configured to be highly
  available, ensure that the virtual machine is running on some
 other host before clicking OK.
 
  - vmware try to make its own file locking for flowing file type :
 
   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051
 
  VMNAME.vswp
  DISKNAME-flat.vmdk
  DISKNAME-ITERATION-delta.vmdk
  VMNAME.vmx
  VMNAME.vmxf
  vmware.log
 
  Is there a way to deal with this problem ? is that a bug ?
 
Add an arbitrary and unique fsid=0x12345 value to the exports
 declaration. For example, my server exports a number of subvolumes
 from the same FS with:
 
/srv/nfs/nadja -rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
	10.0.0.20 fe80::20
/srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
	fe80::/64
/srv/nfs/video -ro,async,fsid=0x1731,no_subtree_check \
	10.0.0.0/24 fe80::/64
 
Hugo.
 

first of all, thanks for your answer!
on my system, I have one export, which is the root btrfs subvolume and
itself contains one subvolume per vm.
if I change the NFS export fsid, it does not change any of the file IDs
in the NFS export.
(I cross-checked it just to be sure, with tshark -V -nlp -t a port 2049
| egrep 'Entry: name|File ID', and indeed fsid has no impact on the
file id)
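[Editorial aside: a quicker local cross-check than capturing NFS traffic is to compare what stat(2) reports on the server. On btrfs, files in different subvolumes can carry the same inode number and differ only in the anonymous st_dev, which an NFSv3 fileid does not transmit. The sketch below uses plain directories and hypothetical names so the commands run anywhere; on a real btrfs server the two paths would be separate subvolumes.]

```shell
# Sketch: %d = st_dev, %i = st_ino. A duplicated inode number with no
# distinguishing device number is exactly the collision vmware trips over.
mkdir -p /tmp/fileid-demo/vm1 /tmp/fileid-demo/vm2
touch /tmp/fileid-demo/vm1/disk.vmdk /tmp/fileid-demo/vm2/disk.vmdk
stat -c '%d %i %n' /tmp/fileid-demo/vm1/disk.vmdk /tmp/fileid-demo/vm2/disk.vmdk
```

In a single directory tree, as here, the inode numbers differ; on a multi-subvolume btrfs export the inode numbers can repeat while only st_dev differs.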



 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- You can get more with a kind word and a two-by-four than you ---
can with just a kind word.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Blocked tasks on 3.15.1

2014-07-17 Thread Chris Mason

[ deadlocks during rsync in 3.15 with compression enabled ]

Hi everyone,

I still haven't been able to reproduce this one here, but I'm going
through a series of tests with lzo compression forced and every
operation forced to ordered.  Hopefully it'll kick it out soon.

While I'm hammering away, could you please try this patch.  If this is
the bug you're hitting, the deadlock will go away and you'll see this
printk in the log.

thanks!

-chris

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ab56df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_root_lock);
 	}
 
+	spin_lock(&root->fs_info->ordered_root_lock);
+	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
+		list_del_init(&BTRFS_I(inode)->ordered_operations);
+		printk(KERN_CRIT "racing inode deletion with ordered operations!!!\n");
+	}
+	spin_unlock(&root->fs_info->ordered_root_lock);
+
 	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 		     &BTRFS_I(inode)->runtime_flags)) {
 		btrfs_info(root->fs_info, "inode %llu still on the orphan list",


Re: NFS FILE ID not unique when exporting many btrfs subvolumes

2014-07-17 Thread Hugo Mills
On Thu, Jul 17, 2014 at 01:02:06PM +, philippe.simo...@swisscom.com wrote:
 Hi Hugo
 
  -Original Message-
  From: Hugo Mills [mailto:h...@carfax.org.uk]
  Sent: Thursday, July 17, 2014 1:13 PM
  To: Simonet Philippe, INI-ON-FIT-NW-IPE
  Cc: linux-btrfs@vger.kernel.org
  Subject: Re: NFS FILE ID not unique when exporting many btrfs subvolumes
  
  On Thu, Jul 17, 2014 at 10:40:14AM +, philippe.simo...@swisscom.com
  wrote:
   I have a problem using btrfs/nfs to store my vmware images.
  [snip]
   - vmware bases its NFS file locks on the nfs fileid field returned
  from an NFS GETATTR request for the file being locked:
  
  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007909
  vmware assumes that these nfs fileids are unique per storage.
  
   - it seems that these nfs fileids are only unique per subvolume, but
  because my nfs export contains many subvolumes, the export ends up
  containing files (in different subvolumes) with the same nfs fileid.
  
   - there is no problem when I start each machine alone, but when 2 machines
  are running at the same time, vmware seems to mix up its lock-file
  references and sometimes kills one vm.
  
  on the esx server, /var/log/vmkwarning.log shows the following messages:
  
 2014-07-17T06:31:46.854Z cpu2:268913)WARNING: NFSLock: 1315:
  Inode (Dup: 260 Orig: 260) has been recycled by server, freeing lock info 
  for
  .lck-0401
 2014-07-17T06:34:47.925Z cpu2:114740)WARNING: NFSLock: 2348:
  Unable to remove lockfile .invalid, not found
 2014-07-17T10:18:50.320Z cpu0:32824)WARNING: NFSLock: 2348:
  Unable to remove lockfile .invalid, not found
  
  and in the machine log:
  Message from sncubeesx02: The lock protecting vm-w7-sysp.vmdk has been
  lost, possibly due to underlying storage issues. If this virtual machine
  is configured to be highly available, ensure that the virtual machine is
  running on some other host before clicking OK.
  
   - vmware does its own file locking for the following file types:
  
  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051
  
 VMNAME.vswp
 DISKNAME-flat.vmdk
 DISKNAME-ITERATION-delta.vmdk
 VMNAME.vmx
 VMNAME.vmxf
 vmware.log
  
    Is there a way to deal with this problem? Is this a bug?
  
 Add an arbitrary and unique fsid=0x12345 value to the exports
  declaration. For example, my server exports a number of subvolumes
  from the same FS with:
  
  /srv/nfs/nadja -rw,async,fsid=0x1729,no_subtree_check,no_root_squash \
 	10.0.0.20 fe80::20
  /srv/nfs/home -rw,async,fsid=0x1730,no_subtree_check,no_root_squash \
 	fe80::/64
  /srv/nfs/video -ro,async,fsid=0x1731,no_subtree_check \
 	10.0.0.0/24 fe80::/64
  
 Hugo.
  

 first of all, thanks for your answer!

 on my system, I have one export, which is the root btrfs subvolume and
 itself contains one subvolume per vm.
 if I change the NFS export fsid, it does not change any of the file
 IDs in the NFS export.
 (I cross-checked it just to be sure, with tshark -V -nlp -t a port 2049
 | egrep 'Entry: name|File ID', and indeed fsid has no impact on the
 file id)

   Aaah, that's interesting. I suspect that you'll have to make the
mounts explicit, so for every subvolume exported from the server,
there's a line in fstab to mount it to the place it's exported from.
This happens as a side-effect of the recommended filesystem/subvol
layout[1] anyway, since it doesn't use nested subvolumes at all, so
I've never actually noticed the situation you mention.
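[Editorial aside: a sketch of what those explicit per-subvolume mounts could look like. Device, subvolume, and path names below are hypothetical, not from this thread; once each subvolume is its own mount exported with its own fsid, NFS treats them as distinct filesystems.]

```shell
# Sketch only -- requires an actual btrfs device; all names are hypothetical.
mount -o subvol=vm-one /dev/sdb1 /srv/nfs/vm-one
mount -o subvol=vm-two /dev/sdb1 /srv/nfs/vm-two
# /etc/fstab equivalent of the two mounts above:
#   /dev/sdb1  /srv/nfs/vm-one  btrfs  subvol=vm-one  0 0
#   /dev/sdb1  /srv/nfs/vm-two  btrfs  subvol=vm-two  0 0
# then give each mount point its own fsid= entry in /etc/exports
```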

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---   
to you about a cure for the common cold.




Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?

2014-07-17 Thread Duncan
Hugo Mills posted on Thu, 17 Jul 2014 09:41:53 +0100 as excerpted:

 and are there any combinations of possibly conflicting mount options
 one should be aware of (compression, autodefrag, cache clearing)? Is it
 advisable to use the same mount options for all mounts pointing to the
 same physical device?
 
If you assume that the first mount's options are the ones used for
 everything, regardless of any different options provided in subsequent
 mounts, then you probably won't go far wrong. It's not quite true: some
 options do work on a per-mount basis, but most are per-filesystem. I'm
 sure there was a list of them on the wiki at some point, but I can't
 seem to track it down right now.

IIRC/AFAIK, the btrfs-specific mount options should be per filesystem, 
while stuff like relatime vs noatime is VFS level and should work per 
subvolume.

There's actually a current discussion about ro vs rw.  Consider the case 
of a parent subvolume (perhaps but not necessarily the root subvolume, 
id=5), being mounted writable in one location, with a child mounted 
elsewhere read-only.  Because it's possible to browse in the parent's 
subvolume down into the child subvolume as well, and someone could write 
a file there, that write would then show up in the elsewhere mounted read-
only child subvolume as well.

That's unexpected behavior to say the least!  Normally, read-only means 
it cannot and will not change, but in this case it wouldn't mean that at 
all!

My idea is that the same rules should apply to ro/rw as apply to btrfs 
snapshots -- they stop at subvolume borders.  Any write into a child 
subvolume would thus throw an error, regardless of how the parent 
subvolume was mounted.  The only way to write into a subvolume would be 
to mount it read-write on its own.  That would solve the ambiguity, but 
it would also be quite a change from existing behavior, where a read-
write mount of the root subvolume can write into any subvolume.

Someone else suggested that we separate filesystem read-write from 
subvolume read-write.  There's already the concept of read-only 
snapshots, used in btrfs-send, for one thing.  The idea here would be 
that a read-only filesystem/root mount means the entire filesystem is 
read-only, but provided the filesystem/root was mounted read-write, 
individual subvolumes could be mounted read-only using a different 
option, subv=ro, or similar, which would be hooked into the existing read-
only subvolume mechanism.  In that case, if the filesystem/root was read-
write, then the subvolume specific rw/ro mount option would take 
precedence and would trigger an error on write to that subvolume even if 
written from the read-write parent mount.

But while btrfs is the first filesystem to do this sort of thing and thus 
to deal with the problem, it might not be the last, so policy 
coordination with the VFS layer should be considered and a generic kernel 
policy for any filesystem dealing with subvolumes should be established.  
IOW, it's bigger than simply btrfs.

So anyway, while there was a patch applied earlier that did allow 
different read-only/read-write subvolume mounts, I believe that's 
reverted for 3.16, while this discussion continues and until it gets 
resolved one way or another, possibly at a kernel conference or the like.

But I believe generic VFS stuff like noatime/relatime/atime and dev/nodev/
suid/nosuid/exec/noexec is fine per-subvolume, because that's enforced at 
the VFS layer, and there's no internal inconsistency or violated 
expectation if, for example, the same device-file is accessible as a 
device via one mountpoint but not via another.
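[Editorial aside: a concrete sketch of that split; device and subvolume names are hypothetical, and which btrfs options behave per-filesystem is as described in this thread, not exhaustively verified here.]

```shell
# Sketch only -- requires a btrfs device; names are hypothetical.
# VFS-level flags may differ between mount points of the same filesystem:
mount -o subvol=home,noatime   /dev/sda2 /home
mount -o subvol=media,relatime /dev/sda2 /media
# btrfs-specific options such as compress=lzo are effectively
# per-filesystem: whichever value the first mount established applies
# to every mount point of that filesystem.
```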

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [RFC v2 0/2] vfs / btrfs: add support for ustat()

2014-07-17 Thread Luis R. Rodriguez
On Thu, Jul 17, 2014 at 01:03:01AM -0700, Christoph Hellwig wrote:
 On Wed, Jul 16, 2014 at 02:37:56PM -0700, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
  
  This makes the implementation simpler by embedding the struct in
  the driver and just letting the driver insert it into and remove it
  from the sb list. This avoids the kzalloc() completely.
 
 Again, NAK.  Make btrfs report the proper anon dev_t in stat and
 everything will just work.

Let's consider this userspace case:

struct stat buf;
struct ustat ubuf;

/* Find a valid device number */
if (stat("/", &buf)) {
	fprintf(stderr, "Stat failed: %s\n", strerror(errno));
	return 1;
}

/* Call ustat on it */
if (ustat(buf.st_dev, &ubuf)) {
	fprintf(stderr, "Ustat failed: %s\n", strerror(errno));
	return 1;
}

In the btrfs case there is an inode op for getattr; it is used and we set
the dev to an anonymous dev_t. Later, ustat() calls user_get_super(), which
can only find the superblock if the superblock's single dev_t matches the
one passed in. Since btrfs maps many anonymous dev_t values to one
superblock, the search cannot complete and ustat() fails with -EINVAL. The
series expands the number of dev_t's that a super block can have and allows
this search to complete.

  Luis


Re: [PATCH 1/1] Btrfs: fix sparse warning

2014-07-17 Thread Zach Brown
   @@ -515,7 +515,8 @@ static int write_buf(struct file *filp, const void 
   *buf,
   u32 len, loff_t *off)
 
  Though this probably wants to be rewritten in terms of kernel_write().
  That'd give an opportunity to get rid of the sctx-send_off and have it
  use f_pos in the filp.
 
 Do you mean directly call kernel_write from send_cmd/send_header ?
 I guess that loop around vfs_write in write_buf is there for something ...

write_buf() could still exist to iterate over the buffer in the case of
partial writes but it doesn't need to muck around with set_fs() and
forcing casts.

- z


[PATCH 5/5] btrfs: correctly handle return from ulist_add

2014-07-17 Thread Mark Fasheh
ulist_add() can return '1' on success, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/qgroup.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 2ec2432..b55870c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1976,6 +1976,7 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
 	struct btrfs_qgroup_list *glist;
 	struct ulist *parents;
 	int ret = 0;
+	int err;
 	struct btrfs_qgroup *qg;
 	u64 root_obj = 0;
 	struct seq_list elem = {};
@@ -2030,10 +2031,12 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
 	 * while adding parents of the parents to our ulist.
 	 */
 	list_for_each_entry(glist, &qg->groups, next_group) {
-		ret = ulist_add(parents, glist->group->qgroupid,
+		err = ulist_add(parents, glist->group->qgroupid,
 				ptr_to_u64(glist->group), GFP_ATOMIC);
-		if (ret < 0)
+		if (err < 0) {
+			ret = err;
 			goto out_unlock;
+		}
 	}
 
 	ULIST_ITER_INIT(&uiter);
@@ -2045,10 +2048,12 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
 
 		/* Add any parents of the parents */
 		list_for_each_entry(glist, &qg->groups, next_group) {
-			ret = ulist_add(parents, glist->group->qgroupid,
+			err = ulist_add(parents, glist->group->qgroupid,
 					ptr_to_u64(glist->group), GFP_ATOMIC);
-			if (ret < 0)
+			if (err < 0) {
+				ret = err;
 				goto out_unlock;
+			}
 		}
 	}
 
-- 
1.8.4.5



[PATCH 4/5] btrfs: delete qgroup items in drop_snapshot

2014-07-17 Thread Mark Fasheh
btrfs_drop_snapshot() leaves subvolume qgroup items on disk after
completion. This wastes space and also can cause problems with snapshot
creation. If a new snapshot tries to claim the deleted subvolume's id,
btrfs will get -EEXIST from add_qgroup_item() and go read-only.

We can partially fix this by catching -EEXIST in add_qgroup_item() and
initializing the existing items. This would leave orphaned relation items
(BTRFS_QGROUP_RELATION_KEY) around, however, which would be confusing to
the end user, and it does nothing to reclaim the space taken up by
orphaned qgroup items.

So the full fix is to delete all qgroup items related to the deleted
snapshot in btrfs_drop_snapshot.  If an item persists (either due to a
previous drop_snapshot without the fix, or some error) we can still continue
with snapshot create instead of throwing the whole filesystem readonly.

In the very small chance that some relation items persist, they will not
affect functioning of our level 0 subvolume qgroup.

Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/extent-tree.c |   6 +++
 fs/btrfs/qgroup.c  | 114 +++--
 fs/btrfs/qgroup.h  |   3 ++
 3 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed9e13c..2dad701 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8296,6 +8296,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 	if (err)
 		goto out_end_trans;
 
+	ret = btrfs_del_qgroup_items(trans, root);
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto out_end_trans;
+	}
+
 	ret = btrfs_del_root(trans, tree_root, &root->root_key);
 	if (ret) {
 		btrfs_abort_transaction(trans, tree_root, ret);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1569338..2ec2432 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -35,7 +35,6 @@
 #include "qgroup.h"
 
 /* TODO XXX FIXME
- *  - subvol delete -> delete when ref goes to 0? delete limits also?
  *  - reorganize keys
  *  - compressed
  *  - sync
@@ -99,6 +98,16 @@ struct btrfs_qgroup_list {
struct btrfs_qgroup *member;
 };
 
+/*
+ * used in remove_qgroup_relations() to track qgroup relations that
+ * need deleting
+ */
+struct relation_rec {
+   struct list_head list;
+   u64 src;
+   u64 dst;
+};
+
 #define ptr_to_u64(x) ((u64)(uintptr_t)x)
 #define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x)
 
@@ -551,9 +560,15 @@ static int add_qgroup_item(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_QGROUP_INFO_KEY;
 	key.offset = qgroupid;
 
+	/*
+	 * Avoid a transaction abort by catching -EEXIST here. In that
+	 * case, we proceed by re-initializing the existing structure
+	 * on disk.
+	 */
+
 	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
 				      sizeof(*qgroup_info));
-	if (ret)
+	if (ret && ret != -EEXIST)
 		goto out;
 
 	leaf = path->nodes[0];
@@ -572,7 +587,7 @@ static int add_qgroup_item(struct btrfs_trans_handle *trans,
 	key.type = BTRFS_QGROUP_LIMIT_KEY;
 	ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
 				      sizeof(*qgroup_limit));
-	if (ret)
+	if (ret && ret != -EEXIST)
 		goto out;
 
 	leaf = path->nodes[0];
@@ -2817,3 +2832,96 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 	btrfs_queue_work(fs_info->qgroup_rescan_workers,
 			 &fs_info->qgroup_rescan_work);
 }
+
+static struct relation_rec *
+qlist_to_relation_rec(struct btrfs_qgroup_list *qlist, struct list_head *all)
+{
+	u64 group, member;
+	struct relation_rec *rec;
+
+	BUILD_BUG_ON(sizeof(struct btrfs_qgroup_list) < sizeof(struct relation_rec));
+
+	list_del(&qlist->next_group);
+	list_del(&qlist->next_member);
+	group = qlist->group->qgroupid;
+	member = qlist->member->qgroupid;
+	rec = (struct relation_rec *)qlist;
+	rec->src = group;
+	rec->dst = member;
+
+	list_add(&rec->list, all);
+	return rec;
+}
+
+static int remove_qgroup_relations(struct btrfs_trans_handle *trans,
+				   struct btrfs_fs_info *fs_info, u64 qgroupid)
+{
+	int ret, err;
+	struct btrfs_root *quota_root = fs_info->quota_root;
+	struct relation_rec *rec;
+	struct btrfs_qgroup_list *qlist;
+	struct btrfs_qgroup *qgroup;
+	LIST_HEAD(relations);
+
+	spin_lock(&fs_info->qgroup_lock);
+	qgroup = find_qgroup_rb(fs_info, qgroupid);
+
+	while (!list_empty(&qgroup->groups)) {
+		qlist = list_first_entry(&qgroup->groups,
+					 struct btrfs_qgroup_list, next_group);
+		rec = qlist_to_relation_rec(qlist, &relations);
+	}
+
+   

[PATCH 3/5] Btrfs: __btrfs_mod_ref should always use no_quota

2014-07-17 Thread Mark Fasheh
From: Josef Bacik jba...@fb.com

Before I extended the no_quota arg to btrfs_dec/inc_ref because I didn't
understand how snapshot delete was using it and assumed that we needed the
quota operations there.  With Mark's work this has turned out to be not the
case, we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the
argument and make __btrfs_mod_ref call it's process function with no_quota set
always.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
Signed-off-by: Mark Fasheh mfas...@suse.de
---
 fs/btrfs/ctree.c   | 20 ++--
 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/extent-tree.c | 24 +++-
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index aeab453..44ee5d2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -280,9 +280,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
-		ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+		ret = btrfs_inc_ref(trans, root, cow, 1);
 	else
-		ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+		ret = btrfs_inc_ref(trans, root, cow, 0);
 
 	if (ret)
 		return ret;
@@ -1035,14 +1035,14 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 	if ((owner == root->root_key.objectid ||
 	     root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) &&
 	    !(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) {
-		ret = btrfs_inc_ref(trans, root, buf, 1, 1);
+		ret = btrfs_inc_ref(trans, root, buf, 1);
 		BUG_ON(ret); /* -ENOMEM */
 
 		if (root->root_key.objectid ==
 		    BTRFS_TREE_RELOC_OBJECTID) {
-			ret = btrfs_dec_ref(trans, root, buf, 0, 1);
+			ret = btrfs_dec_ref(trans, root, buf, 0);
 			BUG_ON(ret); /* -ENOMEM */
-			ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+			ret = btrfs_inc_ref(trans, root, cow, 1);
 			BUG_ON(ret); /* -ENOMEM */
 		}
 		new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
@@ -1050,9 +1050,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 
 		if (root->root_key.objectid ==
 		    BTRFS_TREE_RELOC_OBJECTID)
-			ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+			ret = btrfs_inc_ref(trans, root, cow, 1);
 		else
-			ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+			ret = btrfs_inc_ref(trans, root, cow, 0);
 		BUG_ON(ret); /* -ENOMEM */
 	}
 	if (new_flags != 0) {
@@ -1069,11 +1069,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
 			if (root->root_key.objectid ==
 			    BTRFS_TREE_RELOC_OBJECTID)
-				ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+				ret = btrfs_inc_ref(trans, root, cow, 1);
 			else
-				ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+				ret = btrfs_inc_ref(trans, root, cow, 0);
 			BUG_ON(ret); /* -ENOMEM */
-			ret = btrfs_dec_ref(trans, root, buf, 1, 1);
+			ret = btrfs_dec_ref(trans, root, buf, 1);
 			BUG_ON(ret); /* -ENOMEM */
 		}
 		clean_tree_block(trans, root, buf);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..8e29b61 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3326,9 +3326,9 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
 			 u64 min_alloc_size, u64 empty_size, u64 hint_byte,
 			 struct btrfs_key *ins, int is_data, int delalloc);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
-		  struct extent_buffer *buf, int full_backref, int no_quota);
+		  struct extent_buffer *buf, int full_backref);
 int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
-		  struct extent_buffer *buf, int full_backref, int no_quota);
+		  struct extent_buffer *buf, int full_backref);
 int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root,
 				u64 bytenr, u64 num_bytes, u64 flags,
diff --git 

[PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5

2014-07-17 Thread Mark Fasheh
Hi, the following patches try to fix a long outstanding issue with qgroups
and snapshot deletion. The core problem is that btrfs_drop_snapshot will
skip shared extents during its tree walk. This results in an inconsistent
qgroup state once the drop is processed. We also have a bug where qgroup
items are not deleted after drop_snapshot. The orphaned items will cause
btrfs to go readonly when a snapshot is created with the same id as the
deleted one.

The first patch adds some tracing which I found very useful in debugging
qgroup operations. The second patch is an actual fix to the problem. A third
patch, from Josef is also added. We need this because it fixes at least one
set of inconsistencies qgroups can get to via drop_snapshot. The fourth
patch adds code to delete qgroup items from disk once drop_snapshot has
completed.

With this version of the patch series, I can no longer reproduce
qgroup inconsistencies via drop_snapshot on my test disks.

Change from last patch set:

- Added a small fix (patch #5). I can fold this back into the main patch if
  requested.

Changes from V3-V4:

- Added patch 'btrfs: delete qgroup items in drop_snapshot'

Changes from V2-V3:

- search on bytenr and root, but not seq in btrfs_record_ref when
  we're looking for existing qgroup operations.

Changes before that (V1-V2):

- remove extra extent_buffer_uptodate call from account_shared_subtree()

- catch return values for the accounting calls now and do the right thing
  (log an error and tell the user to rescan)

- remove the loop on roots in qgroup_subtree_accounting and just use the
  nnodes member to make our first decision.

- Don't queue up the subtree root for a change (the code in drop_snapshot
  handles qgroup updates for this block).

- only walk subtrees if we're actually in DROP_REFERENCE stage and we're
  going to call free_extent

- account leaf items for level zero blocks that we are dropping in
  walk_up_proc

Please review, thanks. Diffstat follows,
--Mark

 fs/btrfs/ctree.c             |   20 +-
 fs/btrfs/ctree.h             |    4 +-
 fs/btrfs/extent-tree.c       |  291 ++++++++++++++++++++++++++++++++++++--
 fs/btrfs/qgroup.c            |  295 +++++++++++++++++++++++++++++++++++---
 fs/btrfs/qgroup.h            |    4 +-
 fs/btrfs/super.c             |    1 +
 include/trace/events/btrfs.h |   59 ++++++++
 7 files changed, 641 insertions(+), 33 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] btrfs: add trace for qgroup accounting

2014-07-17 Thread Mark Fasheh
We want this to debug qgroup changes on live systems.

Signed-off-by: Mark Fasheh mfas...@suse.de
Reviewed-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/qgroup.c|  3 +++
 fs/btrfs/super.c |  1 +
 include/trace/events/btrfs.h | 56 
 3 files changed, 60 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98cb6b2..6a6dc62 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1290,6 +1290,7 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
 	oper->seq = atomic_inc_return(&fs_info->qgroup_op_seq);
 	INIT_LIST_HEAD(&oper->elem.list);
 	oper->elem.seq = 0;
+	trace_btrfs_qgroup_record_ref(oper);
 	ret = insert_qgroup_oper(fs_info, oper);
 	if (ret) {
 		/* Shouldn't happen so have an assert for developers */
@@ -1911,6 +1912,8 @@ static int btrfs_qgroup_account(struct btrfs_trans_handle *trans,
 
 	ASSERT(is_fstree(oper->ref_root));
 
+	trace_btrfs_qgroup_account(oper);
+
 	switch (oper->type) {
 	case BTRFS_QGROUP_OPER_ADD_EXCL:
 	case BTRFS_QGROUP_OPER_SUB_EXCL:
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8e16bca..38b8bd8 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -60,6 +60,7 @@
 #include "backref.h"
 #include "tests/btrfs-tests.h"
 
+#include "qgroup.h"
 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 4ee4e30..b8774b3 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -23,6 +23,7 @@ struct map_lookup;
 struct extent_buffer;
 struct btrfs_work;
 struct __btrfs_workqueue;
+struct btrfs_qgroup_operation;
 
 #define show_ref_type(type)\
__print_symbolic(type,  \
@@ -1119,6 +1120,61 @@ DEFINE_EVENT(btrfs__workqueue_done, 
btrfs_workqueue_destroy,
TP_ARGS(wq)
 );
 
+#define show_oper_type(type)					\
+	__print_symbolic(type,					\
+		{ BTRFS_QGROUP_OPER_ADD_EXCL,	"OPER_ADD_EXCL" },  \
+		{ BTRFS_QGROUP_OPER_ADD_SHARED,	"OPER_ADD_SHARED" },\
+		{ BTRFS_QGROUP_OPER_SUB_EXCL,	"OPER_SUB_EXCL" },  \
+		{ BTRFS_QGROUP_OPER_SUB_SHARED,	"OPER_SUB_SHARED" })
+
+DECLARE_EVENT_CLASS(btrfs_qgroup_oper,
+
+	TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+	TP_ARGS(oper),
+
+	TP_STRUCT__entry(
+		__field(u64,  ref_root)
+		__field(u64,  bytenr)
+		__field(u64,  num_bytes)
+		__field(u64,  seq)
+		__field(int,  type)
+		__field(u64,  elem_seq)
+	),
+
+	TP_fast_assign(
+		__entry->ref_root	= oper->ref_root;
+		__entry->bytenr		= oper->bytenr,
+		__entry->num_bytes	= oper->num_bytes;
+		__entry->seq		= oper->seq;
+		__entry->type		= oper->type;
+		__entry->elem_seq	= oper->elem.seq;
+	),
+
+	TP_printk("ref_root = %llu, bytenr = %llu, num_bytes = %llu, "
+		  "seq = %llu, elem.seq = %llu, type = %s",
+		  (unsigned long long)__entry->ref_root,
+		  (unsigned long long)__entry->bytenr,
+		  (unsigned long long)__entry->num_bytes,
+		  (unsigned long long)__entry->seq,
+		  (unsigned long long)__entry->elem_seq,
+		  show_oper_type(__entry->type))
+);
+
+DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_account,
+
+   TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+   TP_ARGS(oper)
+);
+
+DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_record_ref,
+
+   TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+   TP_ARGS(oper)
+);
+
 #endif /* _TRACE_BTRFS_H */
 
 /* This part must be outside protection */
-- 
1.8.4.5



[PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete

2014-07-17 Thread Mark Fasheh
During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.

In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.

This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent, and in the case that we find
only one reference left, we go ahead and do the math on its
exclusive counts.

Signed-off-by: Mark Fasheh mfas...@suse.de
Reviewed-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c   | 261 +++
 fs/btrfs/qgroup.c| 165 +++
 fs/btrfs/qgroup.h|   1 +
 include/trace/events/btrfs.h |   3 +-
 4 files changed, 429 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 813537f..1aa4325 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7478,6 +7478,220 @@ reada:
wc-reada_slot = slot;
 }
 
+static int account_leaf_items(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root,
+			      struct extent_buffer *eb)
+{
+	int nr = btrfs_header_nritems(eb);
+	int i, extent_type, ret;
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *fi;
+	u64 bytenr, num_bytes;
+
+	for (i = 0; i < nr; i++) {
+		btrfs_item_key_to_cpu(eb, &key, i);
+
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			continue;
+
+		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
+		/* filter out non qgroup-accountable extents  */
+		extent_type = btrfs_file_extent_type(eb, fi);
+
+		if (extent_type == BTRFS_FILE_EXTENT_INLINE)
+			continue;
+
+		bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
+		if (!bytenr)
+			continue;
+
+		num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
+
+		ret = btrfs_qgroup_record_ref(trans, root->fs_info,
+					      root->objectid,
+					      bytenr, num_bytes,
+					      BTRFS_QGROUP_OPER_SUB_SUBTREE, 0);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+/*
+ * Walk up the tree from the bottom, freeing leaves and any interior
+ * nodes which have had all slots visited. If a node (leaf or
+ * interior) is freed, the node above it will have its slot
+ * incremented. The root node will never be freed.
+ *
+ * At the end of this function, we should have a path which has all
+ * slots incremented to the next position for a search. If we need to
+ * read a new node it will be NULL and the node above it will have the
+ * correct slot selected for a later read.
+ *
+ * If we increment the root node's slot counter past the number of
+ * elements, 1 is returned to signal completion of the search.
+ */
+static int adjust_slots_upwards(struct btrfs_root *root,
+   struct btrfs_path *path, int root_level)
+{
+   int level = 0;
+   int nr, slot;
+   struct extent_buffer *eb;
+
+   if (root_level == 0)
+   return 1;
+
+   while (level <= root_level) {
+   eb = path->nodes[level];
+   nr = btrfs_header_nritems(eb);
+   path->slots[level]++;
+   slot = path->slots[level];
+   if (slot >= nr || level == 0) {
+   /*
+* Don't free the root -  we will detect this
+* condition after our loop and return a
+* positive value for caller to stop walking the tree.
+*/
+   if (level != root_level) {
+   btrfs_tree_unlock_rw(eb, path->locks[level]);
+   path->locks[level] = 0;
+
+   free_extent_buffer(eb);
+   path->nodes[level] = NULL;
+   path->slots[level] = 0;
+   }
+   } else {
+   /*
+* We have a valid slot to walk back down
+* from. Stop here so caller can process these
+* new nodes.
+*/
+   break;
+   }
+
+   level++;
+   }
+
+   eb = path->nodes[root_level];
+   if 

Questions on incremental backups

2014-07-17 Thread Sam Bull
I've a couple of questions on incremental backups. I've read the wiki
page, and would like to confirm my understanding of some features, and
also see if other features are possible that are not mentioned. I'm
looking to replace my existing backup solution, and hoping to match the
features I currently use, and go a little beyond.

=== Daily snapshot ===

So, if I understand correctly, I can make a daily snapshot of my
filesystem with very little overhead. Then these can later be synced
efficiently to another system (only syncing the differences), so I can
backup regularly over the internet to my server, and also to an external
HDD. After syncing, I can delete the snapshots (other than the trailing
one needed for the next backup).

In this way I can keep a constant stream of daily backups even when
offline, and simply sync them next time I am online before deleting them
locally.
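That workflow can be sketched with btrfs snapshot plus incremental send/receive. The paths, snapshot naming, and the remote host name below are made-up examples, not a recommended layout:

```shell
#!/bin/sh
# Hypothetical layout: /data is the subvolume being backed up,
# /data/.snapshots holds the read-only daily snapshots.
TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(date -d yesterday +%Y-%m-%d)

# Snapshots must be read-only (-r) to be usable as btrfs send sources.
btrfs subvolume snapshot -r /data "/data/.snapshots/$TODAY"
sync

# Incremental send: with -p only the difference against yesterday's
# snapshot crosses the wire.
btrfs send -p "/data/.snapshots/$YESTERDAY" "/data/.snapshots/$TODAY" \
    | ssh backup-host btrfs receive /backup/snapshots

# Once the server has today's snapshot, the old local one can go;
# keep only the newest as the parent for tomorrow's send.
btrfs subvolume delete "/data/.snapshots/$YESTERDAY"
```

When offline, you simply skip the send/delete steps and let the dailies accumulate; the next time you are online, send each pending snapshot in order (each one using the previous as `-p` parent) before deleting them.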

=== Ignore directories ===

Due to storage limitations on my server, is it possible to ignore
certain directories? For example, ignoring the folder that stores all my
games, as this could be rather large, and the contents can easily be
re-downloaded. The instructions involve subvolumes, so maybe it's
possible to ignore a subvolume when syncing?

If that is possible, then is it also possible to have a separate backup
that does include the ignored directory? For example, having the smaller
sync to the storage-limited server, but having a full sync to an
external HDD.
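One way this is commonly done: btrfs snapshots are not recursive, so a nested subvolume shows up only as an empty directory in a snapshot of its parent and is therefore excluded from the send. A rough sketch, with example paths:

```shell
# Turn the games directory into its own subvolume (paths are examples).
mv /home/user/games /home/user/games.old
btrfs subvolume create /home/user/games
# --reflink keeps this copy cheap on btrfs (shares extents, no duplication).
cp -a --reflink=always /home/user/games.old/. /home/user/games/
rm -rf /home/user/games.old

# A snapshot of /home now contains 'games' only as an empty directory,
# so the small backup sent to the storage-limited server skips it.
btrfs subvolume snapshot -r /home /home/.snapshots/$(date +%F)

# For the full backup to the external HDD, snapshot and send the games
# subvolume separately alongside the /home snapshot.
btrfs subvolume snapshot -r /home/user/games /home/user/.snapshots-games/$(date +%F)
```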

=== Display backups ===

Is it possible to view the contents of all backups? So, the expected
interface would be something like a tree of all files from across all
snapshots. Any files that are not present in the latest snapshot would
be greyed out to show they have been deleted. Selecting a file would
show a list of versions of the file, with one version for each snapshot
the file has been modified in.

As long as I can get access to this information, maybe some kind of diff
between snapshots, I'm willing to write the actual software to display
this interface. (I suppose even if it's not supported, I could crawl
through the filesystems and generate some kind of database, but that
sounds like a painful process.)
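For the "diff between snapshots" part, one existing primitive is `btrfs subvolume find-new`, which lists files changed since a given generation number. A common idiom (snapshot paths are hypothetical) is to ask the old snapshot for an absurdly high generation, which makes it print only its own generation marker, then feed that to the new snapshot:

```shell
OLD=/data/.snapshots/2014-07-16
NEW=/data/.snapshots/2014-07-17

# find-new with a huge generation prints a line ending in the old
# snapshot's generation number ("transid marker was N").
GEN=$(btrfs subvolume find-new "$OLD" 99999999 | awk '{print $NF}')

# List every file modified in NEW after that generation.
btrfs subvolume find-new "$NEW" "$GEN"
```

This only reports additions and modifications, not deletions, so a tool building the greyed-out-deleted-files view would still need to compare directory listings between snapshots.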

=== Merge snapshots down ===

Is there some way to merge snapshots down? So, I could merge the last
week of daily snapshots into a single weekly snapshot. The new snapshot
should include all files across all the snapshots (even if deleted in
some of the snapshots), and include just the latest version of each
file.

This way, I'd like to maintain daily snapshots, which can be regularly
merged down into weekly snapshots, and then into monthly snapshots, and
then finally into yearly snapshots.
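Btrfs has no native merge operation, but since every snapshot is a complete tree, a union like the one described can be approximated by hand: copy each daily snapshot, oldest first, over a fresh subvolume, so later versions overwrite earlier ones while files deleted mid-week survive from the last day they existed. A sketch under that assumption, with example paths:

```shell
# Build the union of a week of dailies into one new subvolume.
btrfs subvolume create /data/.snapshots/week-29

# Oldest first: later copies overwrite earlier ones, so each file
# ends up at its latest version. --reflink keeps data shared.
for day in /data/.snapshots/2014-07-1[1-7]; do
    cp -a --reflink=always "$day"/. /data/.snapshots/week-29/
done

# Drop the merged-away dailies.
for day in /data/.snapshots/2014-07-1[1-7]; do
    btrfs subvolume delete "$day"
done
```

Note the result is an ordinary (writable) subvolume rather than a snapshot, and this approach cannot tell a mid-week deletion from a file you actually want gone; it keeps everything.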


And, finally, there's no problem in deleting old snapshots? I'm assuming
any data from these snapshots used by other snapshots will still be
referenced by the other snapshots, and thus be retained, so nothing will
break?




Re: Unmountable btrfs filesystem - 'unable to find logical' / 'no mapping'

2014-07-17 Thread Gareth Clay
Duncan 1i5t5.duncan at cox.net writes:

 
 Gareth Clay posted on Tue, 15 Jul 2014 14:35:22 +0100 as excerpted:
 
  I noticed yesterday that the mount points on my btrfs RAID1 filesystem
  had become read-only. On a reboot, the filesystem fails to mount. I
  wondered if someone here might be able to offer any advice on how to
  recover (if possible) from this position?
 
 I had a similar (but I think different) issue some weeks ago.  It was my 
 first real experience with btrfs troubleshooting and recovery.
 
 First, the recommendation is do NOT do btrfs check --repair except either 
 at the recommendation of a dev after they've seen the details and 
 determined it can fix them, or if your next step would be a new mkfs of 
 the filesystem, thus blowing away what's there anyway, so you've nothing 
 to lose.  You can try btrfs check (aka btrfsck) without --repair to see 
 what it reports as that's read-only and thus won't break anything 
 further, but similarly, won't repair anything either.
 
 Also, as a general recommendation, try a current kernel as btrfs is still 
 developing fast enough that if you're a kernel series behind, there's 
 fixes in the new version that you won't have in older kernels.  I see 
 you're on an ubuntu 3.13 series kernel, and the recommendation would be 
 the latest 3.15 series stable kernel, if not the 3.16-rc series 
 development kernel, since that's past rc5 now and thus getting close to 
 release.
 
 The userspace, btrfs-progs, isn't quite as critical, but running at least 
 v3.12 (which you are), is recommended.  FWIW, v3.14.2 is current (as of 
 when I last checked a couple days ago anyway) and is what I am running 
 here.
 
 In general, you can try mounting with recovery and then with recovery,ro 
 options, but that didn't work here.  You can also try with the degraded 
 option (tho I didn't), to see if it'll mount with just one of the pair.
 
 Of course, btrfs is still not fully stable and keeping current backups is 
 recommended.  I did have backups, but they weren't as current as I wanted.
 
 Beyond that, there's btrfs restore (a separate btrfs-restore executable 
 in older btrfs-progs, part of the main btrfs executable in newer 
 versions), which is what I ended up using and is what the rest of this 
 reply is about.  That does NOT mount or write to the filesystem, but DOES 
 let you pull files off the unmounted filesystem and write them to a 
 working filesystem (btrfs or other, it was reiserfs here) in order to 
 recover what you can.  You can use --dry-run to list files that would be 
 recovered in order to get an idea of how much it can recover.
 
 There's a page on the wiki about using btrfs recover in combination with 
 btrfs-find-root, if the current root is damaged and won't let you recover 
 much.  Note that generation and transid refer to the same thing, and 
 you want to specify the root (using the -t location option, with the 
 location found using find-root) that lets you recover the most.  The -l 
 (list tree roots) option is also useful in this context.
 
 https://btrfs.wiki.kernel.org/index.php/Restore
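 The sequence described above looks roughly like this; the device name
 and the bytenr are placeholders (the real bytenr comes from the
 find-root output), and nothing here writes to the damaged device:

```shell
# Print candidate tree-root locations (byte numbers) on the device.
btrfs-find-root /dev/sdb

# Dry-run against a candidate root first, to see how much each
# location would recover. 299630592 is a placeholder bytenr taken
# from the find-root output above.
btrfs restore -t 299630592 --dry-run /dev/sdb /mnt/rescue

# Then the real restore, writing to a separate, working filesystem.
btrfs restore -t 299630592 /dev/sdb /mnt/rescue
```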
 
 Of course restoring in this manner means you have to have somewhere else 
 to put what you restore, which was fine for me as I'm using relatively 
 small independent btrfs filesystems and could restore to a larger 
 reiserfs on a different device, but could be rather tougher for large 
 multi-terabyte filesystems, unless you have (or purchase) a spare disk to 
 put it on.
 
 One thing I did NOT realize until later, however, is that btrfs restore 
 loses the user and permissions information (at least without -x, which 
 says it restores extended attributes, I didn't try it with that).  I 
 hacked up a find script to compare the restore to the backup and set 
 ownership/permissions appropriately based on the files in the backup, but 
 of course that didn't help for files that were new since the backup, and 
 I had to set their ownership/permissions manually.
 


Hi Duncan,

Thanks for your thorough response and the tips - sorry to hear you've had
issues too. Point taken on the kernel updates! I'm in a similar situation to
you - this is my first btrfs recovery experience. I've been playing with the fs
for some time and have had no apparent issues, but this has been a useful
reality check. Read / write error counts were high so there's a suggestion that
it might be down to drive failure.

In the end I had a lot of help from xaba on the #btrfs IRC channel, whose
suggestions got me to the point where, with a bang up to date version of the
userspace utils, I could get a successful btrfsck run using the -b option (3.12
only got part way). At that point btrfs restore still couldn't be run, degraded
mounting also wouldn't work, and I'd spent about as much time as I was prepared
to spend on recovering this fs, so I took a deep breath and ran btrfsck
--repair. That's got me to the point where btrfs restore can now be run, so I'm
going to dump as much as I 

Re: Questions on incremental backups

2014-07-17 Thread Russell Coker
Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 
and previous). I have snapshots every 15 mins on some subvols.

Very large numbers of snapshots can cause performance problems. I suggest 
keeping below 1000 snapshots at this time.

You can use send/recv functionality for remote backups. So far I've used rsync; 
it works well, and send/recv has some limitations regarding filesystem 
structure etc. Rsync can transfer to an ext4 or ZFS filesystem if you wish.

Ignoring directories in send/recv is done by subvol. Even if you use rsync it's 
a good idea to have different subvols for directory trees with different backup 
requirements.

Displaying backups is an issue of backup software. It is above the level that 
BTRFS development touches. While people here can probably offer generic advice 
on backup software it's not the topic of the list.

I use date based snapshots on my backup BTRFS filesystems and I can easily 
delete snapshots in the middle of the list.
-- 
Sent from my Samsung Galaxy Note 2 with K-9 Mail.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html