Re: Unable to fixup (regular) error in RAID1 fs

2014-10-29 Thread Juan Orti

On 2014-10-29 04:02, Duncan wrote:

Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:


[ 3713.086292] BTRFS: unable to fixup (regular) error at logical 483011874816 on dev /dev/sdb2
[ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 4059963392, length 4096, links 1 (path: juan/.local/share/gnome-boxes/images/boxes-unknown)
[ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
[ 3713.093035] BTRFS: unable to fixup (regular) error at logical 483011948544 on dev /dev/sdb2

Why can't it fix the errors? A bad device? smartctl says the disk is OK.
I'm currently running a full scrub to see if it finds more errors. What
should I do?


Btrfs raid1, and I see you have it for both data and metadata.

During normal operation, when btrfs comes across a block that doesn't
match its checksum, it will look to see if there's another copy (which
there is with raid1, which has exactly two copies) of that block and will
try to use it instead if so.  If the second copy matches the checksum,
all is fine and btrfs will in fact attempt to rewrite the bad copy using
the good copy, as well as returning the good copy to whatever was
reading it.

Those corruption errors seem to indicate that it can't find a good
copy to update the bad copy with -- both copies ended up bad.  Either
that or it found the good copy and returned it to whatever was reading,
but couldn't rewrite the bad copy, for some reason.
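The read path described above can be modelled in a few lines. This is a hypothetical, in-memory sketch, not btrfs code: raid1_read_repair(), BLOCK_SIZE and NUM_MIRRORS are made-up names, and the checksum is a bitwise CRC-32C, btrfs's default checksum. It covers the "both copies bad" case; it does not model the other possibility mentioned, a failed rewrite of the bad copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* toy size; real btrfs checksums 4 KiB blocks */
#define NUM_MIRRORS 2   /* raid1 keeps exactly two copies */

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical model of a raid1 read with repair. Copies are tried in
 * order; the first one whose checksum matches is returned in @out, and
 * every copy that failed before it is overwritten with the good data.
 * Returns 0 on success, or -EIO if no copy matches.
 */
static int raid1_read_repair(uint8_t mirrors[NUM_MIRRORS][BLOCK_SIZE],
			     uint32_t expected_csum, uint8_t out[BLOCK_SIZE],
			     int *repaired)
{
	int bad[NUM_MIRRORS];
	int nbad = 0;

	*repaired = 0;
	for (int m = 0; m < NUM_MIRRORS; m++) {
		if (crc32c(mirrors[m], BLOCK_SIZE) != expected_csum) {
			bad[nbad++] = m;                      /* remember the failed copy */
			continue;
		}
		memcpy(out, mirrors[m], BLOCK_SIZE);          /* give good data to the reader */
		for (int b = 0; b < nbad; b++)                /* rewrite the bad copies */
			memcpy(mirrors[bad[b]], mirrors[m], BLOCK_SIZE);
		*repaired = nbad;
		assert(crc32c(out, BLOCK_SIZE) == expected_csum);
		return 0;
	}
	return -EIO;   /* every copy is bad: nothing to repair from */
}
```

In this model, corrupting one copy yields a successful read plus one repaired copy, while corrupting both copies yields -EIO, which is the situation the "unable to fixup" message suggests.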

I'm not sure which of those interpretations is correct, but given
that you didn't see anything else bad happening, no apps returning
errors due to read error, etc., I'd guess the second, because
otherwise whatever was doing the read should have returned an
error.


When this error happened, I was editing some text files with vi, and it
was painfully slow: it took 30 seconds to open a 20-line file, so
something weird was going on. Anyway, no user-space error was visible.





Doing a scrub, as you already did, is the first thing I'd try here,
since normal operation won't catch all the errors.

BUT, you report that the scrub found no errors, which is weird.
You have the log saying there's corruption errors, but scrub
saying there's not.

The easiest explanation for something like that is that the errors
were temporary.  If it happens again or regularly, consider running
a memory tester such as memtest86+, as it could be bad memory.  Do you have ECC RAM?


I don't have ECC RAM, it's a regular desktop PC. Some RAM checks in the 
past have shown no errors, I'll check it again.




Another question.  Do you have skinny metadata on that btrfs?  If you
do, btrfs should mention skinny extents when mounting the filesystem.


No skinny metadata. I made the fs with the standard options, just with 
raid1 for data and metadata.




The reason I'm asking this is that if I'm reading the patch descriptions
correctly, a recently posted patch deals with a specific skinny-metadata
bug where wrong results would occasionally be returned, resulting in
errors.  Not being a dev, I don't have the technical ability to know for
sure whether this could be connected to that or not, but it sounds like
the sort of thing I might expect from a bug that intermittently returned
bad data -- odd apparent corruption errors in normal use that scrub
can't see, even though it's designed to catch and fix if possible exactly
that sort of corruption error.

Anyway, if scrub says no corruption, for a potential corruption error
I'd be inclined to trust scrub, so I think the filesystem is fine.
But if so, I'm worried about what might be triggering these
intermittent errors.  Certainly watch for more of them, and if you're
running skinny-metadata, consider finding and applying that patch.
If not or in general, also be on the lookout for more possible hints
of failing memory and/or run a good memory checker for a few hours
and see if it reports all is well.

But as they say about some kinds of potential cancer reports at times,
sometimes watchful waiting is the best you can do, hoping no further
symptoms show up, but being alert in case they do, to try something
more drastic, that isn't warranted /unless/ they do.


That's what I'll do, I'll wait and see.

Thank you for your explanation.

--
Juan Orti
https://miceliux.com

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] Btrfs: fix snapshot inconsistency after a file write followed by truncate

2014-10-29 Thread Filipe Manana
If right after starting the snapshot creation ioctl we perform a write against a
file followed by a truncate, with both operations increasing the file's size, we
can get a snapshot tree that reflects a state of the source subvolume's tree where
the file truncation happened but the write operation didn't. This leaves a gap
between 2 file extent items of the inode, which makes btrfs' fsck complain about it.

For example, if we perform the following file operations:

$ mkfs.btrfs -f /dev/vdd
$ mount /dev/vdd /mnt
$ xfs_io -f \
  -c "pwrite -S 0xaa -b 32K 0 32K" \
  -c "fsync" \
  -c "pwrite -S 0xbb -b 32770 16K 32770" \
  -c "truncate 90123" \
  /mnt/foobar

and the snapshot creation ioctl is called just before the second write, we can
often get the following inode items in the snapshot's btree:

item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
inode generation 146 transid 7 size 90123 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
inode ref index 282 namelen 10 name: foobar
item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
extent data disk byte 1104855040 nr 32768
extent data offset 0 nr 32768 ram 32768
extent compression 0
item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 40960 ram 40960
extent compression 0

There's a file range, corresponding to the interval [32K; ALIGN(16K + 32770, 4096)[,
for which there's no file extent item covering it. This is because the file write
and file truncate operations both happened right after the snapshot creation ioctl
called btrfs_start_delalloc_inodes(), which means we didn't start and wait for the
ordered extent that matches the write and, in btrfs_setsize(), we were able to call
btrfs_cont_expand() before being able to commit the current transaction in the
snapshot creation ioctl. So this made it possible to insert the hole file extent
item in the source subvolume (which represents the region added by the truncate)
right before the transaction commit from the snapshot creation ioctl.

Btrfs' fsck tool complains about such cases with a message like the following:

root 331 inode 257 errors 100, file extent discount
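The gap can be found mechanically. The following is a minimal sketch, assuming the EXTENT_DATA items are sorted by key offset; find_extent_gap(), struct file_extent and ALIGN_UP (modelled on the kernel's ALIGN() macro) are hypothetical names, not btrfs-progs' actual fsck code. With the item values from the dump above, 16K + 32770 = 49154 rounds up to 53248, so the uncovered range is [32768, 53248), ending exactly where item 123 starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((uint64_t)(x) + (a) - 1) & ~((uint64_t)(a) - 1))

struct file_extent {
	uint64_t offset;   /* key offset of the EXTENT_DATA item */
	uint64_t len;      /* file bytes it covers ("nr" in the dump) */
};

/*
 * Scan extents sorted by offset and report the first range before or
 * between items that no item covers. Returns 1 and sets
 * [*gap_start, *gap_end) if such a gap exists, 0 otherwise.
 * Ranges past the last item are not checked.
 */
static int find_extent_gap(const struct file_extent *ext, size_t n,
			   uint64_t *gap_start, uint64_t *gap_end)
{
	uint64_t covered = 0;   /* everything below this offset is covered */

	for (size_t i = 0; i < n; i++) {
		if (ext[i].offset > covered) {
			*gap_start = covered;
			*gap_end = ext[i].offset;
			return 1;
		}
		if (ext[i].offset + ext[i].len > covered)
			covered = ext[i].offset + ext[i].len;
	}
	return 0;
}
```

Adding an item for the second write, covering [32768, 53248), makes the same scan report no gap.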

From a user perspective, the expectation when a snapshot is created while those
file operations are being performed is that the snapshot will have a file that
either:

1) is empty
2) only the first write was captured
3) only the 2 writes were captured
4) both writes and the truncation were captured

But the snapshot should never reflect a state where only the first write and
the truncation were captured (since the second write was performed before the
truncation).
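Put differently, the acceptable states are exactly the prefixes of the sequence of operations. A small sketch of that rule follows; enum op, op_order and snapshot_state_ok() are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Operations from the example, as bits, in the order they were issued. */
enum op {
	OP_WRITE1   = 1 << 0,   /* pwrite 0xaa over [0, 32K), then fsync */
	OP_WRITE2   = 1 << 1,   /* pwrite 0xbb of 32770 bytes at 16K */
	OP_TRUNCATE = 1 << 2,   /* expanding truncate to 90123 */
};

static const unsigned op_order[] = { OP_WRITE1, OP_WRITE2, OP_TRUNCATE };

/*
 * A snapshot state is consistent only if the captured operations form a
 * prefix of the issue order: once one operation is missing, no later
 * operation may appear.
 */
static bool snapshot_state_ok(unsigned captured)
{
	bool missing = false;

	for (size_t i = 0; i < sizeof(op_order) / sizeof(op_order[0]); i++) {
		if (!(captured & op_order[i]))
			missing = true;
		else if (missing)
			return false;   /* a later op was captured without an earlier one */
	}
	return true;
}
```

The four states listed above pass this check, while OP_WRITE1 | OP_TRUNCATE, the state the bug produced, fails it.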

A test case for xfstests follows.

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Use a different approach to solve the problem. Don't start and wait for all
delalloc to finish after every expanding truncate; instead, add an additional
flush at transaction commit time if we're doing a transaction commit that
creates snapshots.

V3: Removed useless test condition in +wait_pending_snapshot_roots_delalloc().

 fs/btrfs/transaction.c | 59 ++
 1 file changed, 59 insertions(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 396ae8b..5e7f004 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1714,12 +1714,65 @@ static inline void btrfs_wait_delalloc_flush(struct btrfs_fs_info *fs_info)
btrfs_wait_ordered_roots(fs_info, -1);
 }
 
+static int
+start_pending_snapshot_roots_delalloc(struct btrfs_trans_handle *trans,
+ struct list_head *splice)
+{
+   struct btrfs_pending_snapshot *pending_snapshot;
+   int ret = 0;
+
+   if (btrfs_test_opt(trans->root, FLUSHONCOMMIT))
+       return 0;
+
+   spin_lock(&trans->root->fs_info->trans_lock);
+   list_splice_init(&trans->transaction->pending_snapshots, splice);
+   spin_unlock(&trans->root->fs_info->trans_lock);
+
+   /*
+* Start again delalloc for the roots our pending snapshots are made
+* from. We did it before starting/joining a transaction and we do it
+* here again because new inode operations might have happened since
+* then and we want to make sure the snapshot captures a fully
+* consistent state of the source root tree. For example, if after the
+* first delalloc flush a write is made against an inode followed by
+* an expanding truncate, we want to make sure the snapshot captured
+* both the write and the truncation, and not just the truncation.
+* Here we shouldn't have much delalloc work to do, as the bulk of it
+* was done before and outside the 

Re: [PATCH] Btrfs: don't do async reclaim during log replay V2

2014-10-29 Thread Miao Xie
Ping..

On Thu, 23 Oct 2014 16:44:54 +0800, Miao Xie wrote:
 On Thu, 18 Sep 2014 11:27:17 -0400, Josef Bacik wrote:
 Trying to reproduce a log enospc bug I hit a panic in the async reclaim code
 during log replay.  This is because we use fs_info->fs_root as our root for
 shrinking and such.  Technically we can use whatever root we want, but let's
 just not allow async reclaim while we're doing log replay.  Thanks,
 
 Why not move the fs_root initialization code to before log replay?
 I think that is better than the approach in this patch, because the async
 reclaimer can help us do some work.
 
 Thanks
 Miao
 

 Signed-off-by: Josef Bacik jba...@fb.com
 ---
 V1->V2: use fs_info->log_root_recovering instead, didn't notice this existed
 before.

  fs/btrfs/extent-tree.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 28a27d5..44d0497 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -4513,7 +4513,13 @@ again:
  space_info->flush = 1;
  } else if (!ret && space_info->flags & BTRFS_BLOCK_GROUP_METADATA) {
  used += orig_bytes;
 -if (need_do_async_reclaim(space_info, root->fs_info, used) &&
 +/*
 + * We will do the space reservation dance during log replay,
 + * which means we won't have fs_info->fs_root set, so don't do
 + * the async reclaim as we will panic.
 + */
 +if (!root->fs_info->log_root_recovering &&
 +need_do_async_reclaim(space_info, root->fs_info, used) &&
  !work_busy(&root->fs_info->async_reclaim_work))
  queue_work(system_unbound_wq,
 &root->fs_info->async_reclaim_work);

 
 



Re: [PATCH v3] Btrfs: fix snapshot inconsistency after a file write followed by truncate

2014-10-29 Thread Miao Xie
On Wed, 29 Oct 2014 08:21:12 +, Filipe Manana wrote:
 V2: Use a different approach to solve the problem. Don't start and wait for all
 delalloc to finish after every expanding truncate; instead, add an additional
 flush at transaction commit time if we're doing a transaction commit that
 creates snapshots.

This method will make the transaction commit take more time. Why not use
i_disk_size to expand the file size in btrfs_setsize()? Or we might rename
btrfs_{start, end}_nocow_write() and use them in btrfs_setsize()?

Thanks
Miao

 

Re: [PATCH v2] btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots

2014-10-29 Thread Anand Jain



 There will be a compatibility issue with this patch when running an older
 kernel; sorry, I missed some combinations. As I see this is already in,
 I am sending a patch to back out these changes, if it helps. Thanks.



On 09/04/14 20:02, Anand Jain wrote:




On 09/04/2014 05:58 PM, David Sterba wrote:

On Mon, Aug 18, 2014 at 04:38:18PM +0800, Anand Jain wrote:

ioctl BTRFS_IOC_FS_INFO returns num_devices, which does _not_ include seed
devices, but the subsequent ioctl BTRFS_IOC_DEV_INFO counts and returns seed
disks when probed. So in userland we hit a count/slot mismatch bug:
 get_fs_info()
 ::
 BUG_ON(ndevs >= fi_args->num_devices);
which triggers when a seed device is mounted.

So to fix this problem here in this patch ioctl BTRFS_IOC_FS_INFO
will provide total_devices instead of num_devices.


The ioctl is very unclear about what 'num_devices' actually means.


  Right. That's also true in the kernel: very messy, very confusing.
  The btrfs-devlist tool would help to understand what's going on.


  $ egrep num_device *.c | egrep total_device
ioctl.c:fi_args->num_devices = fs_devices->total_devices;
super.c:ret = !(fs_devices->num_devices == fs_devices->total_devices);
volumes.c:total_devices = btrfs_super_num_devices(disk_super);


  By the way, the BTRFS_IOC_DEVICES_READY ioctl above has long been
  broken with seed/replace; I'm just waiting to get these patches integrated
  first so that I can fix it later.



This only partly fixes the problem, because earlier num_devices
included the replacing device, but now total_devices does not include
the replacing device. Getting a count which includes a transient device
is inefficient and indeed wrong, because there is a race condition:
between the BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO ioctls, the device
replace operation might complete. So to fix this problem it's better
that userland btrfs-progs probes the replacing device (at devid 0)
separately.

v2:
Agree with Wang's comment. It's better to show seed disks under the
sprout fs, so that the user can establish a mapping of seed to sprout devices.

So here I am making BTRFS_IOC_FS_INFO return total_devices, which
counts the seed devices (but not the replacing device).


This is even more confusing. I think we need to add another member to
the ioctl struct to reflect the number of regular devices (num_devices)
and the true total number of devices including seeding and replaced
devices.


  That will be a better way. Thanks.


The difference should be accompanied by a flag that would say
if there's a seeding or replace in progress.

There are some backward compatibility concerns. Setting num_devices to
total_devices changes semantics of the ioctl, so I think it should stay
as is for now,


  As far as I have tested, there is no backward compatibility issue.
  But from a semantics perspective... agreed.


but the BUG_ON can be removed and replaced by code that
reallocates the buffer or allocates a few more items in advance.


   We don't know how many seed devices there are for a sprout FS,
   so that's not possible.

  Will review & resubmit.

Thanks for commenting.

Anand
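The suggestion of reallocating the buffer instead of hitting BUG_ON() can be sketched as follows. This is a hypothetical illustration, not btrfs-progs code: collect_devices(), probe_fn, probe_from_list() and struct dev_info are made-up stand-ins for get_fs_info(), get_device_info() and struct btrfs_ioctl_dev_info_args. Because the buffer grows as devices are found, it does not need the number of seed devices in advance; it does not, however, address the race between the two ioctls described in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct dev_info {
	uint64_t devid;   /* stand-in for struct btrfs_ioctl_dev_info_args */
};

/* Hypothetical probe: returns 0 and fills *di if @devid exists, -ENODEV if not. */
typedef int (*probe_fn)(uint64_t devid, struct dev_info *di, void *ctx);

/*
 * Collect every device with a devid in [start, max_id]. @hint is only an
 * initial capacity (e.g. num_devices from BTRFS_IOC_FS_INFO); the array
 * grows whenever more devices turn up, so a low count cannot overflow it.
 */
static int collect_devices(uint64_t start, uint64_t max_id, size_t hint,
			   probe_fn probe, void *ctx,
			   struct dev_info **out, size_t *nout)
{
	size_t cap = hint ? hint : 1;
	size_t n = 0;
	struct dev_info *arr = malloc(cap * sizeof(*arr));

	if (!arr)
		return -ENOMEM;
	for (uint64_t id = start; id <= max_id; id++) {
		struct dev_info di;
		int ret = probe(id, &di, ctx);

		if (ret == -ENODEV)
			continue;               /* unused devid */
		if (ret) {
			free(arr);
			return ret;
		}
		if (n == cap) {                 /* more devices than the hint */
			struct dev_info *tmp = realloc(arr, 2 * cap * sizeof(*arr));

			if (!tmp) {
				free(arr);
				return -ENOMEM;
			}
			arr = tmp;
			cap *= 2;
		}
		arr[n++] = di;
	}
	assert(n <= cap);
	*out = arr;
	*nout = n;
	return 0;
}

/* Toy probe for the example: devids listed in a 0-terminated array exist. */
static int probe_from_list(uint64_t devid, struct dev_info *di, void *ctx)
{
	for (const uint64_t *ids = ctx; *ids; ids++) {
		if (*ids == devid) {
			di->devid = devid;
			return 0;
		}
	}
	return -ENODEV;
}
```

For example, with devids 1, 2, 3 and 5 present and a hint of 2, the buffer is doubled once and all four devices are returned; the caller frees the array.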







[PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device

2014-10-29 Thread Anand Jain
There is a compatibility issue with older kernels caused by the progs commit
below.

05cd2907557ba627cfb86e60b214ea6228613a84

So, for now, I am reverting the above commit.
The upcoming sysfs interface should help to fix the remaining issue, which is
that the seed device fails to show in the 'btrfs fi show' output of a sprout
device.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 utils.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/utils.c b/utils.c
index a8691fe..1d1cc77 100644
--- a/utils.c
+++ b/utils.c
@@ -1869,29 +1869,12 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
if (!fi_args->num_devices)
goto out;
 
-   /*
-* with kernel patch
-* btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
-* the kernel now returns total_devices which does not include
-* replacing device if running.
-* As we need to get dev info of the replace device if it is running,
-* so just add one to fi_args->num_devices.
-*/
-
-   di_args = *di_ret = malloc((fi_args->num_devices + 1) * sizeof(*di_args));
+   di_args = *di_ret = malloc((fi_args->num_devices) * sizeof(*di_args));
if (!di_args) {
ret = -errno;
goto out;
}
 
-   /* get the replace target device if it is there */
-   ret = get_device_info(fd, i, &di_args[ndevs]);
-   if (!ret) {
-   ndevs++;
-   fi_args->num_devices++;
-   }
-   i++;
-
for (; i <= fi_args->max_id; ++i) {
BUG_ON(ndevs >= fi_args->num_devices);
ret = get_device_info(fd, i, &di_args[ndevs]);
-- 
2.0.0.153.g79d



Re: [PATCH] btrfs-progs: fix dev stats error output related to replace handle

2014-10-29 Thread Anand Jain


Hi Gui,

 We don't need this patch. To get this right, you should instead back out
 this patch:

[PATCH] btrfs-progs: do a separate probe for _transient_ replacing device

 OR apply this:

[PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device


 Try it out and let us know.

Thanks




On 10/23/14 09:56, Gui Hecheng wrote:

Steps to reproduce:
# mkfs.btrfs -f /dev/sdb7
# mount /dev/sdb7 /mnt
# btrfs dev stats /dev/sdb7
output:
[/dev/sdb7].write_io_errs   0
[/dev/sdb7].read_io_errs    0
[/dev/sdb7].flush_io_errs   0
[/dev/sdb7].corruption_errs 0
[/dev/sdb7].generation_errs 0
* ERROR: ioctl(BTRFS_IOC_GET_DEV_STATS) on  failed: No such device

while the following cmd:
# btrfs dev stats /mnt
yields the right thing:
[/dev/sdb7].write_io_errs   0
[/dev/sdb7].read_io_errs    0
[/dev/sdb7].flush_io_errs   0
[/dev/sdb7].corruption_errs 0
[/dev/sdb7].generation_errs 0

This is caused by commit:
commit d0588bfa479409b2a0f6243f894338a01a56221a
btrfs-progs: do a separate probe for transient replacing device

The above commit tries to handle the fi show problem with a device under
replace, but it changes the @get_fs_info() logic, which breaks dev stats.
For @get_fs_info():
o If the passed-in @path is a mount point, then the @get_device_info() call
   that probes the replacing device happily accepts the device index
   var @i with its initial value 0, and the following i++ correctly sets @i
   to 1 as the start of all devices in btrfs.
o If @path is a block device, then the problem comes...
   The device index @i is set to the devid of the block device passed in,
   and @get_device_info() is forced to accept that devid unwillingly.
   Then the following i++ does the evil of skipping the desired block device,
   and an empty entry is handled next, which causes the ERROR above.

To fix this problem, let's just pass 0 to @get_device_info() explicitly,
and set the index @i to 1 if a mount point is passed in.

In my own testing, this does not affect the original fix for the fi show
problem with a device under replace.

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
  utils.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index f10c178..0ba2b26 100644
--- a/utils.c
+++ b/utils.c
@@ -1881,12 +1881,15 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
}

/* get the replace target device if it is there */
-   ret = get_device_info(fd, i, &di_args[ndevs]);
+   ret = get_device_info(fd, 0, &di_args[ndevs]);
if (!ret) {
ndevs++;
fi_args->num_devices++;
}
-   i++;
+
+   /* if a mount point is passed in, start from devid 1 */
+   if (fi_args->num_devices != 1)
+   i = 1;

for (; i <= fi_args->max_id; ++i) {
BUG_ON(ndevs >= fi_args->num_devices);




Re: [PATCH] btrfs-progs: fix dev stats error output related to replace handle

2014-10-29 Thread Gui Hecheng
On Wed, 2014-10-29 at 18:56 +0800, Anand Jain wrote:
 Hi Gui,
 
   We don't need this patch. Actually you should back out this patch to
   get this correct.
 
 [PATCH] btrfs-progs: do a separate probe for _transient_ replacing device
 
   OR apply. this
 
 [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device
 
   Try it out. Lets know.
 
 Thanks

Oh, yes, I've tried your revert patch and I can confirm that it fixes
the problem.

So please *ignore* my patch, David; sorry for the noise.

-Gui
 
 
 




Re: [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device

2014-10-29 Thread Gui Hecheng
On Wed, 2014-10-29 at 18:51 +0800, Anand Jain wrote:
 There is a compatibility issue with older kernel with the progs commit id as 
 below.
 
 05cd2907557ba627cfb86e60b214ea6228613a84

Which tree does this commit ID belong to?
I can't find it anywhere.





[PATCH v4] Btrfs: fix snapshot inconsistency after a file write followed by truncate

2014-10-29 Thread Filipe Manana
If right after starting the snapshot creation ioctl we perform a write against a
file followed by a truncate, with both operations increasing the file's size, we
can get a snapshot tree that reflects a state of the source subvolume's tree where
the file truncation happened but the write operation didn't. This leaves a gap
between 2 file extent items of the inode, which makes btrfs' fsck complain about it.

For example, if we perform the following file operations:

$ mkfs.btrfs -f /dev/vdd
$ mount /dev/vdd /mnt
$ xfs_io -f \
  -c "pwrite -S 0xaa -b 32K 0 32K" \
  -c "fsync" \
  -c "pwrite -S 0xbb -b 32770 16K 32770" \
  -c "truncate 90123" \
  /mnt/foobar

and the snapshot creation ioctl is called just before the second write, we can
often get the following inode items in the snapshot's btree:

item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
inode generation 146 transid 7 size 90123 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
inode ref index 282 namelen 10 name: foobar
item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
extent data disk byte 1104855040 nr 32768
extent data offset 0 nr 32768 ram 32768
extent compression 0
item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
extent data disk byte 0 nr 0
extent data offset 0 nr 40960 ram 40960
extent compression 0

There's a file range, corresponding to the interval [32K; ALIGN(16K + 32770, 4096)[,
for which there's no file extent item covering it. This is because the file write
and file truncate operations both happened right after the snapshot creation ioctl
called btrfs_start_delalloc_inodes(), which means we didn't start and wait for the
ordered extent that matches the write and, in btrfs_setsize(), we were able to call
btrfs_cont_expand() before being able to commit the current transaction in the
snapshot creation ioctl. So this made it possible to insert the hole file extent
item in the source subvolume (which represents the region added by the truncate)
right before the transaction commit from the snapshot creation ioctl.

Btrfs' fsck tool complains about such cases with a message like the following:

root 331 inode 257 errors 100, file extent discount

From a user perspective, the expectation when a snapshot is created while those
file operations are being performed is that the snapshot will have a file that
either:

1) is empty
2) only the first write was captured
3) only the 2 writes were captured
4) both writes and the truncation were captured

But it should never capture a state where only the first write and the truncation
were captured (since the second write was performed before the truncation).

A test case for xfstests follows.

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Use a different approach to solve the problem. Don't start and wait for all
delalloc to finish after every expanding truncate, instead add an additional
flush at transaction commit time if we're doing a transaction commit that
creates snapshots.

V3: Removed useless test condition in wait_pending_snapshot_roots_delalloc().

V4: Use another approach that doesn't imply starting delalloc work and waiting
for it to finish at transaction commit time.

 fs/btrfs/ctree.h       |  4 ++--
 fs/btrfs/extent-tree.c | 16 +---
 fs/btrfs/file.c        | 10 +-
 fs/btrfs/inode.c       | 47 ---
 fs/btrfs/ioctl.c       |  7 ---
 5 files changed, 60 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b72b358..36f82ba 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3427,8 +3427,8 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int __get_raid_index(u64 flags);
-int btrfs_start_nocow_write(struct btrfs_root *root);
-void btrfs_end_nocow_write(struct btrfs_root *root);
+int btrfs_start_write_no_snapshoting(struct btrfs_root *root);
+void btrfs_end_write_no_snapshoting(struct btrfs_root *root);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 int level, int *slot);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a84e00d..9ba886c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9657,12 +9657,14 @@ int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range)
 }
 
 /*
- * btrfs_{start,end}_write() is similar to mnt_{want, drop}_write(),
- * they are used to prevent the some tasks writing data into the page cache
- * by nocow before the subvolume is snapshoted, but flush the data into
- * the disk 

Re: read block failed check_tree_block / Couldn't read chunk tree

2014-10-29 Thread Rene Thomas
I can't find that commit in the official repos.

git show gives: fatal: bad object 915902c5002485fb13d27c4b699a73fb66cc0f09

I found this one instead:

commit 2513077f2f830b4bc83d528bfb6979eb461918bd

btrfs-progs: fix device missing of btrfs fi show with seed devices


Thanks
René

2014-10-29 4:45 GMT+01:00 Anand Jain anand.j...@oracle.com:

 This is (most likely) due to the patch below:
 
 commit 915902c5002485fb13d27c4b699a73fb66cc0f09

 btrfs-progs: fix device missing of btrfs fi show with seed devices
 

  Could you try backing out the patch from progs and giving it a shot?
  And please report what you see. Thanks.





 On 10/25/14 00:43, Rene Thomas wrote:

   # btrfs --version
 Btrfs v3.17

   # btrfs fi show
 Label: 'mythstorage'  uuid: 9b454272-6800-4b3c-b196-9e180407a6cb
  Total devices 1 FS bytes used 2.36MiB
  devid    1 size 931.51GiB used 10.04GiB path /dev/sdd1

   Check tree block failed, want=5845480062976, have=0
 Check tree block failed, want=5845480062976, have=0
 Check tree block failed, want=5845480062976, have=65536
 Check tree block failed, want=5845480062976, have=0
 Check tree block failed, want=5845480062976, have=0
 read block failed check_tree_block
 Couldn't read chunk tree


Re: [PATCH v4] Btrfs: fix snapshot inconsistency after a file write followed by truncate

2014-10-29 Thread Chris Mason



On Wed, Oct 29, 2014 at 7:57 AM, Filipe Manana fdman...@suse.com wrote:
V2: Use a different approach to solve the problem. Don't start and wait for all
delalloc to finish after every expanding truncate, instead add an additional
flush at transaction commit time if we're doing a transaction commit that
creates snapshots.

V3: Removed useless test condition in wait_pending_snapshot_roots_delalloc().

V4: Use another approach that doesn't imply starting delalloc work and waiting
for it to finish at transaction commit time.


I like this one better ;)  Taking it for a spin here.

-chris




Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-10-29 Thread Liu Bo
On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote:
 
  Original Message 
 Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
 to reduce ENOSPC caused by unbalanced data/metadata allocation.
 From: Liu Bo bo.li@oracle.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014-10-27 16:14
 On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:
  Original Message 
 Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
 to reduce ENOSPC caused by unbalanced data/metadata allocation.
 From: Liu Bo bo.li@oracle.com
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2014-10-24 19:06
 On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:
 When btrfs allocates a chunk, it will try to allocate up to 1G for data and
 256M for metadata, or 10% of all the writeable space if there is enough
 10G for data,
  if (type & BTRFS_BLOCK_GROUP_DATA) {
  max_stripe_size = 1024 * 1024 * 1024;
  max_chunk_size = 10 * max_stripe_size;
 Oh, sorry, 10G is right.
 
 Any other comments?
 
 Thanks,
 Qu
 
 
...
 
 thanks,
 -liubo
 
 space for the stripe on device.
 
 However, when we run out of space, this allocation may cause unbalanced
 chunk allocation.
 For example, if there is only 1G of unallocated space and a request to
 allocate a DATA chunk is sent, all the space will be allocated as a data
 chunk, leaving later metadata chunk allocation requests unable to be
 satisfied, which causes ENOSPC.
 This is one of the common complaints from end users: ENOSPC happens
 even though there is still available space.
 Okay, I don't think this is the common case; AFAIK, most ENOSPC is caused
 by our runtime worst-case metadata reservation problem.
 
 btrfs has been inclined to create a fairly large metadata chunk (1G) in its
 initial mkfs stage, and a 256M metadata chunk is also a very large one.
 
 As for your example below, yes, we don't have space for metadata
 allocation, but do we really need to allocate a new one?
 
 Or am I missing something?
 
 thanks,
 -liubo
 Yes, that's true, this is not the common cause, but at least this
 patch may let the usage percentage reported by 'df' get as close to
 100% as possible before hitting ENOSPC under normal operations
 (if not using balance).
 
 And some cases like the one in the following mail may be improved by the patch:
 https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html
 
 I understand that most cases of a lot of free data space and no
 metadata space are caused by creating and then deleting large files,
 but if the last gigabytes can be allocated more carefully, at least
 the available bytes reported by 'df' should be reduced before hitting
 ENOSPC.
 
 What do you think about it?

Sorry for the late reply.

I just noticed that a recent commit has fixed this problem.

commit 47ab2a6c689913db23ccae38349714edf8365e0a
Author: Josef Bacik jba...@fb.com
Date:   Thu Sep 18 11:20:02 2014 -0400

Btrfs: remove empty block groups automatically

thanks,
-liubo

 
 Thanks,
 Qu
 
 This patch will try not to allocate a chunk larger than half of the
 unallocated space, making the last space more balanced at the small cost
 of more fragmented chunks in the last 1G.
 
 A simple example:
 Preallocate 17.5G on an empty 20G btrfs fs:
 [Before]
   # btrfs fi show /mnt/test
 Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
   Total devices 1 FS bytes used 17.50GiB
   devid    1 size 20.00GiB used 20.00GiB path /dev/sdb
 All space is allocated. No space is left for later metadata allocation.
 
 [After]
   # btrfs fi show /mnt/test
 Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
   Total devices 1 FS bytes used 17.50GiB
   devid    1 size 20.00GiB used 19.77GiB path /dev/sdb
 About 230M is still available for later metadata allocation.
 
 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
 ---
   fs/btrfs/volumes.c | 18 ++
   1 file changed, 18 insertions(+)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index d47289c..fa8de79 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
   int ret;
   u64 max_stripe_size;
   u64 max_chunk_size;
 + u64 total_avail_space = 0;
   u64 stripe_size;
   u64 num_bytes;
   u64 raid_stripe_len = BTRFS_STRIPE_LEN;
 @@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
   devices_info[ndevs].max_avail = max_avail;
   devices_info[ndevs].total_avail = total_avail;
   devices_info[ndevs].dev = device;
 + total_avail_space += total_avail;
   ++ndevs;
   }
   /*
 +  * Try not to occupy more than half of the unallocated space.
 +  * When run short of space and alloc all the space to
 +  * data/metadata will cause ENOSPC to be triggered more easily.
 +  *
 +  * And since the 

Re: Unable to fixup (regular) error in RAID1 fs

2014-10-29 Thread Chris Murphy

On Oct 29, 2014, at 2:08 AM, Juan Orti juan.o...@miceliux.com wrote:

 On 2014-10-29 04:02, Duncan wrote:
 Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:
 [ 3713.086292] BTRFS: unable to fixup (regular) error at logical 483011874816 on dev /dev/sdb2
 [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 4059963392, length 4096, links 1 (path: juan/.local/share/gnome-boxes/images/boxes-unknown)
 [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0
 [ 3713.093035] BTRFS: unable to fixup (regular) error at logical 483011948544 on dev /dev/sdb2
 Why can't it fix the errors? a bad device? smartctl says the disk is ok.
 I'm currently running a full scrub to see if it finds more errors. What
 should I do?
 Btrfs raid1, and I see you have it for both data and metadata.
 During normal operation, when btrfs comes across a block that doesn't
 match its checksum, it will look to see if there's another copy (which
 there is with raid1, which has exactly two copies) of that block and will
 try to use it instead if so.  If the second copy matches the checksum,
 all is fine and btrfs will in fact attempt to rewrite the bad copy using
 the good copy, as well as returning the good copy to whatever was
 reading it.
 Those corruption errors seem to indicate that it can't find a good
 copy to update the bad copy with -- both copies ended up bad.  Either
 that or it found the good copy and returned it to whatever was reading,
 but couldn't rewrite the bad copy, for some reason.
 I'm not sure which of those interpretations is correct, but given
 that you didn't see anything else bad happening, no apps returning
 errors due to read error, etc, I'd guess the second.  Because
 otherwise whatever was doing the read should have returned an
 error.
 
 When this error happened, I was editing some text files with vi, and it was
 painfully slow: it took 30 seconds to open a 20-line file, so something
 weird was going on. Anyway, no visible user space error could be seen.

Anything in dmesg prior to the previously reported errors?

Using either the syslog messages or journalctl, filter by btrfs and see what you
get for the past couple of days. Then also find out which ata ports the two
drives are on and filter by those; they are usually in the form ataX.00. You
could also search for "exception Emask" and see if anything comes up. That
would catch either controller or drive hardware error messages.


Chris Murphy



v3.18-rc2 on a 32-bit KVM gives: INFO: trying to register non-static key. the code is fine but needs lockdep annotation.

2014-10-29 Thread Toralf Förster
This is new to me, or is it?

Oct 29 17:53:04 n22kvmclone kernel: INFO: trying to register non-static key.
Oct 29 17:53:04 n22kvmclone kernel: the code is fine but needs lockdep annotation.
Oct 29 17:53:04 n22kvmclone kernel: turning off the locking correctness validator.
Oct 29 17:53:04 n22kvmclone kernel: CPU: 0 PID: 2525 Comm: trinity-c0 Not tainted 3.18.0-rc2 #1
Oct 29 17:53:04 n22kvmclone kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Oct 29 17:53:04 n22kvmclone kernel:    f55e5b70 c2a5d3ba  f55e5bc4 c2684a7b c2ba5888
Oct 29 17:53:04 n22kvmclone kernel:  c2a64822 f55a f55e5bb4   0001  f890b458
Oct 29 17:53:04 n22kvmclone kernel:   f55a0df4 f55a0e04 f40c3000 0100 c2d68578 f5719c94 f62b6ea0
Oct 29 17:53:04 n22kvmclone kernel: Call Trace:
Oct 29 17:53:04 n22kvmclone kernel:  [c2a5d3ba] dump_stack+0x41/0x52
Oct 29 17:53:04 n22kvmclone kernel:  [c2684a7b] __lock_acquire.isra.31+0x89b/0x9a0
Oct 29 17:53:04 n22kvmclone kernel:  [c2a64822] ? _raw_spin_unlock+0x22/0x30
Oct 29 17:53:04 n22kvmclone kernel:  [f890b458] ? btrfs_make_block_group+0x1d8/0x290 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [c263f360] ? native_wbinvd+0x10/0x10
Oct 29 17:53:04 n22kvmclone kernel:  [c26850ff] lock_acquire+0x8f/0x110
Oct 29 17:53:04 n22kvmclone kernel:  [f894d001] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f8949304] __btrfs_alloc_chunk+0x684/0xb10 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f894d001] ? btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f894d001] btrfs_alloc_chunk+0x41/0x50 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f8901f8d] do_chunk_alloc+0x1dd/0x410 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f88fc196] ? get_alloc_profile+0x166/0x2d0 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f8903144] btrfs_check_data_free_space+0x144/0x320 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f8930e8b] __btrfs_buffered_write+0x10b/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [f89316c0] btrfs_file_write_iter+0x3f0/0x6c0 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [c2755eba] ? do_iter_readv_writev+0x6a/0xa0
Oct 29 17:53:04 n22kvmclone kernel:  [c2755eba] do_iter_readv_writev+0x6a/0xa0
Oct 29 17:53:04 n22kvmclone kernel:  [f89312d0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [c2757210] do_readv_writev+0xa0/0x270
Oct 29 17:53:04 n22kvmclone kernel:  [f89312d0] ? __btrfs_buffered_write+0x550/0x550 [btrfs]
Oct 29 17:53:04 n22kvmclone kernel:  [c2755f80] ? do_sync_readv_writev+0x90/0x90
Oct 29 17:53:04 n22kvmclone kernel:  [c27720c0] ? __fdget_pos+0x30/0x40
Oct 29 17:53:04 n22kvmclone kernel:  [c269d6c1] ? do_setitimer+0x121/0x200
Oct 29 17:53:04 n22kvmclone kernel:  [c2a64992] ? _raw_spin_unlock_irq+0x22/0x40
Oct 29 17:53:04 n22kvmclone kernel:  [c269d6c1] ? do_setitimer+0x121/0x200
Oct 29 17:53:04 n22kvmclone kernel:  [c2757414] vfs_writev+0x34/0x60
Oct 29 17:53:04 n22kvmclone kernel:  [c27575d6] SyS_writev+0x56/0xe0
Oct 29 17:53:04 n22kvmclone kernel:  [c2a6522b] sysenter_do_call+0x12/0x12

-- 
Toralf
pgp key: 0076 E94E



Re: RAID1 fails to recover chunk tree

2014-10-29 Thread Zack Coffey


$ sudo mount -o degraded,ro /dev/sdd1 /asdf
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.
$ dmesg | tail
[524718.760792] BTRFS info (device sdd1): allowing degraded mounts
[524718.760800] BTRFS info (device sdd1): disk space caching is enabled
[524718.762087] BTRFS: failed to read chunk root on sdd1
[524718.776524] BTRFS: open_ctree failed

$ uname -a
Linux mach 3.17.1-52.g5c4d099-desktop #1 SMP PREEMPT Sat Oct 18 23:36:23 UTC 2014 (5c4d099) x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
Btrfs v3.16.2+20141003


On 10/28/2014 11:55 PM, Anand Jain wrote:



 Try 'mount -o degraded,ro' and
  see if there is any non-zero non-raid1 group profile.



On 10/29/14 04:32, Zack Coffey wrote:

Revisit of a previous issue. I set up a single 640GB drive with BTRFS and
compression. This was not a system drive, just a place to put random
junk.

Made a RAID1 of just the metadata with another drive. It was in
that state for less than 12 hours-ish; then I removed the second drive and
now cannot get to any data on the original drive. Data remained single
while only metadata was RAID1.

Single drive btrfs was made on Ubuntu with kernel 3.13.0 and tools
3.12.

$ sudo mount -o degraded /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

$ dmesg | tail
[45353.869448] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901511] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[46241.155923] btrfs: allowing degraded mounts
[46241.155927] btrfs: disk space caching is enabled
[46241.159436] btrfs: failed to read chunk root on sdc1
[46241.177815] btrfs: open_ctree failed

$ btrfs-show-super /dev/sdc1
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum                    0x93bcb1b5 [match]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
label
generation              60944
root                    909586694144
sys_array_size          97
chunk_root_generation   60938
root_level              1
chunk_root              911673917440
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1115871535104
bytes_used              321833435136
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x9
csum_type               0
csum_size               4
cache_generation        60944
uuid_tree_generation    60944
dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
dev_item.type           0
dev_item.total_bytes    615763673088
dev_item.bytes_used     324270030848
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0


$ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device


$ sudo btrfs device delete missing /dev/sdc1
ERROR: error removing the device 'missing' - Inappropriate ioctl for device


$ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

$ dmesg | tail
[106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.962009] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.124927] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.126475] btrfs: allowing 



Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
I'm in the middle of debugging the exact same thing.  3.17.0 -
rtorrent dies with SIGBUS.

I've done some debugging; the sequence is something like this:
open a new file
fallocate() to the final size
mmap() all (or a portion) of the file
write to the region
run SHA1 on that mmap'd region to validate the chunk
crash, eventually.  Generally not at the same point.

Reading that file (cat file > /dev/null) returns -EIO.

Looking up the process maps, the SIGBUS appears to be happening in the
middle of a mapped region of a pre-allocated file, i.e. it shouldn't
be.  I'm not completely ruling out an rtorrent bug, but it appears sane
to me.

Weirder: old files, that have been around a while, work just fine for seeding.
I've re-hashed my entire collection without an error.

Seeing this on both inherit-COW and no-inherit-COW files, and the
filesystem is not using compression.

The interesting part: going back and attempting to read the files
later, they sometimes don't throw an IO error.

Absolutely nothing in dmesg.

Working on a testcase that triggers it reliably but no luck so far.  I
thought I had bad RAM but two people upgrading to 3.17 and seeing the
same bug at around the same time can't be a coincidence.  I rebooted
to 3.17 on the 25th, the first new download was on the 28th and that
failed.

Working on a testcase for it that's more reproducible than "go grab
torrent files with rtorrent".

On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:
 Hi, it seems that when using rtorrent to download into a btrfs filesystem,
 it creates files that fail to read properly.
 For instance, rtorrent crashes, and if I try to rsync the file it
 was writing into somewhere else, rsync also fails with the message
 can't map file $file: Input/Output error (5).
 If I give it time, eventually the file gets into a good state and I can
 rsync it somewhere else (as long as rtorrent doesn't keep writing into
 it). This doesn't happen using ext4 on the same system.

 No btrfs errors, or any other errors, show up in any log. Scrubbing or
 balancing don't turn up any issues. I've tried using a subvolume mounted
 with nodatacow and/or flushoncommit, which didn't help. I'm not using
 quotas and at some point had a single snapshot that I deleted. The
 filesystem was originally created recently (on a 3.16.4+ kernel).

 Here's what the array looks like:

 Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
 Total devices 4 FS bytes used 3.14TiB
 devid    4 size 2.73TiB used 2.36TiB path /dev/sdd1
 devid    5 size 1.82TiB used 1.45TiB path /dev/sdc1
 devid    6 size 1.82TiB used 1.45TiB path /dev/sdb1
 devid    7 size 1.82TiB used 1.45TiB path /dev/sda1

 Btrfs v3.17

 Data, RAID1: total=3.34TiB, used=3.13TiB
 System, RAID1: total=32.00MiB, used=512.00KiB
 Metadata, RAID1: total=10.00GiB, used=7.31GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
 AuthenticAMD GNU/Linux

 I'm utterly puzzled and clueless at how to dig into this issue.


[bug] allows umount before transactions complete

2014-10-29 Thread Chris Murphy
Filed bug here with more details and complete dmesg attached:
https://bugzilla.kernel.org/show_bug.cgi?id=87131

kernel-3.18.0-0.rc2.git1.1.fc22.x86_64

SUMMARY:
After umount returned to the prompt and I physically disconnected the 2 devices (btrfs raid1 on raw devices), I got a backtrace with some scary messages. Here are some snippets:
[10570.371285] BTRFS: error (device sdc) in btrfs_commit_transaction:1917: errno=-5 IO failure (Error while writing out transaction)
[10570.372426] BTRFS info (device sdc): forced readonly
[10570.372432] BTRFS warning (device sdc): Skipping commit of aborted transaction.
[10570.372456] BTRFS: Transaction aborted (error -5)
[10570.372807] BTRFS: error (device sdc) in cleanup_transaction:1599: errno=-5 IO failure
[10570.373960] BTRFS info (device sdc): delayed_refs has NO entry

After reboot, kernel shows both devids have the same generation.
btrfs check comes up clean.
mount also has no complaints, and mounts rw.


Chris Murphy


Re: RAID1 fails to recover chunk tree

2014-10-29 Thread Robert White

On 10/28/2014 01:32 PM, Zack Coffey wrote:

Made a RAID1 with another drive of just the metadata. Was in
that state for less than 12 hours-ish, removed the second drive and
now cannot get to any data on the original drive. Data remained single
while only metadata was RAID1.


I don't know all the details, but I would _never_ have expected the action you
described _not_ to hose up the file system.


The single profile is not restricted to one drive; it's concatenation, as
in treating the entire space as if it were a single drive.


In that twelve-hour window, data migrated. I _think_ directories may
count as data in this sense. If a key element (say the root directory)
migrated onto the disk you eventually removed, then there is no root
directory to read. And if not the root, then any secondary directory you choose.


So sure, your checksum trees and your extent maps were all duplicated in
the mirror, but your actual data -- you know, all those files that were
copied on write -- may well be only on that second drive you pulled out.


RAID1 metadata with non-RAID1 data would not safely allow for the failure
(or removal) of one drive.


I'm not sure what you expected to happen but what you did is full of fail.

You need to put the second drive back in and then coerce all the data 
back to the first drive. btrfs device delete is what you want. You 
_may_ need to switch the metadata back to single before the delete.


--Rob.


Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
The following code reliably throws a SIGBUS in the memset, and
cat testfile > /dev/null returns an IO error.

I've sometimes gotten as high as iteration 900 before a SIGBUS, so
don't assume a single clean run means it's OK.

Linux 3.17.0, SATA -> MD(raid5) -> bcache (ssd) -> btrfs

Working on eliminating more variables.

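#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MB  (1024ull * 1024)
#define GB  (1024ull * MB)
#define TEST_SIZE   (4096)  /* file size in MB */

int main(void) {
	int fd;
	srandom(1024);
	fd = open("testfile", O_RDWR|O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* preallocate the whole file up front, as rtorrent does */
	if (posix_fallocate(fd, 0, TEST_SIZE * MB) != 0) {
		fprintf(stderr, "posix_fallocate failed\n");
		return 1;
	}

	uint8_t *map = NULL;

	int i;
	for (i = 0; i < 1000; i++) {
		/* random 1 MB-aligned offset inside the preallocated file */
		off_t location = (off_t)(random() % (TEST_SIZE - 1)) * MB;
		map = mmap(map, MB, PROT_READ|PROT_WRITE, MAP_SHARED,
			   fd, location);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		printf("%d: writing at %04lld mb\n", i, (long long)(location / MB));

		/* on affected kernels the SIGBUS is raised in this memset */
		memset(map, 0x5a, 1 * MB);
		msync(map, 1 * MB, MS_ASYNC);

		munmap(map, MB);
	}
	close(fd);
	return 0;
}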
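To see where inside the mapping the fault lands without attaching a debugger, a test program like this one could install a SIGBUS handler that prints the faulting address (si_addr), which can then be compared with the mmap() return value and /proc/self/maps. A minimal sketch, assuming POSIX sigaction() with SA_SIGINFO; install_sigbus_reporter() and format_hex() are hypothetical helpers, not part of rtorrent or the original test case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write v as "0x..." hex into buf (needs room for 2 + 2 * sizeof(uintptr_t)
 * bytes); returns the length. Uses no library calls, so it is usable in a
 * signal handler. */
static size_t format_hex(uintptr_t v, char *buf)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[2 * sizeof(uintptr_t)];
	size_t n = 0, len = 0;

	do {
		tmp[n++] = digits[v & 0xf];
		v >>= 4;
	} while (v != 0);
	buf[len++] = '0';
	buf[len++] = 'x';
	while (n > 0)
		buf[len++] = tmp[--n];
	return len;
}

/* Report the faulting address with write(), then exit immediately. */
static void sigbus_reporter(int sig, siginfo_t *info, void *ctx)
{
	static const char prefix[] = "SIGBUS at ";
	char buf[sizeof(prefix) + 2 + 2 * sizeof(uintptr_t) + 1];
	size_t len;
	ssize_t ignored;

	(void)sig;
	(void)ctx;
	for (len = 0; prefix[len] != '\0'; len++)
		buf[len] = prefix[len];
	len += format_hex((uintptr_t)info->si_addr, buf + len);
	buf[len++] = '\n';
	ignored = write(STDERR_FILENO, buf, len);
	(void)ignored;
	_exit(1);
}

/* Install the handler; returns 0 on success, -1 on failure (see errno). */
int install_sigbus_reporter(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = sigbus_reporter;
	sa.sa_flags = SA_SIGINFO;
	return sigaction(SIGBUS, &sa, NULL);
}
```

Calling install_sigbus_reporter() at the top of main() and printing map alongside each iteration would show whether the faulting address really sits in the middle of the freshly mapped 1 MB window.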

On Wed, Oct 29, 2014 at 5:50 PM, Dan Merillat dan.meril...@gmail.com wrote:
 I'm in the middle of debugging the exact same thing.  3.17.0 -
 rtorrent dies with SIGBUS.

 I've done some debugging, the sequence is something like this:
 open a new file
 fallocate() to the final size
 mmap() all (or a portion) of the file
 write to the region
 run SHA1 on that mmap'd region to validate the chunk
 crash, eventually.  Generally not at the same point.

 Reading that file (with cat, redirected to /dev/null) returns -EIO.

 Looking up the process maps, the SIGBUS appears to be happening in the
 middle of a mapped region of a pre-allocated file, i.e. where it shouldn't
 be.  I'm not completely ruling out an rtorrent bug, but it appears sane
 to me.

 Weirder: old files, that have been around a while, work just fine for 
 seeding.
 I've re-hashed my entire collection without an error.

 Seeing this on both inherit-COW and no-inherit-COW files, and the
 filesystem is not using compression.

 The interesting part is that when going back and attempting to read the
 files later, they sometimes don't throw an IO error.

 Absolutely nothing in dmesg.

 Working on a testcase that triggers it reliably but no luck so far.  I
 thought I had bad RAM but two people upgrading to 3.17 and seeing the
 same bug at around the same time can't be a coincidence.  I rebooted
 to 3.17 on the 25th, the first new download was on the 28th and that
 failed.

 Working on a testcase for it that's more reproducible than "go grab
 torrent files with rtorrent".

 On Tue, Oct 28, 2014 at 12:49 PM, Alec Blayne a...@tevsa.net wrote:
 Hi, it seems that using rtorrent to download into a btrfs filesystem
 leads to the creation of files that fail to read properly.
 For instance, I get rtorrent to crash, but if I try to rsync the file it
 was writing to someplace else, rsync also fails with the message
 "can't map file $file: Input/Output error (5)".
 If I give it time, eventually the file gets into a good state and I can
 rsync it somewhere else (as long as rtorrent doesn't keep writing into
 it). This doesn't happen using ext4 on the same system.

 No btrfs errors, or any other errors, show up in any log. Scrubbing or
 balancing don't turn up any issues. I've tried using a subvolume mounted
 with nodatacow and/or flushoncommit, which didn't help. I'm not using
 quotas and at some point had a single snapshot that I deleted. The
 filesystem was originally created recently (on a 3.16.4+ kernel).

 Here's what the array looks like:

 Label: 'data'  uuid: ffe83a3d-f4ba-46b7-8424-4ec3380cb811
     Total devices 4 FS bytes used 3.14TiB
     devid    4 size 2.73TiB used 2.36TiB path /dev/sdd1
     devid    5 size 1.82TiB used 1.45TiB path /dev/sdc1
     devid    6 size 1.82TiB used 1.45TiB path /dev/sdb1
     devid    7 size 1.82TiB used 1.45TiB path /dev/sda1

 Btrfs v3.17

 Data, RAID1: total=3.34TiB, used=3.13TiB
 System, RAID1: total=32.00MiB, used=512.00KiB
 Metadata, RAID1: total=10.00GiB, used=7.31GiB
 GlobalReserve, single: total=512.00MiB, used=0.00B


 On linux 3.17.1: Linux 3.17.1-gentoo-r1 #3 SMP PREEMPT Tue Oct 28
 02:43:11 WET 2014 x86_64 AMD Athlon(tm) 5350 APU with Radeon(tm) R3
 AuthenticAMD GNU/Linux

 I'm utterly puzzled and clueless at how to dig into this issue.


Re: RAID1 fails to recover chunk tree

2014-10-29 Thread Robert White

On 10/29/2014 03:26 PM, Robert White wrote:

On 10/28/2014 01:32 PM, Zack Coffey wrote:

Made a RAID1 with another drive of just the metadata. Was in
that state for less than 12 hours-ish, removed the second drive and
now cannot get to any data on the original drive. Data remained single
while only metadata was RAID1.


I don't know all the details but I would _never_ suspect the action you
described to _not_ hose up the file system.
You need to put the second drive back in and then coerce all the data
back to the first drive. btrfs device delete is what you want. You
_may_ need to switch the metadata back to single before the delete.

--Rob.



P.S. I am/was assuming you said "removed the second drive" in the normal 
sense of disconnecting and removing, as opposed to the semantic action 
of deleting the device element.


If you did do the btrfs delete, you might have needed to do a btrfs 
filesystem sync to make sure that all the transactions involved in the 
delete were finished and flushed to disk.


Either way, physically reattaching the second drive is your first 
step; presuming again that you haven't destroyed the partition or 
re-used the drive etc. If the partition will mount once the second drive 
is in place, do the delete operation (if you didn't) and then the sync 
(to make sure that everything has finished migrating etc). Then you 
should be able to re-remove the physical drive.


If you already did the delete and sync as part of what you meant by 
"remove", then sorry for the interruption of your misery. 8-)




Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-10-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to 
reduce ENOSPC caused by unbalanced data/metadata allocation.

From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-29 22:29

On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-27 16:14

On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-24 19:06

On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:

When btrfs allocates a chunk, it will try to allocate up to 1G for data and
256M for metadata, or 10% of all the writeable space if there is enough

10G for data,
 if (type & BTRFS_BLOCK_GROUP_DATA) {
 max_stripe_size = 1024 * 1024 * 1024;
 max_chunk_size = 10 * max_stripe_size;

Oh, sorry, 10G is right.
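The limits discussed above (a 1G data stripe, a 10G maximum data chunk, 256M for metadata, and no chunk larger than 10% of the writeable space) can be expressed as a small function. This is a simplified sketch, not the kernel's __btrfs_alloc_chunk(); max_chunk_bytes() is a hypothetical name, and the other cases the real allocator handles (system chunks, per-device free space, RAID profiles) are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ull * 1024 * 1024)
#define SZ_1G   (1024ull * 1024 * 1024)

/*
 * Simplified upper bound on a new chunk, using the numbers quoted in this
 * thread: data stripes up to 1G and data chunks up to 10 stripes (10G),
 * metadata chunks up to 256M, and never more than 10% of writeable space.
 */
static uint64_t max_chunk_bytes(int is_data, uint64_t total_rw_bytes)
{
	uint64_t max_stripe = is_data ? SZ_1G : SZ_256M;
	uint64_t max_chunk = is_data ? 10 * max_stripe : max_stripe;
	uint64_t ten_percent = total_rw_bytes / 10;

	return max_chunk < ten_percent ? max_chunk : ten_percent;
}
```

On a 200G filesystem the data limit is the full 10G, while on a 20G filesystem the 10% rule caps it at 2G.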

Any other comments?

Thanks,
Qu



...

thanks,
-liubo


space for the stripe on device.

However, when we run out of space, this allocation may cause unbalanced
chunk allocation.
For example, if there is only 1G of unallocated space and a request to
allocate a DATA chunk is sent, all the space will be allocated as a data
chunk, leaving a later metadata chunk allocation request unable to be
satisfied, which will cause ENOSPC.
This is one of the common complaints from end users: why does ENOSPC
happen when there is still available space?

Okay, I don't think this is the common case; AFAIK, most ENOSPC errors
are caused by our runtime worst-case metadata reservation problem.

btrfs has been inclined to create a fairly large metadata chunk (1G) in
its initial mkfs stage, and a 256M metadata chunk is also a very large one.

As for your example below, yes, we don't have space for metadata
allocation, but do we really need to allocate a new one?

Or am I missing something?

thanks,
-liubo

Yes, that's true, this is not the common cause, but at least this
patch may bring the usage percentage reported by the 'df' command
as close to 100% as possible before hitting ENOSPC under normal
operations (if not using balance).

And some cases like the one in the following mail may be improved by the patch:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html

I understand that most cases of a lot of free data space and no
metadata space are caused by creating and then deleting large files,
but if the last gigabytes can be allocated more carefully, at least
the available bytes reported by 'df' should shrink before hitting
ENOSPC.

What do you think about it?

Sorry for the late reply.

I just noticed that a recent commit has fixed this problem.

commit 47ab2a6c689913db23ccae38349714edf8365e0a
Author: Josef Bacik jba...@fb.com
Date:   Thu Sep 18 11:20:02 2014 -0400

 Btrfs: remove empty block groups automatically
 
thanks,

-liubo

Oh, that's much better than my patch.

So please ignore my patch.

Thanks,
Qu



Thanks,
Qu

This patch will try not to allocate a chunk larger than half of the
unallocated space, making the last space more balanced at the small cost
of more fragmented chunks in the last 1G.
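For the 1G example given earlier in the patch description, the difference between the existing behaviour and the proposed rule can be sketched as two functions. These are simplified, hypothetical helpers (chunk_size_old, chunk_size_half_cap), not the code from the patch, which sums total_avail_space across devices inside __btrfs_alloc_chunk().

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1024ull * 1024 * 1024)

/* Simplified old behaviour: take the requested size if it fits. */
static uint64_t chunk_size_old(uint64_t wanted, uint64_t unallocated)
{
	return wanted < unallocated ? wanted : unallocated;
}

/* Simplified version of the proposed rule: never take more than half of
 * the remaining unallocated space for a single chunk. */
static uint64_t chunk_size_half_cap(uint64_t wanted, uint64_t unallocated)
{
	uint64_t half = unallocated / 2;

	return wanted < half ? wanted : half;
}
```

With 1G unallocated and a 1G data request, the old behaviour takes the whole 1G and leaves nothing for metadata, while the capped version takes 512M and leaves 512M. With plenty of space left (say 20G unallocated) both return 1G, so normal allocation is unaffected.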

A simple example:
Preallocate 17.5G on an empty 20G btrfs fs:
[Before]
  # btrfs fi show /mnt/test
Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
    Total devices 1 FS bytes used 17.50GiB
    devid    1 size 20.00GiB used 20.00GiB path /dev/sdb
All space is allocated; no space is left for later metadata allocation.

[After]
  # btrfs fi show /mnt/test
Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
    Total devices 1 FS bytes used 17.50GiB
    devid    1 size 20.00GiB used 19.77GiB path /dev/sdb
About 230M is still available for later metadata allocation.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  fs/btrfs/volumes.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d47289c..fa8de79 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
int ret;
u64 max_stripe_size;
u64 max_chunk_size;
+   u64 total_avail_space = 0;
u64 stripe_size;
u64 num_bytes;
u64 raid_stripe_len = BTRFS_STRIPE_LEN;
@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
devices_info[ndevs].max_avail = max_avail;
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
+   total_avail_space += total_avail;
 

[PATCH 1/2] btrfs-progs: make the search target device routine more clear for fi show

2014-10-29 Thread Gui Hecheng
Extract the procedure of searching for a target device for fi show
from the @map_seed_devices() function to make it more clear.

Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
 cmds-filesystem.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index bb5881e..6437e57 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -742,14 +742,10 @@ static int find_and_copy_seed(struct btrfs_fs_devices *seed,
return 1;
 }
 
-static int map_seed_devices(struct list_head *all_uuids,
-   char *search, int *found)
+static int search_umounted_fs_uuids(struct list_head *all_uuids,
+   char *search)
 {
-   struct btrfs_fs_devices *cur_fs, *cur_seed;
-   struct btrfs_fs_devices *fs_copy, *seed_copy;
-   struct btrfs_fs_devices *opened_fs;
-   struct btrfs_device *device;
-   struct btrfs_fs_info *fs_info;
+   struct btrfs_fs_devices *cur_fs, *fs_copy;
struct list_head *fs_uuids;
int ret = 0;
 
@@ -764,7 +760,7 @@ static int map_seed_devices(struct list_head *all_uuids,
if (search) {
if (uuid_search(cur_fs, search) == 0)
continue;
-   *found = 1;
+   ret = 1;
}
 
fs_copy = malloc(sizeof(*fs_copy));
@@ -782,6 +778,22 @@ static int map_seed_devices(struct list_head *all_uuids,
list_add(&fs_copy->list, all_uuids);
}
 
+out:
+   return ret;
+}
+
+static int map_seed_devices(struct list_head *all_uuids)
+{
+   struct btrfs_fs_devices *cur_fs, *cur_seed;
+   struct btrfs_fs_devices *seed_copy;
+   struct btrfs_fs_devices *opened_fs;
+   struct btrfs_device *device;
+   struct btrfs_fs_info *fs_info;
+   struct list_head *fs_uuids;
+   int ret = 0;
+
+   fs_uuids = btrfs_scanned_uuids();
+
list_for_each_entry(cur_fs, all_uuids, list) {
device = list_first_entry(&cur_fs->devices,
struct btrfs_device, dev_list);
@@ -943,11 +955,18 @@ devs_only:
return 1;
}
 
+   found = search_umounted_fs_uuids(all_uuids, search);
+   if (found < 0) {
+   fprintf(stderr,
+   "ERROR: %d while searching target device\n", ret);
+   return 1;
+   }
+
/*
 * scan_for_btrfs() don't build seed/sprout mapping,
 * do mapping build for each scanned fs here
 */
-   ret = map_seed_devices(all_uuids, search, found);
+   ret = map_seed_devices(all_uuids);
if (ret) {
fprintf(stderr,
"ERROR: %d while mapping seed devices\n", ret);
-- 
1.8.1.4



[PATCH 2/2] btrfs-progs: skip mounted fs when deal with umounted ones for fi show

2014-10-29 Thread Gui Hecheng
Stalling problems may happen when executing the balance and fi show commands concurrently.

With the following commit:
commit 915902c500
btrfs-progs: fix device missing of btrfs fi show with seed devices

The fi show command will touch the mounted fs, although only the unmounted fs
should be handled after @btrfs_scan_kernel() has finished showing all
mounted ones.

We can skip the mounted fs after @btrfs_scan_kernel() is done, so that tasks
keep going on the mounted fs while fi show continues on the unmounted ones
separately.

Reported-by: Petr Janecek jane...@ucw.cz
Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
---
 cmds-filesystem.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 6437e57..67fe52b 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -53,6 +53,15 @@ struct seen_fsid {
 
 static struct seen_fsid *seen_fsid_hash[SEEN_FSID_HASH_SIZE] = {NULL,};
 
+static int is_seen_fsid(u8 *fsid)
+{
+   u8 hash = fsid[0];
+   int slot = hash % SEEN_FSID_HASH_SIZE;
+   struct seen_fsid *seen = seen_fsid_hash[slot];
+
+   return seen ? 1 : 0;
+}
+
 static int add_seen_fsid(u8 *fsid)
 {
u8 hash = fsid[0];
@@ -763,6 +772,10 @@ static int search_umounted_fs_uuids(struct list_head *all_uuids,
ret = 1;
}
 
+   /* skip all fs already shown as mounted fs */
+   if (is_seen_fsid(cur_fs->fsid))
+   continue;
+
fs_copy = malloc(sizeof(*fs_copy));
if (!fs_copy) {
ret = -ENOMEM;
-- 
1.8.1.4
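The seen_fsid table used above is a small hash keyed only on the first fsid byte, so different filesystems can share a slot; as posted, is_seen_fsid() reports a hit whenever the slot is occupied. For illustration, a lookup that walks a slot's chain and compares the whole fsid could look like the following self-contained sketch. The names, FSID_SIZE and SEEN_HASH_SIZE are assumptions for this example, not btrfs-progs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FSID_SIZE 16        /* assumed: fsids are 16-byte UUIDs */
#define SEEN_HASH_SIZE 256  /* assumed table size for this sketch */

struct seen_entry {
	uint8_t fsid[FSID_SIZE];
	struct seen_entry *next;  /* chain of entries sharing a slot */
};

static struct seen_entry *seen_hash[SEEN_HASH_SIZE];

/* Return 1 if this exact fsid has been recorded, 0 otherwise. */
static int fsid_seen(const uint8_t *fsid)
{
	const struct seen_entry *e = seen_hash[fsid[0] % SEEN_HASH_SIZE];

	for (; e; e = e->next)
		if (memcmp(e->fsid, fsid, FSID_SIZE) == 0)
			return 1;
	return 0;
}

/* Record fsid; returns 0 on success, 1 if already present, -1 on ENOMEM. */
static int fsid_mark_seen(const uint8_t *fsid)
{
	int slot = fsid[0] % SEEN_HASH_SIZE;
	struct seen_entry *e;

	if (fsid_seen(fsid))
		return 1;
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	memcpy(e->fsid, fsid, FSID_SIZE);
	e->next = seen_hash[slot];
	seen_hash[slot] = e;
	return 0;
}
```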



[PATCH] btrfs-progs: rebuild missing block group during chunk recovery if possible

2014-10-29 Thread Qu Wenruo
Before the patch, a chunk will be considered bad if the corresponding
block group is missing, even if the only uncertain data is the 'used'
member of the block group.

This patch will try to recalculate the 'used' value of the block group
and rebuild it.
So even if only the chunk item and dev extent items are found, the chunk
can be recovered.
However, if the extent tree is damaged and the needed extent items can't
be read, the block group's 'used' value will be set to the block group
length, to prevent any later write/block reservation from damaging the
block group.
In that case, we will prompt the user and recommend using
'--init-extent-tree' to rebuild the extent tree if possible.
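One way to express the 'used' recalculation described above is to sum the extents that fall inside the block group, with the whole block group length as a conservative fallback when the extent tree can't be read. A simplified sketch: struct extent_model and rebuild_bg_used() are hypothetical, and the real patch finds extents by searching the extent tree rather than from an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent item: logical start and length. */
struct extent_model {
	uint64_t start;
	uint64_t len;
};

/*
 * Sum the extents whose start lies in [bg_start, bg_start + bg_len).
 * If the extent tree could not be read (extents == NULL), report the
 * block group as completely used so nothing new is allocated into it.
 */
static uint64_t rebuild_bg_used(uint64_t bg_start, uint64_t bg_len,
				const struct extent_model *extents, size_t n)
{
	uint64_t used = 0;
	size_t i;

	if (extents == NULL)
		return bg_len;
	for (i = 0; i < n; i++) {
		const struct extent_model *e = &extents[i];

		if (e->start >= bg_start && e->start < bg_start + bg_len)
			used += e->len;
	}
	return used;
}
```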

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfsck.h   |   3 +-
 chunk-recover.c | 242 +---
 cmds-check.c|  29 ---
 3 files changed, 234 insertions(+), 40 deletions(-)

diff --git a/btrfsck.h b/btrfsck.h
index 356c767..7a50648 100644
--- a/btrfsck.h
+++ b/btrfsck.h
@@ -179,5 +179,6 @@ btrfs_new_device_extent_record(struct extent_buffer *leaf,
 int check_chunks(struct cache_tree *chunk_cache,
 struct block_group_tree *block_group_cache,
 struct device_extent_tree *dev_extent_cache,
-struct list_head *good, struct list_head *bad, int silent);
+struct list_head *good, struct list_head *bad,
+struct list_head *rebuild, int silent);
 #endif
diff --git a/chunk-recover.c b/chunk-recover.c
index 6f43066..dbf98b5 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -61,6 +61,7 @@ struct recover_control {
 
struct list_head good_chunks;
struct list_head bad_chunks;
+   struct list_head rebuild_chunks;
struct list_head unrepaired_chunks;
pthread_mutex_t rc_lock;
 };
@@ -203,6 +204,7 @@ static void init_recover_control(struct recover_control *rc, int verbose,
 
INIT_LIST_HEAD(&rc->good_chunks);
INIT_LIST_HEAD(&rc->bad_chunks);
+   INIT_LIST_HEAD(&rc->rebuild_chunks);
INIT_LIST_HEAD(&rc->unrepaired_chunks);
 
rc->verbose = verbose;
@@ -529,22 +531,32 @@ static void print_check_result(struct recover_control *rc)
return;
 
printf("CHECK RESULT:\n");
-   printf("Healthy Chunks:\n");
+   printf("Recoverable Chunks:\n");
list_for_each_entry(chunk, &rc->good_chunks, list) {
print_chunk_info(chunk, "  ");
good++;
total++;
}
-   printf("Bad Chunks:\n");
+   list_for_each_entry(chunk, &rc->rebuild_chunks, list) {
+   print_chunk_info(chunk, "  ");
+   good++;
+   total++;
+   }
+   list_for_each_entry(chunk, &rc->unrepaired_chunks, list) {
+   print_chunk_info(chunk, "  ");
+   good++;
+   total++;
+   }
+   printf("Unrecoverable Chunks:\n");
list_for_each_entry(chunk, &rc->bad_chunks, list) {
print_chunk_info(chunk, "  ");
bad++;
total++;
}
printf("\n");
-   printf("Total Chunks:\t%d\n", total);
-   printf("  Heathy:\t%d\n", good);
-   printf("  Bad:\t%d\n", bad);
+   printf("Total Chunks:\t\t%d\n", total);
+   printf("  Recoverable:\t\t%d\n", good);
+   printf("  Unrecoverable:\t%d\n", bad);
 
printf("\n");
printf("Orphan Block Groups:\n");
@@ -555,6 +567,7 @@ static void print_check_result(struct recover_control *rc)
printf("Orphan Device Extents:\n");
list_for_each_entry(devext, &rc->devext.no_chunk_orphans, chunk_list)
print_device_extent_info(devext, "  ");
+   printf("\n");
 }
 
 static int check_chunk_by_metadata(struct recover_control *rc,
@@ -938,6 +951,11 @@ static int build_device_maps_by_chunk_records(struct recover_control *rc,
if (ret)
return ret;
}
+   list_for_each_entry(chunk, &rc->rebuild_chunks, list) {
+   ret = build_device_map_by_chunk_record(root, chunk);
+   if (ret)
+   return ret;
+   }
return ret;
 }
 
@@ -1168,12 +1186,31 @@ static int __rebuild_device_items(struct btrfs_trans_handle *trans,
return ret;
 }
 
+static int __insert_chunk_item(struct btrfs_trans_handle *trans,
+   struct chunk_record *chunk_rec,
+   struct btrfs_root *chunk_root)
+{
+   struct btrfs_key key;
+   struct btrfs_chunk *chunk = NULL;
+   int ret = 0;
+
+   chunk = create_chunk_item(chunk_rec);
+   if (!chunk)
+   return -ENOMEM;
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+   key.offset = chunk_rec->offset;
+
+   ret = btrfs_insert_item(trans, chunk_root, &key, chunk,
+   btrfs_chunk_item_size(chunk->num_stripes));
+   free(chunk);
+   return ret;
+}
+
 static int __rebuild_chunk_items(struct 

Re: Btrfs raid1 array has issues with rtorrent usage pattern.

2014-10-29 Thread Dan Merillat
It's specifically BTRFS related, I was able to reproduce it on a bare
drive (no lvm, no md, no bcache).  It's not bad RAM, I was able to
reproduce it on multiple machines running either 3.17 or late RCs.

I've tested 3.18-rc2 for about 2 hours now, can't get any failures, so
that's good.  If anyone else can reproduce this it'll probably need to
be sent to 3.17-stable.
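To confirm a fix like this, each file written by the reproducer can be read back in full: the same check as cat'ing it to /dev/null, but reporting the offset of the first failing read. A minimal sketch assuming POSIX pread(); find_first_read_error() is a hypothetical helper, not part of any tool mentioned in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `path` from start to end. Returns 0 if every byte was readable,
 * otherwise the errno of the first failure (e.g. EIO); for a failed read,
 * the offset of that read is stored in *bad_off.
 */
int find_first_read_error(const char *path, off_t *bad_off)
{
	static char buf[1 << 20];  /* read in 1 MiB pieces */
	off_t off = 0;
	int fd, err;

	assert(bad_off != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;
	for (;;) {
		ssize_t n = pread(fd, buf, sizeof(buf), off);

		if (n < 0) {
			if (errno == EINTR)
				continue;  /* interrupted, retry same offset */
			err = errno;
			*bad_off = off;
			close(fd);
			return err;
		}
		if (n == 0)
			break;  /* end of file */
		off += n;
	}
	close(fd);
	return 0;
}
```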

On Wed, Oct 29, 2014 at 7:24 PM, Alec Blayne a...@tevsa.net wrote:
 Really nice to know it's already getting handled :)

 I'm already downgrading to 3.16.6 now that I know I won't have that
 issue. I was already planning to because of the read-only snapshots issue.

 Thank you and good luck debugging!



Re: RAID1 fails to recover chunk tree

2014-10-29 Thread Anand Jain



 Just noticed your case is different from the others I've seen/worked on.
 In yours, the layout has an issue; it's not about the RAID. Sorry.

 Try: mount -o recovery,ro



On 10/30/2014 03:32 AM, Zack Coffey wrote:


$ sudo mount -o degraded,ro /dev/sdd1 /asdf
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.
$ dmesg | tail
[524718.760792] BTRFS info (device sdd1): allowing degraded mounts
[524718.760800] BTRFS info (device sdd1): disk space caching is enabled
[524718.762087] BTRFS: failed to read chunk root on sdd1
[524718.776524] BTRFS: open_ctree failed

$ uname -a
Linux mach 3.17.1-52.g5c4d099-desktop #1 SMP PREEMPT Sat Oct 18 23:36:23 UTC 2014 (5c4d099) x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
Btrfs v3.16.2+20141003


On 10/28/2014 11:55 PM, Anand Jain wrote:



 Try 'mount -o degraded,ro' and
  see if there is any non-zero, non-RAID1 group profile.



On 10/29/14 04:32, Zack Coffey wrote:

Revisit of a previous issue. Set up a single 640GB drive with BTRFS and
compression. This was not a system drive, just a place to put random
junk.

Made a RAID1 with another drive of just the metadata. Was in
that state for less than 12 hours-ish, removed the second drive and
now cannot get to any data on the original drive. Data remained single
while only metadata was RAID1.

Single drive btrfs was made on Ubuntu with kernel 3.13.0 and tools
3.12.

$ sudo mount -o degraded /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

$ dmesg | tail
[45353.869448] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901511] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[46241.155923] btrfs: allowing degraded mounts
[46241.155927] btrfs: disk space caching is enabled
[46241.159436] btrfs: failed to read chunk root on sdc1
[46241.177815] btrfs: open_ctree failed

$ btrfs-show-super /dev/sdc1
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum                    0x93bcb1b5 [match]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
label
generation              60944
root                    909586694144
sys_array_size          97
chunk_root_generation   60938
root_level              1
chunk_root              911673917440
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1115871535104
bytes_used              321833435136
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x9
csum_type               0
csum_size               4
cache_generation        60944
uuid_tree_generation    60944
dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
dev_item.type           0
dev_item.total_bytes    615763673088
dev_item.bytes_used     324270030848
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
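The "magic _BHRfS_M [match]" line above is btrfs-show-super confirming the superblock signature in the primary superblock at byte 65536 (the bytenr=65536 in its header). A minimal sketch of the same check in C, assuming the standard on-disk layout where the 8-byte magic sits at offset 64 of the superblock, after the 32-byte checksum, 16-byte fsid, bytenr and flags. has_btrfs_magic() is a hypothetical helper; it does not verify the checksum or any other field.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BTRFS_SUPER_OFFSET 65536  /* primary superblock location */
#define BTRFS_MAGIC_OFFSET 64     /* after csum[32], fsid[16], bytenr, flags */

static const char btrfs_magic[] = "_BHRfS_M";  /* 8 bytes, no NUL on disk */

/* Returns 1 if the primary superblock of `path` carries the btrfs magic,
 * 0 if not (including files too short to hold a superblock),
 * -1 if the file or device could not be opened or read. */
int has_btrfs_magic(const char *path)
{
	char magic[8];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, magic, sizeof(magic),
		  BTRFS_SUPER_OFFSET + BTRFS_MAGIC_OFFSET);
	close(fd);
	if (n < 0)
		return -1;
	return n == (ssize_t)sizeof(magic) &&
	       memcmp(magic, btrfs_magic, sizeof(magic)) == 0;
}
```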


$ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device

$ sudo btrfs device delete missing /dev/sdc1
ERROR: error removing the device 'missing' - Inappropriate ioctl for device

$ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail  or so

$ dmesg | tail
[106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.962009] btrfs: device fsid

Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-10-29 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to 
reduce ENOSPC caused by unbalanced data/metadata allocation.

From: Qu Wenruo quwen...@cn.fujitsu.com
To: bo.li@oracle.com
Date: 2014-10-30 08:58


 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm 
to reduce ENOSPC caused by unbalanced data/metadata allocation.

From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-29 22:29

On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-27 16:14

On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:

 Original Message 
Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
to reduce ENOSPC caused by unbalanced data/metadata allocation.
From: Liu Bo bo.li@oracle.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-10-24 19:06

On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:
When btrfs allocates a chunk, it will try to allocate up to 1G for data and
256M for metadata, or 10% of all the writeable space if there is enough

10G for data,
 if (type & BTRFS_BLOCK_GROUP_DATA) {
 max_stripe_size = 1024 * 1024 * 1024;
 max_chunk_size = 10 * max_stripe_size;

Oh, sorry, 10G is right.

Any other comments?

Thanks,
Qu



...

thanks,
-liubo


space for the stripe on device.

However, when we run out of space, this allocation may cause unbalanced
chunk allocation.
For example, if there is only 1G of unallocated space and a request to
allocate a DATA chunk is sent, all the space will be allocated as a data
chunk, leaving a later metadata chunk allocation request unable to be
satisfied, which will cause ENOSPC.
This is one of the common complaints from end users: why does ENOSPC
happen when there is still available space?

Okay, I don't think this is the common case; AFAIK, most ENOSPC errors
are caused by our runtime worst-case metadata reservation problem.

btrfs has been inclined to create a fairly large metadata chunk (1G) in
its initial mkfs stage, and a 256M metadata chunk is also a very large one.

As for your example below, yes, we don't have space for metadata
allocation, but do we really need to allocate a new one?

Or am I missing something?

thanks,
-liubo

Yes, that's true, this is not the common cause, but at least this
patch may bring the usage percentage reported by the 'df' command
as close to 100% as possible before hitting ENOSPC under normal
operations (if not using balance).

And some cases like the one in the following mail may be improved by the patch:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36097.html

I understand that most cases of a lot of free data space and no
metadata space are caused by creating and then deleting large files,
but if the last gigabytes can be allocated more carefully, at least
the available bytes reported by 'df' should shrink before hitting
ENOSPC.

What do you think about it?

Sorry for the late reply.

I just noticed that a recent commit has fixed this problem.

commit 47ab2a6c689913db23ccae38349714edf8365e0a
Author: Josef Bacik jba...@fb.com
Date:   Thu Sep 18 11:20:02 2014 -0400

 Btrfs: remove empty block groups automatically
 thanks,
-liubo

Oh, that's much better than my patch.

So please ignore my patch.

Thanks,
Qu

Wait a second.
It's true that block group auto-reclaim can deal with some cases,
but it will not improve the vanilla 'df' used percentage before hitting ENOSPC.

With the old 10%/10G limit, a 100G disk will still hit ENOSPC below 90% used space.

This patch should improve that to above 95% or even above 99%.

The old behavior may give normal users the bad impression that btrfs can't use space effectively.

So I still consider the patch to have a positive effect on btrfs.

Thanks,
Qu



This patch will try not to allocate a chunk larger than half of the unallocated space, making the last space more balanced at the small cost of more fragmented chunks in the last 1G.
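The policy described above can be sketched in C. This is not the code from the patch (its diff is cut off here); the function name `cap_to_half_unallocated()` and the `min_chunk` floor, below which requests are no longer halved, are hypothetical, chosen only to show the idea of capping each new chunk at half of the remaining unallocated space.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M (1024ULL * 1024)
#define SZ_1G (1024ULL * SZ_1M)

/*
 * Hypothetical helper: shrink a chunk request so it never takes more than
 * half of the still-unallocated space, as long as the halved size does not
 * drop below min_chunk.
 */
static uint64_t cap_to_half_unallocated(uint64_t wanted, uint64_t unallocated,
					uint64_t min_chunk)
{
	uint64_t half = unallocated / 2;

	assert(min_chunk > 0);
	if (wanted <= half)
		return wanted;	/* plenty of room left: keep the normal size */
	if (half >= min_chunk)
		return half;	/* leave the other half for later (metadata) chunks */
	/* Almost nothing left: hand out what remains, up to the request. */
	return wanted < unallocated ? wanted : unallocated;
}
```

For instance, with 1G unallocated and a 16M floor, repeated 1G data requests would be granted 512M, 256M, 128M and so on, so until the last few megabytes each data allocation still leaves room for a metadata chunk. The price is the series of smaller chunks mentioned in the description.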

A simple example:
Preallocating 17.5G on an empty 20G btrfs filesystem:
[Before]
  # btrfs fi show /mnt/test
Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
	Total devices 1 FS bytes used 17.50GiB
	devid    1 size 20.00GiB used 20.00GiB path /dev/sdb
All space is allocated; no space is left for later metadata allocation.

[After]
  # btrfs fi show /mnt/test
Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
	Total devices 1 FS bytes used 17.50GiB
	devid    1 size 20.00GiB used 19.77GiB path /dev/sdb
About 230M is still available for later metadata allocation.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
  fs/btrfs/volumes.c | 18 ++
  1 file changed, 18 insertions(+)

diff --git 

Re: read block failed check_tree_block / Couldn't read chunk tree

2014-10-29 Thread Anand Jain




 Yes, that's the one.

 btrfs-progs: fix device missing of btrfs fi show with seed devices

Thanks


On 10/29/2014 08:15 PM, Rene Thomas wrote:

I can't find the commit in the official repos.

git show gives: fatal: bad object 915902c5002485fb13d27c4b699a73fb66cc0f09

Found it:

commit 2513077f2f830b4bc83d528bfb6979eb461918bd

btrfs-progs: fix device missing of btrfs fi show with seed devices


Thanks
René

2014-10-29 4:45 GMT+01:00 Anand Jain anand.j...@oracle.com:


This is (most likely) due to the patch below:

commit 915902c5002485fb13d27c4b699a73fb66cc0f09

 btrfs-progs: fix device missing of btrfs fi show with seed devices


  Could you try backing out the patch from progs and giving it a shot?
  Please report what you see. Thanks.





On 10/25/14 00:43, Rene Thomas wrote:


   # btrfs --version
Btrfs v3.17

   # btrfs fi show
Label: 'mythstorage'  uuid: 9b454272-6800-4b3c-b196-9e180407a6cb
	Total devices 1 FS bytes used 2.36MiB
	devid    1 size 931.51GiB used 10.04GiB path /dev/sdd1

Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=65536
Check tree block failed, want=5845480062976, have=0
Check tree block failed, want=5845480062976, have=0
read block failed check_tree_block
Couldn't read chunk tree

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] revert btrfs-progs: do a separate probe for _transient_ replacing device

2014-10-29 Thread Anand Jain
There is a compatibility issue with older kernels caused by the progs commit below.

d0588bfa479409b2a0f6243f894338a01a56221a
btrfs-progs: do a separate probe for _transient_ replacing device

So, for now, this patch reverts the above commit.
The upcoming sysfs interface should help fix the issue that remains after the revert: the seed device fails to show up in the 'btrfs fi show' output of a sprout device.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 v2: update the commit message with the correct ID of the commit this patch reverts

 utils.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/utils.c b/utils.c
index a8691fe..1d1cc77 100644
--- a/utils.c
+++ b/utils.c
@@ -1869,29 +1869,12 @@ int get_fs_info(char *path, struct btrfs_ioctl_fs_info_args *fi_args,
 	if (!fi_args->num_devices)
 		goto out;
 
-	/*
-	 * with kernel patch
-	 * btrfs: ioctl BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO miss-matched with slots
-	 * the kernel now returns total_devices which does not include
-	 * replacing device if running.
-	 * As we need to get dev info of the replace device if it is running,
-	 * so just add one to fi_args->num_devices.
-	 */
-
-	di_args = *di_ret = malloc((fi_args->num_devices + 1) * sizeof(*di_args));
+	di_args = *di_ret = malloc((fi_args->num_devices) * sizeof(*di_args));
 	if (!di_args) {
 		ret = -errno;
 		goto out;
 	}
 
-	/* get the replace target device if it is there */
-	ret = get_device_info(fd, i, &di_args[ndevs]);
-	if (!ret) {
-		ndevs++;
-		fi_args->num_devices++;
-	}
-	i++;
-
 	for (; i <= fi_args->max_id; ++i) {
 		BUG_ON(ndevs >= fi_args->num_devices);
 		ret = get_device_info(fd, i, &di_args[ndevs]);
-- 
2.0.0.153.g79d
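The invariant behind the `BUG_ON(ndevs >= fi_args->num_devices)` check can be illustrated with a self-contained C sketch. It does not call the real ioctls: `struct dev_info`, `probe_devid()` and `collect_devices()` are hypothetical stand-ins, and the array `present[]` simulates which device IDs answer. The point is why the slot count matters: if more devices answer than `num_devices` accounts for (for example a replace target the kernel did not count, as the removed comment describes), a fixed-size array would overflow, so the sketch reports `-EOVERFLOW` instead of crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one BTRFS_IOC_DEV_INFO result. */
struct dev_info {
	uint64_t devid;
};

/* Hypothetical probe: 0 and *out filled if present[devid], else -ENODEV. */
static int probe_devid(const int *present, uint64_t devid,
		       struct dev_info *out)
{
	if (!present[devid])
		return -ENODEV;
	out->devid = devid;
	return 0;
}

/*
 * Walk devids 0..max_id and store every present device in an array of
 * num_devices slots; fail with -EOVERFLOW rather than write past the end.
 */
static int collect_devices(const int *present, uint64_t max_id,
			   uint64_t num_devices, struct dev_info **ret,
			   uint64_t *found)
{
	struct dev_info *di;
	uint64_t n = 0;

	assert(present && ret && found);
	*ret = NULL;
	*found = 0;
	if (!num_devices)
		return 0;	/* get_fs_info() also bails out early here */

	di = malloc(num_devices * sizeof(*di));
	if (!di)
		return -ENOMEM;

	for (uint64_t i = 0; i <= max_id; ++i) {
		struct dev_info tmp;

		if (probe_devid(present, i, &tmp))
			continue;	/* gap in the devid space */
		if (n >= num_devices) {
			free(di);
			return -EOVERFLOW;
		}
		di[n++] = tmp;
	}
	*ret = di;
	*found = n;
	return 0;
}
```

Sizing the buffer from a count that excludes one of the responding devices is exactly the mismatch the reverted "+ 1" tried to absorb; the revert trades that workaround for compatibility with older kernels.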



Re: [PATCH] revert btrfs-progs: do a separate probe for _transient_ replacing device

2014-10-29 Thread Anand Jain




 My workspace commit IDs may have changed when I was trying to nail down an
 issue some time back. Thanks. V2 is out.


On 10/29/2014 07:41 PM, Gui Hecheng wrote:

On Wed, 2014-10-29 at 18:51 +0800, Anand Jain wrote:

There is a compatibility issue with older kernels caused by the progs commit below.

05cd2907557ba627cfb86e60b214ea6228613a84


Which tree does this commit ID belong to?
I can't find it anywhere.


So, for now, this patch reverts the above commit.
The upcoming sysfs interface should help fix the issue that remains after the revert: the seed device fails to show up in the 'btrfs fi show' output of a sprout device.

Signed-off-by: Anand Jain anand.j...@oracle.com



