Re: [PATCH 2/2] fstests: btrfs/006: Fix false alert due to output change

2016-12-29 Thread Eryu Guan
On Thu, Dec 29, 2016 at 11:14:10PM -0500, Su Yue wrote:
> Btrfs-progs v4.9 changed "device stats" output by adding one more
> space, which differs from the golden output.
> 
> Fix it by introducing a new filter to convert multiple spaces into one.
> 
> Signed-off-by: Su Yue 
> ---
>  common/filter   |  6 ++
>  tests/btrfs/006 | 16 
>  tests/btrfs/006.out | 24 
>  3 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/common/filter b/common/filter
> index 397b456..4d5e4d0 100644
> --- a/common/filter
> +++ b/common/filter
> @@ -401,5 +401,11 @@ _filter_mknod()
>   sed -e "s/mknod: [\`']\(.*\)': File exists/mknod: \1: File exists/"
>  }
>  
> +# Filter spaces into one
> +_filter_spaces()
> +{
> + sed -e "s/\s\+/ /g"
> +}
> +

There's already one such filter with the same name in common/filter,
does this existing helper work for you?
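For reference, the effect of such a whitespace-collapsing filter can be checked directly from a shell. This is a minimal sketch assuming GNU sed (as the patch's `\s` escape already does); the sample line is illustrative, not taken from a real run:

```shell
# Collapse every run of whitespace into a single space, as the proposed
# _filter_spaces does, so golden output no longer depends on the column
# alignment chosen by a particular btrfs-progs version.
echo "[SCRATCH_DEV].write_io_errs   0" | sed -e 's/[[:space:]]\+/ /g'
# → [SCRATCH_DEV].write_io_errs 0
```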

>  # make sure this script returns success
>  /bin/true
> diff --git a/tests/btrfs/006 b/tests/btrfs/006
> index 0863394..dc9bfb9 100755
> --- a/tests/btrfs/006
> +++ b/tests/btrfs/006
> @@ -82,13 +82,21 @@ echo "== Sync filesystem"
>  $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT > /dev/null
>  
>  echo "== Show device stats by mountpoint"
> -$BTRFS_UTIL_PROG device stats $SCRATCH_MNT | _filter_btrfs_device_stats $TOTAL_DEVS
> +$BTRFS_UTIL_PROG device stats $SCRATCH_MNT | \
> + _filter_btrfs_device_stats $TOTAL_DEVS | \

Spaces before tab in above line.

Thanks,
Eryu

> + _filter_spaces
>  echo "== Show device stats by first/scratch dev"
> -$BTRFS_UTIL_PROG device stats $SCRATCH_DEV | _filter_btrfs_device_stats
> +$BTRFS_UTIL_PROG device stats $SCRATCH_DEV | \
> + _filter_btrfs_device_stats | \
> + _filter_spaces
>  echo "== Show device stats by second dev"
> -$BTRFS_UTIL_PROG device stats $FIRST_POOL_DEV | sed -e "s,$FIRST_POOL_DEV,FIRST_POOL_DEV,g"
> +$BTRFS_UTIL_PROG device stats $FIRST_POOL_DEV | \
> + sed -e "s,$FIRST_POOL_DEV,FIRST_POOL_DEV,g" | \
> + _filter_spaces
>  echo "== Show device stats by last dev"
> -$BTRFS_UTIL_PROG device stats $LAST_POOL_DEV | sed -e "s,$LAST_POOL_DEV,LAST_POOL_DEV,g"
> +$BTRFS_UTIL_PROG device stats $LAST_POOL_DEV | \
> + sed -e "s,$LAST_POOL_DEV,LAST_POOL_DEV,g" | \
> + _filter_spaces
>  
>  # success, all done
>  status=0
> diff --git a/tests/btrfs/006.out b/tests/btrfs/006.out
> index 05b9ac0..a976972 100644
> --- a/tests/btrfs/006.out
> +++ b/tests/btrfs/006.out
> @@ -16,25 +16,25 @@ Label: 'TestLabel.006'  uuid: 
>  == Sync filesystem
>  == Show device stats by mountpoint
>   [SCRATCH_DEV].corruption_errs 
> - [SCRATCH_DEV].flush_io_errs   
> + [SCRATCH_DEV].flush_io_errs 
>   [SCRATCH_DEV].generation_errs 
> - [SCRATCH_DEV].read_io_errs
> - [SCRATCH_DEV].write_io_errs   
> + [SCRATCH_DEV].read_io_errs 
> + [SCRATCH_DEV].write_io_errs 
>  == Show device stats by first/scratch dev
>  [SCRATCH_DEV].corruption_errs 
> -[SCRATCH_DEV].flush_io_errs   
> +[SCRATCH_DEV].flush_io_errs 
>  [SCRATCH_DEV].generation_errs 
> -[SCRATCH_DEV].read_io_errs
> -[SCRATCH_DEV].write_io_errs   
> +[SCRATCH_DEV].read_io_errs 
> +[SCRATCH_DEV].write_io_errs 
>  == Show device stats by second dev
> -[FIRST_POOL_DEV].write_io_errs   0
> -[FIRST_POOL_DEV].read_io_errs0
> -[FIRST_POOL_DEV].flush_io_errs   0
> +[FIRST_POOL_DEV].write_io_errs 0
> +[FIRST_POOL_DEV].read_io_errs 0
> +[FIRST_POOL_DEV].flush_io_errs 0
>  [FIRST_POOL_DEV].corruption_errs 0
>  [FIRST_POOL_DEV].generation_errs 0
>  == Show device stats by last dev
> -[LAST_POOL_DEV].write_io_errs   0
> -[LAST_POOL_DEV].read_io_errs0
> -[LAST_POOL_DEV].flush_io_errs   0
> +[LAST_POOL_DEV].write_io_errs 0
> +[LAST_POOL_DEV].read_io_errs 0
> +[LAST_POOL_DEV].flush_io_errs 0
>  [LAST_POOL_DEV].corruption_errs 0
>  [LAST_POOL_DEV].generation_errs 0
> -- 
> 2.9.3
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] fstests: btrfs/104: Redirect mkfs output to avoid false alert

2016-12-29 Thread Su Yue
btrfs/104 doesn't redirect mkfs output correctly, which leads to a false
alert.

Fix it.

Signed-off-by: Su Yue 
---
 tests/btrfs/104 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/104 b/tests/btrfs/104
index e6a6d3b..c8be4dd 100755
--- a/tests/btrfs/104
+++ b/tests/btrfs/104
@@ -107,7 +107,7 @@ _explode_fs_tree () {
 
 # Force the default leaf size as the calculations for making our btree
 # heights are based on that.
-_scratch_mkfs "--nodesize 16384"
+_scratch_mkfs "--nodesize 16384" >> $seqres.full 2>&1
 _scratch_mount
 
 # populate the default subvolume and create a snapshot ('snap1')
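The redirect idiom in the fix above can be sketched outside fstests like this. It is a minimal illustration only: `$seqres` is normally set by the fstests harness, so a stand-in path and a fake `noisy_mkfs` function are used here:

```shell
# Send a command's stdout and stderr to the per-test .full log so that
# noisy-but-harmless output never reaches the golden output.
# ($seqres stands in for the variable the fstests harness provides.)
seqres=/tmp/demo-btrfs-104
noisy_mkfs() { echo "mkfs: formatting..."; echo "mkfs: warning" >&2; }
noisy_mkfs >> $seqres.full 2>&1
# The test's own stdout (compared against the .out file) stays clean:
echo "== golden output stays clean"
```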
-- 
2.9.3





[PATCH 2/2] fstests: btrfs/006: Fix false alert due to output change

2016-12-29 Thread Su Yue
Btrfs-progs v4.9 changed "device stats" output by adding one more
space, which differs from the golden output.

Fix it by introducing a new filter to convert multiple spaces into one.

Signed-off-by: Su Yue 
---
 common/filter   |  6 ++
 tests/btrfs/006 | 16 
 tests/btrfs/006.out | 24 
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/common/filter b/common/filter
index 397b456..4d5e4d0 100644
--- a/common/filter
+++ b/common/filter
@@ -401,5 +401,11 @@ _filter_mknod()
sed -e "s/mknod: [\`']\(.*\)': File exists/mknod: \1: File exists/"
 }
 
+# Filter spaces into one
+_filter_spaces()
+{
+   sed -e "s/\s\+/ /g"
+}
+
 # make sure this script returns success
 /bin/true
diff --git a/tests/btrfs/006 b/tests/btrfs/006
index 0863394..dc9bfb9 100755
--- a/tests/btrfs/006
+++ b/tests/btrfs/006
@@ -82,13 +82,21 @@ echo "== Sync filesystem"
 $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT > /dev/null
 
 echo "== Show device stats by mountpoint"
 -$BTRFS_UTIL_PROG device stats $SCRATCH_MNT | _filter_btrfs_device_stats $TOTAL_DEVS
+$BTRFS_UTIL_PROG device stats $SCRATCH_MNT | \
+   _filter_btrfs_device_stats $TOTAL_DEVS | \
+   _filter_spaces
 echo "== Show device stats by first/scratch dev"
-$BTRFS_UTIL_PROG device stats $SCRATCH_DEV | _filter_btrfs_device_stats
+$BTRFS_UTIL_PROG device stats $SCRATCH_DEV | \
+   _filter_btrfs_device_stats | \
+   _filter_spaces
 echo "== Show device stats by second dev"
 -$BTRFS_UTIL_PROG device stats $FIRST_POOL_DEV | sed -e "s,$FIRST_POOL_DEV,FIRST_POOL_DEV,g"
+$BTRFS_UTIL_PROG device stats $FIRST_POOL_DEV | \
+   sed -e "s,$FIRST_POOL_DEV,FIRST_POOL_DEV,g" | \
+   _filter_spaces
 echo "== Show device stats by last dev"
 -$BTRFS_UTIL_PROG device stats $LAST_POOL_DEV | sed -e "s,$LAST_POOL_DEV,LAST_POOL_DEV,g"
+$BTRFS_UTIL_PROG device stats $LAST_POOL_DEV | \
+   sed -e "s,$LAST_POOL_DEV,LAST_POOL_DEV,g" | \
+   _filter_spaces
 
 # success, all done
 status=0
diff --git a/tests/btrfs/006.out b/tests/btrfs/006.out
index 05b9ac0..a976972 100644
--- a/tests/btrfs/006.out
+++ b/tests/btrfs/006.out
@@ -16,25 +16,25 @@ Label: 'TestLabel.006'  uuid: 
 == Sync filesystem
 == Show device stats by mountpoint
  [SCRATCH_DEV].corruption_errs 
- [SCRATCH_DEV].flush_io_errs   
+ [SCRATCH_DEV].flush_io_errs 
  [SCRATCH_DEV].generation_errs 
- [SCRATCH_DEV].read_io_errs
- [SCRATCH_DEV].write_io_errs   
+ [SCRATCH_DEV].read_io_errs 
+ [SCRATCH_DEV].write_io_errs 
 == Show device stats by first/scratch dev
 [SCRATCH_DEV].corruption_errs 
-[SCRATCH_DEV].flush_io_errs   
+[SCRATCH_DEV].flush_io_errs 
 [SCRATCH_DEV].generation_errs 
-[SCRATCH_DEV].read_io_errs
-[SCRATCH_DEV].write_io_errs   
+[SCRATCH_DEV].read_io_errs 
+[SCRATCH_DEV].write_io_errs 
 == Show device stats by second dev
-[FIRST_POOL_DEV].write_io_errs   0
-[FIRST_POOL_DEV].read_io_errs0
-[FIRST_POOL_DEV].flush_io_errs   0
+[FIRST_POOL_DEV].write_io_errs 0
+[FIRST_POOL_DEV].read_io_errs 0
+[FIRST_POOL_DEV].flush_io_errs 0
 [FIRST_POOL_DEV].corruption_errs 0
 [FIRST_POOL_DEV].generation_errs 0
 == Show device stats by last dev
-[LAST_POOL_DEV].write_io_errs   0
-[LAST_POOL_DEV].read_io_errs0
-[LAST_POOL_DEV].flush_io_errs   0
+[LAST_POOL_DEV].write_io_errs 0
+[LAST_POOL_DEV].read_io_errs 0
+[LAST_POOL_DEV].flush_io_errs 0
 [LAST_POOL_DEV].corruption_errs 0
 [LAST_POOL_DEV].generation_errs 0
-- 
2.9.3





Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-29 Thread Minchan Kim
On Thu, Dec 29, 2016 at 10:04:32AM +0100, Michal Hocko wrote:
> On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> > On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > > Hi,
> > > could you try to run with the following patch on top of the previous
> > > one? I do not think it will make a large change in your workload but
> > > I think we need something like that so some testing under which is known
> > > to make a high lowmem pressure would be really appreciated. If you have
> > > more time to play with it then running with and without the patch with
> > > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > > whether it make any difference at all.
> > > 
> > > I would also appreciate if Mel and Johannes had a look at it. I am not
> > > yet sure whether we need the same thing for anon/file balancing in
> > > get_scan_count. I suspect we need but need to think more about that.
> > > 
> > > Thanks a lot again!
> > > ---
> > > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko 
> > > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> > > 
> > > get_scan_count considers the whole node LRU size when
> > > - doing SCAN_FILE due to many page cache inactive pages
> > > - calculating the number of pages to scan
> > > 
> > > in both cases this might lead to unexpected behavior especially on 32b
> > > systems where we can expect lowmem memory pressure very often.
> > > 
> > > A large highmem zone can easily distort the SCAN_FILE heuristic because
> > > there might be only a few file pages from the eligible zones on the node
> > > lru and we would still enforce file lru scanning, which can lead to
> > > thrashing while we could still scan anonymous pages.
> > 
> > Nit:
> > It doesn't cause thrashing because isolate_lru_pages filters them out,
> > but I agree it burns CPU pointlessly to find eligible pages.
> 
> This is not about isolate_lru_pages. The thrashing could happen if we had
> a lowmem pagecache user which would constantly reclaim recently faulted-in
> pages while there is anonymous memory in the lowmem which could be
> reclaimed instead.
>  
> [...]
> > >  /*
> > > + * Return the number of pages on the given lru which are eligibne for the
> > eligible
> 
> fixed
> 
> > > + * given zone_idx
> > > + */
> > > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > > + enum lru_list lru, int zone_idx)
> > 
> > Nit:
> > 
> > Although there is a comment, function name is rather confusing when I 
> > compared
> > it with lruvec_zone_lru_size.
> 
> I am all for a better name.
> 
> > lruvec_eligible_zones_lru_size is better?
> 
> this would be too easy to confuse with lruvec_eligible_zone_lru_size.
> What about lruvec_lru_size_eligible_zones?

Don't mind.

>  
> > Nit:
> > 
> > With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx
> > rather than its own custom calculation to filter out non-eligible pages.
> 
> Yes, that would be possible and I was considering that. But then I found
> useful to see total and reduced numbers in the tracepoint
> http://lkml.kernel.org/r/20161228153032.10821-8-mho...@kernel.org
> and didn't want to call lruvec_lru_size 2 times. But if you insist then
> I can just do that.

I don't mind either but I think we need to describe the reason if you want to
go with your open-coded version. Otherwise, someone will try to fix it.


Can't add/replace a device on degraded filesystem

2016-12-29 Thread Rich Gannon
Well, I certainly got myself into a pickle.  I've been a Btrfs user since 
2008 and this is the first time I've had a serious problem... and I got 
two on the same day (I'm separating them into different emails).


I had 4x 4TB harddrives in a d=single m=raid1 array for about a year now 
containing many media files I really want to save.  Yesterday I removed 
them from my desktop, installed them into a "new-to-me" Supermicro 2U 
server and even swapped over my HighPoint MegaRAID 2720 SAS HBA (yes, 
it's acting as a direct pass-thru HBA only).  With the added space, I 
also installed an additional 4TB drive to the filesystem and was 
performing a rebalance with filters:


btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/bpool-btrfs

I found that the new drive dropped off-line during the rebalance.  I 
swapped the drive into a different bay to see if it was backplane, cord, 
or drive related.  Upon remount, the same drive dropped offline.  I had 
another new 4TB drive and swapped it in for the dead drive.


I can mount my filesystem with -o degraded, but I can not do btrfs 
replace or btrfs device add as the filesystem is in read-only mode, and 
I can not mount read-write.


From my understanding, my data should all be safe as during the 
balance, no single-copy files should have made it onto the new drive 
(that subsequently failed).  Is this a correct assumption?


Here is some btrfs data:
proton bpool-btrfs # btrfs fi df /mnt/bpool-btrfs/
Data, RAID10: total=2.17TiB, used=1.04TiB
Data, single: total=7.79TiB, used=7.59TiB
System, RAID1: total=32.00MiB, used=1.08MiB
Metadata, RAID10: total=1.00GiB, used=1023.88MiB
Metadata, RAID1: total=10.00GiB, used=8.24GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
proton bpool-btrfs # btrfs fi sh /mnt/bpool-btrfs/
Label: 'bigpool'  uuid: 85e8b0dd-fbbd-48a2-abc4-ccaefa5e8d18
Total devices 5 FS bytes used 8.64TiB
devid5 size 3.64TiB used 2.77TiB path /dev/mapper/bpool-3
devid6 size 3.64TiB used 2.77TiB path /dev/mapper/bpool-4
devid7 size 3.64TiB used 2.77TiB path /dev/mapper/bpool-1
devid8 size 3.64TiB used 2.77TiB path /dev/mapper/bpool-2
*** Some devices missing


NOTE: The drives are all fully encrypted with LUKS/dm_crypt.

Please help me save the data :)

Rich


[PULL REQUEST FOR NEXT PATCH 00/26] Patches from Fujitsu for next version

2016-12-29 Thread Qu Wenruo
Hi, please fetch the following branch for the for-next branch:
https://github.com/adam900710/linux.git fujitsu_for_next

This branch contains most of Fujitsu's unmerged patches for the for-next branch.

David's latest for-next-20161219 branch can cause a kernel panic, so I
used v4.10-rc1 as the base instead.

I ran the full set of xfstests test cases, with both the default mount
options and compress=lzo.
The fixes work and there are no new regressions.

So I think it should be good enough to act as a base for the for-next branch.

This big patch set contains the following fixes:

1) Wang's fixes for compression ENOSPC
2) Qgroup reserved space fix and WARN_ON
3) Inband dedupe work

I think inband dedupe is stable enough for the for-next branch, so I
included those patches at the tail of the patchset.

RAID56 patches will follow soon, as I am still digging into the RAID56
scrub/replace race, and I hope to submit them as a RAID56 patchset.

Qu Wenruo (14):
  btrfs: Add WARN_ON for qgroup reserved underflow
  btrfs: qgroup: Add trace point for qgroup reserved space
  btrfs: qgroup: Re-arrange tracepoint timing to co-operate with
reserved space tracepoint
  btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount
option
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
reserved ranges
  btrfs: delayed-ref: Add support for increasing data ref under spinlock
  btrfs: dedupe: Inband in-memory only de-duplication implement
  btrfs: relocation: Enhance error handling to avoid BUG_ON
  btrfs: dedupe: Introduce new reconfigure ioctl

Wang Xiaoguang (12):
  btrfs: improve inode's outstanding_extents computation
  btrfs: introduce type based delalloc metadata reserve
  btrfs: Introduce COMPRESS reserve type to fix false enospc for
compression
  btrfs: dedupe: Introduce dedupe framework and its header
  btrfs: dedupe: Introduce function to initialize dedupe info
  btrfs: dedupe: Introduce function to add hash into in-memory tree
  btrfs: dedupe: Introduce function to remove hash from in-memory tree
  btrfs: dedupe: Introduce function to search for an existing hash
  btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface
  btrfs: ordered-extent: Add support for dedupe
  btrfs: Introduce DEDUPE reserve type to fix false enospc for in-band
dedupe
  btrfs: dedupe: Add ioctl for inband deduplication

 fs/btrfs/Makefile|   2 +-
 fs/btrfs/ctree.h |  53 ++-
 fs/btrfs/dedupe.c| 820 +++
 fs/btrfs/dedupe.h| 184 +-
 fs/btrfs/delayed-ref.c   |  30 +-
 fs/btrfs/delayed-ref.h   |   8 +
 fs/btrfs/disk-io.c   |   4 +
 fs/btrfs/extent-tree.c   |  99 --
 fs/btrfs/extent_io.c |  62 +++-
 fs/btrfs/extent_io.h |  30 +-
 fs/btrfs/file.c  |  70 ++--
 fs/btrfs/free-space-cache.c  |   6 +-
 fs/btrfs/inode-map.c |   8 +-
 fs/btrfs/inode.c | 568 +-
 fs/btrfs/ioctl.c | 106 +-
 fs/btrfs/ordered-data.c  |  46 ++-
 fs/btrfs/ordered-data.h  |  13 +
 fs/btrfs/qgroup.c| 223 +---
 fs/btrfs/qgroup.h|  14 +-
 fs/btrfs/relocation.c|  65 +++-
 fs/btrfs/sysfs.c |   2 +
 fs/btrfs/tests/inode-tests.c |  15 +-
 fs/btrfs/transaction.c   |  20 +-
 include/trace/events/btrfs.h |  43 +++
 include/uapi/linux/btrfs.h   |  55 +++
 25 files changed, 2270 insertions(+), 276 deletions(-)
 create mode 100644 fs/btrfs/dedupe.c

-- 
2.11.0





Re: Btrfs send does not preserve reflinked files within subvolumes.

2016-12-29 Thread Qu Wenruo

Hi Glenn,

At 12/29/2016 06:20 PM, Glenn Washburn wrote:

Hi Qu,

Thanks for your response.  I've attached the send file as you have
requested. The "-2" one is created from the same script modified to
create a snapshot of C called C.snap instead of setting C to readonly,
and sending C.snap instead of C.  The second version is to mirror what
you seemed to do in your reply.  Still no reflinking.  I've included
the original script and log again, which was modified to show more
verbose logging on the receive.


Confirmed, it's a kernel bug.
The send stream doesn't contain the clone operation.

So kernel send is to blame.



As I was using btrfs-progs v4.4, I've compiled v4.9 to get the nice
--dump option for receive (which you seem to have added, kudos!).  The
output of the dump option is in the *.recv-dump.log files.  However, I
see no clone command in the output.

Keep in mind that I'm running Ubuntu kernel 4.4.0-53-generic, which I
believe to be mostly v4.4.30 with a few patches.  Any ideas when send
started using "clone" in the kernel?


I'm not entirely sure from which exact kernel version send can detect 
reflinks.


As far as I recall, I sent an RFC patch to disable reflink detection in 
send in July 2016, and it just got rejected.


So I assume at least v4.6 kernel works.
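For reference, one quick way to check whether a dumped send stream contains any clone (reflink) operations is to grep for them. In practice the input would come from `btrfs receive --dump < stream` (btrfs-progs >= 4.9); the two sample dump lines below are a stand-in, not output from a real stream:

```shell
# Count clone operations in (sample) `btrfs receive --dump` output.
# A stream that preserves reflinks must contain at least one clone line.
printf 'write   ./snap/file1 offset=0 len=4096\nclone   ./snap/file1.ref offset=0 len=4096 from=./snap/file1 clone_offset=0\n' \
	| grep -c '^clone'
# → 1
```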

Thanks,
Qu



Thanks,
Glenn

On Thu, 29 Dec 2016 13:55:46 +0800
Qu Wenruo  wrote:


Hi,

I tried just what you did, and use "btrfs receive --dump" to exam the
send stream.

And things works quite well:

$ sudo mount /dev/sda6  /mnt/btrfs/
$ sudo btrfs subvolume create /mnt/btrfs/subv1
$ sudo xfs_io -f -c "pwrite 0 2M" /mnt/btrfs/subv1/file1
$ sudo xfs_io -f -c "reflink /mnt/btrfs/subv1/file1 0 0 2M"
/mnt/btrfs/subv1/file1.ref
$ sudo btrfs subv snap -r /mnt/btrfs/subv1/ /mnt/btrfs/ro_snap
$ sudo btrfs send /mnt/btrfs/ro_snap/ > /tmp/output
$ sudo btrfs receive --dump < /tmp/output

And the output shows like this:
subvol  ./ro_snap
uuid=e788bb6e-8ec6-dd47-a452-e26196d22699 transid=9
chown   ./ro_snap/  gid=0 uid=0
chmod   ./ro_snap/  mode=755
utimes  ./ro_snap/
atime=2016-12-29T13:46:00+0800 mtime=2016-12-29T13:46:50+0800
ctime=2016-12-29T13:46:50+0800
mkfile  ./ro_snap/o257-8-0
rename  ./ro_snap/o257-8-0 dest=./ro_snap/file1
utimes  ./ro_snap/
atime=2016-12-29T13:46:00+0800 mtime=2016-12-29T13:46:50+0800
ctime=2016-12-29T13:46:50+0800
write   ./ro_snap/file1 offset=0 len=49152
write   ./ro_snap/file1 offset=49152 len=49152

write   ./ro_snap/file1 offset=2064384 len=32768
truncate ./ro_snap/file1 size=2097152
chown   ./ro_snap/file1 gid=0 uid=0
chmod   ./ro_snap/file1 mode=600
utimes  ./ro_snap/file1
atime=2016-12-29T13:46:24+0800 mtime=2016-12-29T13:46:24+0800
ctime=2016-12-29T13:46:24+0800
mkfile  ./ro_snap/o258-8-0
rename  ./ro_snap/o258-8-0 dest=./ro_snap/file1.ref
utimes  ./ro_snap/
atime=2016-12-29T13:46:00+0800 mtime=2016-12-29T13:46:50+0800
ctime=2016-12-29T13:46:50+0800
clone   ./ro_snap/file1.ref offset=0 len=2097152
from=./ro_snap/file1 clone_offset=0
^^^ Here is the clone operation

truncate./ro_snap/file1.ref size=2097152
chown   ./ro_snap/file1.ref gid=0 uid=0
chmod   ./ro_snap/file1.ref mode=600
utimes  ./ro_snap/file1.ref
atime=2016-12-29T13:46:50+0800 mtime=2016-12-29T13:47:07+0800
ctime=2016-12-29T13:47:07+0800

And in fact, btrfs send can even handle reflink to parent subvolume.
(Although this behavior can be deadly for heavily reflinked files)


So, would you please upload the send stream for us to check?

Thanks,
Qu

At 12/29/2016 10:44 AM, Glenn Washburn wrote:

I'm having a hard time getting btrfs receive to create reflinked
files and have a trivial example that I believe *should* work but
doesn't. I've attached a script that I used to perform this test,
so others can try to reproduce.  The text file is the output of the
shell script except the last command, which is a tool I wrote to
print the extent info from FIEMAP.  "btrfs fi du" would work just
as well, but I'm on Ubuntu 16.04, whose btrfs progs doesn't have
that command yet.  I've also tested on Ubuntu 16.10 with similar
results, except that "btrfs fi du" is on that version and confirms
what my tool displays.

So, can send not do what I'm trying to get it to do? If it can now,
when did that feature get introduced (must have been after kernel
4.8)?  I'm very surprised that this feature wouldn't have already
been done and if not that no one seems to be complaining about it.
I've done a decent amount of searching on this and have come up with
nothing.  Any help would be greatly appreciated.

Thanks,
Glenn












error (device dm-4) in __btrfs_free_extent:6958: errno=-5 IO failure

2016-12-29 Thread Rich Gannon
As if I wasn't scared enough that my data is in trouble on my larger, 
primary Btrfs filesystem, my backup filesystem also hit some trouble 
today, likely due to a kernel panic freezing the system and a subsequent 
forced reset during a device remove.


I was running `btrfs dev remove /dev/mapper/backup-1 /mnt/backup-btrfs` 
on my "new-to-me" SuperMicro 2U server.  It has both my primary and 
backup filesystems on it (until the second 2U unit arrives next week).


As mentioned above, I had to forcefully reset the server due to a kernel 
panic for an unknown reason (perhaps due to a failed drive which was 
later identified on the primary filesystem).  The filesystem was still 
in the process of removing this device.


I can mount this filesystem with no errors, and see only some 
seemingly-minor things in dmesg, until I try to write anything to the 
only subvolume (@backup), at which point I get a non-freezing kernel 
panic and the filesystem becomes read-only.


I received the following during initial mount:
[   72.197216] BTRFS info (device dm-0): disk space caching is enabled
[   72.474072] BTRFS info (device dm-4): use lzo compression
[   72.474075] BTRFS info (device dm-4): disk space caching is enabled
[   72.474076] BTRFS info (device dm-4): has skinny extents
[   72.600816] BTRFS error (device dm-4): parent transid verify failed 
on 15889484660736 wanted 118090 found 118086
[   72.613872] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484660736 (dev /dev/mapper/backup-1 sector 12694368)
[   72.614079] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484664832 (dev /dev/mapper/backup-1 sector 12694376)
[   72.614266] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484668928 (dev /dev/mapper/backup-1 sector 12694384)
[   72.614458] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484673024 (dev /dev/mapper/backup-1 sector 12694392)
[   72.638285] BTRFS info (device dm-4): bdev /dev/mapper/backup-1 errs: 
wr 0, rd 4, flush 0, corrupt 0, gen 0
[   72.638640] BTRFS error (device dm-4): parent transid verify failed 
on 15889484808192 wanted 118090 found 118086
[   72.639272] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484808192 (dev /dev/mapper/backup-1 sector 12694656)
[   72.639466] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484812288 (dev /dev/mapper/backup-1 sector 12694664)
[   72.639651] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484816384 (dev /dev/mapper/backup-1 sector 12694672)
[   72.639842] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484820480 (dev /dev/mapper/backup-1 sector 12694680)
[   75.116151] BTRFS error (device dm-4): parent transid verify failed 
on 15889485250560 wanted 118090 found 118086
[   75.116694] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889485250560 (dev /dev/mapper/backup-1 sector 12695520)
[   75.116880] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889485254656 (dev /dev/mapper/backup-1 sector 12695528)

[   78.198601] usb 6-2: USB disconnect, device number 8
[   80.674748] usb 6-2: new low-speed USB device number 9 using uhci_hcd
[   81.040858] usb 6-2: New USB device found, idVendor=0764, idProduct=0601
[   81.040860] usb 6-2: New USB device strings: Mfr=3, Product=1, 
SerialNumber=0

[   81.040861] usb 6-2: Product: OR1500LCDRM1U
[   81.040862] usb 6-2: Manufacturer: CPS
[   81.263990] hid-generic 0003:0764:0601.000A: hidraw0: USB HID v1.10 
Device [CPS OR1500LCDRM1U] on usb-:00:1d.0-2/input0

[   88.447482] usb 6-2: USB disconnect, device number 9
[   90.631604] usb 6-2: new low-speed USB device number 10 using uhci_hcd
[   91.209719] usb 6-2: New USB device found, idVendor=0764, idProduct=0601
[   91.209722] usb 6-2: New USB device strings: Mfr=3, Product=1, 
SerialNumber=0

[   91.209724] usb 6-2: Product: OR1500LCDRM1U
[   91.209725] usb 6-2: Manufacturer: CPS
[   91.433941] hid-generic 0003:0764:0601.000B: hidraw0: USB HID v1.10 
Device [CPS OR1500LCDRM1U] on usb-:00:1d.0-2/input0
[   92.117006] BTRFS error (device dm-4): parent transid verify failed 
on 15889484365824 wanted 118090 found 118086

[   92.161961] repair_io_failure: 2 callbacks suppressed
[   92.161964] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484365824 (dev /dev/mapper/backup-1 sector 12693792)
[   92.162218] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484369920 (dev /dev/mapper/backup-1 sector 12693800)
[   92.162477] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484374016 (dev /dev/mapper/backup-1 sector 12693808)
[   92.162726] BTRFS info (device dm-4): read error corrected: ino 1 off 
15889484378112 (dev /dev/mapper/backup-1 sector 12693816)



At this point, the filesystem mounts rw and I can write data to the root 
subvolume, but upon writing to the @backup subvolume I get the following:
[  738.583219] BTRFS warning (device dm-4): block group 

Re: [PATCH v2 00/19]

2016-12-29 Thread Qu Wenruo

Hi Goffredo,

At 12/30/2016 02:15 AM, Goffredo Baroncelli wrote:

Hi Qu,

I tried your patch because I had a hardware failure and needed to check 
the data integrity.

I'm glad the function helps.


I didn't find any problems; however, I was not able to understand what 
"btrfs check --scrub" was doing, because the program gave no output 
(there is no progress bar).


Right, I should add a progress bar to it.
Maybe in next version along with repair function.

The good thing is, no output means good, just like normal fsck.


So I straced it to check whether the program was working properly. The 
strace output showed me that the program ran correctly.
However, from the strace I noticed that the program read the same page 
(size 16k) several times.
I think this is due to the walking of the btree. However, this could be a 
possible optimization: cache the last read(s).


That doesn't mean it's scrubbing the same leaf, but just normal tree search.

The leaf would be extent root or nodes near extent root.
The offline scrub relies heavily on the extent tree to determine if there is 
any extent that needs to be scrubbed.


Further more, the idea to cache extent tree is not really that easy, 
according to what we have learned from btrfsck.

(Cache may go out of control to explode your RAM).


But your idea to cache still makes sense; for block devices, a cache would 
always be good.
(For normal files, the kernel provides a cache, so we don't need to 
implement one ourselves.)
Although that may need to be implemented in the ctree operation code 
instead of the offline scrub.


BTW, just for reference, what's your device size and how much time it 
takes to do the offline scrub?


Thanks,
Qu



Only my 2¢

BR
G.Baroncelli



On 2016-12-26 07:29, Qu Wenruo wrote:

For any one who wants to try it, it can be get from my repo:
https://github.com/adam900710/btrfs-progs/tree/offline_scrub

Currently, I have only tested it on SINGLE/DUP/RAID1/RAID5 filesystems, with
mirror or parity or data corrupted.
The tool is able to detect all of them and give a recoverability report.

Several reports of kernel scrub screwing up good data stripes have been
on the ML for some time.

And since kernel scrub won't account for P/Q corruption, it is quite hard
to detect errors like the kernel screwing up P/Q when scrubbing.

To get a comparable tool for kernel scrub, we need a user-space tool to
act as benchmark to compare their different behaviors.

So here is the patchset for user-space scrub.

Which can do:

1) All mirror/backup check for non-parity based stripe
   Which means for RAID1/DUP/RAID10, we can really check all mirrors
   other than the 1st good mirror.

   The current "--check-data-csum" option will finally be replaced by scrub,
   as it doesn't really check all mirrors: if it hits a good copy, the
   remaining copies are just ignored.

2) Comprehensive RAID5/6 full stripe check
   It will take full use of btrfs csum(both tree and data).
   It will only recover the full stripe if all recovered data matches
   with its csum.

In fact, it has already exposed several new btrfs kernel bugs, as it's the
main tool I'm using when developing the kernel fixes.

For example, after screwing up a data stripe, the kernel did repairs using
parity, but the recovered full stripe had wrong parity.
A second scrub was needed to fix it.

And this patchset also introduces a new map_block() function, which is
more flexible than the current btrfs_map_block() and has a unified interface
for all profiles, not just an array of physical addresses.

Check the 6th and 7th patch for details.

They are already used in RAID5/6 scrub, but can also be used for other
profiles too.

The to-do list has been shortened, since RAID6 support and the new check
logic have been introduced.
1) Repair support
   In fact, the current tool can already report recoverability; repair is
   not hard to implement.

2) Test cases
   Need to make the infrastructure able to handle multi-device first.

3) Make btrfsck able to handle RAID5 with missing device
   Currently it won't even open a RAID5 btrfs with a missing device, even
   though scrub should be able to handle it.

Changelog:
V0.8 RFC:
   Initial RFC patchset

v1:
   First formal patchset.
   RAID6 recovery support added, mainly copied from the kernel raid6 lib.
   Cleaner recovery logic.

v2:
   More comments in both code and commit message, suggested by David.
   File re-arrangement, no check/ dir, raid56.ch moved to kernel-lib,
   Suggested by David

Qu Wenruo (19):
  btrfs-progs: raid56: Introduce raid56 header for later recovery usage
  btrfs-progs: raid56: Introduce tables for RAID6 recovery
  btrfs-progs: raid56: Allow raid6 to recover 2 data stripes
  btrfs-progs: raid56: Allow raid6 to recover data and p
  btrfs-progs: Introduce wrapper to recover raid56 data
  btrfs-progs: Introduce new btrfs_map_block function which returns more
unified result.
  btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
  btrfs-progs: csum: Introduce function to read out one data csum
  btrfs-progs: scrub: 

Re: mounting failed any file on my filesystem

2016-12-29 Thread Duncan
Jan Koester posted on Thu, 29 Dec 2016 20:05:35 +0100 as excerpted:

> Hi,
> 
> i have problem with filesystem if my system crashed i have made been
> hard reset of the system after my Filesystem was crashed. I have already
> tried to repair without success you can see it on log file. It's seem
> one corrupted block brings complete filesystem to crashing.
> 
> Have anybody idea what happened with my filesystem ?
> 
> dmesg if open file:
> [29450.404327] WARNING: CPU: 5 PID: 16161 at
> /build/linux-lIgGMF/linux-4.8.11/ fs/btrfs/extent-tree.c:6945
> __btrfs_free_extent.isra.71+0x8e2/0xd60 [btrfs]

First a disclaimer.  I'm a btrfs user and list regular, not a dev.  As 
such I don't really read call traces much beyond checking the kernel 
version, and don't do code.  It's likely that you will get a more 
authoritative reply from someone who does, and it should take precedence, 
but in the mean time, I can try to deal with the preliminaries.

Kernel 4.8.11, good.  But you run btrfs check below, and we don't have 
the version of your btrfs-progs userspace.  Please report that too.

> btrfs output:
> root@dibsi:/home/jan# btrfs check /dev/disk/by-uuid/
> 73d4dc77-6ff3-412f-9b0a-0d11458faf32

Note that btrfs check is read-only by default.  It will report what it 
thinks are errors, but won't attempt to fix them unless you add various 
options (such as --repair) to tell it to do so.  This is by design and is 
very important, as attempting to repair problems that it doesn't properly 
understand could make the problems worse instead of better.  So even though 
the above command will only report what it sees as problems, not attempt 
to fix them, you did the right thing by running check without --repair 
first, and posting the results here for an expert to look at and tell you 
whether to try --repair, or what else to try instead.

> Checking filesystem on
> /dev/disk/by-uuid/73d4dc77-6ff3-412f-9b0a-0d11458faf32
> UUID: 73d4dc77-6ff3-412f-9b0a-0d11458faf32
> checking extents
> parent transid verify failed on 2280458502144 wanted 861168
> found 860380
> parent transid verify failed on 2280458502144 wanted 861168
> found 860380
> checksum verify failed on 2280458502144 found FC3DF84D
> wanted 2164EB93
> checksum verify failed on 2280458502144 found FC3DF84D
> wanted 2164EB93
> bytenr mismatch, want=2280458502144, have=15938383240448
[...]

Some other information that we normally ask for includes the output from 
a few other btrfs commands.

It's unclear from your report if the filesystem will mount at all.  The 
subject says mount failed, but then it mentions any file on the 
filesystem, which seems to imply that you could mount, but that any file 
you attempted to actually access after mounting crashes the system with 
the trace you posted, so I'm not sure if you can actually mount the 
filesystem at all.

If you can't mount the filesystem, at least try to post the output from...

btrfs filesystem show

If you can mount the filesystem, then the much more detailed...

btrfs filesystem usage

... if your btrfs-progs is new enough, or...

btrfs filesystem df

... if btrfs-progs is too old to have the usage command.

Also, if it's not clear from the output of the commands above (usage by 
itself, or show plus df, should answer most of the below, but show alone 
only provides some of the information), tell us a bit more about the 
filesystem in question:

Single device (like traditional filesystems) or multiple device?  If 
multiple device, what raid levels if you know them, or did you just go 
with the defaults.  If single device, again, defaults, or did you specify 
single or dup, particularly for metadata.

Also, how big was the filesystem and how close to full?  And was it on 
ssd, spinning rust, or on top of something virtual (like a VM image 
existing as a file on the host, or lvm, or mdraid, etc)?


Meanwhile, if you can mount, the first thing I'd try is btrfs scrub 
(unless you were running btrfs raid56 mode, which makes things far more 
complex as it's not stable yet and isn't recommended except for testing 
with data you can afford to lose).  Often, a scrub can fix much of the 
damage of a crash if you were running raid1 mode (multi-device metadata 
default), raid10, or dup (single device metadata default, except on ssd), 
as those have a second checksummed copy that will often be correct that 
scrub can use to fix the bad copy, but it will detect but be unable to 
fix damage in single mode (default for data) or raid0 mode, as those 
don't have a second copy available to fix the first.

Because the default for single device btrfs is dup metadata, single data, 
in that case the scrub should fix most or all of metadata, allowing you 
to access small files (roughly anything under a couple KiB) and larger 
files that weren't themselves damaged, but you may still have damage in 
some files of any significant size.

But scrub can only run if you can mount the filesystem.  If you can't, 
then you have to try other things in 

Re: Incremental send receive of snapshot fails

2016-12-29 Thread Rene Wolf

Hi


As the fs in question is my root, I tried the following using a Xubuntu 
16.10 live USB stick:



Checking filesystem on /dev/sdb1
UUID: 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
checking extents [o]
checking free space cache [.]
checking fs roots [o]
found 40577679360 bytes used err is 0
total csum bytes: 39027548
total tree bytes: 571277312
total fs tree bytes: 453001216
total extent tree bytes: 71745536
btree space waste bytes: 116244847
file data blocks allocated: 46952968192
 referenced 44081487872


"err is 0" ... so I guess that means everything is fine?

Out of curiosity I retried the new_snap+send+receive on that same fs 
using the live cd: same results (ERROR unlink ...)
Though I noticed that the exact file in question (reported by ERROR) is 
somewhat random ...
For this test with the live usb, I mounted the root volume directly 
instead of subvolumes via fstab. So that doesn't seem to have been the 
problem either.



I did some further meditating on what happens here. From what I read and 
understand of send/receive, the stream produced by send is about 
replaying the fs events. If I give send a parent, it will just replay 
the difference between the two snapshots and only produce a stream that 
contains the changes needed to "transform" one (parent) snap into the 
other (on the receiving end). Now I'm not sure how the receiving end 
figures out what the parent is, and whether it has it, but I guess 
that's where all those UUIDs come into play.
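As I understand it (a guess for illustration, not the actual receive code), the receiving side resolves the parent by matching the sender's parent snapshot against the local subvolume whose "Received UUID" matches. Roughly:

```python
def find_local_parent(local_snaps, stream_parent_uuid):
    """Find the local snapshot that corresponds to the sender's parent:
    the one whose Received UUID matches (its local UUID and Parent UUID
    are meaningless on the receiving machine)."""
    for snap in local_snaps:
        if snap["received_uuid"] == stream_parent_uuid:
            return snap
    return None

# Receiver side, modelled on the `btrfs subv show` output above:
server_snaps = [
    {"path": "/media/bak/lab/root/last_snap_by_script",
     "uuid": "89321ec1-2de6-0a4c-8f9f-cdd30fa3a7af",
     "received_uuid": "196a0866-cd05-d24e-bac6-84e8e5eb037a"},
]
parent = find_local_parent(server_snaps, "196a0866-cd05-d24e-bac6-84e8e5eb037a")
print(parent["path"])
```

If two different sender-side snapshots carry the same Received UUID (as in the listings above), a lookup like this becomes ambiguous, which would fit the symptom of the stream being generated against the wrong basis.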


There are three UUIDs, if I compare them on sending ("lab") and 
receiving ("server") side, I see:


## sender
# btrfs subv show /.back/last_snap_by_script

/.back/last_snap_by_script
UUID:   b4634a8b-b74b-154a-9f17-1115f6d07524
Parent UUID:b5f9a301-69f7-0646-8cf1-ba29e0c24fac
Received UUID:  196a0866-cd05-d24e-bac6-84e8e5eb037a


## receiver
# btrfs subv show /media/bak/lab/root/last_snap_by_script

UUID:   89321ec1-2de6-0a4c-8f9f-cdd30fa3a7af
Parent UUID:-
Received UUID:  196a0866-cd05-d24e-bac6-84e8e5eb037a


So that does make sense to me, as neither "Parent UUID" nor "UUID" is 
what would fit our needs (both are kind of local to one system). Instead 
the "Received UUID" seems to be the link identifying snaps on both ends 
as "equal". But then why do both snaps on the sending side have the 
same "Received UUID" for me:


## from my original post, on sender side, this is the "new" delta snapshot
# btrfs subv show /.back/new_snap

/.back/new_snap
Name:   new_snap
UUID:   fca51929-8101-db45-8df6-f25935c04f98
Parent UUID:b5f9a301-69f7-0646-8cf1-ba29e0c24fac
Received UUID:  196a0866-cd05-d24e-bac6-84e8e5eb037a



It would be great if someone could clear this up ... could this point to 
the reason why the "replay" stream is produced on the wrong basis?


Another thing I tried is the "--max-error 0" option of receive. That 
lets it continue after an error, but that produced an endless stream of 
more of the same errors. Is that another indicator that the parent on the 
sending or receiving side is identified wrongly or not at all?


In any case, thanks for the tip Giuseppe :-)


Regards
Rene

On 29.12.2016 16:31, Giuseppe Della Bianca wrote:

Hi.

In such cases, I have run btrfs check (not repair mode !!!) in every
file system/partition that is involved in creating, sending and
receiving snapshots.


Regards.

Gdb

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mounting failed any file on my filesystem

2016-12-29 Thread Jan Koester
Hi,

I have a problem with my filesystem: my system crashed and I did a hard reset, 
after which my filesystem was corrupted. I have already tried to repair it 
without success, as you can see in the log file. It seems one corrupted block 
brings the complete filesystem down.

Does anybody have an idea what happened to my filesystem?

dmesg if open file:
[29450.404327] WARNING: CPU: 5 PID: 16161 at /build/linux-lIgGMF/linux-4.8.11/
fs/btrfs/extent-tree.c:6945 __btrfs_free_extent.isra.71+0x8e2/0xd60 [btrfs]
[29450.404331] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi 
snd_seq_device nfnetlink_queue nfnetlink_log nfnetlink cfg80211 bnep 
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter pci_stub 
vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) bluetooth rfkill 
binfmt_misc ext4 crc16 jbd2 fscrypto ecb mbcache btrfs xor raid6_pq kvm_amd 
kvm amdkfd irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel radeon 
snd_hda_codec_realtek snd_hda_codec_generic sr_mod cdrom pcspkr serio_raw 
snd_hda_codec_hdmi fam15h_power k10temp evdev snd_hda_intel ttm snd_hda_codec 
drm_kms_helper snd_hda_core snd_hwdep snd_pcm drm snd_timer snd soundcore 
i2c_algo_bit sg sp5100_tco nuvoton_cir shpchp rc_core acpi_cpufreq tpm_tis 
tpm_tis_core tpm button cuse fuse parport_pc ppdev lp parport ip_tables
[29450.404512]  x_tables autofs4 xfs libcrc32c crc32c_generic ata_generic 
hid_generic usbhid hid sd_mod uas usb_storage ohci_pci crc32c_intel 
aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse 
e1000e xhci_pci r8169 xhci_hcd ptp ahci mii pps_core pata_atiixp libahci 
ohci_hcd libata ehci_pci ehci_hcd usbcore scsi_mod i2c_piix4 usb_common fjes
[29450.404543] CPU: 5 PID: 16161 Comm: kworker/u12:3 Tainted: GW  OE   
4.8.0-2-amd64 #1 Debian 4.8.11-1
[29450.404544] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./
970 Extreme3, BIOS P1.80 07/31/2013
[29450.404579] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[29450.404581]  0286 0bc03c37 93d269f5 

[29450.404586]   93a7c16e fffe 
00dd8f8a8000
[29450.404590]   91c65d625000 91c4eac5c000 
91c4e7c09690
[29450.404594] Call Trace:
[29450.404599]  [] ? dump_stack+0x5c/0x77
[29450.404603]  [] ? __warn+0xbe/0xe0
[29450.404630]  [] ? __btrfs_free_extent.isra.71+0x8e2/0xd60 
[btrfs]
[29450.404660]  [] ? block_group_cache_tree_search+0x21/0xd0 
[btrfs]
[29450.404690]  [] ? update_block_group.isra.70+0x133/0x420 
[btrfs]
[29450.404699]  [] ? __set_page_dirty_nobuffers+0xef/0x140
[29450.404736]  [] ? btrfs_merge_delayed_refs+0x69/0x580 
[btrfs]
[29450.404767]  [] ? __btrfs_run_delayed_refs+0xadc/0x1240 
[btrfs]
[29450.404801]  [] ? btrfs_run_delayed_refs+0x8e/0x2a0 
[btrfs]
[29450.404834]  [] ? delayed_ref_async_start+0x89/0xa0 
[btrfs]
[29450.404871]  [] ? btrfs_scrubparity_helper+0xd1/0x2d0 
[btrfs]
[29450.404879]  [] ? process_one_work+0x160/0x410
[29450.404886]  [] ? worker_thread+0x4d/0x480
[29450.404892]  [] ? process_one_work+0x410/0x410
[29450.404899]  [] ? kthread+0xcd/0xf0
[29450.404906]  [] ? ret_from_fork+0x1f/0x40
[29450.404913]  [] ? kthread_create_on_node+0x190/0x190
[29450.404919] ---[ end trace 9627fcfceb44da0b ]---
[29450.404926] BTRFS info (device sdd): leaf 950359678976 total ptrs 67 free 
space 442
[29450.404934]  item 0 key (503649468416 192 1107296256) itemoff 3971 itemsize 
24
[29450.404940]  block group used 36864
[29450.404947]  item 1 key (503649538048 169 0) itemoff 3938 itemsize 33
[29450.404952]  extent refs 1 gen 861177 flags 2
[29450.404959]  tree block backref root 2
[29450.404965]  item 2 key (503649550336 169 1) itemoff 3905 itemsize 33
[29450.404971]  extent refs 1 gen 861177 flags 2
[29450.404977]  tree block backref root 2
[29450.404984]  item 3 key (503649566720 169 0) itemoff 3872 itemsize 33
[29450.404991]  extent refs 1 gen 861177 flags 2
[29450.404996]  tree block backref root 2
[29450.405001]  item 4 key (503649570816 169 0) itemoff 3839 itemsize 33
[29450.405007]  extent refs 1 gen 861177 flags 2
[29450.405012]  tree block backref root 2
[29450.405019]  item 5 key (503649607680 169 0) itemoff 3806 itemsize 33
[29450.405025]  extent refs 1 gen 861177 flags 2
[29450.405028]  tree block backref root 2
[29450.405030]  item 6 key (503649628160 169 0) itemoff 3773 itemsize 33
[29450.405031]  extent refs 1 gen 861177 flags 2
[29450.405032]  tree block backref root 2
[29450.405034]  item 7 key (503649636352 169 0) itemoff 3740 itemsize 33
[29450.405036]  extent refs 1 gen 861177 flags 2
[29450.405037]  tree block backref root 2
[29450.405039]  item 8 key (503649669120 169 1) itemoff 3707 itemsize 33
[29450.405040]  extent refs 1 gen 861177 flags 2
[29450.405041]  tree block backref root 1
[29450.405043]  item 9 key 

Re: [PATCH v2 00/19]

2016-12-29 Thread Goffredo Baroncelli
Hi Qu,

I tried your patch, because I had a hardware failure and I needed to check the 
data integrity. I didn't find any problem, but I was not able to understand 
what "btrfs check --scrub" was doing because the program didn't give any output 
(there is no progress bar). So I straced it to check whether the program was 
working properly. The strace output showed me that the program ran correctly.
However, from the strace I noticed that the program reads the same page 
(size 16k) several times. I think that this is due to the walking of the btree. 
However, this could be a possible optimization: cache the last read(s).

Only my 2¢

BR
G.Baroncelli



On 2016-12-26 07:29, Qu Wenruo wrote:
> For any one who wants to try it, it can be get from my repo:
> https://github.com/adam900710/btrfs-progs/tree/offline_scrub
> 
> Currently, I only tested it on SINGLE/DUP/RAID1/RAID5 filesystems, with
> mirror or parity or data corrupted.
> The tool is able to detect all of them and give a recoverability report.
> 
> Several reports on kernel scrub screwing up good data stripes are in ML
> for sometime.
> 
> And since kernel scrub doesn't account for P/Q corruption, it is quite
> hard for us to detect errors like the kernel screwing up P/Q when scrubbing.
> 
> To get a comparable tool for kernel scrub, we need a user-space tool to
> act as benchmark to compare their different behaviors.
> 
> So here is the patchset for user-space scrub.
> 
> Which can do:
> 
> 1) All mirror/backup check for non-parity based stripe
>Which means for RAID1/DUP/RAID10, we can really check all mirrors
>other than the 1st good mirror.
> 
>The current "--check-data-csum" option will finally be replaced by scrub,
>as it doesn't really check all mirrors: once it hits a good copy, the
>remaining copies are just ignored.
> 
> 2) Comprehensive RAID5/6 full stripe check
>It will take full use of btrfs csum(both tree and data).
>It will only recover the full stripe if all recovered data matches
>with its csum.
> 
> In fact, it can already expose several new btrfs kernel bug.
> As it's the main tool I'm using when developing the kernel fixes.
> 
> For example, after screwing up a data stripe, kernel did repairs using
> parity, but recovered full stripe has wrong parity.
> Need to scrub again to fix it.
> 
> And this patchset also introduces a new map_block() function, which is
> more flexible than the current btrfs_map_block() and has a unified interface
> for all profiles, not just an array of physical addresses.
> 
> Check the 6th and 7th patch for details.
> 
> They are already used in RAID5/6 scrub, but can also be used for other
> profiles too.
> 
> The to-do list has been shortened, since RAID6 and new check logical is
> introduced.
> 1) Repair support
>In fact, current tool can already report recoverability, repair is
>not hard to implement.
> 
> 2) Test cases
>Need to make the infrastructure able to handle multi-device first.
> 
> 3) Make btrfsck able to handle RAID5 with missing device
>Now it doesn't even open RAID5 btrfs with missing device, even though
>scrub should be able to handle it.
> 
> Changelog:
> V0.8 RFC:
>Initial RFC patchset
> 
> v1:
>First formal patchset.
>RAID6 recovery support added, mainly copied from the kernel raid6 lib.
>Cleaner recovery logic.
> 
> v2:
>More comments in both code and commit message, suggested by David.
>File re-arrangement, no check/ dir, raid56.ch moved to kernel-lib,
>Suggested by David
> 
> Qu Wenruo (19):
>   btrfs-progs: raid56: Introduce raid56 header for later recovery usage
>   btrfs-progs: raid56: Introduce tables for RAID6 recovery
>   btrfs-progs: raid56: Allow raid6 to recover 2 data stripes
>   btrfs-progs: raid56: Allow raid6 to recover data and p
>   btrfs-progs: Introduce wrapper to recover raid56 data
>   btrfs-progs: Introduce new btrfs_map_block function which returns more
> unified result.
>   btrfs-progs: Allow __btrfs_map_block_v2 to remove unrelated stripes
>   btrfs-progs: csum: Introduce function to read out one data csum
>   btrfs-progs: scrub: Introduce structures to support fsck scrub for
> RAID56
>   btrfs-progs: scrub: Introduce function to scrub mirror based tree
> block
>   btrfs-progs: scrub: Introduce function to scrub mirror based data
> blocks
>   btrfs-progs: scrub: Introduce function to scrub one extent
>   btrfs-progs: scrub: Introduce function to scrub one data stripe
>   btrfs-progs: scrub: Introduce function to verify parities
>   btrfs-progs: extent-tree: Introduce function to check if there is any
> extent in given range.
>   btrfs-progs: scrub: Introduce function to recover data parity
>   btrfs-progs: scrub: Introduce a function to scrub one full stripe
>   btrfs-progs: scrub: Introduce function to check a whole block group
>   btrfs-progs: fsck: Introduce offline scrub function
> 
>  .gitignore |2 +
>  Documentation/btrfs-check.asciidoc |7 +
>  Makefile.in   

Re: problems with btrfs filesystem loading

2016-12-29 Thread Michał Zegan
That seems to do the trick, thanks

On 29.12.2016 at 17:53, Roman Mamedov wrote:
> On Thu, 29 Dec 2016 16:42:09 +0100
> Michał Zegan  wrote:
> 
>> I have odroid c2, processor architecture aarch64, linux kernel from
>> master as of today from http://github.com/torwalds/linux.git.
>> It seems that the btrfs module cannot be loaded. The only thing that
>> happens is that after modprobe i see:
>> modprobe: can't load module btrfs (kernel/fs/btrfs/btrfs.ko.gz): unknown
>> symbol in module, or unknown parameter
>> No errors in dmesg, like I have ignore_loglevel in kernel cmdline and no
>> logs in console appear except logs for loading dependencies like xor
>> module, but that is probably not important.
>> The kernel has been recompiled few minutes ago from scratch, the only
>> thing left was .config file. What is that? other modules load correctly
>> from what I can see.
> 
> In the past there's been some trouble with crc32 dependencies:
> https://www.spinics.net/lists/linux-btrfs/msg32104.html
> Not sure if that's relevant anymore, but in any case, check if you have
> crc32-related stuff either built-in or compiled as modules, if latter, try
> loading those before btrfs (/lib/modules/*/kernel/crypto/crc32*)
> 





Re: problems with btrfs filesystem loading

2016-12-29 Thread Roman Mamedov
On Thu, 29 Dec 2016 16:42:09 +0100
Michał Zegan  wrote:

> I have odroid c2, processor architecture aarch64, linux kernel from
> master as of today from http://github.com/torwalds/linux.git.
> It seems that the btrfs module cannot be loaded. The only thing that
> happens is that after modprobe i see:
> modprobe: can't load module btrfs (kernel/fs/btrfs/btrfs.ko.gz): unknown
> symbol in module, or unknown parameter
> No errors in dmesg, like I have ignore_loglevel in kernel cmdline and no
> logs in console appear except logs for loading dependencies like xor
> module, but that is probably not important.
> The kernel has been recompiled few minutes ago from scratch, the only
> thing left was .config file. What is that? other modules load correctly
> from what I can see.

In the past there's been some trouble with crc32 dependencies:
https://www.spinics.net/lists/linux-btrfs/msg32104.html
Not sure if that's relevant anymore, but in any case, check if you have
crc32-related stuff either built-in or compiled as modules, if latter, try
loading those before btrfs (/lib/modules/*/kernel/crypto/crc32*)

-- 
With respect,
Roman


Fwd: problems with btrfs filesystem loading

2016-12-29 Thread Michał Zegan
Resending to btrfs list



--- Forwarded message ---
To: linux-fsde...@vger.kernel.org
From: Michał Zegan 
Subject: problems with btrfs filesystem loading
Message-ID: <05893a24-2bf7-d485-1f9c-b10650419...@poczta.onet.pl>
Date: Thu, 29 Dec 2016 16:24:47 +0100

Hello.

I have odroid c2, processor architecture aarch64, linux kernel from
master as of today from http://github.com/torwalds/linux.git.
It seems that the btrfs module cannot be loaded. The only thing that
happens is that after modprobe I see:
modprobe: can't load module btrfs (kernel/fs/btrfs/btrfs.ko.gz): unknown
symbol in module, or unknown parameter
No errors show up in dmesg; I have ignore_loglevel on the kernel cmdline,
and nothing appears on the console except logs for loading dependencies
like the xor module, but that is probably not important.
The kernel was recompiled from scratch a few minutes ago; the only thing
left over was the .config file. What could this be? Other modules load
correctly from what I can see.







Incremental send receive of snapshot fails

2016-12-29 Thread Giuseppe Della Bianca
Hi.

In such cases, I have run btrfs check (not repair mode !!!) in every
file system/partition that is involved in creating, sending and
receiving snapshots.


Regards.

Gdb


>Rene Wolf Wed, 28 Dec 2016 03:51:07 -0800


>Hi all
>I have a problem with incremental snapshot send receive in btrfs. May
be my google-fu is weak, but I couldn't find any pointers, so here goes.

>A few words about my setup first:

>I have multiple clients that back up to a central server. All clients
(and the server) are running a (K)Ubuntu 16.10 64Bit on btrfs. Backing
up works with btrfs send / receive, either full or incremental,
depending on whats available on >the server side. All clients have the
usual (Ubuntu) btrfs layout: 2 subvolumes, one for / and one for /home;
explicit entries in fstab; root volume not mounted anywhere. For further
details see the P.s. at the end.
]zac[




Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-29 Thread Michal Hocko
On Thu 29-12-16 10:20:26, Minchan Kim wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload but
> > I think we need something like that so some testing under which is known
> > to make a high lowmem pressure would be really appreciated. If you have
> > more time to play with it then running with and without the patch with
> > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> > whether it make any difference at all.
> > 
> > I would also appreciate if Mel and Johannes had a look at it. I am not
> > yet sure whether we need the same thing for anon/file balancing in
> > get_scan_count. I suspect we need but need to think more about that.
> > 
> > Thanks a lot again!
> > ---
> > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko 
> > Date: Tue, 27 Dec 2016 16:28:44 +0100
> > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> > 
> > get_scan_count considers the whole node LRU size when
> > - doing SCAN_FILE due to many page cache inactive pages
> > - calculating the number of pages to scan
> > 
> > in both cases this might lead to unexpected behavior especially on 32b
> > systems where we can expect lowmem memory pressure very often.
> > 
> > A large highmem zone can easily distort SCAN_FILE heuristic because
> > there might be only few file pages from the eligible zones on the node
> > lru and we would still enforce file lru scanning which can lead to
> > trashing while we could still scan anonymous pages.
> 
> Nit:
> It doesn't make thrashing because isolate_lru_pages filter out them
> but I agree it makes pointless CPU burning to find eligible pages.

This is not about isolate_lru_pages. The thrashing could happen if we had
a lowmem pagecache user which would constantly reclaim recently faulted-in
pages while there is anonymous memory in lowmem which could be reclaimed
instead.
 
[...]
> >  /*
> > + * Return the number of pages on the given lru which are eligibne for the
> eligible

fixed

> > + * given zone_idx
> > + */
> > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> > +   enum lru_list lru, int zone_idx)
> 
> Nit:
> 
> Although there is a comment, function name is rather confusing when I compared
> it with lruvec_zone_lru_size.

I am all for a better name.

> lruvec_eligible_zones_lru_size is better?

this would be too easy to confuse with lruvec_eligible_zone_lru_size.
What about lruvec_lru_size_eligible_zones?
 
> Nit:
> 
> With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather 
> than
> own custom calculation to filter out non-eligible pages. 

Yes, that would be possible and I was considering that. But then I found
it useful to see both the total and the reduced numbers in the tracepoint
http://lkml.kernel.org/r/20161228153032.10821-8-mho...@kernel.org
and didn't want to call lruvec_lru_size twice. But if you insist then
I can just do that.
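For readers not following the mm code: the helper being named sums per-zone LRU sizes only for zones at or below the eligible zone index, instead of the whole node. A toy model of that calculation (invented numbers, not the kernel implementation):

```python
# Per-zone sizes of one LRU list on a 32-bit-ish node:
# index 0 = DMA, 1 = NORMAL, 2 = HIGHMEM
zone_lru_sizes = [100, 2000, 50000]

def lruvec_lru_size(sizes):
    """Whole-node LRU size (what get_scan_count used before the patch)."""
    return sum(sizes)

def lruvec_lru_size_eligible_zones(sizes, zone_idx):
    """Only zones usable by the allocation: e.g. a GFP_KERNEL request
    cannot use highmem, so zone_idx would be 1 here."""
    return sum(sizes[: zone_idx + 1])

print(lruvec_lru_size(zone_lru_sizes))                    # -> 52100
print(lruvec_lru_size_eligible_zones(zone_lru_sizes, 1))  # -> 2100
```

The gap between the two numbers is exactly the highmem distortion the patch description talks about: scan counts based on 52100 pages when only 2100 are actually eligible.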

> Anyway, I think this patch does right things so I suppose this.
> 
> Acked-by: Minchan Kim 

Thanks for the review!

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)

2016-12-29 Thread Michal Hocko
On Thu 29-12-16 09:48:24, Minchan Kim wrote:
> On Thu, Dec 29, 2016 at 09:31:54AM +0900, Minchan Kim wrote:
[...]
> > Acked-by: Minchan Kim 

Thanks!
 
> Nit:
> 
> WARNING: line over 80 characters
> #53: FILE: include/linux/memcontrol.h:689:
> +unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec, enum 
> lru_list lru,
> 
> WARNING: line over 80 characters
> #147: FILE: mm/vmscan.c:248:
> +unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, 
> int zone_idx)
> 
> WARNING: line over 80 characters
> #177: FILE: mm/vmscan.c:1446:
> +   mem_cgroup_update_lru_size(lruvec, lru, zid, 
> -nr_zone_taken[zid]);

fixed

> WARNING: line over 80 characters
> #201: FILE: mm/vmscan.c:2099:
> +   inactive_zone = lruvec_zone_lru_size(lruvec, file * LRU_FILE, 
> zid);
> 
> WARNING: line over 80 characters
> #202: FILE: mm/vmscan.c:2100:
> +   active_zone = lruvec_zone_lru_size(lruvec, (file * LRU_FILE) 
> + LRU_ACTIVE, zid);

I would prefer to have those on the same line though. It will make them
easier to follow.

-- 
Michal Hocko
SUSE Labs