date:20160602

The new convert uses a simpler chunk/extent allocation algorithm, at the
cost of more complex backup superblock migration code.

Use specially built ext2 images to test whether btrfs-convert can convert
and roll back images without problems.

All of these special ext2 images have blocks/holes across the 2nd btrfs
backup superblock.

The test images are named as follows:
|<--superblock migration range->|
64M 64M + 64K
|-Data--|-Data--|/Hole//|-Data--|/Hole//|-Data--|--Data--| = drdhdhdrd

These test cases should cover all typical layouts and make sure the new
convert works.

Signed-off-by: Qu Wenruo 
---
v2:
  Split infrastructure with test case
---
 .../drdhdhdrd.e2image.raw.xz   | Bin 0 -> 84564 bytes
 .../drdhdhrh.e2image.raw.xz| Bin 0 -> 84568 bytes
 .../hrhdhdrd.e2image.raw.xz| Bin 0 -> 84556 bytes
 .../hrhdhrh.e2image.raw.xz | Bin 0 -> 84568 bytes
 .../001-ext2-backup-superblock-ranges/test.sh  |  37 +
 5 files changed, 37 insertions(+)
 create mode 100644 tests/convert-tests/001-ext2-backup-superblock-ranges/drdhdhdrd.e2image.raw.xz
 create mode 100644 tests/convert-tests/001-ext2-backup-superblock-ranges/drdhdhrh.e2image.raw.xz
 create mode 100644 tests/convert-tests/001-ext2-backup-superblock-ranges/hrhdhdrd.e2image.raw.xz
 create mode 100644 tests/convert-tests/001-ext2-backup-superblock-ranges/hrhdhrh.e2image.raw.xz
 create mode 100755 tests/convert-tests/001-ext2-backup-superblock-ranges/test.sh

diff --git a/tests/convert-tests/001-ext2-backup-superblock-ranges/drdhdhdrd.e2image.raw.xz b/tests/convert-tests/001-ext2-backup-superblock-ranges/drdhdhdrd.e2image.raw.xz
new file mode 100644
index ..73e2309c4bd6991424962c3bc53ad74171b52268
GIT binary patch
literal 84564
[binary patch data omitted]

[PATCH v2 1/2] btrfs-progs: convert-tests: Add support for custom test scripts

Add support for custom convert test scripts, just like fsck tests.

Instead of the generic convert tests, the new convert tests need more
specifically created images.

This patch provides the needed infrastructure for later convert test
cases.

Signed-off-by: Qu Wenruo 
---
v2:
  Split test case with infrastructure
---
 tests/common   |  8 
 tests/convert-tests.sh | 31 +++
 2 files changed, 39 insertions(+)

diff --git a/tests/common b/tests/common
index 91682ef..fed9ede 100644
--- a/tests/common
+++ b/tests/common
@@ -94,6 +94,14 @@ check_prereq()
fi
 }
 
+check_global_prereq()
+{
+   which $1 &> /dev/null
+   if [ $? -ne 0 ]; then
+   _fail "Failed system wide prerequisites: $1";
+   fi
+}
+
 check_image()
 {
local image
diff --git a/tests/convert-tests.sh b/tests/convert-tests.sh
index 06d8419..a02311d 100755
--- a/tests/convert-tests.sh
+++ b/tests/convert-tests.sh
@@ -15,6 +15,11 @@ DATASET_SIZE=50
 
 source $TOP/tests/common
 
+# Allow child test to use $TOP and $RESULTS
+export TOP
+export RESULTS
+export LANG
+
 rm -f $RESULTS
 
 setup_root_helper
@@ -22,6 +27,25 @@ prepare_test_dev 512M
 
 CHECKSUMTMP=$(mktemp --tmpdir btrfs-progs-convert.XX)
 
+run_one_test() {
+   local testname
+
+   testname="$1"
+   echo "[TEST/conv]   $testname"
+   cd $testname
+   echo "=== Entering $testname" >> $RESULTS
+   if [ -x test.sh ]; then
+   # Different convert test cases need different tools to restore
+   # and check images, so only custom test scripts are supported
+   ./test.sh
+   if [ $? -ne 0 ]; then
+   _fail "test failed for case $(basename $testname)"
+   fi
+   else
+   _fail "custom test script not found"
+   fi
+}
+
 generate_dataset() {
 
dataset_type="$1"
@@ -163,4 +187,11 @@ for feature in '' 'extref' 'skinny-metadata' 'no-holes'; do
convert_test "$feature" "ext4 64k nodesize" 65536 mke2fs -t ext4 -b 4096
 done
 
+# Test special images
+for i in $(find $TOP/tests/convert-tests -maxdepth 1 -mindepth 1 -type d \
+  ${TEST:+-name "$TEST"} | sort)
+do
+   run_one_test "$i"
+done
+
 rm $CHECKSUMTMP
-- 
2.8.3
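
The two shell idioms this patch relies on, a system-wide prerequisite check and the optional `${TEST:+...}` filter in the discovery loop, can be sketched in isolation. This is a minimal sketch and not the btrfs-progs code: `require_cmd` and `list_tests` are hypothetical names, `command -v` is swapped in for `which`, and the failure path returns an error instead of calling `_fail`.

```shell
# Hypothetical helper: succeed only if the named command is in $PATH.
# Uses `command -v` (POSIX) rather than `which`.
require_cmd() {
	if ! command -v "$1" >/dev/null 2>&1; then
		echo "missing system-wide prerequisite: $1" >&2
		return 1
	fi
}

# Hypothetical helper: list the test directories directly under $1.
# When $TEST is set and non-empty, ${TEST:+...} expands to
# `-name "$TEST"` and narrows the search; otherwise it expands to
# nothing and every directory is listed.
list_tests() {
	find "$1" -maxdepth 1 -mindepth 1 -type d \
		${TEST:+-name "$TEST"} | sort
}

# Example use (paths are illustrative only):
#   require_cmd xz || exit 1
#   TEST='001*' list_tests tests/convert-tests
```

Leaving the `${TEST:+...}` expansion unquoted is deliberate: it has to split into two words (`-name` and the pattern), while the inner quotes keep the pattern itself from being globbed by the shell.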



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: chunk_width_limit mount option

2016-06-02 Thread Andrew Armenia

This patch adds mount option 'chunk_width_limit=X', which when set forces
the chunk allocator to use only up to X devices when allocating a chunk.
This may help reduce the seek penalties seen in filesystems with large
numbers of devices.

Signed-off-by: Andrew Armenia 
---
 fs/btrfs/ctree.h   |  3 +++
 fs/btrfs/super.c   | 22 +-
 fs/btrfs/volumes.c | 26 ++
 3 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 101c3cf..27b6f8f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -754,6 +754,9 @@ struct btrfs_fs_info {
unsigned long pending_changes;
unsigned long compress_type:4;
int commit_interval;
+
+   int chunk_width_limit;
+
/*
 * It is a suggestive number, the read side is safe even it gets a
 * wrong number because we will write out the data into a regular
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4e59a91..3da5220 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -300,7 +300,7 @@ enum {
Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
Opt_datasum, Opt_treelog, Opt_noinode_cache, Opt_usebackuproot,
-   Opt_nologreplay, Opt_norecovery,
+   Opt_nologreplay, Opt_norecovery, Opt_width_limit,
 #ifdef CONFIG_BTRFS_DEBUG
Opt_fragment_data, Opt_fragment_metadata, Opt_fragment_all,
 #endif
@@ -360,6 +360,7 @@ static const match_table_t tokens = {
{Opt_rescan_uuid_tree, "rescan_uuid_tree"},
{Opt_fatal_errors, "fatal_errors=%s"},
{Opt_commit_interval, "commit=%d"},
+   {Opt_width_limit, "chunk_width_limit=%d"},
 #ifdef CONFIG_BTRFS_DEBUG
{Opt_fragment_data, "fragment=data"},
{Opt_fragment_metadata, "fragment=metadata"},
@@ -782,6 +783,22 @@ int btrfs_parse_options(struct btrfs_root *root, char *options,
info->commit_interval = BTRFS_DEFAULT_COMMIT_INTERVAL;
}
break;
+   case Opt_width_limit:
+   intarg = 0;
+   ret = match_int(&args[0], &intarg);
+   if (ret < 0) {
+   btrfs_err(root->fs_info, "invalid chunk width limit");
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (intarg > 0) {
+   info->chunk_width_limit = intarg;
+   } else {
+   btrfs_info(root->fs_info, "chunk width is unlimited");
+   info->chunk_width_limit = 0;
+   }
+   break;
 #ifdef CONFIG_BTRFS_DEBUG
case Opt_fragment_all:
btrfs_info(root->fs_info, "fragmenting all space");
@@ -1207,6 +1224,9 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
if (info->thread_pool_size !=  min_t(unsigned long,
 num_online_cpus() + 2, 8))
seq_printf(seq, ",thread_pool=%d", info->thread_pool_size);
+   if (info->chunk_width_limit != 0)
+   seq_printf(seq, ",chunk_width_limit=%d",
+   info->chunk_width_limit);
if (btrfs_test_opt(root, COMPRESS)) {
if (info->compress_type == BTRFS_COMPRESS_ZLIB)
compress_type = "zlib";
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bdc6256..6d0d35d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4558,6 +4558,32 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
devs_increment = btrfs_raid_array[index].devs_increment;
ncopies = btrfs_raid_array[index].ncopies;
 
+   /*
+* if we have a statically-configured chunk width, and the type doesn't
+* specify one, go ahead and use the statically-configured max instead.
+*
+* If the static value is greater than the BTRFS_MAX_DEVS for the
+* chunk tree, we ignore it.
+*
+* Also, we ignore the static value for system chunks.
+*/
+   if (
+   devs_max == 0 && info->chunk_width_limit != 0
+   && !(type & BTRFS_BLOCK_GROUP_SYSTEM)
+   && info->chunk_width_limit <= BTRFS_MAX_DEVS(info->chunk_root)
+   ) {
+   if (info->chunk_width_limit >= devs_min) {
+   devs_max = info->chunk_width_limit;
+   } else {
+   /* warn that the static devs_max is unusable */
+   btrfs_warn(info,
+   "can't satisfy max chunk width of %d; "
+   "minimum %d devices needed",
+   info->chunk_width_limit, devs_min
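
The hunk above is cut off in this archive, so the following is only a shell model of the decision described in its comment, not the kernel code. `effective_devs_max` and its argument order are hypothetical, and a result of 0 is assumed to mean "this step imposes no cap", matching the `devs_max == 0` test in the patch.

```shell
# Hypothetical model of the chunk_width_limit decision.
# Arguments:
#   $1  devs_max from the RAID profile (0 = profile sets no maximum)
#   $2  chunk_width_limit mount option (0 = unlimited)
#   $3  devs_min required by the RAID profile
#   $4  maximum device count the chunk tree supports
#   $5  1 if this is a system chunk, 0 otherwise
effective_devs_max() {
	local type_max="$1" limit="$2" devs_min="$3" fs_max="$4" is_system="$5"

	# The limit applies only when the profile has no maximum of its
	# own, a limit is configured, the chunk is not a system chunk, the
	# limit fits the chunk tree, and the limit still satisfies devs_min.
	if [ "$type_max" -eq 0 ] && [ "$limit" -ne 0 ] &&
	   [ "$is_system" -eq 0 ] && [ "$limit" -le "$fs_max" ] &&
	   [ "$limit" -ge "$devs_min" ]; then
		echo "$limit"
	else
		echo "$type_max"
	fi
}

# Example: a profile with no maximum, limit 4, devs_min 2:
#   effective_devs_max 0 4 2 100 0
```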
+

Re: "No space left on device" and balance doesn't work

2016-06-02 Thread Henk Slager

On Thu, Jun 2, 2016 at 3:55 PM, MegaBrutal  wrote:
> 2016-06-02 0:22 GMT+02:00 Henk Slager :
>> What is the kernel version used?
>> Is the fs on a mechanical disk or SSD?
>> What are the mount options?
>> How old is the fs?
>
> Linux 4.4.0-22-generic (Ubuntu 16.04).
> Mechanical disks in LVM.
> Mount: /dev/mapper/centrevg-rootlv on / type btrfs
> (rw,relatime,space_cache,subvolid=257,subvol=/@)
> I don't know how to retrieve the exact FS age, but it was created in
> 2014 August.
>
> Snapshots (their names encode their creation dates):
>
> ID 908 gen 487349 top level 5 path @-snapshot-2016050301
...
> ID 937 gen 521829 top level 5 path @-snapshot-2016060201
>
> Removing old snapshots is the most feasible solution, but I can also
> increase the FS size. It's easy since it's in LVM, and there is plenty
> of space in the volume group.
>
> Probably I should rewrite my alert script to check btrfs fi show
> instead of plain df.

Yes I think that makes sense, to decide on chunk-level. You can see
how big the chunks are with the linked show_usage.py program, most of
33 should be 1GiB as already very well explained by Austin.

The setup all looks pretty normal and btrfs should be able to handle
it, but unfortunately your fs is a typical example showing that one
currently needs to monitor/tune a btrfs fs for its 'health' in order to
keep it running long-term. You might want to change the mount option
relatime to noatime, so that you have fewer writes to metadata chunks. It
should lower the scattering inside the metadata chunks.
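
A rough sketch of such a chunk-level check, assuming device lines in the `btrfs fi show` format with sizes reported in GiB. `unallocated_gib` is a hypothetical helper name, and other units (MiB, TiB) are not handled.

```shell
# Hypothetical helper: read `btrfs fi show` output on stdin and print
# "<device path> <unallocated GiB>" for every devid line.
# Unallocated space per device is its size minus the space already
# allocated to chunks ("used" on the devid line).
unallocated_gib() {
	LC_ALL=C awk '
	/^[ \t]*devid/ {
		size = used = path = ""
		for (i = 1; i < NF; i++) {
			if ($i == "size") size = $(i + 1)
			if ($i == "used") used = $(i + 1)
			if ($i == "path") path = $(i + 1)
		}
		# Assumption: both values carry a GiB suffix.
		sub(/GiB$/, "", size)
		sub(/GiB$/, "", used)
		printf "%s %.2f\n", path, size - used
	}'
}

# Example use (mount point is illustrative only):
#   btrfs fi show /home | unallocated_gib
```

An alert script could then compare the second column against a threshold, which tracks how close the allocator is to running out of room for new chunks rather than what plain df reports.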

Re: [BUG] receive not seeing file that exists

2016-06-02 Thread Henk Slager

On Thu, Jun 2, 2016 at 9:26 AM, Benedikt Morbach
 wrote:
> Hi all,
>
> I've encountered a bug in btrfs-receive. When receiving a certain
> incremental send, it will error with:
>
> ERROR: cannot open
> backup/detritus/root/root.20160524T1800/var/log/journal/9cbb44cf160f4c1089f77e32ed376a0b/user-1000.journal:
> No such file or directory
>
> even though that path exists and the parent subvolume is identical on
> both ends (I checked manually).
>
> I've noticed this happen before on the same directory (and google
> confirms it has also happened to others) and /var/log/journal/ and its
> children are the only directories with 'chattr +C' on this system, so
> it might be related to that?

Now that I see this report, I realize that I also hit this issue. I
was compiling a kernel with 'make -j4 all'. Under some circumstances,
this leads to 'package temp too high' and CPU throttling; with -j1 or
-j2 I haven't seen it (I think the root cause is the power supply).

Anyhow, while this compile was running, my nightly snapshotting and
incremental send|receive was started. I saw a mce HW error in kernel
log also at that point, so I did restart. Also the inc send had failed
so I thought, it was due to mce issue. But also with no mce-HW issues
logged, tools-4.5.3 + kernel-4.5.4 and also tools-4.5.3 + kernel-4.6.0
had the same issue.

I run send and receive on the same PC in this case, but additionally
split the stream to a file. I noticed the file was already corrupt (too
short), so I concluded the issue was in send. I set up an extra hourly
backup cron task for this problem subvol and it failed almost every
hour. For another subvolume on the new 3-day-old fs, it was not a
problem. The fs is a few TB and has default mkfs settings + no-holes.
The nodesize was increased from 4k to 16k, which was a reason to
re-create it.

For the problem subvol and also others that I do not inc backup, I set
the subvol to ro on old fs, send the stream-file to temp storage,
received it back on new fs and set it to rw and created initial backup
snapshot of it and send it over to backup fs. That all worked fine.
Several programs write and delete roughly 10 files/hour so not very
active part of the fs. It was quite random at which file the
incremental stream got corrupted.

My best guess was that the use of  btrfs property set  might be the
issue, so I rsynced the data in the subvol into a new subvol and did
initial backup snapshot transfer. This was with tools-4.5.3 +
kernel-4.5.4 and it has now been running fine for 10 days.

I had limited time to research this issue for the subvol and also
cannot provide send-stream data for the subvol. But I have still a 12G
btrfs-stream of a .git kernelbuild tree that also got this btrfs
property set ro=true treatment. So I might try to reproduce the bug
with that one.

Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

On Thu, Jun 02, 2016 at 02:17:40PM -0700, Mark Fasheh wrote:
> On Thu, Jun 02, 2016 at 04:56:06PM -0400, Jeff Mahoney wrote:
> > On 6/2/16 3:08 PM, Mark Fasheh wrote:
> > > On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote:
> > >> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
> > >>>> +/* dynamically allocate and initialize a ref_root */
> > >>>> +static struct ref_root *ref_root_alloc(void)
> > >>>> +{
> > >>>> +  struct ref_root *ref_tree;
> > >>>> +
> > >>>> +  ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> > >>>
> > >>> I'm pretty sure we want GFP_NOFS here.
> > >>
> > >> Then please explain to me why/where the reasoning below is wrong:
> > > 
> > > The general reasoning of when to use GFP_NOFS below is fine, I don't
> > > disagree with that at all. If there is no way a recursion back into btrfs
> > > can happen at that allocation site then we can use GFP_KERNEL.
> > > 
> > > That said, have you closely audited this path? Does the allocation happen
> > > completely outside any locks that might be shared with the writeout path?
> > > What happens if we have to do writeout of the inode being fiemapped in order
> > > to allocate space? If the answer to all my questions is "there is no way
> > > this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise
> > > GFP_NOFS is a sensible guard against possible future deadlocks.
> > 
> > This is exactly the situation we discussed at LSF/MM this year.  The MM
> > folks are pushing back because the fs folks tend to use GFP_NOFS as a
> > talisman.  The audit needs to happen, otherwise that last sentence is
> > another talisman.
> 
> There's nothing here I disagree with. I'm not seeing a strong technical
> justification, which is what I want (being called from an ioctl means
> nothing in this case).

A small amount of searching shows me that extent_fiemap() does
lock_extent_bits() and writepage_delalloc() also calls lock_extent_bits()
(via find_lock_delalloc_range()).

I'm no expert on the extent locking but that seems pretty deadlocky to me.
--Mark

--
Mark Fasheh

Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

On Thu, Jun 02, 2016 at 04:56:06PM -0400, Jeff Mahoney wrote:
> On 6/2/16 3:08 PM, Mark Fasheh wrote:
> > On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote:
> >> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
> >>>> +/* dynamically allocate and initialize a ref_root */
> >>>> +static struct ref_root *ref_root_alloc(void)
> >>>> +{
> >>>> +  struct ref_root *ref_tree;
> >>>> +
> >>>> +  ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> >>>
> >>> I'm pretty sure we want GFP_NOFS here.
> >>
> >> Then please explain to me why/where the reasoning below is wrong:
> > 
> > The general reasoning of when to use GFP_NOFS below is fine, I don't
> > disagree with that at all. If there is no way a recursion back into btrfs
> > can happen at that allocation site then we can use GFP_KERNEL.
> > 
> > That said, have you closely audited this path? Does the allocation happen
> > completely outside any locks that might be shared with the writeout path?
> > What happens if we have to do writeout of the inode being fiemapped in order
> > to allocate space? If the answer to all my questions is "there is no way
> > this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise
> > GFP_NOFS is a sensible guard against possible future deadlocks.
> 
> This is exactly the situation we discussed at LSF/MM this year.  The MM
> folks are pushing back because the fs folks tend to use GFP_NOFS as a
> talisman.  The audit needs to happen, otherwise that last sentence is
> another talisman.

There's nothing here I disagree with. I'm not seeing a strong technical
justification, which is what I want (being called from an ioctl means
nothing in this case).
--Mark

--
Mark Fasheh

Re: btrfs ENOSPC "not the usual problem"

2016-06-02 Thread Chris Murphy

On Thu, Jun 2, 2016 at 1:45 PM, Omari Stephens  wrote:
> [Note: not on list; please reply-all]
>
> I've read everything I can find about running out of space on btrfs, and it
> hasn't helped.  I'm currently dead in the water.
>
> Everything I do seems to make the problem monotonically worse — I tried
> adding a loopback device to the fs, and now I can't remove it.  Then I tried
> adding a real device (mSATA) to the fs and now I still can't remove the
> loopback device (which is making everything super slow), and I also can't
> remove the mSATA.  I've removed about 100GB from the filesystem and that
> hasn't done anything either.
>
> Is there anything I can do to even figure out how bad things are, or what I
> need to do to make any kind of forward progress?  This is a laptop, so I
> don't want to add an external drive only to find out that I can't remove it
> without corrupting my filesystem.
>
> ### FILESYSTEM STATE
> 19:23:14> [root{slobol}@/home/xsdg]
> #btrfs fi show /home
> Label: none  uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd
> Total devices 3 FS bytes used 221.02GiB
> devid1 size 418.72GiB used 413.72GiB path /dev/sda3
> devid2 size 10.00GiB used 5.00GiB path /dev/loop0
> devid3 size 14.91GiB used 3.00GiB path /dev/sdb1
>
>
> 19:23:33> [root{slobol}@/home/xsdg]
> #btrfs fi usage /home
> Overall:
> Device size: 443.63GiB
> Device allocated: 421.72GiB
> Device unallocated:  21.91GiB
> Device missing: 0.00B
> Used: 221.68GiB
> Free (estimated): 219.24GiB(min: 208.29GiB)
> Data ratio:  1.00
> Metadata ratio:  2.00
> Global reserve: 228.00MiB(used: 36.00KiB)
>
> Data,single: Size:417.69GiB, Used:220.36GiB
>/dev/loop0   5.00GiB
>/dev/sda3 409.69GiB
>/dev/sdb1   3.00GiB
>
> Metadata,single: Size:8.00MiB, Used:0.00B
>/dev/sda3   8.00MiB
>
> Metadata,DUP: Size:2.00GiB, Used:674.45MiB
>/dev/sda3   4.00GiB
>
> System,single: Size:4.00MiB, Used:0.00B
>/dev/sda3   4.00MiB
>
> System,DUP: Size:8.00MiB, Used:56.00KiB
>/dev/sda3  16.00MiB
>
> Unallocated:
>/dev/loop0   5.00GiB
>/dev/sda3   5.00GiB
>/dev/sdb1  11.91GiB
>
>
> ### BALANCE FAILS, EVEN WITH -dusage=0
> 19:23:02> [root{slobol}@/home/xsdg]
> #btrfs balance start -v -dusage=0 .
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=0
> ERROR: error during balancing '.': No space left on device
> There may be more info in syslog - try dmesg | tail
>
>
> ### CAN'T REMOVE DEVICES -> ENOSPC
> #btrfs device remove /dev/loop0 /home
> ERROR: error removing device '/dev/loop0': No space left on device


Well the big problem here is that it's a loop device so even if it
were a known/fixed bug you're stuck being unable to boot; well, except
you could add a big enough device, convert to raid1, and reboot with
rootflags=degraded.

I'd use the external to make a backup, and start planning to make a
new fs, at least until someone else with a better idea arrives on
scene.



-- 
Chris Murphy

Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

2016-06-02 Thread Jeff Mahoney

On 6/2/16 3:08 PM, Mark Fasheh wrote:
> On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote:
>> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
>>>> +/* dynamically allocate and initialize a ref_root */
>>>> +static struct ref_root *ref_root_alloc(void)
>>>> +{
>>>> +  struct ref_root *ref_tree;
>>>> +
>>>> +  ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
>>>
>>> I'm pretty sure we want GFP_NOFS here.
>>
>> Then please explain to me why/where the reasoning below is wrong:
> 
> The general reasoning of when to use GFP_NOFS below is fine, I don't
> disagree with that at all. If there is no way a recursion back into btrfs
> can happen at that allocation site then we can use GFP_KERNEL.
> 
> That said, have you closely audited this path? Does the allocation happen
> completely outside any locks that might be shared with the writeout path?
> What happens if we have to do writeout of the inode being fiemapped in order
> to allocate space? If the answer to all my questions is "there is no way
> this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise
> GFP_NOFS is a sensible guard against possible future deadlocks.

This is exactly the situation we discussed at LSF/MM this year.  The MM
folks are pushing back because the fs folks tend to use GFP_NOFS as a
talisman.  The audit needs to happen, otherwise that last sentence is
another talisman.

-Jeff

-- 
Jeff Mahoney
SUSE Labs




Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

On Thu, Jun 02, 2016 at 01:46:27PM +0800, Lu Fengqi wrote:
> 
> At 06/02/2016 05:15 AM, Mark Fasheh wrote:
> >Thanks for trying to fix this problem, comments below.
> >
> >On Wed, Jun 01, 2016 at 01:48:05PM +0800, Lu Fengqi wrote:
> >>Only in the case of a different root_id or a different object_id does
> >>check_shared identify an extent as shared. However, if an extent is
> >>referred to by different offsets of the same file, it should also be
> >>identified as shared. In addition, check_shared's loop scale is at
> >>least n^3, so if an extent has too many references, it can even cause
> >>a soft hang.
> >>
> >>First, add all delayed_refs to the ref_tree and calculate the
> >>unique_refs; if unique_refs is greater than one, return
> >>BACKREF_FOUND_SHARED. Then individually add the on-disk references
> >>(inline/keyed) to the ref_tree and calculate the unique_refs of the
> >>ref_tree to check whether unique_refs is greater than one. Because
> >>SHARED is returned as soon as there are two references, the time
> >>complexity is close to constant.
> >Constant time in the best case, but still n^3 in the worst case right? I'm
> >not complaining btw, I just want to be sure we're not over promising  :)
> Only in case of a large number of delayed_ref, the worst case time
> complexity will be n^2*logn. Otherwise, it will be constant even if
> there are many on-disk references.

Ahh ok so it's driven more by the # of delayed refs. That makes sense,
thanks.
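
To make the unique_refs accounting concrete, here is a small shell model; it is not the kernel code. It assumes one reference per input line in the form `root_id object_id offset ref_mod`, `is_shared` is a hypothetical name, and it processes every line rather than returning as soon as a second unique reference appears.

```shell
# Hypothetical model of the ref_tree accounting: a key counts toward
# "unique" only while its summed ref_mod is positive, and the extent
# is reported as shared when more than one unique key remains.
is_shared() {
	awk '
	{
		key = $1 SUBSEP $2 SUBSEP $3
		before = count[key] + 0
		count[key] += $4
		if (before <= 0 && count[key] > 0) unique++
		if (before > 0 && count[key] <= 0) unique--
	}
	END {
		result = (unique > 1) ? "shared" : "not shared"
		print result
	}'
}

# Example: the same file referencing the extent at two offsets.
#   printf '5 257 0 1\n5 257 4096 1\n' | is_shared
```

Adding the same key twice raises its count but not the number of unique keys, which mirrors the comment on `unique_refs` in the quoted patch.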


> >>@@ -34,6 +35,253 @@ struct extent_inode_elem {
> >>struct extent_inode_elem *next;
> >>  };
> >>
> >>+/*
> >>+ * ref_root is used as the root of the ref tree that hold a collection
> >>+ * of unique references.
> >>+ */
> >>+struct ref_root {
> >>+   struct rb_root rb_root;
> >>+
> >>+   /*
> >>+* the unique_refs represents the number of ref_nodes with a positive
> >>+* count stored in the tree. Even if a ref_node(the count is greater
> >>+* than one) is added, the unique_refs will only increase one.
> >>+*/
> >>+   unsigned int unique_refs;
> >>+};
> >>+
> >>+/* ref_node is used to store a unique reference to the ref tree. */
> >>+struct ref_node {
> >>+   struct rb_node rb_node;
> >>+
> >>+   /* for NORMAL_REF, otherwise all these fields should be set to 0 */
> >>+   u64 root_id;
> >>+   u64 object_id;
> >>+   u64 offset;
> >>+
> >>+   /* for SHARED_REF, otherwise parent field should be set to 0 */
> >>+   u64 parent;
> >>+
> >>+   /* ref to the ref_mod of btrfs_delayed_ref_node(delayed-ref.h) */
> >>+   int ref_mod;
> >>+};
> >Why are we mirroring so much of the backref structures here? It seems like
> >we're just throwing layers on top of layers. Can't we modify the backref
> >structures and code to handle whatever small amount of unique accounting you
> >must do?
> The original structure (struct __prelim_ref) stores references in a
> list, and I have to perform many search operations, which a list is
> not suited for. However, modifying the original structure would
> require a lot of rework, so I just want to fix fiemap with this patch.
> If necessary, we can use this structure to replace the original
> structure later.

Well there's room for an rb_node on that structure so we can solve the 'it
only uses a list' problem trivially. I definitely understand your reluctance
to modify the backref code, but to me that just sounds like we need someone
who is familiar with that code to review your work and provide advice when
needed.

Otherwise, I believe my point holds. If there's some technical reason why
this is a bad idea, that's a different story. So far though this just seems
like a situation where we need some extra review from the primary
developers. I cc'd Josef in the hopes he could shed some light for us.


> >>+/* dynamically allocate and initialize a ref_root */
> >>+static struct ref_root *ref_root_alloc(void)
> >>+{
> >>+   struct ref_root *ref_tree;
> >>+
> >>+   ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> >I'm pretty sure we want GFP_NOFS here.
> Yes, perhaps you're right.
> >Because there's no need to narrow the allocation constraints. GFP_NOFS
> >is necessary when the caller is on a critical path that must not recurse
> >back to the filesystem through the allocation (ie. if the allocator
> >decides to free some memory and tries to write dirty data). FIEMAP is
> >called from an ioctl.
> But David seems to have a different point of view with you, so I
> would like to ask for his advice again.

Sounds good, hopefully David and I can figure it out  :)

Thanks again Lu,
--Mark

--
Mark Fasheh

btrfs ENOSPC "not the usual problem"

2016-06-02 Thread Omari Stephens


[Note: not on list; please reply-all]

I've read everything I can find about running out of space on btrfs, and 
it hasn't helped.  I'm currently dead in the water.


Everything I do seems to make the problem monotonically worse — I tried 
adding a loopback device to the fs, and now I can't remove it.  Then I 
tried adding a real device (mSATA) to the fs and now I still can't 
remove the loopback device (which is making everything super slow), and 
I also can't remove the mSATA.  I've removed about 100GB from the 
filesystem and that hasn't done anything either.


Is there anything I can do to even figure out how bad things are, or what I 
need to do to make any kind of forward progress?  This is a laptop, so I 
don't want to add an external drive only to find out that I can't remove 
it without corrupting my filesystem.


### FILESYSTEM STATE
19:23:14> [root{slobol}@/home/xsdg]
#btrfs fi show /home
Label: none  uuid: 4776be5b-5058-4248-a1b7-7c213757dfbd
Total devices 3 FS bytes used 221.02GiB
devid1 size 418.72GiB used 413.72GiB path /dev/sda3
devid2 size 10.00GiB used 5.00GiB path /dev/loop0
devid3 size 14.91GiB used 3.00GiB path /dev/sdb1


19:23:33> [root{slobol}@/home/xsdg]
#btrfs fi usage /home
Overall:
Device size: 443.63GiB
Device allocated: 421.72GiB
Device unallocated:  21.91GiB
Device missing: 0.00B
Used: 221.68GiB
Free (estimated): 219.24GiB(min: 208.29GiB)
Data ratio:  1.00
Metadata ratio:  2.00
Global reserve: 228.00MiB(used: 36.00KiB)

Data,single: Size:417.69GiB, Used:220.36GiB
   /dev/loop0   5.00GiB
   /dev/sda3 409.69GiB
   /dev/sdb1   3.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sda3   8.00MiB

Metadata,DUP: Size:2.00GiB, Used:674.45MiB
   /dev/sda3   4.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sda3   4.00MiB

System,DUP: Size:8.00MiB, Used:56.00KiB
   /dev/sda3  16.00MiB

Unallocated:
   /dev/loop0   5.00GiB
   /dev/sda3   5.00GiB
   /dev/sdb1  11.91GiB


### BALANCE FAILS, EVEN WITH -dusage=0
19:23:02> [root{slobol}@/home/xsdg]
#btrfs balance start -v -dusage=0 .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': No space left on device
There may be more info in syslog - try dmesg | tail


### CAN'T REMOVE DEVICES -> ENOSPC
#btrfs device remove /dev/loop0 /home
ERROR: error removing device '/dev/loop0': No space left on device

--xsdg

Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

On Thu, Jun 02, 2016 at 07:07:32PM +0200, David Sterba wrote:
> On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
> > > +/* dynamically allocate and initialize a ref_root */
> > > +static struct ref_root *ref_root_alloc(void)
> > > +{
> > > + struct ref_root *ref_tree;
> > > +
> > > + ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> > 
> > I'm pretty sure we want GFP_NOFS here.
> 
> Then please explain to me why/where the reasoning below is wrong:

The general reasoning of when to use GFP_NOFS below is fine, I don't
disagree with that at all. If there is no way a recursion back into btrfs
can happen at that allocation site then we can use GFP_KERNEL.

That said, have you closely audited this path? Does the allocation happen
completely outside any locks that might be shared with the writeout path?
What happens if we have to do writeout of the inode being fiemapped in order
to allocate space? If the answer to all my questions is "there is no way
this can deadlock" then by all means, we should use GFP_KERNEL. Otherwise
GFP_NOFS is a sensible guard against possible future deadlocks.
--Mark

--
Mark Fasheh

Re: [PATCH 1/2] btrfs-progs: convert: Fix a bug which fails to insert hole file extent

On Thu, Jun 02, 2016 at 03:22:49PM +0800, Qu Wenruo wrote:
> When copying inode, if there is a file referring part of a hole range,
> convert will fail.
> 
> The problem is, when calculating real extent bytenr, it doesn't check if
> the original extent is a hole.
> 
> In case the original extent is a hole, we still calculate bytenr using
> file_pos - found_extent_file_pos, causing non-zero value, and later
> btrfs_record_file_extent() detects that we are pointing to non-existent
> extent and aborts convert.
> 
> Fix it by checking the disk_bytenr before calculating real disk bytenr.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.

Re: [PATCH 2/2] btrfs-progs: convert-test: Add specially built test cases for new convert

On Thu, Jun 02, 2016 at 03:22:50PM +0800, Qu Wenruo wrote:
> New convert introduced simpler chunk/extent allocation algorithm, at the
> cost of complex backup superblock migration codes.
> 
> Use specially built ext2 images to test if btrfs-convert can convert and
> rollback images without problem.
> 
> All these special ext2 image have blocks across 2nd btrfs backup super
> block.
> 
> The naming is like the following:
>   |<--superblock migration range->|
>   64M 64M + 64K
> |-Data--|-Data--|/Hole//|-Data--|/Hole//|-Data--|--Data--| = drdhdhdrd
> 
> These test cases should check all typical layouts and make sure new
> convert works.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  tests/common   |   8 
>  tests/convert-tests.sh |  30 ++

Please move the testing framework changes to another patch. This is
otherwise a good change, as we'll collect images for convert the same
way we do for checker.

Re: [PATCH v2] btrfs: fix check_shared for fiemap ioctl

On Wed, Jun 01, 2016 at 02:15:22PM -0700, Mark Fasheh wrote:
> > +/* dynamically allocate and initialize a ref_root */
> > +static struct ref_root *ref_root_alloc(void)
> > +{
> > +   struct ref_root *ref_tree;
> > +
> > +   ref_tree = kmalloc(sizeof(*ref_tree), GFP_KERNEL);
> 
> I'm pretty sure we want GFP_NOFS here.

Then please explain to me why/where the reasoning below is wrong:

> > Because there's no need to narrow the allocation constraints. GFP_NOFS
> > is necessary when the caller is on a critical path that must not recurse
> > back to the filesystem through the allocation (ie. if the allocator
> > decides to free some memory and tries to write dirty data). FIEMAP is
> > called from an ioctl.

Re: [BUG] receive not seeing file that exists

2016-06-02 Thread Benedikt Morbach

On Thu, Jun 2, 2016 at 6:35 PM, Chris Murphy  wrote:
> kernel and btrfs-progs versions?

kernel:
  send:    4.5.5
  receive: 4.5.4
btrfs-progs (both): 4.5.3

--
Benedikt

Re: [PATCH v3 05/22] btrfs-progs: Introduce function to setup temporary superblock

On Sun, May 29, 2016 at 06:52:41PM +0800, Qu Wenruo wrote:
> >> +  btrfs_set_super_leafsize(super, cfg->nodesize);
> >> +  btrfs_set_super_nodesize(super, cfg->nodesize);
> >> +  btrfs_set_super_stripesize(super, cfg->stripesize);
> >> +  btrfs_set_super_csum_type(super, BTRFS_CSUM_TYPE_CRC32);
> >> +  btrfs_set_super_chunk_root(super, chunk_bytenr);
> >> +  btrfs_set_super_cache_generation(super, -1);
> >> +  btrfs_set_super_incompat_flags(super, cfg->features);
> >> +  if (cfg->label)
> >> +  strncpy(super->label, cfg->label, BTRFS_LABEL_SIZE - 1);
> >
> > Why not use __strncpy_null?
> 
> Good idea, I'll add new patch to use it.

I've updated it in the patch, there was one more that used strncpy and
coverity reported it.

Re: [BUG] receive not seeing file that exists

2016-06-02 Thread Chris Murphy

On Thu, Jun 2, 2016 at 1:26 AM, Benedikt Morbach
 wrote:
>
> Let me know if you need anything else or if I misunderstood the tree
> thing. (I _think_ I can also provide the with-data send, but I'd like
> to take a look at that first ;) )

kernel and btrfs-progs versions?

-- 
Chris Murphy

Re: How to map extents to files

2016-06-02 Thread Nikolaus Rath

On Jun 02 2016, Qu Wenruo  wrote:
> At 06/02/2016 11:06 AM, Nikolaus Rath wrote:
>> Hello,
>>
>> For one of my btrfs volumes, btrfsck reports a lot of the following
>> warnings:
>>
>> [...]
>> checking extents
>> bad extent [138477568, 138510336), type mismatch with chunk
>> bad extent [140091392, 140148736), type mismatch with chunk
>> bad extent [140148736, 140201984), type mismatch with chunk
>> bad extent [140836864, 140865536), type mismatch with chunk
>> [...]
>>
>> Is there a way to discover which files are affected by this (in
>> particular so that I can take a look at them before and after a btrfsck
>> --repair)?
>
> Which version is the progs? If the fs is not converted from ext2/3/4,
> it may be a false alert.

Version is 4.4.1. The fs may very well have been converted from ext4,
but I can't tell for sure.
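As a starting point for mapping these to files, the "bad extent" lines quoted above can be pulled out of the btrfsck output mechanically. A minimal sketch, assuming the line format shown in this thread; the function name is made up for illustration. Each start address could then be tried with `btrfs inspect-internal logical-resolve <logical> <mountpoint>` on the mounted filesystem, though whether that resolves these particular extents is an assumption and not something confirmed here.

```python
import re

# Matches lines such as:
#   bad extent [138477568, 138510336), type mismatch with chunk
# (a leading ">> " quote prefix does not matter, search() scans the line)
BAD_EXTENT = re.compile(r"bad extent \[(\d+), (\d+)\)")


def parse_bad_extents(check_output):
    """Return a list of (start, length) pairs, one per 'bad extent' line."""
    extents = []
    for line in check_output.splitlines():
        match = BAD_EXTENT.search(line)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            # The range is half-open, so the length is simply end - start.
            extents.append((start, end - start))
    return extents


sample = """checking extents
bad extent [138477568, 138510336), type mismatch with chunk
bad extent [140091392, 140148736), type mismatch with chunk
bad extent [140148736, 140201984), type mismatch with chunk
bad extent [140836864, 140865536), type mismatch with chunk
"""

for start, length in parse_bad_extents(sample):
    print(start, length)
```

Saving the resulting list before a `--repair` run would make it possible to compare the same addresses afterwards.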


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«

Re: "No space left on device" and balance doesn't work

2016-06-02 Thread MegaBrutal

2016-06-02 0:22 GMT+02:00 Henk Slager :
> What is the kernel version used?
> Is the fs on a mechanical disk or SSD?
> What are the mount options?
> How old is the fs?

Linux 4.4.0-22-generic (Ubuntu 16.04).
Mechanical disks in LVM.
Mount: /dev/mapper/centrevg-rootlv on / type btrfs
(rw,relatime,space_cache,subvolid=257,subvol=/@)
I don't know how to retrieve the exact FS age, but it was created in
2014 August.

Snapshots (their names encode their creation dates):

ID 908 gen 487349 top level 5 path @-snapshot-2016050301
ID 909 gen 488849 top level 5 path @-snapshot-2016050401
ID 910 gen 490313 top level 5 path @-snapshot-2016050501
ID 911 gen 491763 top level 5 path @-snapshot-2016050601
ID 912 gen 493399 top level 5 path @-snapshot-2016050702
ID 913 gen 494996 top level 5 path @-snapshot-2016050802
ID 914 gen 496495 top level 5 path @-snapshot-2016050902
ID 915 gen 498094 top level 5 path @-snapshot-2016051005
ID 916 gen 499688 top level 5 path @-snapshot-2016051102
ID 917 gen 501308 top level 5 path @-snapshot-2016051201
ID 918 gen 503375 top level 5 path @-snapshot-2016051402
ID 919 gen 504356 top level 5 path @-snapshot-2016051501
ID 920 gen 505890 top level 5 path @-snapshot-2016051601
ID 921 gen 506901 top level 5 path @-snapshot-2016051701
ID 922 gen 507313 top level 5 path @-snapshot-2016051802
ID 923 gen 507712 top level 5 path @-snapshot-2016051901
ID 924 gen 508057 top level 5 path @-snapshot-2016052001
ID 925 gen 508882 top level 5 path @-snapshot-2016052101
ID 926 gen 509241 top level 5 path @-snapshot-2016052201
ID 927 gen 509618 top level 5 path @-snapshot-2016052301
ID 928 gen 510277 top level 5 path @-snapshot-2016052402
ID 929 gen 511357 top level 5 path @-snapshot-2016052502
ID 930 gen 512125 top level 5 path @-snapshot-2016052602
ID 931 gen 513292 top level 5 path @-snapshot-2016052701
ID 932 gen 515766 top level 5 path @-snapshot-2016052802
ID 933 gen 517349 top level 5 path @-snapshot-2016052904
ID 934 gen 519004 top level 5 path @-snapshot-2016053002
ID 935 gen 519500 top level 5 path @-snapshot-2016053102
ID 936 gen 519847 top level 5 path @-snapshot-2016060101
ID 937 gen 521829 top level 5 path @-snapshot-2016060201

Removing old snapshots is the most feasible solution, but I can also
increase the FS size. It's easy since it's in LVM, and there is plenty
of space in the volume group.

Probably I should rewrite my alert script to check btrfs fi show
instead of plain df.
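A rough sketch of what such a check could look like, assuming the `devid ... size ... used ...` line format that `btrfs fi show` printed in this thread ("used" there is the space already allocated to chunks on that device). The function names and the 0.9 threshold are made up for illustration; the text would come from running `btrfs fi show /` in the real script.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Accepts both "devid    1 size ..." and the collapsed "devid1 size ..." form.
DEVID = re.compile(
    r"devid\s*\d+\s+size\s+([\d.]+)(\w+)\s+used\s+([\d.]+)(\w+)"
)


def to_bytes(number, unit):
    """Convert a value such as ('20.00', 'GiB') to bytes."""
    return float(number) * UNITS[unit]


def allocated_fraction(show_output):
    """Fraction of raw device space already allocated to chunks."""
    size = used = 0.0
    for match in DEVID.finditer(show_output):
        size += to_bytes(match.group(1), match.group(2))
        used += to_bytes(match.group(3), match.group(4))
    if size == 0:
        raise ValueError("no devid lines found")
    return used / size


sample = """Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
Total devices 1 FS bytes used 15.42GiB
devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv
"""

THRESHOLD = 0.9  # hypothetical alert level
fraction = allocated_fraction(sample)
if fraction >= THRESHOLD:
    print("ALERT: %.0f%% of the device is allocated" % (fraction * 100))
```

With the numbers from this filesystem the device is fully allocated, so a check like this would have fired even though df still reported 88%.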

Re: "No space left on device" and balance doesn't work

2016-06-02 Thread Austin S. Hemmelgarn


On 2016-06-01 14:30, MegaBrutal wrote:

Hi all,

I have a 20 GB file system and df says I have about 2,6 GB free space,
yet I can't do anything on the file system because I get "No space
left on device" errors. I read that balance may help to remedy the
situation, but it actually doesn't.


Some data about the FS:


root@ReThinkCentre:~# df -h /
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/centrevg-rootlv   20G   18G  2,6G  88% /

root@ReThinkCentre:~# btrfs fi show /
Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
    Total devices 1 FS bytes used 15.42GiB
    devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv

root@ReThinkCentre:~# btrfs fi df /
Data, single: total=16.69GiB, used=14.14GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.62GiB, used=1.28GiB
GlobalReserve, single: total=352.00MiB, used=0.00B

root@ReThinkCentre:~# btrfs version
btrfs-progs v4.4


This happens when I try to balance:

root@ReThinkCentre:~# btrfs fi balance start -dusage=66 /
Done, had to relocate 0 out of 33 chunks
root@ReThinkCentre:~# btrfs fi balance start -dusage=67 /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail


"dmesg | tail" does not show anything related to this.

It is important to note that the file system currently has 32
snapshots of / at the moment, and snapshots taking up all the free
space is a plausible explanation. Maybe deleting some of the oldest
snapshots or just increasing the file system would help the situation.
However, I'm still interested, if the file system is full, why does df
show there is free space, and how could I show the situation without
having the mentioned options? I actually have an alert set up which
triggers when the FS usage reaches 90%, so then I know I have to
delete some old snapshots. It worked so far, I cleaned the snapshots
at 90%, FS usage fell back, everyone was happy. But now the alert
didn't even trigger because the FS is at 88% usage, so it shouldn't be
full yet.

The first thing that needs to be understood is that df has been pretty 
much unchanged since it was introduced in the 70's (IIRC, it was in at 
least SVR4, possibly earlier UNIX versions too).  Back then, it was 
pretty easy to say what percentage of space was used and how much was 
left.  A filesystem only allocated one set of blocks for a file, it 
didn't need extra space for updates, and the file took up exactly as 
much space as its size on disk (usually; it can get kind of complicated 
based on a number of factors).  In addition, traditional UFS had a 
fixed-size metadata area for the inodes, which simplified computations 
even more.


In BTRFS though, almost all of these assumptions which the original 
interface made aren't guaranteed.


Now, the biggest difference though is in how BTRFS allocates space. 
BTRFS uses a two tier allocation system.  First, you have high-level 
allocations of what are usually referred to as chunks, and then it 
allocates blocks within those chunks.  The balance operation operates at 
the chunk level, whereas things like defragmentation operate at the 
block level.  For performance reasons, BTRFS usually has separate chunks 
for metadata and data.  Data chunks are usually 1GB, and metadata chunks 
are usually 256MB, although both can vary in size based on the size of 
the filesystem.  Figuring out the exact size gets tricky on a live 
filesystem, but if your filesystem is between 16G and 64G, you're pretty 
much guaranteed to have chunks which are the default size.


Now, because of the segregation of data and metadata, and how chunk 
allocation works, it's possible to end up in a situation where you 
technically have free space, but you can't actually do anything with it. 
 This is because most file operations on BTRFS require at least a few 
blocks of metadata space so that the COW updates can happen.  You 
luckily don't appear to be quite to that point.


For compatibility reasons, we have to report _something_ through df.  We 
can't however report many of the situational things about the state of 
the FS itself (for example, if you have all the possible chunks 
allocated, no space in data chunks, but free space in metadata chunks, 
it's possible to create a lot of very small files, but creating a big 
one will fail).  As a result of this, what we report through df is 
technically absolutely correct (in your case, you _do_ technically have 
2.6G of free space), but is also absolutely useless for any kind of 
management decision.


In your particular situation, what's happened is that you have all the 
space allocated to chunks, but have free space within those chunks. 
Balance never puts data in existing chunks, and you can't allocate any 
new chunks, so you can't run a balance.  However, because of that free 
space in the chunks, you can still use the filesystem itself for 
'regular' filesystem operations.
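To put numbers on that, here is a small worked calculation using the `btrfs fi df` and `btrfs fi show` figures quoted above. It is only a sketch: it assumes that `fi df` reports logical sizes, so DUP chunks occupy twice their reported total on disk, and that GlobalReserve is carved out of metadata rather than allocated separately.

```python
MIB = 1.0 / 1024  # one MiB expressed in GiB

# profile: (copies on disk, total GiB, used GiB), from `btrfs fi df /`
chunks = {
    "Data":     (1, 16.69, 14.14),
    "System":   (2, 32.00 * MIB, 16.00 * MIB / 1024),
    "Metadata": (2, 1.62, 1.28),
}
device_size = 20.00  # GiB, from `btrfs fi show /`

# Raw space handed out to chunks: DUP profiles count twice.
allocated = sum(copies * total for copies, total, _used in chunks.values())
unallocated = device_size - allocated

# Space that is free *inside* already-allocated data chunks.
free_in_data = chunks["Data"][1] - chunks["Data"][2]

print("allocated to chunks: %.2f GiB" % allocated)
print("unallocated:         %.2f GiB" % unallocated)
print("free in data chunks: %.2f GiB" % free_in_data)
```

The allocated total comes out at roughly 19.99 GiB of the 20.00 GiB device, which matches the "used 20.00GiB" line in `fi show`, while the roughly 2.55 GiB free inside data chunks is what df rounds to 2,6G. The leftover unallocated space is far smaller than a single 1GB data chunk, which is why balance has nowhere to write.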

Re: [PATCH] btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes

2016-06-02 Thread Filipe Manana

On Wed, Jun 1, 2016 at 5:39 AM, Zygo Blaxell
 wrote:
> On Thu, May 05, 2016 at 12:23:49AM -0400, Zygo Blaxell wrote:
>> During a mount, we start the cleaner kthread first because the transaction
>> kthread wants to wake up the cleaner kthread.  We start the transaction
>> kthread next because everything in btrfs wants transactions.  We do reloc
>> recovery in the thread that was doing the original mount call once the
>> transaction kthread is running.  This means that the cleaner kthread
>> could already be running when reloc recovery happens (e.g. if a snapshot
>> delete was started before a crash).
>>
>> Relocation does not play well with the cleaner kthread, so a mutex was
>> added in commit 5f3164813b90f7dbcb5c3ab9006906222ce471b7 "Btrfs: fix
>> race between balance recovery and root deletion" to prevent both from
>> being active at the same time.
>>
>> If the cleaner kthread is already holding the mutex by the time we get
>> to btrfs_recover_relocation, the mount will be blocked until at least
>> one deleted subvolume is cleaned (possibly more if the mount process
>> doesn't get the lock right away).  During this time (which could be an
>> arbitrarily long time on a large/slow filesystem), the mount process is
>> stuck and the filesystem is unnecessarily inaccessible.
>>
>> Fix this by locking cleaner_mutex before we start cleaner_kthread, and
>> unlocking the mutex after mount no longer requires it.  This ensures
>> that the mounting process will not be blocked by the cleaner kthread.
>> The cleaner kthread is already prepared for mutex contention and will
>> just go to sleep until the mutex is available.
>> ---
>>  fs/btrfs/disk-io.c | 18 +++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index d8d68af..7c8f435 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2509,6 +2509,7 @@ int open_ctree(struct super_block *sb,
>>   int num_backups_tried = 0;
>>   int backup_index = 0;
>>   int max_active;
>> + bool cleaner_mutex_locked = false;
>>
>>   tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
>>   chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
>> @@ -2988,6 +2989,13 @@ retry_root_backup:
>>   goto fail_sysfs;
>>   }
>>
>> + /*
>> +  * Hold the cleaner_mutex thread here so that we don't block
>> +  * for a long time on btrfs_recover_relocation.  cleaner_kthread
>> +  * will wait for us to finish mounting the filesystem.
>> +  */
>> + mutex_lock(&fs_info->cleaner_mutex);
>> + cleaner_mutex_locked = true;
>>   fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
>>  "btrfs-cleaner");
>>   if (IS_ERR(fs_info->cleaner_kthread))
>
> Unfortunately, if we have a log to replay, we get to code like this
> in open_ctree:
>
>         /* do not make disk changes in broken FS */
>         if (btrfs_super_log_root(disk_super) != 0) {
>                 ret = btrfs_replay_log(fs_info, fs_devices);
>                 if (ret) {
>                         err = ret;
>                         goto fail_qgroup;
>                 }
>         }
>
> and:
>
> static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
>                             struct btrfs_fs_devices *fs_devices)
> {
> [...]
>         if (fs_info->sb->s_flags & MS_RDONLY) {
>                 ret = btrfs_commit_super(tree_root);
>                 if (ret)
>                         return ret;
>         }
>
> and finally:
>
> int btrfs_commit_super(struct btrfs_root *root)
> {
>         struct btrfs_trans_handle *trans;
>
>         mutex_lock(&root->fs_info->cleaner_mutex);
>         btrfs_run_delayed_iputs(root);
>         mutex_unlock(&root->fs_info->cleaner_mutex);
>         wake_up_process(root->fs_info->cleaner_kthread);
>
> Well, dammit.  Since we have already locked cleaner_mutex, it promptly
> recursive-deadlocks on itself--but only if the filesystem was not cleanly
> umounted, and the problem disappears if you reboot and try to mount again
> because there won't be a log to replay the second time.
>
> Could we just add a bool to fs_info that says to cleaner_kthread "don't
> do anything yet, we're not finished mounting"?  That way it doesn't break
> if some new place to lock cleaner_mutex pops up (they do seem to move
> around from one kernel version to the next).
>
> I think we can do btrfs_run_delayed_iputs and just skip the
> wake_up_process call here?  Or neuter it by having cleaner_kthread do
> nothing while we are still somewhere in the middle of open_ctree.

You can try something as simple as (untested):

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6628fca..a96a71a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3827,9

Re: Device replace issues and disabling it until they are solved

2016-06-02 Thread Filipe Manana

On Thu, Jun 2, 2016 at 11:03 AM, Yauhen Kharuzhy
 wrote:
> On Fri, May 27, 2016 at 10:43:47AM +0100, Filipe Manana wrote:
>> > Hi Filipe,
>>
>> Hi Scott,
>>
>> >
>> > Does your recent patch set (from May 20) address all of these issues?
>>
>> Yes.
>
> Tested, RAID5/6 still produces plenty of 'failed to rebuild valid
> logical NN' messages after two consecutive device replaces. So,
> replace is still not usable for RAID5/6. And it is very slow in
> comparison with 'device add && balance device remove missing' sequence
> (4x slower).

Right. I believe there's code missing for raid5/6. I didn't care about
that, nor will I in the near future at least.
The set of problems I tried to solve was generic and unrelated to any
specific raid mode.

>
> --
> Yauhen Kharuzhy



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: raid5/6 production use status?

2016-06-02 Thread Gerald Hopf

>>> Hey.
>>>
>>> I've lost a bit track recently and the wiki changelog doesn't seem to
>>> contain much about how things went on at the RAID5/6 front... so how're
>>> things going?
>>>
>>> Is it already more or less "productively" usable? What's still missing?
>>
>> Well, you still can't even check for free space.
>
> You can, but not with that tool.
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#Understanding_free_space.2C_using_the_original_tools
>
> Hugo.

That tool however according to the wiki is the "new tool" which you are
supposed to use! The other options are not that good...

btrfs fi usage
==> 3x WARNING: RAID56 detected, not implemented
btrfs fi df
==> only shows what part of "allocated" space is in use, not useful
information if you want to know if you have free space

btrfs fi show
==> does not show total free space. I guess you can use the information
in btrfs fi show and then subtract used from total and then multiply
that? But by what? n disks? Or by n-1 disks because of parity?
==> multiplying it by all disks (including parity) seems to arrive at a
similar free space as df -h shows me. But is it correct? Or should it be
4/5 of this because I have one parity disk and 4 data disks?

I do however stand corrected: You actually can (barely) check for free
space. And you can get a number that might or might not be the free space.
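For what it's worth, the "n or n-1" question can be written down as simple arithmetic. This is a sketch under stated assumptions, not something confirmed in this thread: all devices are the same size, every RAID5 stripe spans all of them, and one device's worth of each stripe holds parity. The function name is made up for illustration.

```python
def raid5_usable(raw_bytes, num_devices):
    """Estimate usable space from raw space summed over all devices.

    Assumes equal-size devices and full-width stripes, so a fraction
    (n - 1) / n of the raw space can hold data and the rest is parity.
    """
    if num_devices < 2:
        raise ValueError("RAID5 needs at least two devices")
    return raw_bytes * (num_devices - 1) / num_devices


# Hypothetical example: 10 TiB of raw free space across 5 devices.
print(raid5_usable(10.0, 5), "TiB usable")
```

Under those assumptions the answer for one parity disk and 4 data disks would be the 4/5 factor, applied to the raw figure summed over all disks.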


Re: Device replace issues and disabling it until they are solved

2016-06-02 Thread Yauhen Kharuzhy

On Fri, May 27, 2016 at 10:43:47AM +0100, Filipe Manana wrote:
> > Hi Filipe,
> 
> Hi Scott,
> 
> >
> > Does your recent patch set (from May 20) address all of these issues?
> 
> Yes.

Tested, RAID5/6 still produces plenty of 'failed to rebuild valid
logical NN' messages after two consecutive device replaces. So,
replace is still not usable for RAID5/6. And it is very slow in
comparison with 'device add && balance device remove missing' sequence
(4x slower).

-- 
Yauhen Kharuzhy

Re: raid5/6 production use status?

2016-06-02 Thread Hugo Mills

On Thu, Jun 02, 2016 at 11:24:45AM +0200, Gerald Hopf wrote:
> 
> >Hey.
> >
> >I've lost a bit track recently and the wiki changelog doesn't seem to
> >contain much about how things went on at the RAID5/6 front... so how're
> >things going?
> >
> >Is it already more or less "productively" usable? What's still missing?
> Well, you still can't even check for free space.

   You can, but not with that tool.

https://btrfs.wiki.kernel.org/index.php/FAQ#Understanding_free_space.2C_using_the_original_tools

   Hugo.

> ~ # btrfs fi usage /mnt/data-raid
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:                  18.19TiB
>     Device allocated:                0.00B
>     Device unallocated:           18.19TiB
>     Device missing:                  0.00B
>     Used:                            0.00B
>     Free (estimated):                0.00B      (min: 8.00EiB)
> 
> btrfs --version ==> btrfs-progs v4.5.3-70-gc1c27b9
> kernel ==> 4.6.0
> 
> 

-- 
Hugo Mills | UNIX: Spanish manufacturer of fire extinguishers
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |



Re: raid5/6 production use status?

2016-06-02 Thread Gerald Hopf




> Hey.
>
> I've lost a bit track recently and the wiki changelog doesn't seem to
> contain much about how things went on at the RAID5/6 front... so how're
> things going?
>
> Is it already more or less "productively" usable? What's still missing?

Well, you still can't even check for free space.

~ # btrfs fi usage /mnt/data-raid
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
    Device size:                  18.19TiB
    Device allocated:                0.00B
    Device unallocated:           18.19TiB
    Device missing:                  0.00B
    Used:                            0.00B
    Free (estimated):                0.00B      (min: 8.00EiB)

btrfs --version ==> btrfs-progs v4.5.3-70-gc1c27b9
kernel ==> 4.6.0



[PATCH 5/5] btrfs-progs: btrfs-crc: make argc check more strict


Signed-off-by: Satoru Takeuchi 
---
 btrfs-crc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/btrfs-crc.c b/btrfs-crc.c
index c2b5f00..d433ff3 100644
--- a/btrfs-crc.c
+++ b/btrfs-crc.c
@@ -69,12 +69,14 @@ int main(int argc, char **argv)
str = argv[optind];

if (!loop) {
-   if (check_argc_min(argc - optind, 1))
+   if (check_argc_exact(argc - optind, 1))
print_usage(255);

printf("%12u - %s\n", crc32c(~1, str, strlen(str)), str);
return 0;
}
+   if (check_argc_exact(argc - optind, 0))
+   print_usage(255);

buf = malloc(length);
if (!buf)
--
2.5.5

[PATCH 4/5] btrfs-progs: btrfs-crc: improve usage message


- If -c is set, the filename argument is ignored.
- Describe the -h option.

Signed-off-by: Satoru Takeuchi 
---
 btrfs-crc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/btrfs-crc.c b/btrfs-crc.c
index 55a4c61..c2b5f00 100644
--- a/btrfs-crc.c
+++ b/btrfs-crc.c
@@ -26,10 +26,12 @@ void print_usage(int status)
 {
printf("usage: btrfs-crc filename\n");
printf("print out the btrfs crc for \"filename\"\n");
-   printf("usage: btrfs-crc filename -c crc [-s seed] [-l length]\n");
+   printf("usage: btrfs-crc -c crc [-s seed] [-l length]\n");
printf("brute force search for file names with the given crc\n");
printf("  -s seedthe random seed (default: random)\n");
printf("  -l length  the length of the file names (default: 10)\n");
+   printf("usage: btrfs-crc -h\n");
+   printf("print this message\n");
exit(status);
 }

--
2.5.5

[PATCH 3/5] btrfs-progs: btrfs-crc: print usage on receiving invalid arguments


Usage is only printed if the -h option is set. However, it's nice to also
print it when a wrong option is set or the number of arguments is wrong.

Signed-off-by: Satoru Takeuchi 
---
 btrfs-crc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/btrfs-crc.c b/btrfs-crc.c
index 86dfe05..55a4c61 100644
--- a/btrfs-crc.c
+++ b/btrfs-crc.c
@@ -22,7 +22,7 @@
 #include "crc32c.h"
 #include "utils.h"

-void print_usage(void)
+void print_usage(int status)
 {
printf("usage: btrfs-crc filename\n");
printf("print out the btrfs crc for \"filename\"\n");
@@ -30,7 +30,7 @@ void print_usage(void)
printf("brute force search for file names with the given crc\n");
printf("  -s seedthe random seed (default: random)\n");
printf("  -l length  the length of the file names (default: 10)\n");
-   exit(1);
+   exit(status);
 }

 int main(int argc, char **argv)
@@ -57,9 +57,9 @@ int main(int argc, char **argv)
seed = atol(optarg);
break;
case 'h':
-   print_usage();
+   print_usage(1);
case '?':
-   return 255;
+   print_usage(255);
}
}

@@ -68,7 +68,7 @@ int main(int argc, char **argv)

if (!loop) {
if (check_argc_min(argc - optind, 1))
-   return 255;
+   print_usage(255);

printf("%12u - %s\n", crc32c(~1, str, strlen(str)), str);
return 0;
--
2.5.5

[PATCH 2/5] btrfs-progs: btrfs-crc should be ignored by git


It's a binary built from btrfs-crc.c

Signed-off-by: Satoru Takeuchi 
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index a27cb0d..aaf9702 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,7 @@ btrfs-zero-log
 btrfs-corrupt-block
 btrfs-select-super
 btrfs-calc-size
+btrfs-crc
 btrfstune
 libbtrfs.a
 libbtrfs.so
--
2.5.5

[PATCH 1/5] btrfs-progs: btrfs-crc: fix build error


Remove the following build error.

  
  $ make btrfs-crc
  [CC] btrfs-crc.o
  [LD] btrfs-crc
  btrfs-crc.o: In function `usage':
  /home/sat/src/btrfs-progs/btrfs-crc.c:26: multiple definition of `usage'
  help.o:/home/sat/src/btrfs-progs/help.c:125: first defined here
  collect2: error: ld returned 1 exit status
  Makefile:294: recipe for target 'btrfs-crc' failed
  make: *** [btrfs-crc] Error 1
  =

Signed-off-by: Satoru Takeuchi 
---
 btrfs-crc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs-crc.c b/btrfs-crc.c
index 723e0b7..86dfe05 100644
--- a/btrfs-crc.c
+++ b/btrfs-crc.c
@@ -22,7 +22,7 @@
 #include "crc32c.h"
 #include "utils.h"

-void usage(void)
+void print_usage(void)
 {
printf("usage: btrfs-crc filename\n");
printf("print out the btrfs crc for \"filename\"\n");
@@ -57,7 +57,7 @@ int main(int argc, char **argv)
seed = atol(optarg);
break;
case 'h':
-   usage();
+   print_usage();
case '?':
return 255;
}
--
2.5.5

[BUG] receive not seeing file that exists

2016-06-02 Thread Benedikt Morbach

Hi all,

I've encountered a bug in btrfs-receive. When receiving a certain
incremental send, it will error with:

ERROR: cannot open
backup/detritus/root/root.20160524T1800/var/log/journal/9cbb44cf160f4c1089f77e32ed376a0b/user-1000.journal:
No such file or directory

even though that path exists and the parent subvolume is identical on
both ends (I checked manually).

I've noticed this happen before on the same directory (and google
confirms it has also happened to others) and /var/log/journal/ and its
children are the only directories with 'chattr +C' on this system, so
it might be related to that?

This was reported on IRC a week or so ago and Josef requested a tree
--inode of the file/the dirs leading to it and the incremental send,
so here you go:


send side:
/mnt
[256]  btrfs_pool_ssd

/mnt/btrfs_pool_ssd
[256]  backup

/mnt/btrfs_pool_ssd/backup
[256]  root

/mnt/btrfs_pool_ssd/backup/root
[256]  root.20160524T1800
[256]  root.20160524T1900

/mnt/btrfs_pool_ssd/backup/root/root.20160524T1800
[268]  var

/mnt/btrfs_pool_ssd/backup/root/root.20160524T1800/var
[   9035]  log

/mnt/btrfs_pool_ssd/backup/root/root.20160524T1800/var/log
[35122105]  journal

/mnt/btrfs_pool_ssd/backup/root/root.20160524T1800/var/log/journal
[35122136]  9cbb44cf160f4c1089f77e32ed376a0b


/mnt/btrfs_pool_ssd/backup/root/root.20160524T1800/var/log/journal/9cbb44cf160f4c1089f77e32ed376a0b
[53198460]  user-1000.journal


receive side:
/backup
[256]  detritus

/backup/detritus
[256]  root

/backup/detritus/root
[256]  root.20160524T1800

/backup/detritus/root/root.20160524T1800
[267]  var

/backup/detritus/root/root.20160524T1800/var
[856]  log

/backup/detritus/root/root.20160524T1800/var/log
[ 316157]  journal

/backup/detritus/root/root.20160524T1800/var/log/journal
[ 316158]  9cbb44cf160f4c1089f77e32ed376a0b


/backup/detritus/root/root.20160524T1800/var/log/journal/9cbb44cf160f4c1089f77e32ed376a0b
[ 738979]  user-1000.journal

both trimmed down to only the relevant path.

I don't know how the ML handles attachments, so incremental send
stream (with --no-data) is here:
http://dev.exherbo.org/~moben/send-receive_incremental.stream

Let me know if you need anything else or if I misunderstood the tree
thing. (I _think_ I can also provide the with-data send, but I'd like
to take a look at that first ;) )


Cheers
Benedikt

[PATCH 1/2] btrfs-progs: convert: Fix a bug which fails to insert hole file extent

When copying an inode, convert fails if a file refers to part of a hole
range.

The problem is that the calculation of the real extent bytenr does not
check whether the original extent is a hole.

If the original extent is a hole, we still calculate the bytenr using
file_pos - found_extent_file_pos, which yields a non-zero value. Later,
btrfs_record_file_extent() detects that we are pointing to a non-existent
extent and aborts the convert.

Fix it by checking disk_bytenr before calculating the real disk bytenr.

Signed-off-by: Qu Wenruo 
---
These commits (especially the following test case commit, which includes
four 80+K e2image raw dumps) can also be fetched from my GitHub:
https://github.com/adam900710/btrfs-progs.git convert_fixes
---
 btrfs-convert.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 5f6b44c..43b8b08 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -572,7 +572,11 @@ static int record_file_blocks(struct blk_iterate_data *data,
 		BUG_ON(cur_off - key.offset >= extent_num_bytes);
 		btrfs_release_path(path);
 
-		real_disk_bytenr = cur_off - key.offset + extent_disk_bytenr;
+		if (extent_disk_bytenr)
+			real_disk_bytenr = cur_off - key.offset +
+					   extent_disk_bytenr;
+		else
+			real_disk_bytenr = 0;
 		cur_len = min(key.offset + extent_num_bytes,
 			      old_disk_bytenr + num_bytes) - cur_off;
 		ret = btrfs_record_file_extent(data->trans, data->root,
-- 
2.8.3
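
For readers following the fix, the offset arithmetic can be sketched in
isolation. This is only an illustration, not code from btrfs-progs: the
helper name map_real_disk_bytenr() and its parameter names are made up
here, and it assumes, as the patch does, that a disk bytenr of 0 marks a
hole.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of btrfs-progs): translate a file offset
 * that falls inside a file extent into the matching on-disk byte number.
 *
 * cur_off            - file offset being recorded
 * extent_start       - file offset where the found extent begins
 * extent_disk_bytenr - on-disk start of that extent, 0 for a hole
 */
static uint64_t map_real_disk_bytenr(uint64_t cur_off, uint64_t extent_start,
				     uint64_t extent_disk_bytenr)
{
	/* The offset must not lie before the start of the extent. */
	assert(cur_off >= extent_start);

	/* A hole has no backing storage, so it must stay at bytenr 0. */
	if (extent_disk_bytenr == 0)
		return 0;

	return cur_off - extent_start + extent_disk_bytenr;
}
```

With an extent that starts at file offset 4096 and disk bytenr 1048576,
a cur_off of 8192 maps to 1052672. For a hole at the same file offset,
the unchecked formula would give 4096, which points at no real extent;
the checked version returns 0.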




[PATCH 2/2] btrfs-progs: convert-test: Add specially built test cases for new convert