Re: Btrfs broken in massive transfar

2018-03-12 Thread Qu Wenruo


On 2018年03月13日 12:19, MASAKI haruka wrote:
> *Now* I tried with Linux 4.14 and 4.15.
> I experienced same probrem and reported in 2014 with Linux 3.9 and 3.10. 
> (Perchance, actually the kernel was newer than 3.10, anyway I experienced 
> same probrem with old 3.x kernel.)

Then kernel message please.

Especially for the readonly case.

And "btrfs check" output please.

Thanks,
Qu


> 
>>
>>
>> On 2018年03月13日 05:57, MASAKI haruka wrote:
>>> I'm trying to clone 18TiB data between btrfs,
>>> but it will crash anyway.
>>>
>>> This probrem is occured even how to clone (btrfs send/receive, rsync or cp.)
>>> I experienced same probrem in Linux 3.9 and Linux 3.10.
>>
>> Did you really mean *3*.9 and *3*.10?
>>
>> That's too old for btrfs usage IIRC.
>>
>> It would be *4*.9 or *4*.10 for a relative new kernel for btrfs.
>>
>> Would you please try some latest mainline kernel again?
>>
>>>
>>> What happen:
>>>
>>> 1. Failed to write because I/O error (read only filesystem)
>>> 2. writing to the btrfs succeeds and fails randomly.
>>> 3. The btrfs unable to unmount (resource is busy.) Unable to umount even 
>>> forcely, so cannot halt.
>>>
>>> Example:
>>> ---
>>> mkfile o7784-11-0
>>> rename o7784-11-0 -> 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>>> utimes 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
>>> truncate 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>>>  size=1073698824
>>> chown 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>>>  - uid=1000, gid=1000
>>> chmod 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>>>  - mode=0600
>>> utimes 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>>> mkfile o7785-12-0
>>> rename o7785-12-0 -> 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
>>> utimes 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
>>> truncate 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
>>>  size=864067592
>>> ERROR: truncate 
>>> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
>>>  failed: Input/output error
>>> btrfs send 180310235348  0.09s user 11.98s system 16% cpu 1:14.42 total
>>> ---
>>
>> In that case, we need kernel message to investigate.
>> (And of course, please use at least 4.x kernel)
>>
>> Thanks,
>> Qu
>>
>>>
>>> Tries:
>>> 1.
>>> Connect between host A (btrfs, 4disks) and B with socat (TCP).
>>> Host B write to iSCSI disk (btrfs, single).
>>> clone with btrfs send/receive. Linux 4.15.
>>> -> Crashed at transfarred 1.78TB
>>>
>>> 2.
>>> Delete snapshot and retry.
>>> Connect between host A and B with SSH and socat (UNIX).
>>> Host B write to iSCSI disk (btrfs, single).
>>> clone with btrfs send/receive. Linux 4.15.
>>> -> Crashed at transfarred 90GB
>>>
>>> 3.
>>> Recreate btrfs.
>>> Host A write to iSCSI disk.
>>> clone with btrfs send/receive. Linux 4.15.
>>> -> Crashed at transfarred 260GB
>>>
>>> 4.
>>> Recreate btrfs.
>>> Original disk attach to other computer (having more resource.)
>>> clone with btrfs send/receive. Linux 4.15.
>>> -> Crashed at transfarred 120GB
>>>
>>> 5.
>>> Recreate btrfs.
>>> Clone with rsync. Linux 4.15.
>>> -> Crashed at transfarred 100GB
>>>
>>> 6.
>>> Recreate btrfs.
>>> Try with Linux 4.14, btrfs send/receive.
>>> -> Crashed at transfarred 3.98TB
>>>
>>> 7.
>>> Recreate btrfs.
>>> Connect between host and NAS (iSCSI) with GbE cable directly.
>>> Mounted with options relatime, spase_cache, compress=lzo.
>>> clone with btrfs send/receive. Linux 4.14.
>>> -> Crashed at transfarred 2.13TB
>>>
>>
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-12 Thread Kees Cook
On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds
 wrote:
> On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
>  wrote:
>>
>> Replacing the __builtin_choose_expr() with ?: works of course.
>
> Hmm. That sounds like the right thing to do. We were so myopically
> staring at the __builtin_choose_expr() problem that we overlooked the
> obvious solution.
>
> Using __builtin_constant_p() together with a ?: is in fact our common
> pattern, so that should be fine. The only real reason to use
> __builtin_choose_expr() is if you want to get the *type* to vary
> depending on which side you choose, but that's not an issue for
> min/max.

This doesn't solve it for -Wvla, unfortunately. That was the point of
Josh's original suggestion of __builtin_choose_expr().

Try building with KCFLAGS=-Wval and checking net/ipv6/proc.c:

net/ipv6/proc.c: In function ‘snmp6_seq_show_item’:
net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose
size can’t be evaluated [-Wvla]
  unsigned long buff[SNMP_MIB_MAX];
  ^~~~


-Kees

-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs broken in massive transfar

2018-03-12 Thread MASAKI haruka
*Now* I tried with Linux 4.14 and 4.15.
I experienced same probrem and reported in 2014 with Linux 3.9 and 3.10. 
(Perchance, actually the kernel was newer than 3.10, anyway I experienced same 
probrem with old 3.x kernel.)

> 
> 
> On 2018年03月13日 05:57, MASAKI haruka wrote:
> > I'm trying to clone 18TiB data between btrfs,
> > but it will crash anyway.
> > 
> > This probrem is occured even how to clone (btrfs send/receive, rsync or cp.)
> > I experienced same probrem in Linux 3.9 and Linux 3.10.
> 
> Did you really mean *3*.9 and *3*.10?
> 
> That's too old for btrfs usage IIRC.
> 
> It would be *4*.9 or *4*.10 for a relative new kernel for btrfs.
> 
> Would you please try some latest mainline kernel again?
> 
> > 
> > What happen:
> > 
> > 1. Failed to write because I/O error (read only filesystem)
> > 2. writing to the btrfs succeeds and fails randomly.
> > 3. The btrfs unable to unmount (resource is busy.) Unable to umount even 
> > forcely, so cannot halt.
> > 
> > Example:
> > ---
> > mkfile o7784-11-0
> > rename o7784-11-0 -> 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> > utimes 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
> > truncate 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> >  size=1073698824
> > chown 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> >  - uid=1000, gid=1000
> > chmod 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> >  - mode=0600
> > utimes 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> > mkfile o7785-12-0
> > rename o7785-12-0 -> 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
> > utimes 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
> > truncate 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
> >  size=864067592
> > ERROR: truncate 
> > .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
> >  failed: Input/output error
> > btrfs send 180310235348  0.09s user 11.98s system 16% cpu 1:14.42 total
> > ---
> 
> In that case, we need kernel message to investigate.
> (And of course, please use at least 4.x kernel)
> 
> Thanks,
> Qu
> 
> > 
> > Tries:
> > 1.
> > Connect between host A (btrfs, 4disks) and B with socat (TCP).
> > Host B write to iSCSI disk (btrfs, single).
> > clone with btrfs send/receive. Linux 4.15.
> > -> Crashed at transfarred 1.78TB
> > 
> > 2.
> > Delete snapshot and retry.
> > Connect between host A and B with SSH and socat (UNIX).
> > Host B write to iSCSI disk (btrfs, single).
> > clone with btrfs send/receive. Linux 4.15.
> > -> Crashed at transfarred 90GB
> > 
> > 3.
> > Recreate btrfs.
> > Host A write to iSCSI disk.
> > clone with btrfs send/receive. Linux 4.15.
> > -> Crashed at transfarred 260GB
> > 
> > 4.
> > Recreate btrfs.
> > Original disk attach to other computer (having more resource.)
> > clone with btrfs send/receive. Linux 4.15.
> > -> Crashed at transfarred 120GB
> > 
> > 5.
> > Recreate btrfs.
> > Clone with rsync. Linux 4.15.
> > -> Crashed at transfarred 100GB
> > 
> > 6.
> > Recreate btrfs.
> > Try with Linux 4.14, btrfs send/receive.
> > -> Crashed at transfarred 3.98TB
> > 
> > 7.
> > Recreate btrfs.
> > Connect between host and NAS (iSCSI) with GbE cable directly.
> > Mounted with options relatime, spase_cache, compress=lzo.
> > clone with btrfs send/receive. Linux 4.14.
> > -> Crashed at transfarred 2.13TB
> > 
> 


-- 
MASAKI haruka 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs broken in massive transfar

2018-03-12 Thread Qu Wenruo


On 2018年03月13日 05:57, MASAKI haruka wrote:
> I'm trying to clone 18TiB data between btrfs,
> but it will crash anyway.
> 
> This probrem is occured even how to clone (btrfs send/receive, rsync or cp.)
> I experienced same probrem in Linux 3.9 and Linux 3.10.

Did you really mean *3*.9 and *3*.10?

That's too old for btrfs usage IIRC.

It would be *4*.9 or *4*.10 for a relative new kernel for btrfs.

Would you please try some latest mainline kernel again?

> 
> What happen:
> 
> 1. Failed to write because I/O error (read only filesystem)
> 2. writing to the btrfs succeeds and fails randomly.
> 3. The btrfs unable to unmount (resource is busy.) Unable to umount even 
> forcely, so cannot halt.
> 
> Example:
> ---
> mkfile o7784-11-0
> rename o7784-11-0 -> 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> utimes 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
> truncate 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>  size=1073698824
> chown 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>  - uid=1000, gid=1000
> chmod 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
>  - mode=0600
> utimes 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
> mkfile o7785-12-0
> rename o7785-12-0 -> 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
> utimes 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
> truncate 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
>  size=864067592
> ERROR: truncate 
> .filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
>  failed: Input/output error
> btrfs send 180310235348  0.09s user 11.98s system 16% cpu 1:14.42 total
> ---

In that case, we need kernel message to investigate.
(And of course, please use at least 4.x kernel)

Thanks,
Qu

> 
> Tries:
> 1.
> Connect between host A (btrfs, 4disks) and B with socat (TCP).
> Host B write to iSCSI disk (btrfs, single).
> clone with btrfs send/receive. Linux 4.15.
> -> Crashed at transfarred 1.78TB
> 
> 2.
> Delete snapshot and retry.
> Connect between host A and B with SSH and socat (UNIX).
> Host B write to iSCSI disk (btrfs, single).
> clone with btrfs send/receive. Linux 4.15.
> -> Crashed at transfarred 90GB
> 
> 3.
> Recreate btrfs.
> Host A write to iSCSI disk.
> clone with btrfs send/receive. Linux 4.15.
> -> Crashed at transfarred 260GB
> 
> 4.
> Recreate btrfs.
> Original disk attach to other computer (having more resource.)
> clone with btrfs send/receive. Linux 4.15.
> -> Crashed at transfarred 120GB
> 
> 5.
> Recreate btrfs.
> Clone with rsync. Linux 4.15.
> -> Crashed at transfarred 100GB
> 
> 6.
> Recreate btrfs.
> Try with Linux 4.14, btrfs send/receive.
> -> Crashed at transfarred 3.98TB
> 
> 7.
> Recreate btrfs.
> Connect between host and NAS (iSCSI) with GbE cable directly.
> Mounted with options relatime, spase_cache, compress=lzo.
> clone with btrfs send/receive. Linux 4.14.
> -> Crashed at transfarred 2.13TB
> 



signature.asc
Description: OpenPGP digital signature


[PATCH v2 1/7] btrfs-progs: convert/ext2: Fix inline extent creation check

2018-03-12 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 convert/source-ext2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/convert/source-ext2.c b/convert/source-ext2.c
index b1492c78693d..e13fbe00415a 100644
--- a/convert/source-ext2.c
+++ b/convert/source-ext2.c
@@ -310,7 +310,7 @@ static int ext2_create_file_extents(struct 
btrfs_trans_handle *trans,
if (ret)
goto fail;
if ((convert_flags & CONVERT_FLAG_INLINE_DATA) && data.first_block == 0
-   && data.num_blocks > 0
+   && data.num_blocks > 0 && inode_size < sectorsize
&& inode_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
u64 num_bytes = data.num_blocks * sectorsize;
u64 disk_bytenr = data.disk_block * sectorsize;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/7] btrfs-progs: convert/reiserfs: Fix inline file extent creation check

2018-03-12 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 convert/source-reiserfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/convert/source-reiserfs.c b/convert/source-reiserfs.c
index 39d6f0728bd3..eeb68d962c5d 100644
--- a/convert/source-reiserfs.c
+++ b/convert/source-reiserfs.c
@@ -376,7 +376,8 @@ static int reiserfs_convert_tail(struct btrfs_trans_handle 
*trans,
u64 isize;
int ret;
 
-   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (length >= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) ||
+   length >= root->fs_info->sectorsize)
return convert_direct(trans, root, objectid, inode, body,
  length, offset, convert_flags);
 
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/7] btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

2018-03-12 Thread Qu Wenruo
Add a test case for mkfs --rootdir, using files with different file
sizes to check if invalid large inline extent could exist.

Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 81 ++
 1 file changed, 81 insertions(+)
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

diff --git a/tests/mkfs-tests/014-rootdir-inline-extent/test.sh 
b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
new file mode 100755
index ..e0765b0bc4e2
--- /dev/null
+++ b/tests/mkfs-tests/014-rootdir-inline-extent/test.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+# Regression test for mkfs.btrfs --rootdir with inline file extents
+# For any large inline file extent, btrfs check could already report it
+
+source "$TOP/tests/common"
+
+check_global_prereq fallocate
+check_prereq mkfs.btrfs
+
+prepare_test_dev
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+
+pagesize=$(getconf PAGESIZE)
+create_file()
+{
+   local size=$1
+   # Reuse size and filename
+   run_check fallocate -l $size "$tmp/$size"
+}
+
+test_mkfs_rootdir()
+{
+   nodesize=$1
+   run_check "$TOP/mkfs.btrfs" --nodesize $nodesize -f --rootdir "$tmp" \
+   "$TEST_DEV"
+   run_check $SUDO_HELPER "$TOP/btrfs" check "$TEST_DEV"
+}
+
+# File sizes is designed to cross differnet node size, so even
+# the sectorsize is not 4K, we can still test it well.
+create_file 512
+create_file 1024
+create_file 2048
+
+create_file 3994
+create_file 3995   # For 4K node size, max inline would be 4k - 101
+create_file 3996
+
+create_file 4095
+create_file 4096
+create_file 4097
+
+create_file 8090
+create_file 8091
+create_file 8092
+
+create_file 8191
+create_file 8192
+create_file 8193
+
+create_file 16282
+create_file 16283
+create_file 16284
+
+create_file 16383
+create_file 16384
+create_file 16385
+
+create_file 32666
+create_file 32667
+create_file 32668
+
+create_file 32767
+create_file 32768
+create_file 32769
+
+create_file 65434
+create_file 65435
+create_file 65436
+
+create_file 65535
+create_file 65536
+create_file 65537
+
+for nodesize in 4096 8192 16384 32768 65536; do
+   if [ $nodesize -ge $pagesize ]; then
+   test_mkfs_rootdir $nodesize
+   fi
+done
+rm -rf -- "$tmp"
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/7] btrfs-progs: check/lowmem mode: Check inline extent size

2018-03-12 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 check/mode-lowmem.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..0d19b373c7af 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1417,6 +1417,8 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
u64 csum_found; /* In byte size, sectorsize aligned */
u64 search_start;   /* Logical range start we search for csum */
u64 search_len; /* Logical range len we search for csum */
+   u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
unsigned int extent_type;
unsigned int is_hole;
int compressed = 0;
@@ -1440,6 +1442,32 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
root->objectid, fkey->objectid, fkey->offset);
err |= FILE_EXTENT_ERROR;
}
+   if (compressed) {
+   if (extent_num_bytes > root->fs_info->sectorsize) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, extent_num_bytes,
+   root->fs_info->sectorsize);
+   err |= FILE_EXTENT_ERROR;
+   }
+   if (item_inline_len > max_inline_extent_size) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent on-disk size, have 
%u, max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, item_inline_len,
+   max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   } else {
+   if (extent_num_bytes > max_inline_extent_size) {
+   error(
+ "root %llu EXTENT_DATA[%llu %llu] too large inline extent size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid, fkey->offset,
+   extent_num_bytes, max_inline_extent_size);
+   err |= FILE_EXTENT_ERROR;
+   }
+   }
if (!compressed && extent_num_bytes != item_inline_len) {
error(
"root %llu EXTENT_DATA[%llu %llu] wrong inline size, have: 
%llu, expected: %u",
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/7] btrfs-progs: test/convert: Add test case for invalid large inline data extent

2018-03-12 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh

diff --git a/tests/convert-tests/016-invalid-large-inline-extent/test.sh 
b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
new file mode 100755
index ..f37c7c09d2e7
--- /dev/null
+++ b/tests/convert-tests/016-invalid-large-inline-extent/test.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+# Check if btrfs-convert refuses to rollback the filesystem, and leave the fs
+# and the convert image untouched
+
+source "$TEST_TOP/common"
+source "$TEST_TOP/common.convert"
+
+setup_root_helper
+prepare_test_dev
+check_prereq btrfs-convert
+check_global_prereq mke2fs
+
+convert_test_prep_fs ext4 mke2fs -t ext4 -b 4096
+
+# Create a 6K file, which should not be inlined
+run_check $SUDO_HELPER dd if=/dev/zero bs=2k count=3 of="$TEST_MNT/file1"
+
+run_check_umount_test_dev
+
+# convert_test_do_convert() will call btrfs check, which should expose any
+# invalid inline extent with too large size
+convert_test_do_convert
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/7] btrfs-progs: mkfs/rootdir: Fix inline extent creation check

2018-03-12 Thread Qu Wenruo
Just like convert, we need extra check against sector size for creating
inline extent.

Signed-off-by: Qu Wenruo 
---
 mkfs/rootdir.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mkfs/rootdir.c b/mkfs/rootdir.c
index e06b65ac13e4..a1d223a2408a 100644
--- a/mkfs/rootdir.c
+++ b/mkfs/rootdir.c
@@ -139,7 +139,8 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
}
if (S_ISREG(src->st_mode)) {
btrfs_set_stack_inode_size(dst, (u64)src->st_size);
-   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
+   if (src->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   src->st_size < sectorsize)
btrfs_set_stack_inode_nbytes(dst, src->st_size);
else {
blocks = src->st_size / sectorsize;
@@ -327,7 +328,8 @@ static int add_file_items(struct btrfs_trans_handle *trans,
if (st->st_size % sectorsize)
blocks += 1;
 
-   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)) {
+   if (st->st_size <= BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info) &&
+   st->st_size < sectorsize) {
char *buffer = malloc(st->st_size);
 
if (!buffer) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/7] btrfs-progs: check/original mode: Check inline extent size

2018-03-12 Thread Qu Wenruo
For inline compressed file extent, kernel doesn't allow inline extent
ram size larger than sector size and on-disk inline extent size should
not exceed BTRFS_MAX_INLINE_DATA_SIZE().

For inline uncompressed file extent, kernel doesn't allow inline extent
ram and on-disk size larger than either BTRFS_MAX_INLINE_DATA_SIZE() or
sector size.

Check it in original mode.

Signed-off-by: Qu Wenruo 
---
 check/main.c  | 16 
 check/mode-original.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/check/main.c b/check/main.c
index 97baae583f04..7821faa929a3 100644
--- a/check/main.c
+++ b/check/main.c
@@ -560,6 +560,8 @@ static void print_inode_error(struct btrfs_root *root, 
struct inode_record *rec)
fprintf(stderr, ", bad file extent");
if (errors & I_ERR_FILE_EXTENT_OVERLAP)
fprintf(stderr, ", file extent overlap");
+   if (errors & I_ERR_FILE_EXTENT_TOO_LARGE)
+   fprintf(stderr, ", inline file extent too large");
if (errors & I_ERR_FILE_EXTENT_DISCOUNT)
fprintf(stderr, ", file extent discount");
if (errors & I_ERR_DIR_ISIZE_WRONG)
@@ -1433,6 +1435,8 @@ static int process_file_extent(struct btrfs_root *root,
u64 disk_bytenr = 0;
u64 extent_offset = 0;
u64 mask = root->fs_info->sectorsize - 1;
+   u32 max_inline_size = min_t(u32, mask,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
int extent_type;
int ret;
 
@@ -1458,9 +1462,21 @@ static int process_file_extent(struct btrfs_root *root,
extent_type = btrfs_file_extent_type(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
+   u8 compression = btrfs_file_extent_compression(eb, fi);
+   struct btrfs_item *item = btrfs_item_nr(slot);
+
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
if (num_bytes == 0)
rec->errors |= I_ERR_BAD_FILE_EXTENT;
+   if (compression) {
+   if (btrfs_file_extent_inline_item_len(eb, item) >
+   max_inline_size ||
+   num_bytes > root->fs_info->sectorsize)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   } else {
+   if (num_bytes > max_inline_size)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   }
rec->found_size += num_bytes;
num_bytes = (num_bytes + mask) & ~mask;
} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
diff --git a/check/mode-original.h b/check/mode-original.h
index f859af478f0f..368de692fdd1 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -185,6 +185,7 @@ struct file_extent_hole {
 #define I_ERR_SOME_CSUM_MISSING(1 << 12)
 #define I_ERR_LINK_COUNT_WRONG (1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN   (1 << 14)
+#define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15)
 
 struct inode_record {
struct list_head backrefs;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/7] Fix long standing -EOPNOTSUPP problem caused by large inline extent

2018-03-12 Thread Qu Wenruo
The patch is based on v4.15.1, and is designed to replace the old patch
in devel branch.

Kernel doesn't support dropping range inside inline extent, and prevents
such thing happening by limiting max inline extent size to
min(max_inline, sectorsize - 1) in cow_file_range_inline().

However btrfs-progs only inherit the BTRFS_MAX_INLINE_DATA_SIZE() macro,
which doesn't have sectorsize check.
And since btrfs-progs defaults to 16K nodesize, above macro allows large
inline extent over 15K size.

This leads to unexpected kernel behavior.

The bug exists in several parts of btrfs-progs, any tool which creates
file extent is involved, including:
1) btrfs-convert
2) mkfs --rootdir

This patchset fixes the problems in convert (both ext2 and reiserfs),
mkfs --rootdir, then add check support for both original and lowmem
mode, and finally adds 2 test cases, one for mkfs and one for convert.

For mkfs test case, it can already be exposed by misc/002, but a
pin-point test case will be much better.

Tested with test-convert, test-fsck, test-misc and test-mkfs.

Qu Wenruo (7):
  btrfs-progs: convert/ext2: Fix inline extent creation check
  btrfs-progs: convert/reiserfs: Fix inline file extent creation check
  btrfs-progs: mkfs/rootdir: Fix inline extent creation check
  btrfs-progs: check/original mode: Check inline extent size
  btrfs-progs: check/lowmem mode: Check inline extent size
  btrfs-progs: test/convert: Add test case for invalid large inline data
extent
  btrfs-progs: test/mkfs: Add test case for rootdir inline extent size

 check/main.c   | 16 +
 check/mode-lowmem.c| 28 
 check/mode-original.h  |  1 +
 convert/source-ext2.c  |  2 +-
 convert/source-reiserfs.c  |  3 +-
 mkfs/rootdir.c |  6 +-
 .../016-invalid-large-inline-extent/test.sh| 22 ++
 tests/mkfs-tests/014-rootdir-inline-extent/test.sh | 81 ++
 8 files changed, 155 insertions(+), 4 deletions(-)
 create mode 100755 tests/convert-tests/016-invalid-large-inline-extent/test.sh
 create mode 100755 tests/mkfs-tests/014-rootdir-inline-extent/test.sh

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-12 Thread Linus Torvalds
On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
 wrote:
>
> Replacing the __builtin_choose_expr() with ?: works of course.

Hmm. That sounds like the right thing to do. We were so myopically
staring at the __builtin_choose_expr() problem that we overlooked the
obvious solution.

Using __builtin_constant_p() together with a ?: is in fact our common
pattern, so that should be fine. The only real reason to use
__builtin_choose_expr() is if you want to get the *type* to vary
depending on which side you choose, but that's not an issue for
min/max.

> What will be the runtime effects?

There should be none. Gcc will turn the conditional for the ?: into a
constant, and DTRT.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()

2018-03-12 Thread Andrew Morton
On Fri, 9 Mar 2018 17:30:15 -0800 Kees Cook  wrote:

> > It's one reason why I wondered if simplifying the expression to have
> > just that single __builtin_constant_p() might not end up working..
> 
> Yeah, it seems like it doesn't bail out as "false" for complex
> expressions given to __builtin_constant_p().
> 
> If no magic solution, then which of these?
> 
> - go back to my original "multi-eval max only for constants" macro (meh)
> - add gcc version checks around this and similarly for -Wvla in the future 
> (eww)
> - raise gcc version (yikes)

Replacing the __builtin_choose_expr() with ?: works of course.  What
will be the runtime effects?

I tried replacing

__builtin_choose_expr(__builtin_constant_p(x) &&
  __builtin_constant_p(y),

with

__builtin_choose_expr(__builtin_constant_p(x + y),

but no success.

I'm not sure what else to try to trick gcc into working.

--- 
a/include/linux/kernel.h~kernelh-skip-single-eval-logic-on-literals-in-min-max-v3-fix
+++ a/include/linux/kernel.h
@@ -804,13 +804,10 @@ static inline void ftrace_dump(enum ftra
  * accidental VLA.
  */
 #define __min(t1, t2, x, y)\
-   __builtin_choose_expr(__builtin_constant_p(x) &&\
- __builtin_constant_p(y),  \
- (t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y),\
- __single_eval_min(t1, t2, \
-   __UNIQUE_ID(min1_), \
-   __UNIQUE_ID(min2_), \
-   x, y))
+   ((__builtin_constant_p(x) && __builtin_constant_p(y)) ? \
+   ((t1)(x) < (t2)(y) ? (t1)(x) : (t2)(y)) :   \
+   (__single_eval_min(t1, t2, __UNIQUE_ID(min1_),  \
+ __UNIQUE_ID(min2_), x, y)))
 
 /**
  * min - return minimum of two values of the same or compatible types
@@ -826,13 +823,11 @@ static inline void ftrace_dump(enum ftra
max1 > max2 ? max1 : max2; })
 
 #define __max(t1, t2, x, y)\
-   __builtin_choose_expr(__builtin_constant_p(x) &&\
- __builtin_constant_p(y),  \
- (t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y),\
- __single_eval_max(t1, t2, \
-   __UNIQUE_ID(max1_), \
-   __UNIQUE_ID(max2_), \
-   x, y))
+   ((__builtin_constant_p(x) && __builtin_constant_p(y)) ? \
+   ((t1)(x) > (t2)(y) ? (t1)(x) : (t2)(y)) :   \
+   (__single_eval_max(t1, t2, __UNIQUE_ID(max1_),  \
+ __UNIQUE_ID(max2_), x, y)))
+
 /**
  * max - return maximum of two values of the same or compatible types
  * @x: first value
_

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Btrfs broken in massive transfar

2018-03-12 Thread MASAKI haruka
I'm trying to clone 18TiB data between btrfs,
but it will crash anyway.

This probrem is occured even how to clone (btrfs send/receive, rsync or cp.)
I experienced same probrem in Linux 3.9 and Linux 3.10.

What happen:

1. Failed to write because I/O error (read only filesystem)
2. writing to the btrfs succeeds and fails randomly.
3. The btrfs unable to unmount (resource is busy.) Unable to umount even 
forcely, so cannot halt.

Example:
---
mkfile o7784-11-0
rename o7784-11-0 -> 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
utimes 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
truncate 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
 size=1073698824
chown 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
 - uid=1000, gid=1000
chmod 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
 - mode=0600
utimes 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/kVxX8RdGhryQiEMOm4II2qMw
mkfile o7785-12-0
rename o7785-12-0 -> 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
utimes 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo
truncate 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
 size=864067592
ERROR: truncate 
.filesystem/HDD/.XFV_pp/,fQO40jotqhUZ0/5JSSubx1Ph5xYNOcXhIAoIK3/XDGOWpbx,5zYWEi0L5LHdWBo/lSABmfoArm9pAtade-gHmS6X
 failed: Input/output error
btrfs send 180310235348  0.09s user 11.98s system 16% cpu 1:14.42 total
---

Tries:
1.
Connect between host A (btrfs, 4disks) and B with socat (TCP).
Host B write to iSCSI disk (btrfs, single).
clone with btrfs send/receive. Linux 4.15.
-> Crashed at transfarred 1.78TB

2.
Delete snapshot and retry.
Connect between host A and B with SSH and socat (UNIX).
Host B write to iSCSI disk (btrfs, single).
clone with btrfs send/receive. Linux 4.15.
-> Crashed at transfarred 90GB

3.
Recreate btrfs.
Host A write to iSCSI disk.
clone with btrfs send/receive. Linux 4.15.
-> Crashed at transfarred 260GB

4.
Recreate btrfs.
Original disk attach to other computer (having more resource.)
clone with btrfs send/receive. Linux 4.15.
-> Crashed at transfarred 120GB

5.
Recreate btrfs.
Clone with rsync. Linux 4.15.
-> Crashed at transfarred 100GB

6.
Recreate btrfs.
Try with Linux 4.14, btrfs send/receive.
-> Crashed at transfarred 3.98TB

7.
Recreate btrfs.
Connect between host and NAS (iSCSI) with GbE cable directly.
Mounted with options relatime, spase_cache, compress=lzo.
clone with btrfs send/receive. Linux 4.14.
-> Crashed at transfarred 2.13TB
-- 
MASAKI haruka 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ongoing Btrfs stability issues

2018-03-12 Thread Christoph Anton Mitterer
On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote:
> Unfortunately no, the likelihood might be 100%: there are some
> patterns which trigger this problem quite easily. See The link which
> I posted in my previous email. There was a program which creates a
> bad checksum (in COW+DATASUM mode), and the file became unreadable.
But that rather seems like a plain bug?!

No reason that would conceptually make checksumming+notdatacow
impossible.

AFAIU, the conceptual thin would be about:
- data is written in nodatacow
  => thus a checksum must be written as well, so write it
- what can then of course happen is
  - both csum and data are written => fine
  - csum is written but data not and then some crash => csum will show
that => fine
  - data is written but csum not and then some crash => csum will give
false positive

Still better few false positives, as many unnoticed data corruptions
and no true raid repair.


> If you cannot know if a checksum is bad or the data is bad, the
> checksum is not useful at all!
Why not? It's anyway only uncertain in the case of crash,... and it at
least tells you that something is fishy.
A program which cares about its data will ensure its own journaling
means and can simply recover by this... or users could then just roll
in a backup.
Or one could provide some API/userland tool to recompute the csums of
the affected file (and possibly live with bad data).


> If I read correctly what you wrote, it seems that you consider a
> "minor issue" the fact that the checksum is not correct. If you
> accept the possibility that a checksum might be wrong, you wont trust
> anymore the checksum; so the checksum became not useful.
There's simply no disadvantage to not having checksumming at all in the
nodatacow case.
Cause then you never have an idea whether your data is correct or
not... the case with checksumming + datacow, which can give a false
positive on a crash when data was written correctly, but not the
checksum, covers at least the other cases of data corruption (silent
data corruption, csum written, but data not or only partially in case
of a crash).


> Again, you are assuming that the likelihood of having a bad checksum
> is low. Unfortunately this is not true. There are pattern which
> exploits this bug with a likelihood=100%.

Okay I don't understand why this would be so and wouldn't assume that
the IO pattern can affect it heavily... but I'm not really btrfs
expert.

My blind assumption would have been that writing an extent of data
takes much longer to complete than writing the corresponding checksum.

Even if not... I should be only a problem in case of a crash during
that,.. and than I'd still prefer to get the false positive than bad
data.


Anyway... it's not going to happen so the discussion is pointless.
I think people can probably use dm-integrity (which btw: does no CoW
either (IIRC) and still can provide integrity... ;-) ) to see whether
their data is valid.
No nice but since it won't change on btrfs, a possible alternative.


Cheers,
Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Ongoing Btrfs stability issues

2018-03-12 Thread Goffredo Baroncelli
On 03/11/2018 11:37 PM, Christoph Anton Mitterer wrote:
> On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote:
>>
>> COW is needed to properly checksum the data. Otherwise is not
>> possible to ensure the coherency between data and checksum (however I
>> have to point out that BTRFS fails even in this case [*]).
>> We could rearrange this sentence, saying that: if you want checksum,
>> you need COW...
> 
> No,... not really... the meta-data is anyway always CoWed... so if you
> do checksum *and* notdatacow,..., the only thing that could possibly
> happen (in the worst case) is, that data that actually made it
> correctly to the disk is falsely determined bad, as the metadata (i.e.
> the checksums) weren't upgraded correctly.
> 
> That however is probably much less likely than the other way round,..
> i.e. bad data went to disk and would be detected with checksuming.

Unfortunately no, the likelihood might be 100%: there are some patterns which 
trigger this problem quite easily. See The link which I posted in my previous 
email. There was a program which creates a bad checksum (in COW+DATASUM mode), 
and the file became unreadable.

> 
> 
> I had lots of discussions about this here on the list, and no one ever
> brought up a real argument against it... I also had an off-list
> discussion with Chris Mason who IIRC confirmed that it would actually
> work as I imagine it... with the only two problems:
> - good data possibly be marked bad because of bad checksums
> - reads giving back EIO where people would rather prefer bad data

If you cannot know if a checksum is bad or the data is bad, the checksum is not 
useful at all!

If I read correctly what you wrote, it seems that you consider a "minor issue" 
the fact that the checksum is not correct. If you accept the possibility that a 
checksum might be wrong, you wont trust anymore the checksum; so the checksum 
became not useful.
 

> (not really sure if this were really his two arguments,... I'd have to
> look it up, so don't nail me down).
> 
> 
> Long story short:
> 
> In any case, I think giving back bad data without EIO is unacceptable.
> If someone really doesn't care (e.g. because he has higher level
> checksumming and possibly even repair) he could still manually disable
> checksumming.
> 
> The little chance of having a false positive weights IMO far less that
> have very large amounts of data (DBs, VM images are our typical cases)
> completely unprotected.

Again, you are assuming that the likelihood of having a bad checksum is low. 
Unfortunately this is not true. There are pattern which exploits this bug with 
a likelihood=100%.

> 
> And not having checksumming with notdatacow breaks any safe raid repair
> (so in that case "repair" may even overwrite good data),... which is
> IMO also unacceptable.
> And the typical use cases for nodatacow (VMs, DBs) are in turn not so
> uncommon to want RAID.
> 
> 
> I really like btrfs,... and it's not that other fs (which typically
> have no checksumming at all) would perform better here... but not
> having it for these major use case is a big disappointment for me.
> 
> 
> Cheers,
> Chris.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs-progs] coreutils-like -i parameter, splitting permissions for various tasks

2018-03-12 Thread David Sterba
On Sun, Feb 11, 2018 at 08:04:39PM +0100, Axel Burri wrote:
> 
> 
> On 2018-02-10 12:24, Tomasz Pala wrote:
> > There is a serious flaw in btrfs subcommands handling. Since all of them
> > are handled by single 'btrfs' binary, there is no way to create any
> > protection against accidental data loss for (the only one I've found,
> > but still DANGEROUS) 'btrfs subvolume delete'.
> > 
> > There are several protections that are being used for various commands.
> > For example, with zsh having hist_ignore_space enabled I got:
> > 
> > alias kill=' kill'
> > alias halt=' halt'
> > alias init=' init'
> > alias poweroff=' poweroff'
> > alias reboot=' reboot'
> > alias shutdown=' shutdown'
> > alias telinit=' telinit'
> > 
> > so that these command are never saved into my shell history.
> > 
> > Other system-wide protection enabled by default might be coreutils.sh
> > creating aliases:
> > 
> > alias cp=' cp --interactive --archive --backup=numbered --reflink=auto'
> > alias mv=' mv --interactive --backup=numbered'
> > alias rm=' rm --interactive --one-file-system --interactive=once'
> > 
> > All such countermeasures reduce the probability of fatal mistakes.
> > 
> > 
> > There is no 'prompt before doing ANYTHING irreversible' option for btrfs,
> > so everyone needs to take special care typing commands. Since snapshotting
> > and managing subvolumes is some daily-routine, not anything special
> > (like creating storage pools or managing devices), this should be more
> > forgiving for any user errors. Since there is no other (obvious)
> > solution, I propose makeing "subvolume delete" ask for confirmation by
> > default, unless used with newly introduced option, like -y(--yes).
> > 
> > 
> > Moreover, since there might be different admin roles on the system, the
> > btrfs-progs should be splitted into separate tools, so one could have
> > quota-admin without permissions for managing devices, backup-admin
> > with access to all the subvolumes or maintenance-admin that could issue
> > scrub or rebalance volumes. For backward compatibility, these tools
> > could be issued by 'btrfs' wrapper binary.
> 
> FWIW, I maintain a little patchset on btrfs-progs which separates
> specific btrfs command groups and thus can be used to set/restrict
> privileged access:
> 
> https://github.com/digint/btrfs-progs-btrbk
> 
> It's far from being complete, merely an ugly hack. I use it to constrain
> btrfs actions (issued by btrbk) on remote machines within cron jobs.

I think this is an interesting approach, the specific binaries with
capabilities avoid setting up sudo. The sources can also be generated
from some template, but what you have now is fine. I'm a bit concerned
about the security implications as the tools would need to be limited to
trusted users and up to the admin to deploy them properly, but otherwise
I see the usecase and would not mind merging it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [btrfs-progs] coreutils-like -i parameter, splitting permissions for various tasks

2018-03-12 Thread David Sterba
On Sat, Feb 10, 2018 at 12:24:41PM +0100, Tomasz Pala wrote:
> There is a serious flaw in btrfs subcommands handling. Since all of them
> are handled by single 'btrfs' binary, there is no way to create any
> protection against accidental data loss for (the only one I've found,
> but still DANGEROUS) 'btrfs subvolume delete'.

So, what about this:

we can add a global option to add the requested interactivity and
commands that are considred dangerous will require the confimation.

alias btrfs='btrfs -i'

This would expand to 'btrfs -i subvolume delete /subvol' etc. For any
command that is not dangerous, the option -i will be accepted but
nothing would happen.

[...]
> (like creating storage pools or managing devices), this should be more
> forgiving for any user errors. Since there is no other (obvious)
> solution, I propose makeing "subvolume delete" ask for confirmation by
> default, unless used with newly introduced option, like -y(--yes).

Does it make sense to add an option to override the -i? This would
cancel the -i that would be hidden in the alias. This can be generally
done by quoting the command, but the option to override -i might fit
somebodys usability habits.

Either option to 'say yes to all queries' or a simple negation of -i,
like -I.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix NULL pointer exception in find_bio_stripe()

2018-03-12 Thread David Sterba
On Sat, Feb 24, 2018 at 12:09:32AM +0100, David Sterba wrote:
> On Fri, Feb 16, 2018 at 07:51:38PM +, Dmitriy Gorokh wrote:
> > On detaching of a disk which is a part of a RAID6 filesystem, the following 
> > kernel OOPS may happen:
> > 
> > [63122.680461] BTRFS error (device sdo): bdev /dev/sdo errs: wr 0, rd 0, 
> > flush 1, corrupt 0, gen 0 
> > [63122.719584] BTRFS warning (device sdo): lost page write due to IO error 
> > on /dev/sdo 
> > [63122.719587] BTRFS error (device sdo): bdev /dev/sdo errs: wr 1, rd 0, 
> > flush 1, corrupt 0, gen 0 
> > [63122.803516] BTRFS warning (device sdo): lost page write due to IO error 
> > on /dev/sdo 
> > [63122.803519] BTRFS error (device sdo): bdev /dev/sdo errs: wr 2, rd 0, 
> > flush 1, corrupt 0, gen 0 
> > [63122.863902] BTRFS critical (device sdo): fatal error on device /dev/sdo 
> > [63122.935338] BUG: unable to handle kernel NULL pointer dereference at 
> > 0080 
> > [63122.946554] IP: fail_bio_stripe+0x58/0xa0 [btrfs] 
> > [63122.958185] PGD 9ecda067 P4D 9ecda067 PUD b2b37067 PMD 0 
> > [63122.971202] Oops:  [#1] SMP 
> > [63122.990786] Modules linked in: libcrc32c dlm configfs cpufreq_userspace 
> > cpufreq_powersave cpufreq_conservative softdog nfsd auth_rpcgss nfs_acl nfs 
> > lockd grace fscache sunrpc bonding ipmi_devintf ipmi_msghandler joydev 
> > snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd psmouse evdev parport_pc 
> > soundcore serio_raw battery pcspkr video ac97_bus ac parport ohci_pci 
> > ohci_hcd i2c_piix4 button crc32c_generic crc32c_intel btrfs xor 
> > zstd_decompress zstd_compress xxhash raid6_pq dm_mod dax raid1 md_mod 
> > hid_generic usbhid hid xhci_pci xhci_hcd ehci_pci ehci_hcd usbcore sg 
> > sd_mod sr_mod cdrom ata_generic ahci libahci ata_piix libata e1000 scsi_mod 
> > [last unloaded: scst] 
> > [63123.006760] CPU: 0 PID: 3979 Comm: kworker/u8:9 Tainted: G W 
> > 4.14.2-16-scst34x+ #8 
> > [63123.007091] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
> > VirtualBox 12/01/2006 
> > [63123.007402] Workqueue: btrfs-worker btrfs_worker_helper [btrfs] 
> > [63123.007595] task: 880036ea4040 task.stack: c90006384000 
> > [63123.007796] RIP: 0010:fail_bio_stripe+0x58/0xa0 [btrfs] 
> > [63123.007968] RSP: 0018:c90006387ad8 EFLAGS: 00010287 
> > [63123.008140] RAX: 0002 RBX: 88004beaa0b8 RCX: 
> > 8800b2bd5690 
> > [63123.008359] RDX:  RSI: 88007bb43500 RDI: 
> > 88004beaa000 
> > [63123.008621] RBP: c90006387ae8 R08: 9910 R09: 
> > 8800b2bd5600 
> > [63123.008840] R10: 0004 R11: 0001 R12: 
> > 88007bb43500 
> > [63123.009059] R13: fffb R14: 880036fc5180 R15: 
> > 0004 
> > [63123.009278] FS: () GS:8800b700() 
> > knlGS: 
> > [63123.009564] CS: 0010 DS:  ES:  CR0: 80050033 
> > [63123.009748] CR2: 0080 CR3: b0866000 CR4: 
> > 000406f0 
> > [63123.009969] Call Trace: 
> > [63123.010085] raid_write_end_io+0x7e/0x80 [btrfs] 
> > [63123.010251] bio_endio+0xa1/0x120 
> > [63123.010378] generic_make_request+0x218/0x270 
> > [63123.010921] submit_bio+0x66/0x130 
> > [63123.011073] finish_rmw+0x3fc/0x5b0 [btrfs] 
> > [63123.011245] full_stripe_write+0x96/0xc0 [btrfs] 
> > [63123.011428] raid56_parity_write+0x117/0x170 [btrfs] 
> > [63123.011604] btrfs_map_bio+0x2ec/0x320 [btrfs] 
> > [63123.011759] ? ___cache_free+0x1c5/0x300 
> > [63123.011909] __btrfs_submit_bio_done+0x26/0x50 [btrfs] 
> > [63123.012087] run_one_async_done+0x9c/0xc0 [btrfs] 
> > [63123.012257] normal_work_helper+0x19e/0x300 [btrfs] 
> > [63123.012429] btrfs_worker_helper+0x12/0x20 [btrfs] 
> > [63123.012656] process_one_work+0x14d/0x350 
> > [63123.012888] worker_thread+0x4d/0x3a0 
> > [63123.013026] ? _raw_spin_unlock_irqrestore+0x15/0x20 
> > [63123.013192] kthread+0x109/0x140 
> > [63123.013315] ? process_scheduled_works+0x40/0x40 
> > [63123.013472] ? kthread_stop+0x110/0x110 
> > [63123.013610] ret_from_fork+0x25/0x30 
> > [63123.013741] Code: 7e 43 31 c0 48 63 d0 48 8d 14 52 49 8d 4c d1 60 48 8b 
> > 51 08 49 39 d0 72 1f 4c 63 1b 4c 01 da 49 39 d0 73 14 48 8b 11 48 8b 52 68 
> > <48> 8b 8a 80 00 00 00 48 39 4e 08 74 14 83 c0 01 44 39 d0 75 c4 
> > [63123.014469] RIP: fail_bio_stripe+0x58/0xa0 [btrfs] RSP: c90006387ad8 
> > [63123.014678] CR2: 0080 
> > [63123.016590] ---[ end trace a295ea7259c17880 ]— 
> > 
> > This is reproducible in a cycle, where a series of writes is followed by 
> > SCSI device delete command. The test may take up to few minutes.
> > 
> > Fixes: commit 74d46992e0d9dee7f1f376de0d56d31614c8a17a ("block: replace 
> > bi_bdev with a gendisk pointer and partitions index")
> 
> Right, the commit introduced dereference of stripe->dev->bdev so we have
> to check it first.
> 
> Please update the Fixes: tag, the word 'commit' should not be there and
> the sha1 length can be shortened to 12 

Re: [PATCH 1/4] btrfs-progs: mode-lowmem: Fix false alert about orphan inode

2018-03-12 Thread David Sterba
On Mon, Feb 05, 2018 at 02:47:11PM +0800, Qu Wenruo wrote:
> Btrfs can delay inode deletion and in that case btrfs will unlink the
> victim inode from its parent dir, and insert a marker to info btrfs to
> delete it later.
> 
> In that case, such victim inode will have nlinks == 0, but is still
> completely valid.
> Original mode won't report such problem, but lowmem mode doesn't check
> the ORPHAN_ITEM key for such inode, and can report false alert like:
> --
> ERROR: root 257 INODE[28891726] is orphan item
> --
> 
> Fix such false alert by checking orphan item for inode whose nlink is 0.
> 
> Signed-off-by: Qu Wenruo 

1-4 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] btrfs-progs: mode-lowmem: Fix false alert about orphan inode

2018-03-12 Thread David Sterba
On Mon, Feb 05, 2018 at 03:00:39PM +0800, Su Yue wrote:
> 
> 
> On 02/05/2018 02:47 PM, Qu Wenruo wrote:
> > Btrfs can delay inode deletion and in that case btrfs will unlink the
> > victim inode from its parent dir, and insert a marker to info btrfs to
> > delete it later.
> > 
> > In that case, such victim inode will have nlinks == 0, but is still
> > completely valid.
> > Original mode won't report such problem, but lowmem mode doesn't check
> > the ORPHAN_ITEM key for such inode, and can report false alert like:
> > --
> > ERROR: root 257 INODE[28891726] is orphan item
> > --
> > 
> > Fix such false alert by checking orphan item for inode whose nlink is 0.
> > 
> > Signed-off-by: Qu Wenruo 
> > ---
> >   check/mode-lowmem.c | 24 ++--
> >   1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> > index 62bcf3d2e126..b11a6d77d102 100644
> > --- a/check/mode-lowmem.c
> > +++ b/check/mode-lowmem.c
> > @@ -1861,6 +1861,24 @@ out:
> > return ret;
> >   }
> >   
> > +static bool has_orphan_item(struct btrfs_root *root, u64 ino)
> > +{
> > +   struct btrfs_path path;
> > +   struct btrfs_key key;
> > +   int ret;
> > +
> > +   btrfs_init_path();
> > +   key.objectid = BTRFS_ORPHAN_OBJECTID;
> > +   key.type = BTRFS_ORPHAN_ITEM_KEY;
> > +   key.offset = ino;
> > +
> > +   ret = btrfs_search_slot(NULL, root, , , 0, 0);
> > +   btrfs_release_path();
> > +   if (ret == 0)
> > +   return true;
> > +   return false;
> > +}
> > +
> 
> Suggestion:
> Maybe "is_orphan_item" is a better name?

The inode cannot be an orphan item itself, the orphan status is denoted
by the orphan item so the inode does have the item attached. So
has_orphan_item seems appropriate.

> Reviewed-by: Su Yue 

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Use sizeof directly instead of a constant variable

2018-03-12 Thread David Sterba
On Mon, Mar 12, 2018 at 06:52:28PM +0200, Nikolay Borisov wrote:
> The kernel would like to have all stack VLA usage removed[1].
> Unfortunately using an integer constant variable as the size of an
> array is still considered a VLA. Instead let's use directly sizeof(var)
> which removes the VLA usage. Use the occasion to remove csum_size
> altogether and use sizeof() also for the size passed to memcmp

The point of csum_size here is to define once, use everywhere, so we
don't have to opencode sizeof. To avoid the 'not really a VLA', we can
rewrite that.

> 
> [1]: https://lkml.org/lkml/2018/3/7/621
> 
> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/disk-io.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 3809d6d66f6a..6247162e334a 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -403,8 +403,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>  
>   if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
>   u32 crc = ~(u32)0;
> - const int csum_size = sizeof(crc);
> - char result[csum_size];
> + char result[sizeof(crc)];
>  
>   /*
>* The super_block structure does not span the whole
> @@ -415,7 +414,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
> *fs_info,
>   crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
>   btrfs_csum_final(crc, result);
>  
> - if (memcmp(raw_disk_sb, result, csum_size))
> + if (memcmp(raw_disk_sb, result, sizeof(crc)))

sizeof(result) would be better here as it's clear what we're going to
memcpy, while 'crc' is define somewhere above.

>   ret = 1;
>   }
>  
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Use sizeof directly instead of a constant variable

2018-03-12 Thread Nikolay Borisov
The kernel would like to have all stack VLA usage removed[1].
Unfortunately using an integer constant variable as the size of an
array is still considered a VLA. Instead let's use directly sizeof(var)
which removes the VLA usage. Use the occasion to remove csum_size
altogether and use sizeof() also for the size passed to memcmp

[1]: https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/disk-io.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3809d6d66f6a..6247162e334a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -403,8 +403,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
 
if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
u32 crc = ~(u32)0;
-   const int csum_size = sizeof(crc);
-   char result[csum_size];
+   char result[sizeof(crc)];
 
/*
 * The super_block structure does not span the whole
@@ -415,7 +414,7 @@ static int btrfs_check_super_csum(struct btrfs_fs_info 
*fs_info,
crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
btrfs_csum_final(crc, result);
 
-   if (memcmp(raw_disk_sb, result, csum_size))
+   if (memcmp(raw_disk_sb, result, sizeof(crc)))
ret = 1;
}
 
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: test regression of -EEXIST on creating new file after log replay

2018-03-12 Thread Liu Bo
On Mon, Mar 12, 2018 at 07:28:26PM +0800, Eryu Guan wrote:
> On Sat, Mar 10, 2018 at 04:56:04PM -0700, Liu Bo wrote:
> > The regression is introduced to btrfs in linux v4.4 and it refuses to create
> > new files after log replay by returning -EEXIST.
> > 
> > Although the problem is on btrfs only, there is no btrfs stuff in terms of
> > test, so this makes it generic.
> > 
> > The kernel fix is
> >   Btrfs: fix unexpected -EEXIST when creating new inode
> > 
> > Reviewed-by: Filipe Manana 
> > Signed-off-by: Liu Bo 
...
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +   _cleanup_flakey
> > +   cd /
> > +   rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +. ./common/dmflakey
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_supported_os Linux
> > +_require_scratch
> > +_require_dm_target flakey
> > +
> > +rm -f $seqres.full
> > +
> > +_scratch_mkfs >>$seqres.full 2>&1
> 
> I added a "_require_metadata_journaling $SCRATCH_DEV" here, otherwise
> test fails with ext2, which doesn't have journal.

Ah OK, that makes sense.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Daniel Borkmann
On 03/12/2018 03:06 PM, Masami Hiramatsu wrote:
> On Mon, 12 Mar 2018 11:44:21 +0100
> Daniel Borkmann  wrote:
>> On 03/12/2018 11:27 AM, Masami Hiramatsu wrote:
>>> On Mon, 12 Mar 2018 19:00:49 +0900
>>> Masami Hiramatsu  wrote:
>>>
 Since the kprobe which was optimized by jump can not change
 the execution path, the kprobe for error-injection must not
 be optimized. To prohibit it, set a dummy post-handler as
 officially stated in Documentation/kprobes.txt.
>>>
>>> Note that trace-probe based BPF is not affected, because it
>>> ensures the trace-probe is based on ftrace, which is not
>>> jump optimized.
>>
>> Thanks for the fix! I presume this should go via bpf instead of bpf-next
>> tree since 4b1a29a7f542 ("error-injection: Support fault injection 
>> framework")
>> is in Linus' tree as well. Unless there are objection I would rather route
>> it that way so it would be for 4.16.
> 
> Ah, right! It should go into 4.16. It should be applicable cleanly either tree
> since there is only the above commit on kernel/fail_function.c :)

Applied to bpf tree, thanks Masami!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Masami Hiramatsu
On Mon, 12 Mar 2018 11:44:21 +0100
Daniel Borkmann  wrote:

> Hi Masami,
> 
> On 03/12/2018 11:27 AM, Masami Hiramatsu wrote:
> > On Mon, 12 Mar 2018 19:00:49 +0900
> > Masami Hiramatsu  wrote:
> > 
> >> Since the kprobe which was optimized by jump can not change
> >> the execution path, the kprobe for error-injection must not
> >> be optimized. To prohibit it, set a dummy post-handler as
> >> officially stated in Documentation/kprobes.txt.
> > 
> > Note that trace-probe based BPF is not affected, because it
> > ensures the trace-probe is based on ftrace, which is not
> > jump optimized.
> 
> Thanks for the fix! I presume this should go via bpf instead of bpf-next
> tree since 4b1a29a7f542 ("error-injection: Support fault injection framework")
> is in Linus' tree as well. Unless there are objection I would rather route
> it that way so it would be for 4.16.

Ah, right! It should go into 4.16. It should be applicable cleanly either tree
since there is only the above commit on kernel/fail_function.c :)

Thanks,

> 
> Thanks,
> Daniel
> 
> > Thanks,
> > 
> >>
> >> Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
> >> Signed-off-by: Masami Hiramatsu 
> >> ---
> >>  kernel/fail_function.c |   10 ++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/kernel/fail_function.c b/kernel/fail_function.c
> >> index 21b0122cb39c..1d5632d8bbcc 100644
> >> --- a/kernel/fail_function.c
> >> +++ b/kernel/fail_function.c
> >> @@ -14,6 +14,15 @@
> >>  
> >>  static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs);
> >>  
> >> +static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs,
> >> +   unsigned long flags)
> >> +{
> >> +  /*
> >> +   * A dummy post handler is required to prohibit optimizing, because
> >> +   * jump optimization does not support execution path overriding.
> >> +   */
> >> +}
> >> +
> >>  struct fei_attr {
> >>struct list_head list;
> >>struct kprobe kp;
> >> @@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, 
> >> unsigned long addr)
> >>return NULL;
> >>}
> >>attr->kp.pre_handler = fei_kprobe_handler;
> >> +  attr->kp.post_handler = fei_post_handler;
> >>attr->retval = adjust_error_retval(addr, 0);
> >>INIT_LIST_HEAD(>list);
> >>}
> >>
> > 
> > 
> 


-- 
Masami Hiramatsu 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Handle error from btrfs_uuid_tree_rem call in _btrfs_ioctl_set_received_subvol

2018-03-12 Thread Nikolay Borisov
As with every function which deals with modifying the btree
btrfs_uuid_tree_rem can fail for any number of reasons (ie. EIO/ENOMEM).
Handle return error value from this function gracefully by aborting the
transaction.

Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree")
Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ioctl.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 987b4843272f..0abe9ca83a9e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -5075,10 +5075,17 @@ static long _btrfs_ioctl_set_received_subvol(struct 
file *file,
received_uuid_changed = memcmp(root_item->received_uuid, sa->uuid,
   BTRFS_UUID_SIZE);
if (received_uuid_changed &&
-   !btrfs_is_empty_uuid(root_item->received_uuid))
-   btrfs_uuid_tree_rem(trans, fs_info, root_item->received_uuid,
-   BTRFS_UUID_KEY_RECEIVED_SUBVOL,
-   root->root_key.objectid);
+   !btrfs_is_empty_uuid(root_item->received_uuid)) {
+   ret = btrfs_uuid_tree_rem(trans, fs_info,
+ root_item->received_uuid,
+ BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+ root->root_key.objectid);
+   if (ret && ret != -ENOENT) {
+   btrfs_abort_transaction(trans, ret);
+   btrfs_end_transaction(trans);
+   goto out;
+   }
+   }
memcpy(root_item->received_uuid, sa->uuid, BTRFS_UUID_SIZE);
btrfs_set_root_stransid(root_item, sa->stransid);
btrfs_set_root_rtransid(root_item, sa->rtransid);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: test regression of -EEXIST on creating new file after log replay

2018-03-12 Thread Eryu Guan
On Sat, Mar 10, 2018 at 04:56:04PM -0700, Liu Bo wrote:
> The regression is introduced to btrfs in linux v4.4 and it refuses to create
> new files after log replay by returning -EEXIST.
> 
> Although the problem is on btrfs only, there is no btrfs stuff in terms of
> test, so this makes it generic.
> 
> The kernel fix is
>   Btrfs: fix unexpected -EEXIST when creating new inode
> 
> Reviewed-by: Filipe Manana 
> Signed-off-by: Liu Bo 
> ---
> v2: - Remove failed message from 481.out
> - Drop the unnecessary write in creating a file
> 
>  tests/generic/481 | 75 
> +++
>  tests/generic/481.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 78 insertions(+)
>  create mode 100755 tests/generic/481
>  create mode 100644 tests/generic/481.out
> 
> diff --git a/tests/generic/481 b/tests/generic/481
> new file mode 100755
> index 000..6a7e9dd
> --- /dev/null
> +++ b/tests/generic/481
> @@ -0,0 +1,75 @@
> +#! /bin/bash
> +# FSQA Test No. 481
> +#
> +# Reproduce a regression of btrfs that leads to -EEXIST on creating new files
> +# after log replay.
> +#
> +# The kernel fix is
> +#   Btrfs: fix unexpected -EEXIST when creating new inode
> +#
> +#---
> +#
> +# Copyright (C) 2018 Oracle. All Rights Reserved.
> +# Author: Bo Liu 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + _cleanup_flakey
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1

I added a "_require_metadata_journaling $SCRATCH_DEV" here, otherwise
test fails with ext2, which doesn't have journal.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Daniel Borkmann
Hi Masami,

On 03/12/2018 11:27 AM, Masami Hiramatsu wrote:
> On Mon, 12 Mar 2018 19:00:49 +0900
> Masami Hiramatsu  wrote:
> 
>> Since the kprobe which was optimized by jump can not change
>> the execution path, the kprobe for error-injection must not
>> be optimized. To prohibit it, set a dummy post-handler as
>> officially stated in Documentation/kprobes.txt.
> 
> Note that trace-probe based BPF is not affected, because it
> ensures the trace-probe is based on ftrace, which is not
> jump optimized.

Thanks for the fix! I presume this should go via bpf instead of bpf-next
tree since 4b1a29a7f542 ("error-injection: Support fault injection framework")
is in Linus' tree as well. Unless there are objection I would rather route
it that way so it would be for 4.16.

Thanks,
Daniel

> Thanks,
> 
>>
>> Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
>> Signed-off-by: Masami Hiramatsu 
>> ---
>>  kernel/fail_function.c |   10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/kernel/fail_function.c b/kernel/fail_function.c
>> index 21b0122cb39c..1d5632d8bbcc 100644
>> --- a/kernel/fail_function.c
>> +++ b/kernel/fail_function.c
>> @@ -14,6 +14,15 @@
>>  
>>  static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs);
>>  
>> +static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs,
>> + unsigned long flags)
>> +{
>> +/*
>> + * A dummy post handler is required to prohibit optimizing, because
>> + * jump optimization does not support execution path overriding.
>> + */
>> +}
>> +
>>  struct fei_attr {
>>  struct list_head list;
>>  struct kprobe kp;
>> @@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, 
>> unsigned long addr)
>>  return NULL;
>>  }
>>  attr->kp.pre_handler = fei_kprobe_handler;
>> +attr->kp.post_handler = fei_post_handler;
>>  attr->retval = adjust_error_retval(addr, 0);
>>  INIT_LIST_HEAD(>list);
>>  }
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Masami Hiramatsu
On Mon, 12 Mar 2018 19:00:49 +0900
Masami Hiramatsu  wrote:

> Since the kprobe which was optimized by jump can not change
> the execution path, the kprobe for error-injection must not
> be optimized. To prohibit it, set a dummy post-handler as
> officially stated in Documentation/kprobes.txt.

Note that trace-probe based BPF is not affected, because it
ensures the trace-probe is based on ftrace, which is not
jump optimized.

Thanks,

> 
> Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
> Signed-off-by: Masami Hiramatsu 
> ---
>  kernel/fail_function.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/fail_function.c b/kernel/fail_function.c
> index 21b0122cb39c..1d5632d8bbcc 100644
> --- a/kernel/fail_function.c
> +++ b/kernel/fail_function.c
> @@ -14,6 +14,15 @@
>  
>  static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs);
>  
> +static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs,
> +  unsigned long flags)
> +{
> + /*
> +  * A dummy post handler is required to prohibit optimizing, because
> +  * jump optimization does not support execution path overriding.
> +  */
> +}
> +
>  struct fei_attr {
>   struct list_head list;
>   struct kprobe kp;
> @@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, 
> unsigned long addr)
>   return NULL;
>   }
>   attr->kp.pre_handler = fei_kprobe_handler;
> + attr->kp.post_handler = fei_post_handler;
>   attr->retval = adjust_error_retval(addr, 0);
>   INIT_LIST_HEAD(>list);
>   }
> 


-- 
Masami Hiramatsu 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUGFIX PATCH bpf-next] error-injection: Fix to prohibit jump optimization

2018-03-12 Thread Masami Hiramatsu
Since the kprobe which was optimized by jump can not change
the execution path, the kprobe for error-injection must not
be optimized. To prohibit it, set a dummy post-handler as
officially stated in Documentation/kprobes.txt.

Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework")
Signed-off-by: Masami Hiramatsu 
---
 kernel/fail_function.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/fail_function.c b/kernel/fail_function.c
index 21b0122cb39c..1d5632d8bbcc 100644
--- a/kernel/fail_function.c
+++ b/kernel/fail_function.c
@@ -14,6 +14,15 @@
 
 static int fei_kprobe_handler(struct kprobe *kp, struct pt_regs *regs);
 
+static void fei_post_handler(struct kprobe *kp, struct pt_regs *regs,
+unsigned long flags)
+{
+   /*
+* A dummy post handler is required to prohibit optimizing, because
+* jump optimization does not support execution path overriding.
+*/
+}
+
 struct fei_attr {
struct list_head list;
struct kprobe kp;
@@ -56,6 +65,7 @@ static struct fei_attr *fei_attr_new(const char *sym, 
unsigned long addr)
return NULL;
}
attr->kp.pre_handler = fei_kprobe_handler;
+   attr->kp.post_handler = fei_post_handler;
attr->retval = adjust_error_retval(addr, 0);
INIT_LIST_HEAD(>list);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: Remove received information from snapshot on ro->rw switch

2018-03-12 Thread Nikolay Borisov
Currently when a read-only snapshot is received and subsequently its
ro property is set to false i.e. switched to rw-mode the
received_uuid/stime/rtime/stransid/rtransid of that subvol remains
intact. However, once the received volume is switched to RW mode we
cannot guaranteee that it contains the same data, so it makes sense
to remove those fields which indicate this volume was ever
send/received. Additionally, sending such volume can cause conflicts
due to the presence of received_uuid.

Signed-off-by: Nikolay Borisov 
Suggested-by: David Sterba 
---
Changes: 
 * changed the title to make it more generic. Old was: "btrfs: Remove
 received_uuid during received snapshot ro->rw switch"

 * Clear additional fields that pertain to subvol receive operation. 

 fs/btrfs/ioctl.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5011b6272cfa..987b4843272f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1758,6 +1758,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
struct btrfs_trans_handle *trans;
u64 root_flags;
u64 flags;
+   bool clear_received_state = false;
int ret = 0;
 
if (!inode_owner_or_capable(inode))
@@ -1807,6 +1808,7 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
btrfs_set_root_flags(>root_item,
 root_flags & ~BTRFS_ROOT_SUBVOL_RDONLY);
spin_unlock(>root_item_lock);
+   clear_received_state = true;
} else {
spin_unlock(>root_item_lock);
btrfs_warn(fs_info,
@@ -1823,6 +1825,31 @@ static noinline int btrfs_ioctl_subvol_setflags(struct 
file *file,
goto out_reset;
}
 
+   if (clear_received_state) {
+   if (!btrfs_is_empty_uuid(root->root_item.received_uuid)) {
+   struct btrfs_root_item *root_item = >root_item;
+
+   ret = btrfs_uuid_tree_rem(trans, fs_info,
+   root->root_item.received_uuid,
+   BTRFS_UUID_KEY_RECEIVED_SUBVOL,
+   root->root_key.objectid);
+
+   if (ret && ret != -ENOENT) {
+   btrfs_abort_transaction(trans, ret);
+   btrfs_end_transaction(trans);
+   goto out_reset;
+   }
+
+   memset(root_item->received_uuid, 0, BTRFS_UUID_SIZE);
+   btrfs_set_root_stransid(root_item, 0);
+   btrfs_set_root_rtransid(root_item, 0);
+   btrfs_set_stack_timespec_sec(_item->stime, 0);
+   btrfs_set_stack_timespec_nsec(_item->stime, 0);
+   btrfs_set_stack_timespec_sec(_item->rtime, 0);
+   btrfs_set_stack_timespec_nsec(_item->rtime, 0);
+   }
+   }
+
ret = btrfs_update_root(trans, fs_info->tree_root,
>root_key, >root_item);
if (ret < 0) {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: Move check/main.c to cmds-check.c to maintain the subcommand hierarchy

2018-03-12 Thread Qu Wenruo
cmds-check.c is back now, with include files cleaned up.

Signed-off-by: Qu Wenruo 
---
 Makefile |  8 
 check/main.c => cmds-check.c | 17 -
 2 files changed, 4 insertions(+), 21 deletions(-)
 rename check/main.c => cmds-check.c (97%)

diff --git a/Makefile b/Makefile
index 27f5c9d97f9a..5f6d5b351700 100644
--- a/Makefile
+++ b/Makefile
@@ -110,6 +110,7 @@ CHECKER_FLAGS := -include $(check_defs) -D__CHECKER__ \
-D__CHECK_ENDIAN__ -Wbitwise -Wuninitialized -Wshadow -Wundef \
-U_FORTIFY_SOURCE -Wdeclaration-after-statement -Wdefault-bitfield-sign
 
+check_objects = check/mode-lowmem.o check/mode-original.o check/mode-common.o
 objects = ctree.o disk-io.o kernel-lib/radix-tree.o extent-tree.o print-tree.o 
\
  root-tree.o dir-item.o file-item.o inode-item.o inode-map.o \
  extent-cache.o extent_io.o volumes.o utils.o repair.o \
@@ -119,12 +120,11 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o 
extent-tree.o print-tree.o \
  fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
-  cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
+  cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
   cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
\
-  mkfs/common.o check/mode-common.o check/mode-lowmem.o \
-  check/mode-original.o
+  mkfs/common.o
 libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o 
\
   kernel-lib/crc32c.o messages.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
@@ -450,7 +450,7 @@ btrfs-%: btrfs-%.o $(objects) $(standalone_deps) 
$(libs_static)
$(libs_static) \
$(LDFLAGS) $(LIBS) $($(subst -,_,$@-libs))
 
-btrfs: btrfs.o $(objects) $(cmds_objects) $(libs_static)
+btrfs: btrfs.o $(objects) $(cmds_objects) $(libs_static) $(check_objects)
@echo "[LD] $@"
$(Q)$(CC) -o $@ $^ $(LDFLAGS) $(LIBS) $(LIBS_COMP)
 
diff --git a/check/main.c b/cmds-check.c
similarity index 97%
rename from check/main.c
rename to cmds-check.c
index 6944de9e4635..decbbe662336 100644
--- a/check/main.c
+++ b/cmds-check.c
@@ -16,32 +16,15 @@
  * Boston, MA 021110-1307, USA.
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include "ctree.h"
 #include "volumes.h"
 #include "repair.h"
 #include "disk-io.h"
-#include "print-tree.h"
 #include "task-utils.h"
 #include "transaction.h"
 #include "utils.h"
-#include "commands.h"
-#include "free-space-cache.h"
-#include "free-space-tree.h"
-#include "btrfsck.h"
 #include "qgroup-verify.h"
-#include "rbtree-utils.h"
-#include "backref.h"
-#include "kernel-shared/ulist.h"
-#include "hash.h"
 #include "help.h"
 #include "check/mode-common.h"
 #include "check/mode-original.h"
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] btrfs-progs: Split mode-original code

2018-03-12 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/split_check_for_david

This is the remaining part of the check code split.
After the code split, the old cmds-check.c will return to make every
subcommand have its own cmds-*.c.
And there will be check/mode-original.[ch] and check/mode-lowmem.[ch] to
indicates the supported check mode.

As usual, some patch would be quite big so it won't reach mail list.
Please check github for full change.

Qu Wenruo (3):
  btrfs-progs: check: Move more shared codes to mode-common.c
  btrfs-progs: Move the remaining original mode code to mode-original.c
  btrfs-progs: Move check/main.c to cmds-check.c to maintain the
subcommand hierarchy

 Makefile  |7 +-
 btrfsck.h |1 -
 check/mode-common.c   | 1724 +
 check/mode-common.h   |   38 +
 check/{main.c => mode-original.c} | 6906 -
 check/mode-original.h |   32 +-
 cmds-check.c  |  661 
 7 files changed, 4724 insertions(+), 4645 deletions(-)
 rename check/{main.c => mode-original.c} (76%)
 create mode 100644 cmds-check.c

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Raid1 volume stuck as read-only: How to dump, recreate and restore its content?

2018-03-12 Thread Duncan
Adam Borowski posted on Sun, 11 Mar 2018 18:47:13 +0100 as excerpted:

> On Sun, Mar 11, 2018 at 11:28:08PM +0700, Andreas Hild wrote:
>> Following a physical disk failure of a RAID1 array, I tried to mount
>> the remaining volume of a root partition with "-o degraded". For some
>> reason it ended up as read-only as described here:
>> https://btrfs.wiki.kernel.org/index.php/
Gotchas#raid1_volumes_only_mountable_once_RW_if_degraded
>> 
>> 
>> How to precisely do this: dump, recreate and restore its contents?
>> Could someone please provided more details how to recover this volume
>> safely?
> 
>> Linux debian 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3 (2017-12-03) x86_64
>> GNU/Linux
> 
>> [ 1313.279140] BTRFS warning (device sdb2): missing devices (1)
>> exceeds the limit (0), writeable mount is not allowed
> 
> I'd recommend instead going with kernel 4.14 or newer (available in
> stretch-backports), which handles this case well without the need to
> restore.  If there's no actual data loss (there shouldn't be, it's RAID1
> with only a single device missing), you can mount degraded normally,
> then balance the data onto the new disk.
> 
> Recovery with 4.9 is unpleasant.

Second the 4.14+ kernel with the per-chunk-check (vs. older per-device) 
patches recommendation, both in general and if the priority is getting 
the existing filesystem back into normal writable condition.

And since we're talking about upgrades, I'll mention that 4.7 btrfs-tools 
version as well.  For normal operations, mounting, writing, balance, 
scrub, etc, it's the kernel version that does the work (the userspace 
scrub and balance commands just call the appropriate kernel 
functionality), so it's the kernel version that's important to keep 
current as having the latest bugfixes.  However, when things go wrong and 
you're running commands such as btrfs check, restore, or rescue, as well 
as when you create a new btrfs with mkfs.btrfs, it's the userspace code 
that does the work and thus the userspace code you want to have the 
latest bugfixes in ordered to have the greatest chance at a fix.

And with userspace versions synced with kernelspace and coming out 
shortly after the kernel release of the same major.minor, five such 
kernel releases a year, and 4.15 being the current kernel release, 4.7 
userspace is 8 kernel series releases and over a year and a half 
outdated.  Put differently, 4.7 is missing a year and a half worth of 
bugfixes that you won't have when you run it to try to check or recover 
that btrfs that won't mount!  Do you *really* want to risk your data on 
bugs that were after all discovered and fixed over a year ago?


Meanwhile, if the priority is simply ensuring access to the data, and 
you'd prefer to stick with the 4.9-LTS series kernel if possible, then 
the existing read-only filesystem should let you do just that, reliably 
read the files in ordered to copy them elsewhere, say to a new btrfs, tho 
it can just as easily be non-btrfs if desired, since you're just copying 
the files as normal.  It's a read-only filesystem, but that shouldn't 
prevent you copying the files off, just prevent writing.

No need to worry about btrfs restore at all, which after all is designed 
for disaster recovery and thus (among other things) doesn't verify 
checksums on the data it recovers, in ordered to give the best chance at 
recovery when things are already badly wrong, so just copying the data 
off the read-only filesystem is a better option when it's available, as 
it is here.


Alternatively, since the value of data is defined not by empty claims but 
by the backups you consider it worth having of that data (and raid is not 
a backup), you either have a backup to copy from if necessary, or by lack 
of said backup, you defined the value of the data as too trivial to be 
worth bothering with a backup, in which case it's trivial enough it may 
not be worth bothering copying it off, either -- just blow it away with a 
fresh mkfs and start over.

And should you be reconsidering your backup-action (or lack thereof) 
definition of the value of the data... consider yourself luck fate didn't 
take you up on that definition... this time... and take the opportunity 
presented to make that backup while you have the chance, because fate may 
actually take you up on that value definition, next time. =:^)

(FWIW, about a year ago I upgraded even my backups to ssd, because I 
wasn't comfortable with amount of unprotected data I had in the delta 
between current/working and last backup, because backups were enough of a 
hassle I kept putting them off.  By buying ssds, I deliberately chose to 
spend money to bring down the hassle cost of the backups, and thus my 
"trivial value" threshold definition, and backups are fast enough now 
that as I predicted, I make them far more often.  So I'm walking my own 
talk, and am able to sleep much more comfortably now as I'm not worrying 
about that backup I put off and the chance fate might take me up on 

[PATCH] btrfs-progs: check: Fix false alerts on sector sized compressed inline file extent

2018-03-12 Thread Qu Wenruo
Commit 9d015c984006 ("btrfs-progs: Limit inline extent below page size")
tries to fix the convert problem which we could create large inline file
extent but kernel can't hanle.

This leads the false alert about fsck test case 020.

And after discussion about the problem in the mail list, we still
need to support compressed inline file extent whose ram_bytes can be
sector size.

So instead of embedding the sector size check into
BTRFS_MAX_INLINE_DATA_SIZE(), we still need to manually check ram_bytes
and inline_item_len separately.

This patch reverts the fix in 9d015c984006 ("btrfs-progs: Limit inline
extent below page size") and do it manually in check and convert, so we
won't need to maintain 2 different BTRFS_MAX_INLINE_DATA_SIZE().

Fixes: 9d015c984006 ("btrfs-progs: Limit inline extent below page size")
Signed-off-by: Qu Wenruo 
---
 check/main.c  | 25 +++--
 check/mode-lowmem.c   | 30 +++---
 convert/source-ext2.c |  4 ++--
 ctree.h   | 10 ++
 4 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/check/main.c b/check/main.c
index 392195ca324e..c887b139ccdb 100644
--- a/check/main.c
+++ b/check/main.c
@@ -1435,6 +1435,8 @@ static int process_file_extent(struct btrfs_root *root,
u64 disk_bytenr = 0;
u64 extent_offset = 0;
u64 mask = root->fs_info->sectorsize - 1;
+   u32 max_inline_size = min_t(u32, mask,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
int extent_type;
int ret;
 
@@ -1460,11 +1462,30 @@ static int process_file_extent(struct btrfs_root *root,
extent_type = btrfs_file_extent_type(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
+   u8 compression = btrfs_file_extent_compression(eb, fi);
+   struct btrfs_item *item = btrfs_item_nr(slot);
+
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
if (num_bytes == 0)
rec->errors |= I_ERR_BAD_FILE_EXTENT;
-   if (num_bytes > BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info))
-   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   /*
+* Here is the hassle.
+* For compressed inline file extent, we allow ram_bytes to
+* be as large as sectorsize, and only limit its on-disk
+* data size below sectorsize.
+* But for uncompressed inline file extent, we limit
+* the ram_bytes to below sectorsize.
+*/
+   if (compression) {
+   if (btrfs_file_extent_inline_item_len(eb, item) >
+   max_inline_size ||
+   num_bytes > root->fs_info->sectorsize)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   } else {
+   if (num_bytes > max_inline_size)
+   rec->errors |= I_ERR_FILE_EXTENT_TOO_LARGE;
+   }
+
rec->found_size += num_bytes;
num_bytes = (num_bytes + mask) & ~mask;
} else if (extent_type == BTRFS_FILE_EXTENT_REG ||
diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 0c932b343db2..ae81dfb60d7e 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1417,7 +1417,8 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
u64 csum_found; /* In byte size, sectorsize aligned */
u64 search_start;   /* Logical range start we search for csum */
u64 search_len; /* Logical range len we search for csum */
-   u32 max_inline_extent_size = BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info);
+   u32 max_inline_extent_size = min_t(u32, root->fs_info->sectorsize - 1,
+   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
unsigned int extent_type;
unsigned int is_hole;
int compressed = 0;
@@ -1441,12 +1442,35 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
root->objectid, fkey->objectid, fkey->offset);
err |= FILE_EXTENT_ERROR;
}
-   if (extent_num_bytes > max_inline_extent_size) {
+   /*
+* Compressed inline and uncompressed inline has different limit
+* on ram_bytes and item size.
+*/
+   if (compressed) {
+   if (extent_num_bytes > root->fs_info->sectorsize) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] too large inline extent ram size, have %llu, 
max: %u",
+   root->objectid, fkey->objectid,
+   fkey->offset, extent_num_bytes,
+   root->fs_info->sectorsize);
+