Re: Regression with crc32c selection?

2018-07-23 Thread Holger Hoffstätte

On 07/23/18 18:50, David Sterba wrote:

On Mon, Jul 23, 2018 at 04:13:26PM +0200, Holger Hoffstätte wrote:

While backporting a bunch of fixes to my own 4.16.x tree
(4.17 had a few too many bugs for my taste) I also ended up merging:


Curious, bugs in btrfs or the whole 4.17 kernel? And if bugs, real
breakage or backported fixes?


Overall. I don't remember specifics but skimming lkml at the time
didn't inspire a lot of confidence, and since I already had a large
number of hand-picked & backported patches from 4.17/4.18/4.19 :) for
btrfs, xfs, net, blk-mq & drivers - just the stuff I care about - I skipped
it instead of upgrading & rebasing everything. Might well be that the latest
4.17-stable works reliably, but 4.18 is already around the corner, so..
no really good reason. :)

cheers
Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with crc32c selection?

2018-07-23 Thread David Sterba
On Mon, Jul 23, 2018 at 04:13:26PM +0200, Holger Hoffstätte wrote:
> While backporting a bunch of fixes to my own 4.16.x tree
> (4.17 had a few too many bugs for my taste) I also ended up merging:

Curious, bugs in btrfs or the whole 4.17 kernel? And if bugs, real
breakage or backported fixes?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: python-btrfs & btrfs-heatmap ebuilds now available for Gentoo

2018-07-23 Thread Hans van Kranenburg
On 07/23/2018 04:25 PM, Holger Hoffstätte wrote:
> 
> I wanted to migrate my collection of questionable shell scripts for btrfs
> maintenance/inspection to a more stable foundation and therefore created
> Gentoo ebuilds for Hans van Kranenburg's excellent python-btrfs [1] and
> btrfs-heatmap [2] packages.
> 
> They can be found in my overlay at:
> 
> https://github.com/hhoffstaette/portage/tree/master/sys-fs/python-btrfs
> https://github.com/hhoffstaette/portage/tree/master/sys-fs/btrfs-heatmap

Thanks for doing this!

And now you're making us curious about those
not-so-questionable-any-more scripts. :D

> Both are curently only available for python-3.6 since a change in 3.7
> broke the existing python API [3]. As soon as that is fixed I'll add 3.7
> to the PYTHON_COMPAT entries.

Yes, thanks for testing, I'll have a look at it and will probably also
ask you to help testing the fix.

> btrfs-heatmap properly uses the python-exec wrapper and therefore works
> regardless of currently selected default python version.  :)
> 
> I hope this is useful to someone.

I bet it will. ;-]

-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH resend 1/2] btrfs: allow defrag on a file opened ro that has rw permissions

2018-07-23 Thread David Sterba
On Wed, Jul 18, 2018 at 12:08:59AM +0200, Adam Borowski wrote:
> Requiring a rw descriptor conflicts both ways with exec, returning ETXTBSY
> whenever you try to defrag a program that's currently being run, or
> causing intermittent exec failures on a live system being defragged.
> 
> As defrag doesn't change the file's contents in any way, there's no reason
> to consider it a rw operation.  Thus, let's check only whether the file
> could have been opened rw.  Such access control is still needed as
> currently defrag can use extra disk space, and might trigger bugs.
> 
> Signed-off-by: Adam Borowski 
> ---
>  fs/btrfs/ioctl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 43ecbe620dea..01c150b6ab62 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2941,7 +2941,8 @@ static int btrfs_ioctl_defrag(struct file *file, void 
> __user *argp)
>   ret = btrfs_defrag_root(root);
>   break;
>   case S_IFREG:
> - if (!(file->f_mode & FMODE_WRITE)) {
> + if (!capable(CAP_SYS_ADMIN) &&
> + inode_permission(inode, MAY_WRITE)) {

The dedupe ioctl does the admin check or the FMODE_WRITE, which means
admin can dedupe anything but user has to have the file write open.
Doing the same for defrag should be ok for same reasons it is for
dedupe.

I'm not sure about using plain inode_permissions here, though it looks
correct and I'm not able to see inode vs file descriptor issues that
could cause trouble here. There are uid/gid, rws, acl, immutable,
capabilities, namespace aware checks, security hooks.

So, I'll add the patch to 4.19 queue. It's small and isolated change so
a revert would be easy in case we find something bad. The 2nd patch
should be IMHO part of this change as it's logical to return the error
code in the patch that modifies the user visible behaviour.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with crc32c selection? (solved - pilot error)

2018-07-23 Thread Holger Hoffstätte

On 07/23/18 16:39, Patrik Lundquist wrote:

$ uname -a
Linux nas 4.17.0-1-amd64 #1 SMP Debian 4.17.8-1 (2018-07-20) x86_64 GNU/Linux

$ dmesg | grep Btrfs
[8.168408] Btrfs loaded, crc32c=crc32c-intel

$ lsmod | grep crc32
crc32_pclmul   16384  0
libcrc32c  16384  1 btrfs
crc32c_generic 16384  0
crc32c_intel   24576  2


Ooohh..thanks for that. I wouldn't be surprised it's because my libcrc
is built-in (probably because of built-in xfs) and at initialization time
doesn't see any modules, so it always selects the generic impl.
Since btrfs is only ever loaded last, the previous btrfs init code could
properly detect/load/use the crc32c-intel module.

I switched the crc32c impls to built-in and what do you know:

Jul 23 17:04:26 ragnarok kernel: Btrfs loaded, crc32c=crc32c-intel

\o/

Thanks Patrick!

Holger
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression with crc32c selection?

2018-07-23 Thread Patrik Lundquist
$ uname -a
Linux nas 4.17.0-1-amd64 #1 SMP Debian 4.17.8-1 (2018-07-20) x86_64 GNU/Linux

$ dmesg | grep Btrfs
[8.168408] Btrfs loaded, crc32c=crc32c-intel

$ lsmod | grep crc32
crc32_pclmul   16384  0
libcrc32c  16384  1 btrfs
crc32c_generic 16384  0
crc32c_intel   24576  2

$ grep CRC /boot/config-4.17.0-1-amd64
# CONFIG_PCIE_ECRC is not set
# CONFIG_W1_SLAVE_DS2433_CRC is not set
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_CRCT10DIF=y
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=y
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC4 is not set
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_CRC8=m


On Mon, 23 Jul 2018 at 16:14, Holger Hoffstätte
 wrote:
>
> Hi,
>
> While backporting a bunch of fixes to my own 4.16.x tree
> (4.17 had a few too many bugs for my taste) I also ended up merging:
>
> df91f56adce1f: libcrc32c: Add crc32c_impl function
> 9678c54388b6a: btrfs: Remove custom crc32c init code
>
> ..which AFAIK went into 4.17 and seemed harmless enough; after fixing up
> a trivial context conflict it builds, runs, all good..except that btrfs
> (apprently?) no longer uses the preferred crc32c-intel module, but the
> crc32c-generic one instead.
>
> In order to rule out any mistakes on my part I built 4.18.0-rc6 and it
> seems to have the same problem:
>
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   gen() 11267 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   xor()  8110 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   gen() 13409 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   xor()  9137 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   gen() 15884 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   xor() 10579 MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6: using algorithm sse2x4 gen() 15884 
> MB/s
> Jul 23 15:55:09 ragnarok kernel: raid6:  xor() 10579 MB/s, rmw enabled
> Jul 23 15:55:09 ragnarok kernel: raid6: using ssse3x2 recovery algorithm
> Jul 23 15:55:09 ragnarok kernel: xor: automatically using best checksumming 
> function   avx
> Jul 23 15:55:09 ragnarok kernel: Btrfs loaded, crc32c=crc32c-generic
>
> I understand that the new crc32c_impl() function changed from
> crypto_tfm_alg_driver_name() to crypto_shash_driver_name() - could this
> be the reason? The module is loaded just fine, but apprently not used:
>
> $lsmod | grep crc32
> crc32_pclmul   16384  0
> crc32c_intel   24576  0
>
> In other words, is this supposed to happen or is my kernel config somehow
> no longer right? It worked before and doesn't look too wrong:
>
> $grep CRC /etc/kernels/kernel-config-x86_64-4.18.0-rc6
> # CONFIG_PCIE_ECRC is not set
> CONFIG_CRYPTO_CRC32C=y
> CONFIG_CRYPTO_CRC32C_INTEL=m
> CONFIG_CRYPTO_CRC32=m
> CONFIG_CRYPTO_CRC32_PCLMUL=m
> # CONFIG_CRYPTO_CRCT10DIF is not set
> CONFIG_CRC_CCITT=m
> CONFIG_CRC16=y
> # CONFIG_CRC_T10DIF is not set
> CONFIG_CRC_ITU_T=y
> CONFIG_CRC32=y
> # CONFIG_CRC32_SELFTEST is not set
> CONFIG_CRC32_SLICEBY8=y
> # CONFIG_CRC32_SLICEBY4 is not set
> # CONFIG_CRC32_SARWATE is not set
> # CONFIG_CRC32_BIT is not set
> # CONFIG_CRC4 is not set
> # CONFIG_CRC7 is not set
> CONFIG_LIBCRC32C=y
> # CONFIG_CRC8 is not set
>
> Ultimately btrfs (and everything else) works, but the process of how
> the kernel selects a crc32c implementation seems rather mysterious to me. :/
>
> Any insights welcome. If it's a regression I can gladly test fixes.
>
> cheers
> Holger
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[no subject]

2018-07-23 Thread Stefan Etzkorn
 subscribe linux-btrfs


python-btrfs & btrfs-heatmap ebuilds now available for Gentoo

2018-07-23 Thread Holger Hoffstätte



I wanted to migrate my collection of questionable shell scripts for btrfs
maintenance/inspection to a more stable foundation and therefore created
Gentoo ebuilds for Hans van Kranenburg's excellent python-btrfs [1] and
btrfs-heatmap [2] packages.

They can be found in my overlay at:

https://github.com/hhoffstaette/portage/tree/master/sys-fs/python-btrfs
https://github.com/hhoffstaette/portage/tree/master/sys-fs/btrfs-heatmap

Both are curently only available for python-3.6 since a change in 3.7
broke the existing python API [3]. As soon as that is fixed I'll add 3.7
to the PYTHON_COMPAT entries.

btrfs-heatmap properly uses the python-exec wrapper and therefore works
regardless of currently selected default python version.  :)

I hope this is useful to someone.

cheers,
Holger

[1] https://github.com/knorrie/python-btrfs
[2] https://github.com/knorrie/btrfs-heatmap
[3] https://github.com/knorrie/python-btrfs/issues/13
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] btrfs: add helper btrfs_num_devices() to deduce num_devices

2018-07-23 Thread Anand Jain




On 07/23/2018 09:57 PM, David Sterba wrote:

On Fri, Jul 20, 2018 at 07:18:54PM +0800, Anand Jain wrote:



On 07/19/2018 07:53 PM, David Sterba wrote:

On Mon, Jul 16, 2018 at 10:58:11PM +0800, Anand Jain wrote:

When the replace is running the fs_devices::num_devices also includes
the replace device, however in some operations like device delete and
balance it needs the actual num_devices without the repalce devices, so
now the function btrfs_num_devices() just provides that.


We can't run any two from device delete, device replace or balance at
the same time.



Signed-off-by: Anand Jain 
---
v2: add comments. Thanks Nikolay.

   fs/btrfs/volumes.c | 32 ++--
   1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0f4c512aa6b4..1c0b56374992 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1844,6 +1844,21 @@ void btrfs_assign_next_active_device(struct 
btrfs_fs_info *fs_info,
fs_info->fs_devices->latest_bdev = next_device->bdev;
   }
   
+/* Returns btrfs_fs_devices::num_devices excluding replace device if any */

+static inline u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
+{
+   u64 num_devices = fs_info->fs_devices->num_devices;
+
+   btrfs_dev_replace_read_lock(_info->dev_replace);
+   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
+   WARN_ON(num_devices < 1);
+   num_devices--;
+   }
+   btrfs_dev_replace_read_unlock(_info->dev_replace);





This does not make sense, besides that > btrfs_dev_replace_is_ongoing is
always going to be false here,


   No. There is a way how balance and replace could co-exists.
   (theoretically, I didn't experiment it yet)
   . Start balance and pause it
   . Now start the replace
   . power-fail
   . The open_ctree() first starts the balance so it must check
   for the replace device otherwise our num_devices calculation will
   be wrong. IMO its not a good idea to remove the replace check here.


I see, the paused states can lead to balance that sees device replace
ongoing as true. This would be good to add to the function comment as
it's not quite obvious why the helper is needed.


   For now a consolidation as in this patch is better.


Yeah, for this context it would be good.  The function name could be
more descriptive what devices it actually counts.


Regarding function names its tough to convince in a short form.
As of now I have the following choices, if there is anything better
I don't mind using it though.
  btrfs_num_devices_minus_replace()
  btrfs_get_num_devices_raw()
  btrfs_num_devices_raw()

Thanks, Anand


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Regression with crc32c selection?

2018-07-23 Thread Holger Hoffstätte

Hi,

While backporting a bunch of fixes to my own 4.16.x tree
(4.17 had a few too many bugs for my taste) I also ended up merging:

df91f56adce1f: libcrc32c: Add crc32c_impl function
9678c54388b6a: btrfs: Remove custom crc32c init code

..which AFAIK went into 4.17 and seemed harmless enough; after fixing up
a trivial context conflict it builds, runs, all good..except that btrfs
(apprently?) no longer uses the preferred crc32c-intel module, but the
crc32c-generic one instead.

In order to rule out any mistakes on my part I built 4.18.0-rc6 and it
seems to have the same problem:

Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   gen() 11267 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: sse2x1   xor()  8110 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   gen() 13409 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: sse2x2   xor()  9137 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   gen() 15884 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: sse2x4   xor() 10579 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6: using algorithm sse2x4 gen() 15884 MB/s
Jul 23 15:55:09 ragnarok kernel: raid6:  xor() 10579 MB/s, rmw enabled
Jul 23 15:55:09 ragnarok kernel: raid6: using ssse3x2 recovery algorithm
Jul 23 15:55:09 ragnarok kernel: xor: automatically using best checksumming 
function   avx
Jul 23 15:55:09 ragnarok kernel: Btrfs loaded, crc32c=crc32c-generic

I understand that the new crc32c_impl() function changed from
crypto_tfm_alg_driver_name() to crypto_shash_driver_name() - could this
be the reason? The module is loaded just fine, but apprently not used:

$lsmod | grep crc32
crc32_pclmul   16384  0
crc32c_intel   24576  0

In other words, is this supposed to happen or is my kernel config somehow
no longer right? It worked before and doesn't look too wrong:

$grep CRC /etc/kernels/kernel-config-x86_64-4.18.0-rc6
# CONFIG_PCIE_ECRC is not set
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_CRC32=m
CONFIG_CRYPTO_CRC32_PCLMUL=m
# CONFIG_CRYPTO_CRCT10DIF is not set
CONFIG_CRC_CCITT=m
CONFIG_CRC16=y
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
# CONFIG_CRC32_SELFTEST is not set
CONFIG_CRC32_SLICEBY8=y
# CONFIG_CRC32_SLICEBY4 is not set
# CONFIG_CRC32_SARWATE is not set
# CONFIG_CRC32_BIT is not set
# CONFIG_CRC4 is not set
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
# CONFIG_CRC8 is not set

Ultimately btrfs (and everything else) works, but the process of how
the kernel selects a crc32c implementation seems rather mysterious to me. :/

Any insights welcome. If it's a regression I can gladly test fixes.

cheers
Holger

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] btrfs: warn for num_devices below 0

2018-07-23 Thread Anand Jain




On 07/23/2018 10:01 PM, David Sterba wrote:

On Mon, Jul 16, 2018 at 10:58:10PM +0800, Anand Jain wrote:

In preparation to de-duplicate a section of code where we deduce the
num_devices, use warn instead of bug.

Signed-off-by: Anand Jain 
---
  fs/btrfs/volumes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7f4973fc2b52..0f4c512aa6b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3726,7 +3726,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
num_devices = fs_info->fs_devices->num_devices;
btrfs_dev_replace_read_lock(_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   BUG_ON(num_devices < 1);
+   WARN_ON(num_devices < 1);


I wonder if there any valid cases when there are 0 devices when balance
is started, ie. before num_devices gets decremented.


 num_devices counts the in-memory devices of a fsid.
 On a mounted FS num_devices > 0 always.


The WARN_ON is either redundant or should be turned to a proper sanity
check.


  Yes is redundant. I suggest to delete it.

Thanks, Anand


num_devices--;

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] btrfs: warn for num_devices below 0

2018-07-23 Thread David Sterba
On Mon, Jul 16, 2018 at 10:58:10PM +0800, Anand Jain wrote:
> In preparation to de-duplicate a section of code where we deduce the
> num_devices, use warn instead of bug.
> 
> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7f4973fc2b52..0f4c512aa6b4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3726,7 +3726,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
>   num_devices = fs_info->fs_devices->num_devices;
>   btrfs_dev_replace_read_lock(_info->dev_replace);
>   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> - BUG_ON(num_devices < 1);
> + WARN_ON(num_devices < 1);

I wonder if there any valid cases when there are 0 devices when balance
is started, ie. before num_devices gets decremented.

The WARN_ON is either redundant or should be turned to a proper sanity
check.

>   num_devices--;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] btrfs: add helper btrfs_num_devices() to deduce num_devices

2018-07-23 Thread David Sterba
On Fri, Jul 20, 2018 at 07:18:54PM +0800, Anand Jain wrote:
> 
> 
> On 07/19/2018 07:53 PM, David Sterba wrote:
> > On Mon, Jul 16, 2018 at 10:58:11PM +0800, Anand Jain wrote:
> >> When the replace is running the fs_devices::num_devices also includes
> >> the replace device, however in some operations like device delete and
> >> balance it needs the actual num_devices without the repalce devices, so
> >> now the function btrfs_num_devices() just provides that.
> > 
> > We can't run any two from device delete, device replace or balance at
> > the same time.
> > 
> >>
> >> Signed-off-by: Anand Jain 
> >> ---
> >> v2: add comments. Thanks Nikolay.
> >>
> >>   fs/btrfs/volumes.c | 32 ++--
> >>   1 file changed, 18 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index 0f4c512aa6b4..1c0b56374992 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -1844,6 +1844,21 @@ void btrfs_assign_next_active_device(struct 
> >> btrfs_fs_info *fs_info,
> >>fs_info->fs_devices->latest_bdev = next_device->bdev;
> >>   }
> >>   
> >> +/* Returns btrfs_fs_devices::num_devices excluding replace device if any 
> >> */
> >> +static inline u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
> >> +{
> >> +  u64 num_devices = fs_info->fs_devices->num_devices;
> >> +
> >> +  btrfs_dev_replace_read_lock(_info->dev_replace);
> >> +  if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
> >> +  WARN_ON(num_devices < 1);
> >> +  num_devices--;
> >> +  }
> >> +  btrfs_dev_replace_read_unlock(_info->dev_replace);
> 
> 
> 
> > This does not make sense, besides that > btrfs_dev_replace_is_ongoing is
> > always going to be false here,
> 
>   No. There is a way how balance and replace could co-exists.
>   (theoretically, I didn't experiment it yet)
>   . Start balance and pause it
>   . Now start the replace
>   . power-fail
>   . The open_ctree() first starts the balance so it must check
>   for the replace device otherwise our num_devices calculation will
>   be wrong. IMO its not a good idea to remove the replace check here.

I see, the paused states can lead to balance that sees device replace
ongoing as true. This would be good to add to the function comment as
it's not quite obvious why the helper is needed.

>   For now a consolidation as in this patch is better.

Yeah, for this context it would be good.  The function name could be
more descriptive what devices it actually counts.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] fs_info cleanups for volume.c

2018-07-23 Thread David Sterba
On Fri, Jul 20, 2018 at 07:37:46PM +0300, Nikolay Borisov wrote:
> Here are a bunch of patches which cleanup extraneous fs_info parameters to 
> function which already take a structure that holds a reference to the 
> fs_info. 
> 
> Except for patches 4 and 5, everything else is correct - due to those 
> functions
> always taking a transaction. 4 and 5 in turn reference the fs_info from 
> struct btrfs_device. Inspecting the callers I managed to convince myself that 
> those function are always called with well-formed btrfs_device i.e one which 
> has its fs_info member initialised. Reviewers might want to pay extra 
> attention to that but otherwise they are trivial. 

4 and 5 look good to me, a device without a valid fs_info has a short
timespan and should not appear anywhere besides the helpers that set up
fs_devices etc. Series added to misc-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix btrfs_write_inode() vs delayed iput deadlock

2018-07-23 Thread David Sterba
On Fri, Jul 20, 2018 at 11:46:10AM -0700, Omar Sandoval wrote:
> From: Josef Bacik 
> 
> We recently ran into the following deadlock involving
> btrfs_write_inode():
> 
> [  +0.005066]  __schedule+0x38e/0x8c0
> [  +0.007144]  schedule+0x36/0x80
> [  +0.006447]  bit_wait+0x11/0x60
> [  +0.006446]  __wait_on_bit+0xbe/0x110
> [  +0.007487]  ? bit_wait_io+0x60/0x60
> [  +0.007319]  __inode_wait_for_writeback+0x96/0xc0
> [  +0.009568]  ? autoremove_wake_function+0x40/0x40
> [  +0.009565]  inode_wait_for_writeback+0x21/0x30
> [  +0.009224]  evict+0xb0/0x190
> [  +0.006099]  iput+0x1a8/0x210
> [  +0.006103]  btrfs_run_delayed_iputs+0x73/0xc0
> [  +0.009047]  btrfs_commit_transaction+0x799/0x8c0
> [  +0.009567]  btrfs_write_inode+0x81/0xb0
> [  +0.008008]  __writeback_single_inode+0x267/0x320
> [  +0.009569]  writeback_sb_inodes+0x25b/0x4e0
> [  +0.008702]  wb_writeback+0x102/0x2d0
> [  +0.007487]  wb_workfn+0xa4/0x310
> [  +0.006794]  ? wb_workfn+0xa4/0x310
> [  +0.007143]  process_one_work+0x150/0x410
> [  +0.008179]  worker_thread+0x6d/0x520
> [  +0.007490]  kthread+0x12c/0x160
> [  +0.006620]  ? put_pwq_unlocked+0x80/0x80
> [  +0.008185]  ? kthread_park+0xa0/0xa0
> [  +0.007484]  ? do_syscall_64+0x53/0x150
> [  +0.007837]  ret_from_fork+0x29/0x40
> 
> Writeback calls btrfs_write_inode(), which calls
> btrfs_commit_transaction(), which calls btrfs_run_delayed_iputs(). If
> iput() is called on that same inode, evict() will wait for writeback
> forever.
> 
> btrfs_write_inode() was originally added way back in 4730a4bc5bf3
> ("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode()
> hasn't been used for O_SYNC since 148f948ba877 ("vfs: Introduce new
> helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so
> btrfs_write_inode() is actually unnecessary (and leads to a bunch of
> unnecessary commits). Get rid of it, which also gets rid of the
> deadlock.
> 
> Signed-off-by: Josef Bacik 
> [Omar: new commit message]
> Signed-off-by: Omar Sandoval 

Thank, added to current 4.19 queue and I'll add tags for stable.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] v3: btrfs-progs: Introduce 'btrfs inspect-internal dump-csum' option

2018-07-23 Thread Lakshmipathi.G
Print csum for a given file on stdout.

Sample usage:
btrfs inspect-internal dump-csum /btrfs/50gbfile /dev/sda4
Signed-off-by: Lakshmipathi.G 
---
 Makefile |   2 +-
 cmds-inspect-dump-csum.c | 239 +++
 cmds-inspect-dump-csum.h |  22 +
 cmds-inspect.c   |   3 +
 4 files changed, 265 insertions(+), 1 deletion(-)
 create mode 100644 cmds-inspect-dump-csum.c
 create mode 100644 cmds-inspect-dump-csum.h

diff --git a/Makefile b/Makefile
index 544410e6..8ad28012 100644
--- a/Makefile
+++ b/Makefile
@@ -122,7 +122,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o 
cmds-device.o cmds-scrub.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
   cmds-property.o cmds-fi-usage.o cmds-inspect-dump-tree.o \
-  cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
\
+  cmds-inspect-dump-super.o cmds-inspect-tree-stats.o cmds-fi-du.o 
cmds-inspect-dump-csum.o \
   mkfs/common.o check/mode-common.o check/mode-lowmem.o
 libbtrfs_objects = send-stream.o send-utils.o kernel-lib/rbtree.o btrfs-list.o 
\
   kernel-lib/crc32c.o messages.o \
diff --git a/cmds-inspect-dump-csum.c b/cmds-inspect-dump-csum.c
new file mode 100644
index ..06dba368
--- /dev/null
+++ b/cmds-inspect-dump-csum.c
@@ -0,0 +1,239 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include "kerncompat.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ctree.h"
+#include "disk-io.h"
+#include "print-tree.h"
+#include "transaction.h"
+#include "list.h"
+#include "utils.h"
+#include "commands.h"
+#include "crc32c.h"
+#include "cmds-inspect-dump-csum.h"
+#include "help.h"
+#include "volumes.h"
+
+
+const char * const cmd_inspect_dump_csum_usage[] = {
+   "btrfs inspect-internal dump-csum  ",
+   "Get csums for the given file.",
+   NULL
+};
+
+int btrfs_lookup_csums(struct btrfs_trans_handle *trans, struct btrfs_root 
*root,
+   struct btrfs_path *path, u64 bytenr, int cow, int total_csums)
+{
+   int ret;
+   int i;
+   int start_pos = 0;
+   struct btrfs_key file_key;
+   struct btrfs_key found_key;
+   struct btrfs_csum_item *item;
+   struct extent_buffer *leaf;
+   u64 csum_offset = 0;
+   u16 csum_size =
+   btrfs_super_csum_size(root->fs_info->super_copy);
+   int csums_in_item = 0;
+   unsigned int tree_csum = 0;
+   int pending_csums = total_csums;
+   static int cnt=1;
+
+   file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
+   file_key.offset = bytenr;
+   file_key.type = BTRFS_EXTENT_CSUM_KEY;
+   ret = btrfs_search_slot(trans, root, _key, path, 0, cow);
+   if (ret < 0)
+   goto fail;
+   while(1){
+   leaf = path->nodes[0];
+   if (ret > 0) {
+   ret = 1;
+   if (path->slots[0] == 0)
+   goto fail;
+   path->slots[0]--;
+   btrfs_item_key_to_cpu(leaf, _key, path->slots[0]);
+   if (found_key.type != BTRFS_EXTENT_CSUM_KEY){
+   fprintf(stderr, "\nInvalid key found.");
+   goto fail;
+   }
+
+   csum_offset = ((bytenr - found_key.offset) / 
root->fs_info->sectorsize) * csum_size;
+   csums_in_item = btrfs_item_size_nr(leaf, 
path->slots[0]);
+   csums_in_item /= csum_size;
+   csums_in_item -= ( bytenr - found_key.offset ) / 
root->fs_info->sectorsize;
+   start_pos=csum_offset;
+   }
+   if (path->slots[0] >= btrfs_header_nritems(leaf)) {
+   if (pending_csums > 0){
+   ret = btrfs_next_leaf(root, path);
+   if (ret == 0)
+ continue;
+   }
+   }
+   item = btrfs_item_ptr(leaf, path->slots[0], struct 
btrfs_csum_item);
+   btrfs_item_key_to_cpu(leaf, _key, path->slots[0]);
+   if 

Re: [PATCH v2] btrfs: extent-tree: Remove dead alignment check

2018-07-23 Thread David Sterba
On Mon, Jul 23, 2018 at 02:47:29PM +0800, Qu Wenruo wrote:
> In find_free_extent() under checks: tag, we have the following code
> 
> --
>   search_start = ALIGN(offset, fs_info->stripesize);
>   /* move on to the next group */
>   if (search_start + num_bytes >
>   block_group->key.objectid + block_group->key.offset) {
>   btrfs_add_free_space(block_group, offset, num_bytes);
>   goto loop;
>   }
>   if (offset < search_start)
>   btrfs_add_free_space(block_group, offset,
>search_start - offset);
>   BUG_ON(offset > search_start);
> --
> 
> However ALIGN() is rounding up, thus @search_start >= @offset and that
> BUG_ON() will never be triggered.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove unused key assignment when doing a full send

2018-07-23 Thread David Sterba
On Mon, Jul 23, 2018 at 09:10:09AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> At send.c:full_send_tree() we were setting the 'key' variable in the loop
> while never using it later. We were also using two btrfs_key variables
> to store the initial key for search and the key found in every iteration
> of the loop. So remove this useless key assignment and use the same
> btrfs_key variable to store the initial search key and the key found in
> each iteration. This was introduced in the initial send commit but was
> never used (commit 31db9f7c23fb ("Btrfs: introduce BTRFS_IOC_SEND for
> btrfs send/receive").
> 
> Signed-off-by: Filipe Manana 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: test send with snapshots that have files deleted while open

2018-07-23 Thread fdmanana
From: Filipe Manana 

Test that we are able to do send operations when one of the source
snapshots (or subvolume) has a file that is deleted while there is still
a open file descriptor for that file.

This test is motivated by a bug found in btrfs which is fixed by a patch
for the linux kernel titled:

  "Btrfs: fix send failure when root has deleted files still open"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/168 | 136 
 tests/btrfs/168.out |  13 +
 tests/btrfs/group   |   1 +
 3 files changed, 150 insertions(+)
 create mode 100755 tests/btrfs/168
 create mode 100644 tests/btrfs/168.out

diff --git a/tests/btrfs/168 b/tests/btrfs/168
new file mode 100755
index ..9a159d61
--- /dev/null
+++ b/tests/btrfs/168
@@ -0,0 +1,136 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test No. 168
+#
+# Test that we are able to do send operations when one of the source snapshots
+# (or subvolume) has a file that is deleted while there is still a open file
+# descriptor for that file.
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+   rm -fr $send_files_dir
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_test
+_require_scratch
+_require_btrfs_command "property"
+_require_fssum
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create a subvolume used for first full send test and used to create two
+# snapshots for the incremental send test.
+$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/sv1 | _filter_scratch
+
+# Create some test files.
+$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 64K" $SCRATCH_MNT/sv1/foo >>$seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0x7b 0 90K" $SCRATCH_MNT/sv1/bar >>$seqres.full
+$XFS_IO_PROG -f -c "pwrite -S 0xea 0 256K" $SCRATCH_MNT/sv1/baz >>$seqres.full
+
+# Flush the previous buffered writes, since setting a subvolume to RO mode
+# does not do it and we want to check if the data is correctly transmitted by
+# the send operations.
+sync
+
+# Keep an open file descriptor on file bar.
+exec 73<$SCRATCH_MNT/sv1/bar
+
+# While the file descriptor is open, delete the file, set the subvolume to
+# read-only mode and see if a full send operation succeeds.
+unlink $SCRATCH_MNT/sv1/bar
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sv1 ro true
+$FSSUM_PROG -A -f -w $send_files_dir/sv1.fssum $SCRATCH_MNT/sv1
+$BTRFS_UTIL_PROG send -f $send_files_dir/sv1.send $SCRATCH_MNT/sv1 2>&1 \
+   | _filter_scratch
+
+# Now that the we did the full send, close the file descriptor and set the
+# subvolume back to read-write mode.
+exec 73>&-
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/sv1 ro false
+
+# Now try an incremental send operation while there's an open file descriptor
+# for a file that was deleted from the send snapshot (while it was in 
read-write
+# mode).
+
+# Create a snapshot of the subvolume, to be used later as the parent snapshot
+# for an incremental send operation.
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/sv1 $SCRATCH_MNT/snap1 \
+   | _filter_scratch
+
+# First do a full send of this snapshot.
+$FSSUM_PROG -A -f -w $send_files_dir/snap1.fssum $SCRATCH_MNT/snap1
+$BTRFS_UTIL_PROG send -f $send_files_dir/snap1.send $SCRATCH_MNT/snap1 2>&1 \
+   | _filter_scratch
+
+# Modify file baz, to check that the incremental send operation does not miss
+# that this file has changed.
+$XFS_IO_PROG -c "pwrite -S 0x19 4K 8K" $SCRATCH_MNT/sv1/baz >>$seqres.full
+
+# Create a second snapshot of the subvolume, to be used later as the send
+# snapshot of an incremental send operation.
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT/sv1 $SCRATCH_MNT/snap2 \
+   | _filter_scratch
+
+# Temporarily turn the second snapshot to read-write mode and then open a file
+# descriptor on its foo file.
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/snap2 ro false
+exec 73<$SCRATCH_MNT/snap2/foo
+
+# Delete the foo file from the second snapshot while holding the file 
descriptor
+# open.
+unlink $SCRATCH_MNT/snap2/foo
+
+# Set the second snapshot back to RO mode, so that we can use it for the
+# incremental send operation.
+$BTRFS_UTIL_PROG property set $SCRATCH_MNT/snap2 ro true
+
+# Do the incremental send while there's an open file descriptor on file foo 
from
+# the second snapshot.
+$FSSUM_PROG -A -f -w $send_files_dir/snap2.fssum $SCRATCH_MNT/snap2
+$BTRFS_UTIL_PROG send -f $send_files_dir/snap2.send -p $SCRATCH_MNT/snap1 \
+   $SCRATCH_MNT/snap2 2>&1 | _filter_scratch
+
+# Now that the incremental send is done close the file 

[PATCH] Btrfs: remove unused key assignment when doing a full send

2018-07-23 Thread fdmanana
From: Filipe Manana 

At send.c:full_send_tree() we were setting the 'key' variable in the loop
while never using it later. We were also using two btrfs_key variables
to store the initial key for search and the key found in every iteration
of the loop. So remove this useless key assignment and use the same
btrfs_key variable to store the initial search key and the key found in
each iteration. This was introduced in the initial send commit but was
never used (commit 31db9f7c23fb ("Btrfs: introduce BTRFS_IOC_SEND for
btrfs send/receive").

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6ffe1c983b76..8acc0e712cfa 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6451,7 +6451,6 @@ static int full_send_tree(struct send_ctx *sctx)
int ret;
struct btrfs_root *send_root = sctx->send_root;
struct btrfs_key key;
-   struct btrfs_key found_key;
struct btrfs_path *path;
struct extent_buffer *eb;
int slot;
@@ -6473,17 +6472,13 @@ static int full_send_tree(struct send_ctx *sctx)
while (1) {
eb = path->nodes[0];
slot = path->slots[0];
-   btrfs_item_key_to_cpu(eb, _key, slot);
+   btrfs_item_key_to_cpu(eb, , slot);
 
-   ret = changed_cb(path, NULL, _key,
+   ret = changed_cb(path, NULL, ,
 BTRFS_COMPARE_TREE_NEW, sctx);
if (ret < 0)
goto out;
 
-   key.objectid = found_key.objectid;
-   key.type = found_key.type;
-   key.offset = found_key.offset + 1;
-
ret = btrfs_next_item(send_root, path);
if (ret < 0)
goto out;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix send failure when root has deleted files still open

2018-07-23 Thread fdmanana
From: Filipe Manana 

The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan cleanup. However, other
less common use cases for send can end up seeing inodes with a link count
of zero and in this case the send operation fails with a ENOENT error
because any attempt to generate a path for the inode, with the purpose
of creating it or updating it at the receiver, fails since there are no
inode reference items. One use case it to use a regular subvolume for
a send operation after turning it to RO mode or turning a RW snapshot
into RO mode and then using it for a send operation. In both cases, if a
file gets all its hard links deleted while there is an open file
descriptor before turning the subvolume/snapshot into RO mode, the send
operation will encounter an inode with a link count of zero and then
fail with errno ENOENT.

Example using a full send with a subvolume:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ btrfs subvolume create /mnt/sv1
  $ touch /mnt/sv1/foo
  $ touch /mnt/sv1/bar

  # keep an open file descriptor on file bar
  $ exec 73> /mnt/sv1/bar

  $ btrfs subvolume snapshot -r /mnt/sv1 /mnt/snap2

  # Turn the second snapshot to RW mode and delete file foo while
  # holding an open file descriptor on it.
  $ btrfs property set /mnt/snap2 ro false
  $ exec 73
Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 143 
 1 file changed, 133 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index c47f62b19226..6ffe1c983b76 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -100,6 +100,7 @@ struct send_ctx {
u64 cur_inode_rdev;
u64 cur_inode_last_extent;
u64 cur_inode_next_write_offset;
+   bool ignore_cur_inode;
 
u64 send_progress;
 
@@ -5799,6 +5800,9 @@ static int finish_inode_if_needed(struct send_ctx *sctx, 
int at_end)
int pending_move = 0;
int refs_processed = 0;
 
+   if (sctx->ignore_cur_inode)
+   return 0;
+
ret = process_recorded_refs_if_needed(sctx, at_end, _move,
  _processed);
if (ret < 0)
@@ -5917,6 +5921,94 @@ static int finish_inode_if_needed(struct send_ctx *sctx, 
int at_end)
return ret;
 }
 
+struct parent_paths_ctx {
+   struct list_head *refs;
+   struct send_ctx *sctx;
+};
+
+static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
+void *ctx)
+{
+   struct parent_paths_ctx *ppctx = ctx;
+
+   return record_ref(ppctx->sctx->parent_root, dir, name, ppctx->sctx,
+ ppctx->refs);
+}
+
+/*
+ * Issue unlink operations for all paths of the current inode found in the
+ * parent snapshot.
+ */
+static int btrfs_unlink_all_paths(struct send_ctx *sctx)
+{
+   LIST_HEAD(deleted_refs);
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct parent_paths_ctx ctx;
+   int ret;
+
+   path = alloc_path_for_send();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = sctx->cur_ino;
+   key.type = BTRFS_INODE_REF_KEY;
+   key.offset = 0;
+   ret = btrfs_search_slot(NULL, sctx->parent_root, , path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   ctx.refs = _refs;
+   ctx.sctx = sctx;
+
+   while (true) {
+   struct extent_buffer *eb = path->nodes[0];
+   int slot = path->slots[0];
+
+   if (slot >= btrfs_header_nritems(eb)) {
+   ret = btrfs_next_leaf(sctx->parent_root, path);
+   if (ret < 0)
+   goto out;
+   else if (ret > 0)
+   break;
+   continue;
+   }
+
+   btrfs_item_key_to_cpu(eb, , slot);
+   if (key.objectid != sctx->cur_ino)
+   break;
+   if (key.type != BTRFS_INODE_REF_KEY &&
+   key.type != BTRFS_INODE_EXTREF_KEY)
+   break;
+
+   ret = iterate_inode_ref(sctx->parent_root, path, , 1,
+   record_parent_ref, );
+   if (ret < 0)
+   goto out;
+
+   path->slots[0]++;
+   }
+
+   while (!list_empty(_refs)) {
+   struct recorded_ref *ref;
+
+   ref = list_first_entry(_refs, struct recorded_ref,
+  list);
+   ret = send_unlink(sctx, ref->full_path);
+   if (ret < 0)
+   goto out;
+   fs_path_free(ref->full_path);
+   list_del(>list);
+   kfree(ref);
+   

Re: [PATCH v2 2/2] btrfs: fix missing superblock update in the device delete commit transaction

2018-07-23 Thread Anand Jain




On 07/21/2018 02:01 PM, Lu Fengqi wrote:

On Tue, Jul 03, 2018 at 05:07:24PM +0800, Anand Jain wrote:

When a device is deleted, the btrfs_super_block::number_devices is
reduced by 1, but we do that after the commit transaction, so this
change did not made it to the disk and waits for the next commit
transaction whenever it happens.

This can be easily demonstrated using the following test case where I
use the btrfs device ready cli to read the disk and report.

  mkfs.btrfs -fq -dsingle -msingle $dev1 $dev2
  mount $dev1 /btrfs
  btrfs dev del $dev2 /btrfs
  btrfs dev ready $dev1; echo RESULT=$? <-- 1
   Without this patch RESULT returns 1, indicating not ready!

  Testing with a seed device:

  mkfs.btrfs -fq $dev1
  btrfstune -S1 $dev1
  mount $dev1 /btrfs
  btrfs dev add -f $dev2 /btrfs
  umount /btrfs
  mount $dev2 /btrfs
  btrfs dev del $dev1 /btrfs
  btrfs dev ready $dev2; echo RESULT=$? <-- 1

  Fix this by bringing in the num_devices update with in the
  current commit transaction.
  Also align the local variable declarations in the function
  btrfs_rm_dev_item()
  Delete a todo comment about transient inconsistent state


Hi, Anand

The btrfs/164 failed when I run xfstests with kdave/misc-next.


 And the

result of git bisect shows this patch may be the cause. Please see the
following log and dmesg.

xfstests log:
# sudo ./check btrfs/164
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 larch 4.18.0-rc5+
MKFS_OPTIONS  -- /dev/vdb2
MOUNT_OPTIONS -- /dev/vdb2 /mnt/scratch

btrfs/164 14s ... [failed, exit status 1]

dmesg:
[  212.009967] general protection fault:  [#1] SMP PTI
[  212.015834] CPU: 1 PID: 5665 Comm: btrfs Tainted: G   O  
4.18.0-rc5+ #2
[  212.021985] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
02/06/2015
[  212.031137] RIP: 0010:btrfs_update_commit_device_size+0x74/0xd0 [btrfs]



Thanks for the report.


--
void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
{
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *curr, *next;

if (list_empty(_devices->resized_devices))
return;

mutex_lock(_devices->device_list_mutex);
mutex_lock(_info->chunk_mutex);
list_for_each_entry_safe(curr, next, _devices->resized_devices,
 resized_list) {
list_del_init(>resized_list); <  GPF
curr->commit_total_bytes = curr->disk_total_bytes;
}
mutex_unlock(_info->chunk_mutex);
mutex_unlock(_devices->device_list_mutex);
}
--


I can't reproduce the issue. Do you reproduce consistently? and I
wonder if your workspace contains the latest changes from
kdave/misc-next? Because last weeks kdave/misc had some issues.
Can you please give a try?

Also I found another issue - if the device being deleted is a seed
device as in btrfs/164, we don't have to call
btrfs_update_commit_device_size() as because we don't write its super.
But this isn't related to this commit nor the GPF that you saw.

Thanks, Anand




[  212.036659] Code: ef e8 60 cc 72 e1 49 8b 96 08 01 00 00 48 8b 0a 48 8d 82 d8 fe 
ff ff 48 8d b1 d8 fe ff ff 49 39 d4 74 47 48 8b b8 30 01 00 00 <48> 89 79 08 48 
89 0f 48 89 90 28 01 00 00 48 89 90 30 01 00 00 48
[  212.052862] RSP: 0018:c9c3bbc0 EFLAGS: 00010202
[  212.057537] RAX: 88004d45ea88 RBX: 88005f246820 RCX: 6b6b6b6b6b6b6b6b
[  212.066491] RDX: 88004d45ebb0 RSI: 6b6b6b6b6b6b6a43 RDI: 6b6b6b6b6b6b6b6b
[  212.072735] RBP: c9c3bbe0 R08:  R09: 0001
[  212.078961] R10: c9c3bbb0 R11: 82ea2800 R12: 88005f2468d0
[  212.084967] R13: 880066e4c910 R14: 88005f2467c8 R15: 880066e4d138
[  212.093457] FS:  7fca3f6e08c0() GS:88007f80() 
knlGS:
[  212.099305] CS:  0010 DS:  ES:  CR0: 80050033
[  212.103643] CR2: 7fdd21385f88 CR3: 75c4e000 CR4: 06e0
[  212.108514] Call Trace:
[  212.110829]  btrfs_commit_transaction+0x525/0x980 [btrfs]
[  212.114745]  btrfs_rm_device+0x527/0x560 [btrfs]
[  212.118082]  btrfs_ioctl+0x2e99/0x31e0 [btrfs]
[  212.123417]  ? __lock_acquire+0x396/0x1910
[  212.126360]  ? __might_fault+0x3e/0x90
[  212.128924]  ? __might_fault+0x85/0x90
[  212.131398]  ? _copy_to_user+0x65/0x80
[  212.133870]  ? cp_new_stat+0x120/0x150
[  212.136354]  ? native_sched_clock+0x3e/0xa0
[  212.138825]  do_vfs_ioctl+0xa9/0x6c0
[  212.141175]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[  212.144476]  ? do_vfs_ioctl+0xa9/0x6c0
[  212.146889]  ? trace_hardirqs_on+0xd/0x10
[  212.149210]  ksys_ioctl+0x67/0x90
[  212.151499]  __x64_sys_ioctl+0x1a/0x20
[  212.154064]  do_syscall_64+0x6d/0x690
[  212.156146]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  212.158720]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  212.161363] RIP: 0033:0x7fca3e4c1667
[  212.163437] Code: 00 00 90 48 8b 05 e9 67 

[PATCH v2] btrfs: extent-tree: Remove dead alignment check

2018-07-23 Thread Qu Wenruo
In find_free_extent() under checks: tag, we have the following code

--
search_start = ALIGN(offset, fs_info->stripesize);
/* move on to the next group */
if (search_start + num_bytes >
block_group->key.objectid + block_group->key.offset) {
btrfs_add_free_space(block_group, offset, num_bytes);
goto loop;
}
if (offset < search_start)
btrfs_add_free_space(block_group, offset,
 search_start - offset);
BUG_ON(offset > search_start);
--

However ALIGN() is rounding up, thus @search_start >= @offset and that
BUG_ON() will never be triggered.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Fix my stupid error on judging whether ALIGN() is rounding up or down.
---
 fs/btrfs/extent-tree.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f18e86177067..de4a85dfb98e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7754,7 +7754,7 @@ static noinline int find_free_extent(struct btrfs_fs_info 
*fs_info,
goto loop;
}
 checks:
-   search_start = ALIGN(offset, fs_info->stripesize);
+   search_start = round_up(offset, fs_info->stripesize);
 
/* move on to the next group */
if (search_start + num_bytes >
@@ -7766,7 +7766,6 @@ static noinline int find_free_extent(struct btrfs_fs_info 
*fs_info,
if (offset < search_start)
btrfs_add_free_space(block_group, offset,
 search_start - offset);
-   BUG_ON(offset > search_start);
 
ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
num_bytes, delalloc);
-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: extent-tree: Remove dead alignment check

2018-07-23 Thread Qu Wenruo


On 2018年07月23日 14:22, Qu Wenruo wrote:
> In find_free_extent() under checks: tag, we have the following code
> 
> --
>   search_start = ALIGN(offset, fs_info->stripesize);
>   /* move on to the next group */
>   if (search_start + num_bytes >
>   block_group->key.objectid + block_group->key.offset) {
>   btrfs_add_free_space(block_group, offset, num_bytes);
>   goto loop;
>   }
>   if (offset < search_start)
>   btrfs_add_free_space(block_group, offset,
>search_start - offset);
>   BUG_ON(offset > search_start);
> --
> 
> However ALIGN() is rounding down, thus @search_start <= @offset.

Well, ALIGN() is rounding up, so this is completely wrong.

But still we have some conflicting BUG_ON() against ALIGN().

I'll update the patch and hopes no one found my stupid error.

Thanks,
Qu

> 
> That later (offset < search_start) check will never be true, but that
> BUG_ON(offset > search_start) can be triggered as long as our @offset is
> not aligned to @stripesize.
> 
> However we never has such report on this BUG_ON(), which means the
> search result @offset is always aligned to @fs_info->stripesize (which
> is also set to sector size at mkfs time), or it's handled by block group
> range check.
> 
> Anyway, remove such dead unalignment check.
> 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/extent-tree.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f18e86177067..48a2aec7dc59 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -7754,7 +7754,7 @@ static noinline int find_free_extent(struct 
> btrfs_fs_info *fs_info,
>   goto loop;
>   }
>  checks:
> - search_start = ALIGN(offset, fs_info->stripesize);
> + search_start = round_down(offset, fs_info->stripesize);
>  
>   /* move on to the next group */
>   if (search_start + num_bytes >
> @@ -7763,11 +7763,6 @@ static noinline int find_free_extent(struct 
> btrfs_fs_info *fs_info,
>   goto loop;
>   }
>  
> - if (offset < search_start)
> - btrfs_add_free_space(block_group, offset,
> -  search_start - offset);
> - BUG_ON(offset > search_start);
> -
>   ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
>   num_bytes, delalloc);
>   if (ret == -EAGAIN) {
> 



signature.asc
Description: OpenPGP digital signature


[PATCH] btrfs: extent-tree: Remove dead alignment check

2018-07-23 Thread Qu Wenruo
In find_free_extent() under checks: tag, we have the following code

--
search_start = ALIGN(offset, fs_info->stripesize);
/* move on to the next group */
if (search_start + num_bytes >
block_group->key.objectid + block_group->key.offset) {
btrfs_add_free_space(block_group, offset, num_bytes);
goto loop;
}
if (offset < search_start)
btrfs_add_free_space(block_group, offset,
 search_start - offset);
BUG_ON(offset > search_start);
--

However ALIGN() is rounding down, thus @search_start <= @offset.

That later (offset < search_start) check will never be true, but that
BUG_ON(offset > search_start) can be triggered as long as our @offset is
not aligned to @stripesize.

However we never has such report on this BUG_ON(), which means the
search result @offset is always aligned to @fs_info->stripesize (which
is also set to sector size at mkfs time), or it's handled by block group
range check.

Anyway, remove such dead unalignment check.

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f18e86177067..48a2aec7dc59 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7754,7 +7754,7 @@ static noinline int find_free_extent(struct btrfs_fs_info 
*fs_info,
goto loop;
}
 checks:
-   search_start = ALIGN(offset, fs_info->stripesize);
+   search_start = round_down(offset, fs_info->stripesize);
 
/* move on to the next group */
if (search_start + num_bytes >
@@ -7763,11 +7763,6 @@ static noinline int find_free_extent(struct 
btrfs_fs_info *fs_info,
goto loop;
}
 
-   if (offset < search_start)
-   btrfs_add_free_space(block_group, offset,
-search_start - offset);
-   BUG_ON(offset > search_start);
-
ret = btrfs_add_reserved_bytes(block_group, ram_bytes,
num_bytes, delalloc);
if (ret == -EAGAIN) {
-- 
2.18.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html