Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-02-02 Thread Qu Wenruo



At 02/03/2017 01:08 PM, Christoph Anton Mitterer wrote:

Hey Qu.

I'm sending this off-list for privacy reasons of the contained file-
names / hash sums.
It doesn't contained anything particularly secret, nevertheless, please
 try to avoid spreading it too far around and delete it once you don't
need it anymore :)

On Thu, 2017-02-02 at 10:12 +0800, Qu Wenruo wrote:

Would you please to check the latest lowmem_tests branch?
https://github.com/adam900710/btrfs-progs/tree/lowmem_tests


Invocation was as usual:
# btrfs check --mode=lowmem /dev/nbd0 ; echo $?

stdout/err output in the attachment


Great thanks for that!

Although I made a mistake in the debug patch, the output info is still 
good enough for me to catch the bug in inline extent check.


I also added missing error message output for other places I found, and 
updated the branch, the name remains as "lowmem_tests"


Please try it.

Thanks,
Qu





Sorry for the extra trouble.

No worries :)


Cheers,
Chris.




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: create helper for processing bits on contiguous pages

2017-02-02 Thread Liu Bo
This introduces a new helper which can be used to process pages bits.

Signed-off-by: Liu Bo 
---
 fs/btrfs/extent_io.c | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5f3edb..2ef2d05 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1726,28 +1726,22 @@ STATIC u64 find_lock_delalloc_range(struct inode *inode,
return found;
 }
 
-void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
-u64 delalloc_end, struct page *locked_page,
-unsigned clear_bits,
-unsigned long page_ops)
+static void __process_pages_contig(struct address_space *mapping,
+  struct page *locked_page,
+  pgoff_t start_index, pgoff_t end_index,
+  unsigned long page_ops)
 {
-   struct extent_io_tree *tree = _I(inode)->io_tree;
-   int ret;
+   unsigned long nr_pages = end_index - start_index + 1;
+   pgoff_t index = start_index;
struct page *pages[16];
-   unsigned long index = start >> PAGE_SHIFT;
-   unsigned long end_index = end >> PAGE_SHIFT;
-   unsigned long nr_pages = end_index - index + 1;
+   unsigned ret;
int i;
 
-   clear_extent_bit(tree, start, end, clear_bits, 1, 0, NULL, GFP_NOFS);
-   if (page_ops == 0)
-   return;
-
if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0)
-   mapping_set_error(inode->i_mapping, -EIO);
+   mapping_set_error(mapping, -EIO);
 
while (nr_pages > 0) {
-   ret = find_get_pages_contig(inode->i_mapping, index,
+   ret = find_get_pages_contig(mapping, index,
 min_t(unsigned long,
 nr_pages, ARRAY_SIZE(pages)), pages);
for (i = 0; i < ret; i++) {
@@ -1777,6 +1771,19 @@ void extent_clear_unlock_delalloc(struct inode *inode, 
u64 start, u64 end,
}
 }
 
+void extent_clear_unlock_delalloc(struct inode *inode, u64 start, u64 end,
+u64 delalloc_end, struct page *locked_page,
+unsigned clear_bits,
+unsigned long page_ops)
+{
+   clear_extent_bit(_I(inode)->io_tree, start, end, clear_bits, 1, 0,
+NULL, GFP_NOFS);
+
+   __process_pages_contig(inode->i_mapping, locked_page,
+  start >> PAGE_SHIFT, end >> PAGE_SHIFT,
+  page_ops);
+}
+
 /*
  * count the number of bytes in the tree that have a given bit(s)
  * set.  This can be fairly slow, except for EXTENT_DIRTY which is
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: use helper to simplify lock/unlock pages

2017-02-02 Thread Liu Bo
Since we have a helper to set page bits, let lock_delalloc_pages and
__unlock_for_delalloc use it.

Signed-off-by: Liu Bo 
---
 fs/btrfs/extent_io.c | 122 ++-
 fs/btrfs/extent_io.h |   3 +-
 2 files changed, 54 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2ef2d05..7e9639f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1549,33 +1549,24 @@ static noinline u64 find_delalloc_range(struct 
extent_io_tree *tree,
return found;
 }
 
+static int __process_pages_contig(struct address_space *mapping,
+ struct page *locked_page,
+ pgoff_t start_index, pgoff_t end_index,
+ unsigned long page_ops, pgoff_t *index_ret);
+
 static noinline void __unlock_for_delalloc(struct inode *inode,
   struct page *locked_page,
   u64 start, u64 end)
 {
-   int ret;
-   struct page *pages[16];
unsigned long index = start >> PAGE_SHIFT;
unsigned long end_index = end >> PAGE_SHIFT;
-   unsigned long nr_pages = end_index - index + 1;
-   int i;
 
+   ASSERT(locked_page);
if (index == locked_page->index && end_index == index)
return;
 
-   while (nr_pages > 0) {
-   ret = find_get_pages_contig(inode->i_mapping, index,
-min_t(unsigned long, nr_pages,
-ARRAY_SIZE(pages)), pages);
-   for (i = 0; i < ret; i++) {
-   if (pages[i] != locked_page)
-   unlock_page(pages[i]);
-   put_page(pages[i]);
-   }
-   nr_pages -= ret;
-   index += ret;
-   cond_resched();
-   }
+   __process_pages_contig(inode->i_mapping, locked_page, index, end_index,
+  PAGE_UNLOCK, NULL);
 }
 
 static noinline int lock_delalloc_pages(struct inode *inode,
@@ -1584,59 +1575,19 @@ static noinline int lock_delalloc_pages(struct inode 
*inode,
u64 delalloc_end)
 {
unsigned long index = delalloc_start >> PAGE_SHIFT;
-   unsigned long start_index = index;
+   unsigned long index_ret = index;
unsigned long end_index = delalloc_end >> PAGE_SHIFT;
-   unsigned long pages_locked = 0;
-   struct page *pages[16];
-   unsigned long nrpages;
int ret;
-   int i;
 
-   /* the caller is responsible for locking the start index */
+   ASSERT(locked_page);
if (index == locked_page->index && index == end_index)
return 0;
 
-   /* skip the page at the start index */
-   nrpages = end_index - index + 1;
-   while (nrpages > 0) {
-   ret = find_get_pages_contig(inode->i_mapping, index,
-min_t(unsigned long,
-nrpages, ARRAY_SIZE(pages)), pages);
-   if (ret == 0) {
-   ret = -EAGAIN;
-   goto done;
-   }
-   /* now we have an array of pages, lock them all */
-   for (i = 0; i < ret; i++) {
-   /*
-* the caller is taking responsibility for
-* locked_page
-*/
-   if (pages[i] != locked_page) {
-   lock_page(pages[i]);
-   if (!PageDirty(pages[i]) ||
-   pages[i]->mapping != inode->i_mapping) {
-   ret = -EAGAIN;
-   unlock_page(pages[i]);
-   put_page(pages[i]);
-   goto done;
-   }
-   }
-   put_page(pages[i]);
-   pages_locked++;
-   }
-   nrpages -= ret;
-   index += ret;
-   cond_resched();
-   }
-   ret = 0;
-done:
-   if (ret && pages_locked) {
-   __unlock_for_delalloc(inode, locked_page,
- delalloc_start,
- ((u64)(start_index + pages_locked - 1)) <<
- PAGE_SHIFT);
-   }
+   ret = __process_pages_contig(inode->i_mapping, locked_page, index,
+end_index, PAGE_LOCK, _ret);
+   if (ret == -EAGAIN)
+   __unlock_for_delalloc(inode, locked_page, delalloc_start,
+ (u64)index_ret << PAGE_SHIFT);
return ret;
 }
 
@@ -1726,17 +1677,24 @@ STATIC u64 find_lock_delalloc_range(struct inode *inode,

Re: [REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

2017-02-02 Thread Joseph Salisbury
On 02/02/2017 03:57 PM, Greg KH wrote:
> On Thu, Feb 02, 2017 at 02:58:16PM -0500, Joseph Salisbury wrote:
>> On 02/02/2017 01:23 PM, Greg KH wrote:
>>> On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote:
 Hello,

 Please consider reverting commit
 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release.
>>> What release can I remove it from?
>>>
>>> It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's
>>> reverted in Linus's tree already?
>>>
>>> totally confused.
>>>
>>> greg k-h
>> Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in
>> mainline in v4.7-rc1.  The request should be only for kernels newer than
>> 4.7-rc1, so 4.9 and 4.10.
>>
>> I assume it will come down to 4.9 once reverted in mainline. 
> Let me know when it is reverted, and what that revert commit is please.
>
> thanks,
>
> greg k-h

Will do.


Thanks again,


Joe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

2017-02-02 Thread Greg KH
On Thu, Feb 02, 2017 at 02:58:16PM -0500, Joseph Salisbury wrote:
> On 02/02/2017 01:23 PM, Greg KH wrote:
> > On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote:
> >> Hello,
> >>
> >> Please consider reverting commit
> >> 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release.
> > What release can I remove it from?
> >
> > It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's
> > reverted in Linus's tree already?
> >
> > totally confused.
> >
> > greg k-h
> 
> Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in
> mainline in v4.7-rc1.  The request should be only for kernels newer than
> 4.7-rc1, so 4.9 and 4.10.
> 
> I assume it will come down to 4.9 once reverted in mainline. 

Let me know when it is reverted, and what that revert commit is please.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[REVERT][v4.10][V2] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

2017-02-02 Thread Joseph Salisbury
On 02/02/2017 01:23 PM, Greg KH wrote:
> On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote:
>> Hello,
>>
>> Please consider reverting commit
>> 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release.
> What release can I remove it from?
>
> It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's
> reverted in Linus's tree already?
>
> totally confused.
>
> greg k-h

Sorry, commit 4c63c2454eff996c5e27991221106eb511f7db38 was introduced in
mainline in v4.7-rc1.  The request should be only for kernels newer than
4.7-rc1, so 4.9 and 4.10.

I assume it will come down to 4.9 once reverted in mainline. 

I updated the subject.

Thanks,

Joe




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig

2017-02-02 Thread David Sterba
On Thu, Feb 02, 2017 at 11:11:15AM -0800, Liu Bo wrote:
> On Thu, Feb 02, 2017 at 07:42:43PM +0100, David Sterba wrote:
> > On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote:
> > > This creates a helper to manipulate page bits to avoid duplicate uses.
> > 
> > This seems too short for what the patch does. It adds a new page ops bit
> > that would deserve a separate patch. Please try to split it to smaller
> > parts.
> 
> Hmm..the newly added 'PAGE_LOCK' is only used in lock_delalloc_pages in
> order to make the helper lock pages.
> 
> I could firstly introduce the helper from extent_clear_unlock_delalloc()
> , then introduce the new 'PAGE_LOCK' and make lock_delalloc_pages and
> __unlock_for_delalloc to use the helper, does it sound better to review
> and bisect?

This sounds better, thanks. The reason is mostly from the review side.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems

2017-02-02 Thread Liu Bo
Hi,

On Thu, Feb 02, 2017 at 06:34:02PM +0100, Jan Kara wrote:
> Provide helper functions for setting up dynamically allocated
> backing_dev_info structures for filesystems and cleaning them up on
> superblock destruction.

Just one concern, will this cause problems for multiple superblock cases
like nfs with nosharecache?

Thanks,

-liubo

> 
> CC: linux-...@lists.infradead.org
> CC: linux-...@vger.kernel.org
> CC: Petr Vandrovec 
> CC: linux-ni...@vger.kernel.org
> CC: cluster-de...@redhat.com
> CC: osd-...@open-osd.org
> CC: codal...@coda.cs.cmu.edu
> CC: linux-...@lists.infradead.org
> CC: ecryp...@vger.kernel.org
> CC: linux-c...@vger.kernel.org
> CC: ceph-de...@vger.kernel.org
> CC: linux-btrfs@vger.kernel.org
> CC: v9fs-develo...@lists.sourceforge.net
> CC: lustre-de...@lists.lustre.org
> Signed-off-by: Jan Kara 
> ---
>  fs/super.c   | 49 
> 
>  include/linux/backing-dev-defs.h |  2 +-
>  include/linux/fs.h   |  6 +
>  3 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index ea662b0e5e78..31dc4c6450ef 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -446,6 +446,11 @@ void generic_shutdown_super(struct super_block *sb)
>   hlist_del_init(>s_instances);
>   spin_unlock(_lock);
>   up_write(>s_umount);
> + if (sb->s_iflags & SB_I_DYNBDI) {
> + bdi_put(sb->s_bdi);
> + sb->s_bdi = _backing_dev_info;
> + sb->s_iflags &= ~SB_I_DYNBDI;
> + }
>  }
>  
>  EXPORT_SYMBOL(generic_shutdown_super);
> @@ -1249,6 +1254,50 @@ mount_fs(struct file_system_type *type, int flags, 
> const char *name, void *data)
>  }
>  
>  /*
> + * Setup private BDI for given superblock. I gets automatically cleaned up
> + * in generic_shutdown_super().
> + */
> +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...)
> +{
> + struct backing_dev_info *bdi;
> + int err;
> + va_list args;
> +
> + bdi = bdi_alloc(GFP_KERNEL);
> + if (!bdi)
> + return -ENOMEM;
> +
> + bdi->name = sb->s_type->name;
> +
> + va_start(args, fmt);
> + err = bdi_register_va(bdi, NULL, fmt, args);
> + va_end(args);
> + if (err) {
> + bdi_put(bdi);
> + return err;
> + }
> + WARN_ON(sb->s_bdi != _backing_dev_info);
> + sb->s_bdi = bdi;
> + sb->s_iflags |= SB_I_DYNBDI;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(super_setup_bdi_name);
> +
> +/*
> + * Setup private BDI for given superblock. I gets automatically cleaned up
> + * in generic_shutdown_super().
> + */
> +int super_setup_bdi(struct super_block *sb)
> +{
> + static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
> +
> + return super_setup_bdi_name(sb, "%.28s-%ld", sb->s_type->name,
> + atomic_long_inc_return(_seq));
> +}
> +EXPORT_SYMBOL(super_setup_bdi);
> +
> +/*
>   * This is an internal function, please use sb_end_{write,pagefault,intwrite}
>   * instead.
>   */
> diff --git a/include/linux/backing-dev-defs.h 
> b/include/linux/backing-dev-defs.h
> index 2ecafc8a2d06..70080b4217f4 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -143,7 +143,7 @@ struct backing_dev_info {
>   congested_fn *congested_fn; /* Function pointer if device is md/dm */
>   void *congested_data;   /* Pointer to aux data for congested func */
>  
> - char *name;
> + const char *name;
>  
>   struct kref refcnt; /* Reference counter for the structure */
>   unsigned int registered:1;  /* Is bdi registered? */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c930cbc19342..8ed8b6d1bc54 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1267,6 +1267,9 @@ struct mm_struct;
>  /* sb->s_iflags to limit user namespace mounts */
>  #define SB_I_USERNS_VISIBLE  0x0010 /* fstype already mounted */
>  
> +/* Temporary flag until all filesystems are converted to dynamic bdis */
> +#define SB_I_DYNBDI  0x0100
> +
>  /* Possible states of 'frozen' field */
>  enum {
>   SB_UNFROZEN = 0,/* FS is unfrozen */
> @@ -2103,6 +2106,9 @@ extern int vfs_ustat(dev_t, struct kstatfs *);
>  extern int freeze_super(struct super_block *super);
>  extern int thaw_super(struct super_block *super);
>  extern bool our_mnt(struct vfsmount *mnt);
> +extern __printf(2, 3)
> +int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
> +extern int super_setup_bdi(struct super_block *sb);
>  
>  extern int current_umask(void);
>  
> -- 
> 2.10.2
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig

2017-02-02 Thread Liu Bo
On Thu, Feb 02, 2017 at 07:42:43PM +0100, David Sterba wrote:
> On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote:
> > This creates a helper to manipulate page bits to avoid duplicate uses.
> 
> This seems too short for what the patch does. It adds a new page ops bit
> that would deserve a separate patch. Please try to split it to smaller
> parts.

Hmm..the newly added 'PAGE_LOCK' is only used in lock_delalloc_pages in
order to make the helper lock pages.

I could firstly introduce the helper from extent_clear_unlock_delalloc()
, then introduce the new 'PAGE_LOCK' and make lock_delalloc_pages and
__unlock_for_delalloc to use the helper, does it sound better to review
and bisect?

> 
> > +static noinline void __unlock_for_delalloc(struct inode *inode,
> > +  struct page *locked_page,
> > +  u64 start, u64 end)
> > +{
> > +   unsigned long page_ops = PAGE_UNLOCK;
> 
> Used only once, not necessary IMO.

OK.

> 
> > +
> > +   ASSERT(locked_page);
> > +   __process_pages_contig(inode->i_mapping, locked_page,
> > +  start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> > +  page_ops, NULL);
> > +}
> > +
> > +static noinline int lock_delalloc_pages(struct inode *inode,
> > +   struct page *locked_page,
> > +   u64 delalloc_start,
> > +   u64 delalloc_end)
> > +{
> > +   pgoff_t index = delalloc_start >> PAGE_SHIFT;
> > +   pgoff_t start_index = index;
> > +   pgoff_t end_index = delalloc_end >> PAGE_SHIFT;
> > +   unsigned long page_ops = PAGE_LOCK;
> 
> Same here.
> 

OK.

Thanks,

-liubo
> > +   int ret = 0;
> > +
> > +   ASSERT(locked_page);
> > +
> > +   ret = __process_pages_contig(inode->i_mapping, locked_page, start_index,
> > +  end_index, page_ops, );
> > +   if (ret == -EAGAIN) {
> > +   __unlock_for_delalloc(inode, locked_page, delalloc_start,
> > + ((u64)index) << PAGE_SHIFT);
> > }
> > +
> > return ret;
> >  }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Liu Bo
On Thu, Feb 02, 2017 at 05:56:33PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> At close_ctree() we free the block groups and then only after we wait for
> any running worker kthreads to finish and shutdown the workqueues. This
> behaviour is racy and it triggers an assertion failure when freeing block
> groups because while we are doing it we can have for example a block group
> caching kthread running, and in that case the block group's reference
> count can still be greater than 1 by the time we assert its reference count
> is 1, leading to an assertion failure:
> 
> [19041.198004] assertion failed: atomic_read(_group->count) == 1, file: 
> fs/btrfs/extent-tree.c, line: 9799
> [19041.200584] [ cut here ]
> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis 
> parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop 
> autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi 
> ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last 
> unloaded: btrfs]
> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
> 4.9.0-rc7-btrfs-next-36+ #1
> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
> [19041.208082] RIP: 0010:[]  [] 
> assfail.constprop.41+0x1c/0x1e [btrfs]
> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
> 0001
> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
> 
> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
> 
> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
> 88023431d170
> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
> 88015ecb4100
> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
> knlGS:
> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
> 06e0
> [19041.208082] Stack:
> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
> 88015ecb5630
> [19041.208082]  88014f6be000  7ffcf0ba6a10 
> c9000ad37df8
> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
> 811a634d
> [19041.208082] Call Trace:
> [19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 
> [btrfs]
> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
> [19041.208082]  [] ? evict_inodes+0x132/0x141
> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
> [19041.208082]  [] kill_anon_super+0x12/0x1c
> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
> [19041.208082]  [] deactivate_super+0x36/0x39
> [19041.208082]  [] cleanup_mnt+0x58/0x76
> [19041.208082]  [] __cleanup_mnt+0x12/0x14
> [19041.208082]  [] task_work_run+0x6f/0x95
> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 
> c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 
> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
> [btrfs]
> [19041.208082]  RSP 
> [19041.279264] ---[ end trace 23330586f16f064d ]---
> 
> This started happening as of kernel 4.8, since commit f3bca8028bd9
> ("Btrfs: add ASSERT for block group's memory leak") introduced these
> assertions.
> 
> So fix this by freeing the block groups only after waiting for all
> worker kthreads to complete and shutdown the workqueues.

Reviewed-by: Liu Bo 

Thanks,

-liubo
> 
> Signed-off-by: Filipe Manana 
> ---
> 
> v2: Make error path of open_ctree() also stop all workers before freeing
> the block groups. Assert that when freeing block groups, no block
> group can be in the caching started state.
> 
>  fs/btrfs/disk-io.c | 6 +++---
>  fs/btrfs/extent-tree.c | 9 ++---
>  2 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 066d9b9..bf54d7d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3263,7 +3263,6 @@ int open_ctree(struct super_block *sb,
>  
>  fail_block_groups:
>   btrfs_put_block_group_cache(fs_info);
> - btrfs_free_block_groups(fs_info);
>  
>  fail_tree_roots:
>   

Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Liu Bo
On Thu, Feb 02, 2017 at 06:58:03PM +, Filipe Manana wrote:
> On Thu, Feb 2, 2017 at 6:53 PM, Liu Bo  wrote:
> > On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote:
> >> On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo  wrote:
> >> > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote:
> >> >> From: Filipe Manana 
> >> >>
> >> >> At close_ctree() we free the block groups and then only after we wait 
> >> >> for
> >> >> any running worker kthreads to finish and shutdown the workqueues. This
> >> >> behaviour is racy and it triggers an assertion failure when freeing 
> >> >> block
> >> >> groups because while we are doing it we can have for example a block 
> >> >> group
> >> >> caching kthread running, and in that case the block group's reference
> >> >> count is greater than 1, leading to an assertion failure:
> >> >>
> >> >> [19041.198004] assertion failed: atomic_read(_group->count) == 1, 
> >> >> file: fs/btrfs/extent-tree.c, line: 9799
> >> >> [19041.200584] [ cut here ]
> >> >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
> >> >> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
> >> >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
> >> >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev 
> >> >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor 
> >> >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod 
> >> >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio 
> >> >> e1000 scsi_mod floppy [last unloaded: btrfs]
> >> >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
> >> >> 4.9.0-rc7-btrfs-next-36+ #1
> >> >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> >> >> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> >> >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
> >> >> [19041.208082] RIP: 0010:[]  [] 
> >> >> assfail.constprop.41+0x1c/0x1e [btrfs]
> >> >> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
> >> >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
> >> >> 0001
> >> >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
> >> >> 
> >> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
> >> >> 
> >> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
> >> >> 88023431d170
> >> >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
> >> >> 88015ecb4100
> >> >> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
> >> >> knlGS:
> >> >> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
> >> >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
> >> >> 06e0
> >> >> [19041.208082] Stack:
> >> >> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
> >> >> 88015ecb5630
> >> >> [19041.208082]  88014f6be000  7ffcf0ba6a10 
> >> >> c9000ad37df8
> >> >> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
> >> >> 811a634d
> >> >> [19041.208082] Call Trace:
> >> >> [19041.208082]  [] 
> >> >> btrfs_free_block_groups+0x17f/0x392 [btrfs]
> >> >> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
> >> >> [19041.208082]  [] ? evict_inodes+0x132/0x141
> >> >> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
> >> >> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
> >> >> [19041.208082]  [] kill_anon_super+0x12/0x1c
> >> >> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
> >> >> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
> >> >> [19041.208082]  [] deactivate_super+0x36/0x39
> >> >> [19041.208082]  [] cleanup_mnt+0x58/0x76
> >> >> [19041.208082]  [] __cleanup_mnt+0x12/0x14
> >> >> [19041.208082]  [] task_work_run+0x6f/0x95
> >> >> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
> >> >> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
> >> >> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
> >> >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 
> >> >> f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 
> >> >> d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
> >> >> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
> >> >> [btrfs]
> >> >> [19041.208082]  RSP 
> >> >> [19041.279264] ---[ end trace 23330586f16f064d ]---
> >> >>
> >> >> This started happening as of kernel 4.8, since commit f3bca8028bd9
> >> >> ("Btrfs: add ASSERT for block group's memory leak") introduced these
> >> >> assertions.
> >> >>
> >> >> So fix this by freeing the block groups only after waiting for all
> >> >> worker kthreads to complete and shutdown the workqueues.
> >> >
> >> > This looks good to me, but I don't understand how that could happen, if
> >> > a 

Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Filipe Manana
On Thu, Feb 2, 2017 at 6:53 PM, Liu Bo  wrote:
> On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote:
>> On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo  wrote:
>> > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote:
>> >> From: Filipe Manana 
>> >>
>> >> At close_ctree() we free the block groups and then only after we wait for
>> >> any running worker kthreads to finish and shutdown the workqueues. This
>> >> behaviour is racy and it triggers an assertion failure when freeing block
>> >> groups because while we are doing it we can have for example a block group
>> >> caching kthread running, and in that case the block group's reference
>> >> count is greater than 1, leading to an assertion failure:
>> >>
>> >> [19041.198004] assertion failed: atomic_read(_group->count) == 1, 
>> >> file: fs/btrfs/extent-tree.c, line: 9799
>> >> [19041.200584] [ cut here ]
>> >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
>> >> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
>> >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
>> >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev 
>> >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor 
>> >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod 
>> >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio 
>> >> e1000 scsi_mod floppy [last unloaded: btrfs]
>> >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
>> >> 4.9.0-rc7-btrfs-next-36+ #1
>> >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
>> >> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
>> >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
>> >> [19041.208082] RIP: 0010:[]  [] 
>> >> assfail.constprop.41+0x1c/0x1e [btrfs]
>> >> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
>> >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
>> >> 0001
>> >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
>> >> 
>> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
>> >> 
>> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
>> >> 88023431d170
>> >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
>> >> 88015ecb4100
>> >> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
>> >> knlGS:
>> >> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
>> >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
>> >> 06e0
>> >> [19041.208082] Stack:
>> >> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
>> >> 88015ecb5630
>> >> [19041.208082]  88014f6be000  7ffcf0ba6a10 
>> >> c9000ad37df8
>> >> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
>> >> 811a634d
>> >> [19041.208082] Call Trace:
>> >> [19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 
>> >> [btrfs]
>> >> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
>> >> [19041.208082]  [] ? evict_inodes+0x132/0x141
>> >> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
>> >> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
>> >> [19041.208082]  [] kill_anon_super+0x12/0x1c
>> >> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
>> >> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
>> >> [19041.208082]  [] deactivate_super+0x36/0x39
>> >> [19041.208082]  [] cleanup_mnt+0x58/0x76
>> >> [19041.208082]  [] __cleanup_mnt+0x12/0x14
>> >> [19041.208082]  [] task_work_run+0x6f/0x95
>> >> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
>> >> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
>> >> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
>> >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 
>> >> f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 
>> >> d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
>> >> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
>> >> [btrfs]
>> >> [19041.208082]  RSP 
>> >> [19041.279264] ---[ end trace 23330586f16f064d ]---
>> >>
>> >> This started happening as of kernel 4.8, since commit f3bca8028bd9
>> >> ("Btrfs: add ASSERT for block group's memory leak") introduced these
>> >> assertions.
>> >>
>> >> So fix this by freeing the block groups only after waiting for all
>> >> worker kthreads to complete and shutdown the workqueues.
>> >
>> > This looks good to me, but I don't understand how that could happen, if
>> > a block group is being cached by the caching worker thread, the block
>> > group cache has been marked as BTRFS_CACHE_STARTED so we should wait on
>> > wait_block_group_cache_done() in btrfs_free_block_groups() before
>> > getting to the ASSERT.  Maybe something else 

Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Liu Bo
On Thu, Feb 02, 2017 at 06:32:01PM +, Filipe Manana wrote:
> On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo  wrote:
> > On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote:
> >> From: Filipe Manana 
> >>
> >> At close_ctree() we free the block groups and then only after we wait for
> >> any running worker kthreads to finish and shutdown the workqueues. This
> >> behaviour is racy and it triggers an assertion failure when freeing block
> >> groups because while we are doing it we can have for example a block group
> >> caching kthread running, and in that case the block group's reference
> >> count is greater than 1, leading to an assertion failure:
> >>
> >> [19041.198004] assertion failed: atomic_read(_group->count) == 1, 
> >> file: fs/btrfs/extent-tree.c, line: 9799
> >> [19041.200584] [ cut here ]
> >> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
> >> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
> >> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
> >> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev 
> >> tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor 
> >> button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod 
> >> ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio 
> >> e1000 scsi_mod floppy [last unloaded: btrfs]
> >> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
> >> 4.9.0-rc7-btrfs-next-36+ #1
> >> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> >> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> >> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
> >> [19041.208082] RIP: 0010:[]  [] 
> >> assfail.constprop.41+0x1c/0x1e [btrfs]
> >> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
> >> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
> >> 0001
> >> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
> >> 
> >> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
> >> 
> >> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
> >> 88023431d170
> >> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
> >> 88015ecb4100
> >> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
> >> knlGS:
> >> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
> >> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
> >> 06e0
> >> [19041.208082] Stack:
> >> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
> >> 88015ecb5630
> >> [19041.208082]  88014f6be000  7ffcf0ba6a10 
> >> c9000ad37df8
> >> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
> >> 811a634d
> >> [19041.208082] Call Trace:
> >> [19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 
> >> [btrfs]
> >> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
> >> [19041.208082]  [] ? evict_inodes+0x132/0x141
> >> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
> >> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
> >> [19041.208082]  [] kill_anon_super+0x12/0x1c
> >> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
> >> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
> >> [19041.208082]  [] deactivate_super+0x36/0x39
> >> [19041.208082]  [] cleanup_mnt+0x58/0x76
> >> [19041.208082]  [] __cleanup_mnt+0x12/0x14
> >> [19041.208082]  [] task_work_run+0x6f/0x95
> >> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
> >> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
> >> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
> >> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 
> >> 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 
> >> <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
> >> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
> >> [btrfs]
> >> [19041.208082]  RSP 
> >> [19041.279264] ---[ end trace 23330586f16f064d ]---
> >>
> >> This started happening as of kernel 4.8, since commit f3bca8028bd9
> >> ("Btrfs: add ASSERT for block group's memory leak") introduced these
> >> assertions.
> >>
> >> So fix this by freeing the block groups only after waiting for all
> >> worker kthreads to complete and shutdown the workqueues.
> >
> > This looks good to me, but I don't understand how that could happen, if
> > a block group is being cached by the caching worker thread, the block
> > group cache has been marked as BTRFS_CACHE_STARTED so we should wait on
> > wait_block_group_cache_done() in btrfs_free_block_groups() before
> > getting to the ASSERT.  Maybe something else broke?
> 
> Simple. Look at extent-tree.c:caching_kthread() - the the caching

caching_thread() vs caching_kthread(), free space cache vs inode cache,
confusing 

Re: [PATCH] Btrfs: remove duplicated find_get_pages_contig

2017-02-02 Thread David Sterba
On Mon, Jan 30, 2017 at 12:26:30PM -0800, Liu Bo wrote:
> This creates a helper to manipulate page bits to avoid duplicate uses.

This seems too short for what the patch does. It adds a new page ops bit
that would deserve a separate patch. Please try to split it to smaller
parts.

> +static noinline void __unlock_for_delalloc(struct inode *inode,
> +struct page *locked_page,
> +u64 start, u64 end)
> +{
> + unsigned long page_ops = PAGE_UNLOCK;

Used only once, not necessary IMO.

> +
> + ASSERT(locked_page);
> + __process_pages_contig(inode->i_mapping, locked_page,
> +start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> +page_ops, NULL);
> +}
> +
> +static noinline int lock_delalloc_pages(struct inode *inode,
> + struct page *locked_page,
> + u64 delalloc_start,
> + u64 delalloc_end)
> +{
> + pgoff_t index = delalloc_start >> PAGE_SHIFT;
> + pgoff_t start_index = index;
> + pgoff_t end_index = delalloc_end >> PAGE_SHIFT;
> + unsigned long page_ops = PAGE_LOCK;

Same here.

> + int ret = 0;
> +
> + ASSERT(locked_page);
> +
> + ret = __process_pages_contig(inode->i_mapping, locked_page, start_index,
> +end_index, page_ops, );
> + if (ret == -EAGAIN) {
> + __unlock_for_delalloc(inode, locked_page, delalloc_start,
> +   ((u64)index) << PAGE_SHIFT);
>   }
> +
>   return ret;
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Filipe Manana
On Thu, Feb 2, 2017 at 6:19 PM, Liu Bo  wrote:
> On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> At close_ctree() we free the block groups and then only after we wait for
>> any running worker kthreads to finish and shutdown the workqueues. This
>> behaviour is racy and it triggers an assertion failure when freeing block
>> groups because while we are doing it we can have for example a block group
>> caching kthread running, and in that case the block group's reference
>> count is greater than 1, leading to an assertion failure:
>>
>> [19041.198004] assertion failed: atomic_read(_group->count) == 1, 
>> file: fs/btrfs/extent-tree.c, line: 9799
>> [19041.200584] [ cut here ]
>> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
>> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
>> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
>> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis 
>> parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop 
>> autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi 
>> ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last 
>> unloaded: btrfs]
>> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
>> 4.9.0-rc7-btrfs-next-36+ #1
>> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
>> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
>> [19041.208082] RIP: 0010:[]  [] 
>> assfail.constprop.41+0x1c/0x1e [btrfs]
>> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
>> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
>> 0001
>> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
>> 
>> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
>> 
>> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
>> 88023431d170
>> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
>> 88015ecb4100
>> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
>> knlGS:
>> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
>> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
>> 06e0
>> [19041.208082] Stack:
>> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
>> 88015ecb5630
>> [19041.208082]  88014f6be000  7ffcf0ba6a10 
>> c9000ad37df8
>> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
>> 811a634d
>> [19041.208082] Call Trace:
>> [19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 
>> [btrfs]
>> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
>> [19041.208082]  [] ? evict_inodes+0x132/0x141
>> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
>> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
>> [19041.208082]  [] kill_anon_super+0x12/0x1c
>> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
>> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
>> [19041.208082]  [] deactivate_super+0x36/0x39
>> [19041.208082]  [] cleanup_mnt+0x58/0x76
>> [19041.208082]  [] __cleanup_mnt+0x12/0x14
>> [19041.208082]  [] task_work_run+0x6f/0x95
>> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
>> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
>> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
>> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 
>> 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 
>> <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
>> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
>> [btrfs]
>> [19041.208082]  RSP 
>> [19041.279264] ---[ end trace 23330586f16f064d ]---
>>
>> This started happening as of kernel 4.8, since commit f3bca8028bd9
>> ("Btrfs: add ASSERT for block group's memory leak") introduced these
>> assertions.
>>
>> So fix this by freeing the block groups only after waiting for all
>> worker kthreads to complete and shutdown the workqueues.
>
> This looks good to me, but I don't understand how that could happen, if
> a block group is being cached by the caching worker thread, the block
> group cache has been marked as BTRFS_CACHE_STARTED so we should wait on
> wait_block_group_cache_done() in btrfs_free_block_groups() before
> getting to the ASSERT.  Maybe something else broke?

Simple. Look at extent-tree.c:caching_kthread() - the the caching
state is updated (to error or finished), but only much later (at the
very end) the kthread drops its ref count on the block group. So the
assertion you added in commit f3bca8028bd9 fails because the block
group's ref count is 2 and not 1.
So nothing broke, just the assertion made an incorrect assumption. But
I think it's good 

Re: Subject: [REVERT][v4.x.y] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

2017-02-02 Thread Greg KH
On Thu, Feb 02, 2017 at 12:38:33PM -0500, Joseph Salisbury wrote:
> Hello,
> 
> Please consider reverting commit
> 4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release.

What release can I remove it from?

It isn't in 4.4.y, and 4.9.y doesn't make much sense, unless it's
reverted in Linus's tree already?

totally confused.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread Liu Bo
On Wed, Feb 01, 2017 at 11:01:28PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> At close_ctree() we free the block groups and then only after we wait for
> any running worker kthreads to finish and shutdown the workqueues. This
> behaviour is racy and it triggers an assertion failure when freeing block
> groups because while we are doing it we can have for example a block group
> caching kthread running, and in that case the block group's reference
> count is greater than 1, leading to an assertion failure:
> 
> [19041.198004] assertion failed: atomic_read(_group->count) == 1, file: 
> fs/btrfs/extent-tree.c, line: 9799
> [19041.200584] [ cut here ]
> [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
> [19041.202830] invalid opcode:  [#1] PREEMPT SMP
> [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
> crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis 
> parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop 
> autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi 
> ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last 
> unloaded: btrfs]
> [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
> 4.9.0-rc7-btrfs-next-36+ #1
> [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [19041.208082] task: 88015f028980 task.stack: c9000ad34000
> [19041.208082] RIP: 0010:[]  [] 
> assfail.constprop.41+0x1c/0x1e [btrfs]
> [19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
> [19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 
> 0001
> [19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
> 
> [19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
> 
> [19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 
> 88023431d170
> [19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 
> 88015ecb4100
> [19041.208082] FS:  7f44f3d42840() GS:88023f38() 
> knlGS:
> [19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
> [19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 
> 06e0
> [19041.208082] Stack:
> [19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
> 88015ecb5630
> [19041.208082]  88014f6be000  7ffcf0ba6a10 
> c9000ad37df8
> [19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
> 811a634d
> [19041.208082] Call Trace:
> [19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 
> [btrfs]
> [19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
> [19041.208082]  [] ? evict_inodes+0x132/0x141
> [19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
> [19041.208082]  [] generic_shutdown_super+0x6a/0xeb
> [19041.208082]  [] kill_anon_super+0x12/0x1c
> [19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
> [19041.208082]  [] deactivate_locked_super+0x3b/0x68
> [19041.208082]  [] deactivate_super+0x36/0x39
> [19041.208082]  [] cleanup_mnt+0x58/0x76
> [19041.208082]  [] __cleanup_mnt+0x12/0x14
> [19041.208082]  [] task_work_run+0x6f/0x95
> [19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
> [19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
> [19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
> [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 
> c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 
> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
> [19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e 
> [btrfs]
> [19041.208082]  RSP 
> [19041.279264] ---[ end trace 23330586f16f064d ]---
> 
> This started happening as of kernel 4.8, since commit f3bca8028bd9
> ("Btrfs: add ASSERT for block group's memory leak") introduced these
> assertions.
> 
> So fix this by freeing the block groups only after waiting for all
> worker kthreads to complete and shutdown the workqueues.

This looks good to me, but I don't understand how that could happen, if
a block group is being cached by the caching worker thread, the block
group cache has been marked as BTRFS_CACHE_STARTED so we should wait on
wait_block_group_cache_done() in btrfs_free_block_groups() before
getting to the ASSERT.  Maybe something else broke?

Thanks,

-liubo
> 
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/disk-io.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 066d9b9..a90e40e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3985,8 +3985,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
>  
>   btrfs_put_block_group_cache(fs_info);
>  
> - btrfs_free_block_groups(fs_info);
> -
>   /*
>* we must make sure there is not any read request to

Re: [PATCH] Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist

2017-02-02 Thread David Sterba
On Mon, Jan 30, 2017 at 12:25:28PM -0800, Liu Bo wrote:
> run_delalloc_nocow has used trans in two places where they don't actually need
> @trans.
> 
> For btrfs_lookup_file_extent, we search for file extents without COWing
> anything, and for btrfs_cross_ref_exist, the only place where we need @trans 
> is
> deferencing it in order to get running_transaction which we could easily get
> from the global fs_info.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 

Removing the transaction is ok, backed by some patch archeology.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: fix assertion failure when freeing block groups at close_ctree()

2017-02-02 Thread fdmanana
From: Filipe Manana 

At close_ctree() we free the block groups and then only after we wait for
any running worker kthreads to finish and shutdown the workqueues. This
behaviour is racy and it triggers an assertion failure when freeing block
groups because while we are doing it we can have for example a block group
caching kthread running, and in that case the block group's reference
count can still be greater than 1 by the time we assert its reference count
is 1, leading to an assertion failure:

[19041.198004] assertion failed: atomic_read(_group->count) == 1, file: 
fs/btrfs/extent-tree.c, line: 9799
[19041.200584] [ cut here ]
[19041.201692] kernel BUG at fs/btrfs/ctree.h:3418!
[19041.202830] invalid opcode:  [#1] PREEMPT SMP
[19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod 
crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis 
parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop 
autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi 
ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last 
unloaded: btrfs]
[19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 
4.9.0-rc7-btrfs-next-36+ #1
[19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[19041.208082] task: 88015f028980 task.stack: c9000ad34000
[19041.208082] RIP: 0010:[]  [] 
assfail.constprop.41+0x1c/0x1e [btrfs]
[19041.208082] RSP: 0018:c9000ad37d60  EFLAGS: 00010286
[19041.208082] RAX: 0061 RBX: 88015ecb4000 RCX: 0001
[19041.208082] RDX: 88023f392fb8 RSI: 817ef7ba RDI: 
[19041.208082] RBP: c9000ad37d60 R08: 0001 R09: 
[19041.208082] R10: c9000ad37cb0 R11: 82f2b66d R12: 88023431d170
[19041.208082] R13: 88015ecb40c0 R14: 88023431d000 R15: 88015ecb4100
[19041.208082] FS:  7f44f3d42840() GS:88023f38() 
knlGS:
[19041.208082] CS:  0010 DS:  ES:  CR0: 80050033
[19041.208082] CR2: 7f65d623b000 CR3: 0002166f2000 CR4: 06e0
[19041.208082] Stack:
[19041.208082]  c9000ad37d98 a035989f 88015ecb4000 
88015ecb5630
[19041.208082]  88014f6be000  7ffcf0ba6a10 
c9000ad37df8
[19041.208082]  a0368cd4 88014e9658e0 c9000ad37e08 
811a634d
[19041.208082] Call Trace:
[19041.208082]  [] btrfs_free_block_groups+0x17f/0x392 [btrfs]
[19041.208082]  [] close_ctree+0x1c5/0x2e1 [btrfs]
[19041.208082]  [] ? evict_inodes+0x132/0x141
[19041.208082]  [] btrfs_put_super+0x15/0x17 [btrfs]
[19041.208082]  [] generic_shutdown_super+0x6a/0xeb
[19041.208082]  [] kill_anon_super+0x12/0x1c
[19041.208082]  [] btrfs_kill_super+0x16/0x21 [btrfs]
[19041.208082]  [] deactivate_locked_super+0x3b/0x68
[19041.208082]  [] deactivate_super+0x36/0x39
[19041.208082]  [] cleanup_mnt+0x58/0x76
[19041.208082]  [] __cleanup_mnt+0x12/0x14
[19041.208082]  [] task_work_run+0x6f/0x95
[19041.208082]  [] prepare_exit_to_usermode+0xa3/0xc1
[19041.208082]  [] syscall_return_slowpath+0x16e/0x1d2
[19041.208082]  [] entry_SYSCALL_64_fastpath+0xab/0xad
[19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 
c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 
55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9
[19041.208082] RIP  [] assfail.constprop.41+0x1c/0x1e [btrfs]
[19041.208082]  RSP 
[19041.279264] ---[ end trace 23330586f16f064d ]---

This started happening as of kernel 4.8, since commit f3bca8028bd9
("Btrfs: add ASSERT for block group's memory leak") introduced these
assertions.

So fix this by freeing the block groups only after waiting for all
worker kthreads to complete and shutdown the workqueues.

Signed-off-by: Filipe Manana 
---

v2: Make error path of open_ctree() also stop all workers before freeing
the block groups. Assert that when freeing block groups, no block
group can be in the caching started state.

 fs/btrfs/disk-io.c | 6 +++---
 fs/btrfs/extent-tree.c | 9 ++---
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 066d9b9..bf54d7d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3263,7 +3263,6 @@ int open_ctree(struct super_block *sb,
 
 fail_block_groups:
btrfs_put_block_group_cache(fs_info);
-   btrfs_free_block_groups(fs_info);
 
 fail_tree_roots:
free_root_pointers(fs_info, 1);
@@ -3271,6 +3270,7 @@ int open_ctree(struct super_block *sb,
 
 fail_sb_buffer:
btrfs_stop_all_workers(fs_info);
+   btrfs_free_block_groups(fs_info);
 fail_alloc:
 fail_iput:
btrfs_mapping_tree_free(_info->mapping_tree);
@@ -3985,8 +3985,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 

[PATCH 0/24 RFC] fs: Convert all embedded bdis into separate ones

2017-02-02 Thread Jan Kara
Hello,

this patch series converts all embedded occurences of struct backing_dev_info
to use standalone dynamically allocated structures. This makes bdi handling
unified across all bdi users and generally removes some boilerplate code from
filesystems setting up their own bdi. It also allows us to remove some code
from generic bdi implementation.

The patches were only compile-tested for most filesystems (I've tested
mounting only for NFS & btrfs) so fs maintainers please have a look whether
the changes look sound to you.

This series is based on top of bdi fixes that were merged into linux-block
git tree.

Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head

2017-02-02 Thread David Sterba
On Mon, Jan 30, 2017 at 12:24:37PM -0800, Liu Bo wrote:
> All we need is @delayed_refs, all callers have get it ahead of calling
> btrfs_find_delayed_ref_head since lock needs to be acquired firstly, there is
> no reason to deference it again inside the function.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Subject: [REVERT][v4.x.y] btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl

2017-02-02 Thread Joseph Salisbury
Hello,

Please consider reverting commit
4c63c2454eff996c5e27991221106eb511f7db38 in the next v4.x.y release.  It
was included upstream as of v4.7-rc1  This commit introduced a
regression, described in the following bug:
http://bugs.launchpad.net/bugs/1619918

This new regression was discussed in the following thread:

https://lkml.org/lkml/2017/1/6/467


Sincerely,

Joseph Salisbury


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove unused trans in read_block_for_search

2017-02-02 Thread David Sterba
On Mon, Jan 30, 2017 at 12:23:42PM -0800, Liu Bo wrote:
> @trans is not used at all, this removes it.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/24] btrfs: Convert to separately allocated bdi

2017-02-02 Thread Jan Kara
Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.

CC: Chris Mason 
CC: Josef Bacik 
CC: David Sterba 
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara 
---
 fs/btrfs/ctree.h   |  1 -
 fs/btrfs/disk-io.c | 36 +++-
 fs/btrfs/super.c   |  7 +++
 3 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6a823719b6c5..1dc06f66dfcf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -801,7 +801,6 @@ struct btrfs_fs_info {
struct btrfs_super_block *super_for_commit;
struct super_block *sb;
struct inode *btree_inode;
-   struct backing_dev_info bdi;
struct mutex tree_log_mutex;
struct mutex transaction_kthread_mutex;
struct mutex cleaner_mutex;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 37a31b12bb0c..b25723e729c0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1810,21 +1810,6 @@ static int btrfs_congested_fn(void *congested_data, int 
bdi_bits)
return ret;
 }
 
-static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi)
-{
-   int err;
-
-   err = bdi_setup_and_register(bdi, "btrfs");
-   if (err)
-   return err;
-
-   bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE;
-   bdi->congested_fn   = btrfs_congested_fn;
-   bdi->congested_data = info;
-   bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
-   return 0;
-}
-
 /*
  * called by the kthread helper functions to finally call the bio end_io
  * functions.  This is where read checksum verification actually happens
@@ -2598,16 +2583,10 @@ int open_ctree(struct super_block *sb,
goto fail;
}
 
-   ret = setup_bdi(fs_info, _info->bdi);
-   if (ret) {
-   err = ret;
-   goto fail_srcu;
-   }
-
ret = percpu_counter_init(_info->dirty_metadata_bytes, 0, 
GFP_KERNEL);
if (ret) {
err = ret;
-   goto fail_bdi;
+   goto fail_srcu;
}
fs_info->dirty_metadata_batch = PAGE_SIZE *
(1 + ilog2(nr_cpu_ids));
@@ -2715,7 +2694,6 @@ int open_ctree(struct super_block *sb,
 
sb->s_blocksize = 4096;
sb->s_blocksize_bits = blksize_bits(4096);
-   sb->s_bdi = _info->bdi;
 
btrfs_init_btree_inode(fs_info);
 
@@ -2912,9 +2890,12 @@ int open_ctree(struct super_block *sb,
goto fail_sb_buffer;
}
 
-   fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super);
-   fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
-   SZ_4M / PAGE_SIZE);
+   sb->s_bdi->congested_fn = btrfs_congested_fn;
+   sb->s_bdi->congested_data = fs_info;
+   sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
+   sb->s_bdi->ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_SIZE;
+   sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
+   sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
 
sb->s_blocksize = sectorsize;
sb->s_blocksize_bits = blksize_bits(sectorsize);
@@ -3282,8 +3263,6 @@ int open_ctree(struct super_block *sb,
percpu_counter_destroy(_info->delalloc_bytes);
 fail_dirty_metadata_bytes:
percpu_counter_destroy(_info->dirty_metadata_bytes);
-fail_bdi:
-   bdi_destroy(_info->bdi);
 fail_srcu:
cleanup_srcu_struct(_info->subvol_srcu);
 fail:
@@ -4010,7 +3989,6 @@ void close_ctree(struct btrfs_fs_info *fs_info)
percpu_counter_destroy(_info->dirty_metadata_bytes);
percpu_counter_destroy(_info->delalloc_bytes);
percpu_counter_destroy(_info->bio_counter);
-   bdi_destroy(_info->bdi);
cleanup_srcu_struct(_info->subvol_srcu);
 
btrfs_free_stripe_hash_table(fs_info);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b5ae7d3d1896..08ef08b63132 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1133,6 +1133,13 @@ static int btrfs_fill_super(struct super_block *sb,
 #endif
sb->s_flags |= MS_I_VERSION;
sb->s_iflags |= SB_I_CGROUPWB;
+
+   err = super_setup_bdi(sb);
+   if (err) {
+   btrfs_err(fs_info, "super_setup_bdi failed");
+   return err;
+   }
+
err = open_ctree(sb, fs_devices, (char *)data);
if (err) {
btrfs_err(fs_info, "open_ctree failed");
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/24] fs: Provide infrastructure for dynamic BDIs in filesystems

2017-02-02 Thread Jan Kara
Provide helper functions for setting up dynamically allocated
backing_dev_info structures for filesystems and cleaning them up on
superblock destruction.

CC: linux-...@lists.infradead.org
CC: linux-...@vger.kernel.org
CC: Petr Vandrovec 
CC: linux-ni...@vger.kernel.org
CC: cluster-de...@redhat.com
CC: osd-...@open-osd.org
CC: codal...@coda.cs.cmu.edu
CC: linux-...@lists.infradead.org
CC: ecryp...@vger.kernel.org
CC: linux-c...@vger.kernel.org
CC: ceph-de...@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: v9fs-develo...@lists.sourceforge.net
CC: lustre-de...@lists.lustre.org
Signed-off-by: Jan Kara 
---
 fs/super.c   | 49 
 include/linux/backing-dev-defs.h |  2 +-
 include/linux/fs.h   |  6 +
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index ea662b0e5e78..31dc4c6450ef 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -446,6 +446,11 @@ void generic_shutdown_super(struct super_block *sb)
hlist_del_init(>s_instances);
spin_unlock(_lock);
up_write(>s_umount);
+   if (sb->s_iflags & SB_I_DYNBDI) {
+   bdi_put(sb->s_bdi);
+   sb->s_bdi = _backing_dev_info;
+   sb->s_iflags &= ~SB_I_DYNBDI;
+   }
 }
 
 EXPORT_SYMBOL(generic_shutdown_super);
@@ -1249,6 +1254,50 @@ mount_fs(struct file_system_type *type, int flags, const 
char *name, void *data)
 }
 
 /*
+ * Setup private BDI for given superblock. I gets automatically cleaned up
+ * in generic_shutdown_super().
+ */
+int super_setup_bdi_name(struct super_block *sb, char *fmt, ...)
+{
+   struct backing_dev_info *bdi;
+   int err;
+   va_list args;
+
+   bdi = bdi_alloc(GFP_KERNEL);
+   if (!bdi)
+   return -ENOMEM;
+
+   bdi->name = sb->s_type->name;
+
+   va_start(args, fmt);
+   err = bdi_register_va(bdi, NULL, fmt, args);
+   va_end(args);
+   if (err) {
+   bdi_put(bdi);
+   return err;
+   }
+   WARN_ON(sb->s_bdi != _backing_dev_info);
+   sb->s_bdi = bdi;
+   sb->s_iflags |= SB_I_DYNBDI;
+
+   return 0;
+}
+EXPORT_SYMBOL(super_setup_bdi_name);
+
+/*
+ * Setup private BDI for given superblock. I gets automatically cleaned up
+ * in generic_shutdown_super().
+ */
+int super_setup_bdi(struct super_block *sb)
+{
+   static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
+
+   return super_setup_bdi_name(sb, "%.28s-%ld", sb->s_type->name,
+   atomic_long_inc_return(_seq));
+}
+EXPORT_SYMBOL(super_setup_bdi);
+
+/*
  * This is an internal function, please use sb_end_{write,pagefault,intwrite}
  * instead.
  */
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 2ecafc8a2d06..70080b4217f4 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -143,7 +143,7 @@ struct backing_dev_info {
congested_fn *congested_fn; /* Function pointer if device is md/dm */
void *congested_data;   /* Pointer to aux data for congested func */
 
-   char *name;
+   const char *name;
 
struct kref refcnt; /* Reference counter for the structure */
unsigned int registered:1;  /* Is bdi registered? */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c930cbc19342..8ed8b6d1bc54 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1267,6 +1267,9 @@ struct mm_struct;
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE0x0010 /* fstype already mounted */
 
+/* Temporary flag until all filesystems are converted to dynamic bdis */
+#define SB_I_DYNBDI0x0100
+
 /* Possible states of 'frozen' field */
 enum {
SB_UNFROZEN = 0,/* FS is unfrozen */
@@ -2103,6 +2106,9 @@ extern int vfs_ustat(dev_t, struct kstatfs *);
 extern int freeze_super(struct super_block *super);
 extern int thaw_super(struct super_block *super);
 extern bool our_mnt(struct vfsmount *mnt);
+extern __printf(2, 3)
+int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
+extern int super_setup_bdi(struct super_block *sb);
 
 extern int current_umask(void);
 
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-02-02 Thread Austin S. Hemmelgarn

On 2017-02-02 09:25, Adam Borowski wrote:

On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:

This is a severe bug that makes a not all that uncommon (albeit bad) use
case fail completely.  The fix had no dependencies itself and


I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
redundancy but that's no worse than using a single disk from the start.
And most people not doing storage/server farm don't have a stack of spare
disks at hand, so getting a replacement might take a while.
Running degraded is bad. Period.  If you don't have a disk on hand to 
replace the failed one (and if you care about redundancy, you should 
have at least one spare on hand), you should be converting to a single 
disk, not continuing to run in degraded mode until you get a new disk. 
The moment you start talking about running degraded long enough that you 
will be _booting_ the system with the array degraded, you need to be 
converting to a single disk.  This is of course impractical for 
something like a hardware array or an LVM volume, but it's _trivial_ 
with BTRFS, and protects you from all kinds of bad situations that can't 
happen with a single disk but can completely destroy the filesystem if 
it's a degraded array.  Running a single disk is not exactly the same as 
running a degraded array, it's actually marginally safer (even if you 
aren't using dup profile for metadata) because there are fewer moving 
parts to go wrong.  It's also exponentially more efficient.


Being able to continue to run when a disk fails is the whole point of RAID
-- despite what some folks think, RAIDs are not for backups but for uptime.
And if your uptime goes to hell because the moment a disk fails you need to
drop everything and replace the disk immediately, why would you use RAID?
Because just replacing a disk and rebuilding the array is almost always 
much cheaper in terms of time than rebuilding the system from a backup. 
IOW, even if you have to drop everything and replace the disk 
immediately, it's still less time consuming than restoring from a 
backup.  It also has the advantage that you don't lose any data.



I /thought/ the immediate benefit was obvious enough that it
would be mainline-merged by now, not hoovered-up into some long-term
project with no real hint as to /when/ it might be merged.  Oh, well...

I think (although I'm not sure about it) that this:
http://www.spinics.net/lists/linux-btrfs/msg47283.html
is the first posting of the patch series.


Is there a more recent version somewhere?  Mechanically rebasing+resolving
conflicts doesn't work, I'd need to do a more involved refresh, which would
be a waste of time if it's already done by someone with an actual clue about
this code.
There may be, but I haven't looked very far.  Qu would probably be the 
person to ask.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-02-02 Thread Adam Borowski
On Thu, Feb 02, 2017 at 07:49:50AM -0500, Austin S. Hemmelgarn wrote:
> This is a severe bug that makes a not all that uncommon (albeit bad) use
> case fail completely.  The fix had no dependencies itself and

I don't see what's bad in mounting a RAID degraded.  Yeah, it provides no
redundancy but that's no worse than using a single disk from the start.
And most people not doing storage/server farm don't have a stack of spare
disks at hand, so getting a replacement might take a while.

Being able to continue to run when a disk fails is the whole point of RAID
-- despite what some folks think, RAIDs are not for backups but for uptime.
And if your uptime goes to hell because the moment a disk fails you need to
drop everything and replace the disk immediately, why would you use RAID?

> > I /thought/ the immediate benefit was obvious enough that it
> > would be mainline-merged by now, not hoovered-up into some long-term
> > project with no real hint as to /when/ it might be merged.  Oh, well...
> I think (although I'm not sure about it) that this:
> http://www.spinics.net/lists/linux-btrfs/msg47283.html
> is the first posting of the patch series.

Is there a more recent version somewhere?  Mechanically rebasing+resolving
conflicts doesn't work, I'd need to do a more involved refresh, which would
be a waste of time if it's already done by someone with an actual clue about
this code.


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-02 Thread Austin S. Hemmelgarn

On 2017-02-02 05:52, Graham Cobb wrote:

On 02/02/17 00:02, Duncan wrote:

If it's a workaround, then many of the Linux procedures we as admins and
users use every day are equally workarounds.  Setting 007 perms on a dir
that doesn't have anything immediately security vulnerable in it, simply
to keep other users from even potentially seeing or being able to write
to something N layers down the subdir tree, is standard practice.


No. There is no need to normally place a read-only snapshot below a
no-execute directory just to prevent write access to it. That is not
part of the admin's expectation.
That depends on the admin.  I for example would absolutely expect that 
to be needed _while the snapshot is being created_ if the operation 
isn't being done by the kernel.



Which is my point.  This is no different than standard security practice,
that an admin should be familiar with and using without even having to
think about it.  Btrfs is simply making the same assumptions that
everyone else does, that an admin knows what they are doing and sets the
upstream permissions with that in mind.  If they don't, how is that
btrfs' fault?


Because btrfs intends the receive snapshot to be read-only. That is the
expectation of the sysadmin. It is an important and useful feature which
makes send/receive very useful for creating
user-readable-but-not-modifiable backups (without it, send/receive are
useful for many things but less useful for creating backups). That
feature has a bug.
If that's your use case, then from a consistency perspective, you should 
be receiving into a location the user can't read and then moving the 
subvolume into a user readable location once receive is done anyway. 
Otherwise, they may see a partial snapshot, and think something has been 
deleted that really hasn't.  This is a pretty basic method of ensuring 
consistency that's used literally all over the place on UNIX systems to 
atomically replace or update data sets.  Doing so also cleanly avoids 
needing to manipulate permissions on the directory the snapshots are in 
to prevent users from modifying the snapshots being received.


Just because you don't personally use the feature, doesn't mean it isn't
a bug! Many of us do rely on that feature.

Even though it is security-related, I agree it isn't the highest
priority btrfs bug. It can probably wait until receive is being worked
on for other reasons. But if it isn't going to be fixed any time soon,
it should be documented in the Wiki and the man page, with the suggested
workround for anyone who needs to make sure the receive won't be
tampered with.
I agree that it should be documented.  I don't agree that a specific 
workaround needs to be stated, as whether or not any particular 
workaround is needed is entirely dependent on the use case.  For 
example, if your users can only access the received snapshots over a 
networked filesystem, there's a much simpler option of just exporting 
the directories the snapshots are received into as read-only.  All that 
needs to be stated is that the way to prevent this is to make sure users 
can't write to the snapshots while they're being received.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-02-02 Thread Austin S. Hemmelgarn

On 2017-02-01 17:48, Duncan wrote:

Adam Borowski posted on Wed, 01 Feb 2017 12:55:30 +0100 as excerpted:


On Wed, Feb 01, 2017 at 05:23:16AM +, Duncan wrote:

Hans Deragon posted on Tue, 31 Jan 2017 21:51:22 -0500 as excerpted:

But the current scenario makes it difficult for me to put redundancy
back into service!  How much time did I waited until I find the
mailing list, subscribe to it, post my email and get an answer?
Wouldn't it be better if the user could actually add the disk at
anytime, mostly ASAP?

And to fix this, I have to learn how to patch and compile the kernel.
I have not done this since the beginning of the century.  More
delays,
more risk added to the system (what if I compile the kernel with the
wrong parameters?).


The patch fixing the problem and making return from degraded not the
one-
shot thing it tends to be now will eventually be merged


Not anything like the one I posted to this thread -- this one merely
removes a check that can't handle this particular (common) case of an
otherwise healthy RAID that lost one device then was mounted degraded
twice.  We need instead a better check, one that sees whether every
block group is present.

This can be done quite easily, as as far as I know, the list of block
group is at that moment fully present in memory, but someone actually
has to code that, and I for one don't know btrfs internals (yet?).


I didn't mention it because you spared me the trouble with your hack-
patch that did the current job, but FWIW, there's actually a patch that
does per-chunk testing as you indicate, but it got merged into a longer
running feature-add project (hot-spares, IIRC), and thus isn't likely to
be mainline-merged until that project is merged.

But who knows when that might be?  Could be years before it is considered
ready.

Meanwhile, perhaps it's simply because I'm not a dev and am not
appreciating the complexities of some detail or other, but as
demonstrated by the people who have local-merged that patch to get out of
just this sort of jam, as well as the continuing saga of more and more
people appearing here with the same problem, it could be an arguably high
priority fix on its own, and should have been reviewed and ultimately
mainline-merged on its own merits, instead of being stuck in someone's
feature-add project queue for potentially years, while more and more
people who could have definitely used it have to either suffer without it
or go and find and local-merge it themselves.  Even if this feature is
critical to the longer term feature, merge of this little one now would
make the final patch set for the longer term feature that much smaller.
Agreed, it should have been mainlined.  I have no issue with the 
hot-spare patches depending on it, but it's a severe bug.


But that's a btrfs-using sysadmin's viewpoint, not a developer viewpoint,
and it's the developer's doing the work, so they get to define when and
how it gets merged, and us non-devs must either simply live with it, or
if the circumstances allow, fund some dev to have our specific interests
as their priority and take care of it for us.
I don't care in this case if I draw some flak from the developers, but 
this particular developer viewpoint is wrong.  If this was a commercial 
software product, the person responsible would at least be facing some 
serious reprimand, and depending on the company, possibly would be out 
of a job.  This is a severe bug that makes a not all that uncommon 
(albeit bad) use case fail completely.  The fix had no dependencies 
itself and


Meanwhile, perhaps I should have bookmarked that patch at least as it
appeared on-list, but I didn't, so while I know it exists, I too would
have to go looking to actually find it, should I end up needing it.  In
my defense, I /thought/ the immediate benefit was obvious enough that it
would be mainline-merged by now, not hoovered-up into some long-term
project with no real hint as to /when/ it might be merged.  Oh, well...

I think (although I'm not sure about it) that this:
http://www.spinics.net/lists/linux-btrfs/msg47283.html
is the first posting of the patch series.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-02 Thread Marc Joliet
On Sunday 28 August 2016 15:29:08 Kai Krakow wrote:
> Hello list!

Hi list

> It happened again. While using VirtualBox the following crash happened,
> btrfs check found a lot of errors which it couldn't repair. Earlier
> that day my system crashed which may already introduced errors into my
> filesystem. Apparently, I couldn't create an image (not enough space
> available), I only can give this trace from dmesg:
> 
> [44819.903435] [ cut here ]
> [44819.903443] WARNING: CPU: 3 PID: 2787 at fs/btrfs/extent-tree.c:2963
> btrfs_run_delayed_refs+0x26c/0x290 [44819.903444] BTRFS: Transaction
> aborted (error -17)
> [44819.903445] Modules linked in: nls_iso8859_15 nls_cp437 vfat fat fuse
> rfcomm veth af_packet ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp
> llc w83627ehf bnep hwmon_vid cachefiles btusb btintel bluetooth
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_intel snd_hda_codec rfkill snd_hwdep snd_hda_core snd_pcm snd_timer
> coretemp hwmon snd r8169 mii kvm_intel kvm iTCO_wdt iTCO_vendor_support
> rtc_cmos irqbypass soundcore ip_tables uas usb_storage nvidia_drm(PO)
> vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO)
> nvidia(PO) efivarfs unix ipv6 [44819.903484] CPU: 3 PID: 2787 Comm:
> BrowserBlocking Tainted: P   O4.7.2-gentoo #2 [44819.903485]
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z68 Pro3, BIOS
> L2.16A 02/22/2013 [44819.903487]   8130af2d
> 8800b7d03d20  [44819.903489]  810865fa
> 880409374428 8800b7d03d70 8803bf299760 [44819.903491] 
>  ffef 8803f677f000 8108666a
> [44819.903493] Call Trace:
> [44819.903496]  [] ? dump_stack+0x46/0x59
> [44819.903500]  [] ? __warn+0xba/0xe0
> [44819.903502]  [] ? warn_slowpath_fmt+0x4a/0x50
> [44819.903504]  [] ? btrfs_run_delayed_refs+0x26c/0x290
> [44819.903507]  [] ? btrfs_release_path+0xe/0x80
> [44819.903509]  [] ?
> btrfs_start_dirty_block_groups+0x2da/0x420 [44819.903511] 
> [] ? btrfs_commit_transaction+0x143/0x990 [44819.903514] 
> [] ? kmem_cache_free+0x165/0x180 [44819.903516] 
> [] ? btrfs_wait_ordered_range+0x7c/0x110 [44819.903518] 
> [] ? btrfs_sync_file+0x286/0x360 [44819.903522] 
> [] ? do_fsync+0x33/0x60
> [44819.903524]  [] ? SyS_fdatasync+0xa/0x10
> [44819.903528]  [] ? entry_SYSCALL_64_fastpath+0x13/0x8f
> [44819.903529] ---[ end trace 6944811e170a0e57 ]---
> [44819.903531] BTRFS: error (device bcache2) in btrfs_run_delayed_refs:2963:
> errno=-17 Object already exists [44819.903533] BTRFS info (device bcache2):
> forced readonly

I got the same error myself, with this stack trace:

-- Logs begin at Fr 2016-04-01 17:07:28 CEST, end at Mi 2017-02-01 22:03:57 
CET. --
Feb 01 01:46:26 diefledermaus kernel: [ cut here ]
Feb 01 01:46:26 diefledermaus kernel: WARNING: CPU: 1 PID: 16727 at 
fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0x278/0x2b0
Feb 01 01:46:26 diefledermaus kernel: BTRFS: Transaction aborted (error -17)
Feb 01 01:46:26 diefledermaus kernel: BTRFS: error (device sdb2) in 
btrfs_run_delayed_refs:2967: errno=-17 Object already exists
Feb 01 01:46:27 diefledermaus kernel: BTRFS info (device sdb2): forced 
readonly
Feb 01 01:46:27 diefledermaus kernel: Modules linked in: msr ctr ccm tun arc4 
snd_hda_codec_idt applesmc snd_hda_codec_generic input_polldev hwmon 
snd_hda_intel ath5k snd_hda_codec mac80211 snd_hda_core ath snd_pcm cfg80211 
snd_timer video acpi_cpufreq snd backlight sky2 rfkill processor button 
soundcore sg usb_storage sr_mod cdrom ata_generic pata_acpi uhci_hcd ahci 
libahci ata_piix libata ehci_pci ehci_hcd
Feb 01 01:46:27 diefledermaus kernel: CPU: 1 PID: 16727 Comm: kworker/u4:0 Not 
tainted 4.9.6-gentoo #1
Feb 01 01:46:27 diefledermaus kernel: Hardware name: Apple Inc. 
Macmini2,1/Mac-F4208EAA, BIOS MM21.88Z.009A.B00.0706281359 06/28/07
Feb 01 01:46:27 diefledermaus kernel: Workqueue: btrfs-extent-refs 
btrfs_extent_refs_helper
Feb 01 01:46:27 diefledermaus kernel:   812cf739 
c9000285fd60 
Feb 01 01:46:27 diefledermaus kernel:  8104908a 8800428df1e0 
c9000285fdb0 0020
Feb 01 01:46:27 diefledermaus kernel:  880003c1b1b8 8800bb73e900 
 810490fa
Feb 01 01:46:27 diefledermaus kernel: Call Trace:
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
dump_stack+0x46/0x5d
Feb 01 01:46:27 diefledermaus kernel:  [] ? __warn+0xba/0xe0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
warn_slowpath_fmt+0x4a/0x50
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
btrfs_run_delayed_refs+0x278/0x2b0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
delayed_ref_async_start+0x84/0xa0
Feb 01 01:46:27 diefledermaus kernel:  [] ? 
process_one_work+0x126/0x310
Feb 01 01:46:27 diefledermaus kernel:  [] ? 

Re: btrfs receive leaves new subvolume modifiable during operation

2017-02-02 Thread Graham Cobb
On 02/02/17 00:02, Duncan wrote:
> If it's a workaround, then many of the Linux procedures we as admins and 
> users use every day are equally workarounds.  Setting 007 perms on a dir 
> that doesn't have anything immediately security vulnerable in it, simply 
> to keep other users from even potentially seeing or being able to write 
> to something N layers down the subdir tree, is standard practice.

No. There is no need to normally place a read-only snapshot below a
no-execute directory just to prevent write access to it. That is not
part of the admin's expectation.

> Which is my point.  This is no different than standard security practice, 
> that an admin should be familiar with and using without even having to 
> think about it.  Btrfs is simply making the same assumptions that 
> everyone else does, that an admin knows what they are doing and sets the 
> upstream permissions with that in mind.  If they don't, how is that 
> btrfs' fault?

Because btrfs intends the receive snapshot to be read-only. That is the
expectation of the sysadmin. It is an important and useful feature which
makes send/receive very useful for creating
user-readable-but-not-modifiable backups (without it, send/receive are
useful for many things but less useful for creating backups). That
feature has a bug.

Just because you don't personally use the feature, doesn't mean it isn't
a bug! Many of us do rely on that feature.

Even though it is security-related, I agree it isn't the highest
priority btrfs bug. It can probably wait until receive is being worked
on for other reasons. But if it isn't going to be fixed any time soon,
it should be documented in the Wiki and the man page, with the suggested
workround for anyone who needs to make sure the receive won't be
tampered with.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] btrfs-progs: fsck-tests: verify 'btrfs check --repair' fixes corrupted nlink field

2017-02-02 Thread Lakshmipathi.G
Signed-off-by: Lakshmipathi.G 
---
 tests/fsck-tests/026-check-inode-link/test.sh | 30 +++
 1 file changed, 30 insertions(+)
 create mode 100755 tests/fsck-tests/026-check-inode-link/test.sh

diff --git a/tests/fsck-tests/026-check-inode-link/test.sh 
b/tests/fsck-tests/026-check-inode-link/test.sh
new file mode 100755
index 000..6822ee2
--- /dev/null
+++ b/tests/fsck-tests/026-check-inode-link/test.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+# verify that 'btrfs check --repair' fixes corrupted inode nlink field
+
+source $TOP/tests/common
+
+check_prereq btrfs-corrupt-block
+check_prereq mkfs.btrfs
+
+setup_root_helper
+prepare_test_dev 512M
+
+test_inode_nlink_field()
+{
+   run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
+
+   run_check_mount_test_dev
+   run_check $SUDO_HELPER touch $TEST_MNT/test_nlink.txt
+
+   # find inode_number
+   inode_number=`stat -c%i $TEST_MNT/test_nlink.txt`
+   run_check_umount_test_dev
+
+   # corrupt nlink field of inode object
+run_check $SUDO_HELPER $TOP/btrfs-corrupt-block -i $inode_number \
+   -f nlink $TEST_DEV
+
+   check_image $TEST_DEV
+}
+
+test_inode_nlink_field
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html