On Mon, Nov 16, 2015 at 10:44 AM, Zhao Lei <zhao...@cn.fujitsu.com> wrote:
> Hi, Filipe Manana
>
>> -----Original Message-----
>> From: linux-btrfs-ow...@vger.kernel.org
>> [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Filipe Manana
>> Sent: Monday, November 16, 2015 6:27 PM
>> To: Zhao Lei <zhao...@cn.fujitsu.com>
>> Cc: linux-btrfs@vger.kernel.org
>> Subject: Re: [PATCH] btrfs: Continue replace when set_block_ro failed
>>
>> On Mon, Nov 16, 2015 at 10:07 AM, Zhao Lei <zhao...@cn.fujitsu.com> wrote:
>> > Hi, Filipe Manana
>> >
>> > Thanks for review.
>> >
>> >> -----Original Message-----
>> >> From: Filipe Manana [mailto:fdman...@gmail.com]
>> >> Sent: Friday, November 13, 2015 8:02 PM
>> >> To: Zhao Lei <zhao...@cn.fujitsu.com>
>> >> Cc: linux-btrfs@vger.kernel.org
>> >> Subject: Re: [PATCH] btrfs: Continue replace when set_block_ro failed
>> >>
>> >> On Fri, Nov 13, 2015 at 11:38 AM, Zhao Lei <zhao...@cn.fujitsu.com> wrote:
>> >> > xfstests/011 failed in node with small_size filesystem.
>> >> > Can be reproduced by following script:
>> >> >   DEV_LIST="/dev/vdd /dev/vde"
>> >> >   DEV_REPLACE="/dev/vdf"
>> >> >
>> >> >   do_test()
>> >> >   {
>> >> >       local mkfs_opt="$1"
>> >> >       local size="$2"
>> >> >
>> >> >       dmesg -c >/dev/null
>> >> >       umount $SCRATCH_MNT &>/dev/null
>> >> >
>> >> >       echo  mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
>> >> >       mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
>> >> >       mount "${DEV_LIST[0]}" $SCRATCH_MNT
>> >> >
>> >> >       echo -n "Writing big files"
>> >> >       dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M
>> >> count=1 >/dev/null 2>&1
>> >> >       for ((i = 1; i <= size; i++)); do
>> >> >           echo -n .
>> >> >           /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1
>> >> >       done
>> >> >       echo
>> >> >
>> >> >       echo Start replace
>> >> >       btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE"
>> >> $SCRATCH_MNT || {
>> >> >           dmesg
>> >> >           return 1
>> >> >       }
>> >> >       return 0
>> >> >   }
>> >> >
>> >> >   # Set size to value near fs size
>> >> >   # for example, 1897 can trigger this bug in 2.6G device.
>> >> >   #
>> >> >   ./do_test "-d raid1 -m raid1" 1897
>> >> >
>> >> > System will report replace fail with following warning in dmesg:
>> >> >
>> >> > Reason:
>> >> >  When big data writen to fs, the whole free space will be allocated
>> >> > for data chunk.
>> >> >  And operation as scrub need to set_block_ro(), and when there is
>> >> > only one metadata chunk in system(or other metadata chunks  are all
>> >> > full), the function will try to allocate a new chunk,  and failed
>> >> > because no space in device.
>> >> >
>> >> > Fix:
>> >> >  When set_block_ro failed for metadata chunk, it is not a problem
>> >> > because scrub_lock forbids commit_trancaction.
>> >>
>> >> Hi Zhao, I'm confused reading this explanation.
>> >>
>> >> Why isn't it a problem if the block group gets modified while the
>> >> transaction is ongoing? Through writepages() for example.
>> >> Committing a transaction isn't the only way to write data or metadata
>> >> to a block group.
>> >>
>> > Metadata chunks always updated in cow, the writepages() will write new
>> > data to different place with operation done by replace.
>>
>> So why do we try to set the block group ro? Your change is really confusing 
>> - if
>> we fail setting the block group to read-only mode, we just ignore it and 
>> proceed
>> - why bother in the first place to set it read-only?
>>
> I find a problem that set_group_ro will failed for metadata when disk almost 
> full,
> but we also have a good news that the time when set_group_ro failed is
> just when it is not necessary.

Yes - but the code will ignore the failure for any error code, not
just -ENOSPC. Surely btrfs_inc_block_group_ro() can return error codes
different from -ENOSPC. You should make it clear in the code (and in
the change log) why ENOSPC is such a special case - why we need to set
the block to RO and why we can ignore the ENOSPC failure.

>
>> Someone reading the code will find it fishy, and going back to git history,
>> your change log doesn't really explains why this is being done and why is
>> it safe to do it.
>>
> Thanks for notice, it is necessary to add detail explanation into changelog.
>
>> You also forgot to paste the warning you mention below "System will report
>> replace fail with following warning in dmesg:" in the change log.
>>
> Agree, I'll paste it in changelog.
>
> Thanks
> Zhaolei
>
>> thanks
>>
>> >
>> > Thanks
>> > Zhaolei
>> >
>> >> thanks
>> >>
>> >> >  Let replace continue in this case is no problem.
>> >> >
>> >> > Tested by above script, and xfstests/011, plus 100 times xfstests/070.
>> >> >
>> >> > Signed-off-by: Zhao Lei <zhao...@cn.fujitsu.com>
>> >> > ---
>> >> >  fs/btrfs/scrub.c | 9 ++++-----
>> >> >  1 file changed, 4 insertions(+), 5 deletions(-)
>> >> >
>> >> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index
>> >> > a39f5d1..5a30753 100644
>> >> > --- a/fs/btrfs/scrub.c
>> >> > +++ b/fs/btrfs/scrub.c
>> >> > @@ -3532,6 +3532,7 @@ int scrub_enumerate_chunks(struct scrub_ctx
>> >> *sctx,
>> >> >         u64 length;
>> >> >         u64 chunk_offset;
>> >> >         int ret = 0;
>> >> > +       int ro_set = 0;
>> >> >         int slot;
>> >> >         struct extent_buffer *l;
>> >> >         struct btrfs_key key;
>> >> > @@ -3617,10 +3618,7 @@ int scrub_enumerate_chunks(struct scrub_ctx
>> >> *sctx,
>> >> >                 scrub_pause_on(fs_info);
>> >> >                 ret = btrfs_inc_block_group_ro(root, cache);
>> >> >                 scrub_pause_off(fs_info);
>> >> > -               if (ret) {
>> >> > -                       btrfs_put_block_group(cache);
>> >> > -                       break;
>> >> > -               }
>> >> > +               ro_set = !ret;
>> >> >
>> >> >                 dev_replace->cursor_right = found_key.offset +
>> length;
>> >> >                 dev_replace->cursor_left = found_key.offset; @@
>> >> > -3660,7 +3658,8 @@ int scrub_enumerate_chunks(struct scrub_ctx
>> >> > *sctx,
>> >> >
>> >> >                 scrub_pause_off(fs_info);
>> >> >
>> >> > -               btrfs_dec_block_group_ro(root, cache);
>> >> > +               if (ro_set)
>> >> > +                       btrfs_dec_block_group_ro(root, cache);
>> >> >
>> >> >                 btrfs_put_block_group(cache);
>> >> >                 if (ret)
>> >> > --
>> >> > 1.8.5.1
>> >> >
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>> >> > in the body of a message to majord...@vger.kernel.org More
>> >> > majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>> >>
>> >>
>> >> --
>> >> Filipe David Manana,
>> >>
>> >> "Reasonable men adapt themselves to the world.
>> >>  Unreasonable men adapt the world to themselves.
>> >>  That's why all progress depends on unreasonable men."
>> >
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in 
>> the body
>> of a message to majord...@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to