On Thu, Jun 11, 2015 at 7:17 PM, Jeff Mahoney <je...@suse.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 6/11/15 12:47 PM, Filipe David Manana wrote:
>> On Thu, Jun 11, 2015 at 4:20 PM,  <je...@suse.com> wrote:
>>> From: Jeff Mahoney <je...@suse.com>
>>>
>>> Btrfs doesn't track superblocks with extent records so there is
>>> nothing persistent on-disk to indicate that those blocks are in
>>> use.  We track the superblocks in memory to ensure they don't get
>>> used by removing them from the free space cache when we load a
>>> block group from disk.  Prior to 47ab2a6c6a (Btrfs: remove empty
>>> block groups automatically), that was fine since the block group
>>> would never be reclaimed so the superblock was always safe.  Once
>>> we started removing the empty block groups, we were protected by
>>> the fact that discards weren't being properly issued for unused
>>> space either via FITRIM or -odiscard.  The block groups were
>>> still being released, but the blocks remained on disk.
>>>
>>> In order to properly discard unused block groups, we need to
>>> filter out the superblocks from the discard range.  Superblocks
>>> are located at fixed locations on each device, so it makes sense
>>> to filter them out in btrfs_issue_discard, which is used by both
>>> -odiscard and FITRIM.
>>>
>>> Signed-off-by: Jeff Mahoney <je...@suse.com> ---
>>> fs/btrfs/extent-tree.c | 50
>>> ++++++++++++++++++++++++++++++++++++++++++++------ 1 file
>>> changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 0ec3acd..75d0226 100644 --- a/fs/btrfs/extent-tree.c +++
>>> b/fs/btrfs/extent-tree.c @@ -1884,10 +1884,47 @@ static int
>>> remove_extent_backref(struct btrfs_trans_handle *trans, return
>>> ret; }
>>>
>>> -static int btrfs_issue_discard(struct block_device *bdev, -
>>> u64 start, u64 len) +#define in_range(b, first, len)        ((b)
>>> >= (first) && (b) < (first) + (len))
>>
>> Hi Jeff,
>>
>> So this will work if every caller behaves well and passes a region
>> whose start and end offsets are a multiple of the sector size
>> (4096) which currently matches the superblock size.
>>
>> However, I think it would be safer to check for the case where the
>> start offset of a superblock mirror is < (first) and (sb_offset +
>> sb_len) > (first).  Just to deal with cases where for example the
>> 2nd half of the sb starts at offset (first).
>>
>> I guess this sectorsize becoming less than 4096 will happen sooner
>> or later with the subpage sectorsize patch set, so it wouldn't hurt
>> to make it more bullet proof already.
>
> Is that something anyone intends to support?  While I suppose the
> subpage sector patch /could/ be used to allow file systems with a node
> size under 4k, the intention is the other way around -- systems that
> have higher order page sizes currently don't work with btrfs file
> system created on systems with smaller order page sizes like x86.
> Btrfs already has high enough metadata overhead.  Pretty much all new
> hardware has, at least, a native 4k sector size even if it's
> abstracted behind a RMW layer.  The sectors are only going to get
> larger.  With the metadata overhead that btrfs already incurs, I can't
> imagine any production use case with smaller sector sizes.
>
> Are we looking to support <4k nodes to test the subpage sector code on
> x86?  If so, then I'll change this to handle the possibility of
> superblocks crossing sector boundaries.  Otherwise, it's protecting
> against a use case that just shouldn't happen.

I understand your point.
I'm probably being too paranoid. But it's exactly because it's not
supposed to happen that at least an assertion or something should be
added imho. A lot of "not supposed not happen things" happen often,
and that's often how people lose data, and get into other bad issues.

And I think I've heard once of supporting <4k nodes (sectorsizes) for
testing at least on x86 for e.g, but I might have not understood it
correctly. Having such a check would help detect bugs during
development where some caller passes a wrong range to discard - better
to find it during development/RCs rather than in production.

But anyway, just a personal preference.

thanks

>
>> Otherwise it looks good to me. I'll give a test on this patchset
>> soon.
>
> Thanks,
>
> - -Jeff
>
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>
> iQIcBAEBAgAGBQJVedCiAAoJEB57S2MheeWymA4P/2Y+YGwAkcbYxrlMVH/wIGpN
> j+qO/y1YHNJQNiRHt/1ahM8wy2DEEbqrD36xUuX5lDYRREo9jeqIGGajOOHg/KFw
> 7q6zIWVEEwom/RKd9CX48TC2pHly5Fw8ESyi+A857KJ1ZHcpKdyNcIwle/Jsoe0q
> a+SX6hJPlHFVai/QZhBRO00ZgXlTwjXAGyfgmfuHERY6lS5PBmoA8tYxnigpnBOa
> PUrgw+Cdn4glrZUTpt3WN4H5oE+pD6cGMQ+GnFXQYaytssyQNuPpCWdQ1Aferg7u
> Af3E6iBj776bQIRTWZSwjXTMLWHVjnBmdU8D+wXE7Ar3oU1POL7NLvnGm4Yr9TWZ
> n1NXcBhJ4QQQXBprK3bI+WNSzMzMLdvJHIb5t2Z9BO8wd5Ws0QlmT8Gl+u/ZofZU
> 8eouhLgu1hZzPeJgXPuDu0S8QaFtpI0zlupOwJByHp9QSzpUw98xqlFXIRtnLc48
> n/ZMFi98EMroaQx0hzFLGMwp/57tUFWUUroDkfP1NngoE482ohPHAdREtr7+RZe6
> h+pfF+//5CycYxk8BciBmCyLWWTW1WdDL/MzQsXdKE+797cNk79+W5EKYLmQiAbW
> IGyRXMj+XPpWNC1VM12JJXdoOujSGocmJa28jKanZRzqw0HKJOlTUBqNoBV83Bph
> ChcTZyhAi3PCsib+mNnW
> =wb8l
> -----END PGP SIGNATURE-----



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to