Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-18 Thread Andrei Borzenkov
On Tue, May 18, 2021 at 1:52 PM Ulrich Windl
 wrote:
>
> Hi!
>
> I thought using the reflink feature of OCFS2 would be just a nice way to make 
> crash-consistent VM snapshots while they are running.
> As it is a bit tricky to find out how much data is shared between snapshots, 
> I started to write an utility to examine the blocks allocated to the VM 
> backing files and snapshots.
>
> Unfortunately (as it seems) OCFS2 fragments terribly under reflink snapshots.
>

Does it cause any real problem?
...
>
> So I wonder (while understanding the principle of copy-on-write for reflink 
> snapshots):
> Is there a way to avoid or undo the fragmentation?
>

If you understand the principles you can answer it yourself. To avoid
fragmentation you need to avoid reflink.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-18 Thread Strahil Nikolov
Are you using KVM ?
Maybe you can create a snapshot on VM level and then defragfs.ocfs2 the  
read-only part of the VM disk file and after the defrag -> merge it back by 
deleting the snapshot ?
Yet, the whole idea seems wrong to me. I would freeze the FS  and the 
application in the VM and then make a snapshot via your Virtualization tech 
stack.
Best Regards,Strahil Nikolov
 
 
  On Tue, May 18, 2021 at 13:52, Ulrich 
Windl wrote:   Hi!

I thought using the reflink feature of OCFS2 would be just a nice way to make 
crash-consistent VM snapshots while they are running.
As it is a bit tricky to find out how much data is shared between snapshots, I 
started to write an utility to examine the blocks allocated to the VM backing 
files and snapshots.

Unfortunately (as it seems) OCFS2 fragments terribly under reflink snapshots.

Here is an example of a rather "good" file: It has 85 extents that are rather 
large (not that the extents are sorted by first block; in reality it's a bit 
worse):
DEBUG(5): update_stats: blk_list[0]: 3551627-3551632 (6, 0x2000)
DEBUG(5): update_stats: blk_list[1]: 3553626-3556978 (3353, 0x2000)
DEBUG(5): update_stats: blk_list[2]: 16777217-16780688 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[3]: 16780689-16792832 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[4]: 17301147-17304618 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[5]: 17304619-17316762 (12144, 0x2000)
...
DEBUG(5): update_stats: blk_list[81]: 31178385-31190528 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[82]: 31191553-31195024 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[83]: 31195025-31207168 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[84]: 31210641-31222385 (11745, 0x2001)
filesystem: 655360 blocks of size 16384
655360 (100%) blocks type 0x2000 (shared)

And here's a terrible example (33837 extents):
DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[5]: 263587-263587 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[6]: 263597-263610 (14, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[7]: 270414-270415 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[90]: 382214-382406 (193, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[91]: 382791-382918 (128, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[92]: 382983-382990 (8, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[93]: 383520-383522 (3, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[94]: 384672-384692 (21, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[95]: 384860-384918 (59, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[96]: 385088-385089 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[97]: 385090-385091 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[805]: 2769213-2769213 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[806]: 2769214-2769214 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[807]: 2769259-2769259 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[808]: 2769261-2769261 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[809]: 2769314-2769314 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[810]: 2772041-2772042 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[811]: 2772076-2772076 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[812]: 2772078-2772078 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[813]: 2772079-2772080 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[814]: 2772096-2772096 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[815]: 2772099-2772099 (1, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[33829]: 39317682-39317704 (23, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33830]: 39317770-39317775 (6, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33831]: 39318022-39318045 (24, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33832]: 39318274-39318284 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33833]: 39318327-39318344 (18, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33834]: 39319157-39319166 (10, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33835]: 39319172-39319184 (13, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33836]: 39319896-39319936 (41, 0x2000)
filesystem: 1966076 blocks of size 16384
mapped=1121733 (57%)
1007658 (51%) blocks type 0x2000 (shared)
114075 (6%) blocks type 0x2800 (unwritten|shared)

So I wonder (while understanding the principle of copy-on-write for reflink 
snapshots):
Is there a way to avoid or undo the fragmentation?

Regards,
Ulrich

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  
___
Manage your subscription:

Re: [ClusterLabs] OCFS2 fragmentation with snapshots

2021-05-20 Thread Gang He

Hi Ulrich,



On 2021/5/18 18:52, Ulrich Windl wrote:

Hi!

I thought using the reflink feature of OCFS2 would be just a nice way to make 
crash-consistent VM snapshots while they are running.
As it is a bit tricky to find out how much data is shared between snapshots, I 
started to write an utility to examine the blocks allocated to the VM backing 
files and snapshots.

Unfortunately (as it seems) OCFS2 fragments terribly under reflink snapshots.

Here is an example of a rather "good" file: It has 85 extents that are rather 
large (not that the extents are sorted by first block; in reality it's a bit worse):
DEBUG(5): update_stats: blk_list[0]: 3551627-3551632 (6, 0x2000)
DEBUG(5): update_stats: blk_list[1]: 3553626-3556978 (3353, 0x2000)
DEBUG(5): update_stats: blk_list[2]: 16777217-16780688 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[3]: 16780689-16792832 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[4]: 17301147-17304618 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[5]: 17304619-17316762 (12144, 0x2000)
...
DEBUG(5): update_stats: blk_list[81]: 31178385-31190528 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[82]: 31191553-31195024 (3472, 0x2000)
DEBUG(5): update_stats: blk_list[83]: 31195025-31207168 (12144, 0x2000)
DEBUG(5): update_stats: blk_list[84]: 31210641-31222385 (11745, 0x2001)
filesystem: 655360 blocks of size 16384
655360 (100%) blocks type 0x2000 (shared)

And here's a terrible example (33837 extents):
DEBUG(4): finalize_blockstats: blk_list[0]: 257778-257841 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[1]: 257842-257905 (64, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[2]: 263503-263513 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[3]: 263558-263558 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[4]: 263559-263569 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[5]: 263587-263587 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[6]: 263597-263610 (14, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[7]: 270414-270415 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[90]: 382214-382406 (193, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[91]: 382791-382918 (128, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[92]: 382983-382990 (8, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[93]: 383520-383522 (3, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[94]: 384672-384692 (21, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[95]: 384860-384918 (59, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[96]: 385088-385089 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[97]: 385090-385091 (2, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[805]: 2769213-2769213 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[806]: 2769214-2769214 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[807]: 2769259-2769259 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[808]: 2769261-2769261 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[809]: 2769314-2769314 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[810]: 2772041-2772042 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[811]: 2772076-2772076 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[812]: 2772078-2772078 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[813]: 2772079-2772080 (2, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[814]: 2772096-2772096 (1, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[815]: 2772099-2772099 (1, 0x2000)
...
DEBUG(4): finalize_blockstats: blk_list[33829]: 39317682-39317704 (23, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33830]: 39317770-39317775 (6, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33831]: 39318022-39318045 (24, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33832]: 39318274-39318284 (11, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33833]: 39318327-39318344 (18, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33834]: 39319157-39319166 (10, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33835]: 39319172-39319184 (13, 0x2000)
DEBUG(4): finalize_blockstats: blk_list[33836]: 39319896-39319936 (41, 0x2000)
filesystem: 1966076 blocks of size 16384
mapped=1121733 (57%)
1007658 (51%) blocks type 0x2000 (shared)
114075 (6%) blocks type 0x2800 (unwritten|shared)

So I wonder (while understanding the principle of copy-on-write for reflink 
snapshots):
Is there a way to avoid or undo the fragmentation?


Since these files(the original file and the cloned files) share the same 
extent tree, after the files are written,the extents are split(fragmented).
There is a un-fragmentation tool in ocfs2-tools upstream, but it 
obviously do not work for this case(reflink file).
The workaround is, copy the cloned(have fragmentated) file to a new 
file, and delete the cloned file.


Thanks
Gang



Regards,
Ulrich

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscripti