So, I am thinking of modifying dbuf_read_impl() to fill the arc buf with holes of the appropriate level, instead of doing bzero(), when reading metadata-level holes whose birth epoch is greater than zero (or perhaps greater than the hole_birth feature txg).
Boris.

________________________________
From: Boris <bprotopo...@hotmail.com>
Sent: Tuesday, November 17, 2015 3:00 PM
To: Matthew Ahrens
Cc: developer@open-zfs.org; zfs-de...@list.zfsonlinux.org
Subject: Re: [OpenZFS Developer] zfs send not detecting new holes

Hi, Matt,

I believe I did reproduce the problem. The difficulty was really with creating an L1 hole, which I managed with a zfs recv of an empty L1 range from one zvol to another. The target zvol had an L1 hole in place of the L1 range filled with L0 holes in the source zvol.

The issue that I see is as follows (the datasets have compression on, the pool has the hole_birth feature active). If the L1 hole is later partially overwritten with non-zero data, a new L1 block is allocated and is partially filled in with new L0 block pointers pointing to non-zero blocks. Unfortunately, the rest of the L1 block appears to be left initialized with zeros (to zdb, it looks like a bunch of holes with 0 birth epoch). This is the wrong thing to do, because now the hole at the end of the L1 range in question is "old", whereas it should retain the birth epoch of the original L1 hole ("new"). So, the next zfs send disregards this hole, which results in lost FREE record(s) in the corresponding zfs send stream.

I have the datasets snapped and local, I can reproduce this problem and can dump any zdb data if needed.

Here are some snippets of the zdb output.

Before the overwrite:

7c0000 L0 1:2584800:10000 10000L/10000P F=1 B=346/346
7d0000 L0 1:2594800:10000 10000L/10000P F=1 B=346/346
7e0000 L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
7f0000 L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
800000 L1 4000L B=548
1000000 L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
1000000 L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
1010000 L0 0:7818c00:10000 10000L/10000P F=1 B=1268/1268
1020000 L0 0:7828c00:10000 10000L/10000P F=1 B=1268/1268

The L1 hole is at offset 800000.
After the partial overwrite (10 blocks written at the beginning of the L1 range):

7d0000 L0 1:2594800:10000 10000L/10000P F=1 B=346/346
7e0000 L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
7f0000 L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
800000 L1 0:ea36000:600 1:f2d6000:600 4000L/600P F=10 B=1749/1749
800000 L0 1:f246000:10000 10000L/10000P F=1 B=1749/1749
810000 L0 1:f256000:10000 10000L/10000P F=1 B=1749/1749
820000 L0 1:f266000:10000 10000L/10000P F=1 B=1749/1749
830000 L0 1:f276000:10000 10000L/10000P F=1 B=1749/1749
840000 L0 1:f286000:10000 10000L/10000P F=1 B=1749/1749
850000 L0 1:f296000:10000 10000L/10000P F=1 B=1749/1749
860000 L0 1:f2a6000:10000 10000L/10000P F=1 B=1749/1749
870000 L0 1:f2c6000:10000 10000L/10000P F=1 B=1749/1749
880000 L0 1:f2b6000:10000 10000L/10000P F=1 B=1749/1749
890000 L0 0:ea26000:10000 10000L/10000P F=1 B=1749/1749
1000000 L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
1000000 L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268

Dump of the new L1 block's contents:

# zdb tpool -R 0:ea36000:600:di
Found vdev: /dev/sdk1
DVA[0]=<1:f246000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=2041f382f58d:408f994ef048de7:daf0e1cadf74f53:47c3fdb952a1e13f
DVA[0]=<1:f256000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=20306f1938d5:403381ccd468d8a:713a193137858160:d91b1c5cecb306af
DVA[0]=<1:f266000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=1fe160444ab9:3fcbaeb1f31c86e:11198655b490d3b5:76cd3d278385af3e
DVA[0]=<1:f276000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=201cb0f386db:4035cd7deb41749:6b4d734a11ce04b5:1fbc2dc2f169dcae
DVA[0]=<1:f286000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=202d87c1a695:403d7ff6a1a6f53:66b049fa47216fb4:848b133855fab5b
DVA[0]=<1:f296000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=200db48ae914:40163392a6e1f2a:62ad5c6b01c39d36:b1fa1b14d986fa82
DVA[0]=<1:f2a6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=1feb72709d3a:3f8dbd7a3f1e98f:ab207f926cc8b2fc:c9a1145f06e1f9ab
DVA[0]=<1:f2c6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=1fd90de96ff0:3f98dc852f15900:a3cd2aed016bc0a9:eb9f507ffe495f15
DVA[0]=<1:f2b6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=20361039de8d:40b6d13e4438295:75686dbb7da50937:3217ceae84d5b538
DVA[0]=<0:ea26000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1 cksum=202a397fecd2:405130f54a9a83d:cda024b471659627:edf740c0ca1563eb
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
....
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L

The uncompressed data is likely due to the /dev/urandom source. The volume does have lz4 compression set (and had before the overwrite - inherited from the pool).

By induction, a similar issue is likely to arise with an Ln hole when it is partially overwritten with non-hole block pointers. The remainder of the new indirect block allocated in place of the Ln hole needs to be backfilled with Ln-1 holes with the same birth epoch as the original Ln hole.

At this time, it is not clear to me how this is best accomplished. Any pointers are highly appreciated.

Best regards,
Boris.
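The backfill idea described above can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not ZFS code: toy_bp_t, the toy_* functions and the from-snapshot txg of 400 are all hypothetical names and values made up for illustration, and the send-side rule is reduced to "a hole yields a FREE record only if its birth txg is later than the from-snapshot txg". The birth values 548 and 1749 and the count of 128 children are taken from the zdb output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for a ZFS block pointer. */
typedef struct {
    int      is_hole;  /* nonzero if nothing is allocated here */
    uint64_t birth;    /* birth txg; 0 reads as "has always been a hole" */
} toy_bp_t;

/* 128 children per L1 block, matching F=128 in the zdb output above. */
#define TOY_NBPS 128

/* Reported behavior: the new indirect block starts out zeroed, so every
 * child looks like a hole with birth 0. */
static void
toy_fill_zeroed(toy_bp_t *bps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bps[i].is_hole = 1;
        bps[i].birth = 0;
    }
}

/* Proposed behavior: every child hole inherits the parent hole's birth. */
static void
toy_fill_inherited(toy_bp_t *bps, size_t n, uint64_t parent_birth)
{
    for (size_t i = 0; i < n; i++) {
        bps[i].is_hole = 1;
        bps[i].birth = parent_birth;
    }
}

/* Overwrite `count` children starting at `first` with data born in `txg`. */
static void
toy_overwrite(toy_bp_t *bps, size_t n, size_t first, size_t count,
    uint64_t txg)
{
    assert(first <= n && count <= n - first);
    for (size_t i = first; i < first + count; i++) {
        bps[i].is_hole = 0;
        bps[i].birth = txg;
    }
}

/* Simplified send-side rule (an assumption of this sketch): a hole is
 * reported as FREE only if it was born after the from-snapshot txg. */
static size_t
toy_count_free_records(const toy_bp_t *bps, size_t n, uint64_t from_txg)
{
    size_t nfree = 0;

    for (size_t i = 0; i < n; i++) {
        if (bps[i].is_hole && bps[i].birth > from_txg)
            nfree++;
    }
    return (nfree);
}

/* Replays the scenario from the thread and returns the number of FREE
 * records the toy send would emit for the L1 range. */
size_t
toy_run_scenario(int inherit_birth)
{
    toy_bp_t bps[TOY_NBPS];
    const uint64_t hole_birth = 548;  /* B=548 of the L1 hole at 800000 */
    const uint64_t write_txg = 1749;  /* B=1749 of the partial overwrite */
    const uint64_t from_txg = 400;    /* made up: not given in the thread */

    if (inherit_birth)
        toy_fill_inherited(bps, TOY_NBPS, hole_birth);
    else
        toy_fill_zeroed(bps, TOY_NBPS);

    /* 10 blocks written at the beginning of the L1 range. */
    toy_overwrite(bps, TOY_NBPS, 0, 10, write_txg);

    return (toy_count_free_records(bps, TOY_NBPS, from_txg));
}
```

In this model, toy_run_scenario(0) returns 0, since all 118 remaining holes carry birth 0 and are disregarded, while toy_run_scenario(1) returns 118, since the remaining holes keep birth 548 and are reported. How the real backfill should be done inside ZFS remains the open question of the message above.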
________________________________
From: Matthew Ahrens <mahr...@delphix.com>
Sent: Monday, November 16, 2015 5:14 PM
To: Boris
Cc: zfs-de...@list.zfsonlinux.org; developer@open-zfs.org
Subject: Re: [OpenZFS Developer] zfs send not detecting new holes

On Mon, Nov 16, 2015 at 4:36 AM, Boris <bprotopo...@hotmail.com> wrote:

I should have been more specific: in my case I see the problem with zvols. The first snapshot has a non-zero block, the next snapshot has the block overwritten with zeros, but the stream lacks the free record. The zvol is ~1.2T, 64k block size, sparse, has lz4 compression on.

In that case I don't think your problem is related to the bug I mentioned, which only has to do with objects that have been reallocated. You must be seeing a different issue. We also can not reproduce your issue with a simple test case.

--matt

Typos courtesy of my iPhone

On Nov 15, 2015, at 12:25 PM, Matthew Ahrens <mahr...@delphix.com> wrote:

btw, here is the bug you're asking about: https://www.illumos.org/issues/6370

--matt

On Sun, Nov 15, 2015 at 9:24 AM, Matthew Ahrens <mahr...@delphix.com> wrote:

We have a fix for this that we need to upstream. We are waiting on code reviews for another change to send/receive:

https://github.com/openzfs/openzfs/pull/23
6393 zfs receive a full send as a clone

I'll probably stop waiting soon and RTI it, then we get our fix for this in.

--matt

On Sun, Nov 15, 2015 at 8:37 AM, Boris <bprotopo...@hotmail.com> wrote:

Hi, guys,

I've been looking at an issue where sometimes, after non-zero data blocks are overwritten with zero blocks with compression on, the corresponding incremental send stream does not include the FREE record for those blocks. The zdb -ddddddd output seems to indicate that the blocks in question have never been written (the offsets for them are not listed in the output).
This looks like the issue addressed by

commit a4069eef2e403a3b2a307b23b7500e2adc6ecae5
Author: Prakash Surya <prakash.su...@delphix.com>
Date: Fri Mar 27 13:03:22 2015 +1100

Illumos 5695 - dmu_sync'ed holes do not retain birth time

but I certainly do have that commit. I have experimented with overwriting blocks at different offsets and with ranges of blocks spanning L1 and L2 block pointers, but I cannot reproduce the issue.

Any suggestions for directions to look? Perhaps for a way to shape the block tree such that this problem could arise?

Best regards,
Boris.

_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer