So, I am thinking of modifying dbuf_read_impl() to fill the ARC buf with holes
of the appropriate level, instead of doing bzero(), when reading metadata-level
holes whose birth epoch is greater than zero (or perhaps greater than the
hole_birth feature txg).
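
Roughly something like this in the branch of dbuf_read_impl() that currently
just bzero()s the buffer for a hole (a sketch only, untested; "dn" below stands
for the dnode backing db, i.e. what DB_DNODE(db) returns):

    /*
     * Sketch: when reading a level > 0 dbuf backed by a hole whose birth
     * epoch is non-zero, backfill the buffer with child hole block
     * pointers that carry the same birth epoch, so zfs send still sees
     * them as "new" holes.
     */
    if (db->db_level > 0 && db->db_blkptr != NULL &&
        BP_IS_HOLE(db->db_blkptr) && db->db_blkptr->blk_birth != 0) {
            blkptr_t *bps = db->db.db_data;
            int i, nbps = db->db.db_size / sizeof (blkptr_t);

            for (i = 0; i < nbps; i++) {
                    blkptr_t *bp = &bps[i];

                    BP_SET_LSIZE(bp, BP_GET_LEVEL(db->db_blkptr) == 1 ?
                        dn->dn_datablksz : 1ULL << dn->dn_indblkshift);
                    BP_SET_TYPE(bp, BP_GET_TYPE(db->db_blkptr));
                    BP_SET_LEVEL(bp, BP_GET_LEVEL(db->db_blkptr) - 1);
                    BP_SET_BIRTH(bp, db->db_blkptr->blk_birth, 0);
            }
    }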


Boris.

________________________________
From: Boris <bprotopo...@hotmail.com>
Sent: Tuesday, November 17, 2015 3:00 PM
To: Matthew Ahrens
Cc: developer@open-zfs.org; zfs-de...@list.zfsonlinux.org
Subject: Re: [OpenZFS Developer] zfs send not detecting new holes


Hi, Matt,


I believe I did reproduce the problem. The difficulty was really with creating
an L1 hole, which I managed via a zfs recv of an empty L1 range from one zvol
to another: the target zvol ended up with an L1 hole in place of the L1 range
that was filled with L0 holes in the source zvol.

The issue that I see is as follows (the datasets have compression on, and the
pool has the hole_birth feature active). If the L1 hole is later partially
overwritten with non-zero data, a new L1 block is allocated and partially
filled in with new L0 block pointers pointing to the non-zero blocks.
Unfortunately, the rest of the L1 block appears to be left initialized with
zeros (to zdb, it looks like a bunch of holes with a birth epoch of 0). This is
the wrong thing to do: the hole at the end of the L1 range in question now
looks "old", whereas it should retain the birth epoch of the original L1 hole
and look "new". But it does not, so the next zfs send disregards this hole,
which results in lost FREE record(s) in the corresponding zfs send stream.
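
(For reference, the way I understand the send side, the hole's birth epoch is
exactly what decides whether a FREE record gets emitted; a simplified sketch of
that decision, not the literal dmu_traverse/dmu_send code:)

    /*
     * Simplified illustration (not the literal OpenZFS code): during an
     * incremental send, anything whose birth txg is at or before the
     * from-snapshot's txg is skipped, so a hole that wrongly carries
     * birth epoch 0 never produces a FREE record.
     */
    static boolean_t
    hole_needs_free_record(const blkptr_t *bp, uint64_t fromsnap_txg)
    {
            if (bp->blk_birth <= fromsnap_txg)
                    return (B_FALSE);       /* treated as "old", skipped */

            return (BP_IS_HOLE(bp));        /* "new" hole => FREE record */
    }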

I have the datasets snapshotted and kept locally; I can reproduce this problem
and can dump any zdb data if needed. Here are some snippets of the zdb output:

Before the overwrite:

          7c0000   L0 1:2584800:10000 10000L/10000P F=1 B=346/346

          7d0000   L0 1:2594800:10000 10000L/10000P F=1 B=346/346

          7e0000   L0 1:25a4800:10000 10000L/10000P F=1 B=346/346

          7f0000   L0 1:25b4800:10000 10000L/10000P F=1 B=346/346

          800000  L1  4000L B=548

         1000000  L1  0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268

         1000000   L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268

         1010000   L0 0:7818c00:10000 10000L/10000P F=1 B=1268/1268

         1020000   L0 0:7828c00:10000 10000L/10000P F=1 B=1268/1268

The L1 hole is at offset 800000. After the partial overwrite (10 blocks written 
at the beginning of the L1 range):


          7d0000   L0 1:2594800:10000 10000L/10000P F=1 B=346/346

          7e0000   L0 1:25a4800:10000 10000L/10000P F=1 B=346/346

          7f0000   L0 1:25b4800:10000 10000L/10000P F=1 B=346/346

          800000  L1  0:ea36000:600 1:f2d6000:600 4000L/600P F=10 B=1749/1749

          800000   L0 1:f246000:10000 10000L/10000P F=1 B=1749/1749

          810000   L0 1:f256000:10000 10000L/10000P F=1 B=1749/1749

          820000   L0 1:f266000:10000 10000L/10000P F=1 B=1749/1749

          830000   L0 1:f276000:10000 10000L/10000P F=1 B=1749/1749

          840000   L0 1:f286000:10000 10000L/10000P F=1 B=1749/1749

          850000   L0 1:f296000:10000 10000L/10000P F=1 B=1749/1749

          860000   L0 1:f2a6000:10000 10000L/10000P F=1 B=1749/1749

          870000   L0 1:f2c6000:10000 10000L/10000P F=1 B=1749/1749

          880000   L0 1:f2b6000:10000 10000L/10000P F=1 B=1749/1749

          890000   L0 0:ea26000:10000 10000L/10000P F=1 B=1749/1749

         1000000  L1  0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268

         1000000   L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268

Dump of the new L1 block's contents:

# zdb tpool -R 0:ea36000:600:di

Found vdev: /dev/sdk1

DVA[0]=<1:f246000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=2041f382f58d:408f994ef048de7:daf0e1cadf74f53:47c3fdb952a1e13f

DVA[0]=<1:f256000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=20306f1938d5:403381ccd468d8a:713a193137858160:d91b1c5cecb306af

DVA[0]=<1:f266000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=1fe160444ab9:3fcbaeb1f31c86e:11198655b490d3b5:76cd3d278385af3e

DVA[0]=<1:f276000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=201cb0f386db:4035cd7deb41749:6b4d734a11ce04b5:1fbc2dc2f169dcae

DVA[0]=<1:f286000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=202d87c1a695:403d7ff6a1a6f53:66b049fa47216fb4:848b133855fab5b

DVA[0]=<1:f296000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=200db48ae914:40163392a6e1f2a:62ad5c6b01c39d36:b1fa1b14d986fa82

DVA[0]=<1:f2a6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=1feb72709d3a:3f8dbd7a3f1e98f:ab207f926cc8b2fc:c9a1145f06e1f9ab

DVA[0]=<1:f2c6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=1fd90de96ff0:3f98dc852f15900:a3cd2aed016bc0a9:eb9f507ffe495f15

DVA[0]=<1:f2b6000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=20361039de8d:40b6d13e4438295:75686dbb7da50937:3217ceae84d5b538

DVA[0]=<0:ea26000:10000> [L0 zvol object] fletcher4 uncompressed LE contiguous 
unique single size=10000L/10000P birth=1749L/1749P fill=1 
cksum=202a397fecd2:405130f54a9a83d:cda024b471659627:edf740c0ca1563eb

HOLE [L0 unallocated] size=200L birth=0L

HOLE [L0 unallocated] size=200L birth=0L

HOLE [L0 unallocated] size=200L birth=0L

....

HOLE [L0 unallocated] size=200L birth=0L

HOLE [L0 unallocated] size=200L birth=0L

HOLE [L0 unallocated] size=200L birth=0L

The uncompressed data is likely due to the /dev/urandom source. The volume does
have lz4 compression set (and had it before the overwrite; it is inherited from
the pool).

By induction, a similar issue is likely to arise with an Ln hole when it is 
partially overwritten with non-hole block pointers. The remainder of the new 
indirect block allocated in place of the Ln hole needs to be backfilled with 
Ln-1 holes with the same birth epoch as the original Ln hole.
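
For concreteness, what would distinguish the backfilled entries from the
birth-0 holes that zdb shows above is just the birth field of the hole block
pointer (illustration only; "parent" here stands for the original Ln hole's
block pointer):

    blkptr_t hole = { 0 };        /* no DVAs, so BP_IS_HOLE() holds */

    BP_SET_LEVEL(&hole, BP_GET_LEVEL(parent) - 1);
    /* Retain the original Ln hole's birth epoch instead of leaving it 0. */
    BP_SET_BIRTH(&hole, parent->blk_birth, 0);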

At this time, it is not clear to me how this is best accomplished. Any pointers
would be highly appreciated.

Best regards,
Boris.

________________________________
From: Matthew Ahrens <mahr...@delphix.com>
Sent: Monday, November 16, 2015 5:14 PM
To: Boris
Cc: zfs-de...@list.zfsonlinux.org; developer@open-zfs.org
Subject: Re: [OpenZFS Developer] zfs send not detecting new holes



On Mon, Nov 16, 2015 at 4:36 AM, Boris 
<bprotopo...@hotmail.com<mailto:bprotopo...@hotmail.com>> wrote:
I should have been more specific: in my case I see the problem with zvols. The
first snapshot has a non-zero block, the next snapshot has the block
overwritten with zeros, but the stream lacks the FREE record. The zvol is
~1.2T, with a 64k block size, sparse, and has lz4 compression on.

In that case I don't think your problem is related to the bug I mentioned,
which only has to do with objects that have been reallocated.  You must be
seeing a different issue.  We also cannot reproduce your issue with a simple
test case.

--matt


Typos courtesy of my iPhone

On Nov 15, 2015, at 12:25 PM, Matthew Ahrens 
<mahr...@delphix.com<mailto:mahr...@delphix.com>> wrote:

btw, here is the bug you're asking about:
https://www.illumos.org/issues/6370

--matt

On Sun, Nov 15, 2015 at 9:24 AM, Matthew Ahrens 
<mahr...@delphix.com<mailto:mahr...@delphix.com>> wrote:
We have a fix for this that we need to upstream.  We are waiting on code 
reviews for another change to send/receive:

https://github.com/openzfs/openzfs/pull/23
6393 zfs receive a full send as a clone

I'll probably stop waiting soon and RTI it; then we can get our fix for this in.

--matt

On Sun, Nov 15, 2015 at 8:37 AM, Boris 
<bprotopo...@hotmail.com<mailto:bprotopo...@hotmail.com>> wrote:

Hi, guys,


I've been looking at an issue where sometimes, after non-zero data blocks are
overwritten with zero blocks with compression on, the corresponding incremental
send stream does not include the FREE record for those blocks. The zdb -ddddddd
output seems to indicate that the blocks in question have never been written
(the offsets for them are not listed in the output).


This looks like the issue addressed by


commit a4069eef2e403a3b2a307b23b7500e2adc6ecae5

Author: Prakash Surya 
<prakash.su...@delphix.com<mailto:prakash.su...@delphix.com>>

Date:   Fri Mar 27 13:03:22 2015 +1100


    Illumos 5695 - dmu_sync'ed holes do not retain birth time


but I certainly do have that commit. I have experimented with overwriting
blocks at different offsets and with ranges of blocks spanning L1 and L2 block
pointers, but I cannot reproduce the issue.


Any suggestions for directions to look in? Perhaps a way to shape the block
tree such that this problem could arise?


Best regards,

Boris.

_______________________________________________
developer mailing list
developer@open-zfs.org<mailto:developer@open-zfs.org>
http://lists.open-zfs.org/mailman/listinfo/developer



