Hi everyone,
I've spent a bit of time poking around in #12255. The short version
is: a user reports that, on a 120G pool with dedup on, each additional
copy of a 100G file where part of the first 128k differs occupies an
extra ~3.2G, until the pool hits ENOSPC.
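A sketch of the repro as I read the report (the pool name and paths
here are mine, not the reporter's):

  # 100G of incompressible data, plus copies whose first 128k differs
  dd if=/dev/urandom of=/tank/copyme bs=1M count=102400
  for i in 2 3 4 5 6; do
      cp /tank/copyme /tank/copy$i
      dd if=/dev/urandom of=/tank/copy$i bs=128k count=1 conv=notrunc
  done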
Curiously, I can reproduce this on my own setup: git master, a 120G
file vdev on an SSD, plus single file-backed special and dedup vdevs
(to try to identify where the space goes), ashift 9, compression=lz4
(even though the 100G is from /dev/urandom), and recordsize=1M (though
that didn't noticeably change matters).
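Roughly how I built the pool; treat this as a sketch reconstructed
from the listings below rather than my exact command history:

  truncate -s 120G /deduppool2
  truncate -s 20G /dedup2dedup /dedup2special
  zpool create -o ashift=9 dedup2 /deduppool2 \
      special /dedup2special dedup /dedup2dedup
  zfs create -o dedup=on -o compression=lz4 -o recordsize=1M \
      dedup2/copyme
  dd if=/dev/urandom of=/dedup2/copyme/file1 bs=1M count=102400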
With one copy written, zfs list -o space reports:
NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2         15.3G  100G        0B     24K             0B       100G
dedup2/copyme  15.3G  100G        0B    100G             0B         0B
and zpool list -v reports:
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
dedup2             158G  99.9G  58.1G        -         -    0%    63%  1.00x  ONLINE  -
  /deduppool2      119G  99.9G  19.1G        -         -    0%  83.9%      -  ONLINE
dedup                 -      -      -        -         -     -      -      -       -
  /dedup2dedup    19.5G  31.3M  19.5G        -         -    0%  0.15%      -  ONLINE
special               -      -      -        -         -     -      -      -       -
  /dedup2special  19.5G  11.9M  19.5G        -         -    0%  0.05%      -  ONLINE
while at n=2 (I used entirely identical files)...
NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2         12.1G  200G        0B     24K             0B       200G
dedup2/copyme  12.1G  200G        0B    200G             0B         0B

NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
dedup2             158G   100G  57.9G        -         -    0%    63%  2.00x  ONLINE  -
  /deduppool2      119G   100G    19G        -         -    0%  84.0%      -  ONLINE
dedup                 -      -      -        -         -     -      -      -       -
  /dedup2dedup    19.5G  43.4M  19.5G        -         -    0%  0.21%      -  ONLINE
special               -      -      -        -         -     -      -      -       -
  /dedup2special  19.5G  22.8M  19.5G        -         -    0%  0.11%      -  ONLINE
...and so on, until a copy returns ENOSPC at n=6 (which tracks: ~3.2G
of AVAIL disappears per copy from 15.3G at n=1, so the sixth copy
presumably runs the pool dry partway through, hence USED of 587G
rather than 600G), while zpool list still reports 19G free:
NAME           AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
dedup2            0B  587G        0B     24K             0B       587G
dedup2/copyme     0B  587G        0B    587G             0B         0B

NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
dedup2             158G   100G  57.9G        -         -    0%    63%  5.87x  ONLINE  -
  /deduppool2      119G   100G    19G        -         -    0%  84.0%      -  ONLINE
dedup                 -      -      -        -         -     -      -      -       -
  /dedup2dedup    19.5G  40.9M  19.5G        -         -    0%  0.20%      -  ONLINE
special               -      -      -        -         -     -      -      -       -
  /dedup2special  19.5G  52.0M  19.4G        -         -    0%  0.26%      -  ONLINE
My hypothesis was metadata, even though 3% overhead seemed like a lot,
but closely examining a diff of the zdb -dddd output between the n=1
and n=2 cases, I couldn't see any obvious culprit. (The n=5 and n=6
dumps proved too much for anything but diff to handle, and hundreds of
MB of diff output was not particularly readable, even colorized.)
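For anyone who wants to repeat the comparison, the workflow was
roughly this (output file names are mine):

  zdb -dddd dedup2/copyme > n1.objects   # after the first copy
  # ...write the second copy, then:
  zdb -dddd dedup2/copyme > n2.objects
  diff n1.objects n2.objects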
zdb -bb was also not informative (I have copies, if anyone wants them,
but suffice it to say they just report 99.9G/200G/.../587G in total,
with nothing besides "ZFS plain file" reporting over 100M of usage).
Deleting the files frees the extra space, so it's not lost, it's
just...allocated somewhere I don't readily see a way to report.
Could someone actually familiar with how the dedup sausage works
glance at this? I'm not averse to diving in and investigating further
myself, but figured I'd at least try asking, since there seem to be a
whole bunch of moving parts, and I'm not entirely sure where to look
for "space zdb doesn't seem to count, but gets freed correctly".
(This also reproduces with similar overhead on sparse zvols, so it
doesn't seem to be specific to files?)
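The zvol variant, again as a sketch with my own names, where
big.random is ~100G of pre-generated random data living outside the
pool:

  zfs create -s -V 100G -o dedup=on dedup2/vol1
  zfs create -s -V 100G -o dedup=on dedup2/vol2
  dd if=/big.random of=/dev/zvol/dedup2/vol1 bs=1M
  dd if=/big.random of=/dev/zvol/dedup2/vol2 bs=1M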
Thanks to whoever has any insight,
- Rich