On Tue, Oct 14, 2014 at 9:45 PM, Steven Hartland via illumos-zfs < z...@lists.illumos.org> wrote:
> ----- Original Message ----- From: "Steven Hartland"
>
>> I've been investigating an issue for a user who was seeing his pool
>> import hang after upgrading on FreeBSD. After digging around, it turned
>> out the issue was due to lack of free space on the pool.
>>
>> As the pool imports it writes, hence requiring space, but the pool has
>> so little space that this was failing. The I/O, being a required I/O,
>> retries, but obviously fails again, resulting in the pool being
>> suspended, hence the hang.
>>
>> With the pool suspended during import it still holds the pool lock, so
>> all attempts to query the status also hang, which is one problem, as
>> the user can't tell why the hang has occurred.
>>
>> During the debugging I mounted the pool read-only and sent a copy to
>> another empty pool, which resulted in ~1/2 capacity being recovered.
>> This seemed odd, but I dismissed it at the time.
>>
>> The machine was then left, with the pool not being accessed; however, I
>> just received an alert from our monitoring for a pool failure. On
>> looking, I now see the new pool I created with 2 write errors and no
>> free space. So just having the pool mounted, with no access happening,
>> has managed to use the remaining 2GB on the 4GB pool.
>>
>> Has anyone seen this before, or has any ideas what might be going on?
>>
>> zdb -m -m -m -m <pool> shows allocation to transactions, e.g.
>> metaslab 100 offset c8000000 spacemap 1453 free 0
>> segments 0 maxsize 0 freepct 0%
>> In-memory histogram:
>> On-disk histogram: fragmentation 0
>> [  0] ALLOC: txg 417, pass 2
>> [  1]   A  range: 00c8000000-00c8001600  size: 001600
>> [  2] ALLOC: txg 417, pass 3
>> [  3]   A  range: 00c8001600-00c8003a00  size: 002400
>> [  4] ALLOC: txg 418, pass 2
>> [  5]   A  range: 00c8003a00-00c8005000  size: 001600
>> [  6] ALLOC: txg 418, pass 3
>> [  7]   A  range: 00c8005000-00c8006600  size: 001600
>> [  8] ALLOC: txg 419, pass 2
>> [  9]   A  range: 00c8006600-00c8007c00  size: 001600
>> [ 10] ALLOC: txg 419, pass 3
>>
>> I tried destroying the pool and that hung, presumably due to I/O being
>> suspended after the out-of-space errors.
>
> After bisecting the kernel changes, the commit which seems to be causing
> this is:
> https://svnweb.freebsd.org/base?view=revision&revision=268650
> https://github.com/freebsd/freebsd/commit/91643324a9009cb5fbc8c00544b7781941f0d5d1
> which correlates to:
> https://github.com/illumos/illumos-gate/commit/7fd05ac4dec0c343d2f68f310d3718b715ecfbaf
>
> I've checked that the two make the same changes, so there doesn't seem
> to have been a downstream merge issue, at least not on this specific
> commit.
>
> My test now consists of:
> 1. mdconfig -t malloc -s 4G -S 512
> 2. zpool create tpool md0
> 3. zfs recv -duF tpool < test.zfs

Does the problem reproduce if you are not doing a "blow away" receive,
e.g. "zfs recv -du tpool/test"?

> 4. zpool list -p -o free tpool 5
>
> With this commit present, free reduces every 5 seconds until the pool is
> out of space. Without it, after at most 3 reductions the pool settles
> and no further free space reduction is seen.
>
> I've also found that creating the pool without async_destroy enabled
> also prevents the issue.
>
> An image that shows the final result of the leak can be found here:
> http://www.ijs.si/usr/mark/bsd/
>
> On FreeBSD this image stalls on import unless imported readonly.
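As a quick sanity check on the space map dump above — this is a throwaway
script of my own, not part of zdb — the range records can be summed to
confirm that every entry is an allocation (A) with no matching free (F),
i.e. allocated space in this metaslab only ever grows:

```python
# Hypothetical helper (not an existing tool): sum the A/F range records
# from the "zdb -m -m -m -m" space map log above.
import re

ZDB_LOG = """\
[  0] ALLOC: txg 417, pass 2
[  1]   A  range: 00c8000000-00c8001600  size: 001600
[  2] ALLOC: txg 417, pass 3
[  3]   A  range: 00c8001600-00c8003a00  size: 002400
[  4] ALLOC: txg 418, pass 2
[  5]   A  range: 00c8003a00-00c8005000  size: 001600
[  6] ALLOC: txg 418, pass 3
[  7]   A  range: 00c8005000-00c8006600  size: 001600
[  8] ALLOC: txg 419, pass 2
[  9]   A  range: 00c8006600-00c8007c00  size: 001600
"""

RANGE_RE = re.compile(
    r'([AF])\s+range: ([0-9a-f]+)-([0-9a-f]+)\s+size: ([0-9a-f]+)')

def net_allocated(log: str) -> int:
    """Net allocated bytes: + for A (alloc) records, - for F (free)."""
    total = 0
    for kind, start, end, size in RANGE_RE.findall(log):
        nbytes = int(size, 16)
        # Each record's size field matches its start/end offsets.
        assert int(end, 16) - int(start, 16) == nbytes
        total += nbytes if kind == 'A' else -nbytes
    return total

print(hex(net_allocated(ZDB_LOG)))  # 0x7c00: five allocs, zero frees
```

The ranges are also contiguous (each starts where the previous one ended,
00c8000000 through 00c8007c00), consistent with small sync-pass writes
landing every txg and never being reclaimed.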
> Once imported, I used the following to create the test image used above:
> zfs send -R zfs/ROOT@auto-2014-09-19_22.30 > test.zfs
>
> Copying in the zfs illumos list to get more eyeballs, given it seems to
> be a quite serious issue.
>
> Regards
> Steve
>
> -------------------------------------------
> illumos-zfs
> Archives: https://www.listbox.com/member/archive/182191/=now
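The behaviour described in step 4 above — free space shrinking on every
5-second sample until the pool is exhausted, versus settling after at most
3 reductions — can be checked mechanically. A rough sketch (the function
name and threshold are my own, not an existing tool) that classifies a
series of free-space samples taken from "zpool list -Hp -o free <pool>":

```python
# Hypothetical leak check: given periodic samples of a pool's free bytes,
# decide whether free space settled (healthy) or keeps shrinking (leak),
# mirroring the "at most 3 reductions" observation from the report above.
def has_settled(samples, max_reductions=3):
    """True if free space stops decreasing after at most max_reductions drops."""
    reductions = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            reductions += 1
    return reductions <= max_reductions

# Healthy pool: a couple of initial drops, then stable.
healthy = [4294967296, 4294000000, 4293800000, 4293800000, 4293800000]
# Leaking pool: free shrinks on every 5-second sample.
leaking = [4294967296, 4200000000, 4100000000, 4000000000, 3900000000]

print(has_settled(healthy))  # True
print(has_settled(leaking))  # False
```

In a live run the samples would come from polling "zpool list -Hp -o free"
at a fixed interval; the parsing is trivial since -p prints exact byte
counts.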
_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer