On Tue, Oct 14, 2014 at 9:45 PM, Steven Hartland via illumos-zfs <z...@lists.illumos.org> wrote:

>
> ----- Original Message ----- From: "Steven Hartland"
>
>> I've been investigating an issue for a user who was seeing
>> his pool import hang after upgrading on FreeBSD. After
>> digging around it turned out the issue was due to lack
>> of free space on the pool.
>>
>> As the pool imports it writes, hence requiring space, but
>> the pool has so little free space that this was failing.
>> The IO, being a required IO, retries, but obviously fails
>> again, resulting in the pool being suspended, hence the hang.
>>
>> With the pool suspended during import it still holds the
>> pool lock, so all attempts to query the status also hang,
>> which is a problem in itself, as the user can't tell why
>> the hang has occurred.
>>
>> During the debugging I mounted the pool read only and
>> sent a copy to another empty pool, which resulted in ~1/2
>> of the capacity being recovered. This seemed odd, but I
>> dismissed it at the time.
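>>
>> The recovery was along these lines (the pool and snapshot
>> names below are placeholders, not the real ones):
>>
>> # import without allowing any writes to the damaged pool
>> zpool import -o readonly=on oldpool
>> # replicate from an already-existing snapshot (a new one
>> # can't be created on a readonly pool) to a fresh pool
>> zfs send -R oldpool@existing-snap | zfs recv -duF newpool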
>>
>> The machine was then left with the pool not being accessed;
>> however, I just received an alert from our monitoring for
>> a pool failure. On looking I now see the new pool I created
>> has 2 write errors and no free space. So just having the
>> pool mounted, with no access happening, has managed to use
>> the remaining 2GB of the 4GB pool.
>>
>> Has anyone seen this before or has any ideas what might
>> be going on?
>>
>> zdb -m -m -m -m <pool> shows the allocations attributed to transactions, e.g.
>> metaslab    100   offset     c8000000   spacemap   1453   free        0
>>      segments          0   maxsize       0   freepct    0%
>> In-memory histogram:
>> On-disk histogram:              fragmentation 0
>> [     0] ALLOC: txg 417, pass 2
>> [     1]    A  range: 00c8000000-00c8001600  size: 001600
>> [     2] ALLOC: txg 417, pass 3
>> [     3]    A  range: 00c8001600-00c8003a00  size: 002400
>> [     4] ALLOC: txg 418, pass 2
>> [     5]    A  range: 00c8003a00-00c8005000  size: 001600
>> [     6] ALLOC: txg 418, pass 3
>> [     7]    A  range: 00c8005000-00c8006600  size: 001600
>> [     8] ALLOC: txg 419, pass 2
>> [     9]    A  range: 00c8006600-00c8007c00  size: 001600
>> [    10] ALLOC: txg 419, pass 3
>>
>> I tried destroying the pool and that hung, presumably due to
>> IO being suspended after the out of space errors.
>>
>
> After bisecting the kernel changes the commit which seems
> to be causing this is:
> https://svnweb.freebsd.org/base?view=revision&revision=268650
> https://github.com/freebsd/freebsd/commit/91643324a9009cb5fbc8c00544b7781941f0d5d1
> which correlates to:
> https://github.com/illumos/illumos-gate/commit/7fd05ac4dec0c343d2f68f310d3718b715ecfbaf
>
> I've checked that the two make the same changes, so there
> doesn't seem to have been a downstream merge issue, at least
> not on this specific commit.
>
> My test now consists of:
> 1. mdconfig -t malloc -s 4G -S 512
> 2. zpool create tpool md0
> 3. zfs recv -duF tpool < test.zfs
>

Does the problem reproduce if you are not doing a "blow away" receive, e.g.
"zfs recv -du tpool/test"?


> 4. zpool list -p -o free tpool 5
>
> With this commit present, free space reduces every 5 seconds
> until the pool is out of space. Without it, after at most 3
> reductions the pool settles and no further reduction in free
> space is seen.
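>
> For anyone wanting to reproduce this, the steps above as a
> single script look roughly like the following (assuming the
> md device comes back as md0 and test.zfs is the replication
> stream described below):
>
> #!/bin/sh
> # 4G malloc-backed md(4) device to host the test pool
> mdconfig -t malloc -s 4G -S 512
> zpool create tpool md0
> # receive the replication stream into the new pool
> zfs recv -duF tpool < test.zfs
> # watch free space every 5 seconds; with the commit present
> # it keeps shrinking until the pool runs out of space
> zpool list -p -o free tpool 5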
>
> I've also found that creating the pool with async_destroy
> disabled prevents the issue.
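>
> That is, something like:
>
> zpool create -o feature@async_destroy=disabled tpool md0
>
> (or zpool create -d, which disables all feature flags) does
> not show the free space reduction in the same test.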
>
> An image that shows the final result of the leak can be found
> here:
> http://www.ijs.si/usr/mark/bsd/
>
> On FreeBSD this image stalls on import unless imported readonly.
> Once imported I used the following to create the test image
> used above:
> zfs send -R zfs/ROOT@auto-2014-09-19_22.30 >test.zfs
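>
> (The import itself was done readonly, with something like
> "zpool import -o readonly=on zfs", before running the send.)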
>
> Copying in the illumos zfs list to get more eyeballs, given
> it seems to be quite a serious issue.
>
>    Regards
>    Steve
>
>