Reviewed by: Prakash Surya <prakash.su...@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimi...@delphix.com>
A recent test failed while running ztest and produced the following result:

05/01 10:25:22 ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 9 -T 43 -P 11 -s 128m -f /var/tmp/os-ztest
zloop finished, 1 crashes found
running: cat ztest.cores
core @ ./zloop-180501-093007/core:
debugging core file of ztest (64-bit) from zloop-2472-bc7947b1-0.dcenter
file: /usr/bin/amd64/ztest
initial argv: /usr/bin/amd64/ztest
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=7585 uid=1025 code=-1

libc.so.1`_lwp_kill+0xa()
libc.so.1`raise+0x20(6)
libumem.so.1`umem_do_abort+0x44()
0xfffffd7ffe1583f5()
libumem.so.1`umem_error+0x1aa(7, 4599e8, 4a2feb0)
libumem.so.1`umem_free+0x11b(4a2feb0, 2e)
libzpool.so.1`log_internal+0xd2(4656070, fffffd7ff9bcd2d0, 1d42000, e93c700, fffffd7ff9bd280e, fffffd7fd6738e20)
libzpool.so.1`spa_history_log_internal+0x162(1d42000, fffffd7ff9bcd2d0, 0, fffffd7ff9bd280e)
libzpool.so.1`spa_async_thread+0x318(1d42000)
libc.so.1`_thrp_setup+0x8a(fffffd7ffdc0c240)
libc.so.1`_lwp_start()

It looks like vsnprintf(NULL, 0) returned a shorter length than we actually wrote. Incidentally, we should probably implement a kmem_vasprintf(), like kmem_asprintf(), and use it here rather than rolling our own.
::umem_status
Status:		ready and active
Concurrency:	4
Logs:		transaction=64k content=64k (inactive)
Message buffer:
umem allocator: bad free: free size (46) != alloc size (42)
buffer=4a2feb0  bufctl=4a310e0  cache: umem_alloc_48
previous transaction on buffer 4a2feb0:
thread=532  time=T-0.000005259  slab=49f0c40  cache: umem_alloc_48
libumem.so.1`umem_cache_alloc_debug+0xfd
libumem.so.1`umem_cache_alloc+0xb3
libumem.so.1`umem_alloc+0x64
libzpool.so.1`log_internal+0x97
libzpool.so.1`spa_history_log_internal+0x162
libzpool.so.1`spa_async_thread+0x318
libc.so.1`_thrp_setup+0x8a
libc.so.1`_lwp_start+0x0
umem: heap corruption detected
stack trace:
libumem.so.1`umem_err_recoverable+0xcd
libumem.so.1`umem_error+0x1aa
libumem.so.1`umem_free+0x11b
libzpool.so.1`log_internal+0xd2
libzpool.so.1`spa_history_log_internal+0x162
libzpool.so.1`spa_async_thread+0x318
libc.so.1`_thrp_setup+0x8a
libc.so.1`_lwp_start+0x0

fffffd7ff9bd280e/s
0xfffffd7ff9bd280e:	pool '%s' size: %llu(+%llu)
4a2feb0/s
0x4a2feb0:		pool 'ztest_tmp' size: 2281701376(+134217728)

The calling code is:

	uint64_t old_space, new_space;
	...
	spa_history_log_internal(spa, "vdev online", NULL,
	    "pool '%s' size: %llu(+%llu)",
	    spa_name(spa), new_space, new_space - old_space);

It looks like the string returned by spa_name() changed while we were formatting, between the two calls to vsnprintf(). This can happen when ztest_spa_rename() calls spa_rename(). I'm not sure what the point of that is, since spa_rename() appears to be dead code aside from the test case. The obvious way to fix this would be for everyone using spa_name() to hold the spa_namespace_lock, but that is probably impractical. We should consider removing spa_rename() instead.
Upstream bug: DLPX-58283

You can view, comment on, or merge this pull request online at:

  https://github.com/openzfs/openzfs/pull/676

-- Commit Summary --

* 9681 ztest failure in spa_history_log_internal due to spa_rename()

-- File Changes --

M usr/src/cmd/ztest/ztest.c (74)
M usr/src/uts/common/fs/zfs/spa_misc.c (53)
M usr/src/uts/common/fs/zfs/sys/spa.h (1)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/676.patch
https://github.com/openzfs/openzfs/pull/676.diff

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/676

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/Td1977e8acd30c75f-M4064a76bc659cc4d41fa807b
Delivery options: https://openzfs.topicbox.com/groups/developer/subscription