Reviewed by: Prakash Surya <prakash.su...@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimi...@delphix.com>

A recent ztest run failed and produced the following result:

05/01 10:25:22 ztest -VVVVV -m 2 -r 0 -R 1 -v 2 -a 9 -T 43 -P 11 -s 128m -f /var/tmp/os-ztest
zloop finished, 1 crashes found
running: cat ztest.cores
core @ ./zloop-180501-093007/core:
debugging core file of ztest (64-bit) from zloop-2472-bc7947b1-0.dcenter
file: /usr/bin/amd64/ztest
initial argv: /usr/bin/amd64/ztest
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=7585 uid=1025 code=-1
libc.so.1`_lwp_kill+0xa()
libc.so.1`raise+0x20(6)
libumem.so.1`umem_do_abort+0x44()
0xfffffd7ffe1583f5()
libumem.so.1`umem_error+0x1aa(7, 4599e8, 4a2feb0)
libumem.so.1`umem_free+0x11b(4a2feb0, 2e)
libzpool.so.1`log_internal+0xd2(4656070, fffffd7ff9bcd2d0, 1d42000, e93c700,
fffffd7ff9bd280e, fffffd7fd6738e20)
libzpool.so.1`spa_history_log_internal+0x162(1d42000, fffffd7ff9bcd2d0, 0,
fffffd7ff9bd280e)
libzpool.so.1`spa_async_thread+0x318(1d42000)
libc.so.1`_thrp_setup+0x8a(fffffd7ffdc0c240)
libc.so.1`_lwp_start()

It looks like vsnprintf(NULL, 0) returned a shorter length than we actually wrote. Incidentally, we should probably implement a kmem_vasprintf(), like kmem_asprintf(), and use it here rather than rolling our own.
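
For reference, a minimal sketch of what such a helper could look like, assuming it simply mirrors the existing two-pass kmem_asprintf() approach (a sketch only, not the actual patch):

char *
kmem_vasprintf(const char *fmt, va_list adx)
{
	va_list adx_copy;
	size_t size;
	char *buf;

	/* First pass: ask vsnprintf() how long the formatted string will be. */
	va_copy(adx_copy, adx);
	size = vsnprintf(NULL, 0, fmt, adx_copy) + 1;
	va_end(adx_copy);

	/* Second pass: format into a buffer of exactly that size. */
	buf = kmem_alloc(size, KM_SLEEP);
	(void) vsnprintf(buf, size, fmt, adx);

	return (buf);
}

Note that a helper like this only consolidates the duplicated sizing logic; it still formats twice, so by itself it would not close the race described below.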

::umem_status

Status: ready and active
Concurrency: 4
Logs: transaction=64k content=64k (inactive)
Message buffer:
umem allocator: bad free: free size (46) != alloc size (42)
buffer=4a2feb0 bufctl=4a310e0 cache: umem_alloc_48
previous transaction on buffer 4a2feb0:
thread=532 time=T-0.000005259 slab=49f0c40 cache: umem_alloc_48
libumem.so.1'umem_cache_alloc_debug+0xfd
libumem.so.1'umem_cache_alloc+0xb3
libumem.so.1'umem_alloc+0x64
libzpool.so.1'log_internal+0x97
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0
umem: heap corruption detected
stack trace:
libumem.so.1'umem_err_recoverable+0xcd
libumem.so.1'umem_error+0x1aa
libumem.so.1'umem_free+0x11b
libzpool.so.1'log_internal+0xd2
libzpool.so.1'spa_history_log_internal+0x162
libzpool.so.1'spa_async_thread+0x318
libc.so.1'_thrp_setup+0x8a
libc.so.1'_lwp_start+0x0

fffffd7ff9bd280e/s

0xfffffd7ff9bd280e: pool '%s' size: %llu(+%llu)

4a2feb0/s

0x4a2feb0: pool 'ztest_tmp' size: 2281701376(+134217728)

The calling code is:

uint64_t old_space, new_space;
...
            spa_history_log_internal(spa, "vdev online", NULL,
                "pool '%s' size: %llu(+%llu)",
                spa_name(spa), new_space, new_space - old_space);
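
For context, log_internal() (visible in both stacks above) sizes the history message with one vsnprintf(NULL, 0, ...) pass, allocates exactly that many bytes, formats the message with a second pass, and then frees strlen(msg) + 1 bytes. Roughly, as a simplified sketch (illustrative names, not the verbatim source):

static void
log_internal_sketch(const char *fmt, va_list adx)
{
	va_list adx_copy;
	char *msg;
	size_t len;

	/* Pass 1: size the message.  In this core, len came back as 42. */
	va_copy(adx_copy, adx);
	len = vsnprintf(NULL, 0, fmt, adx_copy) + 1;
	va_end(adx_copy);

	msg = kmem_alloc(len, KM_SLEEP);

	/* Pass 2: format for real.  A concurrent rename can make msg longer. */
	(void) vsprintf(msg, fmt, adx);

	/* ... record msg in the history nvlist ... */

	/* Frees strlen(msg) + 1 bytes -- 46 in this core. */
	strfree(msg);
}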

It looks like spa_name() changed out from under us between the two formatting passes: the message was sized at 42 bytes but freed at 46, and the freed buffer above already contains the renamed pool 'ztest_tmp'. This can happen when ztest_spa_rename() calls spa_rename(). I'm not sure what the point of spa_rename() is, since it appears to be dead code aside from this test case. The obvious way to fix this would be for every consumer of spa_name() to hold the spa_namespace_lock, but that is probably impractical. We should consider removing spa_rename() instead.
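
Purely as an illustration of why the lock-everywhere approach scales poorly (this is not what the change does), the call site above would have to turn into something like:

	/*
	 * Hypothetical illustration only: pin the pool name by holding
	 * spa_namespace_lock across the whole logging call, so spa_rename()
	 * cannot change it between the two formatting passes.  Repeating
	 * this at every spa_name() call site is the impractical part.
	 */
	mutex_enter(&spa_namespace_lock);
	spa_history_log_internal(spa, "vdev online", NULL,
	    "pool '%s' size: %llu(+%llu)",
	    spa_name(spa), new_space, new_space - old_space);
	mutex_exit(&spa_namespace_lock);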

Upstream bug: DLPX-58283
You can view, comment on, or merge this pull request online at:

  https://github.com/openzfs/openzfs/pull/676

-- Commit Summary --

  * 9681 ztest failure in spa_history_log_internal due to spa_rename()

-- File Changes --

    M usr/src/cmd/ztest/ztest.c (74)
    M usr/src/uts/common/fs/zfs/spa_misc.c (53)
    M usr/src/uts/common/fs/zfs/sys/spa.h (1)

-- Patch Links --

https://github.com/openzfs/openzfs/pull/676.patch
https://github.com/openzfs/openzfs/pull/676.diff
