On Mon, 26 Nov 2012, Konstantin Belousov wrote:

On Mon, Nov 26, 2012 at 06:31:34AM -0800, sig6247 wrote:

Just checked out r243529, this only happens when the kernel is compiled
by clang, and only on i386, either recompiling the kernel with gcc or
booting from a UFS root works fine. Is it a known problem?
It looks like that clang uses more stack than gcc, and zfs makes quite
deep call chains.

It would be a waste, generally, to increase the init process kernel
stack size only to pacify zfs. And I suspect that it would not help
in the similar situations when the same procedure initiated for non-root
mounts.

Or to pacify clang...

----------------------------------------------------------------------
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:zroot []...

Fatal double fault:
eip = 0xc0adc37d
esp = 0xc86bffc8
ebp = 0xc86c003c
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at      kdb_enter+0x3d: movl    $0,kdb_why
db> bt
Tracing pid 1 tid 100002 td 0xc89efbc0
kdb_enter(c1064aa4,c1064aa4,c10b806f,c139e3b8,f5eacada,...) at kdb_enter+0x3d
panic(c10b806f,1,1,1,c86c003c,...) at panic+0x14b
dblfault_handler() at dblfault_handler+0xab
--- trap 0x17, eip = 0xc0adc37d, esp = 0xc86bffc8, ebp = 0xc86c003c ---
witness_checkorder(c1fd7508,9,c109df18,7fa,0,...) at witness_checkorder+0x37d
__mtx_lock_flags(c1fd7518,0,c109df18,7fa,c135d918,...) at __mtx_lock_flags+0x87
uma_zalloc_arg(c1fd66c0,0,1,4d3,c86c0110,...) at uma_zalloc_arg+0x605
vm_map_insert(c1fd508c,c13dfc10,bb1f000,0,cba1e000,...) at vm_map_insert+0x499
kmem_back(c1fd508c,cba1e000,1000,3,c86c01d4,...) at kmem_back+0x76
kmem_malloc(c1fd508c,1000,3) at kmem_malloc+0x250
page_alloc(c1fd1d80,1000,c86c020b,3,c1fd1d80,...) at page_alloc+0x27
keg_alloc_slab(103,4,c109df18,870,cb99ef6c,...) at keg_alloc_slab+0xc3
keg_fetch_slab(103,c1fd1d80,cb99ef6c,c1fc8230,c86c02c0,...) at 
keg_fetch_slab+0xe2
zone_fetch_slab(c1fd1d80,c1fd0480,103,826,0,...) at zone_fetch_slab+0x43
uma_zalloc_arg(c1fd1d80,0,102,3,2,...) at uma_zalloc_arg+0x3f2
malloc(4c,c1686100,102,c86c0388,c173d09a,...) at malloc+0xe9
zfs_kmem_alloc(4c,102,cb618820,c89efbc0,cb618820,...) at zfs_kmem_alloc+0x20
vdev_mirror_io_start(cb8218a0,10,cb8218a0,1,0,...) at vdev_mirror_io_start+0x14a
zio_vdev_io_start(cb8218a0,c89efbc0,0,cb8218a0,c86c0600,...) at 
zio_vdev_io_start+0x228
zio_execute(cb8218a0,cb618000,cba1b640,cb900000,400,...) at zio_execute+0x106
spa_load_verify_cb(cb618000,0,cba1b640,cb884b40,c86c0600,...) at 
spa_load_verify_cb+0x89
traverse_visitbp(cb884b40,cba1b640,c86c0600,c86c0ba0,0,...) at 
traverse_visitbp+0x29f
traverse_dnode(cb884b40,0,0,8b,0,...) at traverse_dnode+0x92
traverse_visitbp(cb884bb8,cba07200,c86c0890,cb884bf4,c16ce7e0,...) at 
traverse_visitbp+0xe47
traverse_visitbp(cb884bf4,cb9bf840,c86c0968,c86c0ba0,0,...) at 
traverse_visitbp+0xf32
traverse_dnode(cb884bf4,0,0,0,0,...) at traverse_dnode+0x92
traverse_visitbp(0,cb618398,c86c0b50,2,cb9f1c78,...) at traverse_visitbp+0x96d
traverse_impl(0,0,cb618398,3e1,0,...) at traverse_impl+0x268
traverse_pool(cb618000,3e1,0,d,c1727830,...) at traverse_pool+0x79
spa_load(0,1,c86c0ec4,1e,0,...) at spa_load+0x1dde
spa_load(0,0,c13d8c94,1,3,...) at spa_load+0x11a5
spa_load_best(0,ffffffff,ffffffff,1,c0adc395,...) at spa_load_best+0x71
spa_open_common(c17e0e1e,0,0,c86c1190,c16f5a1c,...) at spa_open_common+0x11a
spa_open(c86c1078,c86c1074,c17e0e1e,c135d918,c1fd7798,...) at spa_open+0x27
dsl_dir_open_spa(0,cb770030,c17e11b1,c86c11f8,c86c11f4,...) at 
dsl_dir_open_spa+0x6c
dsl_dataset_hold(cb770030,cb613800,c86c1240,cb613800,cb613800,...) at 
dsl_dataset_hold+0x3a
dsl_dataset_own(cb770030,0,cb613800,c86c1240,c1684e30,...) at 
dsl_dataset_own+0x21
dmu_objset_own(cb770030,2,1,cb613800,c86c1290,...) at dmu_objset_own+0x2a
zfsvfs_create(cb770030,c86c13ac,c17ee05d,681,0,...) at zfsvfs_create+0x4c
zfs_mount(cb78ed20,c17f411c,c9ff4600,c89cae80,0,...) at zfs_mount+0x42c
vfs_donmount(c89efbc0,4000,0,c86c1790,cb6c0800,...) at vfs_donmount+0xc6d
kernel_mount(cb7700b0,4000,0,0,1,...) at kernel_mount+0x6b
parse_mount(cb7700e0,c1194498,0,1,0,...) at parse_mount+0x606
vfs_mountroot(c13d95b0,4,c105c042,2bb,0,...) at vfs_mountroot+0x6cf
start_init(0,c86c1d08,c105e94c,3db,0,...) at start_init+0x6a
fork_exit(c0a42090,0,c86c1d08) at fork_exit+0x7f
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc86c1d40, ebp = 0 ---
db>

43 deep (before the double fault) is disgusting, but even if clang has
broken stack alignment due to a wrong default and no
-mpreferred-stack-boundary to fix it, that's still only about 8*43
extra bytes (8 for the average extra stack to align to 16 bytes).
Probably zfs is also putting large data structures on the stack.

It would be useful if the stack trace printed the the stack pointer
on every function call, so that you could see how much stack each
function used.

All those ', ...' printed after 5 args show further apparent clang
lossage.  vfs_mountroot takes no args at all, but the above shows
ot taking 5 known args and further unknown args.  ddb's stack tracer
does limited parsing of the bytes in function's caller to guess the
number of args.  It usually guesses right for gcc.  It apparently
usually guesses wrong for clang.

I can't test this for i386 since ref10-i386 is down, but for amd64 the
parsing seems to be already too dificult for gcc and impossible for
clang: for the function call in `void bar(void); void foo(void) {
bar(); }', clang generates a tail call:

% test:                                   # @test
%       .cfi_startproc
% # BB#0:                                 # %entry
%       xorb    %al, %al
%       jmp     bar                     # TAILCALL
% .Ltmp0:

The bytes after the tail call are unparseable.  The bytes before the
tail call are difficult to parse, and AFAIR ddb doesn't look at them
(the xorb says that there are no xmm args, but that is of no interest
in the kernel).

Gcc generates:

% test:
% .LFB2:
%       subq    $8, %rsp
% .LCFI0:
%       movl    $0, %eax
%       call    bar
%       addq    $8, %rsp
%       ret

The addq $8,%rsp after the call looks like it is popping 1 arg and is
probably misguessed as such -- it is actually to clean up the stack
alignment before return.  Actually, this guess doesn't apply to amd64
-- the first few args are in registers where there number is almost
impossible to guess even for gcc if there are none on stack; it is
only when there are some on the stack that the number there can possibly
be guessed and the total number = #number in registers + #on stack.

With 1 arg: `void bar(int x); void foo(int) { bar(x); }', the generated
code is identical for both compilers.  So have any chance of guessing the
number of args for bar(), bytes in bar()'s caller's callers would have
to be checked, recursively.  This is too hard for ddb.  With tail calls,
the checking is unbounded.

Tail calls also break the stack trace, but save space so running out of
kernel stack is less likely :-).

It is arguable a bug to compile kernels with options that make stack tracing
impossible.  Even passing args in registers makes correct arg printing
impossible -- ddb would have to assume that there are always at least
3 (?) args on amd64.  The above shows 3 being printed for fork_exit(),
which is correct, and 3 for kmem_malloc(), which is correct, and none
for fork_trampoline(), which depends on ddb having special knowledge
of this and a few other low-level/asm functions.  All the others are
broken.

An very old bug in this area are that functions written in asm don't have
normal frames (at least on amd64 and i386) unless profiling is configured. They are mostly leaf functions that shouldn't trap, so this is not very
important for interactive use of ddb.  But it makes the args of a function
like memcpy() harder to find in interactive use, and breaks stack traces
if an interrupt or trap occurs in the functions.  Compilers may also
omit frame pointers for leaf functions, but this is easy to prevent.
The efficiency gains from not using frame pointers are small or negative,
at least on modern x86 since the frame pointer management can run in
parallel and is especially fast in leaf functions where it is not actually
used (except for debugging), so frame pointers should never be omitted if
there is any chance that the code needs to be debugged.  The chance is
almost 100% for kernels since they can be debugged if GDB or DDB or stack
tracing is configured or if panic dumps are enabled and occur.

A not so old bug in this area is that compiling with -O2 or just with
with -march tends to break things.  Even on old i386 kernels compiled
with -O -march=i386, I mostly see 5 args (but no ', ...').  For a just
a few functions, the calling sequence is "call foo; addl $4*N,%esp"
and ddb correctly guesses that this means N args.  Another sequence
is "call foo; movl %eax,%esi; addl $8,%esp".  ddb doesn't understand
this at all, and misguesses that there are 5 args.

Bruce
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to