http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59286

--- Comment #9 from Joost VandeVondele <Joost.VandeVondele at mat dot ethz.ch> ---
(In reply to Kostya Serebryany from comment #8)
> Just insert more printfs everywhere you can :) 
> E.g. print everything around "s->link = s2" in StackDepotPut

Hmm, I can write a lot of printfs, but that is not very targeted.

However, I think I got a little further. For this kind of crash:
Getting 0x7fffed22e328
Following 0x7ffff04b8a80
Following 0x40027bd6cd50653b

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed2f5100 (LWP 9760)]
0x00007fffebbae530 in __sanitizer::StackDepotGet (id=3978818064, size=0x0)
    at ../../../../gcc/libsanitizer/sanitizer_common/sanitizer_stackdepot.cc:197
197           if (s->id == id) {
(gdb) print s
$3 = (__sanitizer::StackDesc *) 0x40027bd6cd50653b

I have put a hardware watchpoint on this field:
break __sanitizer::StackDepotGet
awatch ((StackDesc*)0x7ffff04b8a80)->link
(which is the link that gets corrupted).
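
To make the failure mode concrete, here is a minimal sketch of a hash-bucket chain lookup of the kind StackDepotGet appears to perform. It is not the libsanitizer source: Node, Depot, Put, Get and kBuckets are hypothetical names, and the real StackDesc carries more fields. The point is only that the lookup dereferences each link in turn, so a link overwritten with garbage such as 0x40027bd6cd50653b faults at the first "s->id" read after the bad link has been followed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for __sanitizer::StackDesc.
struct Node {
  Node *link;   // next entry in the same bucket (the field being watched)
  uint32_t id;  // identifier compared during lookup
};

constexpr size_t kBuckets = 8;  // assumption: tiny table, for illustration only

struct Depot {
  Node *buckets[kBuckets] = {};
};

// Prepend a node to its bucket, in the spirit of "s->link = s2".
inline void Put(Depot &d, Node *s) {
  assert(s != nullptr && s->id != 0);
  Node *&head = d.buckets[s->id % kBuckets];
  s->link = head;
  head = s;
}

// Walk the chain. Every iteration trusts s->link; if another thread or a
// stray store has overwritten it, the next "s->id" read is the crash site.
inline const Node *Get(const Depot &d, uint32_t id) {
  for (const Node *s = d.buckets[id % kBuckets]; s != nullptr; s = s->link) {
    if (s->id == id) return s;
  }
  return nullptr;
}
```

With this shape, the node whose link is corrupted is not the one being dereferenced at the crash but the one visited just before it, which is why the watchpoint goes on the previous node's link field.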

This watchpoint is triggered from CP2K at:

[Switching to Thread 0x7fffed3ec100 (LWP 9804)]
Hardware access (read/write) watchpoint 13: ((StackDesc*)0x7ffff04b8a80)->link

Value = (PTR TO -> ( __sanitizer::StackDesc )) 0x40027bd6cd50653b
0x00007fffee8811fe in hfx_load_balance_methods::estimate_basic (p=...)
    at /data/vjoost/clean/cp2k/cp2k/src/../src/hfx_load_balance_methods.F:1829
1829           p1=p(1) ; p2=p(2) ; p3=p(3) ; p4=p(4)
(gdb) bt
#0  0x00007fffee8811fe in hfx_load_balance_methods::estimate_basic (p=...)
    at /data/vjoost/clean/cp2k/cp2k/src/../src/hfx_load_balance_methods.F:1829
#1  0x00007fffee881020 in hfx_load_balance_methods::cost_model (nsa=1, nsb=1, nsc=1, nsd=1, npgfa=6, npgfb=6, 
    npgfc=6, npgfd=6, ratio=-0.3026277383289448, p1=..., p2=..., p3=...)
    at /data/vjoost/clean/cp2k/cp2k/src/../src/hfx_load_balance_methods.F:1817
#2  0x00007fffee87ee8c in hfx_load_balance_methods::estimate_block_cost (natom=3, nkind=2, list_ij=..., 
    list_kl=..., set_list_ij=..., set_list_kl=..., iatom_start=1, iatom_end=1, jatom_start=1, jatom_end=1, 
    katom_start=1, katom_end=1, latom_start=1, latom_end=1, particle_set=..., coeffs_set=..., coeffs_kind=..., 
    is_assoc_atomic_block_global=..., do_periodic=.FALSE., kind_of=..., basis_parameter=..., pmax_set=..., 
    pmax_atom=..., pmax_blocks=0, cell=0x7d3000012d80, do_p_screening=.FALSE., map_atom_to_kind_atom=..., 
    eval_type=1, log10_eps_schwarz=-10, log_2=0.3010299956639812, coeffs_kind_max0=1.1049525569372649, 
    use_virial=.FALSE., atomic_pair_list=...)
    at /data/vjoost/clean/cp2k/cp2k/src/../src/hfx_load_balance_methods.F:2212

This is the 'correct' place for corruption, as this routine is only called for
those runs that segfault.

Potentially interesting is that this routine is also somewhat special in
Fortran: it is a contained subroutine, which presumably gets special treatment
from the compiler (I am not sure about the C-like equivalent, maybe nested
functions?).
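
For readers more at home in C-like languages, a rough analogue of a contained subroutine is a local function that reads the variables of the routine enclosing it. The sketch below uses a C++ lambda capturing by reference as that analogue; GNU C nested functions would be the closer match, and nothing here is taken from CP2K (HostRoutine, scale and estimate are hypothetical names). It only illustrates the property that matters for this bug hunt: the inner routine reaches into the host's stack frame through an implicit pointer.

```cpp
#include <array>

// Hypothetical host routine; names are illustrative, not from CP2K.
inline double HostRoutine(const std::array<double, 4> &p) {
  double scale = 2.0;  // local variable of the host

  // "Contained" helper: it takes no arguments and instead reads p and
  // scale from the enclosing frame, much like host association in Fortran.
  auto estimate = [&]() {
    double p1 = p[0], p2 = p[1], p3 = p[2], p4 = p[3];
    return scale * (p1 + p2 + p3 + p4);
  };

  return estimate();
}
```

If the implicit frame pointer or anything it leads to is wrong, accesses like the "p1=p(1)" line in the trace above can touch unrelated memory, so this is one place to look, though the trace alone does not prove it.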
