Thanks a lot, Aurelien, for the analysis!

Just had a look, and there are several things to note:
* the cpuid calls occur from OGDF, and the latest snapshot still has
  the same code
* the variables set using cpuid info are never used in the OGDF subset
  shipped with tulip
* the code calling cpuid is inside "#if !defined(OGDF_DLL) ||
  !defined(OGDF_SYSTEM_UNIX)".  OGDF_DLL is defined only for win32 for
  some reason, and strangely OGDF_SYSTEM_UNIX is not: we're apparently
  just using that useless faulty code by mistake...
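
To spell out why that guard lets the code through: with "||", the block
is compiled as soon as either macro is undefined. A minimal sketch,
assuming neither macro is defined by the build; cpuid_block_compiled()
is a made-up name for illustration, not something in OGDF:

```c
/* Same condition as the guard quoted above.  Neither OGDF_DLL nor
 * OGDF_SYSTEM_UNIX is defined in this file (an assumption mirroring
 * the build described above), so the first branch is taken. */
static int cpuid_block_compiled(void)
{
#if !defined(OGDF_DLL) || !defined(OGDF_SYSTEM_UNIX)
    return 1;   /* the guarded code is part of the build */
#else
    return 0;   /* only reached when BOTH macros are defined */
#endif
}
```

Only a build that defines both macros would skip the guarded code.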

But setting OGDF_DLL causes other errors in basic.cpp: the case where
OGDF_DLL is defined but OGDF_SYSTEM_WINDOWS is not is obviously missing
the closing brace for the 'extern "C"' clause - this problem is fixed in
the latest OGDF snapshot (by removing this useless extra clause).

After all this, it finally builds, and as expected does not crash any
more.
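
For reference, a rough sketch of the <cpuid.h> route Aurelien suggests
in the quoted message below, which leaves the register handling to the
header instead of hand-written asm. cpu_vendor() is a hypothetical
helper name (nothing from OGDF or tulip), and the non-x86 fallback is my
own addition:

```c
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>          /* shipped with GCC since 4.4 */
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0      /* assumption: no CPUID outside x86 */
#endif

/* Hypothetical helper: copies the 12-character CPU vendor string
 * (CPUID leaf 0) into vendor[] and returns 1, or returns 0 when
 * CPUID is not available. */
static int cpu_vendor(char vendor[13])
{
#if HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;

    /* Leaf 0 returns the vendor string in EBX, EDX, ECX order. */
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';
    return 1;
#else
    (void)vendor;
    return 0;
#endif
}
```

A caller would declare "char v[13];" and print v only when cpu_vendor(v)
returns 1.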

On Tue, May 06, 2014 at 10:25:45PM +0200, Aurelien Jarno wrote:
> reassign 723982 tulip
> thanks
> 
> On Sat, Feb 01, 2014 at 02:27:49PM +0100, Yann Dirson wrote:
> > [resend with bugs CC'd]
> > 
> > Hello,
> > 
> > Context:
> > 
> > http://bugs.debian.org/734318 - tulip: [amd64] segfaults inside dlopen when 
> > loading plugins
> > http://bugs.debian.org/723982 - dlopen: segfaults right inside call_init
> > 
> > What we get here is a number of plugins that when dlopen'd cause an
> > obscure segfault inside libc code.  Upstream (CC'd) say they have
> > heard of such problems (on Ubuntu 13.10), that people have worked
> > around by downgrading the compiler.
> > 
> > This sounds like either a toolchain regression, or possibly some
> > edge-case that worked by chance with old compilers and now fails.
> 
> This is exactly that: the bug is in tulip, and up to now it worked only
> by chance on x86_64. The segfault occurs in dl-init.c when call_init is
> calling all the init functions from DT_INIT_ARRAY. This is done in C by
> this code:
> 
> |      addrs = (ElfW(Addr) *) (init_array->d_un.d_ptr + l->l_addr);
> |      for (j = 0; j < jm; ++j)
> |        ((init_t) addrs[j]) (argc, argv, env);
> 
> which is translated in assembly code into:
> 
> |    0x00007ffff7deb926 <+134>:   lea    0x8(%rbx,%rax,8),%r14
> |    0x00007ffff7deb92b <+139>:   nopl   0x0(%rax,%rax,1)
> |    0x00007ffff7deb930 <+144>:   mov    %r13,%rdx
> |    0x00007ffff7deb933 <+147>:   mov    %r12,%rsi
> |    0x00007ffff7deb936 <+150>:   mov    %ebp,%edi
> |    0x00007ffff7deb938 <+152>:   callq  *(%rbx)
> |    0x00007ffff7deb93a <+154>:   add    $0x8,%rbx
> |    0x00007ffff7deb93e <+158>:   cmp    %r14,%rbx
> |    0x00007ffff7deb941 <+161>:   jne    0x7ffff7deb930 <call_init+144>
> |    0x00007ffff7deb943 <+163>:   pop    %rbx
> |    0x00007ffff7deb944 <+164>:   pop    %rbp
> |    0x00007ffff7deb945 <+165>:   pop    %r12
> |    0x00007ffff7deb947 <+167>:   pop    %r13
> |    0x00007ffff7deb949 <+169>:   pop    %r14
> |    0x00007ffff7deb94b <+171>:   retq
> 
> 
> As you can see the value of addrs is stored in %rbx and is incremented
> by 8 at each loop. The segfault occurs at address 0x00007ffff7deb938
> when trying to dereference %rbx. When it happens, %rbx has its upper
> 32 bits clobbered and thus holds only the lower 32 bits of the address
> of addrs[j].
> 
> Tracing that with GDB, it appeared %rbx is clobbered in the System::init
> constructor from tulip. Among other things, this code probes the CPU
> with the CPUID instruction, using assembly code:
> 
> |        __asm__ __volatile__ ("xchgl    %%ebx,%0\n\t"
> |                                                "cpuid  \n\t"
> |                                                "xchgl  %%ebx,%0\n\t"
> |                                                : "+r" (b), "=a" (a), "=c" (c), "=d" (d)
> |                                                : "1" (infoType), "2" (c));
> 
> As you can see %ebx is saved with xchgl before the cpuid instruction
> and restored the same way afterwards. While that works correctly on
> x86, on x86_64 the upper 32 bits get zeroed. BOOM !
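
To make the effect concrete, here is a small model of that save/restore
in plain C rather than asm. It relies on the x86_64 rule that any write
to a 32-bit register zero-extends into the full 64-bit register. The
function name and the sample pointer value are made up for illustration:

```c
#include <stdint.h>

/* Models "xchgl %ebx,tmp; cpuid; xchgl %ebx,tmp" acting on a 64-bit
 * %rbx.  Only the low half survives the round trip. */
static uint64_t rbx_after_save_restore(uint64_t rbx)
{
    uint32_t tmp = (uint32_t)rbx;  /* first xchgl saves the low 32 bits only */

    rbx = 0;                       /* stand-in for whatever cpuid leaves in ebx */
    rbx = (uint64_t)tmp;           /* second xchgl: 32-bit write, zero-extended */
    return rbx;
}
```

With a made-up pointer such as 0x00007ffff7dd1000 in %rbx, the model
returns 0x00000000f7dd1000, so the next "callq *(%rbx)" dereferences an
address that was never the array.
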
> 
> I would suggest using <cpuid.h> (which is available since GCC 4.4)
> instead of this buggy assembly code to probe the CPU. In the meantime I
> am reassigning the bug to tulip.
> 
> Aurelien
> 
> -- 
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net                 http://www.aurel32.net


