Hi, all,

The problem is solved, although I do not have a convincing theory for why the 
fix works.

libsbms.so and libsbutil.so reference each other, and a few .o objects conflict 
for reasons I do not yet understand. By changing the order of the link objects 
and trying different combinations of the objects linked into each shared 
object, I found a combination and link order with which both C++ and Java work 
fine.

gcc 4.4 and gcc 4.8 behave differently with respect to link object order.
A pure C program is fine, but in C++ the runtime has to invoke an object's 
constructor first, and this is where the behavior changed. With gcc 4.8, global 
objects seem to be initialized in the same order as the objects appear on the 
ld command line. gcc 4.4 seems to do a better job of finding a correct 
initialization sequence: if there is a dependency between objects A and B, a 
program built with gcc 4.4 finds the correct init sequence automatically, 
whereas gcc 4.8 relies entirely on the order in which you link them...

Trafodion has some very complex class hierarchies, so when object A is 
initialized, the constructor of its parent C++ object must be invoked first, 
because the parent constructor mallocs a buffer. If the sequence is wrong, 
objects will access a null pointer.
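
To illustrate the kind of dependency described above, here is a minimal sketch 
of the common "construct on first use" idiom, which removes the dependence on 
link order. It is not Trafodion code and not the fix that was applied: 
BufferOwner, Client, sharedBuffer(), client() and kBufferSize are all 
hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical buffer size, chosen only for this sketch.
const std::size_t kBufferSize = 64;

// Stand-in for an object whose constructor mallocs a buffer.
class BufferOwner {
public:
    BufferOwner() : buf_(static_cast<char*>(std::malloc(kBufferSize))) {
        assert(buf_ != nullptr);  // allocation is expected to succeed here
        std::memset(buf_, 0, kBufferSize);
    }
    ~BufferOwner() { std::free(buf_); }
    BufferOwner(const BufferOwner&) = delete;
    BufferOwner& operator=(const BufferOwner&) = delete;

    char* data() { return buf_; }

private:
    char* buf_;
};

// Construct on first use: the object is built the first time this function
// is called, not at a point decided by the order of objects on the link line.
BufferOwner& sharedBuffer() {
    static BufferOwner instance;
    return instance;
}

// Stand-in for an object that needs the buffer while it is being constructed.
class Client {
public:
    Client() : buf_(sharedBuffer().data()) {
        assert(buf_ != nullptr);  // the dependency is already constructed
        buf_[0] = 'x';
    }
    char first() const { return buf_[0]; }

private:
    char* buf_;
};

// The dependent object is wrapped the same way, so it is also built on demand.
Client& client() {
    static Client instance;
    return instance;
}
```

Any code that calls client() gets a fully constructed Client, and the Client 
constructor in turn forces BufferOwner to be constructed first, whatever order 
the object files are linked in. Whether this pattern fits the objects in 
libsbms.so and libsbutil.so is an open question.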

But this still does not explain the whole story: why does Java's 
System.loadLibrary() crash while C's dlopen() works fine? I believe the root 
cause is a change in gcc 4.8 related to link object order. The good news is 
that it works, although I don't have a perfect explanation...

Maybe someone remembers something :-) ? I will keep researching this.

Thanks,
Ming

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming....@esgyn.cn] 
Sent: March 12, 2016 1:16
To: dev@trafodion.incubator.apache.org
Subject: using gcc 4.8 build trafodion , a strange issue

Hi, all,

Our team has started to investigate and work on building Trafodion with 
gcc 4.8.
We are now blocked on a strange issue.

After some modifications, we can build successfully, and all core components of 
Trafodion start well except for DCS. We can use sqlci to run some simple tests. 
But DCS crashes, so the system is still not usable.

DCS master is all Java code; it crashes when it tries to load a native shared 
object. My question is about this.

Three shared objects are involved here:
libsbms.so, libsbutil.so, and libjdbcT2.so.
libjdbcT2.so requires both libsbms.so and libsbutil.so.

DCS master dumps core when it calls:
System.loadLibrary("jdbcT2");

I narrowed down the issue by trying this Java code:
System.loadLibrary("sbms");
It crashes too, so the root cause is not in jdbcT2 but in sbms; jdbcT2 just 
links with sbms.
After various tries, I found that if I remove sbutil from the Makefile of sbms, 
the Java library load no longer crashes.
However, jdbcT2 needs both sbms and sbutil, so loading jdbcT2 still crashes. 
That means that if we link sbms and sbutil together, Java's 
System.loadLibrary() will crash:
--------------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe0bac355f7, pid=3785, tid=140603191486208
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libdl.so.2+0x15f7]  _dlerror_run+0x37
#
# Core dump written. Default location: /home/liuliumi/core or core.3785
------------------------------------------------------------------------------------------------------

But if I use C code
dlopen("libjdbcT2.so", RTLD_NOW);
it works very well. I have been googling and testing for a while, but have run 
out of ideas.
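
For reference, a standalone dlopen() check of this kind could look roughly 
like the sketch below. It is only an illustration, not the test program that 
was actually run; tryLoad is a hypothetical helper name. The flags are a 
parameter so that RTLD_NOW and RTLD_LAZY can both be tried. On older glibc 
versions the program has to be linked with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>

// Hypothetical helper: tries to load a shared object with the given dlopen
// flags. Returns an empty string on success, otherwise the dlerror() text.
std::string tryLoad(const char* name, int flags) {
    assert(name != nullptr);

    dlerror();  // clear any stale error message from an earlier call

    void* handle = dlopen(name, flags);
    if (handle == nullptr) {
        const char* msg = dlerror();
        return msg ? std::string(msg) : std::string("unknown dlopen failure");
    }

    dlclose(handle);
    return std::string();
}
```

Calling tryLoad("libjdbcT2.so", RTLD_NOW) and printing the result shows why a 
load was refused. It only reports failures that dlopen() itself detects; a 
crash inside a static constructor of the library still kills the process.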

I even downloaded the OpenJDK source code and read the loadLibrary 
implementation; to me its core is simply a dlopen.
So I have run out of ideas for what to test next. Does anyone have any good 
suggestions?
What could be the reason for this? C's dlopen works fine, but Java's 
loadLibrary crashes?

Thanks,
Ming
