Hi, all,

The problem is solved, although I do not have a convincing theory for why the fix works.
libsbms.so and libsbutil.so reference each other, and a few .o objects conflict in ways I cannot yet explain. By changing the order of the link objects and trying different combinations of the objects linked into each shared object, I found a combination and link order with which both C++ and Java work fine.

gcc 4.4 and gcc 4.8 behave differently with respect to link object order. A pure C program is fine, but in C++ the runtime has to invoke an object's constructor before the object is used, and this is where the behavior changes. With gcc 4.8, global objects are initialized in the same order as the objects appear on the ld command line. gcc 4.4 seems to do a better job of finding a correct initialization sequence: if objects A and B have a dependency, a program built with gcc 4.4 automatically finds the correct init sequence, whereas gcc 4.8 relies entirely on the order in which you link them.

Trafodion has some very complex class hierarchies, so when object A is initialized, its parent C++ object's constructor must be invoked first, because the parent constructor mallocs a buffer. If the sequence is wrong, objects will access a null pointer.

This still does not explain the whole story, in particular why Java's System.loadLibrary() crashes while C's dlopen() works fine. But I believe the root cause is a change in gcc 4.8 related to link object order. The good news is that it works, even though I don't have a perfect explanation. Maybe someone remembers something :-) ? I will keep researching this.

Thanks,
Ming

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming....@esgyn.cn]
Sent: March 12, 2016 1:16
To: dev@trafodion.incubator.apache.org
Subject: using gcc 4.8 build trafodion , a strange issue

Hi, all,

Our team has started to investigate and work on the task of building Trafodion with gcc 4.8. We are currently blocked on a strange issue. After some modifications, we can build successfully, and all core components of Trafodion start well except for DCS. We can use sqlci to run some simple tests. But DCS crashes, so the system is still not usable.
The DCS master is all Java code, and it crashes when it tries to load a native shared object. My question is about this crash.

Three shared objects are involved: libsbms.so, libsbutil.so, and libjdbcT2.so. libjdbcT2 requires both libsbms.so and libsbutil.so. The DCS master dumps core when it calls:

    System.loadLibrary("jdbcT2");

I narrowed the issue down by trying this Java code:

    System.loadLibrary("sbms");

It crashes too, so the root cause is not in jdbcT2 but in sbms; jdbcT2 just links with sbms. After various attempts, I found that if I remove sbutil from the Makefile of sbms, this Java library load no longer crashes. However, jdbcT2 needs both sbms and sbutil, so loading jdbcT2 still crashes. That means that if we link sbms and sbutil together, Java's System.loadLibrary() crashes:

--------------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fe0bac355f7, pid=3785, tid=140603191486208
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libdl.so.2+0x15f7] _dlerror_run+0x37
#
# Core dump written. Default location: /home/liuliumi/core or core.3785
--------------------------------------------------------------------------------

But if I use C code:

    dlopen("libjdbcT2.so", RTLD_NOW);

it works very well.

I have been googling and testing for a while, but I have run out of ideas. I even downloaded the OpenJDK source code and read the loadLibrary implementation; as far as I can tell, its core is simply a dlopen. I do not know what to test next. Does anyone have any good suggestions? What could be the reason for this: C dlopen works fine, but Java's loadLibrary crashes?

Thanks,
Ming