Scott T. Hildreth wrote:
On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
This seems to be a threading error related to the Linux kernel version.
I am running this process on newer kernels (2.6.22.x) and the error
never occurs.  We are also experiencing a lot of the "Futex WAIT" issues
with Oracle and the 2.6.20 kernels.

The kernel upgrade didn't solve the problem.  Since the process crashed
on some of our servers but not on others, I narrowed down the differences
between the servers. I concluded that all of the SuSE Enterprise 10 servers
had the problem, and that the crash only occurred when the execute_array()
method was used. All of our servers have Oracle 10.2.0.3, so there was no
difference there, and it didn't seem to matter whether DBD::Oracle was 1.19
or 1.20. We basically decided that this was a race problem with the threads,
especially since it was intermittent. I decided to compile DBD::Oracle with
debugging symbols, hoping I would get better information from the core file
and gdb. When I ran perl Makefile.PL, a message appeared that I had often
ignored:

WARNING: If you have problems you may need to rebuild perl with threading enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone. I never
compile with threads, since we have not found a need for them yet. So I used
/usr/bin/perl, which is always compiled with threading, and the process
stopped crashing. The WARNING had never applied until now. I guess I will
start building a threaded Perl on our SuSE Enterprise servers from now on.
This seems to have fixed the problem (knocking on wood).  I thought I would
share my findings, just in case someone else runs into this same situation.
Save yourself time, read the WARNINGS. :-)
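One quick way to see whether a given Perl was built with threads is
`perl -V:useithreads`, which prints `useithreads='define';` for a threaded
build and `useithreads='undef';` otherwise. The shell sketch below wraps that
check for several Perl binaries at once; the helper name `threads_from_config`
and the example paths are illustrative assumptions, not part of Perl or
DBD::Oracle.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of "perl -V:useithreads" read
# from stdin. Threaded builds print useithreads='define'; others print
# useithreads='undef';.
threads_from_config() {
    if grep -q "^useithreads='define';"; then
        echo threaded
    else
        echo unthreaded
    fi
}

# Check each perl binary named on the command line,
# e.g. /usr/local/bin/perl and /usr/bin/perl (assumed paths).
for p in "$@"; do
    if [ -x "$p" ]; then
        printf '%s: %s\n' "$p" "$("$p" -V:useithreads | threads_from_config)"
    else
        printf '%s: not found\n' "$p"
    fi
done
```

For example, `sh check_threads.sh /usr/local/bin/perl /usr/bin/perl` (the
script name is hypothetical) would list both builds side by side before
DBD::Oracle is compiled against either of them.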
Any ideas on why array processing would cause this to occur?  Did I just get
lucky and hit the right scenario for this to happen? Just curious.

                  Thanks.

As far as I recall, Oracle client libraries are built thread-safe (with -pthread or whatever option the compiler needs on the platform). I seem to remember that on Linux I could build code as thread-safe, mix it with libraries which were not built that way, and the linker did not complain. The problem (although I'm not saying it still exists) is that when code is built thread-safe, some structures in the C header files change size (one of them was something to do with longjmp, I think). So if you mixed thread-safe code with a library that was not built that way, they had different ideas of the longjmp structure, and that could lead to all sorts of apparently random seg faults etc.

On Linux, my company to this day distributes its ODBC drivers and the unixODBC driver manager built both thread-safe and non-thread-safe for this very reason: we have no idea how the applications our customers use were built.

A sure sign to look for is:

ldd libclntsh.so.10.1
        linux-gate.so.1 =>  (0x00598000)
        libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/libnnz10.so (0x00111000)
        libdl.so.2 => /lib/libdl.so.2 (0x00320000)
        libm.so.6 => /lib/libm.so.6 (0x00324000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00349000)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        libnsl.so.1 => /lib/libnsl.so.1 (0x0035d000)
        libc.so.6 => /lib/libc.so.6 (0x00373000)
        /lib/ld-linux.so.2 (0x00599000)

If you mix code that depends on libpthread with code that doesn't, the two were probably compiled with incompatible compiler options.

Sounds like this may have been your problem.
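The same ldd check can be scripted across every piece of the stack (the
Oracle client library, the perl executable and Oracle.so) so that a component
without libpthread stands out. This is a minimal sketch; `links_pthread` is a
hypothetical helper name, and the presence or absence of libpthread is only a
hint about how something was compiled, not proof.

```shell
#!/bin/sh
# Hypothetical helper: read "ldd" output on stdin and print "yes" if the
# object depends on libpthread, "no" otherwise.
links_pthread() {
    if grep -q 'libpthread\.so'; then
        echo yes
    else
        echo no
    fi
}

# Check each binary or shared library named on the command line, e.g.
# libclntsh.so.10.1, /usr/local/bin/perl and Oracle.so (assumed paths).
# Errors from ldd (missing file, not dynamic) are discarded and report "no".
for obj in "$@"; do
    printf '%s: libpthread=%s\n' "$obj" "$(ldd "$obj" 2>/dev/null | links_pthread)"
done
```

A "yes" for libclntsh.so.10.1 next to a "no" for the perl executable is the
kind of mismatch described above.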

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

                  Thanks for listening.

On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
I should have posted to users, not dev.  This is really a bizarre problem.
I can get it to fail about every fifth iteration; otherwise the process
works. I ran it from another server connected to the same database, and it
fails intermittently. I ran it from a third server and I can't get it to
core dump. All 3 servers have the same kernel and Perl versions.  I've tried
recompiling Perl, DBI and DBD::Oracle, still no luck. I created a test case
which uses execute_array, and of course I can't get it to core dump.  If
anyone has any ideas on what might be going on here, I would love to hear
them!

                          Thanks
                             STH

On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
I am not sure how to describe this: my co-worker runs his process and gets a
core dump (I pasted the backtrace below), then runs the process again with no
core dump.  Sometimes it core dumps several times in a row and then the next
run finishes fine.  I ran the process with DBI_TRACE=9, and this is what
shows up at the end of the log:


1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
    ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
        OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0 (*=0),mode=2)=ERROR
        OCIErrorGet(10a9138,1,"<NULL>",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype
",1024,2)=SUCCESS
    OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit (-1) length not valid for this bind or define datatype

        OCIErrorGet(10a9138,2,"<NULL>",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype
",1024,2)=NO_DATA

At first I thought it was a 32-bit library loaded into a 64-bit Perl, but
Oracle.so and Perl are both linked with the correct 64-bit libraries.  The
Oracle client is 10.2.0.3, and the Perl, OS, DBI and driver versions are:

  Perl            : 5.008008    (x86_64-linux)
  OS              : linux       (2.6.20.19)
  DBI             : 1.602
  DBD::mysql      : 4.005
  DBD::Sponge     : 12.010002
  DBD::SQLite     : 1.13
  DBD::Proxy      : 0.2004
  DBD::Oracle     : 1.20
  DBD::Multiplex  : 2.04
  DBD::Gofer      : 0.010103
  DBD::File       : 0.35
  DBD::ExampleP   : 12.010007
  DBD::DBM        : 0.03
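To rule out the 32-bit/64-bit mismatch mentioned above, `file -L` reports the
ELF class of each object (for example `ELF 64-bit LSB shared object`). The
sketch below summarizes that for a list of files; `elf_class` is a
hypothetical helper, and the exact wording of `file` output can vary between
versions, so the match patterns are an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map one line of "file" output to its ELF class.
elf_class() {
    case "$1" in
        *"ELF 64-bit"*) echo 64-bit ;;
        *"ELF 32-bit"*) echo 32-bit ;;
        *) echo unknown ;;
    esac
}

# Report the ELF class of each file named on the command line, e.g.
# /usr/local/bin/perl, Oracle.so and the Oracle client library.
# -L makes "file" follow symlinks such as libclntsh.so -> libclntsh.so.10.1.
for obj in "$@"; do
    printf '%s: %s\n' "$obj" "$(elf_class "$(file -L "$obj")")"
done
```

Every object loaded into the same perl process should report the same class.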

I am going to try to isolate a small test case, but right now I wanted to post what I have found so far.

                             Thanks,
                               STH

############## Back Trace #############################################################

(gdb) bt
#0  0x00002b66ec7d9b95 in raise () from /lib64/libc.so.6
#1  0x00002b66ec7daf90 in abort () from /lib64/libc.so.6
#2  0x00002b66ec81035b in __libc_message () from /lib64/libc.so.6
#3  0x00002b66ec81534e in malloc_printerr () from /lib64/libc.so.6
#4  0x00002b66ec81695c in free () from /lib64/libc.so.6
#5  0x00002b66ef0ac102 in ora_st_execute_array () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
#6  0x00002b66ef0a62bf in XS_DBD__Oracle__st_ora_execute_array ()
   from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
#7  0x000000000046bc47 in Perl_pp_entersub ()
#8  0x000000000046a29e in Perl_runops_standard ()
#9  0x000000000041e82d in Perl_call_sv ()
#10 0x00002b66ec9ee038 in XS_DBI_dispatch () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
#11 0x000000000046bc47 in Perl_pp_entersub ()
#12 0x000000000046a29e in Perl_runops_standard ()
#13 0x000000000041e82d in Perl_call_sv ()
#14 0x00002b66ec9ee038 in XS_DBI_dispatch () from /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
#15 0x000000000046bc47 in Perl_pp_entersub ()
#16 0x000000000046a29e in Perl_runops_standard ()
#17 0x000000000041f1d1 in perl_run ()
#18 0x000000000041ba2c in main ()

########################################################################################

*** glibc detected *** /usr/local/bin/perl: double free or corruption (!prev): 0x0000000001163e10 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2ad39584e34e]
/lib64/libc.so.6(__libc_free+0x6c)[0x2ad39584f95c]
/usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(ora_st_execute_array+0xfa4)[0x2ad3980e6e94]
/usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(XS_DBD__Oracle__st_ora_execute_array+0xef)[0x2ad3980e0f9f]
/usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
/usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
/usr/local/bin/perl(Perl_call_sv+0x49d)[0x41e80d]
/usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so(XS_DBI_dispatch+0x7a8)[0x2ad395a27068]
/usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
/usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
/usr/local/bin/perl(Perl_call_sv+0x49d)[0x41e80d]
/usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so(XS_DBI_dispatch+0x7a8)[0x2ad395a27068]
/usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
/usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
/usr/local/bin/perl(perl_run+0x2c1)[0x41f1b1]
/usr/local/bin/perl(main+0xac)[0x41ba2c]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2ad395800154]
/usr/local/bin/perl[0x41b8e9]
======= Memory map: ========
00400000-004fc000 r-xp 00000000 08:02 427095                             /usr/local/perl-5.8.8/bin/perl
005fb000-00601000 rw-p 000fb000 08:02 427095                             /usr/local/perl-5.8.8/bin/perl
00601000-01183000 rw-p 00601000 00:00 0                                  [heap]
2ad39511b000-2ad395136000 r-xp 00000000 08:02 4295                       /lib64/ld-2.4.so
2ad395136000-2ad395137000 rw-p 2ad395136000 00:00 0
2ad395147000-2ad395148000 rw-p 2ad395147000 00:00 0
2ad395235000-2ad395237000 rw-p 0001a000 08:02 4295                       /lib64/ld-2.4.so
2ad395237000-2ad39524a000 r-xp 00000000 08:02 4168                       /lib64/libnsl-2.4.so
2ad39524a000-2ad395349000 ---p 00013000 08:02 4168                       /lib64/libnsl-2.4.so
2ad395349000-2ad39534b000 rw-p 00012000 08:02 4168                       /lib64/libnsl-2.4.so
2ad39534b000-2ad39534d000 rw-p 2ad39534b000 00:00 0
2ad39534d000-2ad39534f000 r-xp 00000000 08:02 4163                       /lib64/libdl-2.4.so
2ad39534f000-2ad39544f000 ---p 00002000 08:02 4163                       /lib64/libdl-2.4.so
2ad39544f000-2ad395451000 rw-p 00002000 08:02 4163                       /lib64/libdl-2.4.so
2ad395451000-2ad3954a5000 r-xp 00000000 08:02 4165                       /lib64/libm-2.4.so
2ad3954a5000-2ad3955a4000 ---p 00054000 08:02 4165                       /lib64/libm-2.4.so
2ad3955a4000-2ad3955a6000 rw-p 00053000 08:02 4165                       /lib64/libm-2.4.so
2ad3955a6000-2ad3955a7000 rw-p 2ad3955a6000 00:00 0
2ad3955a7000-2ad3955b0000 r-xp 00000000 08:02 4161                       /lib64/libcrypt-2.4.so
2ad3955b0000-2ad3956af000 ---p 00009000 08:02 4161                       /lib64/libcrypt-2.4.so
2ad3956af000-2ad3956b2000 rw-p 00008000 08:02 4161                       /lib64/libcrypt-2.4.so
2ad3956b2000-2ad3956e0000 rw-p 2ad3956b2000 00:00 0
2ad3956e0000-2ad3956e2000 r-xp 00000000 08:02 4191                       /lib64/libutil-2.4.so
2ad3956e2000-2ad3957e1000 ---p 00002000 08:02 4191                       /lib64/libutil-2.4.so
2ad3957e1000-2ad3957e3000 rw-p 00001000 08:02 4191                       /lib64/libutil-2.4.so
2ad3957e3000-2ad39590a000 r-xp 00000000 08:02 4157                       /lib64/libc-2.4.so
2ad39590a000-2ad395a0a000 ---p 00127000 08:02 4157                       /lib64/libc-2.4.so
2ad395a0a000-2ad395a0d000 r--p 00127000 08:02 4157                       /lib64/libc-2.4.so
2ad395a0d000-2ad395a0f000 rw-p 0012a000 08:02 4157                       /lib64/libc-2.4.so
2ad395a0f000-2ad395a16000 rw-p 2ad395a0f000 00:00 0
2ad395a16000-2ad395a30000 r-xp 00000000 08:02 442510                     /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
2ad395a30000-2ad395b30000 ---p 0001a000 08:02 442510                     /usr/local/perl-5.8.8/lib/site_p
zsh: 26245 abort (core dumped) DBI_TRACE=2

