Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]
Scott T. Hildreth wrote:
> On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
> > This seems to be a threading error with the Linux kernel version.
> > I am running this process on newer kernels (2.6.22.x) and the error
> > never occurs. We are also experiencing a lot of the "Futex WAIT" issues
> > with Oracle and the 2.6.20 kernels.
>
> The kernel upgrade didn't solve the problem. Since the process crashed on some of our servers but not on others, I narrowed down the differences between the servers. I concluded that all the SuSE Enterprise 10 servers had the problem and that the crash only occurred when the execute_array() method was used. All of our servers have Oracle 10.2.0.3, so there wasn't a difference there, and it didn't seem to matter whether DBD::Oracle was 1.19 or 1.20. We basically decided that this was a race problem with the threads, especially since it was an intermittent problem.
>
> I decided to compile DBD::Oracle with debugging symbols, hoping I would get better info from the core file and gdb. When I was running perl Makefile.PL, a message appeared that I had often ignored:
>
>   WARNING: If you have problems you may need to rebuild perl with threading enabled.
>
> I build our own Perl in /usr/local/ and leave the vendor Perl alone. I never compile with threads, since we have not found a need for them yet. So I used /usr/bin/perl, which is always compiled with threading, and the process stopped crashing. So the WARNING never applied until now. I guess I will start building a threaded Perl on our SuSE Enterprise servers from now on. This seems to have fixed the problem (knocking on wood).
>
> I thought I would share my findings, just in case someone else runs into this same situation. Save yourself time, read the WARNINGS. :-)
>
> Any ideas on why array processing would cause this to occur? Did I just get lucky and hit the right scenario for this to happen? Just curious.
>
> Thanks.

As far as I recall, Oracle client libraries are built with a thread-safe option (-pthread or whatever option the compiler needs for the platform). I seem to remember that on Linux I could build code asking for thread-safe and mix it with libraries which were not built this way, and the linker did not complain. The problem (although I'm not saying it still exists) is that when built thread-safe, some structures in C header files changed size (one of them was something to do with longjmp, I think). So if you mixed thread-safe code with a library that was not built that way, they had different ideas of the longjmp structure, and that could lead to all sorts of apparently random seg faults etc.

On Linux my company to this day distributes ODBC drivers and the unixODBC driver manager built both thread-safe and non-thread-safe for this very reason, as we've no idea how any app they use was built.

A sure sign to look for is:

ldd libclntsh.so.10.1
        linux-gate.so.1 => (0x00598000)
        libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/libnnz10.so (0x00111000)
        libdl.so.2 => /lib/libdl.so.2 (0x0032)
        libm.so.6 => /lib/libm.so.6 (0x00324000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00349000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x0035d000)
        libc.so.6 => /lib/libc.so.6 (0x00373000)
        /lib/ld-linux.so.2 (0x00599000)

Mix code dependent on libpthread with code which isn't and they were probably compiled with incompatible compiler options. Sounds like this may have been your problem.

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com

> > Thanks for listening.
> >
> > On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
> > > Should have posted to users not dev. This is really a bizarre problem. I can get it to fail about every fifth iteration; otherwise the process works. I ran it from another server connected to the same database and it will intermittently fail. I run it from a third server and I can't get it to core dump. All 3 servers have the same kernel & Perl versions.
> > > I've tried recompiling Perl, DBI, DBD::Oracle, still no luck. I created a test case, which uses execute_array, and of course I can't get it to core dump. If anyone has any ideas on what might be going on here, I would love to hear them!
> > >
> > > Thanks
> > > STH
> > >
> > > On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> > > > I am not sure how to describe this. My co-worker will run his process and get a core dump (I pasted the back trace below) and then run the process again with no core dumps. Sometimes it will core dump several times in a row and then the next run it finishes fine. I ran the process with DBI_TRACE=9 and this is what shows up at the end of the log:
> > > >
> > > > 1 -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
> > > > ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> > > > OCIBi
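
The two checks discussed above (does the Oracle client library pull in libpthread, and was the perl in use built with threads) could be combined in a small shell sketch like the one below. This is only an illustration under stated assumptions: the function names are made up for this example and are not part of DBI or DBD::Oracle, the paths in the usage comment are placeholders, and the "threaded library, non-threaded perl" verdict is a hint to investigate rather than proof of a problem. Note also that on recent glibc releases libpthread has been folded into libc, so `ldd` may no longer list it separately.

```shell
#!/bin/sh
# Sketch only: these function names are illustrative, not part of DBI or DBD::Oracle.

# Read `ldd` output on stdin; succeed if libpthread is among the dependencies.
links_pthread() {
    grep -q 'libpthread\.so'
}

# Read `perl -V:usethreads` output on stdin; succeed if perl was built threaded.
# A threaded perl prints: usethreads='define';
perl_is_threaded() {
    grep -q "^usethreads='define'"
}

# Print a one-line verdict.
#   $1 = output of `ldd <oracle client library>`
#   $2 = output of `<perl> -V:usethreads`
thread_mix_verdict() {
    if printf '%s\n' "$1" | links_pthread; then
        if printf '%s\n' "$2" | perl_is_threaded; then
            echo "ok: threaded library, threaded perl"
        else
            echo "warning: threaded library, non-threaded perl"
        fi
    else
        echo "ok: library does not link libpthread"
    fi
}

# Example invocation (both paths are placeholders for your installation):
#   thread_mix_verdict "$(ldd "$ORACLE_HOME/lib/libclntsh.so")" \
#                      "$(/usr/local/bin/perl -V:usethreads)"
```

The functions take the command output as text rather than running `ldd` and `perl` themselves, so the same check can be applied to output captured on another server.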
Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]
On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
> This seems to be a threading error with the Linux kernel version.
> I am running this process on newer kernels (2.6.22.x) and the error
> never occurs. We are also experiencing a lot of the "Futex WAIT" issues
> with Oracle and the 2.6.20 kernels.

The kernel upgrade didn't solve the problem. Since the process crashed on some of our servers but not on others, I narrowed down the differences between the servers. I concluded that all the SuSE Enterprise 10 servers had the problem and that the crash only occurred when the execute_array() method was used. All of our servers have Oracle 10.2.0.3, so there wasn't a difference there, and it didn't seem to matter whether DBD::Oracle was 1.19 or 1.20. We basically decided that this was a race problem with the threads, especially since it was an intermittent problem.

I decided to compile DBD::Oracle with debugging symbols, hoping I would get better info from the core file and gdb. When I was running perl Makefile.PL, a message appeared that I had often ignored:

  WARNING: If you have problems you may need to rebuild perl with threading enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone. I never compile with threads, since we have not found a need for them yet. So I used /usr/bin/perl, which is always compiled with threading, and the process stopped crashing. So the WARNING never applied until now. I guess I will start building a threaded Perl on our SuSE Enterprise servers from now on. This seems to have fixed the problem (knocking on wood).

I thought I would share my findings, just in case someone else runs into this same situation. Save yourself time, read the WARNINGS. :-)

Any ideas on why array processing would cause this to occur? Did I just get lucky and hit the right scenario for this to happen? Just curious.

Thanks.

>
> Thanks for listening.
>
> On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
> > Should have posted to users not dev.
> > This is really a bizarre problem.
> > I can get it to fail about every fifth iteration; otherwise the process
> > works. I ran it from another server connected to the same database and
> > it will intermittently fail. I run it from a third server and I can't
> > get it to core dump. All 3 servers have the same kernel & Perl
> > versions. I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
> > I created a test case, which uses execute_array, and of course I can't
> > get it to core dump. If anyone has any ideas on what might be going on
> > here, I would love to hear them!
> >
> > Thanks
> > STH
> >
> > On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> > > I am not sure how to describe this. My co-worker will run his process
> > > and get a core dump (I pasted the back trace below) and then run the
> > > process again with no core dumps. Sometimes it will core dump several
> > > times in a row and then the next run it finishes fine. I ran the
> > > process with DBI_TRACE=9 and this is what shows up at the end of the log:
> > >
> > >
> > > 1 -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
> > > ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> > >
> > > OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0 (*=0),mode=2)=ERROR
> > > OCIErrorGet(10a9138,1,"",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype
> > > ",1024,2)=SUCCESS
> > > OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit (-1) length not valid for this bind or define datatype
> > >
> > > OCIErrorGet(10a9138,2,"",7fff058d684c,"ORA-02005: implicit (-1) length not valid for this bind or define datatype
> > > ",1024,2)=NO_DATA
> > >
> > > At first I thought it was a 32-bit library with a 64-bit Perl problem, but
> > > Oracle.so and Perl are both linked with the correct 64-bit libs. The Oracle
> > > client is 10.2.0.3 and the DBI versions are:
> > >
> > >   Perl           : 5.008008(x86_64-linux)
> > >   OS             : linux (2.6.20.19)
> > >   DBI            : 1.602
> > >   DBD::mysql     : 4.005
> > >   DBD::Sponge    : 12.010002
> > >   DBD::SQLite    : 1.13
> > >   DBD::Proxy     : 0.2004
> > >   DBD::Oracle    : 1.20
> > >   DBD::Multiplex : 2.04
> > >   DBD::Gofer     : 0.010103
> > >   DBD::File      : 0.35
> > >   DBD::ExampleP  : 12.010007
> > >   DBD::DBM       : 0.03
> > >
> > > I am going to try to isolate a small test case, but right now I wanted to
> > > post what I have found so far.
> > >
> > > Thanks,
> > > STH
> > >
> > > ## Back Trace
> >