Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]

2008-03-14 Thread Martin Evans

Scott T. Hildreth wrote:

On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:

This seems to be a threading error related to the Linux kernel version.
I am running this process on newer kernels (2.6.22.x) and the error
never occurs. We are also experiencing a lot of the futex WAIT issues
with Oracle and the 2.6.20 kernels.


The kernel upgrade didn't solve the problem. Since the process crashed
on some of our servers and not on others, I narrowed down the
differences between the servers. I concluded that all the SuSE Enterprise
10 servers had the problem, and the crash only occurred when the execute_array()
method was used. All of our servers have Oracle 10.2.0.3, so there wasn't
a difference there, and it didn't seem to matter whether DBD::Oracle was 1.19 or
1.20. We basically decided that this was a race condition in the threads,
especially since it was an intermittent problem. I decided to compile
DBD::Oracle with debugging symbols, hoping I would get better info from
the core file and gdb. When I ran perl Makefile.PL, a message
appeared that I had often ignored:


WARNING: If you have problems you may need to rebuild perl with threading enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone.  I 
never compile with threads, since we have not found a need for them, yet.
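To check which kind of Perl you have before chasing a crash like this, you can ask each perl binary for its useithreads Config setting. The script below is a minimal sketch; its file name and the two paths in the usage line (a locally built /usr/local/bin/perl and the vendor /usr/bin/perl, as described above) are assumptions to adjust for your own hosts.

```shell
#!/bin/sh
# Sketch: report whether each Perl binary was built with ithreads.
# Paths passed on the command line are up to you; none are assumed here.

perl_threading() {
    # Prints "threaded", "non-threaded", or "missing" for one Perl binary.
    case $1 in
        */*) [ -x "$1" ] ;;
        *)   command -v "$1" >/dev/null 2>&1 ;;
    esac || { echo missing; return 1; }
    # $Config{useithreads} is 'define' in a threaded build, undef otherwise.
    "$1" -MConfig -e 'print $Config{useithreads} ? "threaded\n" : "non-threaded\n"'
}

if [ "$#" -gt 0 ]; then
    for p in "$@"; do
        printf '%s: %s\n' "$p" "$(perl_threading "$p")"
    done
fi
```

For example: sh check_perl_threads.sh /usr/local/bin/perl /usr/bin/perl (hypothetical script name). Interactively, perl -V:useithreads gives the same information.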

So I used /usr/bin/perl, which is always compiled with threading, and
the process stopped crashing. So the WARNING never applied until now. I
guess I will start building a threaded Perl on our SuSE Enterprise servers
from now on. This seems to have fixed the problem (knocking on wood). I thought I
would share my findings, just in case someone else runs into the same
situation.
Save yourself time: read the WARNINGS. :-)


Any ideas on why array processing would cause this to occur? Did I just get
lucky and hit the right scenario for this to happen? Just curious.

  Thanks.


As far as I recall, Oracle client libraries are built with a thread-safe
option (-pthread or whatever option the compiler needs for the platform).
I seem to remember that on Linux I could build code asking for
thread-safe and mix it with libraries which were not built this way, and
the linker did not complain. The problem (although I'm not saying it
still exists) is that when built thread-safe, some structures in C header
files changed size (one of them was something to do with longjmp, I
think). So if you mixed thread-safe code with a library that was not
built that way, they had different ideas of the longjmp structure, and
that could lead to all sorts of apparently random segfaults, etc.
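One concrete way to see that a thread-safe build hands the headers a different configuration: with GCC on Linux, -pthread predefines the _REENTRANT macro, which libc headers may test. The sketch below only compares the predefined macros with and without that flag. It assumes a GCC-compatible driver that accepts -dM -E, and it does not prove that any particular structure (such as the longjmp one mentioned above) changes size, since that depends on the libc version.

```shell
#!/bin/sh
# Sketch: does a C compiler predefine _REENTRANT for a given set of flags?
# Assumes a GCC-compatible driver that understands -dM -E.

reentrant_defined() {
    cc_bin=$1
    shift
    case $cc_bin in
        */*) [ -x "$cc_bin" ] ;;
        *)   command -v "$cc_bin" >/dev/null 2>&1 ;;
    esac || { echo "compiler not found: $cc_bin" >&2; return 1; }
    # Preprocess an empty C file and scan the macros the compiler predefines.
    if "$cc_bin" "$@" -dM -E -x c /dev/null | grep -q '^#define _REENTRANT'; then
        echo yes
    else
        echo no
    fi
}

if [ "$#" -gt 0 ]; then
    printf 'plain:    %s\n' "$(reentrant_defined "$1")"
    printf -- '-pthread: %s\n' "$(reentrant_defined "$1" -pthread)"
fi
```

With GCC on a glibc system, sh check_reentrant.sh gcc (hypothetical script name) would typically report no for the plain build and yes with -pthread.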


On Linux my company to this day distributes its ODBC drivers and the
unixODBC driver manager built both thread-safe and non-thread-safe for this
very reason, as we have no idea how the applications our users run were built.


A sure sign to look for is:

ldd libclntsh.so.10.1
        linux-gate.so.1 =>  (0x00598000)
        libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/libnnz10.so (0x00111000)
        libdl.so.2 => /lib/libdl.so.2 (0x0032)
        libm.so.6 => /lib/libm.so.6 (0x00324000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00349000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x0035d000)
        libc.so.6 => /lib/libc.so.6 (0x00373000)
        /lib/ld-linux.so.2 (0x00599000)

If you mix code that depends on libpthread with code that doesn't, the two
were probably compiled with incompatible compiler options.
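The same ldd check can be scripted to compare the perl binary with the Oracle client library it loads. This is a rough heuristic, sketched under the assumption of an older glibc like the systems in this thread: since glibc 2.34, libpthread has been merged into libc, so its absence from ldd output no longer says much there.

```shell
#!/bin/sh
# Sketch of the ldd heuristic: does a binary or shared object pull in libpthread?
# Only meaningful on pre-2.34 glibc systems.

# Succeeds if the ldd output read from stdin lists libpthread.
uses_libpthread() {
    grep -q 'libpthread\.so'
}

# Prints "yes" or "no" for one file (missing or unreadable files give "no").
pthread_status() {
    if ldd "$1" 2>/dev/null | uses_libpthread; then
        echo yes
    else
        echo no
    fi
}

# Usage: sh check_pthread_mix.sh PERL_BINARY ORACLE_CLIENT_LIB
if [ "$#" -eq 2 ]; then
    a=$(pthread_status "$1")
    b=$(pthread_status "$2")
    printf '%s: %s\n%s: %s\n' "$1" "$a" "$2" "$b"
    if [ "$a" != "$b" ]; then
        echo "WARNING: only one of these links libpthread; check how each was built"
    fi
fi
```

For example: sh check_pthread_mix.sh /usr/local/bin/perl "$ORACLE_HOME/lib/libclntsh.so.10.1" (the script name is hypothetical).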


Sounds like this may have been your problem.

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com


  Thanks for listening.

On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:

Should have posted to users not dev.  This is really a bizarre problem.
I can get it to fail about every fifth iteration otherwise the process
works.  I ran it from another server connect to the same database and 
it will intermittently fail.  I run it from a third sever and I can't 
get it to core dump.  All 3 servers have the same kernel  Perl

versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
I created a test case, which uses execute_array and of course I can't
get it to core dump.  If anyone has any ideas on what might be going on
here,  I would love to hear them!
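To put a number on an intermittent failure like this, and to compare servers or perl binaries on equal terms, it helps to run the same job many times and tally the outcomes. A minimal sketch; it assumes the job signals a crash the usual way, so an exit status above 128 is counted as death by a signal (134 for SIGABRT, 139 for SIGSEGV).

```shell
#!/bin/sh
# Sketch: run a command N times and tally how each run ended.
# Shells report a signal death as 128 + signal number.

repeat_run() {
    n=$1
    shift
    ok=0 failed=0 crashed=0
    i=1
    while [ "$i" -le "$n" ]; do
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 0 ]; then
            ok=$((ok + 1))
        elif [ "$status" -gt 128 ]; then
            crashed=$((crashed + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed crashed=$crashed"
}
```

For example, repeat_run 20 /usr/local/bin/perl job.pl followed by repeat_run 20 /usr/bin/perl job.pl, where job.pl is a hypothetical stand-in for the failing script.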

  Thanks
 STH

On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:

I am not sure how to describe this. My co-worker will run his process and get a
core dump (I pasted the back trace below) and then run the process again with no
core dumps. Sometimes it will core dump several times in a row and then the next
run finishes fine. I ran the process with DBI_TRACE=9 and this is what shows up
at the end of the log:


1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
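To capture that trace for every run without editing the script, DBI reads the DBI_TRACE environment variable, which also accepts a level=file form. The sketch below additionally raises the core-file limit for the run so a crash leaves a core behind; the log path and script name in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: run a DBI script with level-9 tracing written to a file and
# core dumps enabled for that run only.

traced_run() {
    log=$1
    shift
    (
        # Allow core files in this subshell; ignore failure if the hard
        # limit forbids it.
        ulimit -c unlimited 2>/dev/null || true
        DBI_TRACE="9=$log" "$@"
    )
}
```

For example: traced_run /tmp/dbitrace.log perl job.pl.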


Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]

2008-03-13 Thread Scott T. Hildreth

On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
 This seems to be a threading error related to the Linux kernel version.
 I am running this process on newer kernels (2.6.22.x) and the error
 never occurs. We are also experiencing a lot of the futex WAIT issues
 with Oracle and the 2.6.20 kernels.

The kernel upgrade didn't solve the problem. Since the process crashed
on some of our servers and not on others, I narrowed down the
differences between the servers. I concluded that all the SuSE Enterprise
10 servers had the problem, and the crash only occurred when the execute_array()
method was used. All of our servers have Oracle 10.2.0.3, so there wasn't
a difference there, and it didn't seem to matter whether DBD::Oracle was 1.19 or
1.20. We basically decided that this was a race condition in the threads,
especially since it was an intermittent problem. I decided to compile
DBD::Oracle with debugging symbols, hoping I would get better info from
the core file and gdb. When I ran perl Makefile.PL, a message
appeared that I had often ignored:

WARNING: If you have problems you may need to rebuild perl with threading enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone.  I 
never compile with threads, since we have not found a need for them, yet.
So I used /usr/bin/perl, which is always compiled with threading, and
the process stopped crashing. So the WARNING never applied until now. I
guess I will start building a threaded Perl on our SuSE Enterprise servers
from now on. This seems to have fixed the problem (knocking on wood). I thought I
would share my findings, just in case someone else runs into the same
situation.
Save yourself time: read the WARNINGS. :-)

Any ideas on why array processing would cause this to occur? Did I just get
lucky and hit the right scenario for this to happen? Just curious.

  Thanks.

 
   Thanks for listening.
 
 On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
  I should have posted to users, not dev. This is really a bizarre problem.
  I can get it to fail about every fifth iteration; otherwise the process
  works. I ran it from another server connected to the same database and
  it fails intermittently. I ran it from a third server and I can't
  get it to core dump. All 3 servers have the same kernel and Perl
  versions. I've tried recompiling Perl, DBI, and DBD::Oracle, still no luck.
  I created a test case which uses execute_array and, of course, I can't
  get it to core dump. If anyone has any ideas on what might be going on
  here, I would love to hear them!
  
Thanks
   STH
  
  On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
   I am not sure how to describe this. My co-worker will run his process and
   get a core dump (I pasted the back trace below) and then run the process
   again with no core dumps. Sometimes it will core dump several times in a
   row and then the next run finishes fine. I ran the process with
   DBI_TRACE=9 and this is what shows up at the end of the log:
   
   
   1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) undef)
   ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
   
   OCIBindByName(112df38,1132188,10a9138,:p1,placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0 (*=0),mode=2)=ERROR
   OCIErrorGet(10a9138,1,NULL,7fff058d684c,ORA-02005: implicit (-1) length not valid for this bind or define datatype
   ,1024,2)=SUCCESS
   OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit (-1) length not valid for this bind or define datatype

   OCIErrorGet(10a9138,2,NULL,7fff058d684c,ORA-02005: implicit (-1) length not valid for this bind or define datatype
   ,1024,2)=NO_DATA
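A level-9 trace can run to many megabytes, so pulling out just the calls that returned ERROR (like the OCIBindByName line above) and the distinct ORA- codes is a quick first step. A sketch, assuming the ")=ERROR" and "ORA-NNNNN" patterns seen in this trace; it relies on grep -o, which GNU, BSD and BusyBox grep all provide.

```shell
#!/bin/sh
# Sketch: summarise a DBI/DBD::Oracle trace file.

oci_errors() {
    # Line number and OCI function name of every call that returned ERROR.
    grep -n ')=ERROR' "$1" |
        sed 's/^\([0-9]*\):[[:space:]]*\(OCI[A-Za-z]*\).*/\1: \2/'
}

ora_codes() {
    # Distinct ORA- codes mentioned anywhere in the trace.
    grep -o 'ORA-[0-9]\{5\}' "$1" | sort -u
}
```

For example: oci_errors /tmp/dbitrace.log; ora_codes /tmp/dbitrace.log.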
   
   At first I thought it was a problem of a 32-bit library with a 64-bit Perl,
   but Oracle.so and Perl are both linked with the correct 64-bit libs. The
   Oracle client is 10.2.0.3, and the Perl, DBI and driver versions are:
   
 Perl            : 5.008008    (x86_64-linux)
 OS              : linux       (2.6.20.19)
 DBI             : 1.602
 DBD::mysql      : 4.005
 DBD::Sponge     : 12.010002
 DBD::SQLite     : 1.13
 DBD::Proxy      : 0.2004
 DBD::Oracle     : 1.20
 DBD::Multiplex  : 2.04
 DBD::Gofer      : 0.010103
 DBD::File       : 0.35
 DBD::ExampleP   : 12.010007
 DBD::DBM        : 0.03
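The 32-bit versus 64-bit suspicion can be checked directly from the ELF header: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The file command reports the same thing; this sketch just avoids depending on its wording. The path in the usage line is a placeholder.

```shell
#!/bin/sh
# Sketch: report whether an ELF file is 32- or 64-bit from its header.

elf_class() {
    # First five bytes as unsigned decimals: magic "\177ELF" then EI_CLASS.
    bytes=$(od -An -t u1 -N 5 "$1" 2>/dev/null) || { echo unreadable; return 1; }
    set -- $bytes
    if [ "$#" -lt 5 ] || [ "$1" != 127 ] || [ "$2" != 69 ] ||
       [ "$3" != 76 ] || [ "$4" != 70 ]; then
        echo "not ELF"
        return 1
    fi
    case $5 in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo unknown ;;
    esac
}
```

For example: elf_class /usr/local/bin/perl; elf_class "$ORACLE_HOME/lib/libclntsh.so.10.1". Both should print 64-bit on a consistent x86_64 install.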
   
   I am going to try to isolate a small test case, but right now I wanted to
   post what I have found so far.
   
Thanks,
  STH
   
   ## Back Trace 
   #
   
   (gdb) bt
   #0  0x2b66ec7d9b95 in raise () from /lib64/libc.so.6
   #1  0x2b66ec7daf90 in abort () from /lib64/libc.so.6
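A backtrace like the one above can also be produced non-interactively. Since threads are the suspect here, this sketch asks gdb for every thread's stack with thread apply all bt. It assumes gdb is installed and that you pass the same perl binary that produced the core; the core file's name depends on /proc/sys/kernel/core_pattern.

```shell
#!/bin/sh
# Sketch: dump all thread backtraces from a core file in batch mode.

core_backtrace() {
    prog=$1 core=$2
    if [ ! -r "$core" ]; then
        echo "no core file: $core" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 3
    fi
    # -batch exits after running the -ex command.
    gdb -batch -ex 'thread apply all bt' "$prog" "$core"
}
```

For example: core_backtrace /usr/local/bin/perl core.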