Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]

2008-03-14 Thread Martin Evans

Scott T. Hildreth wrote:

On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:

This seems to be a threading error with the linux kernel version.
I am running this process on newer kernels (2.6.22.x) and the error
never occurs.  We also are experiencing a lot of the "Futex WAIT" issues
with Oracle and the 2.6.20 kernels.


The kernel upgrade didn't solve the problem.  Since the process crashed
on some of our servers and not on others, I narrowed down the
differences between the servers.  I concluded that all the SuSE Enterprise
10 servers had the problem and that the crash only occurred when the
execute_array() method was used.  All of our servers have Oracle 10.2.0.3,
so there wasn't a difference there, and it didn't seem to matter whether
DBD::Oracle was 1.19 or 1.20.  We basically decided that this was a race
problem with the threads, especially since it was an intermittent problem.
I decided to compile DBD::Oracle with debugging symbols, hoping I would get
better info from the core file and gdb.  When I was running perl Makefile.PL
a message appeared that I had often ignored.


WARNING: If you have problems you may need to rebuild perl with threading 
enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone.  I 
never compile with threads, since we have not found a need for them, yet.

So I used /usr/bin/perl, which is always compiled with threading, and
the process stopped crashing.   So the WARNING never applied until now. I
guess I will start building a threaded Perl on our SuSE Enterprise servers
from now on.  This seems to have fixed the problem (knocking on wood).  I
thought I would share my findings, just in case someone else runs into this
same situation.
Save yourself time, read the WARNINGS. :-)   


Any ideas on why array processing would cause this to occur?  Did I just get
lucky and hit the right scenario for this to happen? Just curious.

  Thanks.


As far as I recall, Oracle client libraries are built with a thread-safe 
option -pthread or whatever option the compiler needs for the platform. 
I seem to remember that on Linux I could build code asking for 
thread-safe and mix it with libraries which were not built this way and 
the linker did not complain. The problem (although I'm not saying it 
still exists) is that when built thread-safe some structures in C header 
files changed size (one of them was something to do with longjmp I 
think). So if you mixed thread-safe code with a library that was not 
built that way they had different ideas of the longjmp structure and 
that could lead to all sorts of apparently random seg faults etc.


On Linux my company to this day distributes ODBC drivers and the
unixODBC driver manager built both thread-safe and non-thread-safe for
this very reason, as we have no idea how the applications using them
were built.


A sure sign to look for is:

ldd libclntsh.so.10.1
linux-gate.so.1 =>  (0x00598000)
libnnz10.so => /usr/lib/oracle/xe/app/oracle/product/10.2.0/server/lib/libnnz10.so (0x00111000)
libdl.so.2 => /lib/libdl.so.2 (0x0032)
libm.so.6 => /lib/libm.so.6 (0x00324000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00349000)
libnsl.so.1 => /lib/libnsl.so.1 (0x0035d000)
libc.so.6 => /lib/libc.so.6 (0x00373000)
/lib/ld-linux.so.2 (0x00599000)

If you mix code that depends on libpthread with code that doesn't, the
two were probably compiled with incompatible compiler options.
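
A rough way to make that comparison from the shell, as a sketch only:
links_pthread is a made-up helper name, and the paths in the usage
comments are placeholders to adjust for the local install. It just
greps ldd output for libpthread. On much newer glibc releases
libpthread has been folded into libc, so the check is only meaningful
on systems of this vintage.

```shell
# links_pthread: hypothetical helper. Reads `ldd` output on stdin and
# prints "yes" if libpthread appears among the dependencies, else "no".
links_pthread() {
    if grep -q 'libpthread\.so'; then
        echo yes
    else
        echo no
    fi
}

# Example usage (placeholder paths):
#   ldd /usr/local/bin/perl | links_pthread
#   ldd "$ORACLE_HOME/lib/libclntsh.so.10.1" | links_pthread
```

If the Oracle client library says "yes" and the perl binary says "no",
that is the mismatch described above.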


Sounds like this may have been your problem.

Martin
--
Martin J. Evans
Easysoft Limited
http://www.easysoft.com


  Thanks for listening.

On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:

Should have posted to users not dev.  This is really a bizarre problem.
I can get it to fail about every fifth iteration otherwise the process
works.  I ran it from another server connected to the same database and
it will intermittently fail.  I run it from a third server and I can't
get it to core dump.  All 3 servers have the same kernel & Perl
versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
I created a test case, which uses execute_array and of course I can't
get it to core dump.  If anyone has any ideas on what might be going on
here,  I would love to hear them!

  Thanks
 STH

On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:

I am not sure how to describe this.  My co-worker will run his process and
get a core dump (I pasted the back trace below) and then run the process
again with no core dumps.  Sometimes it will core dump several times in a
row and then on the next run it finishes fine.  I ran the process with
DBI_TRACE=9 and this is what shows up at the end of the log:


1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER CODE(0xN) 
undef)
ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...

OCIBi

Re: DBD::Oracle - execute_array core dumps intermittently [SOLVED]

2008-03-13 Thread Scott T. Hildreth

On Fri, 2008-03-07 at 16:36 -0600, Scott T. Hildreth wrote:
> This seems to be a threading error with the linux kernel version.
> I am running this process on newer kernels (2.6.22.x) and the error
> never occurs.  We also are experiencing a lot of the "Futex WAIT" issues
> with Oracle and the 2.6.20 kernels.

The kernel upgrade didn't solve the problem.  Since the process crashed
on some of our servers and not on others, I narrowed down the
differences between the servers.  I concluded that all the SuSE Enterprise
10 servers had the problem and that the crash only occurred when the
execute_array() method was used.  All of our servers have Oracle 10.2.0.3,
so there wasn't a difference there, and it didn't seem to matter whether
DBD::Oracle was 1.19 or 1.20.  We basically decided that this was a race
problem with the threads, especially since it was an intermittent problem.
I decided to compile DBD::Oracle with debugging symbols, hoping I would get
better info from the core file and gdb.  When I was running perl Makefile.PL
a message appeared that I had often ignored.

WARNING: If you have problems you may need to rebuild perl with threading 
enabled.

I build our own Perl in /usr/local/ and leave the vendor Perl alone.  I 
never compile with threads, since we have not found a need for them, yet.
So I used /usr/bin/perl, which is always compiled with threading, and
the process stopped crashing.   So the WARNING never applied until now. I
guess I will start building a threaded Perl on our SuSE Enterprise servers
from now on.  This seems to have fixed the problem (knocking on wood).  I
thought I would share my findings, just in case someone else runs into this
same situation.
Save yourself time, read the WARNINGS. :-)   
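
For anyone wanting to check their own servers, here is a minimal sketch.
threads_status is a made-up helper name; perl -V:usethreads is the
standard way to ask a perl binary how it was configured, and it prints
usethreads='define'; for a threaded build. The two paths are the vendor
Perl and the locally built one mentioned in this thread; adjust as needed.

```shell
# threads_status: hypothetical helper. Takes the text printed by
# `perl -V:usethreads` and reports whether that perl was built with threads.
threads_status() {
    case "$1" in
        *"usethreads='define'"*) echo "threaded" ;;
        *)                       echo "unthreaded" ;;
    esac
}

# Compare the vendor perl with the locally built one (skip any that are missing).
for p in /usr/bin/perl /usr/local/bin/perl; do
    [ -x "$p" ] || continue
    echo "$p: $(threads_status "$("$p" -V:usethreads)")"
done
```

If the perl used to build and run DBD::Oracle reports "unthreaded", the
Makefile.PL WARNING above is worth taking seriously.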

Any ideas on why array processing would cause this to occur?  Did I just get
lucky and hit the right scenario for this to happen? Just curious.

  Thanks.

> 
>   Thanks for listening.
> 
> On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
> > Should have posted to users not dev.  This is really a bizarre problem.
> > I can get it to fail about every fifth iteration otherwise the process
> > works.  I ran it from another server connected to the same database and
> > it will intermittently fail.  I run it from a third server and I can't
> > get it to core dump.  All 3 servers have the same kernel & Perl
> > versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
> > I created a test case, which uses execute_array and of course I can't
> > get it to core dump.  If anyone has any ideas on what might be going on
> > here,  I would love to hear them!
> > 
> >   Thanks
> >  STH
> > 
> > On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> > > I am not sure how to describe this.  My co-worker will run his process
> > > and get a core dump (I pasted the back trace below) and then run the
> > > process again with no core dumps.  Sometimes it will core dump several
> > > times in a row and then on the next run it finishes fine.  I ran the
> > > process with DBI_TRACE=9 and this is what shows up at the end of the log:
> > > 
> > > 
> > > 1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER 
> > > CODE(0xN) undef)
> > > ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> > > 
> > > OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0
> > >  (*=0),mode=2)=ERROR
> > > OCIErrorGet(10a9138,1,"",7fff058d684c,"ORA-02005: implicit 
> > > (-1) length not valid for this bind or define datatype
> > > ",1024,2)=SUCCESS
> > > OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: 
> > > implicit (-1) length not valid for this bind or define datatype
> > > 
> > > OCIErrorGet(10a9138,2,"",7fff058d684c,"ORA-02005: implicit 
> > > (-1) length not valid for this bind or define datatype
> > > ",1024,2)=NO_DATA
> > > 
> > > At first I thought it was a 32bit library with a 64bit Perl problem, but 
> > > Oracle.so & Perl are both linked
> > > with the correct 64 bit libs.  The Oracle client is 10.2.0.3 and DBI 
> > > versions are,
> > > 
> > >   Perl: 5.008008(x86_64-linux)
> > >   OS  : linux   (2.6.20.19)
> > >   DBI : 1.602
> > >   DBD::mysql  : 4.005
> > >   DBD::Sponge : 12.010002
> > >   DBD::SQLite : 1.13
> > >   DBD::Proxy  : 0.2004
> > >   DBD::Oracle : 1.20
> > >   DBD::Multiplex  : 2.04
> > >   DBD::Gofer  : 0.010103
> > >   DBD::File   : 0.35
> > >   DBD::ExampleP   : 12.010007
> > >   DBD::DBM: 0.03
> > > 
> > > I am going to try to isolate a small test case, but right now I wanted to 
> > > post what I 
> > > have found so far.
> > > 
> > >  Thanks,
> > >STH
> > > 
> > > ## Back Trace 
> > 

Re: DBD::Oracle - execute_array core dumps intermittently

2008-03-07 Thread Scott T. Hildreth

This seems to be a threading error with the linux kernel version.
I am running this process on newer kernels (2.6.22.x) and the error
never occurs.  We also are experiencing a lot of the "Futex WAIT" issues
with Oracle and the 2.6.20 kernels.

  Thanks for listening.

On Fri, 2008-03-07 at 11:25 -0600, Scott T. Hildreth wrote:
> Should have posted to users not dev.  This is really a bizarre problem.
> I can get it to fail about every fifth iteration otherwise the process
> works.  I ran it from another server connected to the same database and
> it will intermittently fail.  I run it from a third server and I can't
> get it to core dump.  All 3 servers have the same kernel & Perl
> versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
> I created a test case, which uses execute_array and of course I can't
> get it to core dump.  If anyone has any ideas on what might be going on
> here,  I would love to hear them!
> 
>   Thanks
>  STH
> 
> On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> > I am not sure how to describe this.  My co-worker will run his process
> > and get a core dump (I pasted the back trace below) and then run the
> > process again with no core dumps.  Sometimes it will core dump several
> > times in a row and then on the next run it finishes fine.  I ran the
> > process with DBI_TRACE=9 and this is what shows up at the end of the log:
> > 
> > 
> > 1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER 
> > CODE(0xN) undef)
> > ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> > 
> > OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0
> >  (*=0),mode=2)=ERROR
> > OCIErrorGet(10a9138,1,"",7fff058d684c,"ORA-02005: implicit 
> > (-1) length not valid for this bind or define datatype
> > ",1024,2)=SUCCESS
> > OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit 
> > (-1) length not valid for this bind or define datatype
> > 
> > OCIErrorGet(10a9138,2,"",7fff058d684c,"ORA-02005: implicit 
> > (-1) length not valid for this bind or define datatype
> > ",1024,2)=NO_DATA
> > 
> > At first I thought it was a 32bit library with a 64bit Perl problem, but 
> > Oracle.so & Perl are both linked
> > with the correct 64 bit libs.  The Oracle client is 10.2.0.3 and DBI 
> > versions are,
> > 
> >   Perl: 5.008008(x86_64-linux)
> >   OS  : linux   (2.6.20.19)
> >   DBI : 1.602
> >   DBD::mysql  : 4.005
> >   DBD::Sponge : 12.010002
> >   DBD::SQLite : 1.13
> >   DBD::Proxy  : 0.2004
> >   DBD::Oracle : 1.20
> >   DBD::Multiplex  : 2.04
> >   DBD::Gofer  : 0.010103
> >   DBD::File   : 0.35
> >   DBD::ExampleP   : 12.010007
> >   DBD::DBM: 0.03
> > 
> > I am going to try to isolate a small test case, but right now I wanted to 
> > post what I 
> > have found so far.
> > 
> >  Thanks,
> >STH
> > 
> > ## Back Trace 
> > #
> > 
> > (gdb) bt
> > #0  0x2b66ec7d9b95 in raise () from /lib64/libc.so.6
> > #1  0x2b66ec7daf90 in abort () from /lib64/libc.so.6
> > #2  0x2b66ec81035b in __libc_message () from /lib64/libc.so.6
> > #3  0x2b66ec81534e in malloc_printerr () from /lib64/libc.so.6
> > #4  0x2b66ec81695c in free () from /lib64/libc.so.6
> > #5  0x2b66ef0ac102 in ora_st_execute_array () from 
> > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> > #6  0x2b66ef0a62bf in XS_DBD__Oracle__st_ora_execute_array () from 
> > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> > #7  0x0046bc47 in Perl_pp_entersub ()
> > #8  0x0046a29e in Perl_runops_standard ()
> > #9  0x0041e82d in Perl_call_sv ()
> > #10 0x2b66ec9ee038 in XS_DBI_dispatch () from 
> > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> > #11 0x0046bc47 in Perl_pp_entersub ()
> > #12 0x0046a29e in Perl_runops_standard ()
> > #13 0x0041e82d in Perl_call_sv ()
> > #14 0x2b66ec9ee038 in XS_DBI_dispatch () from 
> > /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> > #15 0x0046bc47 in Perl_pp_entersub ()
> > #16 0x0046a29e in Perl_runops_standard ()
> > #17 0x0041f1d1 in perl_run ()
> > #18 0x0041ba2c in main ()
> > 
> > 
> > 
> > *** glibc detected *** /usr/local/bin/perl: double free or corruption 
> > (!prev): 0x01163e10 ***
> > ======= Backtrace: =========
> > /lib64/libc.so.6[0x2ad39584e34e]
> > /lib64/libc.so.6(__libc_free+0x6c)[0x2ad39584f95c]
> > /us

Re: DBD::Oracle - execute_array core dumps intermittently

2008-03-07 Thread Scott T. Hildreth

Should have posted to users not dev.  This is really a bizarre problem.
I can get it to fail about every fifth iteration otherwise the process
works.  I ran it from another server connected to the same database and
it will intermittently fail.  I run it from a third server and I can't
get it to core dump.  All 3 servers have the same kernel & Perl
versions.  I've tried recompiling Perl, DBI, DBD::Oracle, still no luck.
I created a test case, which uses execute_array and of course I can't
get it to core dump.  If anyone has any ideas on what might be going on
here,  I would love to hear them!

  Thanks
 STH

On Wed, 2008-03-05 at 15:21 -0600, Scott T. Hildreth wrote:
> I am not sure how to describe this.  My co-worker will run his process and
> get a core dump (I pasted the back trace below) and then run the process
> again with no core dumps.  Sometimes it will core dump several times in a
> row and then on the next run it finishes fine.  I ran the process with
> DBI_TRACE=9 and this is what shows up at the end of the log:
> 
> 
> 1   -> execute_for_fetch for DBD::Oracle::st (DBI::st=HASH(0xN)~INNER 
> CODE(0xN) undef)
> ora_st_execute_array UPDATE count=10 (ARRAY(0xN) undef undef)...
> 
> OCIBindByName(112df38,1132188,10a9138,":p1",placeh_len=3,value_p=0,value_sz=-1517274788,dty=1,indp=0,alenp=0,rcodep=0,maxarr_len=0,curelep=0
>  (*=0),mode=2)=ERROR
> OCIErrorGet(10a9138,1,"",7fff058d684c,"ORA-02005: implicit (-1) 
> length not valid for this bind or define datatype
> ",1024,2)=SUCCESS
> OCIErrorGet after OCIBindByName (er1:ok): -1, 2005: ORA-02005: implicit 
> (-1) length not valid for this bind or define datatype
> 
> OCIErrorGet(10a9138,2,"",7fff058d684c,"ORA-02005: implicit (-1) 
> length not valid for this bind or define datatype
> ",1024,2)=NO_DATA
> 
> At first I thought it was a 32bit library with a 64bit Perl problem, but 
> Oracle.so & Perl are both linked
> with the correct 64 bit libs.  The Oracle client is 10.2.0.3 and DBI versions 
> are,
> 
>   Perl: 5.008008(x86_64-linux)
>   OS  : linux   (2.6.20.19)
>   DBI : 1.602
>   DBD::mysql  : 4.005
>   DBD::Sponge : 12.010002
>   DBD::SQLite : 1.13
>   DBD::Proxy  : 0.2004
>   DBD::Oracle : 1.20
>   DBD::Multiplex  : 2.04
>   DBD::Gofer  : 0.010103
>   DBD::File   : 0.35
>   DBD::ExampleP   : 12.010007
>   DBD::DBM: 0.03
> 
> I am going to try to isolate a small test case, but right now I wanted to 
> post what I 
> have found so far.
> 
>  Thanks,
>STH
> 
> ## Back Trace 
> #
> 
> (gdb) bt
> #0  0x2b66ec7d9b95 in raise () from /lib64/libc.so.6
> #1  0x2b66ec7daf90 in abort () from /lib64/libc.so.6
> #2  0x2b66ec81035b in __libc_message () from /lib64/libc.so.6
> #3  0x2b66ec81534e in malloc_printerr () from /lib64/libc.so.6
> #4  0x2b66ec81695c in free () from /lib64/libc.so.6
> #5  0x2b66ef0ac102 in ora_st_execute_array () from 
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> #6  0x2b66ef0a62bf in XS_DBD__Oracle__st_ora_execute_array () from 
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so
> #7  0x0046bc47 in Perl_pp_entersub ()
> #8  0x0046a29e in Perl_runops_standard ()
> #9  0x0041e82d in Perl_call_sv ()
> #10 0x2b66ec9ee038 in XS_DBI_dispatch () from 
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> #11 0x0046bc47 in Perl_pp_entersub ()
> #12 0x0046a29e in Perl_runops_standard ()
> #13 0x0041e82d in Perl_call_sv ()
> #14 0x2b66ec9ee038 in XS_DBI_dispatch () from 
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so
> #15 0x0046bc47 in Perl_pp_entersub ()
> #16 0x0046a29e in Perl_runops_standard ()
> #17 0x0041f1d1 in perl_run ()
> #18 0x0041ba2c in main ()
> 
> 
> 
> *** glibc detected *** /usr/local/bin/perl: double free or corruption 
> (!prev): 0x01163e10 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x2ad39584e34e]
> /lib64/libc.so.6(__libc_free+0x6c)[0x2ad39584f95c]
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(ora_st_execute_array+0xfa4)[0x2ad3980e6e94]
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBD/Oracle/Oracle.so(XS_DBD__Oracle__st_ora_execute_array+0xef)[0x2ad3980e0f9f]
> /usr/local/bin/perl(Perl_pp_entersub+0x6b7)[0x46bae7]
> /usr/local/bin/perl(Perl_runops_standard+0xe)[0x46a13e]
> /usr/local/bin/perl(Perl_call_sv+0x49d)[0x41e80d]
> /usr/local/perl-5.8.8/lib/site_perl/5.8.8/x86_64-linux/auto/DBI/DBI.so(XS_DBI_dispatch+0x7a8)[0x2ad395a27068]
> /u