Re: Hunting down (possible) memory leak in DBD::Oracle

2017-12-18 Thread John Scoles


Hmm this type of DBD::Oracle debugging will be tricky.

Could be almost anything.  You are jumping versions in a big way but that still 
should be ok

A few questions

1)  What is the  ORA-NN  in question
2) Set trace to 15  to see if that give you more details
3) What are the type of fields?  Lob and blob and large varchars can be tricky
4) does the error happen in perl or XS (the 15 trace should)
5) To recompile you will need the latest version of the OCI client. Not sure 
what that is

Cheers
John





From: Fennell, Brian 
Sent: December 16, 2017 5:19 PM
To: dbi-users@perl.org
Subject: Hunting down (possible) memory leak in DBD::Oracle

Dear DBI people -

I am trying to port some old perl code to a new box.  (see Details below) 
Needless to say the original box and code works fine, but the new box (and old 
code) does not.
Specifically what I am seeing is that when I select slightly over a million 
records from a specific join of two tables (to be dumped one row at a time into 
a TSV file) we get strange ORA-N errors that don't really make any sense in 
this context.
The Same database and same table works fine on the original box with the large 
number of records.  2 million records always causes errors but two groups of 
1million (divided up by ROWNUM - the EXACT same rows) causes no errors.  I am 
using a test database with little activity do I am reasonably certain that the 
queries deal with the same rows.
So I am thinking the problem is data volume and not any specific piece of data 
(originally I thought it might be an odd string/data related error, but I am 
starting to think it is a memory leak of some kind).
The error always happens inside of fetchrow_array - and "$dbh->trace( 4 , 
$filename )" shows that the error originates inside the DBD::Oracle module 
while reading field 3 of 6.
Researching the ORA-NN error gives a perfectly sane description that makes 
no sense at all in the context of reading a specific field.

We are going thru an Audit and tightening up security so there are some things 
(like REAL hostnames and REAL column/table names) that I cannot share - but I 
will try to share as much as I can.

The Host I am calling "prod" below is the only one NOT exhibiting this issue.

Things I want to try -

1) recompile the DBD::Oracle module on Host "sandbox" with "perl Makefile.PL 
-g" and then use Valgrind.  I haven't used Valgrind before, but I guess it is 
time to learn.
2) Anything else this list suggests.

Details:

Host: prod
OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Perl: 5.8.8 built for x86_64-linux
DBI: 1.53
DBD::Oracle: 1.19
Oracle: 10.2.0.1.0

Host: dev
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.74
Oracle: 11.2.0.3.0

Host: prodnew
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.70
Oracle: 11.2.0.1.0

Host: sandbox
OS: CentOS Linux release 7.4.1708 (Core)
Perl: 5.16.3  built for x86_64-linux-thread-multi
DBI: 1.637
DBD::Oracle: 1.74
Oracle: 12.1.0.2.0

--
Brian Fennell, Software Engineer | Radial
O: 610 491 7308 | M: 484 354 1699
fenne...@radial.com

The information contained in this electronic mail transmission is intended only 
for the use of the individual or entity named in this transmission. If you are 
not the intended recipient of this transmission, you are hereby notified that 
any disclosure, copying or distribution of the contents of this transmission is 
strictly prohibited and that you should delete the contents of this 
transmission from your system immediately. Any comments or statements contained 
in this transmission do not necessarily reflect the views or position of Radial 
or its subsidiaries and/or affiliates.

 >++[>++>++>++>+++>+++>++>+++>+++><-]>-->++>+>>>+>-->--><>.>.>.>.>.>.>.>.


RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

2017-12-18 Thread Fennell, Brian
Thanks for the reply, Howard,

I am using the exact same database - a test database that has copies of the 
production data put into it once a day - besides that it doesn't change much 
(if at all) during the day.

The SQL I am using doesn't leave out a "bad guy" - I thought of that and 
actually had an off-by-one gap in some of my early tests.  Closed that hole by 
changing a ">" to a ">=".

Here is the SQL (with the original table and field names changed to allow for 
sharing

SELECT
d.ROW_NUMBER,
d.f1,
d.f2,
d.f3,
d.f4,
d.f5
FROM
(
SELECT /*+ FULL(A) PARALLEL(A 6) */
rownum ROW_NUMBER,
A.field1 f1 ,
A.field2 f2,
A.field3 f3,
A.field4 f4,
B.field5 f5
FROM
tableA A,
tableB B
WHERE
B.field6  IN  ( 'TOK3', 'TOK4', 'TOK5' )
AND B.field7  LIKE'A%'
AND B.field8  IN  ('TOK1', 'TOK2')
AND B.fkfield1=   A.field1
ORDER BY
1, 2, 3, 4, 5
  ) d
WHERE
d.row_number <  202
AND d.row_number >= 100

-Original Message-
From: Howard, Chris [mailto:howa...@prpa.org] 
Sent: Monday, December 18, 2017 9:21 AM
To: Fennell, Brian ; dbi-users@perl.org
Subject: RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Same database... do you mean the actual very same data source?

What is the Oracle error? 

To eliminate problems based on data (implicit conversions, that kind of thing) 
can you do a run from row 500,000 to 1,500,000 ?  (I think you have this 
covered, but maybe that row right at the breaking spot is somehow a bad guy.)




-Original Message-
From: Fennell, Brian [mailto:fenne...@radial.com]
Sent: Saturday, December 16, 2017 3:19 PM
To: dbi-users@perl.org
Subject: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Dear DBI people - 

I am trying to port some old perl code to a new box.  (see Details below) 
Needless to say the original box and code works fine, but the new box (and old 
code) does not.
Specifically what I am seeing is that when I select slightly over a million 
records from a specific join of two tables (to be dumped one row at a time into 
a TSV file) we get strange ORA-N errors that don't really make any sense in 
this context.
The Same database and same table works fine on the original box with the large 
number of records.  2 million records always causes errors but two groups of 
1million (divided up by ROWNUM - the EXACT same rows) causes no errors.  I am 
using a test database with little activity do I am reasonably certain that the 
queries deal with the same rows.
So I am thinking the problem is data volume and not any specific piece of data 
(originally I thought it might be an odd string/data related error, but I am 
starting to think it is a memory leak of some kind).
The error always happens inside of fetchrow_array - and "$dbh->trace( 4 , 
$filename )" shows that the error originates inside the DBD::Oracle module 
while reading field 3 of 6.  
Researching the ORA-NN error gives a perfectly sane description that makes 
no sense at all in the context of reading a specific field.  

We are going thru an Audit and tightening up security so there are some things 
(like REAL hostnames and REAL column/table names) that I cannot share - but I 
will try to share as much as I can.

The Host I am calling "prod" below is the only one NOT exhibiting this issue.

Things I want to try - 

1) recompile the DBD::Oracle module on Host "sandbox" with "perl Makefile.PL 
-g" and then use Valgrind.  I haven't used Valgrind before, but I guess it is 
time to learn.
2) Anything else this list suggests. 

Details:

Host: prod
OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Perl: 5.8.8 built for x86_64-linux
DBI: 1.53
DBD::Oracle: 1.19
Oracle: 10.2.0.1.0

Host: dev
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.74
Oracle: 11.2.0.3.0

Host: prodnew
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.70
Oracle: 11.2.0.1.0

Host: sandbox
OS: CentOS Linux release 7.4.1708 (Core)
Perl: 5.16.3  built for x86_64-linux-thread-multi
DBI: 1.637
DBD::Oracle: 1.74
Oracle: 12.1.0.2.0

--
Brian Fennell, Software Engineer | Radial
O: 610 491 7308 | M: 484 354 1699
fenne...@radial.com

The information contained in this electronic mail transmission is intended only 
for the use of the individual or entity named in this transmission. If you are 
not the intended recipient of this transmission, you are hereby notified that 
any disclosure, copying or distribution of the contents of this transmission is 
strictly prohibited and that you should delete the contents of this 
transmission from your system immediately. Any comments or statements contained 
in 

RE: Hunting down (possible) memory leak in DBD::Oracle

2017-12-18 Thread Fennell, Brian
John,
Thanks so much for your reply!

I have put off this work for a few years and now the pressure is on - the 
original box and OS are so old that the DBA and System Engineer and the 
Operations manager have all ganged up on me.

I suppose I could try and work around by downgrading both the perl and the 
DBD::Oracle to the same version we use in production, but it would be nice to 
actually fix the bug if I can.

I tried just downgrading the DBD::Oracle, but changes in perl 5 to support 
MULTIPLICITY made that look like more than just a little work - spend two days 
on it and then backed off.  

I am a polyglot programmer so I can program in C and Perl (and about a dozen 
other languages).  I have done enough time with C that it doesn't scare me.  
Valgrind is new to me, but make and gcc and ld are not.
I have started to read the Valgrind docs and it seems to make sense - it 
basically emulates all the CPU instructions with injected instrumentation - I 
assume it works for Intel and Red Hat if it works at all 
(and it seems to have a long history and good open source support community).  
Perhaps I am fooling myself, but I figure it is worth a try.

I have negotiated support from both DBA and System Engineering (the Red Hat OS 
guys) so if I am going to fix this now is the time.

The only other option I can think of is to try to get the old code working with 
the DBD::JDBC driver (which would mean adding a JVM running in parallel and 
additional overhead - so I would rather not).

1) The error changes depending on the data - which is why I think it is a 
buffer overrun or a wild pointer - but it is  always in "field N of N" - 
Current I can reproduce with ORA-01403
2) I will re-try at level 15 and post the results - current at 4 (or perhaps 5) 
here is a section from the log (which suggests to me it is happing in the C 
code and not in the Perl

-> fetchrow_array for DBD::Oracle::st 
(GSI::DBI::Connection::st=HASH(0x29353c8)~0x286f6b0) thr#134c010
dbd_st_fetch 6 fields...
dbd_st_fetched 6 fields with status of 0(SUCCESS)
field #1 with rc=0(OK)
field #2 with rc=0(OK)
field #3 with rc=0(OK)
field #4 with rc=1405(NULL)
field #5 with rc=0(OK)
field #6 with rc=0(OK)
-> fetchrow_array for DBD::Oracle::st 
(GSI::DBI::Connection::st=HASH(0x29353c8)~0x286f6b0) thr#134c010
dbd_st_fetch 6 fields...
dbd_st_fetched 6 fields with status of 0(SUCCESS)
field #1 with rc=0(OK)
field #2 with rc=0(OK)
field #3 with rc=0(OK)
field #4 with rc=14135(UNKNOWN RC=14135))
OCIErrorGet after ORA-14135 error on field 4 of 6, ora_type 2 (er1:ok): 
-1, 1403: ORA-01403: no data found

-- HandleSetErr err=1403, errstr='ORA-01403: no data found (DBD ERROR: 
ORA-14135 error on field 4 of 6, ora_type 2)', state=undef, undef
field #5 with rc=0(OK)
field #6 with rc=0(OK)
1   -> FETCH for DBD::Oracle::st 
(GSI::DBI::Connection::st=HASH(0x286f6b0)~INNER 'ParamValues') thr#134c010

3) I think the most exotic thing in these tables is a VARCHAR2 but I will check 
and post the results.
4) I looks like it is in the XS to me (see answer to 2) - but I suppose it 
could be elsewhere - like a loopback-perl-ref that should be weak but is not.
5) I think I have what I need, DBA installed Oracle 12 OCI client and "dot.so" 
libraries but currently I am concerned that I am using "ins_rdbms.mk" when I 
should be using "demo.mk" or similar - I am getting a Warning (see details 
below) when I run Makefile.PL - I asked DBA to look into installing the 
"demo.mk" file and consider opening up a Oracle METALINK support ticket to see 
if another customer had already solved this with Oracle's help.

Details:

# /usr/local/bin/perl Makefile.PL -g
Using DBI 1.637 (for perl 5.016003 on x86_64-linux-thread-multi) installed in 
/usr/local/lib64/perl5/auto/DBI/

Configuring DBD::Oracle for perl 5.016003 on linux (x86_64-linux-thread-multi)

Remember to actually *READ* the README file! Especially if you have any 
problems.

Installing on a linux, Ver#3.10
Using Oracle in /db/app/oracle/product/12.1.0/client_1
DEFINE _SQLPLUS_RELEASE = "1201000200" (CHAR)
Oracle version 12.1.0.2 (12.1)
Found /db/app/oracle/product/12.1.0/client_1/rdbms/lib/ins_rdbms.mk
Using /db/app/oracle/product/12.1.0/client_1/rdbms/lib/ins_rdbms.mk
Your LD_LIBRARY_PATH env var is set to 
'/db/app/oracle/product/12.1.0/client_1/lib:/db/app/oracle/product/12.1.0/client_1'
Reading /db/app/oracle/product/12.1.0/client_1/rdbms/lib/ins_rdbms.mk
Reading /db/app/oracle/product/12.1.0/client_1/rdbms/lib/env_rdbms.mk
WARNING: Oracle /db/app/oracle/product/12.1.0/client_1/rdbms/lib/ins_rdbms.mk 
doesn't define a 'build' rule.

WARNING: I will now try to guess how to build and link DBD::Oracle for you.
 This kind of guess work is very error prone and Oracle-version 
sensitive.
 It is possible that it won't be supported in future versions of 
DBD::Oracle.
 

RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

2017-12-18 Thread Fennell, Brian
Ok . . . I am already benefiting from the support from this list : - ) 

I noticed that I was actually ordering my query BY rownum (which doesn't make 
much sense . . . and perhaps oracle's optimizer recognized this and ignored the 
pseudo column.)

Just to be sure, I change the query and then reran my tests - the results, 
however, did not change

New SQL

SELECT
d.ROW_NUMBER,
d.f1,
d.f2,
d.f3,
d.f4,
d.f5
FROM
(
SELECT /*+ FULL(A) PARALLEL(A 6) */
rownum ROW_NUMBER,
A.field1 f1 ,
A.field2 f2,
A.field3 f3,
A.field4 f4,
B.field5 f5
FROM
tableA A,
tableB B
WHERE
B.field6  IN  ( 'TOK3', 'TOK4', 'TOK5' )
AND B.field7  LIKE'A%'
AND B.field8  IN  ('TOK1', 'TOK2')
AND B.fkfield1=   A.field1
ORDER BY
2, 3, 4, 5, 6
  ) d
WHERE
d.row_number <  202
AND d.row_number >= 100

Here is a summary of the test results (I have a test harness which uses ssh to 
run the exact same tests on more than one host).

INPUT: ( ( HOSTNAME = sandbox ) ( RANGE = 200   0 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 2 )
INPUT: ( ( HOSTNAME = prod) ( RANGE = 200   0 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = sandbox ) ( RANGE = 100   0 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = prod) ( RANGE = 100   0 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = sandbox ) ( RANGE = 200 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = prod) ( RANGE = 200 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = sandbox ) ( RANGE = 202 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = prod) ( RANGE = 202 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )
INPUT: ( ( HOSTNAME = sandbox ) ( RANGE = 203 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 2 )
INPUT: ( ( HOSTNAME = prod) ( RANGE = 203 100 ) ) OUTPUT: ( 
ORACLE_ERROR_COUNT = 0 )


RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

2017-12-18 Thread Fennell, Brian
Good question - I have asked DBA and am waiting for a reply.

-Original Message-
From: Howard, Chris [mailto:howa...@prpa.org] 
Sent: Monday, December 18, 2017 11:36 AM
To: Fennell, Brian ; dbi-users@perl.org
Subject: RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Fennell,

Are you seeing anything on the database side, in the alert log, etc.?


Howard


-Original Message-
From: Fennell, Brian [mailto:fenne...@radial.com] 
Sent: Monday, December 18, 2017 8:30 AM
To: Howard, Chris ; dbi-users@perl.org
Subject: RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Thanks for the reply, Howard,

I am using the exact same database - a test database that has copies of the 
production data put into it once a day - besides that it doesn't change much 
(if at all) during the day.

The SQL I am using doesn't leave out a "bad guy" - I thought of that and 
actually had an off-by-one gap in some of my early tests.  Closed that hole by 
changing a ">" to a ">=".

Here is the SQL (with the original table and field names changed to allow for 
sharing

SELECT
d.ROW_NUMBER,
d.f1,
d.f2,
d.f3,
d.f4,
d.f5
FROM
(
SELECT /*+ FULL(A) PARALLEL(A 6) */
rownum ROW_NUMBER,
A.field1 f1 ,
A.field2 f2,
A.field3 f3,
A.field4 f4,
B.field5 f5
FROM
tableA A,
tableB B
WHERE
B.field6  IN  ( 'TOK3', 'TOK4', 'TOK5' )
AND B.field7  LIKE'A%'
AND B.field8  IN  ('TOK1', 'TOK2')
AND B.fkfield1=   A.field1
ORDER BY
1, 2, 3, 4, 5
  ) d
WHERE
d.row_number <  202
AND d.row_number >= 100

-Original Message-
From: Howard, Chris [mailto:howa...@prpa.org] 
Sent: Monday, December 18, 2017 9:21 AM
To: Fennell, Brian ; dbi-users@perl.org
Subject: RE: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Same database... do you mean the actual very same data source?

What is the Oracle error? 

To eliminate problems based on data (implicit conversions, that kind of thing) 
can you do a run from row 500,000 to 1,500,000 ?  (I think you have this 
covered, but maybe that row right at the breaking spot is somehow a bad guy.)




-Original Message-
From: Fennell, Brian [mailto:fenne...@radial.com]
Sent: Saturday, December 16, 2017 3:19 PM
To: dbi-users@perl.org
Subject: Hunting down (possible) memory leak in DBD::Oracle ** EXTERNAL **

Dear DBI people - 

I am trying to port some old perl code to a new box.  (see Details below) 
Needless to say the original box and code works fine, but the new box (and old 
code) does not.
Specifically what I am seeing is that when I select slightly over a million 
records from a specific join of two tables (to be dumped one row at a time into 
a TSV file) we get strange ORA-N errors that don't really make any sense in 
this context.
The Same database and same table works fine on the original box with the large 
number of records.  2 million records always causes errors but two groups of 
1million (divided up by ROWNUM - the EXACT same rows) causes no errors.  I am 
using a test database with little activity do I am reasonably certain that the 
queries deal with the same rows.
So I am thinking the problem is data volume and not any specific piece of data 
(originally I thought it might be an odd string/data related error, but I am 
starting to think it is a memory leak of some kind).
The error always happens inside of fetchrow_array - and "$dbh->trace( 4 , 
$filename )" shows that the error originates inside the DBD::Oracle module 
while reading field 3 of 6.  
Researching the ORA-NN error gives a perfectly sane description that makes 
no sense at all in the context of reading a specific field.  

We are going thru an Audit and tightening up security so there are some things 
(like REAL hostnames and REAL column/table names) that I cannot share - but I 
will try to share as much as I can.

The Host I am calling "prod" below is the only one NOT exhibiting this issue.

Things I want to try - 

1) recompile the DBD::Oracle module on Host "sandbox" with "perl Makefile.PL 
-g" and then use Valgrind.  I haven't used Valgrind before, but I guess it is 
time to learn.
2) Anything else this list suggests. 

Details:

Host: prod
OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
Perl: 5.8.8 built for x86_64-linux
DBI: 1.53
DBD::Oracle: 1.19
Oracle: 10.2.0.1.0

Host: dev
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.74
Oracle: 11.2.0.3.0

Host: prodnew
OS: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Perl: 5.16.1  built for x86_64-linux
DBI: 1.631
DBD::Oracle: 1.70
Oracle: 11.2.0.1.0

Host: sandbox
OS: CentOS Linux release 7.4.1708 (Core)
Perl: 5.16.3  built for x86_64-linux-thread-mult

RE: Hunting down (possible) memory leak in DBD::Oracle

2017-12-18 Thread Fennell, Brian
Pluta,

Looks like it is worth a try - when I looked at the project before it looked 
like it was for installing a "per user" perl.  Does it work for root / all 
users on a box as well?

Brian