This is an Oracle database (10.2g). I apologize if this question isn't DBI or DBD::Oracle related.

Basically, I have an instance of Oracle running a pair of tablespaces (one for data, the other for indexes) with 101 million records (data dbf is 53GB). On the same box (a quad core Xeon with 16GB of ram) I'm running an empty database with the same schema.

Using DBD::Oracle, I open both databases (src, dst) and do the following:

Build query for the names of the tables in the src db

foreach table
   query the column names in this table
   push ROWID onto column list
   build query string using column list (not using SELECT *)
   prepare stm
   bind columns to an array i'll call @row
build destination insert statement using placeholders from the column list (minus ROWID)
   prepare dst_sth
   while src_sth->fetch()
      modify @row values (@new_row)
      bind new_row values as bind_params on dst_sth
insert @new_row with execute, returns ROWID (using RETURNING INTO :new_id)
      insert tablename, new_rowid, old_rowid (kind of like logging...)
      commit every 1000 rows
    rinse repeat for each row
rinse repeat for each table


I've done this type of processing on MySQL database before. This one is on Oracle.

When running the code, I see a huge performance problem that pops up what appears to be randomly on the execute for the insert. When running DBI::Profile, and using Time::HiRes around the each of the steps, I've isolated the delay at the execute statement.

When a delay occurs, it is always 29.99999 (basically 30) seconds for the execute. Good executes occur in 0.003 seconds.

Here is a piece of DBI::Profile for 20000 rows in the loop:

execute' =>
  'DBD::Oracle::st::execute' =>
130.783813s / 20000 = 0.006539s avg (first 0.004013s, min 0.000390s, max 29.996666s)

As you can see, the delay occurred a quite a number of times during this run (I'm seeing about 1000 rows/second when it "works"). The delay is always 30 seconds, so this just screams of a timer or timeout somewhere in code. I've stripped the code to the bare bones (read a row, write a row, drop all indexes on the dst_database), and I can duplicate the problem.

Using an Oracle procedure I found on the net (snapper), I ran it against the SID that the above code was connected to. From looking at the output, I believe all of the time is being consumed in "WAIT, SQL*Net more data from client , 29292216, 29292216, 29.29s 29.29s ".

According to what I can Google, this implies that there's a "network performance" problem. I'm connecting with TCP on localhost. I've made the listener changes suggested in the DBD::Oracle docs. I'm running a queue depth of 50, with 40 listeners pre-started.

When capturing data using tcpdump, I see the following around the 30 second delay:

DBD::Oracle sends a request data packet (t=0)
TCP ACK packet in 17 microseconds later (t=0.000017)
Delay of 29.995692 seconds before the first DATA RESPONSE packet arrives from Oracle (port 1521) DBD::Oracle request data (an insert into second table the ROWID tracking table) 35.4 microseconds later
Oracle response 75 microseconds later

Subsequent packets are the same, but the inter-packet times are around 14 - 17 microseconds for the reading/writing.

Based on the packet trace, clearly Oracle isn't responding very quickly.

Martin Evans suggested that this might be a "Quantum Wait" timer in Oracle. He also hinted that the user doing the inserts might be in a Resource Group that was limiting the performance. After Googling and looking at the Database Resource Manager, I found my user was in the DEFAULT_RESOURCE_GROUP. I attempted to move my user to the SYS_GROUP, thinking that I'd get better performance. I restarted the database, and ran the test again. It sure looked like the problem disappeared. After a while, though, the problem returned. The more data I process, it seems the more the delay is introduced.

I don't think that I'm even running the DRM, because queries into the V$ tables that refer to DRM related things return zero rows.

I'm at a loss to determine what Oracle might be doing to kill performance. A year ago, there was a post to this list that suggested the same problem I'm seeing. Unfortunately, there wasn't follow up to suggest the root cause and remedy.

Any suggestions would be greatly appreciated.



Andy Baumhauer

Reply via email to