RE: Timestamps in trace files (and other trace file oddities)

Wolfgang Breitling Thu, 24 Jul 2003 13:24:28 -0700

20% idle = 80% busy? How many cpus?

If it is 20% idle that should lay to rest the theory that the process may wait for a cpu. But, if it is a 5-way join, maybe it is reading a set of blocks from file 65 and then collects the corresponding rows from the other 4 tables, which must be all pure logical IO since there are no further waits. But if you have a few nested NL joins, they can easily churn through millions of LIOs taking minutes of cpu. And if you have multiple cpus, only one of them would be 100% busy during that time, leaving some overall idle %. Since the corresponding rows of the other tables get heavily referenced, they stay virtually "pinned" in the buffer ( in quotes because pinned in the buffer pool context has a different meaning than the ususal connotation ) offering an explanation why the already read 17 blocks need to be re-read once the session is done with block 6041 and wants to move on to block 6042.

At 12:34 PM 7/24/2003 -0800, you wrote:

Dan, No reason to stay in the corner (unless that's where the fridge is). I'm trying to decide where to look on the OS. When I looked at vmstat, there was no paging, and ~20% idle cpu. Maybe set up a cron to take snapshots of something(??) every few minutes? Support from the Sys Admin side is minimal, so I am kind of on my own. I also have more experience with Sun than AIX.

Henry


-----Original Message-----
Daniel Fink
Sent: Thursday, July 24, 2003 4:14 PM
To: Multiple recipients of list ORACLE-L


Henry,
        I completely agree, something did not feel right (hence my obvious
reluctance). I did a very poor job of trying to explain that we cannot
extrapolate system waits/activity by looking at database waits/activity. I
should have learned my lesson (bad dba...go sit in the corner) from a long
email discussion with another list member where I was trying to understand a
session issue by examining a single processes trace file.

Wolgang's answer really nails the issue. What I think it comes down to is that wait events are only valid for what Oracle is waiting for internally. It cannot see what is happening at the system level. Some of the system activity it can determine by the call it is making. For example, when a request is sent to disk, it knows that it is waiting for the data to be returned. However, it does not know what is consuming the wait time. It could be that the disk service time is high or that there are so many requests for the disk that they are being queued up. All the process knows is that the data has not been returned. As Wolfgang illustrated, the same thing happens at the CPU. I doubt that all 7 minutes were CPU, most likely there was a lot of intervening activity. All Oracle know was that it was not waiting on something to happen internally.

I hope I've done better the second time around.

Dan

Henry Poras wrote:
>
> Dan,
>         I see what you are saying but something doesn't feel right (see,
say, feel.
> Should be some pun in here about not making sense). CPU time of 7 minutes
> after a single read??? This is an ugly INSERT on a 5 table join, but
that's
> about it. Unfortunately I don't think there is a way to measure service
time
> at this granularity. Isn't it recorded in sesstat at the end of the
> transactions? Guess I'll go and double check that.
>         Also, I remember an old paper by Craig Shallahamer that shows you
don't get
> a full multi_block_read_count if a block is already in buffer cache (e.g.
> mbrc=8; p3=8,p3=8,p3=3,...The 3 is because the fourth block is already in
> memory. The cached block is already reflected in the trace numbers.)
>         There is something very strange hear but I don't know what. Might
be an
> interesting case to track db/os relationships (running on AIX 4.3)
>
> Henry
>
> -----Original Message-----
> Daniel Fink
> Sent: Thursday, July 24, 2003 11:49 AM
> To: Multiple recipients of list ORACLE-L
>
> Henry,
>         I'll make an attempt, but I am still learning a great deal about
wait
> events and trace files. Cary, Mogens, Anjo, Tim, Jonathan, Wolfgang,
et.al.
> are better authorities, so any corrections are very welcome.
>         The time between waits is the elapsed time. If we equate elapsed
time to
> response time, we must look at the two components of RT, wait time and
> service time. If the time differential is 7 minutes, x minutes is service
> time and y minutes is wait time. It is possible that only .01 seconds was
> wait time, while the remaining 6:59:99 was service time (i.e. cpu)
>         I can't explain the scattered read issue. However, it is
interesting to
> note that the # of blocks to be read is offset by 1 in the second wait
> event. Perhaps, the read request was issued, but the 65.6041 block was
> already in memory, so the read request was reissued minus that block. I'm
> not sure how the latch free enters into the picture, but I have a feeling
> that this wait is integral in the sequence of events.
>
> Dan
>

--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Henry Poras
  INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).


Wolfgang Breitling
Oracle7, 8, 8i, 9i OCP DBA
Centrex Consulting Corporation
http://www.centrexcc.com


--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Wolfgang Breitling
 INET: [EMAIL PROTECTED]

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

RE: Timestamps in trace files (and other trace file oddities)

Reply via email to