Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

Jignesh K. Shah Fri, 26 Oct 2007 09:50:52 -0700

Also to give perspective on the equivalent writes on CLOG

I used the following script which runs for 10 sec to track all writes tothe clog directory and here is what it came up with... (This is with 500users running)


# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry

/execname=="postgres" &&dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/

{
   @write[fds[arg0].fi_pathname,arg1] = count();
}
tick-10sec
{
exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
 3   1026                      :tick-10sec

/export/home0/igen/pgdata/pg_clog/001E-2753028277088 1

#
I modified read.d to do a 5sec read
# ./read.d
dtrace: script './read.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
 0      1                           :BEGIN
 0   1027                       :tick-5sec

/export/home0/igen/pgdata/pg_clog/001F-2753028268896 1/export/home0/igen/pgdata/pg_clog/001F-2753028252512 1/export/home0/igen/pgdata/pg_clog/001F-2753028285280 2/export/home0/igen/pgdata/pg_clog/001F-2753028277088 3/export/home0/igen/pgdata/pg_clog/001F-2753028236128 3/export/home0/igen/pgdata/pg_clog/001E-2753028285280 5/export/home0/igen/pgdata/pg_clog/001E-2753028236128 9/export/home0/igen/pgdata/pg_clog/001E-2753028277088 13/export/home0/igen/pgdata/pg_clog/001E-2753028268896 15/export/home0/igen/pgdata/pg_clog/001E-2753028252512 27

#

So the ratio of reads vs writes to clog files is pretty huge..


-Jignesh



Jignesh K. Shah wrote:

Tom,

Here is what I did:

I started aggregating all read information:
First I also had added group by pid (arg0,arg1, pid) and the countswere all coming as 1
Then I just grouped by filename and location (arg0,arg1 of reads) andthe counts came back as
# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
   @read[fds[arg0].fi_pathname, arg1] = count();
}


# ./read.d
dtrace: script './read.d' matched 1 probe
^C
/export/home0/igen/pgdata/pg_clog/0014-2753028293472 1/export/home0/igen/pgdata/pg_clog/0014-2753028277088 1/export/home0/igen/pgdata/pg_clog/0015-2753028244320 2/export/home0/igen/pgdata/pg_clog/0015-2753028268896 14/export/home0/igen/pgdata/pg_clog/0015-2753028260704 25/export/home0/igen/pgdata/pg_clog/0015-2753028252512 27/export/home0/igen/pgdata/pg_clog/0015-2753028277088 28/export/home0/igen/pgdata/pg_clog/0015-2753028293472 37
FYI  I pressed ctrl-c within like less than a second
So to me this seems that multiple processes are reading the same pagefrom different pids. (This was with about 600 suers active.
Aparently we do have a problem that we are reading the same bufferaddress again. (Same as not being cached anywhere or not finding itin cache anywhere).
I reran lock wait script on couple of processes and did not seeCLogControlFileLock as a problem..
# ./83_lwlock_wait.d 14341

            Lock Id            Mode           Count
      WALInsertLock       Exclusive               1
      ProcArrayLock       Exclusive              16

            Lock Id   Combined Time (ns)
      WALInsertLock               383109
      ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

            Lock Id            Mode           Count
      WALInsertLock       Exclusive               2
      ProcArrayLock       Exclusive              15

            Lock Id   Combined Time (ns)
      WALInsertLock                55243
      ProcArrayLock             69700140

#

What will help you find out why it is reading the same page again?


-Jignesh



Jignesh K. Shah wrote:
I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS isjust avoiding the symptom to a later value.. I promise to look moreinto it before making any recommendations to increase NUM_CLOG_BUFFERs.
Because though "iGen" showed improvements in that area by increasingnum_clog_buffers , EAStress had shown no improvements.. Plus thereason I think this is not the problem in 8.3beta1 since the LockOutput clearly does not show CLOGControlFile as to be the issue whichI had seen in earlier case. So I dont think that increasingNUM_CLOG_BUFFERS will change thing here.
Now I dont understand the code pretty well yet I see three hotspotsand not sure if they are related to each other* ProcArrayLock waits - causing Waits as reported by83_lockwait.d script* SimpleLRUReadPage - causing read IOs as reported byiostat/rsnoop.d
* GetSnapshotData - causing CPU utilization  as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:
Josh Berkus <[EMAIL PROTECTED]> writes:
Actually, 32 made a significant difference as I recall ... do youstill have the figures for that, Jignesh?
I'd want to see a new set of test runs backing up any call for a change
in NUM_CLOG_BUFFERS --- we've changed enough stuff around this areathat
benchmarks using code from a few months back shouldn't carry a lot of
weight.

            regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris

Reply via email to