G'Day Max,
Thanks for your reply. Things have become a little stranger...
On Thu, 12 Jan 2006 [EMAIL PROTECTED] wrote:
> Hi Greg,
>
> OK. I start the segvn_fault.d script, then in a second window
> a "dd if=/dev/dsk/c0d0s2 of=/dev/null bs=8k"
> and then in a third window do the cp of the 1000 page file.
> Now I get ~1000 segvn_faults for the cp.
> I expected to get a larger count because the dd is
> contending with the cp.
> So, either your disk is very slow or there are other
> busy processes on your old system.
If my disk were slow or busy when cp wanted to read, then I too would
expect more faults - as cp can fault faster than disk read-ahead can keep
up. But cp isn't reading from disk! A modified segvn.d (attached) shows:
# segvn.d
Sampling...
^C
segvn_fault
-----------
CMD                                          FILE    COUNT
[...]
cp                                   /extra1/1000     1000

CMD                                          FILE          BYTES
[...]
cp                                   /extra1/1000        4096000

io:::start
----------
CMD                                          FILE DIR      BYTES
cp                                   /extra1/1000   R      12288
cp                                   /extra2/1000   W    4096000
The input file /extra1/1000 is not being read from disk (only 12 Kbytes).

Repeating your test kicked my Ultra 5 from a consistent 132 segvn_faults
to 1000 segvn_faults, and it stayed at 1000 for subsequent tests even
without the dd running. I suspect the dd of the /dev/dsk device thrashed
the cache (or changed how it held pages) such that cache-read-ahead
stopped working, even though the pages were still cached. OK, this is
still sounding far-fetched.
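If that theory holds, a quick way to check it (just a sketch - it assumes
UFS, and that ufs_getpage_ra, from the path I guessed at further down,
hasn't been inlined away) would be to count getpage vs read-ahead calls
made on behalf of cp:

#!/usr/sbin/dtrace -s

/* count UFS getpage vs read-ahead calls made on behalf of cp */
fbt::ufs_getpage:entry,
fbt::ufs_getpage_ra:entry
/execname == "cp"/
{
	@[probefunc] = count();
}

I haven't run that yet, but the counts ought to look different in the
132-fault and 1000-fault states if read-ahead really is what changed.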
Returning my Ultra 5 to a consistent 132 segvn_faults state was a real
challenge: a remount didn't work, nor did init 6!
What did work was rewriting my /extra1/1000 file using dd. Hmmm.

It appears that using dd to WRITE to a file leaves that file cached in a
cache-read-ahead-optimal way (e.g. a repeatable 132 segvn_faults). Then
either remount or dd the /dev/dsk device (both affecting the cache) and
we go to a repeatable 1000 segvn_faults.

I rewrote my /extra1/1000 file on my x86 server, and yes - it now
consistently faults at 129. Phew!
...
I came up with the following simple test to check this out:
# dd if=/dev/urandom bs=4k of=/extra1/5000 count=5000
0+5000 records in
0+5000 records out
# ptime cp -f /extra1/5000 /tmp
real     0.077    --- fast, as we just created it
user     0.001
sys      0.075
# ptime cp -f /extra1/5000 /tmp
real     0.076    --- still fast...
user     0.001
sys      0.074
# ptime cp -f /extra1/5000 /tmp
real     0.076    --- still fast...
user     0.001
sys      0.074
# umount /extra1; mount /extra1
# ptime cp -f /extra1/5000 /tmp
real     0.129    --- slow, as we just remounted the FS
user     0.001
sys      0.099
# ptime cp -f /extra1/5000 /tmp
real     0.084    --- faster, as the file is now cached
user     0.001
sys      0.081
# ptime cp -f /extra1/5000 /tmp
real     0.084    --- hrm...
user     0.001
sys      0.081
# ptime cp -f /extra1/5000 /tmp
real     0.084    --- not getting any faster than this.
user     0.001
sys      0.081
So after creation, /extra1/5000 is copied in 0.076 secs (consistently).
After remounting, /extra1/5000 is copied in 0.084 secs (consistently).
I haven't found a way to convince a file to be cached as well as it is on
creation. It seems that newly created files are blessed.
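If it helps to tie those times back to the fault counts, a rough sketch
like this (untested, my own naming) should show how much of the extra few
milliseconds cp spends in segvn_fault in the 1000-fault state versus the
132-fault state:

#!/usr/sbin/dtrace -s

/* sum the time cp spends in segvn_fault, and count the calls */
fbt::segvn_fault:entry
/execname == "cp"/
{
	self->ts = timestamp;
}

fbt::segvn_fault:return
/self->ts/
{
	@calls["segvn_fault calls"] = count();
	@ns["time in segvn_fault (ns)"] = sum(timestamp - self->ts);
	self->ts = 0;
}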
How does this all sound? :)
cheers,
Brendan
> max
>
> Quoting Brendan Gregg <[EMAIL PROTECTED]>:
[...]
> >> For instance, if the file system brings in 56k, (14 pages
> >> on my amd64 box), and my disk is reasonably fast,
> >> by the time cp gets a bit into the first 56k, I suspect
> >> that all of the data is in memory and there is no
> >> trapping into the kernel at all until the next 56k
> >> needs to be read in.
> >
> > That would make sense. In this case (testing here anyway) it's not going
> > near disk for reading (only writing the destination file),
> >
> > x86,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   74.0    0.0 3999.8  0.0  0.1    0.0    0.9   0   6 c0d0
> > sparc,
> >                     extended device statistics
> >     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
> >     0.0   65.0    0.0 7995.6 23.6  1.8  363.7   27.7  88  90 c0t0d0
> >
> > So, considering both systems undercount expected faults - one fault must
> > be triggering some form of "read ahead" from the page cache, not disk.
> > I'm thinking the path is something like,
> >
> > ufs_getpage_ra -> pvn_read_kluster -> page_create_va -> (read many?)
> >
> >> (I guess I am assuming the hat
> >> layer is setting up pte's as the pages are brought in,
> >> not as cp is accessing them).
> >
> > Yep - and that sort of problem would be the very thing that throws a
> > spanner in the works. If it always waited for cp to access them, then I'd
> > have consistent events to trace...
> >
> > Thanks Max! :)
> >
> > Brendan
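PS: to check the ufs_getpage_ra -> pvn_read_kluster -> page_create_va
path I guessed at above, a stack aggregation along these lines might do
it (again just a sketch - whether the intermediate frames appear will
depend on inlining):

#!/usr/sbin/dtrace -s

/* which kernel code paths are creating pages on behalf of cp? */
fbt::page_create_va:entry
/execname == "cp"/
{
	@[stack()] = count();
}

The modified segvn.d mentioned above is attached below: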
#!/usr/sbin/dtrace -s

#pragma D option quiet

dtrace:::BEGIN
{
	trace("Sampling...\n");
}

fbt::segvn_fault:entry
/(int)((struct segvn_data *)args[1]->s_data)->vp != NULL/
{
	self->vn = (struct vnode *)((struct segvn_data *)args[1]->s_data)->vp;
	@faults[execname, stringof(self->vn->v_path)] = count();
	@bytes[execname, stringof(self->vn->v_path)] = sum(args[3]);
}

io:::start
{
	@iobytes[execname, args[2]->fi_pathname,
	    args[0]->b_flags & B_READ ? "R" : "W"] = sum(args[0]->b_bcount);
}

dtrace:::END
{
	printf("segvn_fault\n-----------\n");
	printf("%-16s %32s %8s\n", "CMD", "FILE", "COUNT");
	printa("%-16s %32s %@8d\n", @faults);
	printf("\n%-16s %32s %14s\n", "CMD", "FILE", "BYTES");
	printa("%-16s %32s %@14d\n", @bytes);
	printf("\nio:::start\n----------\n");
	printf("%-16s %32s %3s %10s\n", "CMD", "FILE", "DIR", "BYTES");
	printa("%-16s %32s %3s %@10d\n", @iobytes);
}
_______________________________________________
perf-discuss mailing list
[email protected]