Hello community, here is the log from the commit of package collectl for openSUSE:Factory checked in at 2017-03-04 16:37:47 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/collectl (Old) and /work/SRC/openSUSE:Factory/.collectl.new (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "collectl" Sat Mar 4 16:37:47 2017 rev:27 rq:476777 version:4.1.2 Changes: -------- --- /work/SRC/openSUSE:Factory/collectl/collectl.changes 2016-10-20 23:10:03.000000000 +0200 +++ /work/SRC/openSUSE:Factory/.collectl.new/collectl.changes 2017-03-04 16:44:53.658543420 +0100 @@ -1,0 +2,19 @@ +Fri Mar 3 13:53:59 UTC 2017 - tabra...@suse.com + +- Update to 4.1.2 + + incorrectly requiring a + with --rawdskfilt to be at beginning + + when added support for 64bit IB counters it looks like I was only + saving 3 of the 4 values (loop only went to 3 instead of 4) + around line 4403. [thanks seb] + +- Changes from 4.1.1 + + added packet loss and fast restransmissions to TCP Extended versbose + output and renamed AkNoPy and PreAck to PurAck and HPAcks to be + consistent with earlier versions [thanks Sophie] + + add support for nvme disks [thanks fred] + + it turns out some people re-enable lustre support for the sake of + monitoring clients and to support that I had to add a check for + the lustre-client module which is now in a differetn location than + others [thanks fred] + +------------------------------------------------------------------- Old: ---- collectl-4.1.0.src.tar.gz New: ---- collectl-4.1.2.src.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ collectl.spec ++++++ --- /var/tmp/diff_new_pack.vLEfOz/_old 2017-03-04 16:44:54.114478862 +0100 +++ /var/tmp/diff_new_pack.vLEfOz/_new 2017-03-04 16:44:54.114478862 +0100 @@ -1,7 +1,7 @@ # # spec file for package collectl # -# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany. +# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany. # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -17,7 +17,7 @@ Name: collectl -Version: 4.1.0 +Version: 4.1.2 Release: 0 Summary: Collects data that describes the current system status License: Artistic-1.0 and GPL-2.0+ ++++++ collectl-4.1.0.src.tar.gz -> collectl-4.1.2.src.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/RELEASE-collectl new/collectl-4.1.2/RELEASE-collectl --- old/collectl-4.1.0/RELEASE-collectl 2016-10-07 14:35:39.000000000 +0200 +++ new/collectl-4.1.2/RELEASE-collectl 2017-02-27 21:46:01.000000000 +0100 @@ -27,6 +27,22 @@ COLLECTL CHANGES +4.1.2 Feb 27, 2017 + - incorrectly requiring a + with --rawdskfilt to be at beginning + - when added support for 64bit IB counters it looks like I was only + saving 3 of the 4 values (loop only went to 3 instead of 4) + around line 4403. [thanks seb] + +4.1.1 Nov 2, 2016 + - added packet loss and fast restransmissions to TCP Extended versbose + output and renamed AkNoPy and PreAck to PurAck and HPAcks to be + consistent with earlier versions [thanks Sophie] + - add support for nvme disks [thanks fred] + - it turns out some people re-enable lustre support for the sake of + monitoring clients and to support that I had to add a check for + the lustre-client module which is now in a differetn location than + others [thanks fred] + 4.1.0 Oct 7, 2016 - allow lexpr to pass formatting information for strings and numbers [thanks Guy] @@ -106,6 +122,11 @@ COLMUX CHANGES +4.9.2 + - if ping fails, it still tries to ssh and fails, generating + meaninlgess uninitialized variable errors + - include 'ssh' in the error messages when check() fails (thanks KM5) + 4.9.1 Mar 29, 2016 - assume collectl in same directory as colmux so you can install both on network share BUT if colmux ends in 'pl', it's probably me doing diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/collectl new/collectl-4.1.2/collectl --- old/collectl-4.1.0/collectl 2016-10-07 14:35:39.000000000 +0200 +++ new/collectl-4.1.2/collectl 2017-02-27 21:46:01.000000000 +0100 @@ -1,6 +1,6 @@ #!/usr/bin/perl -w -# Copyright 2003-2016 Hewlett-Packard Development Company, L.P. +# Copyright 2003-2017 Hewlett-Packard Development Company, L.P. # # collectl may be copied only under the terms of either the Artistic License # or the GNU General Public License, which may be found in the source kit @@ -111,8 +111,8 @@ $rootFlag=(!$PcFlag && `whoami`=~/root/) ? 1 : 0; $SrcArch= $Config{"archname"}; -$Version= '4.1.0-1'; -$Copyright='Copyright 2003-2016 Hewlett-Packard Development Company, L.P.'; +$Version= '4.1.2-1'; +$Copyright='Copyright 2003-2017 Hewlett-Packard Development Company, L.P.'; $License= "collectl may be copied only under the terms of either the Artistic License\n"; $License.= "or the GNU General Public License, which may be found in the source kit"; @@ -222,7 +222,7 @@ $DiskMaxValue=-1; # disabled # NOTE - the following line should match what is in collectl.conf. If uncommented there, it will be replaced -$DiskFilter='cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ '; +$DiskFilter='cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme\d+n\d+ '; $DiskFilterFlag=0; # only set when filter set in collectl.conf $ProcReadTest='yes'; @@ -1034,11 +1034,8 @@ if ($rawDskFilter ne '') { # if filter starts with '+', just add to existing string - if ($rawDskFilter=~/\+/) + if ($rawDskFilter=~/^\+/) { - error("+ in rawdskfilt must be the first char") - if $rawDskFilter!~/^\+/; - $rawDskFilter=~s/^\+//; $DiskFilter.="|$rawDskFilter"; } @@ -3921,6 +3918,7 @@ if ($line=~/ vd[a-z] /) { record(2, "$tag $line"); next; } if ($line=~/emcpower[a-z]+ /) { record(2, "$tag $line"); next; } if ($line=~/psv\d+ /) { record(2, "$tag $line"); next; } + if ($line=~/nvme\d+n\d+ /) { record(2, "$tag $line"); next; } } else { @@ -6781,6 +6779,11 @@ my $whatsnew=<<EOF6; What's new in collectl in the last year or so? +version 4.1.1 + - added support for nvme disks + - added packet loss and fast retransmissions for -st and renamed + AkNoPy and PreAck to be consistent with older versions + version 4.1.0 - enhanced lexpr to allow reporting fractional metrics - modified misc to report uptime days to 3 decimal places diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/collectl.conf new/collectl-4.1.2/collectl.conf --- old/collectl-4.1.0/collectl.conf 2016-10-07 14:35:39.000000000 +0200 +++ new/collectl-4.1.2/collectl.conf 2017-02-27 21:46:01.000000000 +0100 @@ -142,7 +142,7 @@ # ignored. To change the filter, set the string below to those you want to keep BUT # you need to know what a perl regular expression looks like or you may not get the # desired results. CAUTION - white space is CRITICAL for this to work. -#DiskFilter = /cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ / +#DiskFilter = /cciss/c\d+d\d+ |hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme\d+n\d+ / # Kernel Efficiency Test # On kernels 2.6.32 forward (and you can't tell how distros patched) there is a read inefficiency diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/colmux new/collectl-4.1.2/colmux --- old/collectl-4.1.0/colmux 2016-10-07 14:35:39.000000000 +0200 +++ new/collectl-4.1.2/colmux 2017-02-27 21:46:01.000000000 +0100 @@ -3,7 +3,7 @@ # problems # pattern match for -i recognition/removal (in -test) not robust enough -# Copyright 2005-2015 Hewlett-Packard Development Company, LP +# Copyright 2005-2017 Hewlett-Packard Development Company, LP # # colmux may be copied only under the terms of either the Artistic License # or the GNU General Public License, which may be found in the source kit @@ -73,8 +73,8 @@ my $Collectl="$BinDir/collectl"; my $Program='colmux'; -my $Version='4.9.1'; -my $Copyright='Copyright 2005-2015 Hewlett-Packard Development Company, L.P.'; +my $Version='4.9.2'; +my $Copyright='Copyright 2005-2017 Hewlett-Packard Development Company, L.P.'; my $License="colmux may be copied only under the terms of either the Artistic License\n"; $License.= "or the GNU General Public License, which may be found in the source kit"; @@ -438,21 +438,25 @@ my $maxSecs=0; for (my $i=0; $i<@hostnames; $i++) { - my $year=substr($dates[$i], 0, 4); - my $mon= substr($dates[$i], 4, 2); - my $day= substr($dates[$i], 6, 2); - my $hour=substr($dates[$i], 8, 2); - my $mins=substr($dates[$i], 10, 2); - my $secs=substr($dates[$i], 12, 2); - my $seconds=(timelocal($secs, $mins, $hour, $day, $mon-1, $year-1900)); - $minSecs=$seconds if $seconds<$minSecs; - $maxSecs=$seconds if $seconds>$maxSecs; - print "$hostnames[$i]: $dates[$i]" if $debug & 1024; - if ($maxSecs-$minSecs>$timerange) - { - my $plural=($timerange>1) ? 's' : ''; - print "WARNING: $hostnames[$i]'s time differs by more than $timerange second$plural with at least one other\n"; - print " run again with -debug 1024 and/or use -timerange or -quiet to suppress this message\n"; + # some checks may have failed so only look at 'good' nodes + if (!$threadFailure[$i]) + { + my $year=substr($dates[$i], 0, 4); + my $mon= substr($dates[$i], 4, 2); + my $day= substr($dates[$i], 6, 2); + my $hour=substr($dates[$i], 8, 2); + my $mins=substr($dates[$i], 10, 2); + my $secs=substr($dates[$i], 12, 2); + my $seconds=(timelocal($secs, $mins, $hour, $day, $mon-1, $year-1900)); + $minSecs=$seconds if $seconds<$minSecs; + $maxSecs=$seconds if $seconds>$maxSecs; + print "$hostnames[$i]: $dates[$i]" if $debug & 1024; + if ($maxSecs-$minSecs>$timerange) + { + my $plural=($timerange>1) ? 's' : ''; + print "WARNING: $hostnames[$i]'s time differs by more than $timerange second$plural with at least one other\n"; + print " run again with -debug 1024 and/or use -timerange or -quiet to suppress this message\n"; + } } } @@ -477,8 +481,8 @@ $reason='passwordless ssh failed' if $threadFailure[$i]==-1; $reason='ping failed' if $threadFailure[$i]==1; $reason='collectl not installed' if $threadFailure[$i]==2; - $reason='connection refused' if $threadFailure[$i]==4; - $reason='permission denied' if $threadFailure[$i]==8; + $reason='ssh: connection refused' if $threadFailure[$i]==4; + $reason='ssh: permission denied' if $threadFailure[$i]==8; $reason='could not resolve name' if $threadFailure[$i]==16; $reason='timed out during banner exchange' if $threadFailure[$i]==32; $reason='collectl version < 3.5' if $threadFailure[$i]==64; @@ -1600,7 +1604,7 @@ close ROUTES; $myaddr=`$Ifconfig $interface | grep 'inet '`; $myaddr=(split(/\s+/, $myaddr))[2]; - $myaddr=(split(/:/, $myaddr))[1]; # another no-op for newer ifconfig + $myaddr=~s/addr://; # for cases where ipaddr is preceeded by this print "Got address from route for $interface: $myaddr\n" if $debug & 1; return ($myaddr); } diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/docs/#Data-verbose.html# new/collectl-4.1.2/docs/#Data-verbose.html# --- old/collectl-4.1.0/docs/#Data-verbose.html# 1970-01-01 01:00:00.000000000 +0100 +++ new/collectl-4.1.2/docs/#Data-verbose.html# 2017-02-27 21:46:02.000000000 +0100 @@ -0,0 +1,832 @@ +<html> +<head> +<link rel=stylesheet href="style.css" type="text/css"> +<title>Verbose Data</title> +</head> + +<body> +<center><h1>Verbose Data</h1></center> +<p> + +Data is reported in this form when either --verbose is used OR if there is at least one +type of data requested that doesn't have a brief form such as any detail data or +ionodes, processes or slabs. Specifying some of the lustre output options with --lustopts +such as B, D and M will also force verbose format. + +<h3>Buddy (Memory Fragmentation) Data, <i>collectl -sb</i></h3> +<p> +<div class=terminal-wide14><pre> +# MEMORY FRAGMENTATION SUMMARY (4K pages) +# 1Pg 2Pgs 4Pgs 8Pgs 16Pgs 32Pgs 64Pgs 128Pgs 256Pgs 512Pgs 1024Pgs +</pre></div> + +This table shows the total number of memory fragments by pagesize in increasing powers of 2 for +all the memory types. +<p> +<p><h3>CPU, <i>collectl -sc</i></h3> + +<div class=terminal-wide14> +<pre> +# CPU SUMMARY (INTR, CTXSW & PROC /sec) +# User Nice Sys Wait IRQ Soft Steal Guest NiceG Idle CPUs Intr Ctxsw Proc RunQ Run Avg1 Avg5 Avg15 RunT BlkT +</pre> +</div> + +These are the percentage of time the system in running is one of the modes, noting that +these are averaged across all CPUs. While User and Sys modes are self-eplanitory, the others +may not be: +<p> +<table> +<tr valign=top><td><b>User</b></td> +<td>Time spent in User mode, not including time spend in "nice" mode.</td></tr> + +<tr valign=top><td><b>Nice</b></td> +<td>Time spent in Nice mode, that is lower priority as adjusted by +the nice command and have the "N" status flag set when examined with "ps".</td></tr> + +<tr valign=top><td><b>Sys</b></td> +<td>This is time spent in "pure" system time.</td></tr> + +<tr valign=top><td><b>Wait</b></td> +<td>Also known as "iowait", this is the time the CPU was idle during an +outstanding disk I/O request. This is not considered to be part of the total or +system times reported in brief mode.</td></tr> + +<tr valign=top><td><b>Irq</b></td> +<td>Time spent processing interrupts and also considered to be part of +the summary system time reported in "brief" mode.</td></tr> + +<tr valign=top><td><b>Soft</b></td> +<td>Time spent processing soft interrupts and also considered to be part +of the summary system time reported in "brief" mode.</td></tr> + +<tr valign=top><td><b>Steal</b></td> +<td>Time spent in other operating systems when running in a virtualized environment</td></tr> + +<tr valign=top><td><b>Guest</b></td> +<td>Time spent running a virtual CPU for guest operating systems under the control of the +Linux kernel, new since 2.6.24</td></tr> + +<tr valign=top><td><b>NiceG</b></td> +<td>Time spent running a niced guest (virtual CPU for guest operating systems under the +control of the Linux kernel), new since 2.6.33</td></tr> +</table> + +<p> +This next set of fields apply to processes +<p> +<table> +<tr><td><b>Proc</b></td><td>Process creations/sec.</li></td></tr> +<tr><td><b>Runq</b></td><td>Number of processes in the run queue.</li></td></tr> +<tr><td><b>Run</b></td><td>Number of processes in the run state.</li></td></tr> +<tr><td><b>Avg1, Avg5, Avg15</b></td><td>Load average over the last 1,5 and 15 minutes.</td></tr> +<tr><td><b>RunT</b></td><td>Total number of process in the run state, not counting collectl itself</td></tr> +<tr><td><b>BlkT</b></td><td>Total number of process blocked, waiting on I/O</td></tr> +</table> + +<p><h3>Disks, <i>collectl -sd</i></h3> +<p> +If you specify filtering with <i>--dskfilt</i>, the disks that match the pattern(s) +will either be included or excluded from the the summary data. However, the data will +<i>still</i> be collected so if recorded to a file can later be viewed. + +<div class=terminal-wide14> +<pre> +# DISK SUMMARY (/sec) +#KBRead RMerged Reads SizeKB KBWrit WMerged Writes SizeKB +</div></pre> + +<table> +<tr><td><b>KBRead</b></td><td>KB read/sec</td></tr> +<tr valign=top><td><b>RMerged</b></td><td>Read requests merged per second when being dequeued.</td></tr> +<tr><td><b>Reads</b></td><td>Number of reads/sec</td></tr> +<tr><td><b>SizeKB</b></td><td>Average read size in KB</td></tr> +<tr><td><b>KBWrite<b></td><td>KB written/sec</td></tr> +<tr valign=top><td><b>WMerged</b></td> +<td>Write requests merged per second when being dequeued.</tr></tr> +<tr><td><b>Writes</b></td><td>Number of writes/sec</td></tr> +<tr><td><b>SizeKB</b></td><td>Average write size in KB</td></tr> +</table> + +<p><h3>Inodes/Filesystem, <i>collectl -si</i></h3> + +<div class=terminal-wide14> +<pre> +# INODE SUMMARY +# Dentries File Handles Inodes +# Number Unused Alloc MaxPct Number + 40585 39442 576 0.17 38348 +</pre></div> + +<table> +<tr><td colspan=2>DCache</td></tr> +<tr><td><b>Dentries Number</b></td><td>Number of entries in directory cache</td</tr> +<tr><td><b>Dentried Unused</b></td><td>Number of unused entries in directory cache</td</tr> +<tr><td><b>Handles Alloc</b></td><td>Number of allocated file handles</td></tr> +<tr><td><b>handles % Max</b></td><td>Percentage of maximum available file handles</td></tr> +<tr><td><b>Inodes Number</b></td><td>Number of inodes in use</td></tr> +</table> + +<p><i>NOTE - as of this writing I'm baffled by the dentry unused field. No matter how +many files and/or directories I create, this number goes up! Sholdn't it go down?</i> + +<p><h3>Infiniband, <i>collectl -sx</i></h3> + +<div class=terminal-wide14> +<pre> +# INFINIBAND SUMMARY (/sec) +# KBIn PktIn SizeIn KBOut PktOut SizeOut Errors +</pre></div> +<table> +<tr><td><b>KBIn</b></td><td>KB received/sec.</td></tr> +<tr><td><b>PktIn</b></td><td>Packets received/sec.</td></tr> +<tr><td><b>SizeIn</b></td><td>Average incoming packet size in KB</td></tr> +<tr><td><b>KBOut</b></td><td>KB transmitted/sec.</td></tr> +<tr><td><b>PktOut</b></td><td>Packets transmitted/sec.</td></tr> +<tr><td><b>SizeOut</b></td><td>Average outgoing packet size in KB</td></tr> +<tr valign=top><td><b>Errs</b></td><td>Count of current errors. Since these +are typically infrequent, it is felt that reporting them as a rate would result +in either not seeing them OR round-off hiding their values.</td></tr> +</table> + +<p><h3>Lustre</h3> + +<p><b>Lustre Client</b>, <i>collectl -sl</i> +<p>There are several formats here controlled by the --lustopts switch. There is +also detail data for these available as well. Specifying -sL results in +data broken out by the file system and --lustopts O further breaks it out by OST. +Also note the average read/write sizes are only reported when --lustopts is <i>not</i> specified. + +<div class=terminal-wide14> +<pre> +# LUSTRE CLIENT SUMMARY +# KBRead Reads SizeKB KBWrite Writes SizeKB +</pre></div> + +<table> +<tr><td><b>KBRead</b></td><td>KB/sec delivered to the client.</td></tr> +<tr><td valign=top><b>Reads</b></td><td>Reads/sec delivered to the client, +not necessarily from the lustre storage servers.</td></tr> +<tr><td><b>SizeKB</b></td><td>Average read size in KB</td></tr> +<tr><td><b>KBWrite</b></td><td>KB Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>Writes</b></td><td>Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>SizeKB</b></td><td>Average write size in KB</td></tr> +</table> + +<div class=terminal-wide14> +<pre> +# LUSTRE CLIENT SUMMARY: METADATA +# KBRead Reads KBWrite Writes Open Close GAttr SAttr Seek Fsynk DrtHit DrtMis +</pre></div> + +<table> +<tr><td><b>KBRead</b></td><td>KB/sec delivered to the client.</td></tr> +<tr><td valign=top><b>Reads</b></td><td>Reads/sec delivered to the client, +not necessarily from the lustre storage servers.</td></tr> +<tr><td><b>KBWrite</b></td><td>KB Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>Writes</b></td><td>Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>Open</b></td><td>File opens/sec</td></tr> +<tr><td><b>Close</b></td><td>File closes/sec</td></tr> +<tr><td><b>GAttr</b></td><td>getattrs/sec</td></tr> +<tr><td><b>Seek</b></td><td>seeks/sec</td></tr> +<tr><td><b>Fsync</b></td><td>fsyncs/sec</td></tr> +<tr><td><b>DrtHit</b></td><td>dirty hits/sec</td></tr> +<tr><td><b>DrtMis</b></td><td>dirty misses/sec</td></tr> +</table> + +<div class=terminal-wide14> +<pre> +# LUSTRE CLIENT SUMMARY: READAHEAD +# KBRead Reads KBWrite Writes Pend Hits Misses NotCon MisWin FalGrb LckFal Discrd ZFile ZerWin RA2Eof HitMax Wrong +</pre></div> + +<table> +<tr><td><b>KBRead</b></td><td>KB/sec delivered to the client.</td></tr> +<tr><td valign=top><b>Reads</b></td><td>Reads/sec delivered to the client, +not necessarily from the lustre storage servers.</td></tr> +<tr><td><b>KBWrite</b></td><td>KB Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>Writes</b></td><td>Writes/sec delievered to the storage servers.</td></tr> +<tr><td><b>Pend</b></td><td>Pending issued pages</td></tr> +<tr><td><b>Hits</b></td><td>prefetch cache hits</td></tr> +<tr><td><b>Misses</b></td><td>prefetch cache misses</td></tr> +<tr><td><b>NotCon</b></td><td>The current pages read that were not consecutive with the previous ones./td></tr> +<tr><td><b>MisWin</b></td><td>Miss inside window. The pages that were expected to be in the + prefetch cache but weren't. They were probably + reclaimed due to memory pressure</td></tr> +<tr><td><b>LckFal</b></td><td>Failed grab_cache_pages. Tried to prefetch page but it was locked.</td></tr> +<tr><td><b>Discrd</b></td><td>Read but discarded. Prefetched pages (but not read by applicatin) + have been discarded either becuase of memory pressure or lock + revocation.</td></tr> +<tr><td><b>ZFile</b></td><td>Zero length file.</td></tr> +<tr><td><b>ZerWin</b></td><td>Zero size window.</td></tr> +<tr><td><b>RA2Eof</b></td><td>Read ahead to end of file</td></tr> +<tr><td><b>HitMax</b></td><td>Hit maximum readahead issue. The read-ahead window has grown to the + maximum specified by <i>max_read_ahead_mb</i></td></tr> +</table> + +<div class=terminal-wide14> +<pre> +# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages) +#RdK Rds 1K 2K ... WrtK Wrts 1K 2K ... +</pre></div> + +This display shows the size of rpc buffer distribution buckets in K-pages. You can find the +page size for you system in the header (<i>collectl --showheader</i>). +<p> +<table> +<tr><td valign=top><b>RdK</b></td><td>KBs read/sec</td></tr> +<tr><td valign=top><b>Rds</b></td><td>Reads/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of pages of of this size read</td></tr> +<tr><td valign=top><b>WrtK</b></td><td>KBs written/sec</td></tr> +<tr><td valign=top><b>Wrts</b></td><td>Writes/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of pages of of this size written</td></tr> +</table> + +<p><b>Lustre Meta-Data Server</b>, <i>collectl -sl</i> +<p>As of Lustre 1.6.5, the data reported for the MDS had changed, breaking out the <i>Reint</i> +data into 5 individual buckets which are the last 5 fields described below. For earlier +versions those 5 fields will be replaced by a single one named <i>Reint</i>. + +<div class=terminal-wide14> +<pre> +# LUSTRE MDS SUMMARY +#Getattr GttrLck StatFS Sync Gxattr Sxattr Connect Disconn Create Link Setattr Rename Unlink +</pre></div> + +<table> +<tr><td valign=top><b>Getattr</b></td><td>Number of getattr calls, for example <i>lfs osts</i>. +Note that this counter is <i>not</i> incremented as the result of <i>ls</i> - see <i>Gxattr</i></td></tr> +<tr><td><b>GttrLck</b></td><td>These are getattrs that also return a lock on the file</td></tr> +<tr><td valign=top><b>StatFS</b></td><td>Number of stat calls, for example <i>df</i> or <i>lfs df</i>. +Note that lustre caches data for up to a second so many calls within a second may only show +up as a single <i>statfs</i></td></tr> +<tr><td><b>Sync</b></td><td>Number of sync calls</td></tr> +<tr><td valign=top><b>Gxattr</b></td><td>Extended attribute get operations, for example <i>getfattr</i>, +<i>getfacl</i> or even <i>ls.</i> Note that the MDS must have been mounted with <i>-o acl</i> +for this counter to be enabled.</td></tr> +<tr><td><b>Sxattr</b></td><td>Extended attribute set operations, for example <i>setfattr</i> or <i>setfacl</i></td></tr> +<tr><td><b>Connect</b></td><td>Client mount operations</td></tr> +<tr><td><b>Disconn</b></td><td>Client umount operations</td></tr> +<tr><td><b>Create</b></td><td>Count of mknod and mkdir operations, also used by NFS servers internally when creating files</td></tr> +<tr><td><b>Link</b></td><td>Hard and symbolic links, for example <i>ln</i></td></tr> +<tr><td><b>Setattr</b></td><td>All operations that modify inode attributes including chmod, chown, touch, etc</td></tr> +<tr><td><b>Rename</b></td><td>File and directory renames, for example <i>mv</i></td></tr> +<tr><td><b>Unlink</b></td><td>File/directory removals, for example <i>rm</i> or <i>rmdir</i></td></tr> +</table> +<p> +The following display is very similar the the RPC buffers in that the sizes of different size +I/O requests are reported. In this case there are requests sent to the disk driver. +Note that this report is only available for HP's SFS. + +<div class=terminal-wide14> +<pre> +# LUSTRE DISK BLOCK LEVEL SUMMARY +#Rds RdK 0.5K 1K ... Wrts WrtK 0.5K 1K ... +</pre></div> + +<table> +<tr><td valign=top><b>Rds</b></td><td>Reads/sec</td></tr> +<tr><td valign=top><b>RdK</b></td><td>KBs read/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of blocks of of this size read</td></tr> +<tr><td valign=top><b>Wrts</b></td><td>Writes/sec</td></tr> +<tr><td valign=top><b>WrtK</b></td><td>KBs written/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of blocks of of this size written</td></tr> +</table> + +<p><b>Lustre Object Storage Server</b>, <i>collectl -sl</i> +<div class=terminal-wide14> +<pre> +# LUSTRE OST SUMMARY +# KBRead Reads SizeKB KBWrite Writes SizeKB +</pre></div> + +<table> +<tr><td><b>KBRead</b></td><td>KB/sec read</td></tr> +<tr><td><b>Reads</b></td><td>Reads/sec</td></tr> +<tr><td><b>SizeKB</b></td><td>Average read size in KB</td></tr> +<tr><td><b>KBWrite</b></td><td>KB/sec written</td></tr> +<tr><td><b>Writes</b></td><td>Writes/sec</li> +<tr><td><b>SizeKB</b></td><td>Average write size in KB</td></tr> +</table> + +<p><b>Lustre Object Storage Server</b>, <i>collectl -sl --lustopts B</i> +<p> +As with client data, when you only get read/write average sizes when +--lustopt is <i>not</i> specified. +<div class=terminal-wide14> +<pre> +# LUSTRE OST SUMMARY +#<--------reads-----------|----writes----------------- +#RdK Rds 1K 2K ... WrtK Wrts 1K 2K .... +</pre></div> + +<table> +<tr><td valign=top><b>RdK</b></td><td>KBs read/sec</td></tr> +<tr><td valign=top><b>Rds</b></td><td>Reads/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of pages of of this size read</td></tr> +<tr><td valign=top><b>WrtK</b></td><td>KBs written/sec</td></tr> +<tr><td valign=top><b>Wrts</b></td><td>Writes/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of pages of of this size written</td></tr> +</table> + +<p><b>Lustre Object Storage Server</b>, <i>collectl -sl --lustopts D</i> + +<div class=terminal-wide14> +<pre> +# LUSTRE DISK BLOCK LEVEL SUMMARY +#RdK Rds 0.5K 1K ... WrtK Wrts 0.5K 1K ... +</pre></div> + +<table> +<tr><td valign=top><b>RdK</b></td><td>KBs read/sec</td></tr> +<tr><td valign=top><b>Rds</b></td><td>Reads/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of blocks of of this size read</td></tr> +<tr><td valign=top><b>WrtK</b></td><td>KBs written/sec</td></tr> +<tr><td valign=top><b>Wrts</b></td><td>Writes/sec</td></tr> +<tr><td valign=top><b>nK</b></td><td>Number of blocks of of this size written</td></tr> +</table> + +<p><h3>Memory, <i>collectl -sm</i></h3> + +<div class=terminal-wide14> +<pre> +# MEMORY SUMMARY +#<-------------------------------Physical Memory-------------------------------------><-----------Swap------------><-------Paging------> +# Total Used Free Buff Cached Slab Mapped Anon Commit Locked Inact Total Used Free In Out Fault MajFt In Out +</pre></div> + +<table> +<tr><td><b>Total</b></td> +<td>Total physical memory</td></tr> + +<tr valign=top><td><b>Used</b></td> +<td>Used physical memory. This does not include memory used by the kernel itself.</td></tr> + +<tr valign=top><td><b>Free</b></td> +<td>Unallocated memory</td></tr> + +<tr valign=top><td><b>Buff</b></td> +<td>Memory used for system buffers</td></tr> + +<tr valign=top><td><b>Cached</b></td> +<td>Memory used for caching data beween the kernel and disk, noting direct I/O does not use the cache</td></tr> + +<tr valign=top><td><b>Slab</b></td> +<td>Memory used for slabs, see <i>collectl -sY</i></td></tr> + +<tr valign=top><td><b>Mapped</b></td> +<td>Memory mapped by processes</td></tr> + +<tr valign=top><td><b>Anon</b></td> +<td>Anonymous memory. <i>NOTE - this </i>is included<i> with mapped memory in brief format</i></td></tr> + +<tr valign=top><td><b>Commit</b></td> +<td>According to RedHat: <i>"An estimate of how much RAM you would need to make a 99.99% guarantee +that there never is OOM (out of memory) for this workload."</i></td></tr> + +<tr valign=top><td><b>Locked</b></td> +<td>Locked Memory</td> + +<tr valign=top><td><b>Inactive</b></td> +<td>Inactive pages. On ealier kernels this number is the sum of the clean, dirty +and laundry pages.</td></tr> + +<tr><td><b>Swap Total</b></td> +<td>Total Swap</td></tr> + +<tr><td><b>Swap Used</b></td> +<td>Used Swap</td></tr> + +<tr><td><b>Swap Free</b></td> +<td>Free Swap</td></tr> + +<tr><td><b>Swap In</b></td> +<td>Kb swapped in/sec</td></tr> + +<tr><td><b>Swap Out</b></td> +<td>Kb swapped out/sec</td></tr> + +<tr><td><b>Fault</b></td> +<td>Page faults/sec resolved by not going to disk</td></tr> + +<tr><td><b>MajFt</b></td> +<td>These page faults are resolved by going to disk</td></tr> + +<tr><td><b>Paging In</b></td> +<td>Total number of pages read by block devices</td></tr> + +<tr><td><b>Paging Out</b></td> +<td>Total number of pages written by block devices</td></tr> +</table> +<p> +<center><b><i>Notes</i></b></center> +If you include <i>--memopts R</i>, memory and swap values wil be displayed as changes/sec between intervals +rather than absolute values in addition to page fault information, which is already displayed as rates. +This switch will also honor -on in that the values will not be normalized to a rate but rather displayed +as changes in size per interval. +<p> +If you include <i>--memopts</i> with P or V, collectl will only display Physical or Virtual memory. +The default is PV and will display both. + +<p><h3>Memory, <i>collectl -sm --memopts ps</i></h3> +<p> +The p and s options allow you to display data about page and/or steal and scan information. If you want this data combined with +the standard physical or virtual data you must explicitly request them as well. The columns show how the memory is allocated +for the respective sections. + +<div class=terminal-wide14> +<pre> +# MEMORY SUMMARY +#<---Other---|-------Page Alloc------|------Page Refill-----><------Page Steal-------|-------Scan KSwap------|------Scan Direct-----> +# Free Activ Dma Dma32 Norm Move Dma Dma32 Norm Move Dma Dma32 Norm Move Dma Dma32 Norm Move Dma Dma32 Norm Move + 14M 136K 2 69 13M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +</pre></div> + +<p><h3>Network, <i>collectl -sn</i></h3> +<p> + +The entries for error counts in the following are actually the total of several +types of errors. To get individual error counts, you must either include <i>--netopts e</i> +or report details on individual interfaces in plot format by specifying -P. +Transmission errors are categorized by errors, dropped, fifo, collisions and carrier. +Receive errors are broken out for errors, dropped, fifo and framing errors. +<p> +If you specify filtering with <i>--netfilt</i>, the names that match the pattern(s) +will either be included or excluded from the the summary data. However, the data will +<i>still</i> be collected so if recorded to a file can later be viewed. + +<div class=terminal-wide14> +<pre> +# NETWORK SUMMARY (/sec) +# KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO +</pre></div> + +<table> +<tr><td><b>KBIn</b></td> +<td>Incoming KB/sec</td></tr> + +<tr><td><b>PktIn</b></td> +<td>Incoming packets/sec</td></tr> + +<tr><td><b>SizeI</b></td> +<td>Average incoming packet size in bytes</td></tr> + +<tr><td><b>MultI</b></td> +<td>Incoming multicast packets/sec</td></tr> + +<tr><td><b>CmpI</b></td> +<td>Incoming compressed packets/sec</td></tr> + +<tr valign=top><td><b>ErrsI</b></td> +<td>Total incoming errors/sec. This is an aggregation of incoming error counters. To see explicit +error counters use <i>--netopts e</i></td></tr> + +<tr><td><b>KBOut</b></td> +<td>Outgoing KB/sec</td></tr> + +<tr><td><b>PktOut</b></td> +<td>Outgoing packets/sec</td></tr> + +<tr><td><b>SizeO</b></td> +<td>Average outgoing packet size in bytes</td></tr> + +<tr><td><b>CmpO</b></td> +<td>Outgoing compressed packets/sec</td></tr> + +<tr valign=top><td><b>ErrsO</b></td> +<td>Total outgoing errors/sec. This is an aggregation of outgoing error counters. To see explicit +error counters use <i>--netopts e</i> </td></tr> +</table> + +<p><h3>Network, <i>collectl -sn --netopts e</i></h3> +<p> +This alternative format, which is displayed when you specify <i>--netopts e</i> enumerates +the individual error types. You cannot see both output formats at the same time. + +<div class=terminal-wide14> +<pre> +# NETWORK ERRORS SUMMARY (/sec) +# ErrIn DropIn FifoIn FrameIn ErrOut DropOut FifoOut CollOut CarrOut +</pre></div> + +<table> +<tr><td><b>ErrIn</b></td> +<td>Receive errors/sec detected by the device driver</td></tr> +<tr><td><b>DropIn</b></td> +<td>Receive packets dropped/sec</td></tr> +<tr><td><b>FifoIn</b></td> +<td>Receive packet FIFO buffer errors/sec</td></tr> +<tr><td><b>FrameIn</b></td> +<td>Receive packet framing errors/sec</td></tr> +<tr><td><b>ErrOut</b></td> +<td>Transmit errors/sec detected by the device driver</td></tr> +<tr><td><b>DropOut</b></td> +<td>Transmit packets dropped/sec</td></tr> +<tr><td><b>FifoOut</b></td> +<td>Transmit packet FIFO buffer errors/sec</td></tr> +<tr><td><b>CollOut</b></td> +<td>Transmit collisions/sec detected on the interface</td></tr> +<tr><td><b>CarrOut</b></td> +<td>Transmit packet carrier loss errors detected/sec</td></tr> +</table> + +<p><h3>NFS, <i>collectl -sf</i></h3> +<p> +As of version 3.2.1, by default collectl collects and reports on all versions of nfs data, +both clients and servers. One can limit the types of data reported with <i>--nfsfilt</i> +and if only server or client data has been selected, only that type of data will be +reported as shown in the 2 forms below. When both server and client data are being +reported they will be displayed side by side. As with brief format, if filters have been +selected they will be displayed in the header. + +<div class=terminal-wide14> +<pre> +# NFS SUMMARY (/sec) +#<---------------------------server---------------------------> +# Reads Writes Meta Comm UDP TCP TCPConn BadAuth BadClnt +</pre></div> + +<table> +<tr><td><b>Reads</b></td><td>Total reads/sec</td></tr> +<tr><td><b>Writes</b></td><td>Total writes/sec</td></tr> +<tr><td><b>Meta</b></td><td>Total nfs meta data calls/sec, where meta data is +considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus, +noting that not all types of nfs version report all as V3 clients/servers do.</td></tr> +<tr><td><b>Comm</b></td><td>Total commits/sec</td></tr> +<tr><td><b>UDP</b></td><td>Number of UDP packets/sec</td></tr> +<tr><td><b>TCP</b></td><td>Number of TCP packets/sec</td></tr> +<tr><td><b>TCPConn</b></td><td>Number of TCP connections/sec</td></tr> +<tr><td><b>BadAuth</b></td><td>Number of authentication failures/sec</td></tr> +<tr><td><b>BadClnt</b></td><td>Number of unknown clients/sec</td></tr> +</table> + +<div class=terminal-wide14> +<pre> +# NFS SUMMARY (/sec) +#<----------------client----------------> +# Reads Writes Meta Comm Retrans Authref +</pre></div> + +<table> +<tr><td><b>Reads</b></td><td>Total reads/sec</td></tr> +<tr><td><b>Writes</b></td><td>Total writes/sec</td></tr> +<tr><td><b>Meta</b></td><td>Total nfs meta data calls/sec, where meta data is +considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus, +noting that not all types of nfs version report all as V3 clients/servers do.</td></tr> +<tr><td><b>Comm</b></td><td>Total commits/sec</td></tr> +<tr><td><b>Retrans</b></td><td>Number of retransmissions/sec</td></tr> +<tr><td><b>Authref</b></td><td>Number of authrefreshes/sec</td></tr> +</table + +<p><h3>NFS, <i>collectl -sf -nfsopts C</i></h3> +<p>The data reported for clients is slightly different, specifically the +retrans and authref fields. + +<div class=terminal-wide14> +<pre> +# NFS CLIENT (/sec) +#<----------RPC---------><---NFS V3---> +#CALLS RETRANS AUTHREF READ WRITE +</pre></div> + +<table> +<tr><td><b>Calls</b></td><td>Number of RPC calls/sec</td></tr> +<tr><td><b>Retrans</b></td><td>Retransmitted calls</td></tr> +<tr><td><b>Authref</b></td><td>Authentication failed</td></tr> +<tr><td><b>Read</b></td><td>Number of reads/sec</td></tr> +<tr><td><b>Write</b></td><td>Number of writes/sec</td></tr> +</table> + +<p><h3>Slabs, <i>collectl -sy</i></h3> +<p>As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since +there is not a 1:1 mapping between what it reports and the older slab allocator, +the format of this listing will depend on which allocator is being used. The following +format is for the older allocator. + +<div class=terminal-wide14> +<pre> +# SLAB SUMMARY +#<------------Objects------------><--------Slab Allocation-------><--Caches---> +# InUse Bytes Alloc Bytes InUse Bytes Total Bytes InUse Total +</pre></div> + +<table> +<tr><td colspan=2>Objects</td></tr> +<tr valign=top><td><b>InUse</b></td> +<td>Total number of objects that are currently in use.</td></tr> +<tr valign=top><td><b>Bytes</b></td> +<td>Total size of all the objects in use.</td></tr> +<tr valign=top><td><b>Alloc</b></td> +<td>Total number of objects that have been allocated but not necessarily in use.</td></tr> +<tr valign=top><td><b>Bytes</b></td> +<td>Total size of all the allocated objects whether in use or not.</td></tr> + +<tr><td colspan=2>Slab Allocation</td></tr> +<tr valign=top><td><b>InUse</b></td> +<td>Number of slabs that have at least one active object in them.</td></tr> +<tr valign=top><td><b>Bytes</b></td> +<td>Total size of all the slabs.</td></tr> +<tr valign=top><td><b>Total</b></td> +<td>Total number of slabs that have been allocated whether in use or not.</td></tr> +<tr valign=top><td><b> Bytes</b></td> +<td>Total size of all the slabs that have been allocted whether in use or not.</td></tr> + +<tr><td colspan=2>Caches</td></tr> +<tr valign=top><td><b>InUse</b></td> +<td>Not all caches are actully in use. This included only those with non-zero +counts.</td></tr> +<tr valign=top><td><b>Total</b></td> +<td>This is the count of all caches, whether currently in use or not.</td></tr> +</table> + +<p>This is format for the new <i>slub</i> allocator + +<div class=terminal-wide14> +<pre> +# SLAB SUMMARY +#<---Objects---><-Slabs-><-----memory-----> +# In Use Avail Number Used Total +</pre></div> + +One should note that this report summarizes those slabs being monitored. In general +this represents all slabs, but if filering is being used these numbers will only +apply to those slabs that have matched the filter. +<p> +<table> +<tr><td colspan=2>Objects</td></tr> +<tr><td><b>InUse</b></td> +<td>The total number of objects that have been allocated to processes.</td></tr> +<tr valign=top><td><b>Avail</b></td> +<td>The total number of objects that are available in the currently allocated slabs. +This includes those that have already been allocated toprocesses.</td></tr> + +<tr><td colspan=2>Slabs</td></tr> +<tr valign=top><td><b>Number</b></td> +<td>This is the number of individual slabs that have been allocated and +taking physical memory.</td></tr> + +<tr><td colspan=2>Memory</td></tr> +<tr valign=top><td><b>Used</b></td> +<td>Used memory corresponds to those objects that have been allocated to +processes.</td></tr> + +<tr valign=top><td><b>Total</b></td> +<td>Total physical memory allocated to processes. When there is no filtering +in effect, this number will be equal to the Slabs field reported by -sm.</td></tr> +</table> + +<p><h3>Sockets, <i>collectl -ss</i></h3> + +<div class=terminal-wide14> +<pre> +# SOCKET STATISTICS +# <-------------Tcp-------------> Udp Raw <---Frag--> +#Used Inuse Orphan Tw Alloc Mem Inuse Inuse Inuse Mem +</pre></div> + +<table> +<tr><td valign=top><b>Used</b></td><td>Total number if socket allocated which can include additional types such as domain.</td></tr> +<tr><td colspan=2>Tcp</td></tr> +<tr><td><b>Inuse</td><td>Number of TCP connections in use</td></tr> +<tr><td><b>Orphan</td><td>Number of TCP orphaned connections</td></tr> +<tr><td><b>Tw</td><td>Number of connections in <i>TIME_WAIT</i></td></tr> +<tr><td><b>Alloc</td><td>TCP sockets allocated</td></tr> +<tr><td><b>Mem</td><td></td></tr> +<tr><td colspan=2>Udp</td></tr> +<tr><td><b>Inuse</td><td>Number of UCP connections in use</td></tr> +<tr><td colspan=2>Raw</td></tr> +<tr><td><b>Inuse</td><td>Number of RAW connections in use</td></tr> +<tr><td colspan=2>Frag</td></tr> +<tr><td><b>Inuse</td><td></td></tr> +<tr><td><b>Mem</td><td></td></tr> +</table> + +<p><h3>TCP, <i>collectl -st</i></h3> + +These are the counters one sees when running the command <i>netstat -s</i>, whose output is very verbose. Since this +format is an attemt to compress those field names to 6 characters or less, sometime something gets lost in the translation. + +As described in the <a href=Data-brief.html>brief</a> data formats, the actual TCP data displayed is based on the value of +<i>--tcpfilt</i> and like brief data, everything is displayed on a single line which can be quite wide, even more reason to +use this switch, espcially since the default format is over 200 columns wide! The following definitions are based the +value of that filter: + +<p><b>--tcpfilt i</b> +<div class=terminal-wide14> +<pre> +# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec) +#<----------------------------------IpPkts-----------------------------------> +# Receiv Delivr Forwrd DiscdI InvAdd Sent DiscrO ReasRq ReasOK FragOK FragCr +</pre></div> + +<table> +<tr><td><b>Receiv</b><td></td><td>- total packets received/sec</td></tr> +<tr><td><b>Delivr</b><td></td><td>- incoming packets delivered/sec</td></tr> +<tr><td><b>Forwrd</b><td></td><td>- packets forwarded</td></tr> +<tr><td><b>DiscdI</b><td></td><td>- discarded incoming packets</td></tr> +<tr><td><b>InvAdd</b><td></td><td>- packets received with invalid addresses</td></tr> +<tr><td><b>Sent</b><td></td><td> - requests sent out/sec</td></tr> +<tr><td><b>DiscrO</b><td></td><td>- discarded outbound requests</td></tr> +<tr><td><b>ReasRq</b><td></td><td>- reassembled requests</td></tr> +<tr><td><b>ReasOK</b><td></td><td>- reassembled OK</td></tr> +<tr><td><b>FragOK</b><td></td><td>- fragments received OK</td></tr> +<tr><td><b>FragCr</b><td></td><td>- fragments created</td></tr> +</table> + +<p><b>--tcpfilt t</b> +<div class=terminal-wide14> +<pre> +# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec) +#<---------------------------------Tcp---------------------------------> +# ActOpn PasOpn Failed ResetR Estab SegIn SegOut SegRtn SegBad SegRes +</pre></div> + +<table> +<tr><td><b>ActOpn</b></td><td>- active connections opened/sec</td></tr> +<tr><td><b>PasOpn</b></td><td>- passive connection opened/sec</td></tr> +<tr><td><b>Failed</b></td><td>- failed connection attempts</td></tr> +<tr><td><b>ResetR</b></td><td>- connection resets received</td></tr> +<tr><td><b>Estab</b></td><td> - connections established</td></tr> +<tr><td><b>SegIn</b></td><td> - segments received/sec</td></tr> +<tr><td><b>SegOut</b></td><td>- segments sent out/sec</td></tr> +<tr><td><b>SegRtn</b></td><td>- segments retransmitted</td></tr> +<tr><td><b>SegBad</b></td><td>- bad segments received</td></tr> +<tr><td><b>SegRes</b></td><td>- resets sent</td></tr> +</table> + +<p><b>--tcpfilt u</b> +<div class=terminal-wide14> +<pre> +# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec) +#<------------Udp-----------> +# InDgm OutDgm NoPort Errors +</pre></div> + +<table> +<tr><td><b>InDgm</b></td><td>- packets received/sec</tr> +<tr><td><b>OutDgm</b></td><td>- packets sent/sec</tr> +<tr><td><b>NoPort</b></td><td>- packets received to unknown port</tr> +<tr><td><b>Errors</b></td><td>- packet receive errors</tr> +</table> + +<p><b>--tcpfilt c</b> +<div class=terminal-wide14> +<pre> +# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec) +#<----------------------------Icmp---------------------------> +# Recvd FailI UnreI EchoI ReplI Trans FailO UnreO EchoO ReplO +</pre></div> + +<table> +<tr><td><b>Recvd</b></td><td>- ICMP messages received</td></tr> +<tr><td><b>FailI</b></td><td>- incoming ICMP messages failed</td></tr> +<tr><td><b>UnreI</b></td><td>- input destination unreachable</td></tr> +<tr><td><b>EchoI</b></td><td>- input echo requests</td></tr> +<tr><td><b>ReplI</b></td><td>- input echo reploes</td></tr> +<tr><td><b>Trans</b></td><td>- ICMP messages sent</td></tr> +<tr><td><b>FailO</b></td><td>- outbound ICMP messages failed</td></tr> +<tr><td><b>UnreO</b></td-><td>- output destination unreachable</td></tr> +<tr><td><b>EchoO</b></td><td> - output echo requests</td></tr> +<tr><td><b>ReplO</b></td><td> - output echo replies</td></tr> +</table> + +<p><b>--tcpfilt T</b> +<div class=terminal-wide14> +<pre> +# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec) +#<-------------------------------------------------TcpExt------------------------------------------------> +# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct PurAck HPAcks DsAcks RUData REClos SackS PkLoss FTrans +</pre></div> + +<table> +<tr><td><b>FasTim</b></td><td>- TCP sockets finished time wait in fast timer</td></tr> +<tr><td><b>Reject</b></td><td>- packet rejects in established connections because of timestamp</td></tr> +<tr><td><b>DelAck</b></td><td>- delayed ACKs sent</td></tr> +<tr><td><b>QikAck</b></td><td>- times quick ACK mode activated </td></tr> +<tr><td><b>PktQue</b></td><td>- packets directly queued to recvmsg prequeue</td></tr> +<tr><td><b>PreQuB</b></td><td>- bytes directly received in process context from prequeue</td></tr> +<tr><td><b>HdPdct</b></td><td>- packet headers predicted</td></tr> +<tr><td><b>PurAck</b></td><td>- acknowledgements for received packets not containing data, PureAcks in older versions </td></tr> +<tr><td><b>PurAck</b></td><td>- predicted acknowledgements</td></tr> +<tr><td><b>HPAcks</b></td><td>- DSACKS sent for old packets</td></tr> +<tr><td><b>RUData</b></td><td>- connections reset to do unexpected data</td></tr> +<tr><td><b>REClos</b></td><td>- connections reset due to early close</td></tr> +<tr><td><b>SackS</b></td><td>- SackShiftFallback</td></tr> +<tr><td><b>PkLoss</b></td><td>- Packet Loss</td></tr> +<tr><td><b>FTrans</b></td><td>- Fast Retransmission</td></tr> +</table> + +<table width=100%><tr><td align=right><i>updated Nov 02, 2016</i></td></tr></colgroup></table> + +<script type="text/javascript"> +var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); +document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); +</script> +<script type="text/javascript"> +try { +var pageTracker = _gat._getTracker("UA-6408045-1"); +pageTracker._trackPageview(); +} catch(err) {}</script> + +</body> +</html> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/docs/Data-verbose.html new/collectl-4.1.2/docs/Data-verbose.html --- old/collectl-4.1.0/docs/Data-verbose.html 2016-10-07 14:35:40.000000000 +0200 +++ new/collectl-4.1.2/docs/Data-verbose.html 2017-02-27 21:46:02.000000000 +0100 @@ -806,15 +806,17 @@ <tr><td><b>PktQue</b></td><td>- packets directly queued to recvmsg prequeue</td></tr> <tr><td><b>PreQuB</b></td><td>- bytes directly received in process context from prequeue</td></tr> <tr><td><b>HdPdct</b></td><td>- packet headers predicted</td></tr> -<tr><td><b>AkNoPy</b></td><td>- acknowledgements for received packets not containing data </td></tr> -<tr><td><b>PreAck</b></td><td>- predicted acknowledgements</td></tr> +<tr><td><b>PurAck</b></td><td>- acknowledgements for received packets not containing data</td></tr> +<tr><td><b>HPAcks</b></td><td>- predicted acknowledgements</td></tr> <tr><td><b>DsAcks</b></td><td>- DSACKS sent for old packets</td></tr> <tr><td><b>RUData</b></td><td>- connections reset to do unexpected data</td></tr> <tr><td><b>REClos</b></td><td>- connections reset due to early close</td></tr> <tr><td><b>SackS</b></td><td>- SackShiftFallback</td></tr> +<tr><td><b>PkLoss</b></td><td>- Packet Loss</td></tr> +<tr><td><b>FTrans</b></td><td>- Fast Retransmission</td></tr> </table> -<table width=100%><tr><td align=right><i>updated July 23, 2014</i></td></tr></colgroup></table> +<table width=100%><tr><td align=right><i>updated Nov 02, 2016</i></td></tr></colgroup></table> </body> </html> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/docs/Disks.html new/collectl-4.1.2/docs/Disks.html --- old/collectl-4.1.0/docs/Disks.html 2016-10-07 14:35:40.000000000 +0200 +++ new/collectl-4.1.2/docs/Disks.html 2017-02-27 21:46:02.000000000 +0100 @@ -36,7 +36,7 @@ individual disks are operating properly since high wait and/or service times are a bad thing and indicate something is causing an undesired delay somewhere. -<p><b>Filtering</b> +<p><b>Basic Filtering</b> <br>If you'd like to limit the disks included in either the detail output or the summary totals, you can explicity include or exclude them using <i>--dskfilt</i>. The target of this switch is actually one or more perl expressions, but if you don't know perl all you @@ -94,6 +94,64 @@ </pre> </div> +<p><b>Raw Filtering</b> +<br> +As mentioned in the previous section, basic disk fitering is applied after the data is collected, +so if you don't collect it in the first place there's nothing to filter. So what about special +situations where you maye have a disk collectl doesn't know about in it's default filtering +string which is specified in /etc/collectl.conf as DiskFilter (and commented out since that is +the default)? OR what if you'd like to see partition level data which is also filtered out? +<p> +If you want to override this filtering you actually have 2 choices available to you. Either edit +the collectl.conf file or simply use --rawdskfilt which essentially redefines DiskFilter. It can +be a handy way to specify filtering via a switch in case you don't want to have to modify the conf +file. + +<p><i>Show stats for unknown disk named nvme1n1 - the wrong way</i> +<p> +Since we only specified a partial name, we see everything that matches including partition. +<div class=terminal> +<pre> +collectl.pl -sD --rawdskfilt nvme -c1 +# DISK STATISTICS (/sec) +# <---------reads---------------><---------writes--------------><--------averages--------> Pct +#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util +nvme1n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +nvme1n1p1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +nvme1n1p2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +nvme0n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +</pre> +</div> + + +<p><i>Show stats for unknown disk named nvme1n1 - the right way</i> +<p> +If we want to just see nvme0n1 and nvme1n1, we need to be more specific and be sure to include a +space at the end of the pattern which will also require the string to be quoted. + +<div class=terminal> +<pre> +collectl.pl -sD --rawdskfilt 'nvme\dn\d ' +# DISK STATISTICS (/sec) +# <---------reads---------------><---------writes--------------><--------averages--------> Pct +#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util +nvme1n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +nvme0n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +</pre> +</div> + +<p><i>Show stats for specific partition(s)</i> +<p> +<div class=terminal> +<pre> +collectl.pl -sD --rawdskfilt sda1 +# DISK STATISTICS (/sec) +# <---------reads---------------><---------writes--------------><--------averages--------> Pct +#Name KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util +sda1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +</pre> +</div> + <p><b>Dynamic Disk Discovery</b> <br> Dynamic disks are handled by the exact same algorithms that are applied to dynamic networks and @@ -104,7 +162,7 @@ option <i>--dskopts o</i> would be applied. -<p><table width=100%><tr><td align=right><i>updated Sept 4, 2013</i></td></tr></colgroup></table> +<p><table width=100%><tr><td align=right><i>updated Nov 8, 2016</i></td></tr></colgroup></table> </body> </html> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/formatit.ph new/collectl-4.1.2/formatit.ph --- old/collectl-4.1.0/formatit.ph 2016-10-07 14:35:39.000000000 +0200 +++ new/collectl-4.1.2/formatit.ph 2017-02-27 21:46:01.000000000 +0100 @@ -651,7 +651,8 @@ if ($subsys=~/l/i) { if ((`ls /lib/modules/*/kernel/net/lustre 2>/dev/null|wc -l`==0) && - (`ls /lib/modules/*/*/kernel/net/lustre 2>/dev/null|wc -l`==0)) + (`ls /lib/modules/*/*/kernel/net/lustre 2>/dev/null|wc -l`==0) && + (`ls /lib/modules/*/*/lustre-client 2>/dev/null|wc -l`==0)) { disableSubsys('l', 'this system does not have lustre modules installed'); } @@ -3385,11 +3386,13 @@ # Apply filters to summary totals, explicitly ignoring those we don't want if ($diskName!~/^dm-|^psv/ && ($dskFilt eq '' || $diskName!~/$dskFiltIgnore/)) { - # if some disks explicitly named to keep, keep only those BUT only if - # not a partition or else we end up double counting. Note that we're - # doing a lookbehind to make sure not a cciss name like c0d0 which DOES - # look like a partition when in fact it's not. - if ($diskName!~/(?<!c\dd)\d$/ && ($dskFiltKeep eq '' || $diskName=~/$dskFiltKeep/)) + # if we're not filtering, we want to add all the disks to the summary except + # cciss and nvme disks whose basenames look like partitions and they're not! + # the partitions were already filtered out by the default disk filter. + # if we are filtering, anything that passes the filter will be reported, noting + # to report partition level stats one needs to use --rawdskfilt + if (($dskFiltKeep eq '' && ($diskName=~/c\dd\d$/ || $diskName=~/nvme/)) || + ($dskFiltKeep ne '' && $diskName=~/$dskFiltKeep/)) { $dskReadTot+= $dskRead[$dskIndex]; $dskReadMrgTot+= $dskReadMrg[$dskIndex]; @@ -4397,7 +4400,7 @@ { my ($port, @fieldsNow)=(split(/\s+/, $data))[0,4..7]; - for ($j=0; $j<3; $j++) + for ($j=0; $j<4; $j++) { $fields[$j]=fix($fieldsNow[$j]-$ibFieldsLast[$i][$port][$j]); $ibFieldsLast[$i][$port][$j]=$fieldsNow[$j]; @@ -7044,7 +7047,7 @@ $line.="<------------Udp----------->" if $tcpFilt=~/u/; $line.="<----------------------------Icmp--------------------------->" if $tcpFilt=~/c/; $line.="<-------------------------IpExt------------------------>" if $tcpFilt=~/I/; - $line.="<------------------------------------------TcpExt----------------------------------------->" if $tcpFilt=~/T/; + $line.="<-------------------------------------------------TcpExt------------------------------------------------>" if $tcpFilt=~/T/; $line.="\n"; $line.="#$miniFiller"; @@ -7053,7 +7056,7 @@ $line.=" InDgm OutDgm NoPort Errors" if $tcpFilt=~/u/; $line.=" Recvd FailI UnreI EchoI ReplI Trans FailO UnreO EchoO ReplO" if $tcpFilt=~/c/; $line.=" MPktsI BPktsI OctetI MOctsI BOctsI MPktsI OctetI MOctsI" if $tcpFilt=~/I/; - $line.=" FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos SackS" if $tcpFilt=~/T/; + $line.=" FasTim Reject DelAck QikAck PktQue PreQuB HdPdct PurAck HPAcks DsAcks RUData REClos SackS PkLoss FTrans" if $tcpFilt=~/T/; $line.="\n"; printText($line); @@ -7099,14 +7102,15 @@ $tcpData{IpExt}->{OutOctets}, $tcpData{IpExt}->{OutMcastOctets}) if $tcpFilt=~/I/; - $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d", + $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d", $tcpData{TcpExt}->{TW}, $tcpData{TcpExt}->{PAWSEstab}, $tcpData{TcpExt}->{DelayedACKs}, $tcpData{TcpExt}->{DelayedACKLost}, $tcpData{TcpExt}->{TCPPrequeued}, $tcpData{TcpExt}->{TCPDirectCopyFromPrequeue}, $tcpData{TcpExt}->{TCPHPHits}, $tcpData{TcpExt}->{TCPPureAcks}, $tcpData{TcpExt}->{TCPHPAcks}, $tcpData{TcpExt}->{TCPDSACKOldSent}, $tcpData{TcpExt}->{TCPAbortOnData}, $tcpData{TcpExt}->{TCPAbortOnClose}, - $tcpData{TcpExt}->{TCPSackShiftFallback}) + $tcpData{TcpExt}->{TCPSackShiftFallback}, $tcpData{TcpExt}->{TCPLoss}, + $tcpData{TcpExt}->{TCPFastRetrans}) if $tcpFilt=~/T/; $line.="\n"; diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/collectl-4.1.0/man1/collectl.1 new/collectl-4.1.2/man1/collectl.1 --- old/collectl-4.1.0/man1/collectl.1 2016-10-07 14:35:40.000000000 +0200 +++ new/collectl-4.1.2/man1/collectl.1 2017-02-27 21:46:02.000000000 +0100 @@ -965,6 +965,10 @@ first expression with a caret, all names that match all strings will be excluded from the summary totals and detail displays rather then included. If you don't know perl, a partial string will usually work too. + +Just remember, this only applies to collected data and so if for example you specify a +parition, such as sda1, you'll never see the data since it was filtered out at the time of +data collection. To see those stats you would need to say --rawdskfilt sda1. .RE .B "--dskopts" @@ -1295,7 +1299,7 @@ .SH AUTHOR This program was written by Mark Seger (mjse...@gmail.com). .br -Copyright 2003-2015 Hewlett-Packard Development Company, LP +Copyright 2003-2016 Hewlett-Packard Development Company, LP .br collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit