Ran PAL against the log.

Um, wow. It's a freaking Christmas tree - red and yellow all over the
place in CPU and disk.

Who should I be talking with to analyze this?

A sample of the issues shown - all of which show up in more than one
time slice - some in every or almost every slice:
o- More than 50% Processor Utilization
o- More than 30% privileged (kernel) mode CPU usage
o- More than 2 packets are waiting in the output queue
o- Greater than 25ms physical disk READ response times
o- Greater than 25ms physical disk WRITE response times
o- More than 80% of Pool Paged Kernel Memory Used
o- More than 2 I/O's are waiting on the physical disk
o- 20 (Processor(_Total)\DPC Rate)
o- More than 30% Interrupt Time
o- Greater than 1000 page inputs per second (Memory\Pages Input/sec)
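
For anyone who wants to cross-check the report against the raw log, here is
a rough sketch of the kind of threshold test involved. The limits are taken
from the alert descriptions above, not from PAL's actual rule files. The
mapping of each description to a perfmon counter name is my assumption (in
particular, I'm guessing the "I/O's waiting" alert is Current Disk Queue
Length), and flag_alerts / slices_in_alert are made-up helper names.

```python
# Hypothetical threshold table modelled on the alert descriptions above.
# These are NOT PAL's real rule definitions; the counter mapping is assumed.
THRESHOLDS = {
    r"Processor(_Total)\% Processor Time": 50.0,
    r"Processor(_Total)\% Privileged Time": 30.0,
    r"Processor(_Total)\% Interrupt Time": 30.0,
    r"PhysicalDisk(_Total)\Avg. Disk sec/Read": 0.025,   # 25 ms
    r"PhysicalDisk(_Total)\Avg. Disk sec/Write": 0.025,  # 25 ms
    r"PhysicalDisk(_Total)\Current Disk Queue Length": 2.0,
    r"Memory\Pages Input/sec": 1000.0,
}


def flag_alerts(sample, thresholds=THRESHOLDS):
    """Return {counter: value} for counters in one time slice over their limit."""
    return {
        name: value
        for name, value in sample.items()
        if name in thresholds and value > thresholds[name]
    }


def slices_in_alert(samples, thresholds=THRESHOLDS):
    """Count, per counter, how many time slices exceeded the limit."""
    counts = {}
    for sample in samples:
        for name in flag_alerts(sample, thresholds):
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Feeding it one dict per time slice gives a per-counter count of slices in
alert, which is roughly the "shows up in more than one time slice" view.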

Some things that showed no alerts:
o- Memory\Available MBytes
o- Memory\Free System Page Table Entries
o- Memory\Pages/sec
o- Memory\System Cache Resident Bytes
o- Memory\Cache Bytes
o- Memory\% Committed Bytes In Use
o- Network Interface(*)\% Network Utilization
     MS TCP Loopback interface
     VMware Accelerated AMD PCNet Adapter
     VMware Accelerated AMD PCNet Adapter#1
o- Network Interface(*)\Packets Outbound Errors
     MS TCP Loopback interface
     VMware Accelerated AMD PCNet Adapter
     VMware Accelerated AMD PCNet Adapter#1


Kurt

On Fri, Feb 10, 2012 at 16:04, Brian Desmond <br...@briandesmond.com> wrote:
> Rather than trying to do this yourself, check out PAL - 
> http://pal.codeplex.com/. It will set up all the right counters for you and 
> crunch the data.
>
> Thanks,
> Brian Desmond
> br...@briandesmond.com
>
> w – 312.625.1438 | c   – 312.731.3132
>
> -----Original Message-----
> From: Kurt Buff [mailto:kurt.b...@gmail.com]
> Sent: Friday, February 10, 2012 4:43 PM
> To: NT System Admin Issues
> Subject: Picking up file server tuning again
>
> I'm getting back to monitoring my situation with the file server again, and 
> just finished a perfmon session covering the 3rd through the 7th of this 
> month. Simultaneously, I set up perfmon on the same workstation to monitor 
> the backup server.
>
> If anyone cares to help, I'd be deeply appreciative.
>
> I set up perfmon on a Win7 VM on an ESXi 4.1 host to take measurements at 60 
> second intervals of a whole bunch of counters, many of them probably just 
> noise.
>
> I'll describe the history of the configuration first, however:
>
> The file server is a Win2k3 R2 VM running on an ESX 3.5 host with 16g of RAM - 
> it's one of 10 VMs, and is definitely the heaviest hitter in terms of disk 
> I/O. About 2.5-3 months ago we noticed that the time to completion for the 
> weekly full backups spiked dramatically.
>
> Prior to that time, the fulls would start around 7pm on a Friday, and finish 
> by about 7pm on Sunday.
>
> Now they take until Thursday or Friday to complete.
>
> This coincided with some changes to the environment: I had to move the VM to 
> a new host (it was a manual copy - we don't have vmotion licensed and 
> configured for these hosts) and at about that time I also had to expand 2 of 
> the 4 LUNS.  Finally, the OS drive for the VM on the old host was on a LUN on 
> our Lefthand unit - I had to migrate it to the local disk storage on the new 
> home for the VM. The 4 data drives for this VM are attached via the MSFT 
> iSCSI client running on the VM, not through VMware's iSCSI client. So, at 
> that point, all of the LUNS were on the Lefthand SAN, which is a 3-node 
> cluster, and we use 2-way replication for all LUNS. The 2 LUNS that were 
> expanded went to 2tb or slightly beyond. The Lefthand has two NSM 2060s and a 
> P4300G2, with 6 and 8 disks each, respectively - a total of 20 disks.
>
> Since that time, I've also added in our EMC VNXe 3100 with 6 disks in it in a 
> RAID6 array. I mention this because this means that all of the file systems 
> on the VNXe are clean and defragged.
>
> Currently, I've migrated 3 of the 4 data LUNs for the VM to the EMC. I made 
> sure to align the partitions on the EMC to a megabyte boundary.
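
A minimal sketch of how the alignment can be double-checked, assuming you
have each partition's starting offset in bytes (for example the
StartingOffset value reported by "wmic partition get Name, StartingOffset").
The function name is_aligned is just illustrative.

```python
MIB = 1024 * 1024  # one megabyte boundary, in bytes


def is_aligned(starting_offset_bytes, boundary=MIB):
    """True if the partition starts on a whole multiple of the boundary."""
    return starting_offset_bytes % boundary == 0


# A partition created at a 1 MB offset is aligned.
print(is_aligned(1048576))
# The legacy 63-sector offset (63 * 512 = 32256 bytes) is not.
print(is_aligned(63 * 512))
```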
>
> So, to make this simpler to visualize, a little table:
>
> c: - local disk on ESX 3.5, 40gb, 23.6gb free
> j: - iSCSI LUN on Lefthand, 2.5tb, 900gb free
> k: - iSCSI LUN on VNXe, 1.98tb, 336gb free
> l: - iSCSI LUN on VNXe, 1tb, 79gb free
> m: - iSCSI LUN on VNXe 750gb, 425gb free
>
> I tried to capture separate disk queue stats for each LUN, but in spite of 
> selecting and adding each drive letter separately in the perfmon interface, 
> all I got was _Total.
>
> Selected stats are as follows:
>
>     PhysicalDisk counters
> Current disk queue length - average 0.483, maximum 33.000
> Average disk read queue length - 0.037, maximum 1.294
> %disk time - average 34.068, maximum 153.877
> Average disk write queue length - average 0.645, maximum 2.828
> Average disk queue length - average 0.681, maximum 3.078
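
For reference, average/maximum figures like these can be pulled out of a
perfmon CSV export with something along these lines. This is only a sketch:
it assumes the first column is the timestamp and every other column is one
counter, and it skips blank or non-numeric samples. The name summarize is
made up for the example.

```python
import csv
import io


def summarize(csv_text):
    """Return {counter: (average, maximum)} for each counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    counters = header[1:]  # column 0 is assumed to be the timestamp
    values = {name: [] for name in counters}
    for row in reader:
        for name, cell in zip(counters, row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                continue  # blank or non-numeric sample, skip it
    return {
        name: (sum(v) / len(v), max(v))
        for name, v in values.items()
        if v  # drop counters that never produced a numeric sample
    }
```

To run it against a file, read the file's contents into a string and pass
that in.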
>
> I have more data on PhysicalDisk, and data on other objects, including 
> Memory, NetworkInterface, Paging File, Processor and  Server Work Queues.
>
> If anyone has thoughts, I'd surely like to hear them.
>
> Thanks,
>
> Kurt
>
> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~ ~ 
> <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~
>
> ---
> To manage subscriptions click here: 
> http://lyris.sunbelt-software.com/read/my_forums/
> or send an email to listmana...@lyris.sunbeltsoftware.com
> with the body: unsubscribe ntsysadmin
>

