It *is* a busy box, and migrating the iSCSI LUNs to a 64-bit server is something I've definitely considered. I have a Dell R310 with 16 GB of RAM that I could use, but it already has 9 active VMs on it, although they're not heavy hitters. AFAICT, the highest-use machines on the ESXi 4.1 box are probably the secondary DC (no FSMO roles, but it does do DNS and WINS) and the issuing CA box.
The file server is currently a VM on what I believe to be an underpowered ESX 3.5 box, and I think it's possible that it's simply starved for resources there. I'm sure there's something out there like perfmon for VMware that I can use to capture performance over time. I'd like to measure and analyze the performance of the ESX 3.5 box while the backups are happening against the file server. I'm also considering moving the Win2k3 file server VM to the ESXi 4.1 box and seeing if the situation improves.

Kurt

On Mon, Feb 13, 2012 at 12:08, Michael B. Smith <mich...@smithcons.com> wrote:
> That's a busy box. I'd suggest moving to a 64-bit OS.
>
> Regards,
>
> Michael B. Smith
> Consultant and Exchange MVP
> http://TheEssentialExchange.com
>
> -----Original Message-----
> From: Kurt Buff [mailto:kurt.b...@gmail.com]
> Sent: Monday, February 13, 2012 3:00 PM
> To: NT System Admin Issues
> Subject: Re: Picking up file server tuning again
>
> Ran PAL against the log.
>
> Um, wow. It's a freaking christmas tree - red and yellow all over the
> place in CPU and disk.
>
> Who should I be talking with to analyze this?
>
> A sample of the issues shown - all of which show up in more than one
> time slice - some in every or almost every slice:
> o- More than 50% Processor Utilization
> o- More than 30% privileged (kernel) mode CPU usage
> o- More than 2 packets are waiting in the output queue
> o- Greater than 25ms physical disk READ response times
> o- Greater than 25ms physical disk WRITE response times
> o- More than 80% of Pool Paged Kernel Memory Used
> o- More than 2 I/O's are waiting on the physical disk
> o- 20 (Processor(_Total)\DPC Rate)
> o- More than 30% Interrupt Time
> o- Greater than 1000 page inputs per second (Memory\Pages Input/sec)
>
> Some things that showed no alerts:
> o- Memory\Available MBytes
> o- Memory\Free System Page Table Entries
> o- Memory\Pages/sec
> o- Memory\System Cache Resident Bytes
> o- Memory\Cache Bytes
> o- Memory\% Committed Bytes In Use
> o- Network Interface(*)\% Network Utilization
>      MS TCP Loopback interface
>      VMware Accelerated AMD PCNet Adapter
>      VMware Accelerated AMD PCNet Adapter#1
> o- Network Interface(*)\Packets Outbound Errors
>      MS TCP Loopback interface
>      VMware Accelerated AMD PCNet Adapter
>      VMware Accelerated AMD PCNet Adapter#1
>
>
> Kurt
>
> On Fri, Feb 10, 2012 at 16:04, Brian Desmond <br...@briandesmond.com> wrote:
>> Rather than trying to do this yourself, check out PAL -
>> http://pal.codeplex.com/. It will set up all the right counters for you and
>> crunch the data.
>>
>> Thanks,
>> Brian Desmond
>> br...@briandesmond.com
>>
>> w – 312.625.1438 | c – 312.731.3132
>>
>> -----Original Message-----
>> From: Kurt Buff [mailto:kurt.b...@gmail.com]
>> Sent: Friday, February 10, 2012 4:43 PM
>> To: NT System Admin Issues
>> Subject: Picking up file server tuning again
>>
>> I'm getting back to monitoring my situation with the file server again, and
>> just finished a perfmon session covering the 3rd through the 7th of this
>> month. Simultaneously, I set up perfmon on the same workstation to monitor
>> the backup server.
>>
>> If anyone cares to help, I'd be deeply appreciative.
>>
>> I set up perfmon on a Win7 VM on an ESXi 4.1 host to take measurements at 60
>> second intervals of a whole bunch of counters, many of them probably just
>> noise.
>>
>> I'll describe the history of the configuration first, however:
>>
>> The file server is a Win2k3 R2 VM running on an ESX 3.5 host with 16g of RAM
>> - it's one of 10 VMs, and is definitely the heaviest hitter in terms of disk
>> I/O. About 2.5-3 months ago we noticed that the time to completion for the
>> weekly full backups spiked dramatically.
>>
>> Prior to that time, the fulls would start around 7pm on a Friday, and finish
>> by about 7pm on Sunday.
>>
>> Now they take until Thursday or Friday to complete.
>>
>> This coincided with some changes to the environment: I had to move the VM to
>> a new host (it was a manual copy - we don't have vMotion licensed and
>> configured for these hosts) and at about that time I also had to expand 2 of
>> the 4 LUNs. Finally, the OS drive for the VM on the old host was on a LUN
>> on our Lefthand unit - I had to migrate it to the local disk storage on the
>> new home for the VM. The 4 data drives for this VM are attached via the MSFT
>> iSCSI client running on the VM, not through VMware's iSCSI client. So, at
>> that point, all of the LUNs were on the Lefthand SAN, which is a 3-node
>> cluster, and we use 2-way replication for all LUNs. The 2 LUNs that were
>> expanded went to 2tb or slightly beyond. The Lefthand has two NSM 2060s and
>> a P4300G2, with 6 and 8 disks each, respectively - a total of 20 disks.
>>
>> Since that time, I've also added in our EMC VNXe 3100 with 6 disks in it in
>> a RAID6 array. I mention this because this means that all of the file
>> systems on the VNXe are clean and defragged.
>>
>> Currently, I've migrated 3 of the 4 data LUNs for the VM to the EMC. I made
>> sure to align the partitions on the EMC to a megabyte boundary.
>>
>> So, to make this simpler to visualize, a little table:
>>
>> c: - local disk on ESX 3.5, 40gb, 23.6gb free
>> j: - iSCSI LUN on Lefthand, 2.5tb, 900gb free
>> k: - iSCSI LUN on VNXe, 1.98tb, 336gb free
>> l: - iSCSI LUN on VNXe, 1tb, 79gb free
>> m: - iSCSI LUN on VNXe, 750gb, 425gb free
>>
>> I tried to capture separate disk queue stats for each LUN, but in spite of
>> selecting and adding each drive letter separately in the perfmon interface,
>> all I got was _Total.
>>
>> Selected stats are as follows:
>>
>> PhysicalDisk counters
>> Current disk queue length - average 0.483, maximum 33.000
>> Average disk read queue length - average 0.037, maximum 1.294
>> %disk time - average 34.068, maximum 153.877
>> Average disk write queue length - average 0.645, maximum 2.828
>> Average disk queue length - average 0.681, maximum 3.078
>>
>> I have more data on PhysicalDisk, and data on other objects, including
>> Memory, NetworkInterface, Paging File, Processor and Server Work Queues.
>>
>> If anyone has thoughts, I'd surely like to hear them.
>>
>> Thanks,
>>
>> Kurt
>>
>> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
>> ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
>>
>> ---
>> To manage subscriptions click here:
>> http://lyris.sunbelt-software.com/read/my_forums/
>> or send an email to listmana...@lyris.sunbeltsoftware.com
>> with the body: unsubscribe ntsysadmin
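
For anyone who wants to sanity-check a perfmon log by hand against the kind of limits the PAL report in this thread flagged, here is a minimal sketch. It is not PAL's actual logic. The limits (50% processor, 30% privileged time, 25ms disk read/write response, 1000 page inputs per second) are taken from the alerts quoted in the thread; the counter names they are mapped to, the THRESHOLDS table, and the summarize() function are assumptions made for illustration, and the samples are made up.

```python
from typing import Dict, List

# Limits quoted in the PAL alerts in this thread. The counter names on the
# left are an ASSUMED mapping to the usual perfmon names, not something the
# thread states. Disk response times are in seconds (25ms = 0.025).
THRESHOLDS: Dict[str, float] = {
    r"Processor(_Total)\% Processor Time": 50.0,
    r"Processor(_Total)\% Privileged Time": 30.0,
    r"PhysicalDisk(_Total)\Avg. Disk sec/Read": 0.025,
    r"PhysicalDisk(_Total)\Avg. Disk sec/Write": 0.025,
    r"Memory\Pages Input/sec": 1000.0,
}


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return average, maximum, and how often each counter exceeded its limit.

    `samples` maps a counter name to its list of readings (for example one
    reading per 60-second interval). Counters with no readings are skipped.
    """
    report: Dict[str, Dict[str, float]] = {}
    for counter, limit in THRESHOLDS.items():
        values = samples.get(counter)
        if not values:
            continue
        over = sum(1 for v in values if v > limit)
        report[counter] = {
            "average": sum(values) / len(values),
            "maximum": max(values),
            "over_limit": over,
            "share_over": over / len(values),
        }
    return report


if __name__ == "__main__":
    # Hypothetical readings, only to show the shape of the input.
    demo = {
        r"PhysicalDisk(_Total)\Avg. Disk sec/Read": [0.010, 0.030, 0.050, 0.020],
        r"Memory\Pages Input/sec": [500.0, 1500.0],
    }
    for name, stats in summarize(demo).items():
        print(name, stats)
```

Reporting the share of samples over the limit alongside average and maximum is a deliberate choice: an average can look harmless (as the queue-length averages above do) while a large fraction of intervals during the backup window are over the line.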