Sorry, I'm not great at deciphering Linux diagnostics (I'm relatively new to it--a year or two), but I did a little poking around to see what might be causing trouble. Wikipedia had these choice bits to say about the C3 chip design:

     C3

   * Because memory performance is the limiting factor in many
     benchmarks, VIA processors implement large primary caches, large
     TLBs <http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer>,
     and aggressive prefetching
     <http://en.wikipedia.org/wiki/Prefetching>, among other
     enhancements. While these features are not unique to VIA, memory
     access optimization is one area where they have not dropped
     features to save die space. In fact generous primary caches (128K)
     have always been a distinctive hallmark of Centaur / VIA designs.

   * Clock frequency is in general terms favored over increasing
     instructions per cycle. Complex features such as out-of-order
     instruction execution are deliberately not implemented, because
     they impact the ability to increase the clock rate, require a lot
     of extra die space and power, and have little impact on
     performance in several common application scenarios. Internally,
     the C7 has 16 pipeline stages.

   * The pipeline is arranged to provide one-clock execution of the
     heavily used register--memory and memory--register forms of x86
     instructions. Several frequently used instructions require fewer
     pipeline clocks than on other x86 processors.

   * Infrequently used x86 instructions are implemented in microcode
     <http://en.wikipedia.org/wiki/Microcode> and emulated. This saves
     die space and reduces power consumption. The impact upon the
     majority of real world application scenarios is minimized.

   * These design guidelines are derivative from the original RISC
     <http://en.wikipedia.org/wiki/RISC> advocates, who stated a
     smaller set of instructions, better optimized, would deliver
     faster overall CPU performance.



And they give pertinent stats on L2 cache and die sizes:

     Processor      Secondary      Die size       Die size
                    cache (K)      130 nm (mm²)   90 nm (mm²)
     C3 / C7        64 / 128       52             30
     Athlon XP      256            84             N/A
     Athlon 64      512            144            84
     Pentium M      2048           N/A            84
     P4 Northwood   512            146            N/A
     P4 Prescott    1024           N/A            110


What I would take from this is:

A) The C3 does not do out-of-order instruction scheduling, so code with lots of data dependencies that a Pentium-class chip would fly through will stall like crazy on a C3, and the wasted cycles show up as CPU usage (the pipe is 16 stages long on the C3, so a full stall can cost on the order of that many cycles). Calculating hashes is a pretty tight loop, so that will probably increase the total clocks required per hash computation.

B) The C3 has a pretty small L2 cache, but a large L1. It may be adequate for this task, it may not... hard to say without getting performance counters straight from the chip while running a backup, to see how many cache misses you take. Chances are the OS, Perl, and the large data sets flooding through the chip are demanding a lot from such a small cache. It may be that the data itself stays in cache while the code for other tasks gets evicted, making task switches very expensive; or the data may be so large, or iterated in just the wrong pattern, that ~150K is too small a working space for computing hashes. A cache miss on every 64-byte line would show up as an incredible CPU hit.
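To put rough numbers on the cache-miss scenario in B), here's a back-of-envelope sketch. The clock speed and miss penalty are assumptions I picked for illustration, not measured C3 figures--plug in your own:

```python
# Back-of-envelope: cost of taking a cache miss on every 64-byte line
# while hashing. The ~1 GHz clock and ~100-cycle miss penalty are
# assumed illustrative values, not measurements from a real C3.

clock_hz = 1_000_000_000     # assumed ~1 GHz clock
miss_penalty = 100           # assumed cycles per miss to main memory
line_size = 64               # bytes per cache line
data_bytes = 1_000_000_000   # hashing 1 GB of backup data

misses = data_bytes // line_size           # one miss per line touched
stall_cycles = misses * miss_penalty       # cycles spent waiting on memory
stall_seconds = stall_cycles / clock_hz

print(f"{misses:,} misses -> {stall_seconds:.2f} s of pure stall time per GB")
```

That's time the CPU spends doing nothing but waiting on memory, and it all gets charged as CPU usage; a slower clock or a worse miss penalty scales it up directly.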

You can test situation B) by going into the BIOS (CMOS) setup and disabling the L2 cache--or both caches if you have to--then re-running a 'quick' backup to compare the times. If the cache is already being blown constantly, disabling it will make little difference. If instead the backup runs much slower (something like 5x) with the cache disabled, the cache was being used effectively, and the L2 is probably not your bottleneck.

I will say that your server has far more files than mine does, so perhaps you're also being hit by a per-file overhead... maybe packet-processing costs are eating your lunch? It's possible your network driver is doing a lot of work in software that counts toward your CPU usage. If a much beefier box also shows high load on the same job, I would suspect per-file overhead in the transfer protocol rather than a specific hardware problem with the C3, since protocol overhead would be reflected on any hardware.
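To gauge whether per-file overhead alone could explain the wall-clock time, a quick sketch--the file count and per-file cost below are made-up illustrative numbers, so substitute what your server actually reports:

```python
# Rough per-file overhead estimate. Both inputs are hypothetical
# placeholders -- replace them with your real file count and a measured
# (or guessed) per-file protocol/CPU cost.

num_files = 1_000_000   # hypothetical: a server with a million files
per_file_ms = 2.0       # hypothetical: 2 ms of protocol work per file

overhead_seconds = num_files * per_file_ms / 1000.0
print(f"{overhead_seconds:.0f} s ({overhead_seconds / 3600:.1f} h) "
      f"of pure per-file overhead")
```

Even a couple of milliseconds per file adds up to over half an hour across a million files, before a single byte of file data is hashed or transferred.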

Hope this helps,
JH

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/