Sorry, I'm not great at deciphering Linux diagnostics (I'm relatively new to it--a year or two), but I did a little poking around to see what might be causing trouble. Wikipedia had these choice bits to say about the C3 chip design:

     C3

   * Because memory performance is the limiting factor in many
     benchmarks, VIA processors implement large primary caches, large
     TLBs <http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer>,
     and aggressive prefetching
     <http://en.wikipedia.org/wiki/Prefetching>, among other
     enhancements. While these features are not unique to VIA, memory
     access optimization is one area where they have not dropped
     features to save die space. In fact generous primary caches (128K)
     have always been a distinctive hallmark of Centaur / VIA designs.

   * Clock frequency is in general terms favored over increasing
     instructions per cycle. Complex features such as out-of-order
     instruction execution are deliberately not implemented, because
     they impact the ability to increase the clock rate, require a lot
     of extra die space and power, and have little impact on
     performance in several common application scenarios. Internally,
     the C7 has 16 pipeline stages.

   * The pipeline is arranged to provide one-clock execution of the
     heavily used register--memory and memory--register forms of x86
     instructions. Several frequently used instructions require fewer
     pipeline clocks than on other x86 processors.

   * Infrequently used x86 instructions are implemented in microcode
     <http://en.wikipedia.org/wiki/Microcode> and emulated. This saves
     die space and reduces power consumption. The impact upon the
     majority of real world application scenarios is minimized.

   * These design guidelines are derivative from the original RISC
     <http://en.wikipedia.org/wiki/RISC> advocates, who stated a
     smaller set of instructions, better optimized, would deliver
     faster overall CPU performance.



And they give pertinent stats on L2 cache and die sizes:

     Processor      Secondary      Die size       Die size
                    cache (K)      130 nm (mm²)   90 nm (mm²)
     C3 / C7        64 / 128       52             30
     Athlon XP      256            84             N/A
     Athlon 64      512            144            84
     Pentium M      2048           N/A            84
     P4 Northwood   512            146            N/A
     P4 Prescott    1024           N/A            110


What I would take from this is:

A) The C3 does not do out-of-order instruction scheduling, so code with lots of data dependencies that a Pentium-class chip would fly through will stall like crazy on a C3, and the wasted cycles show up as CPU usage (the pipe is 16 stages long on the C3, so a full stall can cost on the order of that many cycles). Calculating hashes is a pretty tight loop, so that will probably increase the total clocks required per hash computation.

B) The C3 has a pretty small L2 cache, but a large L1. It may be adequate for this task, it may not... hard to say without getting performance counters straight from the chip while running a backup, to see how many cache misses you take. Chances are the OS, Perl, and the large data sets flooding through the chip are demanding a lot from such a small cache. It may be that the data itself stays in cache while the code for other tasks gets evicted, making task switches very expensive; or the data may be so large, or iterated in just the wrong pattern, that ~150K is too small a working space for computing hashes. A cache miss on every 64-byte line would show up as an incredible CPU hit.
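To put rough numbers on the cache-miss scenario in B), here's a back-of-envelope sketch. The clock speed and miss penalty are assumptions I picked for illustration, not measured C3 figures--plug in your own:

```python
# Back-of-envelope: cost of taking a cache miss on every 64-byte line
# while hashing. The ~1 GHz clock and ~100-cycle miss penalty are
# assumed illustrative values, not measurements from a real C3.

clock_hz = 1_000_000_000     # assumed ~1 GHz clock
miss_penalty = 100           # assumed cycles per miss to main memory
line_size = 64               # bytes per cache line
data_bytes = 1_000_000_000   # hashing 1 GB of backup data

misses = data_bytes // line_size           # one miss per line touched
stall_cycles = misses * miss_penalty       # cycles spent waiting on memory
stall_seconds = stall_cycles / clock_hz

print(f"{misses:,} misses -> {stall_seconds:.2f} s of pure stall time per GB")
```

That's time the CPU spends doing nothing but waiting on memory, and it all gets charged as CPU usage; a slower clock or a worse miss penalty scales it up directly.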

You can test situation B) by going into the BIOS (CMOS) setup and disabling the L2 cache--or both caches if you have to--then re-running a 'quick' backup to compare the times. If the cache is already being blown constantly, disabling it will make little difference. If instead the backup runs much slower (something like 5x) with the cache disabled, the cache was being used effectively, and the L2 is probably not your bottleneck.

I will say that your server has far more files than mine does, so perhaps you're also being hit by a per-file overhead... maybe packet-processing costs are eating your lunch? It's possible your network driver is doing a lot of work in software that counts toward your CPU usage. If a much beefier box also shows high load on the same job, I would suspect per-file overhead in the transfer protocol rather than a specific hardware problem with the C3, since protocol overhead would be reflected on any hardware.
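To gauge whether per-file overhead alone could explain the wall-clock time, a quick sketch--the file count and per-file cost below are made-up illustrative numbers, so substitute what your server actually reports:

```python
# Rough per-file overhead estimate. Both inputs are hypothetical
# placeholders -- replace them with your real file count and a measured
# (or guessed) per-file protocol/CPU cost.

num_files = 1_000_000   # hypothetical: a server with a million files
per_file_ms = 2.0       # hypothetical: 2 ms of protocol work per file

overhead_seconds = num_files * per_file_ms / 1000.0
print(f"{overhead_seconds:.0f} s ({overhead_seconds / 3600:.1f} h) "
      f"of pure per-file overhead")
```

Even a couple of milliseconds per file adds up to over half an hour across a million files, before a single byte of file data is hashed or transferred.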

Hope this helps,
JH

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/