Damn, I sent this mail to myself instead of to devl.. Another try then.. Sorry for losing the formatting.
The attachments I am talking about can be retrieved from http://194.236.28.174/freenetstuff/prof.zip

/N

----- Original Message -----
From: "Niklas Bergh" <[email protected]>
To: <nb at tific.com>
Sent: Friday, May 16, 2003 5:35 PM
Subject: Profiling of fred

I decided to play around with a profiler and fred. The test case I selected was: 'Why is fproxy slow at delivering web pages even though most of the data is cached locally' / 'Why is TFE slow'. I have deliberately skipped drawing any deep conclusions from the collected data since I am not acquainted enough with the code yet (and because I am too tired right now); please discuss.

Setup:
1. Firewalled v6032 node (not transient, but not able to receive incoming connections)
2. Source modified to not kill off any surplus threads (QThreadFactory.run)
3. Node configured with maximumThreads=800
4. 5 lightweight webspiders, each running 10 concurrent requests against fred
5. Eclipse profiler configured to profile using System.currentTimeMillis() calls (low-overhead profiling)
6. Datastore status (was pretty much static during the tests):
   Maximum size: 10 000 MiB
   Free space: 782 716 KiB
   Used space: 9 457 284 KiB
   Percent used: 92
   Total keys: 23574
   Most recent file timestamp: Fri May 16 13:43:36 GMT+01:00 2003
   Least recent file timestamp: Thu Apr 10 10:11:26 GMT+01:00 2003
7. Sun JVM 1.4.1
8. maximumThreads=700
9. Instrumenting (and thereby profiling) disabled for any method whose bytecode was less than 250 bytes. Because of this those methods don't show up in the stats (except as an addition to the caller's execution time, of course)
10. The profiling was done on a P3 650MHz with 1 gig of RAM

Assumptions:
1. Since I have viewed TFE multiple times before starting these tests, my datastore ought to contain most of the data used by TFE (and if not, then the tests themselves should definitely make fred cache the data)
2. Since 50 concurrent requests should consume most of the machine's CPU, pretty much all of the profiling data I collect should be related to the TFE loading

Actions performed:
1. Spidered all of TFE without profiling (called 'First run' below), using 5*10 threads, 20 minutes request timeout and 5 retries
2. Spidered all of TFE without profiling (called 'Second run' below), using 5*10 threads, 20 minutes request timeout and 5 retries
3. Spidered all of TFE without profiling (called 'Third run' below), using 5*10 threads, 20 minutes request timeout and 5 retries
4. Spidered all of TFE without profiling (called 'Fourth run' below), using 5*10 threads, 1 minute request timeout and 0 retries
5. Spidered all of TFE with profiling (called 'Fifth run' below), using 5*10 threads, 10 minutes request timeout and 0 retries. Profiling data collection was done ONLY while the spiders were running

An export of some of the collected data is attached; the more detailed Eclipse profiler stored data is available upon request (much recommended over the puny HTML export version: sorting, filtering and such things are available there). Please note that some of the supplied HTMLs are javascript-expandable ('+'-sign). It might be helpful to have a look at http://eclipsecolorer.sourceforge.net if you have problems interpreting some of the data.

Possibly interesting facts:
- At any time during any of the tests, network I/O was very close to 0
- At any time during any of the tests, kernel time (disk I/O?!) was significant: about 25% when the profiler wasn't attached, about 15% when it was
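The spiders themselves aren't included in this mail, so here is a minimal sketch of the kind of fetcher described under 'Actions performed' above: 10 worker threads pulling keys from a shared queue, each fetch retried a fixed number of times. All class and method names are hypothetical, and the actual HTTP transfer against fproxy is abstracted behind a Callable so that only the queue/retry logic is shown (a real spider would open an HttpURLConnection to the local fproxy with the configured timeout). Modern Java syntax is used for brevity rather than the 1.4-era style of the actual test code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of one "lightweight webspider" (not the real code).
public class SpiderSketch {
    static final int THREADS = 10; // 10 concurrent requests per spider

    // Try the fetch up to 1 + retries times; null means a persistent
    // failure (the DNF case seen in the runs described above).
    static byte[] fetchWithRetries(Callable<byte[]> fetch, int retries) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return fetch.call();
            } catch (Exception e) {
                // timeout or transfer failure: fall through and retry
            }
        }
        return null;
    }

    // Spider the whole key list; returns how many files were retrieved.
    static int spider(List<String> keys,
                      Function<String, Callable<byte[]>> fetcherFor,
                      int retries) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(keys);
        AtomicInteger retrieved = new AtomicInteger();
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(() -> {
                String key;
                // poll() is non-blocking: each worker exits when the queue is drained
                while ((key = queue.poll()) != null) {
                    if (fetchWithRetries(fetcherFor.apply(key), retries) != null) {
                        retrieved.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return retrieved.get();
    }
}
```

Note that with this shape the caveat below follows directly: workers that draw unretrievable keys sit inside fetchWithRetries for the full timeout-times-retries budget, so a handful of DNF files can idle a large fraction of the 10 threads late in a run.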
- At nearly any time during any of the tests, 95-100% of the CPU was used by javaw.exe (the only exception was when the profiler sucked data from the node: for about one second every 5 seconds it used 50% CPU)
- While actual CPU load was around 100%, fred's load indicator said 15%
- The test set of TFE is 313 files (thelist.html and the images used in it)
- About 240 of the files were retrievable using 20 min timeout and 5 retries (approx total size: 2 528 290 bytes)
- About 240 of the files were retrievable using 1 min timeout and 0 retries (approx total size: 2 528 290 bytes)
- First run average retrieval speed was 5*1800 bytes/sec
- Second run average retrieval speed was 5*1600 bytes/sec
- Third run average retrieval speed was 5*1700 bytes/sec
- Fourth run average retrieval speed was 5*1700 bytes/sec
- Fifth run average retrieval speed was 5*700 bytes/sec
- At all times the retrieval speed was higher initially than finally, by an approximate factor of 3 (probably because more threads were stuck waiting for DNF files in the later part of the test; see 'Caveats' below)
- It would seem that file retrievability increased slightly during my tests: later tests were able to retrieve about 2-3 files more than the initial ones
- A restart of the node was done only between run 4 and 5
- After the restart I didn't request anything manually from the node; it was just the spiders running
- The machine running the test had more than enough RAM: at no point during the tests did memory consumption come within 200 megs of the physical memory present
- The tests took somewhere between 30-60 minutes each (forgot to time it, but it was something like that)
- While one spider managed to retrieve 254 files, another one running during the same time period managed to retrieve 238, a third one managed 239, and so on. This might tie in to the issues with disappearing NIMs and JPGs that people have reported

Some snapshots of thread usage:

Second run:
  freenet.OpenConnectionManager$ConnectionJob        19
  freenet.client.InternalClient$ClientMessageVector  16
  freenet.interfaces.LocalInterface$ConnectionShell  51
Third run:
  freenet.OpenConnectionManager$ConnectionJob        27
  freenet.client.InternalClient$ClientMessageVector  28
  freenet.interfaces.LocalInterface$ConnectionShell  51
Fourth run:
  freenet.OpenConnectionManager$ConnectionJob        17
  freenet.client.InternalClient$ClientMessageVector  52
  freenet.interfaces.LocalInterface$ConnectionShell  71
Fifth run:
  freenet.OpenConnectionManager$ConnectionJob        14
  freenet.client.InternalClient$ClientMessageVector  32
  freenet.interfaces.LocalInterface$ConnectionShell  51

Caveats:
- Since the spiders are each using only 10 threads, it is possible that they were stuck waiting for the 73 unretrievable files much of the time. That would explain the low throughput. However, if that were the case, then I suppose the CPU shouldn't be pegged at 100% all the time?!
- When using 0 retries and a 1 minute timeout I saw some problems with retrieval. It would seem that 2 of the spider applications even had problems getting thelist.html using those settings
- During all of the runs a couple of the threads were occupied at freenet.Message: QueryRejected. Who were they rejecting? Me? Current routingTime: 18ms, Active pooled jobs: 120 (15.000001%), Current estimated load: 15.000001%
- Even though the profiler wasn't collecting data all the time, I was executing profiled code all the time.
According to the author of the profiler, this should only slow the application down by about 30% (not verified by me).

Ideas for analyzing the profiling data / figuring out where to modify the code:
- Compare the number of bytes retrieved to the number of invocations of methods and the time consumed in them
- Compare the number of files retrieved to the number of invocations of methods and the time consumed in them
- Compare the number of files that failed retrieval to the number of invocations of methods and the time consumed in them
- Compare the number of files requested to the number of requests sent out over the network
- Compare the number of files requested to the number of DNFs and/or RNFs

Ideas for the future:
- Try to eliminate a bottleneck or two and try again
- Run with a spider that can handle more concurrent requests
- Listen to people's feedback and collect some more data if needed...

Brrghh.. Sorry if I missed something... I am starting to get a little bit tired of this now.. Please ask if there is something that needs clarification.

/N

_______________________________________________
devl mailing list
devl at freenetproject.org
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
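[Editorial note, not part of the original mail: the reported run times and throughput figures are mutually consistent. Each spider fetches the whole ~2.5 MB retrievable set, so at the third/fourth-run speed of about 1700 bytes/sec per spider a run should take roughly 25 minutes, which fits the "30-60 minutes each" estimate, with the tail of slow and DNF files accounting for the rest. A quick check, using only the figures quoted above:]

```java
// Sanity check on the throughput figures quoted in the mail:
// ~2,528,290 bytes retrieved per spider at ~1700 bytes/sec per spider.
public class ThroughputCheck {
    static double expectedRunSeconds(long bytesPerSpider, double bytesPerSecond) {
        return bytesPerSpider / bytesPerSecond;
    }

    public static void main(String[] args) {
        double secs = expectedRunSeconds(2_528_290L, 1700.0);
        // ~1487 s, i.e. roughly 25 minutes of pure transfer time,
        // consistent with the reported 30-60 minutes per run
        System.out.printf("expected run time: %.0f s (~%.0f min)%n", secs, secs / 60);
    }
}
```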
