>From: Christian Biere <[EMAIL PROTECTED]>
>Date: Fri, 17 Nov 2006 10:00:55 +0100
>
>Lloyd Bryant wrote:
> > Attached is a patch to have a "browse host" request respond with the
> > files sorted by descending file date/time (so that the newest files
> > on the system appear at the beginning of the browse list).
>
>I've integrated your patch now. However, instead of sorting the list, I
>decided to sort file_table[]. Maybe this fixes the performance issue
>you've noticed. I didn't notice a significant difference, but it
>shouldn't be slower in any case. I did this mainly because
>request_sha1() can cause the file to be removed again, and a NULL
>pointer in shared_files would have to be handled elsewhere, whereas
>holes in file_table[] are already expected. Removing a node from a
>GSList is a bit awkward, albeit not really difficult, but I consider
>sorting the table simpler and clearer.
>

Since the table is being built from the slist, it's 6 of one, half-a-dozen 
of the other.  I sorted the slist because it was the only list available at 
that point.  Either way, the sort had to occur before request_sha1() was 
called.  Read below for more info on that performance issue.
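
For concreteness, here's roughly what sorting either structure by
descending file date looks like.  This is just a sketch - shared_file_t,
the mtime field, and file_count are stand-ins for illustration, not the
actual gtk-gnutella definitions - but note how the table comparator can
push the expected NULL holes to the end:

#include <glib.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in for the real shared-file record. */
typedef struct shared_file {
    time_t mtime;    /* file date/time; hypothetical field name */
    /* ... */
} shared_file_t;

/* Newest-first comparator for g_slist_sort() on shared_files. */
static gint
shared_file_cmp_mtime(gconstpointer a, gconstpointer b)
{
    const shared_file_t *fa = a, *fb = b;

    if (fa->mtime == fb->mtime)
        return 0;
    return fb->mtime > fa->mtime ? 1 : -1;  /* reversed => descending */
}

/* Same ordering for qsort() on file_table[]; NULL holes sort last. */
static int
shared_file_cmp_table(const void *a, const void *b)
{
    const shared_file_t *fa = *(shared_file_t *const *) a;
    const shared_file_t *fb = *(shared_file_t *const *) b;

    if (fa == fb)
        return 0;
    if (NULL == fa)
        return 1;
    if (NULL == fb)
        return -1;
    return shared_file_cmp_mtime(fa, fb);
}

/* Usage (names hypothetical):
 *   shared_files = g_slist_sort(shared_files, shared_file_cmp_mtime);
 *   qsort(file_table, file_count, sizeof file_table[0],
 *       shared_file_cmp_table);
 */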

> > As part of this change, it was necessary for me to move some things from
> > "recurse_scan_intern()" to "share_scan()", as I was having problems with
> > the SHA1 calculation routine relying on values that were set in
> > "recurse_scan_intern()" (Otherwise, I would have had to wait until all
> > SHA1's were updated before doing the sort - that could take a while).
>
>Yes, I noticed that request_sha1() uses the file index, so you can't easily
>change it after that.
>
> > I created a new property to determine whether or not this sort is
> > performed: "sort_browse_upload" (gboolean, of course).  The property
> > is live and can be accessed via the preferences-debug screen (or the
> > shell) or the gnet_config file.  If somebody has an exceptionally
> > large number of files, then the sort could potentially add a
> > substantial amount of time to the total required for a rescan (note:
> > on my headless box, which is an obsolete P2-300, processing 7600
> > files, having this option active adds about 15 seconds to the time
> > required for a rescan.  Not too bad....)
>
>I've left this property out for now. There's a message printed that
>shows how long the sorting took. Look for "MESSAGE.*sorting took" after
>a rescan. If that shows that sorting took significantly longer than a
>second, I'll consider adding this part of your patch as well.

First off, that 15-second value was a goof on my part - I had a recompile 
going at the time, and it was seriously affecting the amount of memory 
available for Linux's dcache.  The 15-second differential was primarily an 
artifact of this (I was only measuring the total time for the rescan, not 
the time required for its different elements).

I've gone back through (with my original patch) and collected some saner 
measurements.  Here are the results of three test sets I ran - first, no 
sort; second, sort using g_slist_sort; finally, sort using 
sort_slist_with_qsort.  All tests were run with my "live" file set (7659 
jpegs), averaged over 10 iterations (discarding the first pass of each set 
to eliminate caching effects):

No sort:                3.77 sec (0 ms avg sort time)
g_slist_sort:           3.83 sec (18.29 ms avg sort time)
sort_slist_with_qsort:  3.77 sec (10.79 ms avg sort time)

The overall time was dominated by other factors, making sort time pretty 
much irrelevant.
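
The gap between the two methods is about what you'd expect: qsort works
on a flat pointer array with good cache locality, while g_slist_sort
merge-sorts the linked nodes.  For reference, here's a minimal sketch of
the qsort-based approach (illustrative only, not the actual patch code;
the static comparator pointer assumes single-threaded use):

#include <glib.h>
#include <stdlib.h>

static GCompareFunc sort_slist_cmp;  /* stashed for the qsort shim */

static int
sort_slist_shim(const void *a, const void *b)
{
    gconstpointer da = *(gconstpointer const *) a;
    gconstpointer db = *(gconstpointer const *) b;

    return sort_slist_cmp(da, db);
}

static GSList *
sort_slist_with_qsort(GSList *list, GCompareFunc cmp)
{
    guint n = g_slist_length(list);
    gpointer *array;
    GSList *iter;
    guint i;

    if (n < 2)
        return list;

    /* Copy the node data into a contiguous array... */
    array = g_malloc(n * sizeof array[0]);
    for (i = 0, iter = list; iter != NULL; iter = iter->next)
        array[i++] = iter->data;

    /* ...sort the array, which has much better locality than
     * merge-sorting the linked nodes... */
    sort_slist_cmp = cmp;
    qsort(array, n, sizeof array[0], sort_slist_shim);

    /* ...then write the sorted data back into the existing nodes,
     * leaving the node links themselves untouched. */
    for (i = 0, iter = list; iter != NULL; iter = iter->next)
        iter->data = array[i++];

    g_free(array);
    return list;
}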

I'm currently setting up another test, using 100,000 files.  I want to see 
how sort time grows for the different sorting methods.  It took 20 minutes 
to do the initial directory scan (directories not in cache, I guess), and 
I'm still waiting for the SHA1s to build so I can do the real testing.

This test has (so far) highlighted a couple of other issues involving large 
numbers of files:

First, the original code, which started SHA1 calculations before the 
directory scan was complete, was actually a better solution for large 
numbers of files.  It takes a LONG time to complete the directory scan, 
and having the SHA1 calculation routine sitting idle during that time 
doesn't make a lot of sense.

Second, the SHA1 calculation does not use enough CPU.  Right now, I'm 
waiting for the SHA1s to complete - CPU (user + system) is running at 
about 5% (I have that machine offline, so there's no network load on it 
at the moment).
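
My guess - and it's only a guess, I haven't traced the hashing code - is
that the hasher reads and digests a small fixed-size slice of the file
per main-loop callback to keep the process responsive, which caps
throughput no matter how fast the CPU or disk is.  A sketch of that
pattern, with every name and number here illustrative:

#include <glib.h>
#include <stdio.h>

#define HASH_SLICE 65536    /* bytes hashed per callback; assumed value */

struct hash_job {
    FILE *fp;
    /* SHA1 context would live here */
};

static gboolean
hash_slice_cb(gpointer data)
{
    struct hash_job *job = data;
    static char buf[HASH_SLICE];
    size_t n = fread(buf, 1, sizeof buf, job->fp);

    /* SHA1_update(&job->ctx, buf, n);  -- feed the digest */

    if (n < sizeof buf) {       /* EOF or error */
        fclose(job->fp);
        /* finalize the digest, dequeue the next file, ... */
        return FALSE;           /* remove this timeout source */
    }
    return TRUE;                /* reschedule: one slice per tick */
}

/* With g_timeout_add(100, hash_slice_cb, job), hashing is capped at
 * roughly 64 KiB every 100 ms, i.e. ~640 KiB/s - a few percent of CPU
 * on a machine that could hash tens of MB/s flat out. */

If something like that is going on, a larger slice or a tighter schedule
would let the hasher actually use the CPU.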

Of course, once the SHA1s are calculated, they're in the cache, so both 
of these issues are mainly one-time artifacts.  I'm aware that SHA1s can 
be calculated offline, but my experience is that the average user 
shouldn't be using a shell command for anything (take a look at some of 
the issues on the Ubuntu forums if you want good examples of why not).

I've updated from the SVN - I'll take a look at the changes after I finish 
my tests...

Lloyd Bryant


