>From: Christian Biere <[EMAIL PROTECTED]>
>Reply-To: [email protected]
>To: [email protected]
>Subject: Re: [Gtk-gnutella-devel] Patch: Browse-host sort by descending file date/time (ne
>Date: Fri, 17 Nov 2006 19:15:34 +0100
>
>Lloyd Bryant wrote:
> > Is there any advantage to keeping the indices stable? SHA1 calculation
> > we've already looked at. The search table is rebuilt on each rescan, as is
> > the hash table. If there's anything in the system that depends on those
> > indices being stable after a rescan, then the whole sorting idea is in deep
> > trouble.....
>
>No, it's nothing crucial. However, if the SHA-1 isn't available yet, the only
>unique key is the filename plus the index. Albeit I'm not certain that Gtk-Gnutella
>actually handles multiple files with the same name correctly in this case. That's
>easy to verify though. The idea behind the /get/<index>/<filename> scheme is
>to allow multiple files with the same name. At least I think so.
>
But share_free() is called early in share_scan(). It destroys search_table,
the file_basenames hash table, all of the shared_file entries, and
file_table. At that point, it doesn't matter what indices you've still got,
as they've got nothing to index. Until the directory scan is complete and
file_table is rebuilt, it's pretty much impossible to access any file.

Note that without the sort, the situation was different. file_table was
rebuilt as files were scanned, so files became accessible via the index much
sooner than they do now. Of course, "sooner" is less than 3 seconds sooner
(that's the time from the end of the sort to the end of share_scan() in my
100,000 file test).

Perhaps it would have been better to add "browse_previous_index" and
"browse_next_index" to the shared_file structure, and then have a
"browse_first_index" global that points to the first entry. Such a linked
list could be sorted in any order without affecting anything else in the
system. Build an array based on this list, qsort it, then update the links
based on what qsort returns.
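
As a rough sketch of the linked-list idea above: the struct, its field
types, and the function names below are made up for illustration and are not
taken from the gtk-gnutella source. It assumes the files live in one array
and that the links store 0-based array positions, with -1 meaning "none".
The comparator breaks ties on array position so the result does not depend
on how a given qsort() orders equal keys.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for shared_file; only the fields needed here. */
struct shared_file_sketch {
    time_t mtime;               /* sort key: file modification time */
    int browse_previous_index;  /* array position of previous entry, -1 = none */
    int browse_next_index;      /* array position of next entry, -1 = none */
};

/* Hypothetical global: array position of the first entry in browse order. */
static int browse_first_index = -1;

/* Newest first; ties broken by array position to keep the order deterministic. */
static int
cmp_mtime_desc(const void *a, const void *b)
{
    const struct shared_file_sketch *fa =
        *(const struct shared_file_sketch * const *) a;
    const struct shared_file_sketch *fb =
        *(const struct shared_file_sketch * const *) b;

    if (fa->mtime != fb->mtime)
        return fa->mtime > fb->mtime ? -1 : 1;
    if (fa != fb)
        return fa < fb ? -1 : 1;    /* both point into the same array */
    return 0;
}

/*
 * Build a temporary array of pointers, qsort it, then rewrite the links.
 * The entries themselves never move, so their positions stay valid.
 * Returns 0 on success, -1 if the temporary array cannot be allocated.
 */
static int
browse_links_rebuild(struct shared_file_sketch *files, size_t n)
{
    struct shared_file_sketch **order;
    size_t i;

    assert(files != NULL || n == 0);

    browse_first_index = -1;
    if (n == 0)
        return 0;

    order = malloc(n * sizeof *order);
    if (order == NULL)
        return -1;

    for (i = 0; i < n; i++)
        order[i] = &files[i];

    qsort(order, n, sizeof *order, cmp_mtime_desc);

    for (i = 0; i < n; i++) {
        order[i]->browse_previous_index =
            i > 0 ? (int) (order[i - 1] - files) : -1;
        order[i]->browse_next_index =
            i + 1 < n ? (int) (order[i + 1] - files) : -1;
    }
    browse_first_index = (int) (order[0] - files);

    free(order);
    return 0;
}
```

A browse-host listing would then start at browse_first_index and follow
browse_next_index until it hits -1, while /get/<index>/<filename> lookups
keep using the unsorted positions.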
>
>I know that qsort() isn't stable which could be fixed by using a secondary key
>but I would guess that the same implementation would always gain the same
>order for the same input. Sure, if you add new files, this could cause a
>difference. I'd say for common use such problems are sufficiently unlikely
>and without exposing the full pathname, we can't really prevent such clashes
>before we know a checksum. Okay, we could store the index in the cache and
>then persist it. Note that the sha1_cache as it is, doesn't really use an
>extensible format. If we wanted to revise it, we'd probably have to rename it
>to prevent trouble if someone downgrades Gtk-Gnutella.
>
What about using device + inode in place of the full file path? These are
just as guaranteed to be unique, and a heck of a lot shorter: a typical ext3
filesystem device is 64 bits, and the inode 32. The worst case would be 64+64
bits, or 16 bytes, which is still a heck of a lot shorter than
"/home/lloyd/gtk-gnutella/shared/......". This has the advantage of not
requiring a SHA1 recalc if a file is mv'ed (which changes ctime, but not
atime or mtime).
I've used device/inode in the past as a unique file key. Works well on
every *nix flavor I've worked on (Linux, Solaris, AIX, SCO).
We're already stat'ing the files as part of the rescan, so it wouldn't
require any extra effort to get our hands on these....
Lloyd Bryant