On Thursday 16 Feb 2012 17:15:59 xor wrote:
> Hi,
> 
> consider WebOfTrust as a graph, consisting of:
> - The set of Identities, the vertexes of the graph.
> - The set of Score values, the edges of the graph. For each pair of 
> OwnIdentity & Identity, such an edge exists. This mail assumes N to be the 
> number of those edges.
> 
> Currently, a single WOT instance fetches all new trust lists which each of 
> the 
> known identities publishes. A new trust list is published whenever someone 
> changes a trust value or adds a new identity, etc. So there are MANY trust 
> lists published every day.
> 
> For each fetched trust list, a score re-computation happens.
> The score re-computation is effectively a graph algorithm.
> Where N being the number of Score values, it does an amount of O(N) db4o-
> queries of the type: getScore(Identity truster, Identity target)
> 
> Obviously, such a query is theoretically executable in a time of O(1) by using
> a Hashtable<OwnIdentity truster, Hashtable<Identity target, Score score>>.

O(1) algorithms generally don't work well in on disk structures due to locality 
and size changing issues. Most databases use trees.
> 
> HOWEVER, db4o does NOT execute it in O(1). It takes an average of O(N), where 
> N is the number of Score objects!!! [1]

What actually happens is it looks up the index for both of the identities, and 
intersects them. Technically this is O(n) but it is only really slow when one 
of them (the in-set for an identity and the out-set for another identity) is 
huge. Or at least, that's what's supposed to happen - stack traces while db4o 
is running a query can be rather worrying, suggesting that sometimes it does in 
fact page in each full object, rather than simply intersecting two shortish 
lists of object ID's, for reasons that are unclear! Anyway, having said that, 
clearly it is much slower than it needs to be.

> Therefore, the old implementation of the Score computation algorithm has
> an average runtime of O(N^2 * ...) which is VERY bad, N^2 grows insanely fast.
> 
> Now the good news is:
> I have implemented a branch [2] which DOES reduce the average time of those 
> O(N) queries to O(1). It does this by creating a composite primary key (ID) 
> for 
> each Score object:
> We construct a String ID which is equal to "ID of the truster concatenated 
> with the ID of the trustee". We create a db4o index upon those composite IDs.

That's a very good idea.
> 
> Now the getScore(Identity truster, Identity target) query does not have 2 
> field-value constraints anymore, there is only 1. Therefore, due to the 
> index, 
> that query executes in O(1).

It's a tree. It's O(log) or something close to it.
> 
> We also do this for Trust-queries. This effectively should make WOT usable 
> again.

That would be awesome. Did you do any benchmarks?
> 
> The branch is already fully implemented and I'm doing a test run right now.
> The topic of the mail says "almost" done because I have not written code to 
> upgrade old databases yet. This is a simple task however, maybe 30 lines of 
> code, and I will definitely do it with then next week.
> 
> So we can see an optimized WOT build in two weeks I hope.
> 
> Greetings, and THANKS FOR YOUR EPIC PATIENCE WITH ME,
>       xor
> 
> 
> [1] I've written a db4o performance test which proves my theory that queries 
> which constrain upon two indexed fields take an average execution time of 
> O(N).
> I will publish the test in my next flog update.

That would be interesting. If it is in fact reading every object of that type 
into memory, as stack traces suggest it is doing, then something is seriously 
wrong, and we should look carefully at the API usage, the index setup code, and 
later versions of db4o.

O() numbers here are only representative of the worst-case, what is important 
is what actually happens IMHO.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: 
<https://emu.freenetproject.org/pipermail/devl/attachments/20120217/fc3e40e4/attachment.pgp>

Reply via email to