Matthew Toseland wrote:
> - Object.hashCode() is required to return different numbers for different 
> objects as much as possible so hopefully this will work anywhere...

That is only a guideline, not a requirement, so code can't depend on it for
correctness (which is what the original IDCmp was doing). Even the best hash
code generators will eventually give two distinct objects the same hashCode
(and even identityHashCode will do this: there are only 2^32 possible int
values but a 2^64-byte address space).
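A minimal illustration, using the well-known "Aa"/"BB" String collision: equal
hash codes do not imply equal objects, so nothing may rely on hashCode for
identity.

```java
// Equal hashCode() does not imply equals(): "Aa" and "BB" both hash to 2112
// under String.hashCode, yet they are obviously different strings.
public class HashCollision {
    public static void main(String[] args) {
        String a = "Aa", b = "BB";
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
    }
}
```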

> b1315295bb159a126c938fdfa445c6a48481446c
> - what you want is to sort by relevance and then have no order. why is this 
> important? one way to implement it would be a sorted set/map by relevance 
> containing sets...
> - is it absolutely vital that compareTo() return 0 for same relevance? this 
> is a breach of the Comparable contract and must be flagged up in a comment, 
> will cause problems with any built-in collection...
> ' It is strongly recommended, but not strictly required that 
> (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that 
> implements the Comparable interface and violates this condition should 
> clearly indicate this fact. The recommended language is "Note: this class has 
> a natural ordering that is inconsistent with equals." '

Fixed this in one of the more recent commits.
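For reference, this is the failure mode the Comparable contract warns about: a
TreeSet treats compareTo()==0 as "duplicate" and silently drops the element.
The Result class and field names below are illustrative, not the project's
actual code.

```java
import java.util.TreeSet;

// Sketch: an ordering inconsistent with equals() makes sorted collections
// discard distinct elements that happen to compare equal.
public class RelevanceOrder {
    static class Result implements Comparable<Result> {
        final String term;
        final int relevance;
        Result(String term, int relevance) {
            this.term = term;
            this.relevance = relevance;
        }
        // Note: this class has a natural ordering that is inconsistent
        // with equals (highest relevance first, ties compare as 0).
        public int compareTo(Result o) {
            return Integer.compare(o.relevance, relevance);
        }
    }

    public static void main(String[] args) {
        TreeSet<Result> set = new TreeSet<>();
        set.add(new Result("foo", 5));
        set.add(new Result("bar", 5)); // same relevance: rejected as "duplicate"
        System.out.println(set.size()); // prints 1, not 2
    }
}
```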

> 
> PrefixTree:
> 
> -               if ((par == null && len != 0) || (par != null && (par.preflen 
> != len-1 || !par.prefix.match(p, len-1))))
> {
> -                       throw new IllegalArgumentException("Not a valid 
> parent for this node.");
> -               }
> -
> 
> - why was this removed?
> 
> SkeletonPrefixTreeMap.DummyChild : parent never set?

I can't remember exactly why, but I think it was necessary to set the parent in
the serialiser or the translator, and not deflate(). So it had to be null for a
short period. Or something like this. (The class is disused now.)

> What is NP vs coNP?

Oh, informally: NP is like searching a tree for a single positive leaf (one
witness is enough, so the search can stop early) and co-NP is like verifying
that all leaves are negative (every leaf must be visited, so it's slower). I
was just commenting on the performance of the algorithm under different
conditions.
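The asymmetry shows up even in a toy search over a list of leaves: the
"positive witness" case can short-circuit, the "all negative" case cannot.

```java
import java.util.List;

public class LeafSearch {
    // Returns true as soon as one positive leaf is found (can stop early).
    static boolean anyPositive(List<Boolean> leaves) {
        for (boolean leaf : leaves) if (leaf) return true;
        return false;
    }

    // Must inspect every leaf to certify that all of them are negative.
    static boolean allNegative(List<Boolean> leaves) {
        for (boolean leaf : leaves) if (leaf) return false;
        return true;
    }

    public static void main(String[] args) {
        List<Boolean> leaves = List.of(false, true, false, false);
        System.out.println(anyPositive(leaves)); // true (stops at index 1)
        System.out.println(allNegative(leaves)); // false
    }
}
```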

> 01deb963752d6ccb3a9281ecf84534af69366a80
> 
> - i like java.util.concurrent, we do a lot of this the hard way, we should 
> retrofit some of our code... otoh deadlock fixes often complicate matters... 
> :|
> - will you use interrupt() ? if not, don't worry about it ...

Ah ok, I just don't like leaving empty exception handlers unless I'm explicitly
ignoring something.

> caa55450796788a2c08f8cbd5088573f85984e9c
> 
> IdentityComparator :
> +** URGENT this is SERIOUSLY bugged because System.identityHashCode has a 
> small
> +** chance of returning the same value for two distinct objects!!!
> - how does this happen? where do you get it from?

System.identityHashCode has a small chance to generate the same number for two
distinct objects. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6321873

> 9807f322ef206f1f86a3f544b80e913526b08519

Did you have a comment about this commit? There's nothing in the email...

> 1e92788e78d2c76ca07b38551e4e5f43d2ff71f5
> - fixes IdentityComparator problems...
> - memory leaking a problem? weak references maybe appropriate? otoh weak 
> references have some overhead...

This should almost never happen, so it shouldn't be a problem; the code is
"just in case".
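One way to repair an identity comparator (a hedged sketch, not the actual
commit): fall back to a per-comparator serial number when two distinct objects
collide on identityHashCode. Note the serials map retains the colliding
objects, which is exactly the leak the weak-reference question is about.

```java
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a total order on object identity that stays correct even when
// System.identityHashCode returns the same value for two distinct objects.
public class SafeIdentityComparator implements Comparator<Object> {
    private final Map<Object, Long> serials = new IdentityHashMap<>();
    private final AtomicLong next = new AtomicLong();

    private long serialOf(Object o) {
        // Identity-keyed map, so distinct-but-equal objects get distinct serials.
        synchronized (serials) {
            return serials.computeIfAbsent(o, k -> next.getAndIncrement());
        }
    }

    public int compare(Object a, Object b) {
        if (a == b) return 0;
        int ha = System.identityHashCode(a), hb = System.identityHashCode(b);
        if (ha != hb) return Integer.compare(ha, hb);
        // Rare collision: break the tie with stable serial numbers.
        return Long.compare(serialOf(a), serialOf(b));
    }
}
```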

> e18384d6f9510b61ade373892fc5fed53ca8c563
> - i am not intimately familiar with btrees.
> - my understanding is they have severe problems with concurrent updates, is 
> this a problem for us?
> - lnodes vs rnodes wastes memory, no? they both map to the same set?
> - don't understand some code in merge, maybe you fixed it later:
> +                       rnode.rnodes.put(mkey, rnode.rnodes.get(null));
> +                       rnode.rnodes.put(null, lnode.rnodes.get(null));
> [16:54:15] <toad_> +                       rnode.rnodes.put(mkey, 
> rnode.rnodes.get(null));
> [16:54:23] <toad_> in merge
> [16:54:40] <toad_> rnodes.get(null) is the entry belonging to the right end 
> of the right node
> [16:54:51] <toad_> mkey is the key between lnode and rnode pre-merge
> [16:55:01] <toad_> so what's going on here?
> [16:55:19] <infinity0> which revision is this?
> [16:55:26] <toad_> ah, it's a bug? :)
> [16:55:31] <infinity0> there were maybe 4-5 more bug-fixing commits after the 
> first commit
> [16:55:32] <toad_> this is the initial commit ...
> [16:55:33] <toad_> okay
> [16:55:40] <toad_> you have tests for it etc?
> [16:55:42] <infinity0> yeah
> [16:56:07] <infinity0> it adds 65536 entries and checks the integrity, then 
> removes them and checks the integrity
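The merge step under discussion can be sketched in simplified form: the
separator key between the two siblings (mkey) moves down from the parent, and
the right node's entries are appended after it. This keeps only the keys and
omits the lnodes/rnodes child-pointer bookkeeping from the real code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified B-tree sibling merge: left keys, then the separator key from
// the parent, then right keys, all folded into one node.
public class BTreeMerge {
    static List<Integer> merge(List<Integer> left, int mkey, List<Integer> right) {
        List<Integer> merged = new ArrayList<>(left);
        merged.add(mkey);      // separator key moves down from the parent
        merged.addAll(right);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(1, 2), 3, List.of(4, 5)));
        // [1, 2, 3, 4, 5]
    }
}
```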

> 8a399b2fea2cbbb990cc3eaba314c172298fb350
> - TaskCompleteException == job cancelled?
> - ParallelSerialiser.joinAll() - if it fails, we won't necessarily know until 
> all of the other tasks have completed ... is this acceptable?
> - ProgressTracker docs: race condition in comments: it would be tricky 
> architecturally to remove the progress *after* loading the data? that's more 
> or less what a weak map would do? not sure what you mean about freeing the 
> data. if the data changes, then the data changes - we would have to do 
> deflate twice, no? in broader terms i'm not sure what your concurrency model 
> is; for inflate, it's clear that we load it on a separate thread and then 
> merge it, but for deflate, presumably we need to ensure that nothing gets 
> deflated further down and nothing gets modified while we deflate?
> - TaskAbortException: why does DataFormatException => retrying may work? in 
> an index fetch from freenet, a DataNotFound might result in the inflate 
> failing but being retryable - but if the data is bogus, retrying won't help?

TaskCompleteException means the job is cancelled, but its intended effects
(e.g. inflation of a tree node) have already been completed elsewhere in the
program.

joinAll(): at the moment it's not a priority; I don't think it's too much of a
problem, but I'll note it down in the source.

ProgressTracker is now implemented in terms of WeakHashMap, see
4000b1b88d14c1eda47e18fc45283c3d9a561a52
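A hedged sketch of the WeakHashMap approach (class and method names are
illustrative, not the actual commit): progress is keyed on the task's
metadata, so once nothing else references the key, the entry becomes
collectable and finished trackers don't accumulate.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: a progress tracker whose entries expire with their keys.
public class WeakProgressTracker<K, P> {
    private final Map<K, P> progress =
            Collections.synchronizedMap(new WeakHashMap<K, P>());

    // Registers a progress object for this metadata, keeping any existing one.
    public P addTracker(K meta, P p) {
        progress.putIfAbsent(meta, p);
        return progress.get(meta);
    }

    // May return null once the key has been garbage-collected.
    public P getTracker(K meta) {
        return progress.get(meta);
    }
}
```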

DataFormatException should imply retry=false... must have been a typo. It
seems to be fixed in the latest revision - I don't catch DataFormatException
anyway (only RuntimeException), and that gets attached to a default
TaskAbortException with error=true, retry=false.

> f8a3d35f3c91dd32e76bbe584b0bc0f97336348d

empty? email is blank here..

> ab06fea04de2853e2685e89b9881471f6613aed6
> 
> Trackables? Is there only one pull/push happening at any given time for any 
> one of TTAB_KEYS, TTAB_DATA, UTAB_KEYS, UTAB_DATA ? is ProtoIndex the index 
> or is it a single high-level operation on the index?

Those are just the serialisers that hold ProgressTrackers, so the code has a
convenient place to access them. I've removed that from the code now and just
cast the serialisers to (Serialiser.Trackable) directly.

ProtoIndex is the whole index data structure.

X
