On 05/01/2010 11:34 PM, Matthew Toseland wrote:
> I have tagged and released version 8. However all changes relate to the new
> index format, and almost all to on-the-fly merging. This will be vital in the
> next stage - making XMLSpider generate new format indexes, progressively, so
> that 1) XMLSpider no longer takes a week to write an index from scratch, and
> 2) it can actually insert the index.
>
> Going backwards in time...
>
> 0e5af601a81b3daca6a79953f92372a5d7c38cf6
> - this commit removes a bunch of distributed indexing design docs
> - are there up to date docs re distributed indexing? are these all going to
> have to be severely rewritten anyway?
pretty much... i'll get around to these. writing up the project atm.

> eb90053d6ae081d9a6b9d6bcd67ba58a808edbc3
> - random unit tests are good, but unless you have a fixed seed, it is
> essential that you print it out, ideally on failure but if necessary on
> starting the tests.

i've made all the tests use the same random object, and it'll print out its
seed when initialised.

> c3d26271bea79189e836c56fd9df8d5bb584dc0e
> - BTreeMap.subSet().
> - last line: how can k2 be null? surely if it is null then there is nothing
> left, we can return empty?

right, and ss.headSet(lk) is an easy-if-hackish way to make an empty set with
the right comparator, of the same class as the input.

> 9724ad257b76473a6fa94d7dcd5f1b65cd963f14
> BTreeMap.subEntries() - if there is only one key in the tail set, we return
> entries.subMap(lk, lk), which is empty at best or a contradiction at worst;
> changing to subMap(lk, rk) doesn't seem to fix it either, afaics this line
> should just be removed.

same principle as above - it's supposed to be empty, this is the easiest way
to make sure it's got the right comparator and class type. :)

> cc8a478986b6039b8c6d68cbd33108db175a700c
> (t = timeout) == timeout
> eh??
> bad idea to Thread.sleep while holding a lock. Use wait/notify not
> sleep/interrupt, it is more reliable.

could you elaborate? where is it holding a lock? that section of the code runs
in a new thread. (granted, it is a bit untidy, will probably be removed when i
do the -thread+async upgrades.)

> f34dd12004afceb7fb332af642d6b98ea81500bd
> Integers.allocateEvenlyRandom (todo only atm) - shouldn't it be
> deterministic, to maximise overlap when inserting the same data etc? or would
> it only be used in unit tests? (in which case must pass a random in)

what do you mean by maximise overlap? this will be just as efficient as the
deterministic version in distributing values across nodes. the only difference
is instead of doing, eg. [11211211] it will do eg. [12111211].
the rationale was to avoid all possibility of "interference patterns", eg. if
you have two such sequences and add them, the imperfections might line up and
accumulate. i don't have a formal argument for this, and it might not even
apply to the b-trees case, hence "TODO LOW", but basically the principle is
similar to when you want to make a grayscale image out of black and white
pixels, you don't spread the dots regularly, as that effect accumulates and
doesn't look nice.

> 71bd3b2f74fef2f74c0cf6684b9d96de6fb09304
> ObjectProcessor.update() - what if it's already run? Looks racy - or you
> expect this to happen before we start _any_ jobs?

i can't find any trace of ObjectProcessor.update(), what did you mean? auto()
will only ever run one handler, and that is synchronized properly.

> 0d96f8101002ce4b127ff3028858b4738e95bf54
> - use NativeThread? would need a priority passing in ...
>
> 3a28caa359a30d5e8361c94118002fb5135936fc
> - Sorted.keySet().
> - instanceof SortedMap? SortedSet surely.
> - i guess this doesn't exist any more?

ahh you're right, fixed.

> 6aac0ad832dcc829ef0e3e7769818d5d6bf3be1d
> - new GhostNode(null, null, null, en.getValue())
> - when do the keys get set?

it's stored in gh and passed to SkeletonNode's constructor, which sets the
keys.

> 1a65974565ccfe809230a8b3abd954c23477e66d
> - merge and split seem to be swapped around? looking forward to promised
> javadocs!

no this is right, i've put the javadocs in.

> ParallelSerialiser:
>
> - // TODO: toad - if it fails, we won't necessarily know until all of the
> + // TODO NORM: toad - if it fails, we won't necessarily know until all of the
>
> dunno ... fast failure is definitely best for UI ...
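
to make the allocateEvenlyRandom point above concrete, here is a rough sketch
of the two variants. it is not the code from the commit: the class name
AllocSketch, both method signatures and the int[] return type are made up for
illustration. both variants give every slot either floor(total/num) or that
plus one, they only differ in where the "plus one" slots land. the random
variant takes a Random as a parameter and main() prints the seed, as asked
for in the unit-test comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// hypothetical sketch, not the Integers class from the repository
final class AllocSketch {

    /** deterministic: slot i gets floor((i+1)*total/num) - floor(i*total/num). */
    static int[] allocateEvenly(int total, int num) {
        int[] out = new int[num];
        for (int i = 0; i < num; i++) {
            out[i] = (int) ((long) (i + 1) * total / num - (long) i * total / num);
        }
        return out;
    }

    /** randomised: same slot sizes, but the larger slots sit at shuffled positions. */
    static int[] allocateEvenlyRandom(int total, int num, Random rnd) {
        int base = total / num;
        int extra = total % num; // this many slots get base + 1

        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < num; i++) {
            positions.add(i);
        }
        Collections.shuffle(positions, rnd);

        int[] out = new int[num];
        Arrays.fill(out, base);
        for (int i = 0; i < extra; i++) {
            out[positions.get(i)]++;
        }
        return out;
    }

    public static void main(String[] args) {
        long seed = System.nanoTime();
        System.out.println("seed = " + seed); // print it so a failure can be replayed
        Random rnd = new Random(seed);
        System.out.println(Arrays.toString(allocateEvenly(10, 8)));
        System.out.println(Arrays.toString(allocateEvenlyRandom(10, 8, rnd)));
    }
}
```

both assume total >= 0 and num > 0. with a fixed seed the random variant is
repeatable, which is what the tests need.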

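
for the subSet()/subEntries() answers above, the trick can be shown with the
standard java.util collections. this is only a sketch under assumptions:
TreeSet and TreeMap stand in for the BTreeMap views, and EmptyViewSketch and
emptyLike are made-up names. headSet(first) contains the elements strictly
before the first one, ie. none, and subMap(k, k) is the half-open range
[k, k), so both are empty views that still report the original comparator.

```java
import java.util.Comparator;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// hypothetical sketch using java.util types in place of the BTreeMap views
final class EmptyViewSketch {

    /** returns an empty view of ss that keeps ss's comparator. */
    static <E> SortedSet<E> emptyLike(SortedSet<E> ss) {
        if (ss.isEmpty()) {
            return ss; // already empty, nothing to cut away
        }
        return ss.headSet(ss.first()); // strictly less than the first element
    }

    public static void main(String[] args) {
        Comparator<String> cmp = Comparator.reverseOrder();

        SortedSet<String> ss = new TreeSet<>(cmp);
        ss.add("a");
        ss.add("b");
        SortedSet<String> empty = emptyLike(ss);
        System.out.println(empty.isEmpty() + " " + (empty.comparator() == cmp));

        SortedMap<String, Integer> sm = new TreeMap<>(cmp);
        sm.put("a", 1);
        sm.put("b", 2);
        SortedMap<String, Integer> none = sm.subMap("b", "b"); // [b, b) is empty
        System.out.println(none.isEmpty() + " " + (none.comparator() == cmp));
    }
}
```

one caveat: these are range views backed by the original collection, so
adding a key outside the (empty) range throws IllegalArgumentException. that
is fine if the result is only ever read.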