Here is my current high-level view of the main challenges involved in the queue-in-database implementation. Of course, I started implementing on Monday, and this has been partly the result of learning more about db4o and partly the result of encountering new challenges in the code...
Not being a database geek, I have to adapt the design on the fly ... ah well, I prefer the way I do it to the way that everyone else doesn't! Constructive criticism is, however, very welcome... Any database geeks who could review the below, ideally combined with the client layer code (YEAH RIGHT, it's a monster...), please do so!

[13:29] <toad_> okay, a single client database thread is likely to be simplest in terms of avoiding bugs ... if it becomes a performance problem later on, we can change it without changing any persistent data structures
[13:29] <toad_> so lets do that
[13:31] <toad_> we'll have "SerialExecutor databaseThread" ... we'll queue jobs on it, and they will commit
[13:31] <toad_> since there's only one thread, we should never need to rollback ... hopefully ...
[13:31] <toad_> but we commit at the end of each job
[13:34] <toad_> since all the database action occurs on a single thread, we can safely deactivate stuff when we're done with it

This is as opposed to real transactions and multiple threads. It is possible to have an ObjectServer and multiple ObjectContainers in the same VM, but it has some interesting consequences: each ObjectContainer client has its own weak-reference cache, so you may frequently have multiple copies of the same object in the same VM (one per container). So running everything that requires database access on one thread may cost some performance, but parallel transactions aren't necessarily faster, because seeks are bad on cheap disks; and it means we have a single cache, which should be more efficient. It also avoids a lot of problems with objects from different containers interacting, with updating stuff, etc.

[13:36] <toad_> we COULD use Db4oMap since we're running everything on one thread, but if we don't, we have two advantages: 1) we can make it transactional later, 2) Db4oMap is deprecated in 7.X

Thus we will probably convert e.g. pendingKeys into a mass of small tuple-like objects, create an index on them, and then query them. We can search by byte[], ByteWrapper or Key (NodeCHK/NodeSSK). *In theory* the last option should work, provided we make them implement Comparable so that db4o can construct a btree index by value...

[13:39] <toad_> we plan to set the default activation depth quite low

I.e. 1 or 2!

[13:40] <toad_> w.r.t. non-persistent requests, we have parallel data structures in RAM, we try to share the logic even if we have different impl's...

Much of this is done already (ClientRequestSchedulerCore vs CRSNonPersistent vs CRSBase), but the above changes will require further changes.

[13:40] <toad_> we can use transient references to deal with FCPServer callbacks etc
[13:42] <toad_> we should pass the ObjectContainer into any method that might need it, 1) for activation, and 2) so we don't store it (makes transition to real transactions later on easier)
[13:43] <toad_> the FEC queue needs to be persistent ... we'll have a running list in the database, and a queue. if we submit a job and the fec runner is idle, it can be started immediately, but it must still go into the database's running list...

This is essential for integrity when we get the last block for a segment: we need to commit either both the fact that we got the block AND the fact that we need to decode the segment, or neither.

[13:43] <toad_> we will of course have a parallel non-persistent structure...
[13:44] <toad_> we can safely buffer a few slots ahead in RAM with weak references, or we can rely on queries...
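
To make the single-database-thread idea concrete, here is a rough sketch of the commit-per-job pattern. The names (DBJob, DatabaseThread) are my own illustrative inventions; the real code will queue jobs on the existing SerialExecutor rather than on a raw thread like this:

    import com.db4o.Db4o;
    import com.db4o.ObjectContainer;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    /** Illustrative stand-in for the single database thread: every job runs
     *  serially against one ObjectContainer, and each job ends with a commit,
     *  so we should never need to roll back. */
    class DatabaseThread implements Runnable {

        /** Hypothetical job interface; real jobs would be queued on SerialExecutor. */
        interface DBJob {
            void run(ObjectContainer container);
        }

        private final ObjectContainer container = Db4o.openFile("node.db4o");
        private final BlockingQueue<DBJob> jobs = new LinkedBlockingQueue<DBJob>();

        void queue(DBJob job) {
            jobs.add(job);
        }

        public void run() {
            while (true) {
                DBJob job;
                try {
                    job = jobs.take();
                } catch (InterruptedException e) {
                    return;
                }
                try {
                    job.run(container);
                    container.commit();   // commit at the end of each job
                } catch (RuntimeException e) {
                    container.rollback(); // shouldn't happen with one thread, but be safe
                }
            }
        }
    }

The FEC case above is exactly why we commit per job rather than per write: the job that handles the last block of a segment both stores the received block and adds the decode entry to the persistent running list, so a single commit covers both or neither.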
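And a rough sketch of the pendingKeys idea: one small tuple-like object per pending key, a field index, and a SODA lookup. The class and field names here are made up; Key and SendableGet stand for the existing Freenet classes (imports omitted), and whether the index can really be built by value depends on the Comparable question above.

    import com.db4o.Db4o;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    /** One small persistent object per pending key, instead of one big Db4oMap. */
    class PendingKeyItem {
        Key key;            // the key we are waiting for (NodeCHK/NodeSSK)
        SendableGet getter; // the request that wants it

        PendingKeyItem(Key key, SendableGet getter) {
            this.key = key;
            this.getter = getter;
        }
    }

    class PendingKeys {
        static void configureIndex() {
            // Must be called before opening the container. Whether this becomes a
            // useful by-value btree index depends on Key implementing Comparable.
            Db4o.configure().objectClass(PendingKeyItem.class).objectField("key").indexed(true);
        }

        /** All pending-key entries matching a given key, via a SODA query. */
        static ObjectSet<PendingKeyItem> lookup(ObjectContainer container, Key wanted) {
            Query q = container.query();
            q.constrain(PendingKeyItem.class);
            q.descend("key").constrain(wanted);
            return q.execute();
        }
    }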
[13:44] <toad_> they will need to be SODA because they will need to be ordered

Native Queries do not support sorting results, unfortunately.

[13:44] <toad_> the splitfile code will probably need to be rewritten

It may make sense to have a small block handler object for each block, plus a segment handler for each segment. Then we would get a block, find which block it is for (pendingKeys really should incorporate some sort of context object for this), activate the relevant block handler, and ask it to deal with it; it would then decode the block (probably off-thread - we could split it into two database-thread jobs, one before and one after), activate the segment if necessary, and decode it. Or perhaps we should keep the current design: fetcher -> segment -> subsegment (segment:retryCount), with the subsegment being queued. Either way we will need significant changes for activation and deactivation: we don't want to pull in the whole splitfile and all its keys to handle one block.

[13:46] <toad_> re the cooldown queue, we need to persist it, because we will need to actually remove stuff from the main queue when putting it onto the cooldown queue
[13:47] <toad_> again we'll have to have parallel structures ... the persistent variant can however be much simpler as memory pressure isn't such an issue...
[13:47] <toad_> the non-persistent one we can keep as-is, or we can try to minimise the code difference between the two...

The cooldown queue at the moment is a complex, tightly memory-optimised structure consisting of 3 arrays (it would have been 2 arrays, but I discovered that we need to keep the SendableGet and not just the Key, for various reasons). It could be made much simpler, given queries and/or TreeMaps and one object per entry. The cost is significantly higher memory usage. That's not a problem if it's only for non-persistent requests, and of course the persistent one doesn't have to worry too much about footprint: the size of the on-disk database still matters, but it's more important not to have to bring the entire arrays into RAM than for them to be small overall.

[13:48] <toad_> when the user deletes a request we will need to delete its structures
[13:49] <toad_> which means we need to be careful to avoid sharing stuff

E.g. FreenetURIs. Not having garbage collection is a problem, but hopefully we can deal with it.

[13:50] <toad_> we can use activation and deactivation to keep the top levels of the queue structure in RAM, or not to, depending on config ... that's not something we need to worry about atm...
[13:51] <toad_> ... and if we need to change stuff, there are extensive options for schema evolution
[13:51] <toad_> so i think that's about it for the preliminary design
[13:51] <toad_> three days after i started the low level implementation! :)

Another item we've already discussed: we must back up the database every 6 hours (counting downtime, so as soon as possible after starting up if the last backup was more than 6 hours ago), or whenever a major change happens, such as adding a new persistent request.
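
Going back to the ordered-query point and the cooldown queue: a rough sketch of the simpler one-object-per-entry persistent variant, using a SODA query ordered by wakeup time. Again the class and field names are invented, and Key / SendableGet stand for the existing classes (imports omitted).

    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    /** One persistent object per cooldown entry; simpler than the in-RAM
     *  three-array structure because footprint matters much less here. */
    class CooldownItem {
        long wakeupTime;    // when the key may be fetched again
        Key key;            // the key on cooldown
        SendableGet getter; // kept as well as the key, as noted above

        CooldownItem(long wakeupTime, Key key, SendableGet getter) {
            this.wakeupTime = wakeupTime;
            this.key = key;
            this.getter = getter;
        }
    }

    class PersistentCooldownQueue {
        /** Entries whose wakeup time has passed, oldest first. Native Queries
         *  can't sort the results, so this has to be SODA. */
        static ObjectSet<CooldownItem> due(ObjectContainer container, long now) {
            Query q = container.query();
            q.constrain(CooldownItem.class);
            q.descend("wakeupTime").constrain(Long.valueOf(now)).smaller();
            q.descend("wakeupTime").orderAscending();
            return q.execute();
        }
    }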
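Finally, for the six-hourly backup: db4o can back up a live container through ExtObjectContainer.backup(). A rough sketch of the rule above; the interval constant, backup path and last-backup bookkeeping are just illustrative.

    import com.db4o.ObjectContainer;

    class BackupScheduler {
        private static final long INTERVAL = 6 * 60 * 60 * 1000L; // 6 hours
        private final ObjectContainer container;
        private long lastBackupTime; // would be persisted, so downtime counts towards the 6 hours

        BackupScheduler(ObjectContainer container, long lastBackupTime) {
            this.container = container;
            this.lastBackupTime = lastBackupTime;
        }

        /** Call soon after startup, on a timer, and after major changes such as
         *  adding a new persistent request. */
        void maybeBackup(boolean majorChange) {
            long now = System.currentTimeMillis();
            if (majorChange || now - lastBackupTime >= INTERVAL) {
                try {
                    container.ext().backup("node.db4o.backup"); // backs up the open container
                    lastBackupTime = now;
                } catch (Exception e) {
                    // log and retry later
                }
            }
        }
    }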
