Here is my current high level view of the main challenges involved in the 
queue-in-database implementation. Of course, I started implementing on 
Monday, and this has been partly the result of learning more about db4o and 
partly the result of encountering new challenges in the code...

Not being a database geek, I have to adapt the design on the fly ... ah well, 
I prefer the way I do it to the way that everyone else doesn't! Constructive 
criticism is, however, very welcome... If any database geeks could review 
the below, ideally together with the client layer code (YEAH RIGHT, it's a 
monster...), please do so!

[13:29] <toad_> okay, a single client database thread is likely to be simplest 
in terms of avoiding bugs ... if it becomes a performance problem later on, 
we can change it without changing any persistent data structures
[13:29] <toad_> so let's do that
[13:31] <toad_> we'll have "SerialExecutor databaseThread" ... we'll queue 
jobs on it, and they will commit
[13:31] <toad_> since there's only one thread, we should never need to 
rollback ... hopefully ...
[13:31] <toad_> but we commit at the end of each job
[13:34] <toad_> since all the database action occurs on a single thread, we 
can safely deactivate stuff when we're done with it


As opposed to real transactions and multiple threads. It is possible to have 
an ObjectServer and multiple ObjectContainers in the same VM, but this has 
some interesting consequences: each ObjectContainer client gets its own 
weak-reference cache, so you may frequently have multiple copies of the same 
object in the same VM (one per container). Running everything that requires 
database access on one thread may cost some performance, but parallel 
transactions aren't necessarily faster, because seeks are expensive on cheap 
disks; and the single-thread approach means we have a single cache, which 
should be more efficient. It also avoids a lot of problems with objects from 
different containers interacting, with updating stuff, etc etc.
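
A minimal sketch of the single-thread scheme, assuming a hypothetical DBJob 
interface and queue (the names here are illustrative, not the actual Freenet 
classes):

    import com.db4o.ObjectContainer;

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    interface DBJob {
        void run(ObjectContainer container);
    }

    class SerialExecutor implements Runnable {
        private final BlockingQueue<DBJob> jobs = new LinkedBlockingQueue<DBJob>();
        private final ObjectContainer container;

        SerialExecutor(ObjectContainer container) {
            this.container = container;
        }

        void queue(DBJob job) {
            jobs.add(job);
        }

        public void run() {
            while (true) {
                DBJob job;
                try {
                    job = jobs.take();
                } catch (InterruptedException e) {
                    return;
                }
                try {
                    job.run(container);
                    container.commit();   // one transaction per job
                } catch (RuntimeException e) {
                    container.rollback(); // shouldn't happen with one thread, but be safe
                }
            }
        }
    }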

[13:36] <toad_> we COULD use Db4oMap since we're running everything on one 
thread, but if we don't, we have two advantages: 1) we can make it 
transactional later, 2) Db4oMap is deprecated in 7.X

Thus we will probably convert e.g. pendingKeys into a mass of small 
tuple-like objects, create an index on them, and then query them. We can 
search by byte[], by ByteWrapper, or by Key (NodeCHK/NodeSSK). *In theory* 
the last option should work, provided we make them implement Comparable so 
that db4o can construct a btree index by value...
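
A sketch of what such a tuple might look like, assuming db4o can index and 
compare a raw byte[] field (the alternatives above exist precisely because 
that may not hold); class and field names are hypothetical:

    import com.db4o.Db4o;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    class PendingKeyItem {
        byte[] routingKey;  // the key we are waiting for
        Object getter;      // context: the SendableGet that wants this key

        PendingKeyItem(byte[] routingKey, Object getter) {
            this.routingKey = routingKey;
            this.getter = getter;
        }
    }

    class PendingKeys {
        static void configure() {
            // Ask db4o for a field index so lookups don't scan every tuple.
            Db4o.configure().objectClass(PendingKeyItem.class)
                    .objectField("routingKey").indexed(true);
        }

        static ObjectSet find(ObjectContainer container, byte[] key) {
            Query q = container.query();
            q.constrain(PendingKeyItem.class);
            q.descend("routingKey").constrain(key); // SODA lookup on the index
            return q.execute();
        }
    }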

[13:39] <toad_> we plan to set the default activation depth quite low

I.e. 1 or 2!
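
In db4o terms that is a one-line global configuration at startup; a minimal 
sketch:

    import com.db4o.Db4o;

    class ActivationConfig {
        static void apply() {
            // Shallow default: a query hit activates the object itself plus
            // one level of references; anything deeper must be activated
            // explicitly via container.activate(obj, depth).
            Db4o.configure().activationDepth(1);
        }
    }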

[13:40] <toad_> w.r.t. non-persistent requests, we have parallel data 
structures in RAM, we try to share the logic even if we have different 
impl's...

Much of this is done already (ClientRequestSchedulerCore vs CRSNonPersistent 
vs CRSBase), but the above changes will require further changes.
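
A rough sketch of that split, assuming CRS abbreviates ClientRequestScheduler 
(the method and bodies here are illustrative, not the real code):

    // Shared logic lives in the base class; only storage differs.
    abstract class ClientRequestSchedulerBase {
        abstract void addPendingKey(Object key, Object getter);
    }

    class ClientRequestSchedulerCore extends ClientRequestSchedulerBase {
        void addPendingKey(Object key, Object getter) {
            // persistent variant: store a small tuple in db4o, as above
        }
    }

    class ClientRequestSchedulerNonPersistent extends ClientRequestSchedulerBase {
        void addPendingKey(Object key, Object getter) {
            // transient variant: a plain in-RAM map, no database involved
        }
    }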

[13:40] <toad_> we can use transient references to deal with FCPServer 
callbacks etc
[13:42] <toad_> we should pass the ObjectContainer into any method that might 
need it, 1) for activation, and 2) so we don't store it (makes transition to 
real transactions later on easier)
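
For example (a sketch; the class and field are hypothetical, and store() is 
the db4o 7.x name for set()):

    import com.db4o.ObjectContainer;

    // The container arrives as a parameter and is never kept in a field, so
    // moving to real per-transaction containers later only changes call sites.
    class PersistentCallback {
        private String uri; // some persistent state

        void onSuccess(ObjectContainer container) {
            container.activate(this, 1); // 1) activation: ensure fields are loaded
            // ... act on this.uri ...
            container.store(this);       // 2) write back changes (set() on db4o 6.x)
        }
    }
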
[13:43] <toad_> the FEC queue needs to be persistent ... we'll have a running 
list in the database, and a queue. if we submit a job and the fec runner is 
idle, it can be started immediately, but it must still go into the database's 
running list...

This is essential for integrity when we get the last block for a segment: we 
need to commit both the fact that we got the block AND the fact that we need 
to decode the segment, or neither.
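
A sketch of such a job, reusing the hypothetical DBJob interface from above; 
since the SerialExecutor commits once after the job runs, both facts land in 
the same transaction:

    import com.db4o.ObjectContainer;

    class LastBlockJob implements DBJob {
        private final Object block;    // the block we just fetched
        private final Object segment;  // the segment it completes

        LastBlockJob(Object block, Object segment) {
            this.block = block;
            this.segment = segment;
        }

        public void run(ObjectContainer container) {
            container.store(block);                      // fact 1: we got the block
            container.store(new FECDecodeJob(segment));  // fact 2: decode is pending
            // the executor commits after run() returns: both facts or neither
        }
    }

    class FECDecodeJob {
        final Object segment;
        final long addedTime = System.currentTimeMillis(); // for ordered queries
        FECDecodeJob(Object segment) { this.segment = segment; }
    }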

[13:43] <toad_> we will of course have a parallel non-persistent structure...
[13:44] <toad_> we can safely buffer a few slots ahead in RAM with weak 
references, or we can rely on queries...
[13:44] <toad_> they will need to be SODA because they will need to be ordered

Native Queries do not support sorting results, unfortunately.
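
So the ordered lookup would be a SODA query, something like this sketch 
(reusing the hypothetical FECDecodeJob above, with its addedTime field):

    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    class FECQueueQueries {
        static ObjectSet oldestFirst(ObjectContainer container) {
            Query q = container.query();
            q.constrain(FECDecodeJob.class);
            q.descend("addedTime").orderAscending(); // oldest queued job first
            return q.execute();
        }
    }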

[13:44] <toad_> the splitfile code will probably need to be rewritten

It may make sense to have a small block handler object for each block, plus a 
segment handler for each segment. Then, when we get a block, we would find 
which block handler it is for (pendingKeys really should incorporate some 
sort of context object for this), activate that handler, and ask it to deal 
with the block; it would decode the block (probably off-thread, so we could 
split the work into two database-thread jobs, one before the decode and one 
after), then if necessary activate the segment handler and decode the 
segment. A sketch follows.
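
A sketch of the handler objects (all names hypothetical); the point is that 
handling one block activates one small handler, not the whole splitfile:

    import com.db4o.ObjectContainer;

    class BlockHandler {
        byte[] encodedData;
        SegmentHandler segment;

        // Job 1, on the database thread: record the fetched data.
        void onBlockFound(ObjectContainer container, byte[] data) {
            container.activate(this, 1);
            encodedData = data;
            container.store(this);
            // ... decoding runs off-thread, then job 2 is queued ...
        }

        // Job 2, on the database thread: only now touch the segment.
        void onDecoded(ObjectContainer container, byte[] decoded) {
            container.activate(segment, 1);
            segment.blockDecoded(container, decoded);
        }
    }

    class SegmentHandler {
        int blocksRemaining;

        void blockDecoded(ObjectContainer container, byte[] decoded) {
            if (--blocksRemaining == 0) {
                // last block: queue a FEC decode for this segment
            }
            container.store(this);
        }
    }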

Or perhaps we should keep the current design: fetcher -> segment -> subsegment 
(segment:retryCount), with the subsegment being queued. Either way we will 
need significant changes for activation and deactivation: we don't want to 
pull in the whole splitfile and all its keys to handle one block.

[13:46] <toad_> re the cooldown queue, we need to persist it, because we will 
need to actually remove stuff from the main queue when putting it onto the 
cooldown queue
[13:47] <toad_> again we'll have to have parallel structures ... the 
persistent variant can however be much simpler as memory pressure isn't such 
an issue...
[13:47] <toad_> the non-persistent one we can keep as-is, or we can try to 
minimise the code difference between the two...

The cooldown queue at the moment is a complex, tightly memory-optimised 
structure consisting of 3 arrays (it would have been 2 arrays, but I 
discovered that we need to keep the SendableGet and not just the Key, for 
various reasons). It could be made much simpler, given queries and/or 
TreeMaps and one object per entry; the cost is significantly higher memory 
usage. That's not a problem if it's only for non-persistent requests, and 
the persistent one doesn't have to worry too much about footprint: the size 
of the on-disk database still matters, but it's more important not to have 
to bring the entire arrays into RAM than for them to be small overall.
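
A sketch of the simple persistent variant: one small object per entry, found 
by an ordered SODA query (names hypothetical; per the above, we keep the 
SendableGet and not just the Key):

    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;
    import com.db4o.query.Query;

    class CooldownEntry {
        long wakeupTime;  // when the key may be retried
        Object key;       // the Key
        Object getter;    // the SendableGet that owns it

        CooldownEntry(long wakeupTime, Object key, Object getter) {
            this.wakeupTime = wakeupTime;
            this.key = key;
            this.getter = getter;
        }
    }

    class CooldownQueue {
        static ObjectSet dueEntries(ObjectContainer container, long now) {
            Query q = container.query();
            q.constrain(CooldownEntry.class);
            q.descend("wakeupTime").constrain(Long.valueOf(now)).smaller();
            q.descend("wakeupTime").orderAscending(); // soonest wake-up first
            return q.execute();
        }
    }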

[13:48] <toad_> when the user deletes a request we will need to delete its 
structures
[13:49] <toad_> which means we need to be careful to avoid sharing stuff

E.g. FreenetURIs.

Not having garbage collection is a problem, but hopefully we can deal with it.
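
A sketch of what explicit deletion might look like (names hypothetical; note 
that the FreenetURI is deleted too, which is only safe because it isn't 
shared with any other request):

    import com.db4o.ObjectContainer;

    // With no garbage collection in the database, removing a request means
    // walking its own structures and deleting each one explicitly.
    class PersistentRequest {
        Object uri;              // a FreenetURI private to this request
        Object[] blockHandlers;  // per-block state

        void removeFrom(ObjectContainer container) {
            container.activate(this, 1);
            for (Object handler : blockHandlers)
                container.delete(handler);
            container.delete(uri); // safe only because it is unshared
            container.delete(this);
        }
    }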

[13:50] <toad_> we can use activation and deactivation to keep the top levels 
of the queue structure in RAM, or not to, depending on config ... that's not 
something we need to worry about atm...
[13:51] <toad_> ... and if we need to change stuff, there are extensive 
options for schema evolution
[13:51] <toad_> so i think that's about it for the preliminary design
[13:51] <toad_> three days after i started the low level implementation! :)

Another item we've already discussed: We must back up the database every 6 
hours (counting downtime, so as soon as possible after starting up if the 
last backup was more than 6 hours ago), or whenever a major change happens, 
such as adding a new persistent request.
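
db4o can do online backups via ext().backup(); a sketch of the policy (the 
scheduling logic and names are illustrative, only backup() is the real API):

    import com.db4o.ObjectContainer;

    class BackupPolicy {
        static final long INTERVAL = 6 * 60 * 60 * 1000L; // 6 hours in millis
        private long lastBackup; // persisted elsewhere, so downtime counts

        void maybeBackup(ObjectContainer container, boolean majorChange) {
            long now = System.currentTimeMillis();
            if (majorChange || now - lastBackup >= INTERVAL) {
                try {
                    container.ext().backup("node.db4o.bak");
                } catch (Exception e) {
                    // backup() throws on I/O failure; log and retry later
                }
                lastBackup = now;
            }
        }
    }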