Re: [freenet-dev] New database for Freenet: db4o

Matthew Toseland Mon, 19 May 2008 03:44:51 -0700

On Sunday 18 May 2008 05:27, Florent Daignière wrote:
> * Matthew Toseland <[EMAIL PROTECTED]> [2008-05-17 19:00:13]:
> 
> > On Saturday 17 May 2008 00:29, Matthew Toseland wrote:
> > > Ian and I have eventually come to the conclusion that we should include
> > > db4o, and use it for our various persistence needs. I eventually reached
> > > the conclusion that while we can do most of what we need to do with simple
> > > flatfile databases, there are big chunks that will require a real database
> > > of some kind (even if it's only a persistent hash table). db4o has various
> > > advantages:
> > > - Robust in real-world use. See for example this testimonial from a
> > > company who used it on cell phones:
> > > http://www.db4o.com/about/customers/success/mandalait.aspx
> > > BDBJE has not met our expectations in this regard. It seems very sensitive
> > > to unusual situations - in particular, it will spontaneously corrupt and
> > > lose all data on running out of disk space.
> > > - True object database: no SQL, simple and powerful queries, etc.
> > > - Transparent or manual activation of objects from storage.
> > > - 800K jar, so not big enough to be a problem.
> > > - Mature and actively maintained.
> > > - Allows for future expansion (e.g. passive requests will need to store a
> > > fair amount of persistent data).
> > > - Much more flexible than the hand-coded solution I was thinking of. We can
> > > persist the entire queue (not just the splitfiles), if it's useful to do
> > > that.
> > > - Transactions (although this requires some juggling of in-memory objects
> > > on rollback).
> > > 
> > > Tasks:
> > > - Add db4o to freenet-ext.jar.
> > > - Think about using it for the datastore. We don't want to have two
> > > databases! Sdiz's new datastore may be the One True Store, or it may not
> > > be. If it's not, we don't want to keep BDBJE: we could build a db4o-based
> > > store, with or without LRU replacement. It would have the advantage of
> > > filling up more quickly than sdiz's store. It should require reconstructing
> > > less frequently than BDBJE!
> > > - Migrate the client layer, including splitfiles, pendingKeys, and so on,
> > > to be persisted via db4o. Of course there will be latency here when objects
> > > are not cached, so we will need to cache a few request choices in advance
> > > for each RequestStarter. And we will need to devise some way to deal with
> > > requests that don't want to be persisted - presumably we'd keep them in
> > > RAM.
> > > 
> > It turns out that db4o does indeed unrecoverably self-corrupt when it runs
> > out of disk space. (Thanks nextgens for getting me to test this!)
> > 
> > http://amphibian.dyndns.org/bdb4o-test.log
> 
> muhahahaha.
> 
> Last time I checked the bdb database was recoverable... Okay
> it lost some^wmost of the data in the process but at least it did
> attempt to recover!


It attempts to recover (iff we try to use the DbDump/DbLoad tools). It does 
not succeed. Because we have secondary indexes, it ends up dropping almost 
everything.
> 
> > We will therefore have to keep a fallback. IMHO for the client layer the
> > fallback should be downloads.dat.gz. We are careful not to lose that when we
> > run out of disk space, and it should only contain what is needed to restart
> > requests from the beginning (in practice a lot will come from the store).
> ...
> 
> While we are at it, what's wrong with bdb-je's persistence framework
> again ?
> http://www.oracle.com/database/berkeley-db/je/index.html

The fact that it belongs to BDBJE? I dunno, it is possible that *every* report 
of corruption of BDBJE is because of hardware issues... but we've certainly 
had lots of them, and not only on out of disk space either...

Wouldn't a native object database be better?
> 
> > I apologise if the above was presented as a fait accompli; any input on
> > databases would be appreciated. On Friday, Ian and I spent a long time
> > debating the issue, first and foremost whether we should even have a
> > database; I was initially in favour of not having one at all, or using
> > jdbm's persistent hashtable class (HTree).
> > 
> > Personally I think if we have a database it should be a native object
> > database i.e. either Perst or db4o. It also should be robust, low overhead,
> > mature, open source etc. I will start implementing the new client layer with
> > db4o soon, unless convinced to use something else in the meantime. But it
> > seems that with BDBJE (which isn't a native object database), you can lose
> > the database even by an unclean shutdown... can anyone confirm this from
> > experience? Or is it only out of disk space and memory corruption that
> > causes this?
> 
> I'm still not convinced that we need a database... as our requirements
> are completely different from their typical use-cases... but well, your
> immediate concern is to store persistent requests to disk, right? What
> about using Hibernate or javax.persistence (from EE) to do that ?
> 
Hibernate needs SQL and is really heavy.

I might be willing to go with ripping the code for a persistent hashtable from 
jdbm and rolling our own for the rest, but Ian isn't.

Our requirements, at the moment, are:
- The client layer queue. With an object database, we can persist all of this 
(assuming we make backups etc); without one, it's easier to keep 
downloads.dat.gz for the top level, and move the bottom level request-grabber 
structures to files (for splitfiles only).
- pendingKeys. This is a map of key to SendableGet or SendableGet[]. In the 
above case, a SplitfileFetcher would be a SendableGet. Anyway, this is a 
hashtable: it maps from a key (a fixed-length object) to a potentially large 
number of SendableGets. One option is to keep it in RAM. We would still save 
some memory. However, it would be better to keep it on disk / in a database. 
Thus we need a persistent hashtable of some kind, at least in the long run. 
We could grab one from jdbm, but then we'd have to maintain jdbm. Or we could 
use a real object database and save *everything* to it. Implementing an 
on-disk hashtable ourselves is another option, but it would require chaining 
and therefore garbage collection... Quadratic probing, for example, probably 
wouldn't work well for us, since the table needs to be reliable and to need 
few seeks.
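
To make the pendingKeys shape concrete, here is a rough sketch of the "keep it 
in RAM" option only. It is not the actual Freenet code: PendingKeysSketch, 
Request (a stand-in for SendableGet) and KeyBytes are made-up names for 
illustration, and nothing here touches db4o, jdbm or the disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-RAM sketch of pendingKeys: fixed-length key -> many waiting requests. */
class PendingKeysSketch {

    /** Stand-in for SendableGet; it only carries a label for illustration. */
    static final class Request {
        final String name;
        Request(String name) { this.name = name; }
    }

    /** Wraps a fixed-length byte[] so it compares by content when used as a map key. */
    private static final class KeyBytes {
        private final byte[] bytes;
        KeyBytes(byte[] bytes) { this.bytes = bytes.clone(); }
        @Override public boolean equals(Object o) {
            return o instanceof KeyBytes && Arrays.equals(bytes, ((KeyBytes) o).bytes);
        }
        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    // One entry per key; the list plays the role of "SendableGet or SendableGet[]".
    private final Map<KeyBytes, List<Request>> pending = new HashMap<>();

    /** Registers a request as waiting for the given key. */
    synchronized void add(byte[] key, Request r) {
        pending.computeIfAbsent(new KeyBytes(key), k -> new ArrayList<>()).add(r);
    }

    /** Removes and returns every request waiting on the key (empty list if none). */
    synchronized List<Request> removeAll(byte[] key) {
        List<Request> waiting = pending.remove(new KeyBytes(key));
        return waiting == null ? Collections.<Request>emptyList() : waiting;
    }

    /** Number of distinct keys that currently have at least one waiter. */
    synchronized int size() { return pending.size(); }
}
```

A persistent version would need the same two operations (add a waiter for a 
key, fetch-and-remove all waiters for a key); the open question above is only 
where the table lives.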


