Re: [freenet-dev] New database for Freenet: db4o

2008-05-19 Thread Matthew Toseland
On Sunday 18 May 2008 05:27, Florent Daignière wrote:
 * Matthew Toseland [EMAIL PROTECTED] [2008-05-17 19:00:13]:
 
  On Saturday 17 May 2008 00:29, Matthew Toseland wrote:
   Ian and I have eventually come to the conclusion that we should include
   db4o, and use it for our various persistence needs. I eventually reached
   the conclusion that while we can do most of what we need to do with simple
   flatfile databases, there are big chunks that will require a real database
   of some kind (even if it's only a persistent hash table). db4o has various
   advantages:
   - Robust in real-world use. See for example this testimonial from a
   company that used it on cell phones:
   http://www.db4o.com/about/customers/success/mandalait.aspx
   BDBJE has not met our expectations in this regard. It seems very sensitive
   to unusual situations - in particular, it will spontaneously corrupt and
   lose all data on running out of disk space.
   - True object database: no SQL, simple and powerful queries, etc.
   - Transparent or manual activation of objects from storage.
   - 800K jar, so not big enough to be a problem.
   - Mature and actively maintained.
   - Allows for future expansion (e.g. passive requests will need to store a
   fair amount of persistent data).
   - Much more flexible than the hand-coded solution I was thinking of. We
   can persist the entire queue (not just the splitfiles), if it's useful to
   do that.
   - Transactions (although this requires some juggling of in-memory objects
   on rollback).
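The transactions caveat can be made concrete. A rollback reverts what is in the store, but a Java object that was changed in memory during the failed transaction keeps its new field values unless the caller restores them. A minimal sketch under that assumption, stdlib only: `ToyStore` is a stand-in and not the db4o API, and `SplitfileProgress` is an invented class.

```java
import java.util.HashMap;
import java.util.Map;

class RollbackDemo {
    // Hypothetical stand-in for a transactional store (NOT db4o):
    // writes go to `pending` and only reach `committed` on commit().
    static final class ToyStore {
        private final Map<String, Integer> committed = new HashMap<>();
        private final Map<String, Integer> pending = new HashMap<>();

        void store(String key, int value) { pending.put(key, value); }
        void commit() { committed.putAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }   // reverts the store, nothing else
        Integer load(String key) { return committed.get(key); }
    }

    // Invented in-memory object whose state is mirrored in the store.
    static final class SplitfileProgress {
        final String key;
        int blocksFetched;
        SplitfileProgress(String key, int blocksFetched) {
            this.key = key;
            this.blocksFetched = blocksFetched;
        }
    }

    // Updates the object and the store in one "transaction". On failure the
    // store is rolled back AND the in-memory field is restored by hand.
    static boolean recordBlocks(ToyStore db, SplitfileProgress p, int extra, boolean failCommit) {
        int saved = p.blocksFetched;           // snapshot of the live object
        try {
            p.blocksFetched += extra;
            db.store(p.key, p.blocksFetched);
            if (failCommit) throw new IllegalStateException("simulated commit failure");
            db.commit();
            return true;
        } catch (IllegalStateException e) {
            db.rollback();
            p.blocksFetched = saved;           // the "juggling" the store cannot do for us
            return false;
        }
    }

    public static void main(String[] args) {
        ToyStore db = new ToyStore();
        SplitfileProgress p = new SplitfileProgress("example-key", 0);
        recordBlocks(db, p, 5, false);
        recordBlocks(db, p, 3, true);
        System.out.println(p.blocksFetched + " " + db.load(p.key)); // 5 5
    }
}
```

Without the `p.blocksFetched = saved` line, the object would report 8 blocks while the store still says 5.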
   
   Tasks:
   - Add db4o to freenet-ext.jar.
   - Think about using it for the datastore. We don't want to have two
   databases! Sdiz's new datastore may be the One True Store, or it may not
   be. If it's not, we don't want to keep BDBJE: we could build a db4o-based
   store, with or without LRU replacement. It would have the advantage of
   filling up more quickly than sdiz's store. It should require
   reconstructing less frequently than BDBJE!
   - Migrate the client layer, including splitfiles, pendingKeys, and so on,
   to be persisted via db4o. Of course there will be latency here when
   objects are not cached, so we will need to cache a few request choices in
   advance for each RequestStarter. And we will need to devise some way to
   deal with requests that don't want to be persisted - presumably we'd keep
   them in RAM.
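For the "with or without LRU replacement" option, the policy itself is small. As an in-memory illustration only (a real store would keep the blocks on disk and need its own index), `LinkedHashMap` in access order can evict the least recently used entry once a capacity is reached. `LruStore` and its capacity are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of LRU replacement for a datastore.
// Only the eviction policy is shown; persistence is out of scope.
class LruStore<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruStore(int maxEntries) {
        super(16, 0.75f, true);              // accessOrder = true: get() refreshes recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;          // drop the least recently used entry when over capacity
    }

    public static void main(String[] args) {
        LruStore<String, byte[]> store = new LruStore<>(2);
        store.put("a", new byte[32]);
        store.put("b", new byte[32]);
        store.get("a");                      // "a" becomes most recently used
        store.put("c", new byte[32]);        // over capacity: evicts "b"
        System.out.println(store.keySet());  // [a, c]
    }
}
```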
   
  It turns out that db4o does indeed unrecoverably self-corrupt when it runs
  out of disk space. (Thanks nextgens for getting me to test this!)
  
  http://amphibian.dyndns.org/bdb4o-test.log
 
 muhahahaha.
 
 Last time I checked the bdb database was recoverable... Okay
 it lost some^wmost of the data in the process but at least it did
 attempt to recover!

It attempts to recover (only if we use the DbDump/DbLoad tools). It does not
succeed: because we have secondary indexes, it ends up dropping almost
everything.
 
  We will therefore have to keep a fallback. IMHO for the client layer the
  fallback should be downloads.dat.gz. We are careful not to lose that when
  we run out of disk space, and it should only contain what is needed to
  restart requests from the beginning (in practice a lot will come from the
  store).
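One way to be "careful not to lose that when we run out of disk space" is never to overwrite the old file in place: write the new list to a temporary file, force it to disk, and only then rename it over the old one. If the disk fills up mid-write, the exception is thrown before the rename, so the previous downloads.dat.gz survives. This is a sketch assuming, purely for illustration, that the file is a gzipped list of request URIs, one per line. The real format is not given in this thread, and `FallbackFile` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of a disk-full-safe rewrite of a gzipped fallback file.
// Assumed (hypothetical) format: one request URI per line.
class FallbackFile {
    static void write(Path target, List<String> uris) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp.toFile())) {
            // gz and w are not closed separately: closing them would close fos before sync().
            GZIPOutputStream gz = new GZIPOutputStream(fos);
            Writer w = new OutputStreamWriter(gz, StandardCharsets.UTF_8);
            for (String uri : uris) w.write(uri + "\n");
            w.flush();
            gz.finish();                 // write the gzip trailer without closing fos
            fos.getFD().sync();          // make sure the bytes are on disk before renaming
        } catch (IOException e) {        // e.g. "No space left on device"
            try { Files.deleteIfExists(tmp); } catch (IOException s) { e.addSuppressed(s); }
            throw e;                     // the old target file was never touched
        }
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    static List<String> read(Path file) throws IOException {
        List<String> uris = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (!line.isEmpty()) uris.add(line);
            }
        }
        return uris;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fallback-demo");
        Path file = dir.resolve("downloads.dat.gz");
        write(file, List.of("KSK@example-one", "KSK@example-two"));
        write(file, List.of("KSK@example-two"));       // rewrite replaces the old copy
        System.out.println(read(file));                // [KSK@example-two]
    }
}
```

The rename is atomic on typical POSIX filesystems; where the filesystem refuses an atomic move, the sketch falls back to a plain replace, which loses that guarantee.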
 ...
 
 While we are at it, what's wrong with bdb-je's persistence framework
 again?
 http://www.oracle.com/database/berkeley-db/je/index.html

The fact that it belongs to BDBJE? I dunno, it is possible that *every* report
of corruption of BDBJE is because of hardware issues... but we've certainly
had lots of them, and not only when running out of disk space either...

Wouldn't a native object database be better?
 
  I apologise if the above was presented as a fait accompli; any input on
  databases would be appreciated. On Friday, Ian and I spent a long time
  debating the issue, first and foremost whether we should even have a
  database; I was initially in favour of not having one at all, or using
  jdbm's persistent hashtable class (HTree).
  
  Personally I think if we have a database it should be a native object
  database, i.e. either Perst or db4o. It also should be robust, low
  overhead, mature, open source etc. I will start implementing the new client
  layer with db4o soon, unless convinced to use something else in the
  meantime. But it seems that with BDBJE (which isn't a native object
  database), you can lose the database even through an unclean shutdown...
  can anyone confirm this from experience? Or is it only running out of disk
  space and memory corruption that cause this?
 
 I'm still not convinced that we need a database... as our requirements
 are completely different from their typical use-cases... but well, your
 immediate concern is to store persistent requests to disk, right? What
 about using Hibernate or javax.persistence (from Java EE) to do that?
 
Hibernate needs SQL and is really heavy.

I might be willing to go with ripping the code for a persistent 

Re: [freenet-dev] New database for Freenet: db4o

2008-05-19 Thread Matthew Toseland
On Sunday 18 May 2008 19:44, Ian Clarke wrote:
 I've got to say, I really hope Perst employ better software engineers
 than their web designers, because their website is awful.  It somewhat
 shakes my confidence in them.  I know this seems like a very
 superficial judgement, but if they put little care into the public
 face of their software, it is reasonable to suspect that they may not
 put too much care into the software itself.

You said pretty much the same about ours!
 
 Looking at the manual, it looks like Perst operates at a lower level
 than db4o - you need to manually create and maintain indexes.  This is
 closer to the Java collections API, which could be good because it's
 more familiar, but it leaves more opportunities for the programmer to
 screw up, which is bad.

IMHO we will get better performance and fewer screwups from a more familiar 
API.
 
 My preference for db4o is based mainly on my familiarity with it, the
 fact that it is more widely used, and the fact that it superficially
 appears to have a more credible company behind it.
 
 None of these is a solid justification for choosing one option over
 the other, but they all suggest that db4o is the way to go.

We should, as bback suggests, test Perst and see what happens when it runs out
of disk space. I will look into it soon.
 
 Ian.
 
 On Sun, May 18, 2008 at 11:08 AM,  [EMAIL PROTECTED] wrote:
  I know I repeat myself, but if you consider db4o you should also
  consider Perst as an option.
  Toad, can you do your 'disk full test' with Perst to compare it against db4o?
  
  Perst and db4o seem to provide the same things. But according to
  http://www.garret.ru/~knizhnik/perstbench.html
  Perst is much faster.
  
  No, I'm not being paid to advertise Perst, but I have had good
  experiences with it.
 

Re: [freenet-dev] New database for Freenet: db4o

2008-05-19 Thread Matthew Toseland
On Monday 19 May 2008 11:34, Matthew Toseland wrote:
 On Sunday 18 May 2008 19:44, Ian Clarke wrote:
  
  Looking at the manual, it looks like Perst operates at a lower level
  than db4o - you need to manually create and maintain indexes.  This is
  closer to the Java collections API, which could be good because it's
  more familiar, but it leaves more opportunities for the programmer to
  screw up, which is bad.
 
 IMHO we will get better performance and fewer screwups from a more familiar 
 API.
  
  My preference for db4o is based mainly on my familiarity with it, the
  fact that it is more widely used, and the fact that it superficially
  appears to have a more credible company behind it.
  
  None of these is a solid justification for choosing one option over
  the other, but they all suggest that db4o is the way to go.
 
 We should, as bback suggests, test Perst and see what happens when it runs
 out of disk space. I will look into it soon.

Completed testing. Perst fails gracefully when it runs out of disk space in
between writes. The database is still readable after the failed commit, and
if I delete the big file occupying all the disk space, we can even write to
it again.

How about we go with Perst?
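The property tested here is that a failed commit leaves the last committed state readable, and that writes work again once space is freed. A quick way to exercise that in a unit test, without actually filling a disk with a big file, is to inject an output stream that fails once a byte quota is used up. The harness below is hypothetical and stdlib-only: `QuotaOutputStream`, `ShadowStore` and the quota numbers are invented. The shadow-copy commit it uses is just one technique with this property, not a claim about how Perst works internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical harness: simulates a full disk and checks that a failed commit
// leaves the previously committed state readable.
class DiskFullTest {
    // Passes writes through until a byte quota is used up, then fails like a full disk.
    static final class QuotaOutputStream extends FilterOutputStream {
        private long remaining;

        QuotaOutputStream(OutputStream out, long quotaBytes) {
            super(out);
            this.remaining = quotaBytes;
        }

        @Override
        public void write(int b) throws IOException {
            if (remaining <= 0) throw new IOException("No space left on device (simulated)");
            remaining--;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (len > remaining) {
                out.write(b, off, (int) remaining);    // partial write, then failure
                remaining = 0;
                throw new IOException("No space left on device (simulated)");
            }
            remaining -= len;
            out.write(b, off, len);
        }
    }

    // Toy store using a shadow copy: the new image only replaces the old one
    // after it has been written completely.
    static final class ShadowStore {
        private byte[] committed = new byte[0];
        private long freeBytes;

        ShadowStore(long freeBytes) { this.freeBytes = freeBytes; }
        void setFreeBytes(long freeBytes) { this.freeBytes = freeBytes; }

        boolean commit(String image) {
            ByteArrayOutputStream shadow = new ByteArrayOutputStream();
            try (OutputStream out = new QuotaOutputStream(shadow, freeBytes)) {
                out.write(image.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                return false;                          // old image untouched
            }
            committed = shadow.toByteArray();          // switch over only on success
            return true;
        }

        String read() { return new String(committed, StandardCharsets.UTF_8); }
    }

    public static void main(String[] args) {
        ShadowStore db = new ShadowStore(1024);
        db.commit("queue: A");
        db.setFreeBytes(4);                            // "fill the disk"
        System.out.println(db.commit("queue: A, B") + " " + db.read()); // false queue: A
        db.setFreeBytes(1024);                         // "delete the big file"
        System.out.println(db.commit("queue: A, B") + " " + db.read()); // true queue: A, B
    }
}
```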
  
  Ian.
  

Re: [freenet-dev] New database for Freenet: db4o

2008-05-18 Thread Daniel Cheng
On Sun, May 18, 2008 at 12:27 PM, Florent Daignière
[EMAIL PROTECTED] wrote:
 * Matthew Toseland [EMAIL PROTECTED] [2008-05-17 19:00:13]:

 It turns out that db4o does indeed unrecoverably self-corrupt when it runs
 out of disk space. (Thanks nextgens for getting me to test this!)

 http://amphibian.dyndns.org/bdb4o-test.log


 muhahahaha.

 Last time I checked the bdb database was recoverable... Okay
 it lost some^wmost of the data in the process but at least it did
 attempt to recover!

Both BDB (the original C version) and BDB-JE have a very bad track record at
recovery. Why should we lose some data when there are better alternatives
that don't?

 We will therefore have to keep a fallback. IMHO for the client layer the
 fallback should be downloads.dat.gz. We are careful not to lose that when we
 run out of disk space, and it should only contain what is needed to restart
 requests from the beginning (in practice a lot will come from the store).

 ...

 While we are at it, what's wrong with bdb-je's persistence framework
 again?
 http://www.oracle.com/database/berkeley-db/je/index.html


Another reason to get rid of BDB-JE is memory usage and performance.

db4o is usable on J2ME and scales to terabyte-sized databases. I think
this says a lot.

 I apologise if the above was presented as a fait accompli; any input on
 databases would be appreciated. On Friday, Ian and I spent a long time
 debating the issue, first and foremost whether we should even have a
 database; I was initially in favour of not having one at all, or using
 jdbm's persistent hashtable class (HTree).

 Personally I think if we have a database it should be a native object
 database, i.e. either Perst or db4o. It also should be robust, low overhead,
 mature, open source etc. I will start implementing the new client layer with
 db4o soon, unless convinced to use something else in the meantime. But it
 seems that with BDBJE (which isn't a native object database), you can lose
 the database even through an unclean shutdown... can anyone confirm this
 from experience? Or is it only running out of disk space and memory
 corruption that cause this?

 I'm still not convinced that we need a database... as our requirements
 are completely different from their typical use-cases... but well, your
 immediate concern is to store persistent requests to disk, right? What
 about using Hibernate or javax.persistence (from Java EE) to do that?

eee
Hibernate is just an ORM -- you need an SQL backend for it.
(I am not opposed to the idea of using an SQL backend, but then we would
have to decide which one to use.)

javax.persistence has a Java 5 dependency, and you need a J2EE
container. Just too ugly.

--
___
Devl mailing list

Re: [freenet-dev] New database for Freenet: db4o

2008-05-18 Thread bbackde
I know I repeat myself, but if you consider db4o you should also
consider Perst as an option.
Toad, can you do your 'disk full test' with Perst to compare it against db4o?

Perst and db4o seem to provide the same things. But according to
http://www.garret.ru/~knizhnik/perstbench.html
Perst is much faster.

No, I'm not being paid to advertise Perst, but I have had good
experiences with it.


Re: [freenet-dev] New database for Freenet: db4o

2008-05-18 Thread Ian Clarke
I've got to say, I really hope Perst employ better software engineers
than their web designers, because their website is awful.  It somewhat
shakes my confidence in them.  I know this seems like a very
superficial judgement, but if they put little care into the public
face of their software, it is reasonable to suspect that they may not
put too much care into the software itself.

Looking at the manual, it looks like Perst operates at a lower level
than db4o - you need to manually create and maintain indexes.  This is
closer to the Java collections API, which could be good because it's
more familiar, but it leaves more opportunities for the programmer to
screw up, which is bad.
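The point about manually created and maintained indexes can be made concrete. With a collections-style API, every write to the primary map has to be mirrored in each secondary index by hand, and a forgotten update silently leaves a stale entry. This is a hypothetical sketch that uses plain `HashMap`/`TreeMap` as stand-ins; it is neither Perst's nor db4o's API. `Request`, `byId`, `byPriority`, and the convention that a lower number means more urgent are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a primary map plus a hand-maintained secondary index.
class ManualIndexDemo {
    static final class Request {
        final String id;
        final int priority;                      // assumed: lower = more urgent
        Request(String id, int priority) { this.id = id; this.priority = priority; }
    }

    // Primary storage: id -> request.
    final Map<String, Request> byId = new HashMap<>();
    // Secondary index: priority -> ids. Must be kept in sync by hand.
    final TreeMap<Integer, List<String>> byPriority = new TreeMap<>();

    void put(Request r) {
        Request old = byId.put(r.id, r);
        if (old != null) unindex(old);           // forgetting this leaves a stale entry
        byPriority.computeIfAbsent(r.priority, p -> new ArrayList<>()).add(r.id);
    }

    void remove(String id) {
        Request old = byId.remove(id);
        if (old != null) unindex(old);           // same obligation on delete
    }

    private void unindex(Request r) {
        List<String> ids = byPriority.get(r.priority);
        ids.remove(r.id);
        if (ids.isEmpty()) byPriority.remove(r.priority);
    }

    // Ids at the most urgent priority level, or an empty list.
    List<String> mostUrgent() {
        Map.Entry<Integer, List<String>> e = byPriority.firstEntry();
        return e == null ? List.of() : List.copyOf(e.getValue());
    }

    public static void main(String[] args) {
        ManualIndexDemo q = new ManualIndexDemo();
        q.put(new Request("r1", 2));
        q.put(new Request("r2", 1));
        q.put(new Request("r2", 3));             // re-prioritise: old index entry must go
        System.out.println(q.mostUrgent());      // [r1]
    }
}
```

A database that maintains indexes itself removes the `unindex` bookkeeping, which is the trade-off being weighed here.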

My preference for db4o is based mainly on my familiarity with it, the
fact that it is more widely used, and the fact that it superficially
appears to have a more credible company behind it.

None of these is a solid justification for choosing one option over
the other, but they all suggest that db4o is the way to go.

Ian.


Re: [freenet-dev] New database for Freenet: db4o

2008-05-17 Thread Matthew Toseland
On Saturday 17 May 2008 06:24, Ian Clarke wrote:
 On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng
 [EMAIL PROTECTED] wrote:
  On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland
  [EMAIL PROTECTED] wrote:
  Ian and I have eventually come to the conclusion that we should include
  db4o,
 
 Yay.
 
  Of course there will be latency here when objects are
  not cached, so we will need to cache a few request choices in advance for
  each RequestStarter. And we will need to devise some way to deal with
  requests that don't want to be persisted - presumably we'd keep them in
  RAM.
 
  Please don't.
  According to what I have read, db4o should be good enough to use directly:
 
 I agree with Daniel: DIY may be an admirable quality when it comes to
 home and car repair, but not with software.
 
 One of the benefits of using third-party stuff like db4o is that we
 can outsource problems to people far more focussed on the solutions to
 those problems than we can afford to be.
 
 If we spot a problem, we should fix it, but let's not fall into the
 premature optimization trap.

Which part of my proposal are you both criticising?

If it's the we should cache a few request choices in advance bit then I 
stand by that. Any database query (assuming the working set is large) will 
involve many dependant disk accesses. We have to send a request roughly every 
800ms on my node (for SSKs). With slow commodity drives, often with other 
disk load going on, and with fairly complex database accesses involving 
multiple tables and therefore *many* mostly dependant seeks, we are not going 
to reliably meet that deadline no matter how good the database is. I will add 
a statistic for this, but my system is massively overpowered for this sort of 
effect, so we will need to test it on volunteers' slow systems.
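
Matthew's concern is latency: if choosing the next request means a query with many dependent seeks, it can miss the roughly 800 ms send interval. Below is a minimal sketch of "cache a few request choices in advance", assuming a background thread and a bounded in-memory queue per request starter. The class name, the slowLoader supplier and the 50 ms back-off are hypothetical, not Freenet's actual RequestStarter code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch, not Freenet's RequestStarter API: a background thread
// keeps a small buffer of request choices filled from a slow store, so the
// code that must send a request every ~800 ms rarely waits on disk seeks.
class PrefetchingRequestSelector<T> {
    private final BlockingQueue<T> buffer;
    private final Supplier<T> slowLoader; // stand-in for a database query; returns null when nothing is queued
    private final Thread filler;

    PrefetchingRequestSelector(int capacity, Supplier<T> slowLoader) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.slowLoader = slowLoader;
        this.filler = new Thread(this::fillLoop, "request-prefetcher");
        this.filler.setDaemon(true); // do not keep the JVM alive just for prefetching
    }

    void start() { filler.start(); }

    void stop() { filler.interrupt(); }

    private void fillLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                T next = slowLoader.get(); // may involve many dependent seeks
                if (next == null) {
                    Thread.sleep(50);      // nothing to load right now; back off briefly
                    continue;
                }
                buffer.put(next);          // blocks while the buffer is already full
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop() was called
        }
    }

    // Returns a prefetched choice, or null if none is ready before the timeout.
    T next(long timeout, TimeUnit unit) throws InterruptedException {
        return buffer.poll(timeout, unit);
    }
}
```

The design choice is that a slow query only delays refilling the buffer; the sender itself waits at most for the timeout it passes to next().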

If it's the "requests that don't want to be persisted" bit, what would you do with
fproxy requests and other non-persistent requests? Store them to disk anyway?


___
Devl mailing list
Devl@freenetproject.org
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

Re: [freenet-dev] New database for Freenet: db4o

2008-05-17 Thread Daniel Cheng
On Sat, May 17, 2008 at 7:06 PM, Matthew Toseland
[EMAIL PROTECTED] wrote:
 On Saturday 17 May 2008 06:24, Ian Clarke wrote:
 On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng
 [EMAIL PROTECTED] wrote:
  On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland
  [EMAIL PROTECTED] wrote:
  Ian and I have eventually come to the conclusion that we should include db4o,

 Yay.

  Of course there will be latency here when objects are
  not cached, so we will need to cache a few request choices in advance for
  each RequestStarter. And we will need to devise some way to deal with
  requests that don't want to be persisted - presumably we'd keep them in RAM.
 
  Please don't.
  According to what I have read, db4o should be good enough to use directly:

 I agree with Daniel, DIY may be an admirable quality when it comes to
 home and car repair, but not with software.

 One of the benefits of using third-party stuff like db4o is that we
 can outsource problems to people far more focussed on the solutions to
 those problems than we can afford to be.

 If we spot a problem, we should fix it, but let's not fall into the
 premature optimization trap.

 Which part of my proposal are you both criticising?

 If it's the "we should cache a few request choices in advance" bit, then I
 stand by that. Any database query (assuming the working set is large) will
 involve many dependent disk accesses.

Did you read my original post? There were three features in db4o to
offload that a bit: CachedIoAdapter, prefetching and weak references.
See my previous post for details on that.

 We have to send a request roughly every
 800ms on my node (for SSKs). With slow commodity drives, often with other
 disk load going on, and with fairly complex database accesses involving
 multiple tables and therefore *many* mostly dependent seeks, we are not going
 to reliably meet that deadline no matter how good the database is. I will add
 a statistic for this, but my system is massively overpowered for this sort of
 effect, so we will need to test it on volunteers' slow systems.

 If it's the "requests that don't want to be persisted" bit, what would you do with
 fproxy requests and other non-persistent requests? Store them to disk anyway?


Re: [freenet-dev] New database for Freenet: db4o

2008-05-17 Thread Matthew Toseland
On Saturday 17 May 2008 14:07, Daniel Cheng wrote:
 On Sat, May 17, 2008 at 7:06 PM, Matthew Toseland
 [EMAIL PROTECTED] wrote:
  On Saturday 17 May 2008 06:24, Ian Clarke wrote:
  On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng
  [EMAIL PROTECTED] wrote:
   On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland
   [EMAIL PROTECTED] wrote:
   Ian and I have eventually come to the conclusion that we should include db4o,
 
  Yay.
 
   Of course there will be latency here when objects are
   not cached, so we will need to cache a few request choices in advance for
   each RequestStarter. And we will need to devise some way to deal with
   requests that don't want to be persisted - presumably we'd keep them in RAM.
  
   Please don't.
   According to what I have read, db4o should be good enough to use directly:
 
  I agree with Daniel, DIY may be an admirable quality when it comes to
  home and car repair, but not with software.
 
  One of the benefits of using third-party stuff like db4o is that we
  can outsource problems to people far more focussed on the solutions to
  those problems than we can afford to be.
 
  If we spot a problem, we should fix it, but let's not fall into the
  premature optimization trap.
 
  Which part of my proposal are you both criticising?
 
  If it's the "we should cache a few request choices in advance" bit, then I
  stand by that. Any database query (assuming the working set is large) will
  involve many dependent disk accesses.
 
 Did you read my original post? There were three features in db4o to
 offload that a bit: CachedIoAdapter, prefetching and weak references.
 See my previous post for details on that.

Strictly speaking this is prefetch we're talking about, not caching.
 
  We have to send a request roughly every
  800ms on my node (for SSKs). With slow commodity drives, often with other
  disk load going on, and with fairly complex database accesses involving
  multiple tables and therefore *many* mostly dependent seeks, we are not going
  to reliably meet that deadline no matter how good the database is. I will add
  a statistic for this, but my system is massively overpowered for this sort of
  effect, so we will need to test it on volunteers' slow systems.
 
  If it's the "requests that don't want to be persisted" bit, what would you do with
  fproxy requests and other non-persistent requests? Store them to disk anyway?



Re: [freenet-dev] New database for Freenet: db4o

2008-05-17 Thread Matthew Toseland
On Saturday 17 May 2008 00:29, Matthew Toseland wrote:
 Ian and I have eventually come to the conclusion that we should include db4o,
 and use it for our various persistence needs. I eventually reached the 
 conclusion that while we can do most of what we need to do with simple 
 flatfile databases, there are big chunks that will require a real database of
 some kind (even if it's only a persistent hash table). db4o has various 
 advantages:
 - Robust in real-world use. See for example this testimonial from a company 
 who used it on cell phones:
 http://www.db4o.com/about/customers/success/mandalait.aspx
 BDBJE has not met our expectations in this regard. It seems very sensitive to
 unusual situations - in particular, it will spontaneously corrupt and lose 
 all data on running out of disk space.
 - True object database: no SQL, simple and powerful queries, etc.
 - Transparent or manual activation of objects from storage.
 - 800K jar, so not big enough to be a problem.
 - Mature and actively maintained.
 - Allows for future expansion (e.g. passive requests will need to store a fair
 amount of persistent data).
 - Much more flexible than the hand-coded solution I was thinking of. We can 
 persist the entire queue (not just the splitfiles), if it's useful to do
 that.
 - Transactions (although this requires some juggling of in-memory objects on 
 rollback).
 
 Tasks:
 - Add db4o to freenet-ext.jar.
 - Think about using it for the datastore. We don't want to have two databases!
 Sdiz's new datastore may be the One True Store, or it may not be. If it's 
 not, we don't want to keep BDBJE: we could build a db4o-based store, with or 
 without LRU replacement. It would have the advantage of filling up more 
 quickly than sdiz's store. It should require reconstructing less frequently 
 than BDBJE!
 - Migrate the client layer, including splitfiles, pendingKeys, and so on, to 
 be persisted via db4o. Of course there will be latency here when objects are 
 not cached, so we will need to cache a few request choices in advance for 
 each RequestStarter. And we will need to devise some way to deal with 
 requests that don't want to be persisted - presumably we'd keep them in RAM.
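
The "with or without LRU replacement" option means that, once the store is full, the least recently used entry is the one evicted. As a hedged, in-memory illustration of that policy only (a real db4o-based store would persist its entries), Java's LinkedHashMap in access order implements it through removeEldestEntry; LruStore is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of LRU replacement for a fixed-size store.
// Only the eviction policy is shown; nothing here is written to disk.
class LruStore<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruStore(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: get() moves an entry to the most-recent end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry once over capacity
    }
}
```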
 
It turns out that db4o does indeed unrecoverably self-corrupt when it runs out 
of disk space. (Thanks nextgens for getting me to test this!)

http://amphibian.dyndns.org/bdb4o-test.log

We will therefore have to keep a fallback. IMHO for the client layer the 
fallback should be downloads.dat.gz. We are careful not to lose that when we 
run out of disk space, and it should only contain what is needed to restart 
requests from the beginning (in practice a lot will come from the store).
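
The thread does not say how downloads.dat.gz is kept safe when the disk fills up. One common technique, sketched here under that assumption and not taken from Freenet's code, is to write the new copy to a temporary file in the same directory, sync it, and only then rename it over the old file, so a failed write (for example on a full disk) leaves the previous copy intact. SafeGzipWriter is a hypothetical name.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: replace a gzipped state file only after the new copy
// has been completely written and synced, so the old copy survives failures.
class SafeGzipWriter {
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileOutputStream fos = new FileOutputStream(tmp.toFile());
                 GZIPOutputStream gz = new GZIPOutputStream(fos)) {
                gz.write(data);
                gz.finish();         // write the gzip trailer before syncing
                fos.getFD().sync();  // force the bytes to disk before the rename
            }
            // The old file is replaced only once the new one is complete.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // drop the partial file; the previous target is untouched
            throw e;
        }
    }
}
```

ATOMIC_MOVE can throw AtomicMoveNotSupportedException on file systems that lack atomic rename; keeping the temporary file in the target's own directory avoids the usual cross-file-system case.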

I apologise if the above was presented as a fait accompli; any input on
databases would be appreciated. On Friday, Ian and I spent a long time
debating the issue, first and foremost whether we should even have a
database; I was initially in favour of not having one at all, or of using
jdbm's persistent hashtable class (HTree).

Personally I think if we have a database it should be a native object
database, i.e. either Perst or db4o. It should also be robust, low overhead,
mature, open source etc. I will start implementing the new client layer with
db4o soon, unless convinced to use something else in the meantime. But it
seems that with BDBJE (which isn't a native object database), you can lose
the database even through an unclean shutdown... can anyone confirm this from
experience? Or is it only running out of disk space, or memory corruption,
that causes this?



Re: [freenet-dev] New database for Freenet: db4o

2008-05-17 Thread Florent Daignière
* Matthew Toseland [EMAIL PROTECTED] [2008-05-17 19:00:13]:

 On Saturday 17 May 2008 00:29, Matthew Toseland wrote:
  Ian and I have eventually come to the conclusion that we should include db4o,
  and use it for our various persistence needs. I eventually reached the 
  conclusion that while we can do most of what we need to do with simple 
  flatfile databases, there are big chunks that will require a real database of
  some kind (even if it's only a persistent hash table). db4o has various 
  advantages:
  - Robust in real-world use. See for example this testimonial from a company 
  who used it on cell phones:
  http://www.db4o.com/about/customers/success/mandalait.aspx
  BDBJE has not met our expectations in this regard. It seems very sensitive to
  unusual situations - in particular, it will spontaneously corrupt and lose 
  all data on running out of disk space.
  - True object database: no SQL, simple and powerful queries, etc.
  - Transparent or manual activation of objects from storage.
  - 800K jar, so not big enough to be a problem.
  - Mature and actively maintained.
  - Allows for future expansion (e.g. passive requests will need to store a fair
  amount of persistent data).
  - Much more flexible than the hand-coded solution I was thinking of. We can 
  persist the entire queue (not just the splitfiles), if it's useful to do
  that.
  - Transactions (although this requires some juggling of in-memory objects on
  rollback).
  
  Tasks:
  - Add db4o to freenet-ext.jar.
  - Think about using it for the datastore. We don't want to have two databases!
  Sdiz's new datastore may be the One True Store, or it may not be. If it's 
  not, we don't want to keep BDBJE: we could build a db4o-based store, with or
  without LRU replacement. It would have the advantage of filling up more 
  quickly than sdiz's store. It should require reconstructing less frequently 
  than BDBJE!
  - Migrate the client layer, including splitfiles, pendingKeys, and so on, to
  be persisted via db4o. Of course there will be latency here when objects are
  not cached, so we will need to cache a few request choices in advance for 
  each RequestStarter. And we will need to devise some way to deal with 
  requests that don't want to be persisted - presumably we'd keep them in RAM.
  
 It turns out that db4o does indeed unrecoverably self-corrupt when it runs out
 of disk space. (Thanks nextgens for getting me to test this!)
 
 http://amphibian.dyndns.org/bdb4o-test.log
 

muhahahaha.

Last time I checked, the bdb database was recoverable... Okay,
it lost some^wmost of the data in the process, but at least it did
attempt to recover!

 We will therefore have to keep a fallback. IMHO for the client layer the 
 fallback should be downloads.dat.gz. We are careful not to lose that when we 
 run out of disk space, and it should only contain what is needed to restart 
 requests from the beginning (in practice a lot will come from the store).
 

...

While we are at it, what's wrong with bdb-je's persistence framework
again?
http://www.oracle.com/database/berkeley-db/je/index.html

 I apologise if the above was presented as a fait accompli; any input on
 databases would be appreciated. On Friday, Ian and I spent a long time
 debating the issue, first and foremost whether we should even have a
 database; I was initially in favour of not having one at all, or of using
 jdbm's persistent hashtable class (HTree).
 
 Personally I think if we have a database it should be a native object
 database, i.e. either Perst or db4o. It should also be robust, low overhead,
 mature, open source etc. I will start implementing the new client layer with
 db4o soon, unless convinced to use something else in the meantime. But it
 seems that with BDBJE (which isn't a native object database), you can lose
 the database even through an unclean shutdown... can anyone confirm this from
 experience? Or is it only running out of disk space, or memory corruption,
 that causes this?

I'm still not convinced that we need a database, as our requirements
are completely different from their typical use cases... but well, your
immediate concern is to store persistent requests to disk, right? What
about using Hibernate or javax.persistence (from Java EE) to do that?



Re: [freenet-dev] New database for Freenet: db4o

2008-05-16 Thread Daniel Cheng
On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland
[EMAIL PROTECTED] wrote:
 Ian and I have eventually come to the conclusion that we should include db4o,
 and use it for our various persistence needs. I eventually reached the
 conclusion that while we can do most of what we need to do with simple
 flatfile databases, there are big chunks that will require a real database of
 some kind (even if it's only a persistent hash table). db4o has various
 advantages:
 - Robust in real-world use. See for example this testimonial from a company
 who used it on cell phones:
 http://www.db4o.com/about/customers/success/mandalait.aspx
 BDBJE has not met our expectations in this regard. It seems very sensitive to
 unusual situations - in particular, it will spontaneously corrupt and lose
 all data on running out of disk space.
 - True object database: no SQL, simple and powerful queries, etc.
 - Transparent or manual activation of objects from storage.
 - 800K jar, so not big enough to be a problem.
 - Mature and actively maintained.
 - Allows for future expansion (e.g. passive requests will need to store a fair
 amount of persistent data).
 - Much more flexible than the hand-coded solution I was thinking of. We can
 persist the entire queue (not just the splitfiles), if it's useful to do
 that.
 - Transactions (although this requires some juggling of in-memory objects on
 rollback).

Looks good to me (except that its website needs registration).

 Tasks:
 - Add db4o to freenet-ext.jar.
 - Think about using it for the datastore. We don't want to have two databases!
 Sdiz's new datastore may be the One True Store, or it may not be. If it's
 not, we don't want to keep BDBJE: we could build a db4o-based store, with or
 without LRU replacement. It would have the advantage of filling up more
 quickly than sdiz's store. It should require reconstructing less frequently
 than BDBJE!


 - Migrate the client layer, including splitfiles, pendingKeys, and so on, to
 be persisted via db4o.

 Of course there will be latency here when objects are
 not cached, so we will need to cache a few request choices in advance for
 each RequestStarter. And we will need to devise some way to deal with
 requests that don't want to be persisted - presumably we'd keep them in RAM.

Please don't.

According to what I have read, db4o should be good enough to use directly:
  - Active objects are kept with a WeakReference, so as long as they are not
    GC'ed, you don't have to read the disk (there is an option to use hard
    references too).
  - Data are prefetched and activated in batches; if the query is
    well-written, we have the items in memory already.
  - CachedIoAdapter provides a low-level disk cache.
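
Daniel's first point is that db4o tracks activated objects through weak references, so an object the application still references is returned from memory without touching the disk. The class below is not db4o's implementation; it is a plain-Java sketch of the same idea, with a hypothetical loadFromDisk function standing in for the database.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical illustration of a weak-reference object cache keyed by id:
// objects still strongly reachable elsewhere are served from memory, while
// collected ones are reloaded through loadFromDisk.
class WeakObjectCache<T> {
    private final Map<Long, WeakReference<T>> live = new HashMap<>();
    private final LongFunction<T> loadFromDisk;
    private int diskLoads = 0;

    WeakObjectCache(LongFunction<T> loadFromDisk) {
        this.loadFromDisk = loadFromDisk;
    }

    synchronized T get(long id) {
        WeakReference<T> ref = live.get(id);
        T obj = (ref == null) ? null : ref.get();
        if (obj == null) {              // never loaded, or already garbage-collected
            obj = loadFromDisk.apply(id);
            diskLoads++;
            live.put(id, new WeakReference<>(obj));
        }
        return obj;
    }

    synchronized int diskLoads() {
        return diskLoads;
    }
}
```

A longer-lived version would also purge cleared entries from the map, for example by polling a ReferenceQueue.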

This may not be as good as a custom cache scheme... but I really hate
the idea of having yet another caching scheme for some marginal (or
even hypothetical) performance benefit; we have had too many of
them already.

Please, do that ONLY if it's supported by some benchmarks / cpu profiles.
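
Daniel asks for benchmarks or CPU profiles before adding another cache, and Matthew proposes a statistic for database latency measured against the roughly 800 ms SSK request interval. A minimal sketch of such a statistic, assuming each database fetch can be wrapped in a Supplier; DeadlineStats and its methods are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical statistic: how often a database fetch misses the request deadline.
class DeadlineStats {
    private final long deadlineNanos;
    private long total;
    private long missed;
    private long worstNanos;

    DeadlineStats(long deadline, TimeUnit unit) {
        this.deadlineNanos = unit.toNanos(deadline);
    }

    synchronized void record(long elapsedNanos) {
        total++;
        if (elapsedNanos > deadlineNanos) missed++;
        worstNanos = Math.max(worstNanos, elapsedNanos);
    }

    // Times one fetch and records its latency, even if the fetch throws.
    <T> T timed(Supplier<T> fetch) {
        long start = System.nanoTime();
        try {
            return fetch.get();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    synchronized double missRate() {
        return total == 0 ? 0.0 : (double) missed / total;
    }

    synchronized long worstMillis() {
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }
}
```

Collected on volunteers' slower machines, the miss rate would give the kind of evidence Daniel asks for before deciding whether a prefetch layer is needed.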
