Re: [freenet-dev] New database for Freenet: db4o
On Sunday 18 May 2008 05:27, Florent Daignière wrote:
* Matthew Toseland [2008-05-17 19:00:13]:
On Saturday 17 May 2008 00:29, Matthew Toseland wrote:

Ian and I have eventually come to the conclusion that we should include db4o, and use it for our various persistence needs. I eventually reached the conclusion that while we can do most of what we need to do with simple flatfile databases, there are big chunks that will require a real database of some kind (even if it's only a persistent hash table).

db4o has various advantages:
- Robust in real-world use. See for example this testimonial from a company who used it on cell phones: http://www.db4o.com/about/customers/success/mandalait.aspx BDBJE has not met our expectations in this regard. It seems very sensitive to unusual situations - in particular, it will spontaneously corrupt and lose all data on running out of disk space.
- True object database: no SQL, simple and powerful queries, etc.
- Transparent or manual activation of objects from storage.
- 800K jar, so not big enough to be a problem.
- Mature and actively maintained.
- Allows for future expansion (e.g. passive requests will need to store a fair amount of persistent data).
- Much more flexible than the hand-coded solution I was thinking of. We can persist the entire queue (not just the splitfiles), if it's useful to do that.
- Transactions (although this requires some juggling of in-memory objects on rollback).

Tasks:
- Add db4o to freenet-ext.jar.
- Think about using it for the datastore. We don't want to have two databases! Sdiz's new datastore may be the One True Store, or it may not be. If it's not, we don't want to keep BDBJE: we could build a db4o-based store, with or without LRU replacement. It would have the advantage of filling up more quickly than sdiz's store. It should require reconstructing less frequently than BDBJE!
- Migrate the client layer, including splitfiles, pendingKeys, and so on, to be persisted via db4o. Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

It turns out that db4o does indeed unrecoverably self-corrupt when it runs out of disk space. (Thanks nextgens for getting me to test this!) http://amphibian.dyndns.org/bdb4o-test.log

muhahahaha. Last time I checked the bdb database was recoverable... Okay it lost some^wmost of the data in the process but at least it did attempt to recover!

It attempts to recover (iff we try to use the DbDump/DbLoad tools). It does not succeed. Because we have secondary indexes, it ends up dropping almost everything.

We will therefore have to keep a fallback. IMHO for the client layer the fallback should be downloads.dat.gz. We are careful not to lose that when we run out of disk space, and it should only contain what is needed to restart requests from the beginning (in practice a lot will come from the store).

... While we are at it, what's wrong with bdb-je's persistence framework again? http://www.oracle.com/database/berkeley-db/je/index.html

The fact that it belongs to BDBJE? I dunno, it is possible that *every* report of corruption of BDBJE is because of hardware issues... but we've certainly had lots of them, and not only on out of disk space either... Wouldn't a native object database be better?

I apologise if the above was presented as a fait accompli, any input on databases would be appreciated. On Friday, Ian and I spent a long time debating the issue, first and foremost of whether we should even have a database; I was initially in favour of not having one at all, or using jdbm's persistent hashtable class (HTree). Personally I think if we have a database it should be a native object database i.e. either Perst or db4o. It also should be robust, low overhead, mature, open source etc. I will start implementing the new client layer with db4o soon, unless convinced to use something else in the meantime. But it seems that with BDBJE (which isn't a native object database), you can lose the database even by an unclean shutdown... can anyone confirm this from experience? Or is it only out of disk space and memory corruption that causes this?

I'm still not convinced that we need a database... as our requirements are completely different from their typical use-cases... but well, your immediate concern is to store persistent requests to disk, right? What about using Hibernate or javax.persistence (from EE) to do that?

Hibernate needs SQL and is really heavy. I might be willing to go with ripping the code for a persistent
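
The downloads.dat.gz fallback discussed above only helps if a failed write cannot destroy the previous copy. A minimal Java sketch of one common way to get that property: write the new file under a temporary name and rename it over the old one only after the write has fully succeeded. The names `FallbackWriter`, `writeAtomically` and `read` are hypothetical and not taken from the Freenet source; how the node actually protects the file is not stated in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not Freenet code: the previous fallback file is
// left untouched unless the new one was written completely.
class FallbackWriter {

    // Writes the lines gzip-compressed to "<name>.tmp" next to target and
    // then renames that file over target. If writing fails (for example
    // because the disk is full), the temp file is removed and the old
    // target still holds its previous contents.
    static void writeAtomically(Path target, List<String> lines) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                for (String line : lines) {
                    out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                }
            }
            // ATOMIC_MOVE asks for a single rename; it throws if the
            // filesystem cannot provide one.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException ignored) {
                // Nothing more to do; the old target is still intact.
            }
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back as a list of lines.
    static List<String> read(Path target) {
        try (GZIPInputStream in = new GZIPInputStream(Files.newInputStream(target))) {
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            if (text.isEmpty()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(Arrays.asList(text.split("\n")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A production version would probably also force the data to disk before renaming; that step is left out of the sketch.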
Re: [freenet-dev] New database for Freenet: db4o
On Sunday 18 May 2008 19:44, Ian Clarke wrote:

I've got to say, I really hope Perst employ better software engineers than their web designers, because their website is awful. It somewhat shakes my confidence in them. I know this seems like a very superficial judgement, but if they put little care into the public face of their software, it is reasonable to suspect that they may not put too much care into the software itself.

You said pretty much the same about ours!

Looking at the manual, it looks like Perst operates at a lower level than db4o - you need to manually create and maintain indexes. This is closer to the Java collections API, which could be good because it's more familiar, but it leaves more opportunities for the programmer to screw up, which is bad.

IMHO we will get better performance and fewer screwups from a more familiar API.

My preference for db4o is based mainly on my familiarity with it, the fact that it is more widely used, and the fact that it superficially appears to have a more credible company behind it. None of these is a solid justification for choosing one option over the other, but they all suggest that db4o is the way to go.

We should, as bback suggests, test Perst, see what happens when it runs out of disk space. I will look into it soon.

Ian.

On Sun, May 18, 2008 at 11:08 AM, [EMAIL PROTECTED] wrote:

I know I repeat myself, but if you consider db4o you should also consider Perst as an option. Toad, can you do your 'disk full test' with Perst to compare it against db4o? Perst and db4o seem to provide the same things, but according to http://www.garret.ru/~knizhnik/perstbench.html Perst is much faster. No, I don't get money for advertising Perst, but I have had good experiences with it.
On Sun, May 18, 2008 at 12:46 PM, Daniel Cheng wrote:

Both BDB (the original C version) and BDB-JE have a very bad history when it comes to recovery. Why should we lose some data when there are better alternatives that don't?
Re: [freenet-dev] New database for Freenet: db4o
On Monday 19 May 2008 11:34, Matthew Toseland wrote:
On Sunday 18 May 2008 19:44, Ian Clarke wrote:

Looking at the manual, it looks like Perst operates at a lower level than db4o - you need to manually create and maintain indexes. This is closer to the Java collections API, which could be good because it's more familiar, but it leaves more opportunities for the programmer to screw up, which is bad.

IMHO we will get better performance and fewer screwups from a more familiar API.

My preference for db4o is based mainly on my familiarity with it, the fact that it is more widely used, and the fact that it superficially appears to have a more credible company behind it. None of these is a solid justification for choosing one option over the other, but they all suggest that db4o is the way to go.

We should, as bback suggests, test Perst, see what happens when it runs out of disk space. I will look into it soon.

Completed testing. Perst fails gracefully on running out of disk space in between writes. The database is readable after the failed commit, and if I take away the bigfile occupying all the disk space, we can even write to it. How about we go with Perst?
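
The pass criteria of the disk-full test reported above (committed data stays readable after a failed commit, and writes work again once space is freed) can be written down as a small toy model. This is not Perst or db4o code and says nothing about how either is implemented: `ToyStore` and its byte accounting are hypothetical, with an in-memory counter standing in for free disk space.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: an all-or-nothing commit against a fixed space budget.
class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private long freeBytes; // stands in for free disk space

    ToyStore(long freeBytes) {
        this.freeBytes = freeBytes;
    }

    // Stages a record; nothing becomes visible until commit().
    void put(String record) {
        pending.add(record);
    }

    // Checks the space needed before touching committed state. On failure
    // the staged records are discarded and committed data is unchanged.
    void commit() {
        long needed = 0;
        for (String r : pending) {
            needed += r.length();
        }
        if (needed > freeBytes) {
            pending.clear();
            throw new IllegalStateException("out of disk space");
        }
        committed.addAll(pending);
        freeBytes -= needed;
        pending.clear();
    }

    // Models deleting the big file that was filling the disk.
    void freeUp(long bytes) {
        freeBytes += bytes;
    }

    // Returns a copy of everything committed so far.
    List<String> read() {
        return new ArrayList<>(committed);
    }
}
```

In this model a commit that does not fit throws and leaves `read()` returning exactly what it returned before, which is the behaviour the test above was checking for.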
Re: [freenet-dev] New database for Freenet: db4o
On Sun, May 18, 2008 at 12:27 PM, Florent Daignière wrote:
* Matthew Toseland [2008-05-17 19:00:13]:

It turns out that db4o does indeed unrecoverably self-corrupt when it runs out of disk space. (Thanks nextgens for getting me to test this!) http://amphibian.dyndns.org/bdb4o-test.log

muhahahaha. Last time I checked the bdb database was recoverable... Okay it lost some^wmost of the data in the process but at least it did attempt to recover!

Both BDB (the original C version) and BDB-JE have a very bad history when it comes to recovery. Why should we lose some data when there are better alternatives that don't?

... While we are at it, what's wrong with bdb-je's persistence framework again? http://www.oracle.com/database/berkeley-db/je/index.html

Another reason to get rid of BDB-JE is memory usage and performance. db4o is usable on J2ME and scales to terabyte-sized databases. I think this says a lot.

I'm still not convinced that we need a database... as our requirements are completely different from their typical use-cases... but well, your immediate concern is to store persistent requests to disk, right? What about using Hibernate or javax.persistence (from EE) to do that?

eee Hibernate is just an ORM -- you need a SQL backend for that. (I am not opposed to the idea of using a SQL backend, but then we have to decide which one to use.) javax.persistence has a Java 5 dependency, and you need a J2EE container. Just too ugly.
Re: [freenet-dev] New database for Freenet: db4o
I know I repeat myself, but if you consider db4o you should also consider Perst as an option. Toad, can you do your 'disk full test' with Perst to compare it against db4o? Perst and db4o seem to provide the same things, but according to http://www.garret.ru/~knizhnik/perstbench.html Perst is much faster. No, I don't get money for advertising Perst, but I have had good experiences with it.
Re: [freenet-dev] New database for Freenet: db4o
I've got to say, I really hope Perst employ better software engineers than their web designers, because their website is awful. It somewhat shakes my confidence in them. I know this seems like a very superficial judgement, but if they put little care into the public face of their software, it is reasonable to suspect that they may not put too much care into the software itself.

Looking at the manual, it looks like Perst operates at a lower level than db4o - you need to manually create and maintain indexes. This is closer to the Java collections API, which could be good because it's more familiar, but it leaves more opportunities for the programmer to screw up, which is bad.

My preference for db4o is based mainly on my familiarity with it, the fact that it is more widely used, and the fact that it superficially appears to have a more credible company behind it. None of these is a solid justification for choosing one option over the other, but they all suggest that db4o is the way to go.

Ian.
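
To make the "manually create and maintain indexes" trade-off above concrete, here is a minimal sketch using plain Java collections. It does not use the Perst or db4o APIs; `Request` and `RequestIndex` are hypothetical types invented for illustration. The point is only that with hand-maintained indexes, every change to an indexed field needs a matching index update, and forgetting one leaves stale entries behind.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example type; not a Perst, db4o, or Freenet class.
class Request {
    final String key;
    int priority;

    Request(String key, int priority) {
        this.key = key;
        this.priority = priority;
    }
}

// Two hand-maintained indexes over the same objects.
class RequestIndex {
    private final Map<String, Request> byKey = new HashMap<>();
    private final Map<Integer, List<Request>> byPriority = new HashMap<>();

    // Assumes each key is added at most once.
    void add(Request r) {
        byKey.put(r.key, r);
        byPriority.computeIfAbsent(r.priority, p -> new ArrayList<>()).add(r);
    }

    // The step that is easy to forget: changing an indexed field means
    // removing the object from its old bucket and adding it to the new one.
    void changePriority(String key, int newPriority) {
        Request r = byKey.get(key);
        if (r == null) {
            return;
        }
        List<Request> oldBucket = byPriority.get(r.priority);
        if (oldBucket != null) {
            oldBucket.remove(r);
        }
        r.priority = newPriority;
        byPriority.computeIfAbsent(newPriority, p -> new ArrayList<>()).add(r);
    }

    List<Request> withPriority(int priority) {
        return byPriority.getOrDefault(priority, Collections.emptyList());
    }
}
```

If code elsewhere assigned `r.priority` directly instead of calling `changePriority`, `withPriority` would keep returning the request under its old priority. That is the kind of programmer error the message above is worried about.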
Re: [freenet-dev] New database for Freenet: db4o
On Saturday 17 May 2008 06:24, Ian Clarke wrote:
On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng wrote:
On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland wrote:

Ian and I have eventually come to the conclusion that we should include db4o,

Yay.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

Please don't. According to what I have read, db4o should be good enough to use directly:

I agree with Daniel, DIY may be an admirable quality when it comes to home and car repair, but not with software. One of the benefits of using third-party stuff like db4o is that we can outsource problems to people far more focussed on the solutions to those problems than we can afford to be. If we spot a problem, we should fix it, but let's not fall into the premature optimization trap.

Which part of my proposal are you both criticising? If it's the "we should cache a few request choices in advance" bit then I stand by that. Any database query (assuming the working set is large) will involve many dependent disk accesses. We have to send a request roughly every 800ms on my node (for SSKs). With slow commodity drives, often with other disk load going on, and with fairly complex database accesses involving multiple tables and therefore *many* mostly dependent seeks, we are not going to reliably meet that deadline no matter how good the database is. I will add a statistic for this, but my system is massively overpowered for this sort of effect, so we will need to test it on volunteers' slow systems.

If it's the "requests that don't want to be persisted" bit, what would you do with fproxy requests and other non-persistent requests? Store them to disk anyway?
___
Devl mailing list
Devl@freenetproject.org
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
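
The deadline argument in the message above is simple arithmetic: dependent seeks cannot overlap, so their latencies add up. A rough sketch follows. Only the 800 ms interval comes from the thread; the per-seek latencies used in the usage note (10 ms for an idle drive, 50 ms under competing load) and the class name `SeekBudget` are illustrative assumptions, not measurements.

```java
// Back-of-the-envelope model of strictly sequential disk seeks.
class SeekBudget {

    // Largest number of dependent seeks that fits within the deadline.
    static long maxDependentSeeks(long deadlineMs, long msPerSeek) {
        return deadlineMs / msPerSeek;
    }

    // True if a query needing `seeks` dependent seeks finishes in time.
    static boolean meetsDeadline(long seeks, long msPerSeek, long deadlineMs) {
        return seeks * msPerSeek <= deadlineMs;
    }
}
```

With those assumed figures, `maxDependentSeeks(800, 10)` is 80 and `maxDependentSeeks(800, 50)` is 16, so a query that needs 30 dependent seeks would meet the deadline on the idle drive and miss it on the loaded one.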
Re: [freenet-dev] New database for Freenet: db4o
On Sat, May 17, 2008 at 7:06 PM, Matthew Toseland [EMAIL PROTECTED] wrote:
On Saturday 17 May 2008 06:24, Ian Clarke wrote:
On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng [EMAIL PROTECTED] wrote:
On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland [EMAIL PROTECTED] wrote:

Ian and I have eventually come to the conclusion that we should include db4o,

Yay.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

Please don't. According to what I have read, db4o should be good enough to use directly:

I agree with Daniel. DIY may be an admirable quality when it comes to home and car repair, but not with software. One of the benefits of using third-party stuff like db4o is that we can outsource problems to people far more focussed on the solutions to those problems than we can afford to be. If we spot a problem, we should fix it, but let's not fall into the premature optimization trap.

Which part of my proposal are you both criticising? If it's the "we should cache a few request choices in advance" bit then I stand by that. Any database query (assuming the working set is large) will involve many dependent disk accesses.

Did you read my original post? There were three features in db4o to offload that a bit... CachedIoAdapter, prefetching, weak references .. see my previous post for details on that.

We have to send a request roughly every 800ms on my node (for SSKs). With slow commodity drives, often with other disk load going on, and with fairly complex database accesses involving multiple tables and therefore *many* mostly dependent seeks, we are not going to reliably meet that deadline no matter how good the database is.

I will add a statistic for this, but my system is massively overpowered for this sort of effect, so we will need to test it on volunteers' slow systems.

If it's the requests that don't want to be persisted, what would you do with fproxy requests and other non-persistent requests? Store them to disk anyway?
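The statistic mentioned above could be as simple as counting how often choosing the next request overruns the send interval. A minimal sketch, assuming the 800 ms figure quoted in the thread as the deadline; `DeadlineStats` and its methods are hypothetical names, not Freenet code:

```java
/** Hypothetical sketch: counts how many timed operations overran a deadline. */
final class DeadlineStats {
    private final long deadlineNanos;
    private long samples;
    private long misses;

    DeadlineStats(long deadlineMillis) {
        this.deadlineNanos = deadlineMillis * 1_000_000L;
    }

    /** Record one measured duration, e.g. the time taken to pick the next request. */
    synchronized void record(long elapsedNanos) {
        samples++;
        if (elapsedNanos > deadlineNanos) {
            misses++; // strictly longer than the deadline counts as a miss
        }
    }

    /** Fraction of recorded operations that missed the deadline (0.0 if none recorded). */
    synchronized double missRatio() {
        return samples == 0 ? 0.0 : (double) misses / samples;
    }
}
```

Timing a query would then mean taking `System.nanoTime()` before and after it and passing the difference to `record()`; collecting the ratio on slow volunteer machines is what would settle the argument either way.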
Re: [freenet-dev] New database for Freenet: db4o
On Saturday 17 May 2008 14:07, Daniel Cheng wrote:
On Sat, May 17, 2008 at 7:06 PM, Matthew Toseland [EMAIL PROTECTED] wrote:
On Saturday 17 May 2008 06:24, Ian Clarke wrote:
On Fri, May 16, 2008 at 11:40 PM, Daniel Cheng [EMAIL PROTECTED] wrote:
On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland [EMAIL PROTECTED] wrote:

Ian and I have eventually come to the conclusion that we should include db4o,

Yay.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

Please don't. According to what I have read, db4o should be good enough to use directly:

I agree with Daniel. DIY may be an admirable quality when it comes to home and car repair, but not with software. One of the benefits of using third-party stuff like db4o is that we can outsource problems to people far more focussed on the solutions to those problems than we can afford to be. If we spot a problem, we should fix it, but let's not fall into the premature optimization trap.

Which part of my proposal are you both criticising? If it's the "we should cache a few request choices in advance" bit then I stand by that. Any database query (assuming the working set is large) will involve many dependent disk accesses.

Did you read my original post? There were three features in db4o to offload that a bit... CachedIoAdapter, prefetching, weak references .. see my previous post for details on that.

Strictly speaking this is prefetch we're talking about, not caching.

We have to send a request roughly every 800ms on my node (for SSKs). With slow commodity drives, often with other disk load going on, and with fairly complex database accesses involving multiple tables and therefore *many* mostly dependent seeks, we are not going to reliably meet that deadline no matter how good the database is.

I will add a statistic for this, but my system is massively overpowered for this sort of effect, so we will need to test it on volunteers' slow systems.

If it's the requests that don't want to be persisted, what would you do with fproxy requests and other non-persistent requests? Store them to disk anyway?
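To make the prefetch idea concrete, here is a rough sketch of a small per-starter buffer that is topped up off the critical path, so that sending a request never waits on disk. It is an illustration under stated assumptions rather than the proposed implementation: `PrefetchBuffer`, its methods, and the `Supplier` standing in for a database query are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch: keeps a few request choices ready in RAM. */
final class PrefetchBuffer<T> {
    private final Deque<T> ready = new ArrayDeque<>();
    private final Supplier<T> slowSource; // stands in for a database query; returns null when exhausted
    private final int target;             // how many choices to keep ready

    PrefetchBuffer(Supplier<T> slowSource, int target) {
        this.slowSource = slowSource;
        this.target = target;
    }

    /**
     * Tops the buffer up to the target size. Intended to be called from a single
     * background thread; the slow fetch happens without holding the lock so that
     * poll() is never blocked by it. Returns the number of items added.
     */
    int refill() {
        int added = 0;
        while (size() < target) {
            T next = slowSource.get();
            if (next == null) {
                break; // nothing more to fetch right now
            }
            synchronized (this) {
                ready.addLast(next);
            }
            added++;
        }
        return added;
    }

    /** Called when a request must be sent; returns null rather than touching the slow source. */
    synchronized T poll() {
        return ready.pollFirst();
    }

    synchronized int size() {
        return ready.size();
    }
}
```

The trade-off is the one debated in the thread: the buffer adds another layer to maintain, and whether it is needed at all depends on measurements from slow machines.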
Re: [freenet-dev] New database for Freenet: db4o
On Saturday 17 May 2008 00:29, Matthew Toseland wrote:

Ian and I have eventually come to the conclusion that we should include db4o, and use it for our various persistence needs. I eventually reached the conclusion that while we can do most of what we need to do with simple flatfile databases, there are big chunks that will require a real database of some kind (even if it's only a persistent hash table).

db4o has various advantages:
- Robust in real-world use. See for example this testimonial from a company who used it on cell phones: http://www.db4o.com/about/customers/success/mandalait.aspx BDBJE has not met our expectations in this regard. It seems very sensitive to unusual situations - in particular, it will spontaneously corrupt and lose all data on running out of disk space.
- True object database: no SQL, simple and powerful queries, etc.
- Transparent or manual activation of objects from storage.
- 800K jar, so not big enough to be a problem.
- Mature and actively maintained.
- Allows for future expansion (e.g. passive requests will need to store a fair amount of persistent data).
- Much more flexible than the hand-coded solution I was thinking of. We can persist the entire queue (not just the splitfiles), if it's useful to do that.
- Transactions (although this requires some juggling of in-memory objects on rollback).

Tasks:
- Add db4o to freenet-ext.jar.
- Think about using it for the datastore. We don't want to have two databases! Sdiz's new datastore may be the One True Store, or it may not be. If it's not, we don't want to keep BDBJE: we could build a db4o-based store, with or without LRU replacement. It would have the advantage of filling up more quickly than sdiz's store. It should require reconstructing less frequently than BDBJE!
- Migrate the client layer, including splitfiles, pendingKeys, and so on, to be persisted via db4o.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

It turns out that db4o does indeed unrecoverably self-corrupt when it runs out of disk space. (Thanks nextgens for getting me to test this!) http://amphibian.dyndns.org/bdb4o-test.log

We will therefore have to keep a fallback. IMHO for the client layer the fallback should be downloads.dat.gz. We are careful not to lose that when we run out of disk space, and it should only contain what is needed to restart requests from the beginning (in practice a lot will come from the store).

I apologise if the above was presented as a fait accompli, any input on databases would be appreciated. On Friday, me and Ian spent a long time debating the issue, first and foremost of whether we should even have a database; I was initially in favour of not having one at all, or using jdbm's persistent hashtable class (HTree). Personally I think if we have a database it should be a native object database i.e. either Perst or db4o. It also should be robust, low overhead, mature, open source etc. I will start implementing the new client layer with db4o soon, unless convinced to use something else in the meantime. But it seems that with BDBJE (which isn't a native object database), you can lose the database even by an unclean shutdown... can anyone confirm this from experience? Or is it only out of disk space and memory corruption that causes this?
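One common way to avoid losing a fallback file such as downloads.dat.gz when the disk fills up is to write the new copy to a temporary file and only rename it over the old one once it is complete. The sketch below assumes that approach; it is not taken from the Freenet source, and `FallbackWriter`, the one-line-per-request format and the `.tmp` suffix are all hypothetical. It also omits an explicit fsync, which a real implementation would probably want.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: replace a gzipped fallback file without ever truncating the old copy. */
final class FallbackWriter {

    static void writeAtomically(Path target, List<String> lines) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
                for (String line : lines) {
                    out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                }
            }
            // The old file is only replaced after the new one was written and closed.
            // If the write above fails (e.g. disk full), the old file is untouched.
            // Whether an atomic move replaces an existing target is platform-dependent.
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            try {
                Files.deleteIfExists(tmp); // best-effort cleanup of the partial file
            } catch (IOException ignored) {
                // nothing more we can do here
            }
            throw new UncheckedIOException(e);
        }
    }

    static List<String> readAll(Path source) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(source)), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```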
Re: [freenet-dev] New database for Freenet: db4o
* Matthew Toseland [EMAIL PROTECTED] [2008-05-17 19:00:13]:
On Saturday 17 May 2008 00:29, Matthew Toseland wrote:

Ian and I have eventually come to the conclusion that we should include db4o, and use it for our various persistence needs. I eventually reached the conclusion that while we can do most of what we need to do with simple flatfile databases, there are big chunks that will require a real database of some kind (even if it's only a persistent hash table).

db4o has various advantages:
- Robust in real-world use. See for example this testimonial from a company who used it on cell phones: http://www.db4o.com/about/customers/success/mandalait.aspx BDBJE has not met our expectations in this regard. It seems very sensitive to unusual situations - in particular, it will spontaneously corrupt and lose all data on running out of disk space.
- True object database: no SQL, simple and powerful queries, etc.
- Transparent or manual activation of objects from storage.
- 800K jar, so not big enough to be a problem.
- Mature and actively maintained.
- Allows for future expansion (e.g. passive requests will need to store a fair amount of persistent data).
- Much more flexible than the hand-coded solution I was thinking of. We can persist the entire queue (not just the splitfiles), if it's useful to do that.
- Transactions (although this requires some juggling of in-memory objects on rollback).

Tasks:
- Add db4o to freenet-ext.jar.
- Think about using it for the datastore. We don't want to have two databases! Sdiz's new datastore may be the One True Store, or it may not be. If it's not, we don't want to keep BDBJE: we could build a db4o-based store, with or without LRU replacement. It would have the advantage of filling up more quickly than sdiz's store. It should require reconstructing less frequently than BDBJE!
- Migrate the client layer, including splitfiles, pendingKeys, and so on, to be persisted via db4o.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

It turns out that db4o does indeed unrecoverably self-corrupt when it runs out of disk space. (Thanks nextgens for getting me to test this!) http://amphibian.dyndns.org/bdb4o-test.log

muhahahaha. Last time I checked the bdb database was recoverable... Okay it lost some^wmost of the data in the process but at least it did attempt to recover!

We will therefore have to keep a fallback. IMHO for the client layer the fallback should be downloads.dat.gz. We are careful not to lose that when we run out of disk space, and it should only contain what is needed to restart requests from the beginning (in practice a lot will come from the store).

... While we are at it, what's wrong with bdb-je's persistence framework again? http://www.oracle.com/database/berkeley-db/je/index.html

I apologise if the above was presented as a fait accompli, any input on databases would be appreciated. On Friday, me and Ian spent a long time debating the issue, first and foremost of whether we should even have a database; I was initially in favour of not having one at all, or using jdbm's persistent hashtable class (HTree). Personally I think if we have a database it should be a native object database i.e. either Perst or db4o. It also should be robust, low overhead, mature, open source etc. I will start implementing the new client layer with db4o soon, unless convinced to use something else in the meantime. But it seems that with BDBJE (which isn't a native object database), you can lose the database even by an unclean shutdown... can anyone confirm this from experience? Or is it only out of disk space and memory corruption that causes this?

I'm still not convinced that we need a database... as our requirements are completely different from their typical use-cases... but well, your immediate concern is to store persistent requests to disk, right? What about using Hibernate or javax.persistence (from EE) to do that?
Re: [freenet-dev] New database for Freenet: db4o
On Sat, May 17, 2008 at 7:29 AM, Matthew Toseland [EMAIL PROTECTED] wrote:

Ian and I have eventually come to the conclusion that we should include db4o, and use it for our various persistence needs. I eventually reached the conclusion that while we can do most of what we need to do with simple flatfile databases, there are big chunks that will require a real database of some kind (even if it's only a persistent hash table).

db4o has various advantages:
- Robust in real-world use. See for example this testimonial from a company who used it on cell phones: http://www.db4o.com/about/customers/success/mandalait.aspx BDBJE has not met our expectations in this regard. It seems very sensitive to unusual situations - in particular, it will spontaneously corrupt and lose all data on running out of disk space.
- True object database: no SQL, simple and powerful queries, etc.
- Transparent or manual activation of objects from storage.
- 800K jar, so not big enough to be a problem.
- Mature and actively maintained.
- Allows for future expansion (e.g. passive requests will need to store a fair amount of persistent data).
- Much more flexible than the hand-coded solution I was thinking of. We can persist the entire queue (not just the splitfiles), if it's useful to do that.
- Transactions (although this requires some juggling of in-memory objects on rollback).

Looks good to me (except that its website needs registration).

Tasks:
- Add db4o to freenet-ext.jar.
- Think about using it for the datastore. We don't want to have two databases! Sdiz's new datastore may be the One True Store, or it may not be. If it's not, we don't want to keep BDBJE: we could build a db4o-based store, with or without LRU replacement. It would have the advantage of filling up more quickly than sdiz's store. It should require reconstructing less frequently than BDBJE!
- Migrate the client layer, including splitfiles, pendingKeys, and so on, to be persisted via db4o.

Of course there will be latency here when objects are not cached, so we will need to cache a few request choices in advance for each RequestStarter. And we will need to devise some way to deal with requests that don't want to be persisted - presumably we'd keep them in RAM.

Please don't. According to what I have read, db4o should be good enough to use directly:
- Active objects are kept with a WeakReference, so as long as an object has not been GC'ed, you don't have to read the disk (there is an option to use hard references too).
- Data are prefetched and activated in batches.. if the query is well written, we have the items in memory already.
- CachedIoAdapter provides a low-level disk cache.

This may not be as good as a custom cache scheme.. But I really hate the idea of having yet-another-caching scheme for some marginal (or even hypothetical) performance benefit -- we have had too many of them already. Please, do that ONLY if it's supported by some benchmarks / CPU profiles.
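The weak-reference point can be illustrated with plain Java, independent of db4o. The sketch below shows the general technique only and makes no claim about how db4o implements it: `WeakObjectCache` and its loader function (standing in for a disk read) are hypothetical names. An object is reloaded only if it was never loaded or the garbage collector has already reclaimed it.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: objects stay available without a reload for as long as they are not GC'ed. */
final class WeakObjectCache<K, V> {
    private final Map<K, WeakReference<V>> refs = new HashMap<>();
    private final Function<K, V> loader; // stands in for reading the object from disk
    private int loads = 0;

    WeakObjectCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        WeakReference<V> ref = refs.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Either never loaded, or the referent was collected: go back to the loader.
            value = loader.apply(key);
            loads++;
            refs.put(key, new WeakReference<>(value));
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    synchronized int loadCount() {
        return loads;
    }
}
```

As long as the caller still holds the returned object, a second `get` for the same key returns the same instance without another load. Stale map entries are never purged in this sketch; a fuller version would use a `ReferenceQueue` for that.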