Re: Adding .db support to pkg_tools
Hi,

On Wed, May 14, 2008 at 05:55:33PM +0930, Daniel O'Connor wrote:
> So long as you can tell it is corrupted..
> It's also a drag from a user POV when the tool crashes because the DB is
> hosed (seen in portupgrade a number of times)

The simplest way to achieve some measure of protection (provided the database itself is not buggy) is to use a single lock file in /var/db/pkg. The ports tools can create and lock this as needed, and if the file exists but they are able to obtain an exclusive lock on it, then it's a dangling lock. In that case, the tools crashed, and the database caches should be rebuilt. This is not foolproof, but it would reduce the number of FAQs on the mailing lists.

This kills two birds with one stone, since it would serialize access to /var/db/pkg, so that various tools don't stomp on one another. An exclusive lock can be held by pkg_create, pkg_add and pkg_delete as needed. pkg_info and pkg_version can downgrade the lock to a shared lock, and just continue if they can get a shared lock, or block if the file is exclusively locked.

One problem is that pkg_add calls itself recursively, so it would have to drop the lock around those calls. That could be hard. If I recall correctly, it was not that hard to remove this recursive behaviour.

Also, since bsd.port.mk grubs around in /var/db/pkg, it would need to learn to participate in the locking.

Regards,
-Jeremy

--
FreeBSD - Because the best things in life are free...
http://www.freebsd.org/
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
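The lock-file idea above can be sketched roughly as follows. This is only an illustration in Python, not code from pkg_tools, and it makes one deliberate change: instead of treating "file exists but is lockable" as the crash signal, a writer stores a one-byte dirty marker in the lock file and clears it on clean exit. That avoids having to unlink the lock file, which is racy with flock(2). The function names and the marker byte are assumptions made for the example.

```python
import fcntl
import os


def acquire(lock_path, exclusive):
    """Take a shared or exclusive flock on the lock file.

    Returns (fd, stale). stale is True when the previous exclusive holder
    never released cleanly, so any caches should be rebuilt.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    # Blocks until the lock is available; readers share, writers exclude.
    fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
    stale = os.pread(fd, 1, 0) == b"D"
    if exclusive:
        # Mark the database as "being modified" for the duration of the lock.
        os.pwrite(fd, b"D", 0)
    return fd, stale


def release(fd, exclusive):
    """Clear the dirty marker (writers only) and drop the lock."""
    if exclusive:
        os.ftruncate(fd, 0)
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A tool such as pkg_add would call acquire(path, True), rebuild the caches if stale is set, do its work, and call release(fd, True). A reader such as pkg_info would call acquire(path, False); if it sees stale it would have to re-acquire exclusively before rebuilding.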
Re: Adding .db support to pkg_tools
On Tue, 13 May 2008, Tim Kientzle wrote:
> > I think this is a really bad idea. The problem with the tools is
> > not with the files. It is that the files need to be parsed on each
> > run, often recursively, and your solution would not help at all.
>
> Parsing one file isn't expensive; parsing several hundred files
> to find one bit of information is expensive.
>
> > The database(s) should just be a cache of the information stored in
> > the files.
>
> Bingo! As long as the .db version can be easily recreated
> from scratch from the master data stored in the same files
> as always, it doesn't really matter if the BDB is occasionally
> corrupted, as long as it can be rebuilt fairly quickly.

So long as you can tell it is corrupted..

It's also a drag from a user POV when the tool crashes because the DB is hosed (seen in portupgrade a number of times)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: Adding .db support to pkg_tools
> I think this is a really bad idea. The problem with the tools is
> not with the files. It is that the files need to be parsed on each
> run, often recursively, and your solution would not help at all.

Parsing one file isn't expensive; parsing several hundred files to find one bit of information is expensive.

> The database(s) should just be a cache of the information stored in
> the files.

Bingo! As long as the .db version can be easily recreated from scratch from the master data stored in the same files as always, it doesn't really matter if the BDB is occasionally corrupted, as long as it can be rebuilt fairly quickly.

So, the key points are:

 * Use the .db file for lookups that can't be done quickly against the existing data. In particular, look for operations where the pkg tools have to scan a lot of files to verify a single fact.

 * Ensure the .db file can be deleted and rebuilt from scratch from the data in the regular files.

Tim Kientzle
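To make the "cache that can always be rebuilt" point concrete, here is a small sketch. It is an illustration only: Python's sqlite3 stands in for the .db file (Python ships no Berkeley DB 1.85 binding), the table layout and function names are invented for the example, and the only thing taken from the real tools is the `@comment ORIGIN:` line in +CONTENTS. Any database error on lookup is treated as "cache is missing or corrupt", and the cache is thrown away and rebuilt from the flat files.

```python
import os
import sqlite3
from contextlib import closing

ORIGIN_TAG = "@comment ORIGIN:"


def scan_flat_files(pkg_db_dir):
    """Master data: map origin -> package name by reading each +CONTENTS."""
    index = {}
    for name in sorted(os.listdir(pkg_db_dir)):
        contents = os.path.join(pkg_db_dir, name, "+CONTENTS")
        if not os.path.isfile(contents):
            continue
        with open(contents, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if line.startswith(ORIGIN_TAG):
                    index[line[len(ORIGIN_TAG):].strip()] = name
    return index


def rebuild_cache(pkg_db_dir, cache_path):
    """Delete the cache and recreate it from the flat files."""
    if os.path.exists(cache_path):
        os.unlink(cache_path)
    with closing(sqlite3.connect(cache_path)) as db:
        db.execute("CREATE TABLE origins (origin TEXT PRIMARY KEY, name TEXT)")
        db.executemany("INSERT INTO origins VALUES (?, ?)",
                       scan_flat_files(pkg_db_dir).items())
        db.commit()


def _query(cache_path, origin):
    with closing(sqlite3.connect(cache_path)) as db:
        row = db.execute("SELECT name FROM origins WHERE origin = ?",
                         (origin,)).fetchone()
    return row[0] if row else None


def lookup_origin(pkg_db_dir, cache_path, origin):
    """Answer from the cache; rebuild once if the cache is unusable."""
    try:
        return _query(cache_path, origin)
    except sqlite3.DatabaseError:
        rebuild_cache(pkg_db_dir, cache_path)
        return _query(cache_path, origin)
```

The sketch does not keep the cache in step with later installs; a real tool would update it from pkg_add and pkg_delete, as suggested elsewhere in this thread.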
Re: Adding .db support to pkg_tools
Hi,

On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
> I'm working on adding .db support to the pkg_tools

I hope that you know that you're stepping into a hotly debated part of the project... Good luck. My advice to you is to ignore any calls for a 'complete rewrite' and to focus on releasing small patches from your project rather than one big patch at the end. Back each patch up with performance numbers (I suggest pkg_add gnome2-x (from a local package source), pkg_version and pkg_delete -r glib-2.x).

> The way I'm thinking of storing information to the .db is to name the
> keys as the directory names in /var/db/pkg. And save the +* files in
> the directories to the value element in the db, separated with a
> special character or similar.

I think this is a really bad idea. The problem with the tools is not with the files. It is that the files need to be parsed on each run, often recursively, and your solution would not help at all.

The database(s) should just be a cache of the information stored in the files. This is what the existing databases used by portupgrade and NetBSD's tools do. They store the files and which ports installed them, so that it is easy to do conflict resolution, and some pkg_info operations.

In addition, what you need is a cache which stores {pkg_name,pkg_origin} pairs, where both are unique keys with a 1:1 relationship, and then a {pkg_name,pkg_dep_name} directed graph, which stores the dependency network between ports (also replacing +REQUIRED_BY). Both can be reconstructed if needed from the +CONTENTS files.

You could do both in BDB, but you'll be better off loading both into memory completely (as needed), using C to do the queries, and just maintaining the database when things change (on pkg_add or pkg_delete).

SQLite would make this job easier in some ways, but hurt in others. You could use queries, which would make what was going on much more obvious, but then it would be slower. You also have to fight the anti-SQLite lobby.

Ignore people who talk about disk space on embedded systems. They're just trolling. ;-) If they don't have the disk space for /var/db/pkg, then they don't have the space for the ports themselves. Besides, removing the redundant copy of +MTREE_DIRS for each package would save more space (and is as easy as checking the CVS revision of the new file in each package and updating a single master copy (even /etc/mtree/BSD.local.dist), if it is newer).

With regards to locking, which someone raised, the package tools are a long way away from parallel modifications to /var/db/pkg. People are talking about parallel ports builds, but those have small windows where the package database is being updated, and those updates need to be sync'ed via a lock.

Regards,
-Jeremy

--
FreeBSD - Because the best things in life are free...
http://www.freebsd.org/
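The two in-memory structures described above (the name/origin pairs and the dependency graph) might look something like this. It is a sketch in Python rather than the C the message recommends, and the function names are made up for the illustration. It assumes +CONTENTS carries `@comment ORIGIN:` and `@pkgdep` lines, and it rebuilds everything from those files alone.

```python
import os
from collections import defaultdict


def load_package_graph(pkg_db_dir):
    """Build name<->origin maps and the dependency graph from +CONTENTS."""
    name_to_origin = {}
    origin_to_name = {}
    requires = defaultdict(set)     # pkg_name -> packages it depends on
    required_by = defaultdict(set)  # pkg_name -> packages that depend on it
    for name in sorted(os.listdir(pkg_db_dir)):
        path = os.path.join(pkg_db_dir, name, "+CONTENTS")
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            for raw in fh:
                line = raw.strip()
                if line.startswith("@comment ORIGIN:"):
                    origin = line.split(":", 1)[1]
                    name_to_origin[name] = origin
                    origin_to_name[origin] = name
                elif line.startswith("@pkgdep "):
                    dep = line.split(None, 1)[1]
                    requires[name].add(dep)
                    required_by[dep].add(name)
    return name_to_origin, origin_to_name, requires, required_by


def dependents_closure(required_by, name):
    """All packages that directly or indirectly depend on `name`."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for parent in required_by.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The required_by map is the in-memory equivalent of the +REQUIRED_BY files, and dependents_closure is the kind of query that something like pkg_delete -r would need.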
Re: Adding .db support to pkg_tools
On Mon, May 12, 2008 at 05:12:56PM +0200, Anders Nore wrote:
> One of the reasons for using BDB is that it is in the base system, SQLite
> however is not.

I'm aware of that. But I believe that the pain and suffering of importing and maintaining SQLite in the base (that is, the cost) is outweighed by its richer feature set and higher-level API, facilitating easier tools development.

--
Jos Backus
jos at catnook.com
Re: Adding .db support to pkg_tools
On Mon, May 12, 2008 at 08:47:53AM -0700, Jeremy Chadwick wrote:
> Secondly, the following FAQ entry and documentation from Mozilla is of
> concern, specifically the last paragraph of the FAQ entry, since there
> is ongoing work in the ports collection to support parallel building,
> which would surely use fork() or locks):

I don't think this is applicable.

> Thirdly, there is the concern of disk space. There are a substantial
> number of users who have FreeBSD on embedded systems or systems with
> little or no space, who use binary packages (pkg_add -r), which of
> course still makes use of /var/db/pkg. SQLite DBs grow over time, and
> don't shrink in size unless you use auto-vacuum mode or execute the
> VACUUM query:

It is trivial to recreate the database, so I don't think this is a concern either. I should point out again that it will result in a non-trivial increase in size.

> Fourthly, it doesn't sound as if storage SQLite databases on an NFS
> mount is a good idea, and I've a feeling some people have /var NFS
> mounted (diskless systems for example):

Same applies to BDB and basically anything that depends on fcntl to work. Basically, as long as you have /var per host (you don't share /var, do you?), it will just work as the *local* locks are working. The problem with NFS comes from the fact that multiple hosts might not share them.

Joerg
Re: Adding .db support to pkg_tools
On Mon, May 12, 2008 at 05:12:56PM +0200, Anders Nore wrote:
> On Sun, 11 May 2008 21:07:41 +0200, Jos Backus <[EMAIL PROTECTED]> wrote:
>
>> On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
>>> +1. BDB is quite easy to corrupt...
>>
>> If we're going to use a binary file format, please consider using SQLite
>> instead. It has the right license, a nice API (transactions!) and is
>> robust enough for yum to use it for a similar purpose.
>>
>
> One of the reasons for using BDB is that it is in the base system, SQLite
> however is not.

Interesting discussion, and there's some irony in it (SQLite uses a very BDB-like storage medium, despite being a relational database system).

SQLite's license is compatible/friendly, its API is decent, the source is fairly small and its binary footprint (disk and memory both) is small, and it includes tools for dumping databases to ASCII so developers aren't forced to write DB dumping tools and users don't have to resort to using hexdump.

A few things worth pointing out:

With regards to DB1.x being easily corrupted, this is true, and I have heard claims of it from a couple developer-folk I worked with many years ago at Best Internet. On the other hand, SQLite appears to have a history of this as well, although those bugs have been fixed each and every time:

http://www.sqlite.org/cvstrac/wiki?p=DatabaseCorruption

Secondly, the following FAQ entry and documentation from Mozilla is of concern, specifically the last paragraph of the FAQ entry, since there is ongoing work in the ports collection to support parallel building (which would surely use fork() or locks):

http://www.sqlite.org/faq.html#q6
http://developer.mozilla.org/en/docs/Storage#SQLite_Locking

Thirdly, there is the concern of disk space. There are a substantial number of users who have FreeBSD on embedded systems or systems with little or no space, who use binary packages (pkg_add -r), which of course still makes use of /var/db/pkg. SQLite DBs grow over time, and don't shrink in size unless you use auto-vacuum mode or execute the VACUUM query:

http://www.sqlite.org/faq.html#q12

Fourthly, it doesn't sound as if storing SQLite databases on an NFS mount is a good idea, and I've a feeling some people have /var NFS mounted (diskless systems for example):

http://www.sqlite.org/faq.html#q5

Finally, there's the concern about the SQLite file format changing. Sometimes the changes are backwards-compatible, other times not.

http://www.sqlite.org/formatchng.html
http://www.sqlite.org/fileformat.html

Just stuff to consider.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
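The disk-space point (a SQLite file does not shrink after deletes unless VACUUM is run) is easy to check with a throwaway database. The following is a small experiment using Python's sqlite3 module, not anything from the package tools; the table and the row contents are invented for the test.

```python
import os
import sqlite3
from contextlib import closing


def sizes_around_vacuum(db_path, rows=2000):
    """Return file sizes (after insert, after delete, after VACUUM)."""
    # isolation_level=None puts the connection in autocommit mode, so
    # VACUUM is never issued inside an implicit transaction.
    with closing(sqlite3.connect(db_path, isolation_level=None)) as db:
        db.execute("CREATE TABLE files (path TEXT, pkg TEXT)")
        db.execute("BEGIN")
        db.executemany(
            "INSERT INTO files VALUES (?, ?)",
            ((f"/usr/local/share/example/{i:06d}" + "x" * 200, "example-1.0")
             for i in range(rows)),
        )
        db.execute("COMMIT")
        full = os.path.getsize(db_path)
        db.execute("DELETE FROM files")   # pages go to the free list
        after_delete = os.path.getsize(db_path)
        db.execute("VACUUM")              # rewrites the file without them
        after_vacuum = os.path.getsize(db_path)
    return full, after_delete, after_vacuum
```

With the default settings (auto-vacuum off) the file should only get smaller at the VACUUM step. Since the cache can be rebuilt from the flat files anyway, deleting and recreating it would achieve the same result.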
Re: Adding .db support to pkg_tools
On Sun, 11 May 2008 21:07:41 +0200, Jos Backus <[EMAIL PROTECTED]> wrote:

> On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
>> +1. BDB is quite easy to corrupt...
>
> If we're going to use a binary file format, please consider using SQLite
> instead. It has the right license, a nice API (transactions!) and is
> robust enough for yum to use it for a similar purpose.

One of the reasons for using BDB is that it is in the base system, SQLite however is not.
Re: Adding .db support to pkg_tools
On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
> +1. BDB is quite easy to corrupt...

If we're going to use a binary file format, please consider using SQLite instead. It has the right license, a nice API (transactions!) and is robust enough for yum to use it for a similar purpose.

--
Jos Backus
jos at catnook.com
Re: Adding .db support to pkg_tools
On May 9, 2008, at 5:43 AM, Joerg Sonnenberger wrote:

> On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
>> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
>> pkg_info, etc.) as part of SoC 2008. The database API used is
>> BerkeleyDB that comes with the base system (/usr/src/include/db.h).
>> BerkeleyDB is not your typical relational db, and can only save
>> key/value pairs. The way I'm thinking of storing information to the
>> .db is to name the keys as the directory names in /var/db/pkg. And
>> save the +* files in the directories to the value element in the db,
>> separated with a special character or similar.
>
> As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would
> *strongly* advise you against storing the files only in a bdb file.
> The chance of major and complete corruption with bdb185 is high,
> consider pulling the plug in the middle of a long update.

+1. BDB is quite easy to corrupt...

-Garrett
Re: Adding .db support to pkg_tools
On Fri, May 09, 2008 at 07:54:40PM +0200, Anders Nore wrote:
> You are probably right, but how would you store the key's? Is storing the
> key as e.g., 'portname-1.2_3+CONTENT' a good solution?

I'd just use a different db file.

I am not sure how much the following applies to FreeBSD as pkg_install has diverged a lot. The most expensive operations during pkg_add and pkg_info are scans for conflicts (explicit via @pkgcfl or implicit due to overlapping file lists) as they need to compare the to-be-installed package with all existing ones. After that come directory scans to resolve dependencies. Everything else is really just "open this small file and extract some data from it", where small usually means less than one block. Putting that into a database can help or not, but I don't think it is relevant.

So the most important things to support are a btree of all files (implemented in NetBSD/pkgsrc) and a btree of all @pkgcfl/@pkgdb (not implemented yet).

Joerg
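The "index of all files" used for implicit conflict detection can be illustrated with a few lines of Python. This is a sketch only: a plain dict stands in for the btree, the function names are hypothetical, and the +CONTENTS handling is reduced to `@cwd` lines plus file entries (every other directive is skipped).

```python
import posixpath


def files_from_contents(lines):
    """Yield absolute paths for the file entries in a +CONTENTS listing."""
    prefix = "/"
    for raw in lines:
        line = raw.strip()
        if line.startswith("@cwd "):
            prefix = line.split(None, 1)[1]
        elif line and not line.startswith("@"):
            yield posixpath.join(prefix, line)


def build_file_index(installed):
    """installed: dict of package name -> iterable of +CONTENTS lines."""
    index = {}
    for pkg, lines in installed.items():
        for path in files_from_contents(lines):
            index[path] = pkg
    return index


def find_conflicts(index, new_pkg_lines):
    """Map each file of the new package that is already owned to its owner."""
    return {path: index[path]
            for path in files_from_contents(new_pkg_lines)
            if path in index}
```

With such an index, checking a to-be-installed package costs one lookup per file instead of a scan over every installed package's file list.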
Re: Adding .db support to pkg_tools
On Fri, 09 May 2008 19:06:33 +0200, Joerg Sonnenberger <[EMAIL PROTECTED]> wrote:

> On Fri, May 09, 2008 at 06:50:10PM +0200, Anders Nore wrote:
>> Yes that would probably be bad for the database, but I'm sure one can
>> manage to get around this problem by copying it before changing the db
>> and delete the copy if it doesn't fail. At the next time executed it
>> will check for a copy, use that and assume that the last run was
>> unsuccessful.
>
> /var/db/pkg contains 10MB for the various packages installed on my
> laptop and 10MB for the cache of +CONTENTS. You don't want to copy that
> around all the time.
>
>>> Secondly, I would also advise against just storing all meta data in a
>>> single key/value pair. For example, +CONTENTS can be extremely large.
>>> Check texmf for a good example.
>>
>> When it comes to storing large values in a key/value pair, I think
>> that's what bdb was designed for, handling large amounts of data (in
>> the order of gigabytes even in keys) fast.
>
> No, actually that is exactly what it was *not* designed for. Having
> billions of keys is fine, but data that exceeds the size of a database
> page is going to slow things down. While it might not be a problem if
> you are copying the data to a new file anyway, it also means that
> fragmentation in the database will be more problematic.
>
> My main point is that for the interesting operations you want to
> actually look up with fine-grained keys and that's what is not possible
> if you store the meta data as a blob. In fact, storing the meta data as
> a blob is not faster than just using the filesystem.

You are probably right, but how would you store the keys? Is storing the key as e.g., 'portname-1.2_3+CONTENT' a good solution?
Re: Adding .db support to pkg_tools
According to Anders Nore:
> with the base system (/usr/src/include/db.h). BerkeleyDB is not your
> typical relational db, and can only save key/value pairs. The way I'm
> thinking of storing information to the .db is to name the keys as the
> directory names in /var/db/pkg. And save the +* files in the directories to
> the value element in the db, separated with a special character or similar.

Please make it compatible with the existing infrastructure in /var/db/pkg maintained by portinstall/portupgrade. If it can be improved as a side-effect, that would be even better.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED]
Darwin sidhe.keltia.net Version 9.2.0: Tue Feb 5 16:13:22 PST 2008; i386
Re: Adding .db support to pkg_tools
On Fri, May 09, 2008 at 06:50:10PM +0200, Anders Nore wrote:
> Yes that would probably be bad for the database, but I'm sure one can
> manage to get around this problem by copying it before changing the db and
> delete the copy if it doesn't fail. At the next time executed it will check
> for a copy, use that and assume that the last run was unsuccessful.

/var/db/pkg contains 10MB for the various packages installed on my laptop and 10MB for the cache of +CONTENTS. You don't want to copy that around all the time.

>> Secondly, I would also advise against just storing all meta data in a
>> single key/value pair. For example, +CONTENTS can be extremely large.
>> Check texmf for a good example.
>
> When it comes to storing large values in a key/value pair, I think that's
> what bdb was designed for, handling large amounts of data (in the order of
> gigabytes even in keys) fast.

No, actually that is exactly what it was *not* designed for. Having billions of keys is fine, but data that exceeds the size of a database page is going to slow things down. While it might not be a problem if you are copying the data to a new file anyway, it also means that fragmentation in the database will be more problematic.

My main point is that for the interesting operations you want to actually look up with fine-grained keys and that's what is not possible if you store the meta data as a blob. In fact, storing the meta data as a blob is not faster than just using the filesystem.

Joerg
Re: Adding .db support to pkg_tools
On Fri, 09 May 2008 14:43:08 +0200, Joerg Sonnenberger <[EMAIL PROTECTED]> wrote:

> On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
>> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
>> pkg_info, etc.) as part of SoC 2008. The database API used is
>> BerkeleyDB that comes with the base system (/usr/src/include/db.h).
>> BerkeleyDB is not your typical relational db, and can only save
>> key/value pairs. The way I'm thinking of storing information to the
>> .db is to name the keys as the directory names in /var/db/pkg. And
>> save the +* files in the directories to the value element in the db,
>> separated with a special character or similar.
>
> As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would
> *strongly* advise you against storing the files only in a bdb file.
> The chance of major and complete corruption with bdb185 is high,
> consider pulling the plug in the middle of a long update.

Yes that would probably be bad for the database, but I'm sure one can manage to get around this problem by copying it before changing the db and deleting the copy if the update doesn't fail. The next time it is executed it will check for a copy, use that and assume that the last run was unsuccessful.

Note that this is not a replacement for the existing tools, although it might be in the future. It's a standalone replica of the old tools with bdb support and other smaller improvements. And for compatibility reasons the tools will use a hybrid method (saving to both .db and flat db), but reading and querying will only be done on the .db file.

> Secondly, I would also advise against just storing all meta data in a
> single key/value pair. For example, +CONTENTS can be extremely large.
> Check texmf for a good example.

When it comes to storing large values in a key/value pair, I think that's what bdb was designed for, handling large amounts of data (in the order of gigabytes even in keys) fast.

-Anders
Re: Adding .db support to pkg_tools
On Friday 09 May 2008 13:52:46 Anders Nore wrote:
> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
> pkg_info, etc.) as part of SoC 2008.

Is this gonna be optional?

> One problem lies with the +* files which are scripts (e.g., +INSTALL,
> +DEINSTALL). I've gotten some input that it's bad to save scripts in the
> db, but if it's not going to be saved there, then where? Isn't it possible
> to execute a script without saving a file to disk? Like using "sh -c
> 'string'". In my personal opinion it should not be a hybrid solution where
> you save the script files in an old fashion way, for example
> /var/db/pkg/someport-1.2_1/+INSTALL, and the rest of the information lies
> in the .db file. Because then you have redundancy and that could lead to
> inconsistencies.

I don't know what reasons people have for saying it's bad to save scripts in the db, but for me it would be that I can't inspect and/or modify them. If there were support in the ports for PKG(DE)INSTALL_LOCAL, though, I suppose I could do without the modification part.

--
Mel
Re: Adding .db support to pkg_tools
On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
> I'm working on adding .db support to the pkg_tools (i.e. pkg_add, pkg_info,
> etc.) as part of SoC 2008. The database API used is BerkeleyDB that comes
> with the base system (/usr/src/include/db.h). BerkeleyDB is not your
> typical relational db, and can only save key/value pairs. The way I'm
> thinking of storing information to the .db is to name the keys as the
> directory names in /var/db/pkg. And save the +* files in the directories to
> the value element in the db, separated with a special character or similar.

As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would *strongly* advise you against storing the files only in a bdb file. The chance of major and complete corruption with bdb185 is high, consider pulling the plug in the middle of a long update.

Secondly, I would also advise against just storing all meta data in a single key/value pair. For example, +CONTENTS can be extremely large. Check texmf for a good example.

Joerg