Re: Adding .db support to pkg_tools

2008-05-14 Thread Jeremy Lea
Hi,

On Wed, May 14, 2008 at 05:55:33PM +0930, Daniel O'Connor wrote:
> So long as you can tell it is corrupted..
> It's also a drag from a user POV when the tool crashes because the DB is 
> hosed (seen in portupgrade a number of times)

The simplest way to achieve some measure of protection (provided the
database itself is not buggy) is to use a single lock file in
/var/db/pkg.  The ports tools can create and lock this file as needed,
and remove it when they are done.  If the file exists, but a tool is
still able to obtain an exclusive lock on it, then it's a dangling lock.
In that case, the previous tool crashed, and the database caches should
be rebuilt.
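
A minimal C sketch of the dangling-lock check described above. It assumes flock(2) on the lock file and assumes a tool unlinks the file on a clean exit (the message does not say how a clean exit is marked). The enum and function names are hypothetical, and the lock file path is left to the caller. Because a clean holder unlinks the file just before unlocking, the sketch re-checks after locking that the path still names the file it locked.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

enum pkglock_state { PKGLOCK_FRESH, PKGLOCK_DANGLING, PKGLOCK_BUSY, PKGLOCK_ERROR };

/*
 * Take the exclusive lock without blocking.  PKGLOCK_DANGLING means the
 * file was left behind by a tool that died, so the caches should be rebuilt.
 */
static enum pkglock_state pkglock_acquire(const char *path, int *fd_out)
{
    for (;;) {
        int existed = 0;
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);

        if (fd == -1 && errno == EEXIST) {
            existed = 1;
            fd = open(path, O_RDWR);
            if (fd == -1 && errno == ENOENT)
                continue;           /* removed between our two open() calls */
        }
        if (fd == -1)
            return PKGLOCK_ERROR;

        if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
            int err = errno;        /* close() may clobber errno */
            close(fd);
            return err == EWOULDBLOCK ? PKGLOCK_BUSY : PKGLOCK_ERROR;
        }

        /* A clean holder unlinks the file just before unlocking it, so only
           trust the lock if path still names the file we locked. */
        struct stat locked, current;
        if (fstat(fd, &locked) == 0 && stat(path, &current) == 0 &&
            locked.st_dev == current.st_dev && locked.st_ino == current.st_ino) {
            *fd_out = fd;
            return existed ? PKGLOCK_DANGLING : PKGLOCK_FRESH;
        }
        close(fd);                  /* lost that race; start over */
    }
}

/* Clean exit: remove the file while still holding the lock, then unlock. */
static void pkglock_release(const char *path, int fd)
{
    assert(fd >= 0);
    unlink(path);
    flock(fd, LOCK_UN);
    close(fd);
}
```

A caller that gets PKGLOCK_BUSY would typically sleep and retry; PKGLOCK_DANGLING is the cue to rebuild the caches before continuing.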

This is not foolproof, but it would reduce the number of FAQs on the
mailing lists.

This kills two birds with one stone, since it would serialize access to
/var/db/pkg, so that various tools don't stomp on one another.  An
exclusive lock can be held by pkg_create, pkg_add and pkg_delete as
needed.  pkg_info and pkg_version can settle for a shared lock: they
just continue if they can get one, or block if the file is exclusively
locked.
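
A sketch of the reader/writer split using flock(2): writers ask for LOCK_EX, readers for LOCK_SH. The role enum and helper names are illustrative only. Note that flock(2) converts an existing lock by releasing it and then taking the new one, so turning an exclusive lock into a shared one is not atomic.

```c
#include <assert.h>
#include <errno.h>      /* EWOULDBLOCK, for callers using nonblock */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical roles: pkg_create/pkg_add/pkg_delete write, pkg_info/pkg_version read. */
enum pkgdb_role { PKGDB_READER, PKGDB_WRITER };

static int pkgdb_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/*
 * Writers take LOCK_EX, readers LOCK_SH: any number of readers may hold
 * the lock together, a writer excludes everyone.  flock() blocks until the
 * lock is granted unless nonblock is set, in which case it fails with
 * errno == EWOULDBLOCK.
 */
static int pkgdb_lock(int fd, enum pkgdb_role role, int nonblock)
{
    int op = (role == PKGDB_WRITER) ? LOCK_EX : LOCK_SH;

    assert(fd >= 0);
    if (nonblock)
        op |= LOCK_NB;
    return flock(fd, op);
}

/* Closing the only descriptor for this open also releases its flock() lock. */
static void pkgdb_close(int fd)
{
    close(fd);
}
```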

One problem is that pkg_add calls itself recursively (to install
dependencies), so it would have to drop the lock around those calls.
That could be hard.  If I recall correctly, it was not that hard to
remove this recursive behaviour.
Also, since bsd.port.mk grubs around in /var/db/pkg, it would need to
learn to participate in the locking.

Regards,
  -Jeremy

-- 
FreeBSD - Because the best things in life are free...
   http://www.freebsd.org/
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Adding .db support to pkg_tools

2008-05-14 Thread Daniel O'Connor
On Tue, 13 May 2008, Tim Kientzle wrote:
> > I think this is a really bad idea.  The problem with the tools is
> > not with the files.  It is that the files need to be parsed on each
> > run, often recursively, and your solution would not help at all.
>
> Parsing one file isn't expensive; parsing several hundred files
> to find one bit of information is expensive.
>
> > The database(s) should just be a cache of the information stored in
> > the files.
>
> Bingo!  As long as the .db version can be easily recreated
> from scratch from the master data stored in the same files
> as always, it doesn't really matter if the BDB is occasionally
> corrupted, as long as it can be rebuilt fairly quickly.

So long as you can tell it is corrupted..
It's also a drag from a user POV when the tool crashes because the DB is 
hosed (seen in portupgrade a number of times)

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: Adding .db support to pkg_tools

2008-05-12 Thread Tim Kientzle

> I think this is a really bad idea.  The problem with the tools is not
> with the files.  It is that the files need to be parsed on each run,
> often recursively, and your solution would not help at all.

Parsing one file isn't expensive; parsing several hundred files
to find one bit of information is expensive.

> The database(s) should just be a cache of the information stored in the
> files.

Bingo!  As long as the .db version can be easily recreated
from scratch from the master data stored in the same files
as always, it doesn't really matter if the BDB is occasionally
corrupted, as long as it can be rebuilt fairly quickly.

So, the key points are:
  * Use the .db file for lookups that can't be done quickly
    against the existing data.  In particular, look for operations
    where the pkg tools have to scan a lot of files to verify a single
    fact.
  * Ensure the .db file can be deleted and rebuilt from scratch
    from the data in the regular files.
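
A sketch of the "delete and rebuild from scratch" property: cache records are re-derived from each package's +CONTENTS text. Only two packing-list directives are interpreted (@cwd for the install prefix and @comment ORIGIN: for the port origin); every other line starting with '@' is skipped and the remaining lines are treated as installed files. The function name and fixed-size buffers are assumptions for illustration, and real +CONTENTS files carry more directives than this handles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_MAXLEN 512     /* arbitrary bound for the sketch */

/*
 * Re-derive cache records from one package's +CONTENTS text.  Fills
 * origin from "@comment ORIGIN:", prefixes each file with the current
 * "@cwd", skips all other '@' directives, and returns the number of
 * files found, or -1 if a line or the files array overflows.
 */
static int parse_contents(const char *text, char origin[PATH_MAXLEN],
                          char files[][PATH_MAXLEN], int max)
{
    char cwd[PATH_MAXLEN] = "";
    char line[PATH_MAXLEN];
    int n = 0;

    assert(text != NULL && origin != NULL);
    origin[0] = '\0';
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");

        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += len + (text[len] == '\n');   /* step past the newline, if any */

        if (strncmp(line, "@cwd ", 5) == 0)
            snprintf(cwd, sizeof cwd, "%s", line + 5);
        else if (strncmp(line, "@comment ORIGIN:", 16) == 0)
            snprintf(origin, PATH_MAXLEN, "%s", line + 16);
        else if (line[0] != '@' && line[0] != '\0') {
            if (n == max)
                return -1;
            snprintf(files[n++], PATH_MAXLEN, "%s/%s", cwd, line);
        }
    }
    return n;
}
```

A rebuild would call something like this for every package directory under /var/db/pkg and load the results into a fresh cache, which is what makes deleting the cache file safe.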

Tim Kientzle


Re: Adding .db support to pkg_tools

2008-05-12 Thread Jeremy Lea
Hi,

On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
> I'm working on adding .db support to the pkg_tools

I hope that you know that you're stepping into a hotly debated part of
the project...  Good luck.  My advice to you is to ignore any calls for
a 'complete rewrite' and to focus on releasing small patches from your
project rather than one big patch at the end.  Back each patch up with
performance numbers (I suggest pkg_add gnome2-x (from a local package
source), pkg_version and pkg_delete -r glib-2.x).

> The way I'm thinking of storing information to the .db is to name the
> keys as the directory names in /var/db/pkg. And save the +* files in
> the directories to the value element in the db, separated with a
> special character or similar.

I think this is a really bad idea.  The problem with the tools is not
with the files.  It is that the files need to be parsed on each run,
often recursively, and your solution would not help at all.

The database(s) should just be a cache of the information stored in the
files.  This is what the existing databases used by portupgrade and
NetBSD's tools do.  They store the files and which ports installed them,
so that it is easy to do conflict resolution, and some pkg_info
operations.

In addition, what you need is a cache which stores {pkg_name,pkg_origin}
pairs, where both are unique keys with a 1:1 relationship, and then a
{pkg_name,pkg_dep_name} directed graph, which stores the dependency
network between ports (also replacing +REQUIRED_BY).  Both can be
reconstructed if needed from the +CONTENTS files.  You could do both in
BDB, but you'll be better off loading both into memory completely (as
needed), using C to do the queries, and just maintaining the database
when things change (on pkg_add or pkg_delete).
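
The two tables suggested above can be held in memory and queried in C roughly as follows. This is a sketch under assumptions: a fixed upper bound on packages, an adjacency matrix for the dependency graph, an acyclic graph, and hypothetical names throughout. cache_add_pkg() enforces the 1:1 name/origin relationship, and delete_order() walks the reverse edges (the information +REQUIRED_BY holds today) to produce an order in which pkg_delete -r could remove a package and everything that depends on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXPKG 64   /* arbitrary bound for the sketch */

/* In-memory cache, rebuilt from +CONTENTS whenever it is missing or stale. */
struct pkgcache {
    size_t n;
    const char *name[MAXPKG];          /* installed package name */
    const char *origin[MAXPKG];        /* port origin, 1:1 with name */
    unsigned char dep[MAXPKG][MAXPKG]; /* dep[a][b] != 0: a depends on b */
};

/* Index of key in one column of the cache, or -1. */
static int cache_find(const char *const *col, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(col[i], key) == 0)
            return (int)i;
    return -1;
}

/* Add a package; both name and origin must be new (the 1:1 rule). */
static int cache_add_pkg(struct pkgcache *c, const char *name, const char *origin)
{
    if (c->n == MAXPKG || cache_find(c->name, c->n, name) >= 0 ||
        cache_find(c->origin, c->n, origin) >= 0)
        return -1;
    c->name[c->n] = name;
    c->origin[c->n] = origin;
    return (int)c->n++;
}

static void cache_add_dep(struct pkgcache *c, int pkg, int needs)
{
    assert(pkg >= 0 && needs >= 0 && (size_t)pkg < c->n && (size_t)needs < c->n);
    c->dep[pkg][needs] = 1;
}

/* Depth-first over reverse edges; emit a package only after its dependents. */
static void visit(const struct pkgcache *c, int t, unsigned char *seen,
                  int *out, size_t *k)
{
    seen[t] = 1;
    for (size_t a = 0; a < c->n; a++)
        if (c->dep[a][t] && !seen[a])
            visit(c, (int)a, seen, out, k);
    out[(*k)++] = t;
}

/* pkg_delete -r order for target: dependents first, target last. */
static size_t delete_order(const struct pkgcache *c, int target, int *out)
{
    unsigned char seen[MAXPKG] = {0};
    size_t k = 0;

    visit(c, target, seen, out, &k);
    return k;
}
```

For example, with gtk depending on glib and gimp depending on both, delete_order() for glib yields gimp, gtk, glib.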

SQLite would make this job easier in some ways, but hurt in others.  You
could use queries, which would make what was going on much more obvious,
but then it would be slower.  You also have to fight the anti-SQLite
lobby.

Ignore people who talk about disk space on embedded systems.  They're
just trolling. ;-) If they don't have the disk space for /var/db/pkg,
then they don't have the space for the ports themselves.  Besides,
removing the redundant copy of +MTREE_DIRS for each package would save
more space (and is as easy as checking the CVS revision of the new file
in each package and updating a single master copy (even
/etc/mtree/BSD.local.dist), if it is newer).

With regards to locking, which someone raised, the package tools are a
long way away from parallel modifications to /var/db/pkg.  People are
talking about parallel ports builds, but those have small windows where
the package database is being updated, and those updates need to be
sync'ed via a lock.

Regards,
  -Jeremy
 


Re: Adding .db support to pkg_tools

2008-05-12 Thread Jos Backus
On Mon, May 12, 2008 at 05:12:56PM +0200, Anders Nore wrote:
> One of the reasons for using BDB is that it is in the base system, SQLite 
> however is not.

I'm aware of that. But I believe that the pain and suffering of importing and
maintaining SQLite in the base (that is, the cost) is outweighed by its richer
feature set and higher-level API, facilitating easier tools development.

-- 
Jos Backus
jos at catnook.com


Re: Adding .db support to pkg_tools

2008-05-12 Thread Joerg Sonnenberger
On Mon, May 12, 2008 at 08:47:53AM -0700, Jeremy Chadwick wrote:
> Secondly, the following FAQ entry and documentation from Mozilla are of
> concern, specifically the last paragraph of the FAQ entry, since there
> is ongoing work in the ports collection to support parallel building,
> which would surely use fork() or locks:

I don't think this is applicable.

> Thirdly, there is the concern of disk space.  There are a substantial
> number of users who have FreeBSD on embedded systems or systems with
> little or no space, who use binary packages (pkg_add -r), which of
> course still makes use of /var/db/pkg.  SQLite DBs grow over time, and
> don't shrink in size unless you use auto-vacuum mode or execute the
> VACUUM query:

It is trivial to recreate the database, so I don't think this is a
concern either. I should point out again that it will result in a
non-trivial increase in size.

> Fourthly, it doesn't sound as if storing SQLite databases on an NFS
> mount is a good idea, and I've a feeling some people have /var NFS
> mounted (diskless systems for example):

The same applies to BDB and basically anything that depends on fcntl
locking to work. As long as you have a /var per host (you don't share
/var, do you?), it will just work, as the *local* locks are working. The
problem with NFS comes from the fact that multiple hosts might not share
them.

Joerg


Re: Adding .db support to pkg_tools

2008-05-12 Thread Jeremy Chadwick
On Mon, May 12, 2008 at 05:12:56PM +0200, Anders Nore wrote:
> On Sun, 11 May 2008 21:07:41 +0200, Jos Backus <[EMAIL PROTECTED]> wrote:
>
>> On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
>>> +1. BDB is quite easy to corrupt...
>>
>> If we're going to use a binary file format, please consider using SQLite
>> instead. It has the right license, a nice API (transactions!) and is
>> robust enough for yum to use it for a similar purpose.
>>
>
> One of the reasons for using BDB is that it is in the base system, SQLite 
> however is not.

Interesting discussion, and there's some irony in it (SQLite uses a very
BDB-like storage medium, despite being a relational database system).

SQLite's license is compatible/friendly, its API is decent, the source
is fairly small, its binary footprint (disk and memory both) is small,
and it includes tools for dumping databases to ASCII, so developers
aren't forced to write DB dumping tools and users don't have to resort
to using hexdump.

A few things worth pointing out:

Firstly, with regard to DB 1.x being easily corrupted, this is true, and
I have heard claims of it from a couple of developer-folk I worked with
many years ago at Best Internet.  On the other hand, SQLite appears to
have a history of this as well, although those bugs have been fixed each
and every time:

http://www.sqlite.org/cvstrac/wiki?p=DatabaseCorruption

Secondly, the following FAQ entry and documentation from Mozilla are of
concern, specifically the last paragraph of the FAQ entry, since there
is ongoing work in the ports collection to support parallel building,
which would surely use fork() or locks:

http://www.sqlite.org/faq.html#q6
http://developer.mozilla.org/en/docs/Storage#SQLite_Locking

Thirdly, there is the concern of disk space.  There are a substantial
number of users who have FreeBSD on embedded systems or systems with
little or no space, who use binary packages (pkg_add -r), which of
course still makes use of /var/db/pkg.  SQLite DBs grow over time, and
don't shrink in size unless you use auto-vacuum mode or execute the
VACUUM query:

http://www.sqlite.org/faq.html#q12

Fourthly, it doesn't sound as if storing SQLite databases on an NFS
mount is a good idea, and I've a feeling some people have /var NFS
mounted (diskless systems for example):

http://www.sqlite.org/faq.html#q5

Finally, there's the concern about the SQLite file format changing.
Sometimes the changes are backwards-compatible, other times not.

http://www.sqlite.org/formatchng.html
http://www.sqlite.org/fileformat.html

Just stuff to consider.

-- 
| Jeremy Chadwick    jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: Adding .db support to pkg_tools

2008-05-12 Thread Anders Nore

On Sun, 11 May 2008 21:07:41 +0200, Jos Backus <[EMAIL PROTECTED]> wrote:

> On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
>> +1. BDB is quite easy to corrupt...
>
> If we're going to use a binary file format, please consider using SQLite
> instead. It has the right license, a nice API (transactions!) and is
> robust enough for yum to use it for a similar purpose.

One of the reasons for using BDB is that it is in the base system, SQLite
however is not.



Re: Adding .db support to pkg_tools

2008-05-11 Thread Jos Backus
On Sun, May 11, 2008 at 06:38:25AM -0700, Garrett Cooper wrote:
> +1. BDB is quite easy to corrupt...

If we're going to use a binary file format, please consider using SQLite
instead. It has the right license, a nice API (transactions!) and is robust
enough for yum to use it for a similar purpose.



Re: Adding .db support to pkg_tools

2008-05-11 Thread Garrett Cooper

On May 9, 2008, at 5:43 AM, Joerg Sonnenberger wrote:

> On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
>> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
>> pkg_info, etc.) as part of SoC 2008. The database api used is
>> BerkeleyDB that comes with the base system (/usr/src/include/db.h).
>> BerkeleyDB is not your typical relational db, and can only save
>> key/value pairs. The way I'm thinking of storing information to the .db
>> is to name the keys as the directory names in /var/db/pkg. And save the
>> +* files in the directories to the value element in the db, separated
>> with a special character or similar.
>
> As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would
> *strongly* advise you against storing the files only in a bdb file.
> The chance of major and complete corruption with bdb185 is high;
> consider pulling the plug in the middle of a long update.

+1. BDB is quite easy to corrupt...
-Garrett


Re: Adding .db support to pkg_tools

2008-05-09 Thread Joerg Sonnenberger
On Fri, May 09, 2008 at 07:54:40PM +0200, Anders Nore wrote:
> You are probably right, but how would you store the keys? Is storing the
> key as e.g., 'portname-1.2_3+CONTENT' a good solution?

I'd just use a different db file. I am not sure how much the following
applies to FreeBSD, as its pkg_install has diverged a lot. The most
expensive operations during pkg_add and pkg_info are scans for conflicts
(explicit via @pkgcfl or implicit due to overlapping file lists), as they
need to compare the to-be-installed package with all existing ones. After
that come directory scans to resolve dependencies. Everything else is
really just "open this small file and extract some data from it", where
small usually means less than one block. Putting that into a database may
or may not help, but I don't think it is relevant. So the most important
things to support are a btree of all files (implemented in NetBSD/pkgsrc)
and a btree of all @pkgcfl/@pkgdep entries (not implemented yet).
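
A sketch of the file index described above, using a sorted array with bsearch(3) as an in-memory stand-in for a BDB btree (this is not the pkgsrc implementation). There is one fine-grained record per installed file, so an implicit conflict check costs one O(log n) lookup per new file instead of a scan of every +CONTENTS. The record layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One fine-grained record per installed file: path -> owning package. */
struct file_rec {
    const char *path;
    const char *pkg;
};

static int rec_cmp(const void *a, const void *b)
{
    return strcmp(((const struct file_rec *)a)->path,
                  ((const struct file_rec *)b)->path);
}

/* Sort once, e.g. right after the cache has been rebuilt. */
static void file_index_build(struct file_rec *recs, size_t n)
{
    qsort(recs, n, sizeof recs[0], rec_cmp);
}

/* Owner of path, or NULL: O(log n) instead of reading every +CONTENTS. */
static const char *file_index_owner(const struct file_rec *recs, size_t n,
                                    const char *path)
{
    struct file_rec key = { path, NULL };
    const struct file_rec *hit;

    assert(path != NULL);
    hit = bsearch(&key, recs, n, sizeof recs[0], rec_cmp);
    return hit != NULL ? hit->pkg : NULL;
}

/* Implicit conflict check: first new file already owned by someone, or NULL. */
static const char *first_conflict(const struct file_rec *recs, size_t n,
                                  const char *const *newfiles, size_t m,
                                  const char **owner)
{
    for (size_t i = 0; i < m; i++) {
        const char *o = file_index_owner(recs, n, newfiles[i]);
        if (o != NULL) {
            *owner = o;
            return newfiles[i];
        }
    }
    return NULL;
}
```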

Joerg


Re: Adding .db support to pkg_tools

2008-05-09 Thread Anders Nore
On Fri, 09 May 2008 19:06:33 +0200, Joerg Sonnenberger
<[EMAIL PROTECTED]> wrote:

> On Fri, May 09, 2008 at 06:50:10PM +0200, Anders Nore wrote:
>> Yes that would probably be bad for the database, but I'm sure one can
>> manage to get around this problem by copying it before changing the db
>> and deleting the copy if the update doesn't fail. The next time the
>> tools are run, they will check for a copy, use that and assume that the
>> last run was unsuccessful.
>
> /var/db/pkg contains 10MB for the various packages installed on my
> laptop and 10MB for the cache of +CONTENTS. You don't want to copy that
> around all the time.
>
>>> Secondly, I would also advise against just storing all meta data in a
>>> single key/value pair. For example, +CONTENTS can be extremely large.
>>> Check texmf for a good example.
>>
>> When it comes to storing large values in a key/value pair, I think
>> that's what bdb was designed for, handling large amounts of data (on
>> the order of gigabytes, even in keys) fast.
>
> No, actually that is exactly what it was *not* designed for. Having
> billions of keys is fine, but data that exceeds the size of a database
> page is going to slow down. While it might not be a problem if you are
> copying the data to a new file anyway, it also means that fragmentation
> in the database will be more problematic.
>
> My main point is that for the interesting operations you want to
> actually look up with fine grained keys and that's what is not possible
> if you store the meta data as blob. In fact, storing the meta data as
> blob is not faster than just using the filesystem.

You are probably right, but how would you store the keys? Is storing the
key as e.g., 'portname-1.2_3+CONTENT' a good solution?



Re: Adding .db support to pkg_tools

2008-05-09 Thread Ollivier Robert
According to Anders Nore:
> with the base system (/usr/src/include/db.h). BerkeleyDB is not your
> typical relational db, and can only save key/value pairs. The way I'm 
> thinking of storing information to the .db is to name the keys as the 
> directory names in /var/db/pkg. And save the +* files in the directories to 
> the value element in the db, separated with a special character or similar.

Please make it compatible with the existing infrastructure in
/var/db/pkg maintained by portinstall/portupgrade.  If it can be improved
as a side-effect, that would be even better.
-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED]
Darwin sidhe.keltia.net Version 9.2.0: Tue Feb  5 16:13:22 PST 2008; i386



Re: Adding .db support to pkg_tools

2008-05-09 Thread Joerg Sonnenberger
On Fri, May 09, 2008 at 06:50:10PM +0200, Anders Nore wrote:
> Yes that would probably be bad for the database, but I'm sure one can
> manage to get around this problem by copying it before changing the db and
> deleting the copy if the update doesn't fail. The next time the tools are
> run, they will check for a copy, use that and assume that the last run was
> unsuccessful.

/var/db/pkg contains 10MB for the various packages installed on my
laptop and 10MB for the cache of +CONTENTS. You don't want to copy that
around all the time.

>> Secondly, I would also advise against just storing all meta data in a
>> single key/value pair. For example, +CONTENTS can be extremely large.
>> Check texmf for a good example.
>
> When it comes to storing large values in a key/value pair, I think that's
> what bdb was designed for, handling large amounts of data (on the order of
> gigabytes, even in keys) fast.

No, actually that is exactly what it was *not* designed for. Having
billions of keys is fine, but data that exceeds the size of a database
page is going to slow down. While it might not be a problem if you are
copying the data to a new file anyway, it also means that fragmentation
in the database will be more problematic.

My main point is that for the interesting operations you want to
actually look up with fine grained keys and that's what is not possible
if you store the meta data as blob. In fact, storing the meta data as
blob is not faster than just using the filesystem.

Joerg


Re: Adding .db support to pkg_tools

2008-05-09 Thread Anders Nore
On Fri, 09 May 2008 14:43:08 +0200, Joerg Sonnenberger
<[EMAIL PROTECTED]> wrote:

> On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
>> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
>> pkg_info, etc.) as part of SoC 2008. The database api used is
>> BerkeleyDB that comes with the base system (/usr/src/include/db.h).
>> BerkeleyDB is not your typical relational db, and can only save
>> key/value pairs. The way I'm thinking of storing information to the .db
>> is to name the keys as the directory names in /var/db/pkg. And save the
>> +* files in the directories to the value element in the db, separated
>> with a special character or similar.
>
> As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would
> *strongly* advise you against storing the files only in a bdb file.
> The chance of major and complete corruption with bdb185 is high;
> consider pulling the plug in the middle of a long update.


Yes that would probably be bad for the database, but I'm sure one can
manage to get around this problem by copying it before changing the db and
deleting the copy if the update doesn't fail. The next time the tools are
run, they will check for a copy, use that and assume that the last run was
unsuccessful.
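
The copy-before-update scheme proposed above can be sketched as follows. The names are hypothetical, and the backup is written under a temporary name and then renamed into place, so a crash during the copy itself cannot leave a half-written backup to be "restored". The cost Joerg raises in his reply still applies: the whole database is copied on every update.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst and fsync it; returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    ssize_t r;
    int in = open(src, O_RDONLY);
    int out;

    if (in == -1)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out == -1) {
        close(in);
        return -1;
    }
    while ((r = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)r) != r) {   /* short write: give up */
            r = -1;
            break;
        }
    }
    if (r == 0 && fsync(out) == -1)
        r = -1;
    close(in);
    if (close(out) == -1)
        r = -1;
    return r == 0 ? 0 : -1;
}

/* Before changing the db: make a complete backup, visible only when done. */
static int update_begin(const char *db, const char *bak)
{
    char tmp[1024];

    snprintf(tmp, sizeof tmp, "%s.tmp", bak);
    if (copy_file(db, tmp) == -1) {
        unlink(tmp);
        return -1;
    }
    return rename(tmp, bak);
}

/* After a successful update: the backup is no longer needed. */
static int update_commit(const char *bak)
{
    return unlink(bak);
}

/* At startup: a leftover backup means the last run died mid-update. */
static int update_recover(const char *db, const char *bak)
{
    assert(db != NULL && bak != NULL);
    if (access(bak, F_OK) == -1)
        return 0;                            /* nothing to recover */
    return rename(bak, db) == 0 ? 1 : -1;    /* 1: db restored from backup */
}
```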


Note that this is not a replacement for the existing tools, although it
might be in the future. It's a standalone replica of the old tools with
bdb support and other smaller improvements. And for compatibility reasons
the tools will use a hybrid method (saving to both .db and flat db), but
reading and querying will only be done on the .db file.

> Secondly, I would also advise against just storing all meta data in a
> single key/value pair. For example, +CONTENTS can be extremely large.
> Check texmf for a good example.

When it comes to storing large values in a key/value pair, I think that's
what bdb was designed for, handling large amounts of data (on the order of
gigabytes, even in keys) fast.


-Anders


Re: Adding .db support to pkg_tools

2008-05-09 Thread Mel
On Friday 09 May 2008 13:52:46 Anders Nore wrote:

> I'm working on adding .db support to the pkg_tools (i.e. pkg_add,
> pkg_info, etc.) as part of SoC 2008.

Is this gonna be optional?

> One problem lies with the +* files which are scripts (e.g., +INSTALL,
> +DEINSTALL). I've gotten some input that it's bad to save scripts in the
> db, but if it's not going to be saved there, then where? Isn't it possible
> to execute a script without saving a file to disk? Like using "sh -c
> 'string'". In my personal opinion it should not be a hybrid solution where
> you save the script files in an old fashion way, for example
> /var/db/pkg/someport-1.2_1/+INSTALL, and the rest of the information lies
> in the .db file. Because then you have redundancy and that could lead to
> inconsistencies.

Don't know what the reasons are for people to say it's bad to save scripts in 
the db, but for me it would be that I can't inspect and/or modify them, 
though if there were support in the ports for PKG(DE)INSTALL_LOCAL, I suppose 
I could do without the modification part.
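
On the question of running a script without saving it to disk: a sketch using fork(2) and execl(3) to run "/bin/sh -c" on a script string, e.g. +INSTALL text read back from the database. With -c, the first argument after the command string becomes $0, so the sketch passes a placeholder name there. Passing the package name and stage as $1 and $2 is an assumption modelled on how an on-disk +INSTALL is invoked, and the function name is hypothetical. As noted in the reply, the trade-off is that the script is no longer a file the user can inspect or edit.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a script held in memory with "sh -c".  After the command string,
 * sh assigns the next argument to $0 and the rest to $1, $2, ...
 * Returns the script's exit status, or -1 if it could not be run.
 */
static int run_script_string(const char *script, const char *pkgname,
                             const char *stage)
{
    pid_t pid;
    int status;

    assert(script != NULL && pkgname != NULL && stage != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* "+INSTALL" is only a display name for $0 here. */
        execl("/bin/sh", "sh", "-c", script, "+INSTALL", pkgname, stage,
              (char *)NULL);
        _exit(127);                          /* exec failed */
    }
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
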
-- 
Mel


Re: Adding .db support to pkg_tools

2008-05-09 Thread Joerg Sonnenberger
On Fri, May 09, 2008 at 01:52:46PM +0200, Anders Nore wrote:
> I'm working on adding .db support to the pkg_tools (i.e. pkg_add, pkg_info,
> etc.) as part of SoC 2008. The database api used is BerkeleyDB that comes
> with the base system (/usr/src/include/db.h). BerkeleyDB is not your
> typical relational db, and can only save key/value pairs. The way I'm 
> thinking of storing information to the .db is to name the keys as the 
> directory names in /var/db/pkg. And save the +* files in the directories to 
> the value element in the db, separated with a special character or similar.

As one of the persons hacking on pkg_install in pkgsrc/NetBSD, I would
*strongly* advise you against storing the files only in a bdb file.
The chance of major and complete corruption with bdb185 is high;
consider pulling the plug in the middle of a long update.

Secondly, I would also advise against just storing all meta data in a
single key/value pair. For example, +CONTENTS can be extremely large.
Check texmf for a good example.

Joerg