On May 14, 2007, at 10:03 PM, Garrett Cooper wrote:

Bert JW Regeer wrote:
On May 12, 2007, at 5:14 AM, Philippe Laquet wrote:
Stanislav Sedov wrote:
On Fri, 11 May 2007 02:10:05 +0200
Ivan Voras <[EMAIL PROTECTED]> mentioned:


- I think it's time to give up on using BDB+directory tree full of text files for storing the installed packages database, and I propose all of this be replaced by a single SQLite database. SQLite is public domain (can be slurped into base system), embeddable, stores all data in a single file, lightweight, fast, and can be used to do fancy things such as reporting.


What is the reason to use an SQL-based database? Will you perform direct queries on the database? The packaging system is for ordinary users, not SQL geeks, so they should not have to use SQL for managing packages. So a simple set of hashes will suffice for our needs. I agree with Julian that we should have a backup of the packaging database in plain text format, and a utility to rebuild it. This way we can always restore the database if something goes wrong. Furthermore, that should not have a great impact on performance, since we don't have to rebuild it every day.

I agree with Stan ;)

The "fast and improved" package utilities mainly use an indexed Berkeley DB combined with flat files, don't they? I, and maybe many other FreeBSD users, use light systems for efficiency and easier management. If we use some database system it will require disk space, resources for the DB to run, dependencies and so on... And we may also be exposed to a "that DB is better" war ;)

SQLite is compiled into the program, and as such does not require any resources other than one file handle and some CPU time when querying. The file is stored on disk, and no separate process needs to be running to query it. Maybe I misunderstood what you were trying to say. SQLite will require fewer resources than flat text files, since with SQLite you open one file once and then process it, instead of what is currently happening: having to open and close hundreds of files, depending on how many ports are installed. In this regard SQLite is like BDB, except that SQLite uses standards-compliant SQL statements to get data.
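
A minimal sketch of that point, in Python with its bundled sqlite3 module: the whole package database is one database opened once, and lookups are SQL queries rather than a walk over hundreds of files. The table and column names (packages, name, version, origin) and the sample rows are made up for illustration; they are not an actual or proposed FreeBSD schema.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")  # a real tool would pass a path to one file on disk
conn.execute(
    "CREATE TABLE packages ("
    "name TEXT PRIMARY KEY, version TEXT NOT NULL, origin TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [
        ("sqlite3", "3.3.17", "databases/sqlite3"),
        ("portupgrade", "2.2.6", "ports-mgmt/portupgrade"),
        ("subversion", "1.4.3", "devel/subversion"),
    ],
)
conn.commit()


def installed_version(db, name):
    """Return the installed version of a package, or None if it is not installed."""
    row = db.execute(
        "SELECT version FROM packages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def packages_under(db, category):
    """List package names whose origin falls under the given ports category."""
    rows = db.execute(
        "SELECT name FROM packages WHERE origin LIKE ? ORDER BY name",
        (category + "/%",),
    )
    return [r[0] for r in rows]
```

Both helpers reuse the same open connection, so the cost of opening the database is paid once no matter how many packages are installed.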

Correct. From what I was reading, shared-memory read access and locking are two available features of BDB databases.

The only thing is that I do agree that there should be a dumping and importing mechanism of some kind for semi-formatted text files, for backup, debugging, and modification purposes. That's just my personal idea on the topic though :).
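
One way such a dump and import mechanism could look, sketched in Python: the sqlite3 module's iterdump() yields the database as plain-text SQL statements, and executescript() replays them into a fresh database. The helper names and the two-column packages table are hypothetical.

```python
import sqlite3


def dump_to_text(db):
    """Serialize the whole database as plain-text SQL statements."""
    return "\n".join(db.iterdump())


def rebuild_from_text(text):
    """Recreate a database from a dump produced by dump_to_text()."""
    fresh = sqlite3.connect(":memory:")  # a real tool would write a new file
    fresh.executescript(text)
    return fresh


# Hypothetical schema and sample row, for illustration only.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT)")
src.execute("INSERT INTO packages VALUES ('zsh', '4.3.2')")
src.commit()

text_backup = dump_to_text(src)          # human-readable, editable text
restored = rebuild_from_text(text_backup)
```

The dump is ordinary text, so it can be inspected, edited by hand, or kept under version control before being imported again.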

--
Stanislav Sedov
ST4096-RIPE


I can understand many of the gripes about using a database, and about having to import yet another code base into the FreeBSD base. However, as one of the young ones who knows sed/awk/grep and SQL, I prefer SQL over having to process hundreds of text files using text processing tools. It saddens me each time I run one of the pkg_* tools that needs to parse the flat file structure, since it takes so long. I have friends running Ubuntu, and their apt-get returns results much faster. In a world where hard drives are becoming more reliable, and are automatically relocating sectors that go bad, do we really have to worry about database corruption as much? I feel that many of the problems being feared would harm a text-based "storage" system as well. If one block drops out, it can leave tools unable to parse the files. Create a backup copy of the database after each successful transaction? There are ways to battle data corruption.
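
The "backup copy after each successful transaction" idea could be sketched like this in Python (3.7 or later, for Connection.backup()). The function name record_install and the schema are hypothetical; the point is only that the backup is refreshed when, and only when, the transaction has committed.

```python
import sqlite3

# Hypothetical two-column schema, for illustration only.
SCHEMA = "CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT)"

live = sqlite3.connect(":memory:")    # a real tool would open files on disk
backup = sqlite3.connect(":memory:")
live.execute(SCHEMA)


def record_install(db, backup_db, name, version):
    """Record a package in one transaction, then refresh the backup copy."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("INSERT INTO packages VALUES (?, ?)", (name, version))
    # Only reached if the transaction committed; a failed or interrupted
    # insert propagates its exception and leaves the old backup untouched.
    db.backup(backup_db)


record_install(live, backup, "zsh", "4.3.2")
```

If the insert fails partway (here, for example, on a duplicate package name), the live database rolls back to its previous state and the last good backup is still in place.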

True. I was thinking of backup, and recreation from scratch, considering that the database wouldn't be more than a few megs. In-place replacement just seems like a hairy situation sometimes.

Using BDB is not a real option either. I cannot even count the number of times that the BDB database that portupgrade created has become corrupt because I accidentally ran two portupgrades at the same time, or remembered that I did not want to upgrade something and hit Ctrl+C.

I'm sorry but nothing's completely solid in that respect, AFAIK. In terms of the first problem you mentioned, Wade is working on the locking <http://wiki.freebsd.org/WadeWesolowsky>.

In terms of transactions, maybe we should take a look at Subversion for inspiration: <http://svn.haxx.se/dev/archive-2005-03/0301.shtml>. I'm a firm believer that it's easier to incorporate code than it is to remove it.

I am unable to see any references to transaction support for BDB databases; maybe I am missing something. Subversion in that thread is suggesting SQL for a totally different reason. fsfs is what most people are using as a Subversion backend to help avoid BDB corruption. Many of the people I have talked to who used to use Subversion with BDB have had major issues, whereas fsfs has not had any issues at all.

Just what I have experienced myself as a Subversion repository administrator.


Given the experience I got from running SVN with BDB as the back-end database to store my data, I say no thanks. In that case I would much rather stick with the flat text files than go with a database.

Well, a few comments:

-Text files are bloated. Although many people are for XML, it takes much longer to parse than binary databases.

The files under /var/db/pkg/ are all plain flat text files. I am not a supporter of XML at all.

-Custom text files require custom, format-capable parsers, no matter what the format, and the less coverage a parser has, the more likely it is to have bugs IMO.

We already have these in the pkg_* functions, so I'd hope they are fairly solid!

-In the event that features changed or were added, the required modifications to the parser could range from trivial to major. With databases you can get away from that mentality to some degree IMHO.

Changing an SQL query versus re-writing a parser for text files is a huge difference.
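
To make the comparison concrete, here is a small Python sketch of adding a feature to an SQLite-backed package list: one ALTER TABLE statement plus one changed query, with no stored data rewritten by hand. The automatic column (marking packages pulled in as dependencies) is an invented example feature, and the schema is hypothetical.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT)")
conn.execute("INSERT INTO packages VALUES ('zsh', '4.3.2')")

# New feature: remember whether a package was installed automatically as a
# dependency. Existing rows pick up the default value.
conn.execute("ALTER TABLE packages ADD COLUMN automatic INTEGER NOT NULL DEFAULT 0")
conn.execute("INSERT INTO packages VALUES ('libiconv', '1.9.2', 1)")
conn.commit()


def explicitly_installed(db):
    """Names of packages the user asked for directly (not pulled in as deps)."""
    rows = db.execute(
        "SELECT name FROM packages WHERE automatic = 0 ORDER BY name"
    )
    return [r[0] for r in rows]
```

Queries written before the change that select only name and version keep working, which is the kind of flexibility a hand-written text parser does not give for free.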

-Garrett

I am not opposed to text files, other than that they can be slow. I am against BDB because over the years, in my experience, it has proven to be extremely unreliable and easily corrupted. If we are going to be making changes to the way the ports/packages store the information about what exists, it should be done in such a way that it is scalable and at the same time extensible (is this a word?).

Bert JW Regeer
