Jesús Cea Avión <j...@jcea.es> added the comment:

The database compatibility is dictated by the underlying Berkeley DB library 
used. Reporter, please do this (assuming you are using the "bsddb" lib in the 
standard lib, not the external project "pybsddb"):

1. Open a python2.5 shell.

2. "import bsddb"

3. "print bsddb.__version__, bsddb.db.version()"

4. Post the numbers.

5. Repeat under python2.6.

On my machine, I get:

python2.5: 4.4.5.3 (4, 5, 20)
python2.6: 4.7.3 (4, 7, 25)

So under python2.5 I would be using Berkeley DB 4.5, and under python2.6 I am 
using Berkeley DB 4.7.
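
To make the mapping from the printed tuple to a release explicit, here is a 
small sketch. The helper name "bdb_release" is made up for illustration and is 
not part of bsddb; it only assumes that "bsddb.db.version()" returns a 
(major, minor, patch) tuple, as in the output above.

```python
def bdb_release(version_tuple):
    """Return the 'major.minor' Berkeley DB release for a version tuple."""
    major, minor = version_tuple[0], version_tuple[1]
    return "%d.%d" % (major, minor)


# Tuples copied from the output above.
print(bdb_release((4, 5, 20)))
print(bdb_release((4, 7, 25)))
```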

Berkeley DB has a defined procedure to upgrade databases. This is especially 
important if you are using a transactional datastore. BEWARE: There is *NO* 
downgrade path.

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/am_upgrade.html

Most of the time, the database format doesn't change from version to version, 
but the environment does (especially the log format). The documentation of each 
Berkeley DB release includes a very detailed "upgrading" section. For 
instance:

http://www.oracle.com/technology/documentation/berkeley-db/db/installation/upgrade_11gr2_toc.html

Anyway, the details are the following:

1. A database created with Berkeley DB version X can not be used with version 
Y if Y<X.

2. A database created with Berkeley DB version X can be used with version Y if 
Y>X, provided you upgrade the environment/databases to the new version first.
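
The two rules can be written down as a tiny decision function. This is only a 
sketch: "compatibility" is a hypothetical helper, and comparing at the 
major.minor level is my assumption, based on the releases discussed here.

```python
def compatibility(created_with, opened_with, upgraded=False):
    """Apply the two rules to (major, minor, ...) version tuples."""
    created = tuple(created_with[:2])
    opened = tuple(opened_with[:2])
    if opened < created:
        # Rule 1: there is no downgrade path.
        return "unusable: no downgrade path"
    if opened > created and not upgraded:
        # Rule 2: a newer library needs the upgrade procedure first.
        return "upgrade the environment/databases first"
    return "usable"


print(compatibility((4, 7, 25), (4, 5, 20)))
print(compatibility((4, 5, 20), (4, 7, 25)))
print(compatibility((4, 5, 20), (4, 7, 25), upgraded=True))
```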

The documented upgrade procedure is:

http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/upgrade_process.html


If you try to use an old format with a new library without upgrading, you 
should get a CLEAR error message:

"""
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL |
... bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL |
... bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30971, "DB_VERSION_MISMATCH: Database environment version 
mismatch -- Program version 4.7 doesn't match environment version 4.5")
"""

The error is pretty obvious.

If you go the other way:

"""
Python 2.6.5 (r265:79063, Mar 22 2010, 12:17:26) 
[GCC 4.4.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> 
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL |
... bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
>>> db=bsddb.db.DB(dbenv)
>>> db.open("file.db",flags=bsddb.db.DB_CREATE, dbtype=bsddb.db.DB_HASH)
>>> db.close()
>>> dbenv.close()
>>> 
Python 2.5.2 (r252:60911, Mar 14 2008, 19:21:46) 
[GCC 4.2.1] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import bsddb
>>> dbenv=bsddb.db.DBEnv()
>>> dbenv.open(".", bsddb.db.DB_INIT_TXN | bsddb.db.DB_INIT_MPOOL |
... bsddb.db.DB_INIT_LOG | bsddb.db.DB_CREATE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
bsddb.db.DBError: (-30972, "DB_VERSION_MISMATCH: Database environment version 
mismatch -- Program version 4.5 doesn't match environment version 4.7")
"""

So, no database corruption, but a clear error message. I guess Gram is 
reporting database corruption because it can't open the database for some 
reason, not because the DB is actually corrupted.

In your case, anyway, you are saying that you are using the same Berkeley DB 
version under both python2.5 and 2.6, in which case this whole explanation does 
not apply. Please CONFIRM that.

If you are actually using the same BDB version, the next step is to try to open 
the DB manually (with a short python script like the posted one).
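
Something along these lines would do. It is a sketch, not a tested tool: 
"try_open" is a name made up here, the environment flags are copied from the 
sessions above, and the read-only open with DB_UNKNOWN (so the access method 
is detected from the file) is my choice.

```python
import sys

try:
    from bsddb import db  # Python 2.x standard library module
except ImportError:
    db = None  # bsddb is not available in this interpreter


def try_open(home, filename):
    """Open the environment in 'home' and 'filename' read-only.

    Returns (True, None) on success, or (False, error) on a DBError.
    """
    if db is None:
        raise RuntimeError("the bsddb module is not available")
    dbenv = db.DBEnv()
    try:
        # Same environment flags as in the interactive sessions above.
        dbenv.open(home, db.DB_INIT_TXN | db.DB_INIT_MPOOL |
                   db.DB_INIT_LOG | db.DB_CREATE)
        database = db.DB(dbenv)
        try:
            database.open(filename, dbtype=db.DB_UNKNOWN,
                          flags=db.DB_RDONLY)
        finally:
            database.close()
    except db.DBError:
        # sys.exc_info() keeps this usable on python2.5 as well.
        return False, sys.exc_info()[1]
    finally:
        dbenv.close()
    return True, None
```

Run try_open(".", "file.db") from the directory holding the environment, under 
both interpreters, and post whatever error comes back.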

Remember, ALSO, that if you are using a BDB release older than 4.7, you can not 
simply copy an environment between machines of different endianness, for 
instance when moving from PowerPC to x86. I think that was solved in BDB 4.7, 
IIRC. Or maybe 4.8.

Look at http://forums.oracle.com/forums/thread.jspa?messageID=3725053
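
If you are not sure about the byte order of each machine, the standard library 
can tell you. Run this on both and compare; an x86 machine reports "little", a 
big-endian PowerPC reports "big".

```python
import sys

# Byte order of the machine running this interpreter.
print(sys.byteorder)
```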

About the speed, if you are using the same BerkeleyDB release, the speed should 
be the same. So the first step would be to actually know if you are using the 
same BDB version.

I guess the import is doing a new transaction per imported record, flushing 
each one to disk. Flushing is an expensive and slow operation. On a regular 
hard disk, that would limit the speed to 30-120 transactions per second, 
maximum (depending on your filesystem). The dependency on the filesystem could 
explain the difference between Linux and Windows.
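
To put numbers on it, a back-of-the-envelope sketch. The record count is a 
hypothetical figure, not taken from the report; the rates are the 30-120 range 
above. With one flushed transaction per record, 100000 records would take 
somewhere between about 14 and 56 minutes.

```python
def import_seconds(records, transactions_per_second):
    """Time needed when every record costs one flushed transaction."""
    return records / float(transactions_per_second)


records = 100000  # hypothetical import size
for rate in (30, 120):
    seconds = import_seconds(records, rate)
    print("%d tx/s -> %.0f seconds" % (rate, seconds))
```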

One approach would be to enclose ALL the imported records in a single 
transaction. If the import is huge you can run out of BDB resources, so 
enclose every 1000 records in a transaction, for instance. Or increase the BDB 
resource pool (shared regions).
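
A minimal sketch of the batching idea. "import_in_batches" is a hypothetical 
helper; it assumes "dbenv" is an open transactional DBEnv, "database" is a DB 
opened in it, and "records" yields (key, value) pairs. It only relies on 
txn_begin(), put(..., txn=...), commit() and abort().

```python
def import_in_batches(dbenv, database, records, batch_size=1000):
    """Store records, committing one transaction per batch_size records."""
    txn = None
    pending = 0
    total = 0
    try:
        for key, value in records:
            if txn is None:
                txn = dbenv.txn_begin()
            database.put(key, value, txn=txn)
            pending += 1
            if pending >= batch_size:
                # Forget the handle before committing; it is not
                # reusable afterwards, whatever the outcome.
                current, txn = txn, None
                current.commit()
                total += pending
                pending = 0
        if txn is not None:
            # Commit the last, partially filled batch.
            current, txn = txn, None
            current.commit()
            total += pending
    except Exception:
        if txn is not None:
            txn.abort()
        raise
    return total
```

With this, the number of flushes drops from one per record to one per batch.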

Another option (the right approach, and the one I would use :) is to insert 
each record in its own transaction, but to configure those transactions as 
"not flushing", to keep them in memory as long as possible. When the last 
transaction is committed, do a final huge flush/checkpoint.
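
A sketch of this second option, under the same assumptions about "dbenv", 
"database" and "records" (an open transactional DBEnv, a DB opened in it, and 
(key, value) pairs). "import_nosync" is a made-up name; the caller passes 
bsddb.db.DB_TXN_NOSYNC as "nosync_flag" so that commits do not flush the log. 
The price is durability: if the machine crashes before the final checkpoint, 
the most recent transactions can be lost.

```python
def import_nosync(dbenv, database, records, nosync_flag):
    """One transaction per record, committed without flushing the log."""
    count = 0
    for key, value in records:
        txn = dbenv.txn_begin()
        try:
            database.put(key, value, txn=txn)
        except Exception:
            txn.abort()
            raise
        # Commit without waiting for the log to reach the disk.
        txn.commit(nosync_flag)
        count += 1
    # One final checkpoint pushes everything to disk in a single pass.
    dbenv.txn_checkpoint()
    return count
```

Usage would be: import_nosync(dbenv, database, records, bsddb.db.DB_TXN_NOSYNC)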

Berkeley DB is amazing, but mastering it is difficult.

Anyway, confirm that you are using the same BDB in python2.5 and 2.6, that you 
are not migrating from PowerPC to x86, and that you are not flushing 
transactions wildly (use "strace" under Linux, or "dtrace" under Solaris, and 
look out for "sync", "fsync", "fdatasync", or other related syscalls).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8504>
_______________________________________