On Tue, Aug 07, 2007 at 06:53:32AM +0200, "Martin v. Löwis" wrote:
> > I guess we have to rethink our use of these databases somewhat.
>
> Ok. In the interest of progress, I'll be looking at coming up with
> some fixes for the code base right now; as we agree that the
> underlying semantics is bytes:bytes, any encoding wrappers on
> top of it can be added later.
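
To make that layering concrete, here is a rough sketch of a strict bytes:bytes store with an encoding wrapper on top. It is only an illustration: StrictBytesDB and EncodedDB are made-up names, the store is a plain dict rather than anything from bsddb, and the choice of latin-1 is arbitrary.

```python
class StrictBytesDB:
    """Hypothetical stand-in for a bytes:bytes database (not the bsddb API)."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _check(obj, what):
        # Accept bytes only; never encode str implicitly.
        if not isinstance(obj, bytes):
            raise TypeError("%s must be bytes, not %s" % (what, type(obj).__name__))
        return obj

    def __setitem__(self, key, value):
        self._data[self._check(key, "key")] = self._check(value, "value")

    def __getitem__(self, key):
        return self._data[self._check(key, "key")]


class EncodedDB:
    """Hypothetical str:str wrapper layered on a bytes:bytes store."""

    def __init__(self, db, encoding="utf-8"):
        self._db = db
        self._encoding = encoding

    def __setitem__(self, key, value):
        # The caller picks the encoding; the store only ever sees bytes.
        self._db[key.encode(self._encoding)] = value.encode(self._encoding)

    def __getitem__(self, key):
        return self._db[key.encode(self._encoding)].decode(self._encoding)


raw = StrictBytesDB()
raw[b"spam"] = b"eggs"

text = EncodedDB(raw, encoding="latin-1")
text["caf\u00e9"] = "au lait"
```

The point of the split is that the bytes on disk are whatever the wrapper's encoding produces, so an existing database stays readable as long as the wrapper is told the encoding it was written with.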

The underlying Modules/_bsddb.c today uses PyArg_Parse(..., "s#", ...),
which, if I read Python/getargs.c correctly, is very lenient about the
input types it accepts. It appears to accept anything with a buffer
API, auto-converting unicode to the default encoding as needed. IMHO
all of that is desirable in many situations, but it is not strict.

bytes:bytes or int:bytes (depending on the database type) are
fundamentally all the C Berkeley DB library knows. Attaching meaning
to the keys and values is up to the user.

I'm about to try, in my sandbox, a _bsddb.c that strictly enforces
bytes as values for the underlying bsddb.db API provided by _bsddb, on
the assumption that being strict about bytes is what we want at that
level. I predict lots of Lib/bsddb/test/ edits.

> My concern is that people need to access existing databases. It's
> all fine that the code accessing them breaks, and that they have
> to actively port to Py3k. However, telling them that they have to
> represent the keys in their dbm disk files in a different manner
> might cause a revolt...

Agreed. Thus the importance of allowing bytes:bytes.

_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
