Hi folks -- a quick Python and sqlite3 performance question. I find that inserting a million rows of in-memory data into an in-memory database via a single executemany() is about 30% slower than using the sqlite3 CLI and its .import command (even with the CLI reading the same data from a disk file). This surprises me: I assume executemany() uses a prepared statement, and that is exactly what .import does (based on my quick perusal of the CLI source).
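
To be concrete about what I mean by "a prepared statement" on the Python side, here is a rough sketch comparing a per-row execute() loop against a single executemany() on a throwaway table (the table name, column, and row count here are arbitrary and not part of the benchmark below):

    import sqlite3, time

    rows = [(i,) for i in xrange(100000)]

    db = sqlite3.connect(':memory:')
    c = db.cursor()
    c.execute('create table t (k int primary key)')

    # per-row execute(): one Python-level call into the sqlite3 module per row
    t0 = time.time()
    for r in rows:
        c.execute('insert into t (k) values (?)', r)
    db.commit()
    print 'execute() loop: %.2fs' % (time.time() - t0)

    # single executemany(): one call; the module binds each row against one statement
    c.execute('delete from t')
    db.commit()
    t0 = time.time()
    c.executemany('insert into t (k) values (?)', rows)
    db.commit()
    print 'executemany():  %.2fs' % (time.time() - t0)

Either way the per-row work should boil down to binding values and stepping the statement, which, as far as I can tell from the source, is what .import does too.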
Is this discrepancy to be expected? Where is the overhead coming from?

For full disclosure, the Python code is at the end; run it first to generate the data file. To test the CLI, I couldn't find a better way than to create an init file "sqlcmds" containing

    create table test (k int primary key, v int not null);
    .import data test

and then run

    time sqlite3 -init sqlcmds ':memory:' '.quit'

The Python code is:

    #!/usr/bin/env python
    import sqlite3, random, timeit

    con = None

    def prepare():
        global con, cur
        con = sqlite3.connect(':memory:')
        con.isolation_level = "EXCLUSIVE"
        cur = con.cursor()

    def ins():
        global con, data, cur
        try:
            cur.execute('drop table test')
        except:
            pass
        cur.execute('create table test (key int primary key, val int not null);')
        con.commit()
        cur.executemany("INSERT into test (key, val) values (?, 10)", data)
        con.commit()

    print 'generating data...'
    size = 1000000
    data = [[a] for a in random.sample(xrange(10000000), size)]
    print 'done!'

    # save the data for the sqlite3 CLI
    f = file('data', 'w')
    f.writelines([str(a[0]) + "|10\n" for a in data])
    f.close()

    print 'testing ' + str(size) + ' inserts...'
    # run the insert 3 times, one full executemany() per run
    t = timeit.Timer(stmt='ins()', setup="import sqlite3\nfrom __main__ import prepare, ins\nprepare()")
    print t.repeat(3, 1)

thanks,
ben
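
p.s. In case it helps with reproducing or digging further, here is an untimed sketch of a variant of ins() that feeds executemany() a generator reading the same 'data' file the CLI imports, so both paths parse identical disk input. It is meant to be dropped into the script above after prepare() has run; ins_from_file is only a working name, and it assumes the "key|10" line format written by the script:

    def ins_from_file():
        global con, cur
        try:
            cur.execute('drop table test')
        except sqlite3.OperationalError:
            pass
        cur.execute('create table test (key int primary key, val int not null);')
        con.commit()

        def rows():
            # parse the same "key|10" lines the CLI's .import reads
            f = open('data')
            for line in f:
                k, v = line.split('|')
                yield (int(k), int(v))
            f.close()

        cur.executemany('INSERT into test (key, val) values (?, ?)', rows())
        con.commit()

Timing it with the same timeit.Timer setup (swap 'ins()' for 'ins_from_file()' in stmt and the import) should keep the Python side reading the same input the CLI does.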