I'm writing a small application for detecting source code plagiarism that currently relies on a database to store lines of code.
The application has two primary functions: adding a new file to the database and comparing a file against those already stored. I started out using sqlite3 but was not satisfied with the performance, so I tried psycopg2 with a local PostgreSQL server, and performance got even worse. My simple benchmarks show that sqlite3 is on average 3.5 times faster at inserting a file, and on average less than a tenth of a second slower than psycopg2 at matching a file.

I expected PostgreSQL to be a lot faster. Is there some peculiarity in psycopg2 that could be causing the slowdown? Are these performance results typical? Any suggestions on what to try from here? I don't think my code or queries are inherently slow, but I'm not a DBA or a very accomplished Python developer, so I could be wrong. Any advice is appreciated.
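For context, the insert path looks roughly like the sketch below. This is a minimal, hypothetical version using the stdlib sqlite3 module (the `lines` table name and its schema are illustrative, not the real ones); the point is that all rows for a file go into the database in a single batched transaction, since committing per row is a common source of slow inserts in both sqlite3 and psycopg2:

```python
import sqlite3

# Hypothetical schema: one row per line of source code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (file_id INTEGER, lineno INTEGER, text TEXT)")

def add_file(conn, file_id, lines):
    # Batch every row for the file into one transaction; the
    # "with conn" block commits once at the end instead of per row.
    with conn:
        conn.executemany(
            "INSERT INTO lines (file_id, lineno, text) VALUES (?, ?, ?)",
            [(file_id, i, line) for i, line in enumerate(lines, 1)],
        )

add_file(conn, 1, ["def f():", "    return 42"])
count = conn.execute("SELECT COUNT(*) FROM lines").fetchone()[0]
```

The psycopg2 version is structured the same way, just with `%s` placeholders instead of `?`.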
-- http://mail.python.org/mailman/listinfo/python-list