Re: [sqlite] Huge performance drop when using prepared statement
On 10/12/2007, Ed Pasma [EMAIL PROTECTED] wrote: Hello, think I got it, but it is disappointingly simple, see below. Ed. Markus Gritsch wrote: Even more strange: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('tes*',)) takes less than 1ms but c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('test',)) takes several hundred ms. The execute in Python includes prepare (or get from cache), bind and the first step. The answer must be that the wait time lies in the first step. The engine is doing a full scan and it all depends how far in the table it needs to go to find the first match. So the bind values with * just come across a match sooner. Yes, I think your explanation is correct. Thank you. If I actually fetch all results by calling c.fetchall() after issuing the queries, both of them take several hundret ms. Thank you again, Markus - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Huge performance drop when using prepared statement
Hi, when using bind variables I get a huge performace drop compared to using a plain string. The query is demonstrated in the attached file problematic_query.py. The database used can be downloaded from http://xile.org/le/prepared_statement.zip (1.75 MB) or generated by using the attached file create_test_db.py. Kind regards, Markus # -*- coding: utf-8 -*- import time from pysqlite2 import dbapi2 as sqlite print 'pysqlite %s, sqlite %s' % (sqlite.version, sqlite.sqlite_version) x = sqlite.connect('test.db3') c = x.cursor() # get all entries which contain a specific word start = time.time() c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB 'hui*' ) print '%.3f seconds' % (time.time() - start) # 0.000 seconds start = time.time() c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hui*',)) print '%.3f seconds' % (time.time() - start) # 0.297 seconds c.close() x.close() # -*- coding: utf-8 -*- from random import choice import re from pysqlite2 import dbapi2 as sqlite ##x = sqlite.connect(':memory:') x = sqlite.connect('test.db3') c = x.cursor() def generate_words(): words = [] for count in range(20): words.append(''.join([choice('aeiou' if i%2 else 'bcdfghklmnprstw') for i in range(4)])) return ' '.join(words) # schema c.execute(CREATE TABLE entry ( id INTEGER PRIMARY KEY, note LONGTEXT NOT NULL )) c.execute(CREATE TABLE word ( id INTEGER PRIMARY KEY, word VARCHAR(40) NOT NULL UNIQUE )) c.execute(CREATE TABLE word_entry ( id INTEGER PRIMARY KEY, word_id INT NOT NULL CONSTRAINT word_id_exists REFERENCES word(id) , entry_id INT NOT NULL CONSTRAINT entry_id_exists REFERENCES entry(id) )) c.execute(CREATE INDEX word_entry_entryIndex ON word_entry (entry_id)) c.execute(CREATE UNIQUE INDEX word_entry_wordEntryIndex ON word_entry (word_id, entry_id)) # fill 'entry' table for count in range(1, 5001): c.execute('INSERT INTO entry (id, note) VALUES (?, ?)', (count, generate_words())) # build index regexp = re.compile('\w+', re.UNICODE) wordDict = {} id = 1 c.execute('SELECT id, note FROM entry') for entry_id, note in c.fetchall(): for word in set(regexp.findall(note.lower())): if len(word) = 40: if word in wordDict: word_id = wordDict[word] else: word_id = id c.execute('INSERT INTO word (id, word) VALUES (?, ?)', (id, word)) wordDict[word] = id id += 1 c.execute('INSERT INTO word_entry (word_id, entry_id) VALUES (?, ?)', (word_id, entry_id)) c.close() x.commit() x.close() - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On Sun, 9 Dec 2007 10:55:16 +0100, Markus Gritsch [EMAIL PROTECTED] wrote: Hi, when using bind variables I get a huge performace drop compared to using a plain string. The query is demonstrated in the attached file problematic_query.py. Problematic SELECT: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hui*',)) I must admit I never use Python, but, considering the docs in http://docs.python.org/lib/module-sqlite3.html, shouldn't the last line be: , ('hui*')) (without the extra comma)? The database used can be downloaded from http://xile.org/le/prepared_statement.zip (1.75 MB) or generated by using the attached file create_test_db.py. Kind regards, Markus -- ( Kees Nuyt ) c[_] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On 09/12/2007, Kees Nuyt [EMAIL PROTECTED] wrote: On Sun, 9 Dec 2007 10:55:16 +0100, Markus Gritsch [EMAIL PROTECTED] wrote: Hi, when using bind variables I get a huge performace drop compared to using a plain string. The query is demonstrated in the attached file problematic_query.py. Problematic SELECT: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hui*',)) I must admit I never use Python, but, considering the docs in http://docs.python.org/lib/module-sqlite3.html, shouldn't the last line be: , ('hui*')) (without the extra comma)? No, the extra comma is necessary to make ('hui*',) a tuple, however this is totally unrelated to the problem. Markus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On Dec 9, 2007, at 5:27 AM, Kees Nuyt wrote: Problematic SELECT: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hui*',)) SQLite will optimize a GLOB where the right parameter is a literal string. It will not do so if the right parameter is a parameter. http://www.sqlite.org/optoverview.html#like_opt D. Richard Hipp [EMAIL PROTECTED] - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
hi Markus, I just tried for fun: start = time.time() SQLString=('''SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB '%s' ''' % hui*) c.execute(SQLString) and it is as fast as your first one - seems a pysqlite problem to me I know they say this is not a secure way to do it -- well. Kind regards, W.Braun Markus Gritsch wrote: Hi, when using bind variables I get a huge performace drop compared to using a plain string. The query is demonstrated in the attached file problematic_query.py. The database used can be downloaded from http://xile.org/le/prepared_statement.zip (1.75 MB) or generated by using the attached file create_test_db.py. Kind regards, Markus - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On 09/12/2007, D. Richard Hipp [EMAIL PROTECTED] wrote: On Dec 9, 2007, at 5:27 AM, Kees Nuyt wrote: Problematic SELECT: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hui*',)) SQLite will optimize a GLOB where the right parameter is a literal string. It will not do so if the right parameter is a parameter. http://www.sqlite.org/optoverview.html#like_opt Hmm, if I replace 'hui*' by 'hu*' c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hu*',)) the query using bind variables also takes an unmeasurable short time period. Markus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On 09/12/2007, Mag. Wilhelm Braun [EMAIL PROTECTED] wrote: hi Markus, Hi Wilhelm I just tried for fun: start = time.time() SQLString=('''SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB '%s' ''' % hui*) c.execute(SQLString) and it is as fast as your first one - seems a pysqlite problem to me I know they say this is not a secure way to do it -- well. Your version does not use a prepared statement / bind variables. It just forges the string and passes it to SQLite a one string, being essentially the exact same query as the first one in my example. Markus - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
On 09/12/2007, Markus Gritsch [EMAIL PROTECTED] wrote: On 09/12/2007, D. Richard Hipp [EMAIL PROTECTED] wrote: SQLite will optimize a GLOB where the right parameter is a literal string. It will not do so if the right parameter is a parameter. http://www.sqlite.org/optoverview.html#like_opt Hmm, if I replace 'hui*' by 'hu*' c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('hu*',)) the query using bind variables also takes an unmeasurable short time period. Even more strange: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('tes*',)) takes less than 1ms but c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('test',)) takes several hundred ms. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Huge performance drop when using prepared statement
Hello, think I got it, but it is disappointingly simple, see below. Ed. Markus Gritsch wrote: Even more strange: c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('tes*',)) takes less than 1ms but c.execute(SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB ? , ('test',)) takes several hundred ms. The execute in Python includes prepare (or get from cache), bind and the first step. The answer must be that the wait time lies in the first step. The engine is doing a full scan and it all depends how far in the table it needs to go to find the first match. So the bind values with * just come across a match sooner. Wilhelm Braun wrote: I just tried for fun: start = time.time() SQLString=('''SELECT * FROM entry, word, word_entry WHERE entry.id = word_entry.entry_id AND word.id = word_entry.word_id AND word.word GLOB '%s' ''' % hui*) c.execute(SQLString) and it is as fast as your first one - seems a pysqlite problem to me I know they say this is not a secure way to do it -- well. This seems the only solution after all. But it floods the wonderful pysqlite statement cache, with new SQL statements for each new bind value. Preferably, only the operator is substituted in the SQL, for GLOB or just =, depending on the actual bind value. That leaves just two different statements. But I don't know if the result is the same as I don't know GLOB very well. If it is affected by the case_sensitive_like pragma my idea is too simple. - To unsubscribe, send email to [EMAIL PROTECTED] -