Re: [sqlite] Huge performance drop when using prepared statement

2007-12-10 Thread Markus Gritsch
On 10/12/2007, Ed Pasma [EMAIL PROTECTED] wrote:
 Hello, think I got it, but it is disappointingly simple, see below. Ed.

 Markus Gritsch wrote:

  Even more strange:
 
  c.execute(SELECT * FROM entry, word, word_entry WHERE
   entry.id = word_entry.entry_id AND
   word.id = word_entry.word_id AND
   word.word GLOB ?
  , ('tes*',))
 
  takes less than 1ms but
 
  c.execute(SELECT * FROM entry, word, word_entry WHERE
   entry.id = word_entry.entry_id AND
   word.id = word_entry.word_id AND
   word.word GLOB ?
  , ('test',))
 
  takes several hundred ms.

 The execute in Python includes prepare (or get from cache), bind and
 the first step.
 The answer must be that the wait time lies in the first step.
 The engine is doing a full scan and it all depends how far in the
 table it needs to go to find the first match.
 So the bind values with * just come across a match sooner.

Yes, I think your explanation is correct.  Thank you.  If I actually
fetch all results by calling c.fetchall() after issuing the queries,
both of them take several hundret ms.

Thank you again,
Markus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Markus Gritsch
Hi,

when using bind variables I get a huge performace drop compared to
using a plain string.  The query is demonstrated in the attached file
problematic_query.py.

The database used can be downloaded from
  http://xile.org/le/prepared_statement.zip (1.75 MB)

or generated by using the attached file create_test_db.py.

Kind regards,
Markus
# -*- coding: utf-8 -*-

import time
from pysqlite2 import dbapi2 as sqlite
print 'pysqlite %s, sqlite %s' % (sqlite.version, sqlite.sqlite_version)

x = sqlite.connect('test.db3')
c = x.cursor()

# get all entries which contain a specific word

start = time.time()
c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB 'hui*'
)
print '%.3f seconds' % (time.time() - start) # 0.000 seconds

start = time.time()
c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('hui*',))
print '%.3f seconds' % (time.time() - start) # 0.297 seconds

c.close()
x.close()
# -*- coding: utf-8 -*-

from random import choice
import re
from pysqlite2 import dbapi2 as sqlite

##x = sqlite.connect(':memory:')
x = sqlite.connect('test.db3')
c = x.cursor()

def generate_words():
words = []
for count in range(20):
 words.append(''.join([choice('aeiou' if i%2 else 'bcdfghklmnprstw') for i in range(4)]))
return ' '.join(words)

# schema
c.execute(CREATE TABLE entry (
id INTEGER PRIMARY KEY,
note LONGTEXT NOT NULL
))
c.execute(CREATE TABLE word (
id INTEGER PRIMARY KEY,
word VARCHAR(40) NOT NULL UNIQUE
))
c.execute(CREATE TABLE word_entry (
id INTEGER PRIMARY KEY,
word_id INT NOT NULL CONSTRAINT word_id_exists REFERENCES word(id) ,
entry_id INT NOT NULL CONSTRAINT entry_id_exists REFERENCES entry(id)
))
c.execute(CREATE INDEX word_entry_entryIndex ON word_entry (entry_id))
c.execute(CREATE UNIQUE INDEX word_entry_wordEntryIndex ON word_entry (word_id, entry_id))

# fill 'entry' table
for count in range(1, 5001):
c.execute('INSERT INTO entry (id, note) VALUES (?, ?)', (count, generate_words()))

# build index
regexp = re.compile('\w+', re.UNICODE)
wordDict = {}
id = 1
c.execute('SELECT id, note FROM entry')
for entry_id, note in c.fetchall():
for word in set(regexp.findall(note.lower())):
if len(word) = 40:
if word in wordDict:
word_id = wordDict[word]
else:
word_id = id
c.execute('INSERT INTO word (id, word) VALUES (?, ?)', (id, word))
wordDict[word] = id
id += 1
c.execute('INSERT INTO word_entry (word_id, entry_id) VALUES (?, ?)', (word_id, entry_id))

c.close()
x.commit()
x.close()
-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Kees Nuyt
On Sun, 9 Dec 2007 10:55:16 +0100, Markus Gritsch
[EMAIL PROTECTED] wrote:

Hi,

when using bind variables I get a huge performace drop compared to
using a plain string.  The query is demonstrated in the attached file
problematic_query.py.

Problematic SELECT:
 c.execute(SELECT * FROM entry, word, word_entry WHERE
  entry.id = word_entry.entry_id AND
  word.id = word_entry.word_id AND
  word.word GLOB ?
 , ('hui*',))

I must admit I never use Python, but, considering the docs in
http://docs.python.org/lib/module-sqlite3.html, shouldn't the
last line be:

, ('hui*'))

(without the extra comma)?

The database used can be downloaded from
  http://xile.org/le/prepared_statement.zip (1.75 MB)

or generated by using the attached file create_test_db.py.

Kind regards,
Markus
-- 
  (  Kees Nuyt
  )
c[_]

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Markus Gritsch
On 09/12/2007, Kees Nuyt [EMAIL PROTECTED] wrote:
 On Sun, 9 Dec 2007 10:55:16 +0100, Markus Gritsch
 [EMAIL PROTECTED] wrote:

 Hi,
 
 when using bind variables I get a huge performace drop compared to
 using a plain string.  The query is demonstrated in the attached file
 problematic_query.py.

 Problematic SELECT:
  c.execute(SELECT * FROM entry, word, word_entry WHERE
   entry.id = word_entry.entry_id AND
   word.id = word_entry.word_id AND
   word.word GLOB ?
  , ('hui*',))

 I must admit I never use Python, but, considering the docs in
 http://docs.python.org/lib/module-sqlite3.html, shouldn't the
 last line be:

 , ('hui*'))

 (without the extra comma)?

No, the extra comma is necessary to make ('hui*',) a tuple, however
this is totally unrelated to the problem.

Markus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread D. Richard Hipp


On Dec 9, 2007, at 5:27 AM, Kees Nuyt wrote:



Problematic SELECT:

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('hui*',))




SQLite will optimize a GLOB where the right parameter is
a literal string.  It will not do so if the right parameter is a
parameter.  http://www.sqlite.org/optoverview.html#like_opt



D. Richard Hipp
[EMAIL PROTECTED]




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Mag. Wilhelm Braun

hi Markus,

I just tried for fun:

start = time.time()
SQLString=('''SELECT * FROM entry, word, word_entry WHERE
entry.id = word_entry.entry_id AND
word.id = word_entry.word_id AND
word.word GLOB '%s'
''' % hui*)

c.execute(SQLString)

and it is as fast as your first one - seems a pysqlite problem to me

I know they say this is not a secure way to do it -- well.


Kind regards,

W.Braun



Markus Gritsch wrote:

Hi,

when using bind variables I get a huge performace drop compared to
using a plain string.  The query is demonstrated in the attached file
problematic_query.py.

The database used can be downloaded from
  http://xile.org/le/prepared_statement.zip (1.75 MB)

or generated by using the attached file create_test_db.py.

Kind regards,
Markus
  



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Markus Gritsch
On 09/12/2007, D. Richard Hipp [EMAIL PROTECTED] wrote:

 On Dec 9, 2007, at 5:27 AM, Kees Nuyt wrote:

  Problematic SELECT:
  c.execute(SELECT * FROM entry, word, word_entry WHERE
   entry.id = word_entry.entry_id AND
   word.id = word_entry.word_id AND
   word.word GLOB ?
  , ('hui*',))
 

 SQLite will optimize a GLOB where the right parameter is
 a literal string.  It will not do so if the right parameter is a
 parameter.  http://www.sqlite.org/optoverview.html#like_opt

Hmm, if I replace 'hui*' by 'hu*'

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('hu*',))

the query using bind variables also takes an unmeasurable short time period.

Markus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Markus Gritsch
On 09/12/2007, Mag. Wilhelm Braun [EMAIL PROTECTED] wrote:
 hi Markus,

Hi Wilhelm

 I just tried for fun:

 start = time.time()
 SQLString=('''SELECT * FROM entry, word, word_entry WHERE
  entry.id = word_entry.entry_id AND
  word.id = word_entry.word_id AND
  word.word GLOB '%s'
 ''' % hui*)

 c.execute(SQLString)

 and it is as fast as your first one - seems a pysqlite problem to me

 I know they say this is not a secure way to do it -- well.

Your version does not use a prepared statement / bind variables.  It
just forges the string and passes it to SQLite a one string, being
essentially the exact same query as the first one in my example.

Markus

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Markus Gritsch
On 09/12/2007, Markus Gritsch [EMAIL PROTECTED] wrote:
 On 09/12/2007, D. Richard Hipp [EMAIL PROTECTED] wrote:
 
  SQLite will optimize a GLOB where the right parameter is
  a literal string.  It will not do so if the right parameter is a
  parameter.  http://www.sqlite.org/optoverview.html#like_opt

 Hmm, if I replace 'hui*' by 'hu*'

 c.execute(SELECT * FROM entry, word, word_entry WHERE
  entry.id = word_entry.entry_id AND
  word.id = word_entry.word_id AND
  word.word GLOB ?
 , ('hu*',))

 the query using bind variables also takes an unmeasurable short time period.

Even more strange:

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('tes*',))

takes less than 1ms but

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('test',))

takes several hundred ms.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Huge performance drop when using prepared statement

2007-12-09 Thread Ed Pasma

Hello, think I got it, but it is disappointingly simple, see below. Ed.

Markus Gritsch wrote:


Even more strange:

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('tes*',))

takes less than 1ms but

c.execute(SELECT * FROM entry, word, word_entry WHERE
 entry.id = word_entry.entry_id AND
 word.id = word_entry.word_id AND
 word.word GLOB ?
, ('test',))

takes several hundred ms.


The execute in Python includes prepare (or get from cache), bind and  
the first step.

The answer must be that the wait time lies in the first step.
The engine is doing a full scan and it all depends how far in the  
table it needs to go to find the first match.

So the bind values with * just come across a match sooner.

Wilhelm Braun wrote:


I just tried for fun:

start = time.time()
SQLString=('''SELECT * FROM entry, word, word_entry WHERE
entry.id = word_entry.entry_id AND
word.id = word_entry.word_id AND
word.word GLOB '%s'
''' % hui*)
c.execute(SQLString)

and it is as fast as your first one - seems a pysqlite problem to me

I know they say this is not a secure way to do it -- well.


This seems the only solution after all.
But it floods the wonderful pysqlite statement cache, with new SQL  
statements for each new bind value.
Preferably, only the operator is substituted in the SQL, for GLOB  
or just =, depending on the actual bind value.

That leaves just two different statements.
But I don't know if the result is the same as I don't know GLOB very  
well.
If it is affected by the case_sensitive_like pragma my idea is too  
simple.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-