[sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Mira Suk

Hey everyone.
 
I'm new to SQLite, so please have patience with me. I have two tables and am doing a left outer join:
 
A
IDI INTEGER PRIMARY KEY
Parent INTEGER INDEX
Status INTEGER

 
B
IDR UNIQUE INTEGER FOREIGN KEY IndexME.IDI
Points TEXT (on average a ~120-character string)
 (this table is primarily used for large data, so it contains some blobs
etc...)
 
SELECT A.IDI, B.Points
FROM A
LEFT OUTER JOIN B ON B.IDR = A.IDI
WHERE Parent = ?
 
The query written here is a lot simplified (for example, the Points column is
filtered using a custom function); however, the main culprit seems to be the
LEFT OUTER JOIN, as accessing that same column in a query which only has
table B in it is lightning fast.
The result of the query is just around 40 rows (due to the Parent filter), and
it takes ~2 seconds on an i7 3.4 GHz with the database stored on an SSD.
 
Am I doing something terribly wrong? Suggestions?
 
Oh, and I tried running ANALYZE on the database connection without any visible
improvement.
 
thanks in advance,
M.
 
 


Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Igor Tandetnik
Mira Suk mira@centrum.cz wrote:
 query written here is a lot simplified (for example Points column is 
 filtered using custom function) however main culprit seems
 to be LEFT OUTER JOIN as accessing that same column in query which only has B 
 table in it is lightning fast.

 result of query is
 just around 40 rows, (due to Parent filter)

You may want an index on A(Parent) then.
-- 
Igor Tandetnik
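
A minimal sketch of that suggestion, using the simplified table name from the
first post (index name hypothetical):

CREATE INDEX A_Parent_idx ON A(Parent);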



Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Mira Suk

 
 Mira Suk  wrote:
 query written here is a lot simplified (for example Points column is 
 filtered using custom function) however main culprit seems
 to be LEFT OUTER JOIN as accessing that same column in query which only has 
 B table in it is lightning fast.

 result of query is
 just around 40 rows, (due to Parent filter)

You may want an index on A(Parent) then.
-- 
Igor Tandetnik
 
A.Parent has an index.
 
The query
SELECT IDI FROM A WHERE Parent = ?
is fast, so this has nothing to do with that condition. IMHO it is only about
the result column from table B.


Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Igor Tandetnik
Mira Suk mira@centrum.cz wrote:
 Mira Suk  wrote:
 query written here is a lot simplified (for example Points column is 
 filtered using custom function) however main culprit
 seems to be LEFT OUTER JOIN as accessing that same column in query which 
 only has B table in it is lightning fast.
 
 result of query is
 just around 40 rows, (due to Parent filter)
 
 You may want an index on A(Parent) then.
 
 A.Parent has an index

Ok then, show the result of prepending EXPLAIN QUERY PLAN to your statement.
-- 
Igor Tandetnik
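
For instance, prepended to the simplified query from the first post (a sketch,
with the B.Points column name assumed from the schema above):

EXPLAIN QUERY PLAN
SELECT A.IDI, B.Points
FROM A
LEFT OUTER JOIN B ON B.IDR = A.IDI
WHERE A.Parent = ?;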



Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Mira Suk

Ok then, show the result of prepending EXPLAIN QUERY PLAN to your statement.
-- 
Igor Tandetnik

First of all thanks for bearing with me :)

functions
TZB_MATCHRECURSIVE(int,int)
- disabled for this test - always returns 1; applies the filter recursively
TZB_ISCHILD(int)
- bitmask check
TZB_MATCHDIM(int,text)
- disabled for this test - always returns 1; does weird stuff with the text parameter


You have probably realized by now that the table forms a tree of items: Parent
links to IDI, and this way the tree structure is formed.
The parent parameter is set to 0 for this test, which is the root of the tree
(the query should return the top-most folders).
In this query all ItemsME.Status and ItemsME.Points are NULL; there is no entry
in ItemsME for directories, and there are no files in the root.


test 1.

query
SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
FROM [IndexME] LEFT OUTER JOIN [ItemsME]
ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE 
[IndexME].[Parent] = ?1 AND 
(TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR
(TZB_ISCHILD([IndexME].[Status]) AND TZB_MATCHDIM([ItemsME].[Status], 
[ItemsME].[Points])))
ORDER BY [IndexME].[Order];


plan
SEARCH TABLE IndexME USING AUTOMATIC COVERING INDEX (Parent=?) (~5 rows)
SEARCH TABLE ItemsME USING AUTOMATIC COVERING INDEX (IDR=?) (~5 rows)
USE TEMP B-TREE FOR ORDER BY

runtime ~2 seconds


test 2. (TZB_MATCHDIM removed)

query
SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
FROM [IndexME] LEFT OUTER JOIN [ItemsME]
ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE 
[IndexME].[Parent] = ?1 AND 
(TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR
(TZB_ISCHILD([IndexME].[Status])))
ORDER BY [IndexME].[Order];

plan
SEARCH TABLE IndexME USING AUTOMATIC COVERING INDEX (Parent=?) (~5 rows)
SEARCH TABLE ItemsME USING AUTOMATIC COVERING INDEX (IDR=?) (~5 rows)
USE TEMP B-TREE FOR ORDER BY

runtime ~160 milliseconds

test 3. (TZB_MATCHDIM replaced with LENGTH to prove it's not my function)
results equal to test 1.
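
Both plans report AUTOMATIC COVERING INDEX, which means SQLite built a
transient index for each statement because it found no usable persistent one.
A sketch of persistent indexes that should make those plan lines disappear
(index names hypothetical):

CREATE INDEX IndexME_Parent_idx ON IndexME(Parent);
CREATE UNIQUE INDEX ItemsME_IDR_idx ON ItemsME(IDR);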


Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Igor Tandetnik
Mira Suk mira@centrum.cz wrote:
 test 1.
 
 query
 SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
 FROM [IndexME] LEFT OUTER JOIN [ItemsME]
 ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE
 [IndexME].[Parent] = ?1 AND
 (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR
 (TZB_ISCHILD([IndexME].[Status]) AND TZB_MATCHDIM([ItemsME].[Status], 
 [ItemsME].[Points])))
 ORDER BY [IndexME].[Order];
 
 test 2. (TZB_MATCHDIM removed)
 
 query
 SELECT [IndexME].[IDI], [IndexME].[Status], [IndexME].[Icon], [IndexME].[Text]
 FROM [IndexME] LEFT OUTER JOIN [ItemsME]
 ON [ItemsME].[IDR] = [IndexME].[IDI] WHERE
 [IndexME].[Parent] = ?1 AND
 (TZB_MATCHRECURSIVE([IndexME].[IDI], [IndexME].[Status]) OR
 (TZB_ISCHILD([IndexME].[Status])))
 ORDER BY [IndexME].[Order];

The difference is that #2 mentions only one field from ItemsME, namely IDR. The 
value of that field comes from the index, the table itself doesn't need to be 
read at all. It's not even clear why #2 bothers to join with ItemsME at all - 
it's a no-op.

#1 uses more fields from ItemsME, so it needs to actually look up and read 
records from that table.
-- 
Igor Tandetnik
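
Following that explanation, one way to let #1 avoid the table lookups is a
covering index over every ItemsME column the query touches; a sketch, assuming
IDR, Status and Points are the only ones (index name hypothetical):

CREATE INDEX ItemsME_IDR_cover_idx ON ItemsME(IDR, Status, Points);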



Re: [sqlite] Performance problem LEFT OUTER JOIN and string data from right table

2011-09-17 Thread Mira Suk



The difference is that #2 mentions only one field from ItemsME, namely IDR. The 
value of that field comes from the index, the table itself doesn't need to be 
read at all. It's not even clear why #2 bothers to join with ItemsME at all - 
it's a no-op.
#1 uses more fields from ItemsME, so it needs to actually look up and read 
records from that table.


However, the fact is that IDR is non-existent for that specific query (which
SQLite should see from the index), so it should not touch the table at all,
just fill with NULLs.
The more important thing is: even if I add [ItemsME].[insert whatever integer
field here] into the result fields, it is the same speed as in case 2), that
is, relatively fast. Just touching any text/blob fields in the query makes it
run ~12x slower. And considering the result set contains only a few rows, even
if all texts were split over 2 pages (none is longer than 300 chars) this
would equal records * 2 kB of IO; in the case of 50 records that's 100 kB.
Do you really think SQLite needs 1.8 seconds for 100 kB of reads on an SSD
drive? I certainly hope not. Maybe there is a reason RIGHT JOIN isn't
implemented?
 
BTW even the 2nd case is considerably slower than single-table selects.
TZB_MATCHRECURSIVE does a much more difficult job (selecting all children and
filtering them), but works on a single table and is still faster (around 40 ms
per top-level parent folder, which usually contains ~700 folders/items).
 
Just out of curiosity, is it possible to download older versions of SQLite to
test whether this behaviour is present in them? I can only find the current
version on the download page. nOOb here.
 




[sqlite] Performance Problem

2011-02-16 Thread sasikumar . u
Hi,

I'm using a Motorola MC55 device with a 2GB external memory card.

For the SQLite DB I have used the following PRAGMA values

PRAGMA cache_size = 16000
PRAGMA temp_store = 2
PRAGMA synchronous = OFF
PRAGMA locking_mode = EXCLUSIVE

for some performance improvement

For insertion of records I use

sqlite3_prepare

then binding the values using

sqlite3_bind_int64

and finally executing using

sqlite3_step

We insert a set of 10 records at a time, so I use BEGIN and
COMMIT transactions.

Despite this, insertion is slow: it takes around 700 milliseconds
to insert one record. Is there any other way to improve the performance?
Kindly help in this regard.

Regards,
Sasikumar U




Re: [sqlite] Performance Problem

2011-02-16 Thread Richard Hipp
On Wed, Feb 16, 2011 at 6:13 AM, sasikuma...@tcs.com wrote:

 Hi,

 I'm using Motorola MC55 device, with 2GB external memory card.

 For the SQlite Db I have used the following Pragma values

 PRAGMA cache_size = 16000
 PRAGMA  temp_store = 2
 PRAGMA  synchronous = OFF
 PRAGMA  locking_mode = EXCLUSIVE

 for some performance improvement

 For insertion of records I use

 sqlite3_prepare

 then binding the values using

 sqlite3_bind_int64

 and finally executing using

 sqlite3_step

 We use to insert a set of 10 records every time, so I use BEGIN and
 COMMIT transaction .

 Despite these the insertion is slow, like it takes around 700 milliseconds
 to insert one record, is there anyother way to improve the performance.
 Kindly help in this regard


I have no idea what a Motorola MC55 is, but either it is a very slow device
or else you are doing something wrong.  Some things to try:

(1) Increase the number of inserts within a single transaction to 1000.
Does that make it 100 times faster?  (It should.)

(2) Set PRAGMA journal_mode=WAL;

(3) Populate your database on a workstation then copy it over to the device.
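
A sketch of suggestions (1) and (2) combined; the table name and rows here are
hypothetical:

PRAGMA journal_mode=WAL;
BEGIN;
INSERT INTO samples VALUES (1, 42);
INSERT INTO samples VALUES (2, 43);
-- ... roughly 1000 inserts per transaction ...
COMMIT;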



 Regards,
 Sasikumar U






-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Performance Problem

2011-02-16 Thread Black, Michael (IS)


Re: [sqlite] Performance Problem

2011-02-16 Thread Jim Morris
 )
 {
   sprintf(SqlTxt, "UPDATE t2 SET n = %d;", n+1);
   sqlite3_exec(db, SqlTxt, NULL, NULL, NULL);
 }
}
if (interval != 1)
   sqlite3_exec(db, "commit", NULL, NULL, NULL);
printf("commits per second: %.1f\n", nrec/(elapsed()-t1));
sqlite3_close(db);
}


 Michael D. Black
 Senior Scientist
 NG Information Systems
 Advanced Analytics Directorate



 


[sqlite] Performance problem with count(*) calculation

2010-04-01 Thread Alexey Pechnikov
Hello!

$ time sqlite3 test32k.db "select count(*) from role_exist"
1250

real    0m58.908s
user    0m0.056s
sys     0m0.864s

$ sqlite3 test32k.db
SQLite version 3.6.23
sqlite> .schema role_exist
CREATE TABLE role_exist (
  id INTEGER PRIMARY KEY,
  uid BLOB NOT NULL DEFAULT (randomblob(16)) UNIQUE ON CONFLICT REPLACE
);

So 58s for a count of all records! Couldn't count(*) for all records use
a counter from the primary key b-tree?
==
HARDWARE:

$ grep CPU /proc/cpuinfo 
model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz

$ free
             total       used       free     shared    buffers     cached
Mem:       8310892    7552880     758012          0      29496    6667708
-/+ buffers/cache:     855676    7455216
Swap:      3903784     301240    3602544

Best regards, Alexey Pechnikov.
http://pechnikov.tel/


Re: [sqlite] Performance problem with count(*) calculation

2010-04-01 Thread Adam DeVita
How does
$ time sqlite3 test32k.db "select count(1) from role_exist"

perform?

On Thu, Apr 1, 2010 at 5:52 AM, Alexey Pechnikov pechni...@mobigroup.ru wrote:

 Hello!

 $ time sqlite3 test32k.db "select count(*) from role_exist"
 1250

 real    0m58.908s
 user    0m0.056s
 sys     0m0.864s

 $ sqlite3 test32k.db
 SQLite version 3.6.23
 sqlite> .schema role_exist
 CREATE TABLE role_exist (
  id INTEGER PRIMARY KEY,
  uid BLOB NOT NULL DEFAULT (randomblob(16)) UNIQUE ON CONFLICT REPLACE
 );

 So 58s for count of all records! The count(*) for all records may use
 the counter from primary key b-tree, is't it?
 ==
 HARDWARE:

 $ grep CPU /proc/cpuinfo
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz

 $ free
              total       used       free     shared    buffers     cached
 Mem:       8310892    7552880     758012          0      29496    6667708
 -/+ buffers/cache:     855676    7455216
 Swap:      3903784     301240    3602544

 Best regards, Alexey Pechnikov.
 http://pechnikov.tel/




-- 
VerifEye Technologies Inc.
905-948-0015x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada


Re: [sqlite] Performance problem with count(*) calculation

2010-04-01 Thread Pavel Ivanov
 So 58s for count of all records! The count(*) for all records may use
 the counter from primary key b-tree, is't it?

What does this mean? I believe there's no counter of any kind in a
b-tree. If you meant the counter from an auto-increment key, then what
about gaps in the middle?


Pavel

On Thu, Apr 1, 2010 at 5:52 AM, Alexey Pechnikov pechni...@mobigroup.ru wrote:
 Hello!

 $ time sqlite3 test32k.db "select count(*) from role_exist"
 1250

 real    0m58.908s
 user    0m0.056s
 sys     0m0.864s

 $ sqlite3 test32k.db
 SQLite version 3.6.23
 sqlite> .schema role_exist
 CREATE TABLE role_exist (
  id INTEGER PRIMARY KEY,
  uid BLOB NOT NULL DEFAULT (randomblob(16)) UNIQUE ON CONFLICT REPLACE
 );

 So 58s for count of all records! The count(*) for all records may use
 the counter from primary key b-tree, is't it?
 ==
 HARDWARE:

 $ grep CPU /proc/cpuinfo
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz
 model name      : Intel(R) Core(TM)2 Quad CPU    Q6700  @ 2.66GHz

 $ free
             total       used       free     shared    buffers     cached
 Mem:       8310892    7552880     758012          0      29496    6667708
 -/+ buffers/cache:     855676    7455216
 Swap:      3903784     301240    3602544

 Best regards, Alexey Pechnikov.
 http://pechnikov.tel/



Re: [sqlite] Performance problem with count(*) calculation

2010-04-01 Thread Jay A. Kreibich
On Thu, Apr 01, 2010 at 10:44:51AM -0400, Pavel Ivanov scratched on the wall:
  So 58s for count of all records! The count(*) for all records may use
  the counter from primary key b-tree, is't it?
 
 What does this mean? I believe there's no any kind of counters in
 b-tree. If you meant counter from auto-increment key then how about
 gaps in the middle?

  If a statement is in the form SELECT count(*) FROM table there is
  an optimization that short-cuts the standard aggregate functions and
  gets the leaf count directly from a BTree.

  However, I believe you are correct that the BTree does not keep a
  global leaf count in the header, so the whole tree must be loaded and
  walked.  The optimization will favor a low-column index BTree over the
  table root BTree, since this will typically require less I/O.  Time is
  also saved in short-cutting the column loading and aggregation functions
  (countStep() and countFinalize()).  But in the end, it looks like
  most of the tree is still loaded.

  See the function isSimpleCount() in the source for more specifics.

   -j

-- 
Jay A. Kreibich  J A Y  @  K R E I B I.C H 

Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor.   I'll go home and see if I can scrounge up a ruler
 and a piece of string.  --from Anathem by Neal Stephenson
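
A minimal sketch of the usual workaround when an exact total is needed often:
maintain the count yourself with triggers (table and trigger names
hypothetical):

CREATE TABLE role_exist_n(n INTEGER);
INSERT INTO role_exist_n VALUES ((SELECT count(*) FROM role_exist));
CREATE TRIGGER role_exist_ins AFTER INSERT ON role_exist
BEGIN UPDATE role_exist_n SET n = n + 1; END;
CREATE TRIGGER role_exist_del AFTER DELETE ON role_exist
BEGIN UPDATE role_exist_n SET n = n - 1; END;
-- reading the count is then a single-row lookup instead of a full tree walk:
SELECT n FROM role_exist_n;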


Re: [sqlite] Performance problem with count(*) calculation

2010-04-01 Thread Alexey Pechnikov
Hello!

On Thursday 01 April 2010 18:04:10 Adam DeVita wrote:
 How does
 $ time sqlite3 test32k.db "select count(1) from role_exist"
 perform?

Equal to count(*).

Best regards, Alexey Pechnikov.
http://pechnikov.tel/


Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va

First, thanks, your suggestion worked. 

To my surprise, it was enough to add limit 1 to the original query.

So:

select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum limit 1
takes about 1.398e-005 seconds

and 

select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum
takes about 3 seconds.





Igor Tandetnik wrote:
 
 Dani Valevski [EMAIL PROTECTED] wrote:
 I think I have a performance problem for a simple select with range.

 My Tables:
 CREATE TABLE locations(
 locid INTEGER PRIMARY KEY,
 country TEXT,
 region TEXT,
 city TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

 CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

 My Data:
 http://www.maxmind.com/app/geolitecity
 Blocks table has 2,776,436 rows
 Locations table has 159,488 rows

 After inserting the data I run analyze.

 My Query:
 select * from blocks,locations where locations.locid = blocks.locid
 AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
 (replace ? with a number)

 Performance issues:
 I use python's sqlite3 module to run the query.
 With this configuration it takes about 0.6 seconds to complete the
 query. I
 think this is too slow. I could write a binary tree myself and have
 searches
 like this take, O(log(num_rows)) which is
 7*something_which_shouldnt_take_too_much. Am I wrong?
 
 And what would you use as a key for this binary tree? I bet you would 
 utilize additional information that the DB engine doesn't have - that 
 your blocks don't overlap (they don't, right?) Try coming up with a 
 search strategy without making this assumption.
 
 Try this: create an index on startIpNum, and run a query like this:
 
 select * from blocks, locations
 where blocks.startIpNum <= ? and blocks.locid = locations.locid
 order by blocks.startIpNum desc limit 1;
 
 This gives you the record with the largest value of startIpNum that is 
 still smaller than the threshold, and should be very fast. It can 
 produce a false positive - make the additional check for (? <=
 endIpNum) in your application code. Don't put this check into the
 query though, or you will force it back into O(N) behavior in case your 
 target value doesn't fall within any block after all.
 
 Igor Tandetnik 
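
A sketch of the index that strategy assumes (index name hypothetical):

CREATE INDEX blocks_startIpNum_idx ON blocks(startIpNum);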
 
 
 
 
 






RE: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Doug
I'm not an SQL guru by any means, so seeing this made a light go on.  Does
that mean it is a good idea in the general case to always add limit 1 to a
select that you know should only return 1 row?  I'm assuming this works
because the engine can short-cut out as soon as it finds that first matching
row.

 -Original Message-
 From: Dani Va [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, October 31, 2007 8:30 AM
 To: sqlite-users@sqlite.org
 Subject: Re: [sqlite] Performance problem for a simple select with range
 
 
 First, thanks, your suggestion worked.
 
 To my surprise, it was enough to add limit 1 to the original query.
 
 So:
 
 select * from blocks,locations where locations.locid = blocks.locid AND ? >=
 blocks.startIpNum AND ? <= blocks.endIpNum limit 1
 takes about 1.398e-005 seconds
 
 and
 
 select * from blocks,locations where locations.locid = blocks.locid AND ? >=
 blocks.startIpNum AND ? <= blocks.endIpNum
 takes about 3 seconds.
 
 
 
 
 
 Igor Tandetnik wrote:
 
  Dani Valevski [EMAIL PROTECTED] wrote:
  I think I have a performance problem for a simple select with range.
 
  My Tables:
  CREATE TABLE locations(
  locid INTEGER PRIMARY KEY,
  country TEXT,
  region TEXT,
  city TEXT,
 postalCode TEXT,
 latitude REAL,
 longitude REAL,
 dmaCode INTEGER,
 areaCode INTEGER)
 
  CREATE TABLE blocks(
 startIpNum INTEGER,
 endIpNum INTEGER,
 locId INTEGER)
 
  My Data:
  http://www.maxmind.com/app/geolitecity
  Blocks table has 2,776,436 rows
  Locations table has 159,488 rows
 
  After inserting the data I run analyze.
 
  My Query:
  select * from blocks,locations where locations.locid = blocks.locid
  AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum
  (replace ? with a number)
 
  Performance issues:
  I use python's sqlite3 module to run the query.
  With this configuration it takes about 0.6 seconds to complete the
  query. I
  think this is too slow. I could write a binary tree myself and have
  searches
  like this take, O(log(num_rows)) which is
  7*something_which_shouldnt_take_too_much. Am I wrong?
 
  And what would you use as a key for this binary tree? I bet you would
  utilize additional information that the DB engine doesn't have - that
  your blocks don't overlap (they don't, right?) Try coming up with a
  search strategy without making this assumption.
 
  Try this: create an index on startIpNum, and run a query like this:
 
  select * from blocks, locations
  where blocks.startIpNum <= ? and blocks.locid = locations.locid
  order by blocks.startIpNum desc limit 1;
 
  This gives you the record with the largest value of startIpNum that is
  still smaller than the threshold, and should be very fast. It can
  produce a false positive - make the additional check for (? <=
  endIpNum) in your application code. Don't put this check into the
  query though, or you will force it back into O(N) behavior in case your
  target value doesn't fall within any block after all.
 
  Igor Tandetnik
 
 
 

 
 
 
 
 







Re: [sqlite] Performance problem for a simple select with range

2007-10-31 Thread Dani Va


Igor Tandetnik wrote:
 
 Try searching for a value that doesn't fall into any block - you'll 
 likely find that the query takes a noticeable time to produce zero 
 records. Pick a large value that's greater than all startIpNum's.
 

Yes, you are right. That's why I'm going with the original query you
suggested. 

Thanks again
Dani








[sqlite] Performance problem for a simple select with range

2007-10-29 Thread Dani Valevski
I think I have a performance problem for a simple select with range.

My Tables:
CREATE TABLE locations(
locid INTEGER PRIMARY KEY,
country TEXT,
region TEXT,
city TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

My Data:
http://www.maxmind.com/app/geolitecity
Blocks table has 2,776,436 rows
Locations table has 159,488 rows

After inserting the data I run analyze.

My Query:
select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum
(replace ? with a number)

Disclaimer:
I'm a bit new to databases.

Performance issues:
I use python's sqlite3 module to run the query.
With this configuration it takes about 0.6 seconds to complete the query. I
think this is too slow. I could write a binary tree myself and have searches
like this take O(log(num_rows)), which is
7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer)

Anyway, I thought the problem was that startIpNum, endIpNum are not indexed.
So I added indices for them (even tried indexing them both). This only makes
the query take about 3 seconds.
Ideas anyone?

Source:
is attached.

Thank you for your help





-- 
Dani
http://daniva.googlepages.com


'''geolite
GeoLite City is a free IP to city database provided by MaxMind. 
They provide a C API (and a python wrapper) for the database.
If you can't compile the C sources on your server (or get a binary 
version), this script might be helpful for you. 
The script puts the geoip data in a sqlite database, and provides
interfaces for updating and searching the database.

To use this script, get the database in CSV format:
http://www.maxmind.com/app/geolitecity

You also need to have python 2.5 for this script (sqlite3 is used)
'''

import sqlite3 as sqlite
import os

def dottedQuadToNum(ip):
    """convert decimal dotted quad string to long integer"""

    hexn = ''.join(["%02X" % long(i) for i in ip.split('.')])
    return long(hexn, 16)


def cursorToDict(cursor):
    val = cursor.next()
    return dict([(cursor.description[i][0], val[i]) for i in xrange(len(cursor.description))])

def test():
    import sqlite3
    from time import clock
    x = sqlite3.connect('geolite.db')
    y = x.cursor()
    ip = dottedQuadToNum("84.108.189.94")
    res = y.execute('select * from blocks,locations where locations.locid = blocks.locid AND ? >= blocks.startIpNum AND ? <= blocks.endIpNum', [ip, ip])
    begin = clock()
    f = res.next()
    end = clock()
    y.close()
    x.close()
    return end - begin, f

def test2():
    from time import clock
    x = GeoLiteDB()
    x.connect()
    begin = clock()
    x.ipLocation("84.108.189.94")
    end = clock()
    x.close()
    return end - begin


def createDB(dbPath='geolite.db', locationsPath='GeoLiteCity-Location.csv', blocksPath='GeoLiteCity-Blocks.csv', warnOnDelete=True):
    if os.path.exists(dbPath):
        if warnOnDelete:
            print "file %s will be deleted. Press any key to continue, or 'n' to abort..." % (os.path.abspath(dbPath))
            if getch() == 'n':  # getch() is not defined in this excerpt
                print 'aborted.'
                return None
        os.remove(os.path.abspath(dbPath))
    conn = sqlite.connect(dbPath)
    cursor = conn.cursor()
    try:
        cursor.execute('''CREATE TABLE locations(
            locid INTEGER PRIMARY KEY,
            country TEXT,
            region TEXT,
            city TEXT,
            postalCode TEXT,
            latitude REAL,
            longitude REAL,
            dmaCode INTEGER,
            areaCode INTEGER)''')

        cursor.execute('''CREATE TABLE blocks(
            startIpNum INTEGER,
            endIpNum INTEGER,
            locId INTEGER)''')

        locations = file(locationsPath, 'r')
        print ('parsing locations. This will take a while.')
        print locations.readline().strip()  # should print copyright note
        print locations.readline().strip()  # should print column names
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in locations.xreadlines())
        cursor.executemany('insert into locations values (?,?,?,?,?,?,?,?,?)', lines)
        locations.close()

        blocks = file(blocksPath, 'r')
        print ('parsing blocks. This will take longer.')
        print blocks.readline().strip()  # should print copyright note
        print blocks.readline().strip()  # should print column names
        lines = ([x.strip('"') for x in line.strip().split(',')] for line in blocks.xreadlines())
        cursor.executemany('insert into blocks values (?,?,?)', lines)
        blocks.close()

        #cursor.execute('''CREATE UNIQUE INDEX startIpNumIx ON blocks(startIpNum);''')
        #cursor.execute('''CREATE UNIQUE INDEX endIpNumIx ON blocks(endIpNum);''')

        conn.commit()

        print 'analyze'
        cursor.execute('''ANALYZE;''')

        numBlocks = cursor.execute('select count(*) from 

Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread Kees Nuyt
[Default] On Mon, 29 Oct 2007 15:25:18 +0200, Dani Valevski
[EMAIL PROTECTED] wrote:

I think I have a performance problem for a simple select with range.

My Tables:
CREATE TABLE locations(
locid INTEGER PRIMARY KEY,
country TEXT,
region TEXT,
city TEXT,
postalCode TEXT,
latitude REAL,
longitude REAL,
dmaCode INTEGER,
areaCode INTEGER)

CREATE TABLE blocks(
startIpNum INTEGER,
endIpNum INTEGER,
locId INTEGER)

My Data:
http://www.maxmind.com/app/geolitecity
Blocks table has 2,776,436 rows
Locations table has 159,488 rows

After inserting the data I run analyze.

My Query:
select * from blocks,locations where locations.locid = blocks.locid AND ? >=
blocks.startIpNum AND ? <= blocks.endIpNum
(replace ? with a number)

Disclaimer:
I'm a bit new to databases.

Performance issues:
I use python's sqlite3 module to run the query.
With this configuration it takes about 0.6 seconds to complete the query. I
think this is too slow. I could write a binary tree myself and have searches
like this take, O(log(num_rows)) which is
7*something_which_shouldnt_take_too_much. Am I wrong? (see the disclaimer)

Anyway, I thought the problem was that startIpNum, endIpNum are not indexed.
So I added indices for them (even tried indexing them both). This only makes
the query take about 3 seconds.
Ideas anyone?

Source:
is attached.

Thank you for your help

Just some suggestions:
Index locid in both tables, and rewrite

 select *
   from blocks,locations 
  where locations.locid = blocks.locid
  AND ? >= blocks.startIpNum 
  AND ? <= blocks.endIpNum

to:

 select *
   from blocks
  INNER JOIN locations USING (locid)
  where ? >= blocks.startIpNum 
AND ? <= blocks.endIpNum

or:

 select *
   from locations
  INNER JOIN blocks USING (locid)
  where ? >= blocks.startIpNum 
AND ? <= blocks.endIpNum

HTH
-- 
  (  Kees Nuyt
  )
c[_]




Re: [sqlite] Performance problem for a simple select with range

2007-10-29 Thread drh
Dani Valevski [EMAIL PROTECTED] wrote:
 I think I have a performance problem for a simple select with range.
 
 My Tables:
 CREATE TABLE locations(locid INTEGER PRIMARY KEY, ...);
 
 CREATE TABLE blocks(
 startIpNum INTEGER,
 endIpNum INTEGER,
 locId INTEGER)
 
 My Data:
 Blocks table has 2,776,436 rows
 Locations table has 159,488 rows
 
 My Query:
 select * from blocks,locations where locations.locid = blocks.locid AND ? >=
 blocks.startIpNum AND ? <= blocks.endIpNum
 (replace ? with a number)
 

To do searches of this kind with maximum efficiency, you normally
want to use a 1-dimensional R-Tree index.  SQLite does not support
RTree indices natively, though it is conceivable that you could
write a RTree virtual table extension for SQLite.

Without an RTree index, and unless you can exploit the distribution
of data in the blocks table, you really cannot do much better than a
full table scan on blocks with an indexed lookup of locations for
each matching block.  That is probably what is happening on your
original query before you added indices.

--
D. Richard Hipp [EMAIL PROTECTED]





[sqlite] Performance problem

2007-03-01 Thread Stephen Toney
Dear experts:

I'm having a performance problem I can't understand. I am running a
select count(*) query joining a table on itself, and the query runs
for five minutes using Sqlite3.exe before I get bored and kill it. This
is on a dual-core box with 4GB of memory, running Windows XP Pro. The
Sqlite version is 3.3.7.

Here's the problem query with the plan:

select count(*) from keyword a, keyword b where a.key=b.key and
a.value='music' and b.value='history';

0|0|TABLE keyword AS a WITH INDEX value
1|1|TABLE keyword AS b WITH INDEX value

Here's the schema

CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
value, nextword, sec, ipr, fldseq int);
CREATE INDEX key on keyword(key);
CREATE INDEX nextword on keyword(nextword);
CREATE INDEX value on keyword(value);

The table has 3,486,410 records and the SQLite database totals 320MB.
There are a few small tables in the db besides the KEYWORD table.

4,318 records have value='music' and 27,058 have value='history'. The
keys are 12-byte strings. That doesn't seem like an extreme case to me. 

Using DBI::ODBC::SQLite in a web application the result is just as bad
-- the server times out.

Any suggestions would be much appreciated!


Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com





Re: [sqlite] Performance problem

2007-03-01 Thread drh
[EMAIL PROTECTED] wrote:
 Stephen Toney [EMAIL PROTECTED] wrote:
  
  Here's the problem query with the plan:
  
  select count(*) from keyword a, keyword b where a.key=b.key and
  a.value='music' and b.value='history';
  
 
 A faster approach would be:
 
SELECT (SELECT count(*) FROM keyword WHERE value='music')*
   (SELECT count(*) FROM keyword WHERE value='history');
 

Never mind.  I overlooked the a.key=b.key term in your original
query.
--
D. Richard Hipp  [EMAIL PROTECTED]





RE: [sqlite] Performance problem

2007-03-01 Thread Stephen Toney
Thanks, Igor, Richard, and Tom,

Why doesn't SQLite use the index on key? I can see from the plan that it
doesn't, but why not? Can only one index be used per query?

This seems strange. I have used SQL Server and Visual Foxpro for this
same problem, and they both handle this query in a second if the indexes
are there.

Is there a good place to read more about this SQLite behavior? I'm
fairly familiar with the online documentation and don't recall reading
this.

Thanks a million!
Stephen


On Thu, 2007-03-01 at 07:54 -0500, Tom Briggs wrote:
You will likely be well served by a compound index on (value,key).
 As the schema stands now, the indexes will help find records with
 matching values, but not with matching keys; providing one index that
 correlates the two should help.
 
Disclaimer: I haven't recreated your schema, added said index and
 checked that the query plan produced is better.  Don't assume this to be
 good advice without trying it. :)
 
-Tom
 
  -Original Message-
  From: Stephen Toney [mailto:[EMAIL PROTECTED] 
  Sent: Thursday, March 01, 2007 7:00 AM
  To: sqlite-users@sqlite.org
  Subject: [sqlite] Performance problem
  
  Dear experts:
  
  I'm having a performance problem I can't understand. I am running a
  select count(*) query joining a table on itself, and the query runs
  for five minutes using Sqlite3.exe before I get bored and 
  kill it. This
  is on a dual-core box with 4GB of memory, running Windows XP Pro. The
  Sqlite version is 3.3.7.
  
  Here's the problem query with the plan:
  
  select count(*) from keyword a, keyword b where a.key=b.key and
  a.value='music' and b.value='history';
  
  0|0|TABLE keyword AS a WITH INDEX value
  1|1|TABLE keyword AS b WITH INDEX value
  
  Here's the schema
  
  CREATE TABLE keyword (key, contyp int, imagecount int, searchcat int,
  value, nextword, sec, ipr, fldseq int);
  CREATE INDEX key on keyword(key);
  CREATE INDEX nextword on keyword(nextword);
  CREATE INDEX value on keyword(value);
  
  The table has 3,486,410 records and the SQLite database totals 320MB.
  There are a few small tables in the db besides the KEYWORD table.
  
  4,318 records have value='music' and 27,058 have value='history'. The
  keys are 12-byte strings. That doesn't seem like an extreme 
  case to me. 
  
  Using DBI::ODBC::SQLite in a web application the result is just as bad
  -- the server times out.
  
  Any suggestions would be much appreciated!
  
  
  Stephen Toney
  Systems Planning
  [EMAIL PROTECTED]
  http://www.systemsplanning.com
  
  
  
  
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com





Re: [sqlite] Performance problem

2007-03-01 Thread Stephen Toney
On Thu, 2007-03-01 at 12:46 +, [EMAIL PROTECTED] wrote:

 Or maybe better yet:  Have you looked into using FTS2 for whatever
 it is you are trying to do?  Full-text search is hard to get right
 and you appear to be trying to create your own.  Why not use a FTS
 subsystem that is already written and testing and available to you?
 
 --
 D. Richard Hipp  [EMAIL PROTECTED]


Several reasons:
1. App is 10 years old and working well with other DBMSs, so why mess
with it? This problem only occurred since using SQLite as the DBMS.

2. Queries must work with other DBMSs with minimal tinkering (SQL
Server, Oracle, Foxpro, etc.) -- using ODBC.

3. Our indexing is tuned to museums, libraries, and other cultural
organizations. For example, certain characters are converted before
indexing (such as OE diphthong to the two letters OE). We also index
words with hyphens and apostrophes both with and without the punctuation
so the searcher can enter them various ways.

4. We do not preserve case in the index, so it can ignore incorrect
capitalization in the search terms. Maybe FTS does this too?

5. For historical reasons, we use NCRs like #233; instead of UTF-8. Our
programs remove these before indexing.

I am considering FTS for another project though. I appreciate the
suggestion!

Stephen
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com





RE: [sqlite] Performance problem

2007-03-01 Thread Griggs, Donald
 

Regarding:

   Can only one index be used per query?


Yes, I believe that *is* the defined behaviour of sqlite (though it does
support compound indices).  Larger DBMSs often have very involved code
to determine query plans.





Re: [sqlite] Performance problem

2007-03-01 Thread drh
Stephen Toney [EMAIL PROTECTED] wrote:
 Thanks, Igor, Richard, and Tom,
 
 Why doesn't SQLite use the index on key? I can see from the plan that it
 doesn't, but why not? Can only one index be used per query?
 
 This seems strange. I have used SQL Server and Visual Foxpro for this
 same problem, and they both handle this query in a second if the indexes
 are there.

SQLite is limited to a single index per table of the FROM clause.
(In your case the same table occurs twice in the FROM clause, so
each instance can use a separate indices, but each instance can
only use a single index.)  Other systems relax this restriction
through the use of bitmap indices.  SQLite does not (directly) 
support bitmap indices.  You can achieve about the same thing
as a bitmap index by playing games with rowids, but the SQL
needed to do so is convoluted.  In your case, I think the query
would need to be:

 SELECT count(*)
   FROM keyword AS a CROSS JOIN keyword AS b
  WHERE a.value='music'
AND b.rowid IN (
 SELECT rowid FROM keyword WHERE value='history'
 INTERSECT
 SELECT rowid FROM keyword WHERE key=a.key
);

It seems so much simpler to use a multi-column index.  It is almost
certainly going to be faster.
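
A sketch of that multi-column index against the schema quoted above (index
name hypothetical):

CREATE INDEX valuekey ON keyword(value, key);

With it, each side of the self-join can be satisfied entirely from the
(value, key) index, so the original query never has to read table rows.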

 
 Is there a good place to read more about this SQLite behavior? I'm
 fairly familiar with the online documentation and don't recall reading
 this.
 

You might get a few hints at http://www.sqlite.org/php2004/page-001.html
and the pages that follow.  That is from a talk I gave in 2004.  It
is somewhat out of date.  My goal for this calendar year is to get
some detailed documentation online about the kinds of issues you
are seeing.
--
D. Richard Hipp  [EMAIL PROTECTED]





Re: [sqlite] Performance problem

2007-03-01 Thread drh
Stephen Toney [EMAIL PROTECTED] wrote:
 
 4. We do not preserve case in the index, so it can ignore incorrect
 capitalization in the search terms. Maybe FTS does this too?

That's a function of your stemmer.  The default stemmers in FTS2
both ignore capitalization.

 
 5. For historical reasons, we use NCRs like #233; instead of UTF-8. Our
 programs remove these before indexing.
 

You can do this in your stemmer.
--
D. Richard Hipp  [EMAIL PROTECTED]





Re: [sqlite] Performance problem

2007-03-01 Thread Stephen Toney
Richard,

Thanks for the additional info. I'll look into the multi-column index
idea. Sounds as if it might be the solution.

Stephen



On Thu, 2007-03-01 at 14:42 +, [EMAIL PROTECTED] wrote:
 Stephen Toney [EMAIL PROTECTED] wrote:
  Thanks, Igor, Richard, and Tom,
  
  Why doesn't SQLite use the index on key? I can see from the plan that it
  doesn't, but why not? Can only one index be used per query?
  
  This seems strange. I have used SQL Server and Visual Foxpro for this
  same problem, and they both handle this query in a second if the indexes
  are there.
 
 SQLite is limited to a single index per table of the FROM clause.
 (In your case the same table occurs twice in the FROM clause, so
 each instance can use a separate indices, but each instance can
 only use a single index.)  Other systems relax this restriction
 through the use of bitmap indices.  SQLite does not (directly) 
 support bitmap indices.  You can achieve about the same thing
 as a bitmap index by playing games with rowids, but the SQL
 needed to do so is convoluted.  In your case, I think the query
 would need to be:
 
  SELECT count(*)
FROM keyword AS a CROSS JOIN keyword AS b
   WHERE a.value='music'
 AND b.rowid IN (
  SELECT rowid FROM keyword WHERE value='history'
  INTERSECT
  SELECT rowid FROM keyword WHERE key=a.key
 );
 
 It seems so much simpler to use a multi-column index.  It is almost
 certainly going to be faster.
 
  
  Is there a good place to read more about this SQLite behavior? I'm
  fairly familiar with the online documentation and don't recall reading
  this.
  
 
 You might get a few hints at http://www.sqlite.org/php2004/page-001.html
 and the pages that follow.  That is from a talk I gave in 2004.  It
 is somewhat out of date.  My goal for this calendar year is to get
 some detailed documentation online about the kinds of issues you
 are seeing.
 --
 D. Richard Hipp  [EMAIL PROTECTED]
 
 
-- 

Stephen Toney
Systems Planning
[EMAIL PROTECTED]
http://www.systemsplanning.com





[sqlite] Performance problem with simple queries

2007-02-12 Thread Makavy, Erez (Erez)
Problem summary:
---
Simple queries sometimes take ~400 ms
 
Analysis:
---
- A PHP script runs the same SQL query several times in different places
(in different transactions).
Some of the queries take around 400ms while the others (identical
queries) take only 4-20ms.
(The timings are consistent for the same PHP script.)
 
- Testing showed that identical queries added right after a
problematic query run very fast, i.e.: ...
 
- time measurement is done with explode(' ', microtime());
 
Test script:
---
A test script shows that the first query takes a long time, but the others
are quick.
It also happens on queries that are not the first, but we don't have a
simple test script that reproduces that yet.
 
Script output:
pDB is sqlite:/usr/local/web/443/lbaProvDBponeOpenTest:
0.66058707 SELECT * FROM MediaGateway
0.00106287 SELECT * FROM MediaGateway
0.00017715 SELECT * FROM MediaGateway
0.12085819 SELECT * FROM GeneralPlatformProperties
0.00016093 SELECT * FROM GeneralPlatformProperties WHERE 1
0.0649929 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.05895591 SELECT * FROM ClassOfRestriction
0.05869102 SELECT * From BriefHuntGroupList ORDER BY huntGroupName
COLLATE NOCASE
Second run:
0.04814911 SELECT * FROM MediaGateway
0.00015306 SELECT * FROM MediaGateway
0.00019789 SELECT * FROM MediaGateway
0.00516605 SELECT * FROM GeneralPlatformProperties
0.000175 SELECT * FROM GeneralPlatformProperties WHERE 1
0.02837396 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.00037098 SELECT * FROM ClassOfRestriction
0.05728984 SELECT * From BriefHuntGroupList ORDER BY huntGroupName
COLLATE NOCASE

Questions:
---
 
What can explain the long processing time for the first query?
What can explain long processing times for queries which are not the
first?
 
Thanks,
Erez.
 


Re: [sqlite] Performance problem with simple queries

2007-02-12 Thread John Stanton

You are almost certainly encountering disk caching effects.

Makavy, Erez (Erez) wrote:

Problem summery:
---
Simple queries sometimes take ~400 ms
 
Analysis:

---
- A php script runs the same SQL query several times in different places
(in different transactions).
Some of the queries take around 400ms while the others (identical
queries) takes only 4-20ms.
(The timings are consistent for the same php script.)
 
- Testing showed that when adding multiple identical queries after a

problematic query run very fast, I.e.: ...
 
- time measurement is done with explode(' ', microtime());
 
Test script:

---
A Test script shows that the first query take long time, but the others
are quick.
It happens also not in the first query, but we don't have a simple test
script that reproduces that yet.
 
Script output:

pDB is sqlite:/usr/local/web/443/lbaProvDBponeOpenTest:
0.66058707 SELECT * FROM MediaGateway
0.00106287 SELECT * FROM MediaGateway
0.00017715 SELECT * FROM MediaGateway
0.12085819 SELECT * FROM GeneralPlatformProperties
0.00016093 SELECT * FROM GeneralPlatformProperties WHERE 1
0.0649929 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.05895591 SELECT * FROM ClassOfRestriction
0.05869102 SELECT * From BriefHuntGroupList ORDER BY huntGroupName
COLLATE NOCASE
Second run:
0.04814911 SELECT * FROM MediaGateway
0.00015306 SELECT * FROM MediaGateway
0.00019789 SELECT * FROM MediaGateway
0.00516605 SELECT * FROM GeneralPlatformProperties
0.000175 SELECT * FROM GeneralPlatformProperties WHERE 1
0.02837396 SELECT * FROM ClassOfRestriction WHERE id IS NULL
0.00037098 SELECT * FROM ClassOfRestriction
0.05728984 SELECT * From BriefHuntGroupList ORDER BY huntGroupName
COLLATE NOCASE

Questrions:
---
 
What can explain the long procces time for the first query?

What can explain log proccessing time for queries whic are not the first
?
 
Thanks,

Erez.
 







Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Chris Schirlinger
Are you wrapping the transactions in between Begin/End Transactions?

BEGIN TRANSACTION;
INSERT INTO table (foo) VALUES (bar);
INSERT INTO table (foo) VALUES (par);
INSERT INTO table (foo) VALUES (tar);
INSERT INTO table (foo) VALUES (far);
..
INSERT INTO table (foo) VALUES (car);
INSERT INTO table (foo) VALUES (jar);
INSERT INTO table (foo) VALUES (mar);
COMMIT TRANSACTION;

Check out this document for more info 
http://www.sqlite.org/lang_transaction.html

 I'm sure I must be doing something wrong.  This is my first attempt at
 working with SQLite.




Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Christian Smith
On Mon, 21 Nov 2005, Shane Baker wrote:

I'm sure I must be doing something wrong.  This is my first attempt at
working with SQLite.


We'll see...



I have a simple table, with 7 columns.  There are 6 integers and a BLOB,
with the primary key being on an integer.  When I try to run inserts (one
insert per transacion - I know this is not optimal, but it represents my
application's usage), I am only getting about 7 inserts per second, on
average.

My first suspect was the BLOB and the fact that I was binding this
parameter using SQLITE_TRANSIENT (using sqlite3_bind_blob).  I removed the
BLOB from the schema altogether, leaving just 6 integers, and I still have
the same performance.


The performance problem is the synchronous IO bottleneck of doing only a
single insert per transaction.



For reference, I am getting around 10,000 queries per second when I lookup
a row based on the primary key column.

All performance measurements I've seen posted by others suggest between
200 and 300 inserts per second with one insert per transaction.


Probably not, unless this is to a FLASH device, for example. The
bottleneck in hard disk IO is the rotational and head movement latencies
to write data to the platters. Assuming no head movement, a 7200 rpm disk
will only allow the same sector to be rewritten 7200 times a minute,
which is 120 times a second. Add in that many different sectors need to be
updated synchronously, and throughput drops dramatically.

A quick test indicates that I can almost double the performance on
Linux/ext3 by having data=journal option set in the mount flags. This is
because head movement is reduced significantly. A test that previously
took ~500 seconds (13785 inserts without transactions) took 280 seconds
with data=journal. For reference, the same data inserted with a single
transaction took ~1.2 seconds!



I haven't run a profiler yet but hope to do this tomorrow.  Does anyone
have any ideas as to what I might be doing wrong, or where I should look?


If you can change your model to insert more than 1 row per transaction,
you should see a significant performance increase. You'll see roughly N
times the performance for small N.

If this is not an option, look at your storage and how you can reduce
latency. FLASH devices have low latency, being solid state, and some RAID
controllers have battery backed buffers, and so may have lower latency.


Thanks in advance.


Christian

-- 
/\
\ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \


Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Shane Baker
No, as I mentioned in my original message, I am not wrapping them.  I
don't want to test an unrealistic scenario for my application.  In my
application, there are multiple sources that will be inserting into the
database and pooling the information for a bulk insert won't work.

I understand that I will get better performance by inserting all of my
rows inside a transaction.  What I don't understand is why, when NOT using
a transaction, I get about 7 inserts per second compared to others who are
reporting between 200 and 300.  I am working with reasonable hardware.

I just need to figure out why my performance is about 30x slower than what
others are reporting when using the library in similar ways.



On Mon, 21 Nov 2005, Chris Schirlinger wrote:

 Are you wrapping the transactions in between Begin/End Transactions?

 BEGIN TRANSACTION;
 INSERT INTO table (foo) VALUES (bar);
 INSERT INTO table (foo) VALUES (par);
 INSERT INTO table (foo) VALUES (tar);
 INSERT INTO table (foo) VALUES (far);
 ..
 INSERT INTO table (foo) VALUES (car);
 INSERT INTO table (foo) VALUES (jar);
 INSERT INTO table (foo) VALUES (mar);
 COMMIT TRANSACTION;

 Check out this document for more info
 http://www.sqlite.org/lang_transaction.html

  I'm sure I must be doing something wrong.  This is my first attempt at
  working with SQLite.



Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Shane Baker
Thank you very much for the feedback.  I understand your point, hardware
takes a deterministic amount of time.

I have been basing my assumptions on these sources:

http://www.sqlite.org/cvstrac/wiki?p=PerformanceConsiderations (See
Transactions and performance)
http://blog.amber.org/2004/11/28/sqlite-insertion-performance/

There was one other, but I can't find it.

For the time being, I don't think that inserts are going to happen very
frequently in my application and I can probably roll updates into
transactions.

Thanks again.


On Mon, 21 Nov 2005, Christian Smith wrote:

 On Mon, 21 Nov 2005, Shane Baker wrote:

 I'm sure I must be doing something wrong.  This is my first attempt at
 working with SQLite.


 We'll see...


 
 I have a simple table, with 7 columns.  There are 6 integers and a BLOB,
 with the primary key being on an integer.  When I try to run inserts (one
 insert per transaction - I know this is not optimal, but it represents my
 application's usage), I am only getting about 7 inserts per second, on
 average.
 
 My first suspect was the BLOB and the fact that I was binding this
 parameter using SQLITE_TRANSIENT (using sqlite3_bind_blob).  I removed the
 BLOB from the schema altogether, leaving just 6 integers, and I still have
 the same performance.


 The performance problem is the synchronous IO bottleneck of doing only a
 single insert per transaction.


 
 For reference, I am getting around 10,000 queries per second when I lookup
 a row based on the primary key column.
 
 All performance measurements I've seen posted by others suggest between
 200 and 300 inserts per second with one insert per transaction.


 Probably not, unless this is to a FLASH device, for example. The
 Bottleneck in hard disk IO is the rotational and head movement latencies
 to write data to the platters. Assuming no head movement, a 7200 rpm disk
 will only allow the same sector to be rewritten 7200 times a minute,
 which is 120 times a second (7200 / 60). Add in that many different sectors need to be
 updated synchronously, and throughput drops dramatically.

 A quick test indicates that I can almost double the performance on
 Linux/ext3 by having data=journal option set in the mount flags. This is
 because head movement is reduced significantly. A test that previously
 took ~500 seconds (13785 inserts without transactions) took 280 seconds
 with data=journal. For reference, the same data inserted with a single
 transaction took ~1.2 seconds!


 
 I haven't run a profiler yet but hope to do this tomorrow.  Does anyone
 have any ideas as to what I might be doing wrong, or where I should look?


 If you can change your model to insert more than 1 row per transaction,
 you should see a significant performance increase. You'll see roughly N
 times the performance for small N.

 If this is not an option, look at your storage and how you can reduce
 latency. FLASH devices have low latency, being solid state, and some RAID
 controllers have battery backed buffers, and so may have lower latency.

 
 Thanks in advance.
 

 Christian

 --
 /\
 \ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
  X   - AGAINST MS ATTACHMENTS
 / \



Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Akira Higuchi
Hi,

On Mon, 21 Nov 2005 10:56:41 -0500 (EST)
Shane Baker [EMAIL PROTECTED] wrote:

 I just need to figure out why my performance is about 30x slower than what
 others are reporting when using the library in similar ways.

Are you using sqlite on windows or MacOS X?

As I tested, sqlite performs 10 write transactions per sec on
windows and MacOS X. On Linux, sqlite performs up to 300 write
transactions per sec if HDD cache is enabled. However, when HDD cache
is disabled (/sbin/hdparm -W0 /dev/hda etc.), it drops to 10
transactions per sec.

To ensure ACID compliance, sqlite seems to be using FlushFileBuffers() on
windows, F_FULLFSYNC fcntl() on MacOS X, and fsync() on other unixes.
FlushFileBuffers and F_FULLFSYNC fcntl() flush HDD write-back
cache, and they are very slow. However, many implementations of fsync()
don't flush HDD cache (they flush OS cache only). It's very fast, but
dangerous (not ACID compliant) if HDD cache has no battery backup.
(So i'm using sqlite on Linux with HDD cache off.)
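(For reference, the flush-per-commit behaviour described above is what
SQLite's synchronous pragma controls; a minimal sketch, with the caveat that
anything below the default trades durability for speed:

PRAGMA synchronous = FULL;    -- the default: sync on every commit
PRAGMA synchronous = NORMAL;  -- fewer syncs per transaction
PRAGMA synchronous = OFF;     -- no syncs; fast, but unsafe on power loss
)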


Re: [sqlite] Performance problem with 3.2.7

2005-11-21 Thread Shane Baker
Thank you very much.  I am happy to hear that the performance I am seeing
is in line with what others have observed.  I am running this on Windows
XP.

On Tue, 22 Nov 2005, Akira Higuchi wrote:

 Hi,

 On Mon, 21 Nov 2005 10:56:41 -0500 (EST)
 Shane Baker [EMAIL PROTECTED] wrote:

  I just need to figure out why my performance is about 30x slower than what
  others are reporting when using the library in similar ways.

 Are you using sqlite on windows or MacOS X?

 As I tested, sqlite performs 10 write transactions per sec on
 windows and MacOS X. On Linux, sqlite performs up to 300 write
 transactions per sec if HDD cache is enabled. However, when HDD cache
 is disabled (/sbin/hdparm -W0 /dev/hda etc.), it drops to 10
 transactions per sec.

 To ensure ACID compliance, sqlite seems to be using FlushFileBuffers() on
 windows, F_FULLFSYNC fcntl() on MacOS X, and fsync() on other unixes.
 FlushFileBuffers and F_FULLFSYNC fcntl() flush HDD write-back
 cache, and they are very slow. However, many implementations of fsync()
 don't flush HDD cache (they flush OS cache only). It's very fast, but
 dangerous (not ACID compliant) if HDD cache has no battery backup.
 (So i'm using sqlite on Linux with HDD cache off.)



Re: [sqlite] sqlite performance problem

2005-04-13 Thread Christian Smith
On Tue, 12 Apr 2005, Maksim Yevmenkin wrote:

Dear SQLite users,

consider this

 [snip]

it only took 4+ seconds to read, parse, perform hash table lookup and
sum the data. note that for 1417 unique keys it had to do a hash lookup
and a hash insert.

so, just with plain ascii file i get four times the speed i get with
sqlite. note that my c program will scale linearly with the size of
dataset (just like i see with sqlite).

so,

- what am i doing wrong here?


Not managing your expectations. Try the test with another SQL database,
and see what sort of speed you get.



- any suggestions on how to speed things up with sqlite?


Buy a faster machine. No, I'm not being facetious.



- is sqlite optimized to retrieve one row from the table?


When fetching that one row using an index, yes, that's what indexes are
for. But when an index scan touches a significant proportion of the
corresponding table, then it becomes less than optimal, due to the reasons
given previously in the thread:
- Cache thrashing
- Index + Table access
- CPU overhead
- Non-linear file access


i'm starting to think that having index and data in the same file is
not such a great idea when you want to perform range requests.


Unless the index and data are on different disks, you'll get no benefit by
splitting them into separate files.



thanks,
max


Christian



On Apr 12, 2005 11:33 AM, Maksim Yevmenkin [EMAIL PROTECTED] wrote:
 Gé,

 thanks for the suggestion. unfortunately it did not make any
 difference :( below are the results. as you can see it takes 7+ seconds
 to group by 333,392 records and i'm grouping by the column on which i
 have an index. again, i'm not a database guy, but i think that is slow.
 perhaps someone can comment if that is the best sqlite can do?


 sqlite> .schema data
 CREATE TABLE data(
   a INTEGER,
   b INTEGER,
   c CHAR,
   d INTEGER,
   e INTEGER,
   n1 FLOAT,
   n2 FLOAT
 );
 CREATE INDEX data_by_a on data (a);


 sqlite> select count(*) from data;
 92

 test-1.sql
 
 PRAGMA cache_size = 300000;
 PRAGMA cache_size;
 PRAGMA page_size;
 SELECT count(*) FROM (SELECT a FROM data WHERE a = 18234721 GROUP BY a);

 test-2.sql
 
 PRAGMA cache_size;
 PRAGMA page_size;
 SELECT count(*) FROM (SELECT a FROM data WHERE a = 18234721 GROUP BY a);

  time sqlite3 db < test-1.sql
 300000
 1024
 1417
 6.89u 0.33s 0:07.55 95.6%

  time sqlite3 db < test-2.sql
 2000
 1024
 1417
 6.91u 0.19s 0:07.39 96.0%

  time sqlite3 db2048 < test-1.sql
 300000
 2048
 1417
 6.80u 0.08s 0:07.32 93.9%

  time sqlite3 db2048 < test-2.sql
 2000
 2048
 1417
 6.77u 0.12s 0:07.10 97.0%

  time sqlite3 db4096 < test-1.sql
 300000
 4096
 1417
 6.80u 0.15s 0:07.21 96.3%

  time sqlite3 db4096 < test-2.sql
 2000
 4096
 1417
 6.79u 0.15s 0:07.15 97.0%

  time sqlite3 db8192 < test-1.sql
 300000
 8192
 1417
 6.70u 0.11s 0:07.01 97.1%

  time sqlite3 db8192 < test-2.sql
 2000
 8192
 1417
 6.73u 0.09s 0:07.01 97.2%

 thanks,
 max


 On Apr 12, 2005 7:10 AM, Gé Weijers [EMAIL PROTECTED] wrote:
  Maksim,
 
  Some things you could try:
 
  1) increase cache memory
 
  You may be causing a lot of cache misses if the size of the query result
  is very large compared to the size of the cache. Index-based searches
  can cause multiple reloads of the same page because of a lack of
  locality in the cache. An index-less search will just load each page once.
 
  as an experiment, try 'PRAGMA cache_size = 300000', before you run the
  query. 1 GB of ram should be able to support 300MB of cache.
 
  2) use 8192-byte pages
 
  Larger pages seem to improve performance quite a bit, in my experience.
 
  Do 'PRAGMA page_size = 8192' before you create the database.
 
  Doing both may cause excessive memory use (300000 * 8K = ...). I've
  never tried that.
 
 
  Gé
 
 
  Maksim Yevmenkin wrote:
 
  Robert,
  
  
  
  [snip]
  
  
  
  i said i print these rows to /dev/null too in my perl code. plus the
  perl code does some other things such as joining these rows with other
  hashes and summing the numbers.
  
  
  That's fine.  I was merely trying to account for the 50% speed difference
  between the two differing column tests, which has been accomplished.
  
  
  
  As for the temp table ... I haven't tried this, but isn't
  
  
  temp a reserved
  
  
  word in SQLite?  More importantly, you should be doing this
  
  
  statement inside
  
  yes, it is. i really want to create 'temporary table' in memory. i was
  really hoping it would speed things up.
  
  
  I misread the statement, so ignore me on that part.  However, 339,000 
  rows
  into a temporary in-memory table ... I tried some experiments locally 
  here
  and none of them took more than 2 seconds to execute.  Are you sure 
  you're
  not using up all available memory, which is causing the system to hit the
  swapfile?  What does this same query look like when you drop the temp 
  from
  the query?
  
  
  
  the system has 1G of ram. i was monitoring sqlite3 memory usage with
  'top'. the SIZE and 

Re: [sqlite] sqlite performance problem

2005-04-13 Thread D. Richard Hipp
On Fri, 2005-04-08 at 11:53 -0700, Maksim Yevmenkin wrote:
 CREATE INDEX data_by_a ON data (a);
 
  time sqlite3 db 'select n2 from data where a = 18234721' > /dev/null
 25.95u 0.71s 0:27.02 98.6%
 

If you make the index look like this:

  CREATE INDEX data_by_a ON data(a, n2);

Then SQLite only has to look at the index to get the information
it needs.  It never has to consult the original table.  This will
make the query about twice as fast or maybe even a little faster.
-- 
D. Richard Hipp [EMAIL PROTECTED]
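(One way to check that the planner really answers the query from the index
alone; a sketch only, since EXPLAIN QUERY PLAN output wording differs across
SQLite versions:

CREATE INDEX data_by_a ON data(a, n2);
EXPLAIN QUERY PLAN SELECT n2 FROM data WHERE a = 18234721;

Recent versions report a COVERING INDEX in the plan when no table lookup is
needed.)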



RE: [sqlite] sqlite performance problem

2005-04-13 Thread Thomas Briggs
 

 well, it could be true, but not in the queries i have posted. i group
 by column a and there is an index on column a, so sqlite does not
 have to do anything to compute the key. it does not even have to go back to

   Do not confuse the index key with the aggregator key.  The two may be
the same in your case, but in general they won't be, so I wouldn't
expect SQLite to reuse the index key as the aggregator key even if it
can.

 the disk and fetch the value of column a because it's right in the
 index. the only work sqlite has to do is to go back to the disk and
 fetch the value of column n1 and sum it.

   You're underestimating how much work has to be done to just fetch
the value of column n1.  The entire row containing n1 has to be read,
then the value of column n1 extracted from it.  That means disk seeks,
disk reads, moving around within the data row, etc.

 what you say is correct, but four (!) times performance increase?
 please, you have got to admit that something is not right here.

   It may well be possible to make your query run faster.  The biggest
problem here is simply that you're expecting a magically
high-performance solution without understanding what needs to be done in
order to satisfy your request.

 why? i'm using a relational database to do exactly what it should do
 fast. that is, i want to select a range of rows from a much bigger set

   Do not confuse what you want it to do fast, or even what you think it
should do fast, with what it actually does or how fast it does it.

 and do something with it. i'm not asking it to do complex computations
 or anything else.

   You don't think you're doing anything complex, but you really are,
you just don't understand that you are.  Locating a variable-sized row
in an unpredictable location and performing arbitrary operations on some
sub-section of that row is a heck of a lot more complicated than
scanning through a flat file and computing hash keys.

 i think, i know what is going on here. the problem is that every time
 i do an indexed scan sqlite has to
 
 1) fetch index pages
 
 2) fetch data pages that match where condition
 
 because both index and data are in the same file sqlite has to perform
 insane amount of seek() calls and this is clearly demonstrated in my
 previous post.

   Once again, the fact that the data and the index are in the same file
is irrelevant.  Even if the data and the index were in separate files,
the disk heads would be doing a lot of random I/O to move from the
location of one row to the next given that the table rows are not
accessed sequentially when using an index.

 it still takes 14 whopping seconds to group by 300,000+ records
 down to 1417 records and that is just unacceptable (imo) for
 a relational database.

   Why does the fact that SQLite is a relational database mean that
response times should be measured in seconds?  Have you tried this with
any other relational database, as another poster suggested?  You may
well get better performance out of something like MySQL or Oracle, but
it'll come at a price - much more memory usage and an SMP system that
allows you to effectively multithread.  I can guarantee you that any
other relational database, if configured to use the same amount of
memory as SQLite (a few MBs) and the same number of threads (1) that
they would all perform markedly *worse* than SQLite.

 from what i can see sqlite took an extra 7 seconds to fetch the data off
 the disk. that is, it had to seek back to the data page, fetch it, process
 it, seek back to next index page etc.

   That's correct.  That's the price you pay for using an index to look
up the data.

 yes, i tried all that. it is not that easy to read sqlite explain
 output (imo) but i convinced myself that it uses all the proper indexes
 when i make the query.

   Just because you think it's using the correct index doesn't mean that
that's the optimal access path.  Assuming that it is, however... Given
that you do seem to understand the overhead incurred by using an index
to look up data that is located elsewhere, what's the issue?  It seems
pretty clear that SQLite isn't doing anything wrong.  It isn't doing it
as optimally as your C program, but we've already covered the reasons
for that, so what's left to discuss?

 i think sqlite can only do a good job when the resulting set is very small.
 ideally one row. in this case it does not have to go back to disk to
 fetch the columns that are not in index. it is that simple (imo). the
 more columns (that are not in the index) you request the more time it
 takes.

   Why would you expect the processing of a large number of rows to take
the same amount of time as processing a single row?  Once again, your
expectations are unreasonable.  It's going to take more time to process
a large number of rows simply because there's more data to be managed.
   The number of columns requested isn't much of a factor either.
Requesting 10 columns that are not in the index is only slightly more
expensive than 

Re: [sqlite] sqlite performance problem

2005-04-12 Thread Gé Weijers
Maksim,

Some things you could try:

1) increase cache memory

You may be causing a lot of cache misses if the size of the query result
is very large compared to the size of the cache. Index-based searches
can cause multiple reloads of the same page because of a lack of
locality in the cache. An index-less search will just load each page once.

as an experiment, try 'PRAGMA cache_size = 300000', before you run the
query. 1 GB of ram should be able to support 300MB of cache.

2) use 8192-byte pages

Larger pages seem to improve performance quite a bit, in my experience.

Do 'PRAGMA page_size = 8192' before you create the database.

Doing both may cause excessive memory use (300000 * 8K = ...). I've
never tried that.


Gé



Maksim Yevmenkin wrote:

Robert,

  

[snip]



i said i print these rows to /dev/null too in my perl code. plus the
perl code does some other things such as joining these rows with other
hashes and summing the numbers.
  

That's fine.  I was merely trying to account for the 50% speed difference
between the two differing column tests, which has been accomplished.



As for the temp table ... I haven't tried this, but isn't


temp a reserved
  

word in SQLite?  More importantly, you should be doing this


statement inside

yes, it is. i really want to create 'temporary table' in memory. i was
really hoping it would speed things up.
  

I misread the statement, so ignore me on that part.  However, 339,000 rows
into a temporary in-memory table ... I tried some experiments locally here
and none of them took more than 2 seconds to execute.  Are you sure you're
not using up all available memory, which is causing the system to hit the
swapfile?  What does this same query look like when you drop the temp from
the query?



the system has 1G of ram. i was monitoring sqlite3 memory usage with
'top'. the SIZE and RES did not exceed 30M. so i do not think the
memory is the issue here.

  

time sqlite3 db 'create table foo as select * from data where a = 18234721' > /dev/null


22.06u 1.39s 0:27.75 84.5%

so pretty much the same time without 'temp'.

i'm starting to suspect disk. here is what i did. i created a separate
database with only one table. this table contains a subset of 92
rows from the original data table. it also has the same index on the a
column, i.e. i did

  

sqlite3 db1


sqlite> attach db as s;
sqlite> create table data as select * from s.data where a = 18234721;
sqlite> create index data_by_a on data (a);

full scan

  

time sqlite3 db1 'select n1 from data' > /dev/null


17.19u 0.55s 0:19.06 93.0%

bad index scan, because it is guaranteed that the table only has
keys that match the where clause

  

time sqlite3 db1 'select n1 from data where a = 18234721' > /dev/null


25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of indexed scan? is this what it
really takes to seek back and forth between index and data? what am i
missing here?

thanks,
max
  




Re: [sqlite] sqlite performance problem

2005-04-12 Thread Maksim Yevmenkin
Gé,

thanks for the suggestion. unfortunately it did not make any
difference :( below are the results. as you can see it takes 7+ seconds
to group by 333,392 records and i'm grouping by the column on which i
have an index. again, i'm not a database guy, but i think that is slow.
perhaps someone can comment if that is the best sqlite can do?


sqlite> .schema data
CREATE TABLE data(
  a INTEGER,
  b INTEGER,
  c CHAR,
  d INTEGER,
  e INTEGER,
  n1 FLOAT,
  n2 FLOAT
);
CREATE INDEX data_by_a on data (a);



sqlite> select count(*) from data;
92



test-1.sql

PRAGMA cache_size = 300000;
PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a = 18234721 GROUP BY a);



test-2.sql

PRAGMA cache_size;
PRAGMA page_size;
SELECT count(*) FROM (SELECT a FROM data WHERE a = 18234721 GROUP BY a);



 time sqlite3 db < test-1.sql
300000
1024
1417
6.89u 0.33s 0:07.55 95.6%

 time sqlite3 db < test-2.sql
2000
1024
1417
6.91u 0.19s 0:07.39 96.0%

 time sqlite3 db2048 < test-1.sql
300000
2048
1417
6.80u 0.08s 0:07.32 93.9%

 time sqlite3 db2048 < test-2.sql
2000
2048
1417
6.77u 0.12s 0:07.10 97.0%

 time sqlite3 db4096 < test-1.sql
300000
4096
1417
6.80u 0.15s 0:07.21 96.3%

 time sqlite3 db4096 < test-2.sql
2000
4096
1417
6.79u 0.15s 0:07.15 97.0%

 time sqlite3 db8192 < test-1.sql
300000
8192
1417
6.70u 0.11s 0:07.01 97.1%

 time sqlite3 db8192 < test-2.sql
2000
8192
1417
6.73u 0.09s 0:07.01 97.2%

thanks,
max


On Apr 12, 2005 7:10 AM, Gé Weijers [EMAIL PROTECTED] wrote:
 Maksim,
 
 Some things you could try:
 
 1) increase cache memory
 
 You may be causing a lot of cache misses if the size of the query result
 is very large compared to the size of the cache. Index-based searches
 can cause multiple reloads of the same page because of a lack of
 locality in the cache. An index-less search will just load each page once.
 
 as an experiment, try 'PRAGMA cache_size = 300000', before you run the
 query. 1 GB of ram should be able to support 300MB of cache.
 
 2) use 8192-byte pages
 
 Larger pages seem to improve performance quite a bit, in my experience.
 
 Do 'PRAGMA page_size = 8192' before you create the database.
 
 Doing both may cause excessive memory use (300000 * 8K = ...). I've
 never tried that.
 
 
 Gé
 
 
 Maksim Yevmenkin wrote:
 
 Robert,
 
 
 
 [snip]
 
 
 
 i said i print these rows to /dev/null too in my perl code. plus the
 perl code does some other things such as joining these rows with other
 hashes and summing the numbers.
 
 
 That's fine.  I was merely trying to account for the 50% speed difference
 between the two differing column tests, which has been accomplished.
 
 
 
 As for the temp table ... I haven't tried this, but isn't
 
 
 temp a reserved
 
 
 word in SQLite?  More importantly, you should be doing this
 
 
 statement inside
 
 yes, it is. i really want to create 'temporary table' in memory. i was
 really hoping it would speed things up.
 
 
 I misread the statement, so ignore me on that part.  However, 339,000 rows
 into a temporary in-memory table ... I tried some experiments locally here
 and none of them took more than 2 seconds to execute.  Are you sure you're
 not using up all available memory, which is causing the system to hit the
 swapfile?  What does this same query look like when you drop the temp from
 the query?
 
 
 
 the system has 1G of ram. i was monitoring sqlite3 memory usage with
 'top'. the SIZE and RES did not exceed 30M. so i do not think the
 memory is the issue here.
 
 
 
 time sqlite3 db 'create table foo as select * from data where a = 18234721' > /dev/null
 
 
 22.06u 1.39s 0:27.75 84.5%
 
 so pretty much the same time without 'temp'.
 
 i'm starting to suspect disk. here is what i did. i created a separate
 database with only one table. this table contains a subset of 92
 rows from the original data table. it also has the same index on the a
 column, i.e. i did
 
 
 
 sqlite3 db1
 
 
 sqlite> attach db as s;
 sqlite> create table data as select * from s.data where a = 18234721;
 sqlite> create index data_by_a on data (a);
 
 full scan
 
 
 
 time sqlite3 db1 'select n1 from data' > /dev/null
 
 
 17.19u 0.55s 0:19.06 93.0%
 
 bad index scan, because it is guaranteed that the table only has
 keys that match the where clause
 
 
 
 time sqlite3 db1 'select n1 from data where a = 18234721' > /dev/null
 
 
 25.73u 0.59s 0:28.37 92.7%
 
 +10 seconds! is this the overhead of indexed scan? is this what it
 really takes to seek back and forth between index and data? what am i
 missing here?
 
 thanks,
 max
 
 
 



RE: [sqlite] sqlite performance problem

2005-04-12 Thread Thomas Briggs
 
 with sum(n1) added the query runs twice as slow. as i was told it's because
 sqlite has to fetch the data row. fine, but why is it so slow?! and it

   Because for each row it has to compute the aggregate key, find the
aggregator for that key and increment the sum for that aggregate key.
That's a lot more work than just selecting the row and doing nothing
with it.

 so, just with plain ascii file i get four times the speed i get with
 sqlite. note that my c program will scale linearly with the size of
 dataset (just like i see with sqlite).

   With anything related to computers, there are always tradeoffs - most
commonly power for complexity, and flexibility for speed.  Your C
program *should* be faster than anything SQLite can do - it's simpler
and more specific to the problem you're trying to solve.  On the flip
side, it'll never do anything other than what it already does - it can
never be used to solve any other problem.

 - what am i doing wrong here?

   Your expectations are too high, for starters.  For reasons I will
never understand, people expect a relational database to be a silver
bullet that answers all questions instantaneously.  RDBMSs are useful
because they're flexible, but they're also not 100% optimal because
they're flexible.  That's the price you pay for being able to answer
questions with SQL instead of C.
   The other problem you may be running into is lack of a clear problem
definition.  So far I've seen mention of a half dozen things which you
think should perform better, but you haven't dug into any of them to
find out *why* they're slow.  An earlier poster helped identify disk I/O
as a bottleneck.  If from there you can prove that that's a bottleneck
because SQLite is performing too much disk I/O, then you have cause to
claim that SQLite is slow.  Otherwise, all you can do is blame your
disks.  Until you get a clear picture of what you want, what you need
and what's actually happening, however, you'll never be able to tell the
difference.

 - any suggestions on how to speed things up with sqlite?

   First, you must understand what SQLite is doing.  Have you tried
EXPLAINing the query that you're unhappy with to see what SQLite is
going with it under the hood?

   Also, an earlier poster mentioned increasing your block size.  How
high did you raise it?  I've seen significant performance increases with
block sizes of 8k and 16k; just how much difference it makes seems to
depend on the system in question, however.  On some systems I've tested,
8k block sizes are faster than 16k block sizes; my guess is that all
this comes down to the block size used by the file system and/or OS (and
the size of the data itself can factor in there as well), though I
haven't yet dug deep enough to be sure.  From my experience though, in
most cases, there are certainly gains to be had by using larger block
sizes.

 - is sqlite optimized to retrieve one row from the table?

   If there is an index on the column or columns referenced in your
WHERE clause, yes.  Moreso if the column or columns referenced contain
unique values.  Note however that this has nothing to do with SQLite -
all relational databases (all databases with B-tree indexes, actually,
not just RDBMSs) work this way.

 i'm starting to think that having index and data in the same file is
 not such a great idea when you want to perform range requests.

   Having the data and the index in the same file is irrelevant.  It is
entirely possible that SQLite might be using an index to satisfy your
query, and entirely possible that it would be more optimal to execute
the query without using the index, but the fact that the index and data
are in the same file has nothing to do with it.  Whether to use the
index or not is an age-old problem that other RDBMSs solve using
cost-based optimizers; I'm not sure if there's a solution for that in
SQLite, as I believe query plan execution is rule-based (I'm not 100%
sure about that though, someone please correct me if I'm wrong).  If
that's the case, then you'll need to try to find another way to write
your query such that SQLite executes it more optimally.

   If I remember correctly, the old trick to make sure you didn't use an
index was to change your WHERE clause slightly to something like:

   SELECT COUNT(*) FROM (SELECT a FROM data WHERE (a + 0) = 18234721
GROUP BY a);

   This should force a table scan rather than an index scan.  Try that
and see how your queries perform; if things go faster, that means the
index is hurting you, and you need to either drop it or rewrite your
queries to avoid using it.

   -Tom
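(A terser variant of the same trick, also described in SQLite's optimizer
documentation: a unary plus in front of the column likewise prevents that
term from using the index:

   SELECT COUNT(*) FROM (SELECT a FROM data WHERE +a = 18234721 GROUP BY a);
)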


Re: [sqlite] sqlite performance problem

2005-04-12 Thread Maksim Yevmenkin
Thomas,

  with sum(n1) added the query runs twice as slow. as i was told it's because
  sqlite has to fetch the data row. fine, but why is it so slow?! and it
 
Because for each row it has to compute the aggregate key, find the
 aggregator for that key and increment the sum for that aggregate key.
 That's a lot more work than just selecting the row and doing nothing
 with it.

well, it could be true, but not in the queries i have posted. i group
by column a and there is an index on column a, so sqlite does not
have to do anything to compute the key. it does not even have to go back to
the disk and fetch the value of column a because it's right in the
index. the only work sqlite has to do is to go back to the disk and
fetch the value of column n1 and sum it.

  so, just with plain ascii file i get four times the speed i get with
  sqlite. note that my c program will scale linearly with the size of
  dataset (just like i see with sqlite).
 
With anything related to computers, there are always tradeoffs - most
 commonly power for complexity, and flexibility for speed.  Your C
 program *should* be faster than anything SQLite can do - it's simpler
 and more specific to the problem you're trying to solve.  On the flip
 side, it'll never do anything other than what it already does - it can
 never be used to solve any other problem.

what you say is correct, but four (!) times performance increase?
please, you have got to admit that something is not right here.

  - what am i doing wrong here?
 
Your expectations are too high, for starters.  For reasons I will
 never understand, people expect a relational database to be a silver
 bullet that answers all questions instantaneously.  RDBMSs are useful
 because they're flexible, but they're also not 100% optimal because
 they're flexible.  That's the price you pay for being able to answer
 questions with SQL instead of C.

why? i'm using a relational database to do exactly what it should do
fast. that is, i want to select a range of rows from a much bigger set
and do something with it. i'm not asking it to do complex computations
or anything else.

The other problem you may be running into is lack of a clear problem
 definition.  So far I've seen mention of a half dozen things which you
 think should perform better, but you haven't dug into any of them to
 find out *why* they're slow.  An earlier poster helped identify disk I/O
 as a bottleneck.  If from there you can prove that that's a bottleneck
 because SQLite is performing too much disk I/O, then you have cause to
 claim that SQLite is slow.  Otherwise, all you can do is blame your
 disks.  Until you get a clear picture of what you want, what you need
 and what's actually happening, however, you'll never be able to tell the
 difference.

i think, i know what is going on here. the problem is that every time
i do an indexed scan sqlite has to

1) fetch index pages

2) fetch data pages that match where condition

because both index and data are in the same file sqlite has to perform
insane amount of seek() calls and this is clearly demonstrated in my
previous post.

even if i remove the sum() and just add an extra column that is not in
the index, like so

SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a = 18234721 GROUP BY a);

 time sqlite3 db < test.sql
300000
1024
1417
13.14u 1.06s 0:14.40 98.6%

it still takes 14 whopping seconds to group by 300,000+ records
down to 1417 records and that is just unacceptable (imo) for
a relational database.

now if you only request a column that is in the index

SELECT count(*) FROM (SELECT a FROM data WHERE a = 18234721 GROUP BY a);

 time sqlite3 db < test.sql
300000
1024
1417
6.84u 0.24s 0:07.18 98.6%

it only takes 7+ seconds. 

from what i can see sqlite took an extra 7 seconds to fetch the data off
the disk. that is, it had to seek back to the data page, fetch it, process
it, seek back to the next index page etc.

  - any suggestions on how to speed things up with sqlite?
 
First, you must understand what SQLite is doing.  Have you tried
 EXPLAINing the query that you're unhappy with to see what SQLite is
 going with it under the hood?

yes, i tried all that. it is not that easy to read sqlite explain
output (imo) but i convinced myself that it uses all the proper indexes
when i make the query.

Also, an earlier poster mentioned increasing your block size.  How
 high did you raise it?  I've seen significant performance increases with
 block sizes of 8k and 16k; just how much difference it makes seems to
 depend on the system in question, however.  On some systems I've tested,
 8k block sizes are faster than 16k block sizes; my guess is that all
 this comes down to the block size used by the file system and/or OS (and
 the size of the data itself can factor in there as well), though I
 haven't yet dug deep enough to be sure.  From my experience though, in
 most cases, there are certainly gains to be had by using larger block
 sizes.

by default you can not raise page_size above 

Re: [sqlite] sqlite performance problem

2005-04-12 Thread D. Richard Hipp
On Tue, 2005-04-12 at 16:17 -0700, Maksim Yevmenkin wrote:

 SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a = 18234721 GROUP BY a);
 
  time sqlite3 db < test.sql
 300000
 1024
 1417
 13.14u 1.06s 0:14.40 98.6%
 

Have you tried doing the query like this:

  SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a-18234721=0 GROUP
BY a);


-- 
D. Richard Hipp [EMAIL PROTECTED]



Re: [sqlite] sqlite performance problem

2005-04-12 Thread John LeSueur
Maksim Yevmenkin wrote:
i think, i know what is going on here. the problem is that every time
i do an indexed scan sqlite has to
1) fetch index pages
2) fetch data pages that match where condition
because both index and data are in the same file sqlite has to perform
insane amount of seek() calls and this is clearly demonstrated in my
previous post.
even if i remove the sum() and just add an extra column that is not in
the index, like so
SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a = 18234721 GROUP BY a);
 

time sqlite3 db < test.sql
300000
1024
1417
13.14u 1.06s 0:14.40 98.6%
it still takes 14 whopping seconds to group by 300,000+ records
down to 1417 records and that is just unacceptable (imo) for
a relational database.
 

If seeking was the problem, you'd see more system time. The i/o time is 
at most 1.06 seconds. The file is probably
being cached in the os, and therefore, the seeks just become memory 
operations.

You might also get more help if you presented your requests in a milder 
tone. You might even get responses
from the people who wrote sqlite, and therefore have knowledge of its 
internals.

but one thing to try:
CREATE TABLE data(
 a INTEGER,
 b INTEGER,
 c CHAR,
 d INTEGER,
 e INTEGER,
 n1 FLOAT,
 n2 FLOAT
);
CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);
This is a big waste of space, and will slow down all your updates and 
inserts, but since you
seem to be mostly worried about the speed of selects, those shouldn't 
make a difference to you.

You may already know this, but when you create an index with multiple 
keys, it can use a subset
of those keys to look up rows. The limitation is that any key that 
appears in the index before a column
you would like to limit must also be limited, e.g.,

select 
   * 
from 
   a 
where 
   a = 12 and b = 16 and c = 16 

will use the index, but
select
   *
from 
   a
where 
   b = 22 and c = 23

will not use the index.
You claim that the reason sqlite is slower than you expect is because 
sqlite is using the index to
look up the data. If so, this should fix the problem, by making the 
index have all the data right
there.

I'm not so sure that this will resolve your problems, but it may take 
you closer to a solution. Another
thing to try is to try the same experiment with postgresql or mysql, and 
see if they have the same sort
of performance you're expecting to see. If not, you may want to consider 
that Thomas was right, and
a general solution may not be able to perform the calculations with the 
same speed that you are hoping
for.

John LeSueur


Re: [sqlite] sqlite performance problem

2005-04-12 Thread Maksim Yevmenkin
  SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a = 18234721 GROUP BY a);
 
   time sqlite3 db < test.sql
  300000
  1024
  1417
  13.14u 1.06s 0:14.40 98.6%
 
 Have you tried doing the query like this:
 
   SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a-18234721=0 GROUP
 BY a);

i just tried. i think this effectively disables the index on column
a and on a 9+ million record data set it took

 time sqlite3 db < test.sql
300000
1024
1417
61.16u 40.69s 2:05.01 81.4%

thanks,
max


Re: [sqlite] sqlite performance problem

2005-04-12 Thread Maksim Yevmenkin
John.

 i think, i know what is going on here. the problem is that every time
 i do an indexed scan sqlite has to
 
 1) fetch index pages
 
 2) fetch data pages that match where condition
 
 because both index and data are in the same file sqlite has to perform
 insane amount of seek() calls and this is clearly demonstrated in my
 previous post.
 
 even if i remove the sum() and just add an extra column that is not in
 the index, like so
 
 SELECT count(*) FROM (SELECT a,n1 FROM data WHERE a = 18234721 GROUP BY a);
 
 
 
  time sqlite3 db < test.sql
  300000
  1024
  1417
  13.14u 1.06s 0:14.40 98.6%
 
 it still takes 14 whopping seconds to group by 300,000+ records
 down to 1417 records and that is just unacceptable (imo) for
 a relational database.
 
 
 If seeking was the problem, you'd see more system time. The i/o time is
 at most 1.06 seconds. The file is probably
 being cached in the os, and therefore, the seeks just become memory
 operations.

i agree that most of the data are cached by the filesystem. it would be
even slower if sqlite had to go and fetch all the data from the disk
platters. i'm not sure if system time is counted if the process was put
to sleep due to, say, disk i/o.

 You might also get more help if you presented your requests in a milder
 tone. You might even get responses
 from the people who wrote sqlite, and therefore have knowledge of its
 internals.

 i did not intend to offend anyone, if i did i apologize.

 but one thing to try:
 
 CREATE TABLE data(
   a INTEGER,
   b INTEGER,
   c CHAR,
   d INTEGER,
   e INTEGER,
   n1 FLOAT,
   n2 FLOAT
 );
 CREATE INDEX data_by_a on data (a, b, c, d, e, n1, n2);
 
 This is a big waste of space, and will slow down all your updates and
 inserts, but since you
 seem to be mostly worried about the speed of selects, those shouldn't
 make a difference to you.

that will not work for me either. i have to fit a 60 million row table
in under 15 gig.

 You may already know this, but when you create an index with multiple
 keys, it can use a subset
 of those keys to look up rows. The limitation is that any key that
 appears in the index before a column
 you would like to limit must also be limited, e.g.,
 
 select
 *
 from
 a
 where
 a = 12 and b = 16 and c = 16
 
 will use the index, but
 
 select
 *
 from
 a
 where
 b = 22 and c = 23
 
 will not use the index.

yes, i know this. i do not see how this is applicable in my tests.
 
 You claim that the reason sqlite is slower than you expect is because
 sqlite is using the index to
 look up the data. If so, this should fix the problem, by making the
 index have all the data right
 there.

that is not what i said. the index properly gives me the much (30
times) smaller data set. however, the column i have an index on is only
one of the columns i want to fetch. the other columns are still on
disk. in my case it takes longer to get the other columns i want.
again, i'd rather not place an index on all the columns because i might
not meet the size requirements.

 I'm not so sure that this will resolve your problems, but it may take
 you closer to a solution. Another
 thing to try is to try the same experiment with postgresql or mysql, and
 see if they have the same sort
 of performance you're expecting to see. If not, you may want to consider

i will try another database. it's just i already spent too much time
trying to make sqlite work :( i still hope that i'm doing something
wrong here. i just don't know what it is :)

 that Thomas was right, and
 a general solution may not be able to perform the calculations with the
 same speed that you are hoping
 for.

but i'm not doing anything out of the ordinary here. select rows, sum
some columns and do join on a couple of other tables. why would sqlite
not work for me?

thanks,
max


Re: [sqlite] sqlite performance problem

2005-04-12 Thread Gé Weijers
Maksim Yevmenkin wrote:


so, just with plain ascii file i get four times the speed i get with
sqlite. note that my c program will scale linearly with the size of
dataset (just like i see with sqlite).
  

   With anything related to computers, there are always tradeoffs - most
commonly power for complexity, and flexibility for speed.  Your C
program *should* be faster than anything SQLite can do - it's simpler
and more specific to the problem you're trying to solve.  On the flip
side, it'll never do anything other than what it already does - it can
never be used to solve any other problem.



what you say is correct, but four (!) times performance increase?
please, you have got to admit that something is not right here.

  

I have to agree with Thomas, your expectations are too high. If I were
using a relational database and I could get within a factor of 4 of what
I can concoct in C I'd declare victory. Relational databases are often
far from speedy, even on simple queries. You pay for:

* variable record formats
* integrity checks
* duplicate storage of keys in the BTree
* duplicate storage of keys in multiple tables
* the ACID property, even if you're not using it in your samples
* the ability to perform queries in a flexible way
* .

If your database is simple you may be better off performance wise by
rolling your own solution, or using another database. MySQL is pretty
fast if you run it using ISAM tables, but you pay with data corruption
if the DB or system crashes.

If your queries generally produce a sizeable percentage of the records
stored you might as well do a sequential scan over a file, if written
with care, performance will be completely I/O bound. Use the 'mmap'
system call or equivalent to map the DB into memory, and you can read
your DB using pointer arithmetic, and use 'memmove' for updates.

Gé



Re: [sqlite] sqlite performance problem

2005-04-11 Thread Maksim Yevmenkin
Christian,

thanks for the reply.

 i'm having strange performance problem with sqlite-3.2.0. consider the
 following table
 
  [snip]
 
 now the problem:
 
 1) if i do a select with an index it takes 27 sec. to get 92 rows
 
  time sqlite3 db 'select n2 from data where a = 18234721' > /dev/null
 25.95u 0.71s 0:27.02 98.6%

  time sqlite3 db 'select n2 from data where a = 18234721' > /dev/null
 26.02u 0.66s 0:27.53 96.9%
 
 2) if i do a select with sequential lookup it takes 1min to get 9818210 rows
 
  time sqlite3 db 'select n2 from data where a+0 = 18234721' > /dev/null
 49.54u 14.65s 1:04.98 98.7%

  time sqlite3 db 'select n2 from data where a+0 = 18234721' > /dev/null
 49.80u 14.42s 1:05.03 98.7%
 
 - so how come it takes only twice as much time to get 30 times more rows?
 
 When doing an index scan, you will be accessing the index as well as the
 table pages. For a sequential scan, only the table pages are touched,
 reducing thrashing of the cache.

i guess i can believe this. however it's pretty disappointing to get a
50% improvement on a 30 times smaller dataset :(

but how do you explain this?

sqlite> .schema data
CREATE TABLE data
(
   a INTEGER,
   b INTEGER,
   c CHAR,
   d INTEGER,
   e INTEGER,
   n1 FLOAT,
   n2 FLOAT
);
CREATE INDEX data_by_a ON data (a);

 time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
26.15u 0.59s 0:27.00 99.0%
 time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
26.04u 0.61s 0:26.91 99.0%

and

 time sqlite3 db 'select e from data where a = 18234721' > /dev/null
12.22u 0.41s 0:12.67 99.6%
 time sqlite3 db 'select e from data where a = 18234721' > /dev/null
12.01u 0.61s 0:12.64 99.8%

so just by selecting two different columns (both next to each other) i
get another 50% time difference? sure the column types are different,
and i can even understand that FLOAT might be 8 bytes and INTEGER 4
bytes, but 50% time difference? it just can't be that linear, can it?

do you think it is possible to get better results by issuing multiple
queries each of which will return an even smaller dataset? and, no, i'm not
a database guy.

 Also, if the data is not in index order in the table, and/or dispersed
 across the database file, you may have to visit each page more than once
 when traversing in index order. In the full table scan, you'll read the
 rows in table order, hence only touching each page once before moving on
 to later pages, thus reducing cache thrashing even more.

all rows were inserted into the table in an order that matches the index.
i cannot say if the records on disk will have the same order.

 - and why is it taking 27 seconds to get 92 rows anyway?
 
 You think 12347 rows/s is bad?

because i have the same data set in a perl multilevel hash stored on
disk (storable format), and it takes about 1 minute to perform the
same query on the same hardware. this time includes reading the data
from the disk, traversing every single key on each hash level, etc. i
was hoping that sqlite would do something like that in just a few (< 10)
seconds.

 - is there any way to create an index in separate file?
 
 No, SQLite is a single file embedded database. Keeps administration
 almost non-existent (by design.)

- so, if i have a large table (> 9 million records) with an index
then sqlite would have to constantly seek back and forth between index
and data (within the same file), probably wasting lots of disk
bandwidth?

- if the above is correct then can i force sqlite to get all (or as much
as possible) of the index into memory?

 the hardware is sun netra t1 running solaris 5.7. the db file size is
 about 800 mbytes.
 
 just for the record i'd like to have at least 15 times more records in
 the 'data' table.
 
 If you can match SQLite for simple read-only throughput with another
 database, you'll be doing well.

i do not know about other databases, but compared to plain perl it's not
that impressive. i must be doing something wrong here.

thanks,
max


RE: [sqlite] sqlite performance problem

2005-04-11 Thread Robert Simpson
 -Original Message-
 From: Maksim Yevmenkin [mailto:[EMAIL PROTECTED] 
 Sent: Monday, April 11, 2005 9:59 AM
 To: Christian Smith
 Cc: sqlite-users@sqlite.org
 Subject: Re: [sqlite] sqlite performance problem
 
 i guess, i can believe this. however its pretty disappointing to get
 50% improvement on 30 times less dataset :(
 
 but how do you explain this?
 
  sqlite> .schema data
 CREATE TABLE data
 (
a INTEGER,
b INTEGER,
c CHAR,
d INTEGER,
e INTEGER,
n1 FLOAT,
n2 FLOAT
 );
 CREATE INDEX data_by_a ON data (a);
 
   time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.15u 0.59s 0:27.00 99.0%
   time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.04u 0.61s 0:26.91 99.0%
 
 and
 
   time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.22u 0.41s 0:12.67 99.6%
   time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.01u 0.61s 0:12.64 99.8%
 
  so just by selecting two different columns (both next to each other) i
  get another 50% time difference? sure the column types are different,
  and i can even understand that FLOAT might be 8 bytes and INTEGER 4
  bytes, but 50% time difference? it just can't be that linear, can it?
 
 do you think it is possible to get better results by issuing multiple
 queries each of which will return even less dataset? and, no, i'm not
 a database guy.

The most glaring fault in your time tests that I see is that you're running
the command-line sqlite3, which is printf'ing all 339,000+ rows to dev/null.
It doesn't show on your screen, but the work is still being done regardless.
If you really want a fair speed test, change those statements to:

select count(n1) from data where a = 18234721

And 

select count(e) from data where a = 18234721

Or even

select count(*) from data where a = 18234721

THEN tell us what the difference in performance is ...

Robert




Re: [sqlite] sqlite performance problem

2005-04-11 Thread Maksim Yevmenkin
Robert,

  i guess, i can believe this. however its pretty disappointing to get
  50% improvement on 30 times less dataset :(
 
  but how do you explain this?
 
  sqlite> .schema data
  CREATE TABLE data
  (
 a INTEGER,
 b INTEGER,
 c CHAR,
 d INTEGER,
 e INTEGER,
 n1 FLOAT,
 n2 FLOAT
  );
  CREATE INDEX data_by_a ON data (a);
 
   time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.15u 0.59s 0:27.00 99.0%
   time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.04u 0.61s 0:26.91 99.0%

  and

   time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.22u 0.41s 0:12.67 99.6%
   time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.01u 0.61s 0:12.64 99.8%
 
  so just by selecting two different columns (both next to each other) i
  get another 50% time difference? sure the column types are different,
  and i can even understand that FLOAT might be 8 bytes and INTEGER 4
  bytes, but 50% time difference? it just can't be that linear, can it?
 
  do you think it is possible to get better results by issuing multiple
  queries each of which will return even less dataset? and, no, i'm not
  a database guy.
 
 The most glaring fault in your time tests that I see is that you're running
 the command-line sqlite3, which is printf'ing all 339,000+ rows to dev/null.
 It doesn't show on your screen, but the work is still being done regardless.

well, i do print these rows to /dev/null in perl test too.

 If you really want a fair speed test, change those statements to:
 
 select count(n1) from data where a = 18234721

 time sqlite3 db 'select count(n1) from data where a = 18234721' > /dev/null
7.79u 0.70s 0:08.50 99.8%

 And
 
 select count(e) from data where a = 18234721

 time sqlite3 db 'select count(e) from data where a = 18234721' > /dev/null
7.90u 0.42s 0:08.31 100.1%

 Or even
 
 select count(*) from data where a = 18234721

 time sqlite3 db 'select count(*) from data where a = 18234721' > /dev/null
1.35u 0.16s 0:01.47 102.7%

8 times faster than count(n1) or count(e)? i'm confused. i guess it
just used the a field (on which it had an index?)

 THEN tell us what the difference in performance is ...

fine, if i ask sqlite just to count the rows it wins hands-down, but i
really want these rows. even more, i'd like to then natural join
these rows with a couple of other tables to really do what the perl
code currently does.

but, it takes 22 seconds just to create a temp table with the
required dataset

 time sqlite3 db 'create temp table foo as select * from data where a = 18234721' > /dev/null
21.93u 0.89s 0:22.95 99.4%

and i do not understand what i'm doing wrong here :(

thanks,
max


RE: [sqlite] sqlite performance problem

2005-04-11 Thread Robert Simpson
Let's recap ...

 time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
 26.15u 0.59s 0:27.00 99.0%

 time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
 26.04u 0.61s 0:26.91 99.0%

 time sqlite3 db 'select e from data where a = 18234721' > /dev/null
 12.22u 0.41s 0:12.67 99.6%

 time sqlite3 db 'select e from data where a = 18234721' > /dev/null
 12.01u 0.61s 0:12.64 99.8%

 so just by selecting two different columns (both next to each other) I
 get another 50% time difference? sure the column types are different,
 and i can even understand that FLOAT might be 8 bytes and INTEGER 4
 bytes, but 50% time difference? it just can't be that linear, can it?

By doing these same two queries using select count(), you've proven my
original theory that the time difference is due to the volume of bytes being
transmitted to /dev/null and NOT because of the actual data types of the
columns or ANY OTHER FACTOR.  If you really want, change your pipe output to
two file names, I'm 100% sure you'll find that the 'select n1 ...' query
results in an output file significantly larger than the 'select e ...'
output file.  This is where the differing performance comes from -- the time
it is taking to parse and print your output.

The select count(*) does indeed use the index and hence the significant
performance difference.  

 fine, if i ask sqlite just to count the rows it wins hands-down, but i
 really want these rows. even more i 'd like to then natural join
 these rows with a couple of other tables to really do what the perl
 code currently does.
 
 but, it takes 22 seconds to just to create a temp table with the
 required dataset
 
   time sqlite3 db 'create temp table foo as select * from data where a = 18234721' > /dev/null
  21.93u 0.89s 0:22.95 99.4%
 
 and i do not understand what i'm doing wrong here :(

Again, the only reason I suggested using count() in your timing test was to
ensure that the command-line sqlite3 program's output was consistent for
both tests and to eliminate dev/null printf's from factoring into the total
time.  In your application, you'll call select * (or whatever) without the
count to retrieve the rows -- but since you're not printf'ing them and
instead are doing your own thing with them, you will indeed see close to
identical times in your selects just like you did in the count() test.

As for the temp table ... I haven't tried this, but isn't temp a reserved
word in SQLite?  More importantly, you should be doing this statement inside
a transaction.  Transactions are critically important in SQLite.  I would go
so far as to say NEVER EVER perform ANY bulk write in SQLite outside a
transaction!  The performance difference is beyond phenomenal.

Robert




Re: [sqlite] sqlite performance problem

2005-04-11 Thread Maksim Yevmenkin
Robert,

  time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.15u 0.59s 0:27.00 99.0%

  time sqlite3 db 'select n1 from data where a = 18234721' > /dev/null
  26.04u 0.61s 0:26.91 99.0%

  time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.22u 0.41s 0:12.67 99.6%

  time sqlite3 db 'select e from data where a = 18234721' > /dev/null
  12.01u 0.61s 0:12.64 99.8%
 
  so just by selecting two different columns (both next to each other) I
  get another 50% time difference? sure the column types are different,
  and i can even understand that FLOAT might be 8 bytes and INTEGER 4
  bytes, but 50% time difference? it just can't be that linear, can it?
 
 By doing these same two queries using select count(), you've proven my
 original theory that the time difference is due to the volume of bytes being
 transmitted to dev/null and NOT because of the actual data types of the
 columns or ANY OTHER FACTOR.  If you really want, change your pipe output to

ok, i agree, redirecting output to /dev/null might have impact on times. 

 two file names, I'm 100% sure you'll find that the 'select n1 ...' query
 results in an output file significantly larger than the 'select e ...'
 output file.  This is where the differing performance comes from -- the time
 it is taking to parse and print your output.

here you are wrong actually. 

 time sqlite3 db 'select e from data where a = 18234721' > x
12.01u 0.64s 0:12.80 98.8%

 time sqlite3 db 'select n1 from data where a = 18234721' > y
26.06u 0.62s 0:26.86 99.3%

the size of x is 1070681 (output of column e) and the size of y is
1004219 (output of column n1), so the file sizes are about the same. x
is 66462 bytes more. it is probably possible that printf() is 2 times
slower on floats.

 The select count(*) does indeed use the index and hence the significant
 performance difference.

fine

  fine, if i ask sqlite just to count the rows it wins hands-down, but i
  really want these rows. even more i'd like to then natural join
  these rows with a couple of other tables to really do what the perl
  code currently does.
 
  but, it takes 22 seconds just to create a temp table with the
  required dataset
 
   time sqlite3 db 'create temp table foo as select * from data where a = 18234721' > /dev/null
  21.93u 0.89s 0:22.95 99.4%
 
  and i do not understand what i'm doing wrong here :(
 
 Again, the only reason I suggested using count() in your timing test was to
 ensure that the command-line sqlite3 program's output was consistent for
 both tests and to eliminate /dev/null printf's from factoring into the total
 time.  In your application, you'll call select * (or whatever) without the
 count to retrieve the rows -- but since you're not printf'ing them and
 instead are doing your own thing with them, you will indeed see close to
 identical times in your selects just like you did in the count() test.

i said i print these rows to /dev/null too in my perl code. plus the
perl code does some other things such as joining these rows with other
hashes and summing the numbers.

 As for the temp table ... I haven't tried this, but isn't temp a reserved
 word in SQLite?  More importantly, you should be doing this statement inside

yes, it is. i really want to create 'temporary table' in memory. i was
really hoping it would speed things up.
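For reference, a temp table can be forced into memory with the temp_store
pragma (a sketch; the default and the effect depend on the SQLite version
and its compile-time options):

    PRAGMA temp_store = MEMORY;
    CREATE TEMP TABLE foo AS SELECT * FROM data WHERE a = 18234721;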

 a transaction.  Transactions are critically important in SQLite.  I would go
 so far as to say NEVER EVER perform ANY bulk write in SQLite outside a
 transaction!  The performance difference is beyond phenomenal.

nope.

 time sqlite3 db 'begin; create temp table foo as select * from data where a = 18234721; commit' > /dev/null
21.90u 0.77s 0:22.87 99.1%

still 22 seconds to just create a table with 300,000+ records, and
that is, unfortunately, too slow :(

doing strace/truss on sqlite shows that it performs a huge amount of seeks. 

so the original questions stay:

- what am i doing wrong here?

- is sqlite going to be not as fast on a fairly large indexed table
because it has to seek back and forth between index and data?

thanks,
max


Re: [sqlite] sqlite performance problem

2005-04-11 Thread Maksim Yevmenkin
Robert,

 [snip]
 
  i said i print these rows to /dev/null too in my perl code. plus the
  perl code does some other things such as joining these rows with other
  hashes and summing the numbers.
 
 That's fine.  I was merely trying to account for the 50% speed difference
 between the two differing column tests, which has been accomplished.
 
   As for the temp table ... I haven't tried this, but isn't temp a reserved
   word in SQLite?  More importantly, you should be doing this statement inside
 
  yes, it is. i really want to create 'temporary table' in memory. i was
  really hoping it would speed things up.
 
 I misread the statement, so ignore me on that part.  However, 339,000 rows
 into a temporary in-memory table ... I tried some experiments locally here
 and none of them took more than 2 seconds to execute.  Are you sure you're
 not using up all available memory, which is causing the system to hit the
 swapfile?  What does this same query look like when you drop the temp from
 the query?

the system has 1G of ram. i was monitoring sqlite3 memory usage with
'top'. the SIZE and RES did not exceed 30M. so i do not think the
memory is the issue here.

 time sqlite3 db 'create table foo as select * from data where a = 18234721' > /dev/null
22.06u 1.39s 0:27.75 84.5%

so pretty much the same time without 'temp'.

i'm starting to suspect disk. here is what i did. i created a separate
database with only one table. this table contains a subset of 92
rows from the original data table. it also has the same index on the a
column, i.e. i did

 sqlite3 db1
sqlite> attach db as s;
sqlite> create table data as select * from s.data where a = 18234721;
sqlite> create index data_by_a on data (a);

full scan

 time sqlite3 db1 'select n1 from data' > /dev/null
17.19u 0.55s 0:19.06 93.0%

bad index scan, because it is guaranteed that the table only has
keys that match the where clause

 time sqlite3 db1 'select n1 from data where a = 18234721' > /dev/null
25.73u 0.59s 0:28.37 92.7%

+10 seconds! is this the overhead of an indexed scan? is this what it
really takes to seek back and forth between index and data? what am i
missing here?
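One standard remedy for that back-and-forth is a covering index: an index
that carries every column the query touches, so the table itself is never
visited.  A sketch (whether a given SQLite version actually exploits such
an index is not settled in this thread):

    CREATE INDEX data_by_a_n1 ON data (a, n1);
    -- 'select n1 from data where a = 18234721' can now in principle be
    -- answered from the index alone, with no seeks into the table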

thanks,
max


Re: [sqlite] Performance problem

2004-03-22 Thread Hugh Gibson
 SQLite only uses a single index per table on any given query.
 This is unlikely to change.

Would it be able to use a multi-column index on ipnode + author?

Hugh
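A composite index of the kind Hugh is asking about would be declared like
this (a sketch; whether one index can serve both the WHERE and the ORDER BY
here is exactly the open question):

    CREATE INDEX t1_ipnode_author ON t1 (ipnode, author);
    -- equality on the leading column, ordering on the second:
    SELECT title FROM t1 WHERE ipnode = 'VZ' ORDER BY author;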

 Shi Elektronische Medien GmbH, Peter Spiske wrote:
  
  the following simple query is very slow:
SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;
  
  The database is about 250 MB in size and the table the query is run 
  against
  has 12 cols and 120,000 rows.
  Every col has an index.
  The above query returns about 80% of the records.
  As soon as the ORDER BY clause is left out, the query is fast.
  
 
 SQLite only uses a single index per table on any given query.
 This is unlikely to change.
 
 Since your query is returning 80% of the rows in the table,
 the use of an index to implement the WHERE clause is not
 really helping you any.  But the simple-minded query
 optimizer of SQLite does not realize this.  SQLite always
 prefers to use an index to implement the WHERE clause when
 it can.  SQLite will also use that same index to implement
 ORDER BY, if possible, or if no index was helpful for
 implementing WHERE it will try to find an index to implement
 ORDER BY.
 
 In your case, the best solution would be to trick sqlite into
 not using the index on the WHERE clause.  This will make it
 use the index to implement ORDER BY and you should get much
 better performance.  I suggest trying this query:
 
 SELECT title FROM t1 WHERE ipnode||'x'='VZx' ORDER BY author;
 
 Appending the string 'x' onto the end of the ipnode column
 prevents the query optimizer from using an index on ipnode.  This
 leaves the optimizer free to use an index to implement ORDER BY.
 
 The other thing you could try is to DROP the index on the
 ipnode column.
 
 
 -- 
 D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
 
 



[sqlite] Performance problem

2004-03-20 Thread Shi Elektronische Medien GmbH, Peter Spiske
Hi,

the following simple query is very slow:
  SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;

The database is about 250 MB in size and the table the query is run against
has 12 cols and 120,000 rows.
Every col has an index.
The above query returns about 80% of the records.
As soon as the ORDER BY clause is left out, the query is fast.

If i interpret the output of explain in the sqlite commandline tool
correctly, only one index - the one for the column ipnode - is used.
No index is used for sorting!!

Usually the above table would be joined with another one which makes things
even worse.

Other tests produced the same results:
Whether there is an ORDER BY clause or multiple columns are referenced in the
WHERE clause - there will always be only one index used.

What am i doing wrong?
Is it true that sqlite can only use one index per table in a query?
How can i get sqlite to use more than one index per table at a time?

cheers
Peter





Re: [sqlite] Performance problem

2004-03-20 Thread Darren Duncan
At 1:33 PM +0100 3/20/04, Shi Elektronische Medien GmbH, Peter Spiske wrote:
the following simple query is very slow:
  SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;
The database is about 250 MB in size and the table the query is run against
has 12 cols and 120,000 rows.
Every col has an index.
The above query returns about 80% of the records.
As soon as the ORDER BY clause is left out, the query is fast.
Whether or not indexes are used, sorting tends to be a very labour-intensive 
operation and slows things down considerably.

You can try to isolate your slowdown like this:

1. Run your existing select and time it.

2. Take the order-by off your select and run it.

3. Make your existing select, with order-by, into a subselect (in the
from clause) of a larger select that does something trivial, such as
returning a count of rows from the inner select.  You would have to
choose the outer select carefully, though, to make sure that SQLite
doesn't optimize any of the inner select's work away, as you want the
inner select to do all the same work as before.  The point is that the
outer select returns almost no data.  Network speed is then basically
taken out of the equation and all the time is just the
select execute and not the select fetch.
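A sketch of step 3, using the query from this thread (note Darren's own
caveat: the optimizer may skip the inner ORDER BY, which would make the
timing misleading):

    SELECT count(*) FROM
        (SELECT title FROM t1 WHERE ipnode = 'VZ' ORDER BY author);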

-- Darren Duncan



Re: [sqlite] Performance problem

2004-03-20 Thread D. Richard Hipp
Shi Elektronische Medien GmbH, Peter Spiske wrote:
the following simple query is very slow:
  SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;
The database is about 250 MB in size and the table the query is run against
has 12 cols and 120,000 rows.
Every col has an index.
The above query returns about 80% of the records.
As soon as the ORDER BY clause is left out, the query is fast.
SQLite only uses a single index per table on any given query.
This is unlikely to change.
Since your query is returning 80% of the rows in the table,
the use of an index to implement the WHERE clause is not
really helping you any.  But the simple-minded query
optimizer of SQLite does not realize this.  SQLite always
prefers to use an index to implement the WHERE clause when
it can.  SQLite will also use that same index to implement
ORDER BY, if possible, or if no index was helpful for
implementing WHERE it will try to find an index to implement
ORDER BY.
In your case, the best solution would be to trick sqlite into
not using the index on the WHERE clause.  This will make it
use the index to implement ORDER BY and you should get much
better performance.  I suggest trying this query:
   SELECT title FROM t1 WHERE ipnode||'x'='VZx' ORDER BY author;

Appending the string 'x' onto the end of the ipnode column
prevents the query optimizer from using an index on ipnode.  This
leaves the optimizer free to use an index to implement ORDER BY.
The other thing you could try is to DROP the index on the
ipnode column.
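
On later versions of SQLite the effect of this rewrite can be inspected
directly; EXPLAIN QUERY PLAN postdates this thread, which only had the raw
EXPLAIN opcode listing:

    EXPLAIN QUERY PLAN
      SELECT title FROM t1 WHERE ipnode='VZ' ORDER BY author;
    EXPLAIN QUERY PLAN
      SELECT title FROM t1 WHERE ipnode||'x'='VZx' ORDER BY author;
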
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


Re: [sqlite] Performance problem

2003-11-06 Thread ben . carlyle
Mrs. Brisby [EMAIL PROTECTED] wrote on 07/11/2003 12:08 AM:

 On Wed, 2003-11-05 at 23:59, Jonas Forsman / Axier.SE wrote:
  Note: Testing has shown PostgreSQL's hash indexes to be similar or 
slower
  than B-tree indexes, and the index size and build time for hash 
indexes is
  much worse. Hash indexes also suffer poor performance under high
  concurrency. For these reasons, hash index use is discouraged.
 Please note I'm not talking about a hash of the entire key- I'm
 talking about n distinct b-trees that are selected by an 8-n bit
 function. This transformation can be made very fast: We get a speed
 improvement here on searches if our 8-n bit function takes less time
 than n-1 random memcmp()'s.

How would you handle the lack of ordering associated with hash tables? 
Sqlite can currently use indices for three main tests: equals, less than, 
and greater than. While hash-tables are good at finding equal-to in 
constant time, it usually means linear time (a table-scan) to test for less 
than or greater than. Do you have a solution to this problem?

Benjamin.







Re: [sqlite] Performance problem

2003-11-06 Thread Mrs. Brisby
On Thu, 2003-11-06 at 19:00, [EMAIL PROTECTED] wrote:
 How would you handle the lack of ordering associated with hash tables? 
 Sqlite can currently use indices for three main tests: equals, less than, 
 and greater than. While hash-tables are good at finding equal-to in 
 constant time, it usually means linear time (a table-scan) to test for less 
 than or greater than. Do you have a solution to this problem?

You presume that a defining characteristic of a hash-table is that the
function be non-linear. Consider an n-to-3 bit function:
f(x) = x mod 8
A function so linear that it is perfectly easy to iterate _in_order_
over all values of the table.





Re: [sqlite] Performance problem

2003-11-05 Thread Jonas Forsman / Axier.SE
According to the postgresql documentation, the hash algorithm is
discouraged compared to b-tree for performance reasons.

http://www.postgresql.org/docs/view.php?version=7.3idoc=1file=indexes-types.html

Note: Testing has shown PostgreSQL's hash indexes to be similar or slower
than B-tree indexes, and the index size and build time for hash indexes is
much worse. Hash indexes also suffer poor performance under high
concurrency. For these reasons, hash index use is discouraged.
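For reference, the PostgreSQL syntax under discussion lets the structure be
chosen per index, b-tree being the default (table and column names are
illustrative):

    CREATE INDEX t_col_btree ON t (col);             -- default b-tree
    CREATE INDEX t_col_hash  ON t USING hash (col);  -- hash structure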

 /jonas

- Original Message - 
From: Mrs. Brisby [EMAIL PROTECTED]
To: Clark, Chris [EMAIL PROTECTED]
Cc: D. Richard Hipp [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, November 06, 2003 4:14 AM
Subject: RE: [sqlite] Performance problem


 On Wed, 2003-11-05 at 13:44, Clark, Chris wrote:
   -Original Message-
   From: Mrs. Brisby [mailto:[EMAIL PROTECTED]
  
   MySQL has stated in-documentation that it uses a B-tree for
   it's index.
   I think this is a mistake- especially for larger indexes.
   Using several B-trees attached to a hash-table is much faster
   (if the hash is fast or your data is uniform).
 
  Following this train of thought (this isn't a feature request!); some
DBMS's support different structures for tables/indices and the DBA can
specify the required structure depending on the expected data usage/patterns
(and a 2ndary index need not be the same structure as the primary table
structure, allowing for, say, a hash table and b-tree 2ndary's as per the
example above). E.g. Ingres has: Heap (yep, completely unstructured),
B-tree, Hash, and ISAM (there is also an R-tree but that is only for spatial
datatypes so it's not as interesting for this discussion). It all depends on
the data and how it is used as to which structure should/could be used.
 
  A typical example of the hash primary and b-tree 2ndary is a unique
customer id so that the customer record can be hit directly with the hash,
or if the hash is not perfect, through a couple of overflow pages (compared
to a b-tree which always will need to jump through a few pages in the index,
admittedly that may only be an improvement of microsecs versus millisecs in
lookup time). The b-tree 2ndary would then be for things like customer name
(which are often duplicated, potential for lots of people called Mr
Smith) in case one needs to perform searches on a customer name (who, say,
forgot their customer id).

 It wouldn't strictly require a hash-table either (after further
 thought); perhaps just the first octet of the key could be permuted
 through an 8-4 (15-bit) or 8-3 (7-bit) function. If the keys are
 otherwise random this should be just as well and you wouldn't lose the
 ability to use substrings...





Re: [sqlite] Performance problem

2003-11-04 Thread D. Richard Hipp
Avner Levy wrote:
We have just finished testing the same scenario with MySql and amazingly 
they continued to insert 1500-3000 rows per second even when the 
database had 60,000,000 records. I don't know how this magic is done...
Nor do I.  If anybody can clue me in, I would appreciate it.  I suspect
the secret must be in the file format for their indices.  Does anybody
have any idea what that format is?  Or how indices work in MySQL?
FWIW: Big changes are looming on the horizon for SQLite.  The format
of indices will probably have to change at some point in the future
in order to accommodate BLOBs.  This will result in an incompatible
file format change (version 3.0.0).  The change is not imminent
(think summer of 2004) nor inevitable.  But if other changes could
be made at the same time to support faster inserts of non-localized
data, the transition from 2.x.x to 3.x.x would be a good time to toss
them in.
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


RE: [sqlite] Performance problem

2003-11-04 Thread Allan Edwards
Hmmm, the name of the database IS sql LITE.  My guess is that if
it was meant to be the monstrous scalable solution, it would have been
architected that way in the first place.  

One thing I find interesting about Sqlite is that I tested it this past week
up to 360,000 records.  I had no performance degradation inserting,
querying, updating, or deleting.  I am very pleased.  In fact, for the
solution I have used it for, I have tossed SQL Server on the enterprise
distributed version.  It scales plenty well on just everyday hardware now.

The funny thing is that for its simplicity it is probably the best database
there is for its current area of use.  I think if you want to scale it, go
buy a mainframe! : - )

If you want to go in and modify it to be scalable, you ought to put in the
time to do this requested stuff, then sell it.  You have an excellent base
of code to start from.  

Just some thoughts.

Allan 

-Original Message-
From: Avner Levy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 04, 2003 3:56 AM
To: D. Richard Hipp
Cc: [EMAIL PROTECTED]
Subject: Re: [sqlite] Performance problem

Hi,
We have just finished testing the same scenario with MySql and amazingly they
continued to insert 1500-3000 rows per second even when the database had
60,000,000 records.
I don't know how this magic is done, but that is what the tests showed (with
the same machine / OS).
There is one thing I don't understand. Sqlite works in optimistic mode,
which means the changes are written to the database and the original data is
backed up.
In this case I would expect that the commit would be as fast as deleting a
file. But the test shows that the commit can take up to 5 minutes when the
database is large.
Does anybody have an explanation?

I saw the comments about using OS features to solve the problem. 
Personally, if we choose Sqlite, we will need it on lots of platforms and
for product installation. Demanding that we as users of Sqlite start
configuring journaling on customers' computers is not practical.

Personally I'm a big fan of Sqlite (great tool), and I really hope that this
issue can be solved (so we can choose Sqlite).
I understand that solving this issue demands big changes, but I believe this
will take Sqlite to the big players' league.

Thanks
Avner

D. Richard Hipp wrote:

 Avner Levy wrote:

 Hi,
 I've written the following program to test the sqlite performance for 
 a specific scenario I have.
 I've used most tricks I've found but still when the database gets big 
 the performance gets unacceptable.
 The performance problem happens of course only if the indexes are 
 defined.
 Since I need the indexes, is there any other option that can make it 
 faster ?
 When the database gets big the insert rate gets to 50 rows  per second.
 Thanks in advance.

 * It seems that the larger the DB gets, the more pages the journal copies
 out of the database into the file, which kills performance.


 Each transaction in your test involves 3 INSERTs into a single 
 table defined (roughly) as follows:

 CREATE TABLE test(a,b);
 CREATE INDEX idx1 ON test(a);
 CREATE INDEX idx2 ON test(b);

 In these INSERTs, the values for test.a are non-decreasing.  That 
 means that new entries into the test table and into the idx1 index 
 always go into the same place in their B*Trees - specifically at the end.
 But the values for test.b are random, which means that entries into
 idx2 are distributed throughout its B*Tree.  Because changes to test 
 and idx1 are localized, only a handful of disk blocks are changed 
 (many others are added, but few existing ones are changed) but the 
 changes to idx2 are not localized, which means that nearly every disk 
 block associated with idx2 must be changed.  Modifying the (thousands) 
 of disk blocks associated with idx2 is what is taking so long.  I do 
 not know anything I can do inside SQLite to make it go any faster.  I 
 do not know of any other way of creating an index that could work 
 around this issue.

 If you disable idx2, you will see that the inserts go much faster.  I 
 also observe that the journal file is about 1/3rd the size of the 
 database, which is consistent with the observation that every disk 
 block associated with idx2 needs to be changed.  My TCL test script 
 for this situation is appended to this message.

 If you really are doing millions of INSERTs prior to doing any 
 SELECTs, you might consider deferring the creation of idx2 until after 
 all the data has been put into the database.  Generally speaking, it 
 is a little faster to create an index for existing data than it is to 
 create the index at the same time that the data is being inserted.  
 But the difference is not that great.  And, of course, this won't help 
 if in reality you need to do some SELECTs along the way...
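
A sketch of that deferred-index approach, using the schema above (the
bulk-load step is illustrative):

    CREATE TABLE test(a,b);
    CREATE INDEX idx1 ON test(a);  -- localized inserts, cheap to maintain
    BEGIN;
    -- ... bulk INSERTs into test go here ...
    COMMIT;
    CREATE INDEX idx2 ON test(b);  -- built once, after the data is loaded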

 I'm curious to know how other database engines deal with this problem.
 Have you tried a similar experiment on MySQL, or PostgreSQL, or even 
 Oracle

Re: [sqlite] Performance problem

2003-11-04 Thread D. Richard Hipp
Allan Edwards wrote:

 I have YET to see a database, small to massively scalable, that could handle
 BLOBs worth anything. ...  I prefer the simplicity talk given earlier.  If
 someone wants blobs, do it the old-fashioned way!

Your concerns are understood, for BLOBs that are truly large.  But what
about short pieces of binary data.  (Do you call them BOBs?)  For example,
what about that 30-byte unicode string that just happens to contain embedded
NULs.  Or a 75-byte GIF image.  I'd like to be able to put those kinds of
things directly into a column of an SQLite table.  Even an indexed column.
Unfortunately, a file format change is going to be required to achieve
that goal, I think.
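
That file format change did eventually arrive with SQLite 3, which can store
and index such short binary values directly; a sketch in 3.x syntax, with
illustrative names (X'...' is a hex blob literal):

    CREATE TABLE art(name TEXT PRIMARY KEY, img BLOB);
    INSERT INTO art VALUES('tiny', X'474946383961');  -- the 6 bytes "GIF89a"
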
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


Re[2]: [sqlite] Performance problem

2003-11-04 Thread Doug Currie
 Your suggestion was to reconstruct the index from original
 table data on a rollback.  For a large transaction that touches
 most pages of the index, this can give (at best) a 2:1 speedup.
 But the other side, the potential slowdown is extreme.

Yeah, there is that drawback. Other DBMSs avoid this problem by some
(decidedly NOT simple) mechanisms such as keeping before-images in
main memory, keeping semantic records of changes in main memory,
keeping page serial numbers (to aid recovery), etc.

A decision can be made dynamically to stop recording before-images
(say, because some threshold percentage of the index's pages are being
changed, or because your main memory buffer is full), and transition
to my suggested approach.

Presently, if P% of the pages are modified, 2P% of the pages are written
for a commit, and 3P% for a rollback. With the dynamic scheme, these
change to

  smaller of 2P or P + X
  smaller of 3P or P + X + 100

percent, respectively, where X is the threshold.

If the threshold was 25% then you'd never write more than 125% of the
size of the index for a committed transaction, and never write more
than 225% for a rolled-back transaction. Note that the present scheme
writes up to 200% for a committed and 300% for a rolled-back
transaction. Here is a table comparing the approaches for various
percentages of index pages modified. The left column is the percentage
of index pages modified, the right columns are the percentage of index
pages written for each of the approaches and situations (present c, r
and dynamic c' and r' for committed and rolled-back).

  P%     c  c'(X=25)  c'(X=50)     r  r'(X=25)  r'(X=50)
  10    20     20        20       30     30        30
  20    40     40        40       60     60        60
  30    60     55        60       90    155        90
  40    80     65        80      120    165       120
  50   100     75       100      150    175       200
  60   120     85       110      180    185       210
  70   140     95       120      210    195       220
  80   160    105       130      240    205       230
  90   180    115       140      270    215       240
 100   200    125       150      300    225       250

Note that the dynamic approach is always better or equivalent for
committed transactions, and only worse for rolled-back transactions
when the percentage of pages modified is between X and (X+100)/2 where X
is the threshold. Finding an optimal value for X is application
dependent, unfortunately.

Another approach...

One database I implemented used newly allocated pages for all updates
to indexes, relying on the original versions to remain intact so
rollback was free (not counting the reclaiming of the free space).
Commit was implemented by writing the pointer to the root of the index
BTree to point to the new pages. The advantage of this method is
journaling for free. The drawback to this method is that even small
changes require writing the BTree along an entire path from leaf to
root so a single disk write can commit the index changes, and a free
space recovery mechanism. Overall it was very reliable and effective.

e





Re: [sqlite] Performance problem

2003-11-04 Thread D. Richard Hipp
[EMAIL PROTECTED] wrote:
DRH: Will the changes to indices allow us to define arbitrary collation 
functions? If so, will those indices be used when a query is done that 
could use the arbitrary collation function?

Likely so.  But no promises yet.

--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


Fw: [sqlite] Performance problem

2003-11-04 Thread Greg Obleshchuk
Sorry Richard I meant to send this to the group


Hello,
Last week I raised an issue about case sensitivity in where clauses. In doing a little
research I happened to talk to an Oracle DBA here where I work and asked him
how Oracle handles case sensitivity.  He explained it is handled in the
same way and suggested the same fix: drop both to lower case and then compare.  The
next thing he mentioned is really what I want to raise: he said the latest release of
Oracle allows you to build indexes which include functions.  He also said that these
new types of indexes are used when the user issues a select statement with a function
in the where clause or a like (i.e. select Col1, Col2 from tableName where col1 like
'Gr%' or select * from tablename where lower(col1) = 'fred').  Without knowing all
the ins and outs of it, this seems like a great feature to have.

Any chance of having someone look into the possibility of implementing something like
that?
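
The Oracle feature Greg describes would be declared roughly like this
(a sketch, reusing the names from his own example):

    CREATE INDEX t_lower_col1 ON tablename (lower(col1));
    -- a query of this shape can then be satisfied via the index:
    SELECT * FROM tablename WHERE lower(col1) = 'fred';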

regards
Greg 




- Original Message - 
  From: D. Richard Hipp 
  To: [EMAIL PROTECTED] 
  Sent: Wednesday, November 05, 2003 1:45 PM
  Subject: Re: [sqlite] Performance problem


  [EMAIL PROTECTED] wrote:
   
   DRH: Will the changes to indices allow us to define arbitrary collation 
   functions? If so, will those indices be used when a query is done that 
   could use the arbitrary collation function?
   

  Likely so.  But no promises yet.

  -- 
  D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565





Re: Re[2]: [sqlite] Performance problem

2003-11-03 Thread ben . carlyle
Doug Currie [EMAIL PROTECTED] wrote on 03/11/2003 05:39 AM:


  Can any readers suggest ways that I have not thought of for making
  large numbers of non-localized INSERTs go faster?

 Instead of journaling the old (undo) pages of idx2, simply record
 the fact that idx2 is modified (i.e., make a single record in the
 journal). Upon recovery, noticing that idx2 is modified will cause it
 to be recreated from the rolled-back data.

I've been thinking about this for a while and I think the solution, as with 
many other potential sqlite performance issues, is to solve it in the 
operating system, not sqlite itself. If you don't have enough bandwidth 
to your file, raid it. If you want to be able to write a large number of 
transactions per second using journalling that lazily writes to the 
database, enable data-journalling on your filesystem.

This is a counter-intuitive approach. When you apply data journalling on a 
filesystem such as linux's ext3, the operating system writes all data to a 
journal before writing it to the appropriate locations on disk. You may 
think that because it is being written twice (four times in the case of 
sqlite, because the journal will be written twice and the file will be 
written twice) it will take longer to work. When you use fsync() a lot, the 
opposite is actually true. You get the behaviour that other list members 
have been discussing with the combined undo/redo log. The data updates are 
written sequentially by the operating system to disk so seek-time doesn't 
come into it, and the fsync() simply ensures that your file has been 
written to the sequential log. The operating system itself deals with 
the lazy writing to the actual disk, and it's transparent to sqlite 
because the operating system cache makes it appear that it has already been 
written.

This only works when you use all-data journalling, not just meta-data like 
directory entries, and regular writes to the same filesystem might be a 
little slower... but try it. You should get better performance with 
data-journalling and sqlite.
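
For reference, full data journalling on ext3 is selected with the
data=journal mount option; an illustrative /etc/fstab entry (device and
mount point are made up):

    /dev/sdb1  /var/db  ext3  data=journal  0 2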

As this is a problem that can be solved very nicely indeed in the 
operating system, I think that sqlite should make no special provisions for 
it. I do suggest that the relevant FAQs should contain the instruction to 
use a data-journalled fs if a large number of transactions must be pushed 
through sqlite.

Benjamin.