Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-29 Thread Max Vlasov
>
>
> The sqlite3_stmt_status() interface was designed for this purpose.
> http://www.sqlite.org/c3ref/stmt_status.html
>

Thanks for the info, I hope it will be extended also with other counters

I did some test with this Rowid/Id trick. A larger base, 22mb, 100,000
records (220 bytes per record average) and the difference for "SELECT rowid
.." query was more noticable (20 mb read without Id index, 1mb with it).

But I also noticed that sometimes sqlite doesn't allow to use explicit index
with ORDER BY clause considering them (primary and explicitely created)
equivalent. I think it's not a bug, but in terms of performance maybe it
would be more appropriate to obey ORDER BY clause in this case.

The table is
CREATE TABLE [MainData] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT,...
The index is
CREATE INDEX [idx_ID] ON [MainData] ([ID] )

Different examples (the second line is EXPLAIN QUERY PLAN result):

SELECT rowid FROM MainData
 TABLE MainData
22 Mb read (expected, Primary index used)

SELECT rowid FROM MainData ORDER BY Id
 TABLE MainData WITH INDEX idx_ID ORDER BY
1,2 Mb read (expected, idx_ID used)

SELECT rowid FROM MainData Where Id > 1 AND Id < 30
 TABLE MainData USING PRIMARY KEY
22 Mb read (expected, Primary index used)

SELECT rowid FROM MainData Where Id > 1 AND Id < 30 ORDER BY Id
 TABLE MainData USING PRIMARY KEY ORDER BY
22 Mb read (not fully expected, ORDER BY points to Id field, but its
actually an alias)

SELECT rowid FROM MainData INDEXED BY idx_ID Where Id > 1 AND Id < 30
ORDER BY Id
 TABLE MainData WITH INDEX idx_ID ORDER BY
1,2 Mb read (expected, but INDEXED BY is not recommended for query tuning)

So when the query contains WHERE clause only when INDEXED BY was added this
explicit index was used.

As for use usefullness of this trick, I think it really can be useful when
the application wants to load one "virtual" long list of data actually
loading only rowids and optionally query full data record for visible rows.
I think in this case 22 MB disk access vs 1 MB disk access makes difference

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-29 Thread D. Richard Hipp

On Jan 7, 2010, at 8:18 AM, Tim Romano wrote:

> I don't claim any SQLite expertise, but indexes on values like
> True/False, Male/Female -- i.e. where there are only a couple or a few
> different values possible in the column --   are normally to be  
> avoided
> because low-cardinality indexes tend to be inefficient.  What is the
> advice of the SQLite gurus in this respect?


This is good advice.

If you run ANALYZE, SQLite will figure out that the index is mostly  
useless and will hardly ever use it.  (There are some obscure queries  
for which such an index would be useful, but they are the exception  
rather than the rule.)  On the other hand, SQLite still has to  
maintain the index on INSERT, UPDATE, and DELETE operations which will  
slow those operations down.

D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-29 Thread D. Richard Hipp

On Jan 29, 2010, at 6:15 AM, Max Vlasov wrote:
>  To be sure that there's no
> full or almost-full table scan I use VFS monitoring in such cases.

The sqlite3_stmt_status() interface was designed for this purpose.  
http://www.sqlite.org/c3ref/stmt_status.html

D. Richard Hipp
d...@hwaci.com



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-29 Thread Max Vlasov
On Fri, Jan 29, 2010 at 6:02 AM, Doyel5  wrote:

> I deleted all indexes, kept only the indexes on the temporary tables,
> namely
> - tempSimArgs and tempPartArgs, and seemingly my original query's running
> time decreased to around 1 sec (when I ran the query the second time). The
> execution time of the first time I run my query varies wildly, I dunno why.
> Any ideas?
>


You mentioned 800 MB as the size of your base. I think that although indexes
effectively used, the actual data can be spread across the size of your base
so first query involves many hard disk seek operations and disk cache doing
unnecessary job (for example sqlite needs only 1024 bytes at a particular
offset but the OS reads much more than that). To be sure that there's no
full or almost-full table scan I use VFS monitoring in such cases. If you
install your functions as your own file system but forward all calls to
original VFS, you can calculate real data flow for your query.

As for Access and MS db engines like Jet, we only can guess how it works,
maybe when you first open your db, the engine starts some caching
immediately, we don't know, it's closed software.

Also If you want your first join or select operations for a large base to be
faster as for general rule I'd suggest to minimize data record size for any
table that involved in join or select operations or make the page size
lower.

I did some tests to explain why I think it would help. My explanation can be
questionable so please correct me if I'm wrong. Sqlite bases works with
pages that for example can have size 1024 bytes, all data packed in pages
and operate by pages. As long as I know the only operation involving fewer
sizes is header operations. Also B-tree cells always come with the data so
they're not separated inside the db file. The only complex thing is overflow
pages when one single record can not fit inside one page. The consequence of
such format is that if the data record is fewer than 1024 bytes, visiting a
record is always full record read. Just made a quick test with VFS
monitoring:

CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT,[Text]
TEXT)
100,000 records
INSERT INTO TestTable (Text) VALUES ("1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ")

db size: 4,671,488

SELECT * FROM TestTable
VFS read flow: 4,669,456
SELECT rowid FROM TestTable
VFS read flow: 4,669,456

But if insert records with large texts (I chose intentionally 2 kb for
overflow pages to be present), SELECT rowid made twice as low page reading
comparing to SELECT *.

By the way, just discovered a strange trick that may help in similar or
other cases.
Creating Index for this table on [Id] looks like terrible idea wasting the
space (since rowid already ordered), but actually after creating the index

CREATE INDEX [idx_ID] ON [TestTable] ([ID] )

changing SELECT rowid query to

SELECT rowid FROM TestTable ORDER BY Id

led to only 1,312,000 data flow (comparing to 4,669,456) and it really makes
sence because:
- indexes in sqlite contains the data and rowid so it doesn't need the main
table to return the results, - comparing to the rowid B-tree that contains
the full datarecords, this index B-tree more effectively packed so need to
visit fewer pages. I don't know what one should do to apply this trick in
complex queries, but I hope it is possible.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-28 Thread Jean-Christophe Deschamps

>The execution time of the first time I run my query varies wildly, I 
>dunno why.

Probably due to an empty or dirty cache.  That's fairly common with 
cache inclined applications, subsystems or OSes.  The second time, most 
of what's needed is found in cache with much less variability.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-28 Thread Doyel5

Hi Everybody,

Thanks a lot for the suggestions. I really appreciate your help. :-)

Just wanted to share my results..

I deleted all indexes, kept only the indexes on the temporary tables, namely
- tempSimArgs and tempPartArgs, and seemingly my original query's running
time decreased to around 1 sec (when I ran the query the second time). The
execution time of the first time I run my query varies wildly, I dunno why.
Any ideas?

Thanks in advance!

Regards,
Saswati


Igor Tandetnik wrote:
> 
> Doyel5  wrote:
>> I have a 800MB MS Access database that I migrated to SQLite. The
>> structure of the database is as follows (the SQLite database, after
>> migration, is around 330MB):
>> 
>> The table ‘Occurrence’ has  1,600,000 records. The table looks like:
>> CREATE TABLE Occurrence
>> (
>> SimulationID  INTEGER,SimRunID   INTEGER,OccurrenceID  
>> INTEGER, OccurrenceTypeIDINTEGER,PeriodINTEGER,   
>> HasSucceededBOOL, PRIMARY KEY (SimulationID,  SimRunID,  
>> OccurrenceID) )
>> 
>> It has the following indexes:
>> CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence"
>> ("HasSucceeded" ASC)
> 
> Don't create indexes on fields with low selectivity (fields that only have
> a small number of distinct values, such as boolean fields). A query like
> "select * from Occurrence where HasSucceeded" would likely run faster
> doing a full scan than using this index.
> 
>> CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence"
>> ("OccurrenceID" ASC)
>> 
>> CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID"
>> ASC) 
>> 
>> CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence"
>> ("SimulationID" ASC)
> 
> This index is redundant. Your primary key can be used for searches on
> SimulationID.
> 
> 
>> The table ‘InitialParticipant’ has 130 records. The structure of the
>> table is
>> 
>> CREATE TABLE InitialParticipant
>> (
>> ParticipantIDINTEGER  PRIMARY KEY, ParticipantTypeID   
>> INTEGER, ParticipantGroupID INTEGER
>> )
>> 
>> The table has the following indexes:
>> CREATE INDEX "initialpart_participantTypeID_idx" ON
>> "InitialParticipant" ("ParticipantGroupID" ASC)
>> 
>> CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant"
>> ("ParticipantID" ASC)
> 
> Similarly, this index is redundant. The fact that ParticipantID is a
> primary key means it's automatically indexed.
> 
>> The table ‘ParticipantGroup’ has 22 records. It looks like
>> CREATE TABLE ParticipantGroup
>> (
>> ParticipantGroupIDINTEGER,ParticipantGroupTypeID INTEGER,
>> Descriptionvarchar (50),  PRIMARY KEY(  ParticipantGroupID  )
>> )
>> 
>> The table has the following index:
>> CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON
>> "ParticipantGroup" ("ParticipantGroupID" ASC)
> 
> Again, this index is redundant.
> 
>> The query in Access is (the entire query takes 1 second to execute):
>> SELECT ParticipantGroup.Description, Occurrence.SimulationID,
>> Occurrence.SimRunID, Occurrence.Period,
>> Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID
>> FROM (
>> ParticipantGroup INNER JOIN InitialParticipant ON
>> ParticipantGroup.ParticipantGroupID =
>> InitialParticipant.ParticipantGroupID ) INNER JOIN
>> (
>>tmpPartArgs INNER JOIN
>>   (
>>   (
>>   tmpSimArgs INNER JOIN Occurrence ON
>> (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND
>>   (tmpSimArgs.SimulationID = Occurrence.SimulationID) ) INNER
>> JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =
>> OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID =
>> OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID =
>>  OccurrenceParticipant.SimulationID) ) ON
>> tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID ) ON
>> InitialParticipant.ParticipantID =
>> OccurrenceParticipant.ParticipantID WHERE
>> (((OccurrenceParticipant.RoleTypeID)=52 Or
>> (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded =
>> True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID,
>> Occurrence.SimRunID, Occurrence.Period;  
> 
> I recommend you drop all those parentheses around joins - they only
> confuse matters. Just write
> 
> SELECT ParticipantGroup.Description, Occurrence.SimulationID,
> Occurrence.SimRunID, Occurrence.Period,
> Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID
> FROM
> ParticipantGroup INNER JOIN InitialParticipant ON
> (ParticipantGroup.ParticipantGroupID =
> InitialParticipant.ParticipantGroupID)
> INNER JOIN OccurrenceParticipant ON
> (InitialParticipant.ParticipantID =
> OccurrenceParticipant.ParticipantID)
> INNER JOIN tmpPartArgs ON
> (tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID)
> INNER JOIN Occurrence ON
> (Occurrence.OccurrenceID = OccurrenceParticipant.OccurrenceID) AND
> (Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND
> 

Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-07 Thread Todd F. Richmond
If you test the last query in this email, be careful to test with your original 
" where x or y" clause along with the "in" version. There is a serious problem 
where "in" sometimes confuses the search in even simple queries and sqlite ends 
up doing a table scan instead of an obvious index. I sent detailed email about 
this a couple months ago but never got any traction other than one other 
person's confirmation that it happened for them as well. For my case changing 
the statement made my query time improvement from nearly 100ms to < 2ms 

Todd 

- Original Message - 
From: "Igor Tandetnik" <itandet...@mvps.org> 
To: sqlite-users@sqlite.org 
Sent: Wednesday, January 6, 2010 3:02:23 PM 
Subject: Re: [sqlite] SQlite query performs 10 times slower than MS Access 
query 

Doyel5 <doyel...@gmail.com> wrote: 
> I have a 800MB MS Access database that I migrated to SQLite. The 
> structure of the database is as follows (the SQLite database, after 
> migration, is around 330MB): 
> 
> The table ‘Occurrence’ has 1,600,000 records. The table looks like: 
> CREATE TABLE Occurrence 
> ( 
> SimulationID INTEGER, SimRunID INTEGER, OccurrenceID 
> INTEGER, OccurrenceTypeID INTEGER, Period INTEGER, 
> HasSucceeded BOOL, PRIMARY KEY (SimulationID, SimRunID, 
> OccurrenceID) ) 
> 
> It has the following indexes: 
> CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" 
> ("HasSucceeded" ASC) 

Don't create indexes on fields with low selectivity (fields that only have a 
small number of distinct values, such as boolean fields). A query like "select 
* from Occurrence where HasSucceeded" would likely run faster doing a full scan 
than using this index. 

> CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" 
> ("OccurrenceID" ASC) 
> 
> CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" 
> ASC) 
> 
> CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" 
> ("SimulationID" ASC) 

This index is redundant. Your primary key can be used for searches on 
SimulationID. 


> The table ‘InitialParticipant’ has 130 records. The structure of the 
> table is 
> 
> CREATE TABLE InitialParticipant 
> ( 
> ParticipantID INTEGER PRIMARY KEY, ParticipantTypeID 
> INTEGER, ParticipantGroupID INTEGER 
> ) 
> 
> The table has the following indexes: 
> CREATE INDEX "initialpart_participantTypeID_idx" ON 
> "InitialParticipant" ("ParticipantGroupID" ASC) 
> 
> CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant" 
> ("ParticipantID" ASC) 

Similarly, this index is redundant. The fact that ParticipantID is a primary 
key means it's automatically indexed. 

> The table ‘ParticipantGroup’ has 22 records. It looks like 
> CREATE TABLE ParticipantGroup 
> ( 
> ParticipantGroupID INTEGER, ParticipantGroupTypeID INTEGER, 
> Description varchar (50), PRIMARY KEY( ParticipantGroupID ) 
> ) 
> 
> The table has the following index: 
> CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON 
> "ParticipantGroup" ("ParticipantGroupID" ASC) 

Again, this index is redundant. 

> The query in Access is (the entire query takes 1 second to execute): 
> SELECT ParticipantGroup.Description, Occurrence.SimulationID, 
> Occurrence.SimRunID, Occurrence.Period, 
> Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID 
> FROM ( 
> ParticipantGroup INNER JOIN InitialParticipant ON 
> ParticipantGroup.ParticipantGroupID = 
> InitialParticipant.ParticipantGroupID ) INNER JOIN 
> ( 
> tmpPartArgs INNER JOIN 
> ( 
> ( 
> tmpSimArgs INNER JOIN Occurrence ON 
> (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND 
> (tmpSimArgs.SimulationID = Occurrence.SimulationID) ) INNER 
> JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID = 
> OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID = 
> OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID = 
> OccurrenceParticipant.SimulationID) ) ON 
> tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID ) ON 
> InitialParticipant.ParticipantID = 
> OccurrenceParticipant.ParticipantID WHERE 
> (((OccurrenceParticipant.RoleTypeID)=52 Or 
> (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = 
> True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID, 
> Occurrence.SimRunID, Occurrence.Period; 

I recommend you drop all those parentheses around joins - they only confuse 
matters. Just write 

SELECT ParticipantGroup.Description, Occurrence.SimulationID, 
Occurrence.SimRunID, Occurrence.Period, 
Count(OccurrenceParticipan

Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-07 Thread Jay A. Kreibich
On Thu, Jan 07, 2010 at 08:18:56AM -0500, Tim Romano scratched on the wall:
> I don't claim any SQLite expertise, but indexes on values like 
> True/False, Male/Female -- i.e. where there are only a couple or a few 
> different values possible in the column --   are normally to be avoided 
> because low-cardinality indexes tend to be inefficient.  What is the 
> advice of the SQLite gurus in this respect?

  "Yes, but..."

  In general, yes.  If a condition and index can't cut a table down to
  about 15% or less, indexes are usually not worth it... a full table scan
  will be faster.  This is because an index will typically provide a
  ROWID, which then needs to be looked up in the main table.  This
  makes each index look-up at least twice as expensive, and typically
  much more due to increased I/O, scattered I/O, and cache thrash.

  However, in the case of multi-column indexes, if SQLite can extract
  the column data it needs from the index itself the second table look-up
  will be skipped.  This means there are times when multi-column indexes
  and boost performance, even if the later columns aren't used as part
  of the index look-up.  There is a insert/update/delete cost associated
  with that, however.

  Many server-based database systems keep statistics on their indexes.
  For example, the number of values vs unique values to measure the
  selectivity of the index.  This meta-data is typically built during
  idle times.  SQLite doesn't have "idle times", but you can manually
  build basic statistics using ANALYZE.  If you built the library with
  the extended statistics, ANALYZE will also build a histogram to
  compute the selectivity of ranges.  I have no idea how advanced the
  query optimizer is, or how much this data is used, but it is one more
  thing to try.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-07 Thread Simon Slavin

On 7 Jan 2010, at 1:18pm, Tim Romano wrote:

> I don't claim any SQLite expertise, but indexes on values like 
> True/False, Male/Female -- i.e. where there are only a couple or a few 
> different values possible in the column --   are normally to be avoided 
> because low-cardinality indexes tend to be inefficient.  What is the 
> advice of the SQLite gurus in this respect?

It's complicated because more factors are involved than that.  For instance, if 
you have a database where INSERT is used thousands of times a day but you only 
SELECT records once a week, all indexes will be inefficient: you want INSERT to 
work as quickly as possible and it doesn't matter as much if your SELECT works 
slowly.  For a situation like this you often make no indexes besides the 
primary one.  This lets each INSERT operation finish faster.

However, it is true that if data in a column is 'chunky' then an index on that 
column is less useful.  Some database systems like MySQL use chunkiness as part 
of a profiling system to work out what the most efficient indexes will be.  
Certainly if you need two columns in a WHERE clause, you tend to put the 
chunkier one second when you're making the index the SELECT will use.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-07 Thread Tim Romano
I don't claim any SQLite expertise, but indexes on values like 
True/False, Male/Female -- i.e. where there are only a couple or a few 
different values possible in the column --   are normally to be avoided 
because low-cardinality indexes tend to be inefficient.  What is the 
advice of the SQLite gurus in this respect?

Regards
Tim Romano

P.S. I've noticed a striking performance boost in SQLite when joined 
inline views (as shown below) are used instead of standard table joins.  
I haven't delved into it but it could be that because the transient 
table instantiated by the inline select contains only a few rows and 
SQLite uses this set of rows as the "inner loop", this turns out be a 
better approach for the query than the indexes that might have been 
chosen otherwise.

select a, b, c
from myTable T1
JOIN
(select x, y, z from myOtherTable where ...) as T2
on T1.a = T2.z


On 1/6/2010 5:28 PM, Doyel5 wrote:
> The table ‘Occurrence’ has  1,600,000 records. The table looks like:
> CREATE TABLE Occurrence
> (
> SimulationID  INTEGER,SimRunID   INTEGER,OccurrenceID   INTEGER,
> OccurrenceTypeIDINTEGER,PeriodINTEGER,HasSucceededBOOL,
> PRIMARY KEY (SimulationID,  SimRunID,   OccurrenceID)
> )
>
>
> CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded"
> ASC)
>
>
> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-06 Thread Igor Tandetnik
Doyel5  wrote:
> I have a 800MB MS Access database that I migrated to SQLite. The
> structure of the database is as follows (the SQLite database, after
> migration, is around 330MB):
> 
> The table ‘Occurrence’ has  1,600,000 records. The table looks like:
> CREATE TABLE Occurrence
> (
> SimulationID  INTEGER,SimRunID   INTEGER,OccurrenceID  
> INTEGER, OccurrenceTypeIDINTEGER,PeriodINTEGER,   
> HasSucceededBOOL, PRIMARY KEY (SimulationID,  SimRunID,  
> OccurrenceID) )
> 
> It has the following indexes:
> CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence"
> ("HasSucceeded" ASC)

Don't create indexes on fields with low selectivity (fields that only have a 
small number of distinct values, such as boolean fields). A query like "select 
* from Occurrence where HasSucceeded" would likely run faster doing a full scan 
than using this index.

> CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence"
> ("OccurrenceID" ASC)
> 
> CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID"
> ASC) 
> 
> CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence"
> ("SimulationID" ASC)

This index is redundant. Your primary key can be used for searches on 
SimulationID.


> The table ‘InitialParticipant’ has 130 records. The structure of the
> table is
> 
> CREATE TABLE InitialParticipant
> (
> ParticipantIDINTEGER  PRIMARY KEY, ParticipantTypeID   
> INTEGER, ParticipantGroupID INTEGER
> )
> 
> The table has the following indexes:
> CREATE INDEX "initialpart_participantTypeID_idx" ON
> "InitialParticipant" ("ParticipantGroupID" ASC)
> 
> CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant"
> ("ParticipantID" ASC)

Similarly, this index is redundant. The fact that ParticipantID is a primary 
key means it's automatically indexed.

> The table ‘ParticipantGroup’ has 22 records. It looks like
> CREATE TABLE ParticipantGroup
> (
> ParticipantGroupIDINTEGER,ParticipantGroupTypeID INTEGER,
> Descriptionvarchar (50),  PRIMARY KEY(  ParticipantGroupID  )
> )
> 
> The table has the following index:
> CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON
> "ParticipantGroup" ("ParticipantGroupID" ASC)

Again, this index is redundant.

> The query in Access is (the entire query takes 1 second to execute):
> SELECT ParticipantGroup.Description, Occurrence.SimulationID,
> Occurrence.SimRunID, Occurrence.Period,
> Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID
> FROM (
> ParticipantGroup INNER JOIN InitialParticipant ON
> ParticipantGroup.ParticipantGroupID =
> InitialParticipant.ParticipantGroupID ) INNER JOIN
> (
>tmpPartArgs INNER JOIN
>   (
>   (
>   tmpSimArgs INNER JOIN Occurrence ON
> (tmpSimArgs.SimRunID = Occurrence.SimRunID) AND
>   (tmpSimArgs.SimulationID = Occurrence.SimulationID) ) INNER
> JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =
> OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID =
> OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID =
>  OccurrenceParticipant.SimulationID) ) ON
> tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID ) ON
> InitialParticipant.ParticipantID =
> OccurrenceParticipant.ParticipantID WHERE
> (((OccurrenceParticipant.RoleTypeID)=52 Or
> (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded =
> True GROUP BY ParticipantGroup.Description, Occurrence.SimulationID,
> Occurrence.SimRunID, Occurrence.Period;  

I recommend you drop all those parentheses around joins - they only confuse 
matters. Just write

SELECT ParticipantGroup.Description, Occurrence.SimulationID,
Occurrence.SimRunID, Occurrence.Period,
Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID
FROM
ParticipantGroup INNER JOIN InitialParticipant ON
(ParticipantGroup.ParticipantGroupID = 
InitialParticipant.ParticipantGroupID)
INNER JOIN OccurrenceParticipant ON
(InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID)
INNER JOIN tmpPartArgs ON
(tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID)
INNER JOIN Occurrence ON
(Occurrence.OccurrenceID = OccurrenceParticipant.OccurrenceID) AND
(Occurrence.SimRunID = OccurrenceParticipant.SimRunID) AND
(Occurrence.SimulationID = OccurrenceParticipant.SimulationID)
INNER JOIN tmpSimArgs ON
(tmpSimArgs.SimRunID = Occurrence.SimRunID) AND
(tmpSimArgs.SimulationID = Occurrence.SimulationID)
WHERE
OccurrenceParticipant.RoleTypeID IN (49, 52) AND Occurrence.HasSucceeded
GROUP BY
ParticipantGroup.Description, Occurrence.SimulationID, Occurrence.SimRunID, 
Occurrence.Period;


> The SQLite  query is as follows (this query takes around 10 seconds):

Why did you feel the need to rewrite it? This mess of nested selects makes 
SQLite's job much harder than it needs to be.

The original query should have worked out of the box 

Re: [sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-06 Thread Emilio Platzer
Why you change the query?

Use de Access version or someone without joining a subquery.

Is preferible to join table to table (to more tables) than join table to 
subqueries.

Emilio

Doyel5 escribió:
> I have a 800MB MS Access database that I migrated to SQLite. The structure of
> the database is as follows (the SQLite database, after migration, is around
> 330MB):
>
> The table ‘Occurrence’ has  1,600,000 records. The table looks like:
> CREATE TABLE Occurrence 
> (
> SimulationID  INTEGER,SimRunID   INTEGER,OccurrenceID   INTEGER,
> OccurrenceTypeIDINTEGER,PeriodINTEGER,HasSucceededBOOL, 
> PRIMARY KEY (SimulationID,  SimRunID,   OccurrenceID)
> )
>
> It has the following indexes:
> CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded"
> ASC)
>
> CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID"
> ASC)
>
> CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)
>
> CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID"
> ASC)
>
>
>
> The table ‘OccurrenceParticipant’ has 3,400,000 records.  The table looks
> like:
> CREATE TABLE OccurrenceParticipant 
> (
> SimulationIDINTEGER, SimRunIDINTEGER,OccurrenceID
> INTEGER,
> RoleTypeID INTEGER, ParticipantIDINTEGER
> )
>
> It has the following indexes:
> CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON
> "OccurrenceParticipant" ("OccurrenceID" ASC)
>
> CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON
> "OccurrenceParticipant" ("ParticipantID" ASC)
>
> CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant"
> ("RoleTypeID" ASC)
>
> CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant"
> ("SimRunID" ASC)
>
> CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON
> "OccurrenceParticipant" ("SimulationID" ASC)
> The table ‘InitialParticipant’ has 130 records. The structure of the table
> is 
>
>
> CREATE TABLE InitialParticipant 
> (
> ParticipantIDINTEGER  PRIMARY KEY, ParticipantTypeIDINTEGER,
> ParticipantGroupID INTEGER
> )
>
> The table has the following indexes:
> CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant"
> ("ParticipantGroupID" ASC)
>
> CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant"
> ("ParticipantID" ASC)
>
>
>
> The table ‘ParticipantGroup’ has 22 records. It looks like
> CREATE TABLE ParticipantGroup  
> (
> ParticipantGroupIDINTEGER,ParticipantGroupTypeID INTEGER,
> Descriptionvarchar (50),  PRIMARY KEY(  ParticipantGroupID  )
> )
>
> The table has the following index:
> CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup"
> ("ParticipantGroupID" ASC)
>
>
>
> The table ‘tmpSimArgs’ has 18 records. It has the following structure:
> CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))
>
> And the following indexes:
> CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)
>
> CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)
>
>
>
> The table ‘tmpPartArgs’ has 80 records. It has the below structure:
> CREATE TABLE tmpPartArgs(participantID INT)
>
> And the below index:
> CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)
>
> I have a query that involves multiple INNER JOINs and the problem I am
> facing is the Access version of the query takes about a second whereas the
> SQLite version of the same query takes 10 seconds (about 10 times slow!) It
> is impossible for me to migrate back to Access and SQLite is my only option. 
>
> I am new to writing database queries so these queries might look stupid, so
> please advise on anything you see faulty or kid-dish.
>
> The query in Access is (the entire query takes 1 second to execute):
> SELECT ParticipantGroup.Description, Occurrence.SimulationID,
> Occurrence.SimRunID, Occurrence.Period,
> Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM 
> ( 
>  ParticipantGroup INNER JOIN InitialParticipant ON
> ParticipantGroup.ParticipantGroupID = InitialParticipant.ParticipantGroupID
> ) INNER JOIN 
> (
> tmpPartArgs INNER JOIN 
>(
>(
>tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID =
> Occurrence.SimRunID) AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
>) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =   
> OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID =
> OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID =
> OccurrenceParticipant.SimulationID)
>   ) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
> ) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID
> WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or
> (OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True
> GROUP BY ParticipantGroup.Description, Occurrence.SimulationID,
> 

[sqlite] SQlite query performs 10 times slower than MS Access query

2010-01-06 Thread Doyel5

I have a 800MB MS Access database that I migrated to SQLite. The structure of
the database is as follows (the SQLite database, after migration, is around
330MB):

The table ‘Occurrence’ has  1,600,000 records. The table looks like:
CREATE TABLE Occurrence 
(
SimulationID  INTEGER,SimRunID   INTEGER,OccurrenceID   INTEGER,
OccurrenceTypeIDINTEGER,PeriodINTEGER,HasSucceededBOOL, 
PRIMARY KEY (SimulationID,  SimRunID,   OccurrenceID)
)

It has the following indexes:
CREATE INDEX "Occurrence_HasSucceeded_idx" ON "Occurrence" ("HasSucceeded"
ASC)

CREATE INDEX "Occurrence_OccurrenceID_idx" ON "Occurrence" ("OccurrenceID"
ASC)

CREATE INDEX "Occurrence_SimRunID_idx" ON "Occurrence" ("SimRunID" ASC)

CREATE INDEX "Occurrence_SimulationID_idx" ON "Occurrence" ("SimulationID"
ASC)



The table ‘OccurrenceParticipant’ has 3,400,000 records.  The table looks
like:
CREATE TABLE OccurrenceParticipant 
(
SimulationIDINTEGER, SimRunIDINTEGER,OccurrenceID
INTEGER,
RoleTypeID INTEGER, ParticipantIDINTEGER
)

It has the following indexes:
CREATE INDEX "OccurrenceParticipant_OccurrenceID_idx" ON
"OccurrenceParticipant" ("OccurrenceID" ASC)

CREATE INDEX "OccurrenceParticipant_ParticipantID_idx" ON
"OccurrenceParticipant" ("ParticipantID" ASC)

CREATE INDEX "OccurrenceParticipant_RoleType_idx" ON "OccurrenceParticipant"
("RoleTypeID" ASC)

CREATE INDEX "OccurrenceParticipant_SimRunID_idx" ON "OccurrenceParticipant"
("SimRunID" ASC)

CREATE INDEX "OccurrenceParticipant_SimulationID_idx" ON
"OccurrenceParticipant" ("SimulationID" ASC)
The table ‘InitialParticipant’ has 130 records. The structure of the table
is 


CREATE TABLE InitialParticipant 
(
ParticipantIDINTEGER  PRIMARY KEY, ParticipantTypeIDINTEGER,
ParticipantGroupID INTEGER
)

The table has the following indexes:
CREATE INDEX "initialpart_participantTypeID_idx" ON "InitialParticipant"
("ParticipantGroupID" ASC)

CREATE INDEX "initialpart_ParticipantID_idx" ON "InitialParticipant"
("ParticipantID" ASC)



The table ‘ParticipantGroup’ has 22 records. It looks like
CREATE TABLE ParticipantGroup  
(
ParticipantGroupIDINTEGER,ParticipantGroupTypeID INTEGER,
Descriptionvarchar (50),  PRIMARY KEY(  ParticipantGroupID  )
)

The table has the following index:
CREATE INDEX "ParticipantGroup_ParticipantGroupID_idx" ON "ParticipantGroup"
("ParticipantGroupID" ASC)



The table ‘tmpSimArgs’ has 18 records. It has the following structure:
CREATE TABLE tmpSimArgs (SimulationID varchar, SimRunID int(10))

And the following indexes:
CREATE INDEX tmpSimArgs_SimRunID_idx ON tmpSimArgs(SimRunID ASC)

CREATE INDEX tmpSimArgs_SimulationID_idx ON tmpSimArgs(SimulationID ASC)



The table ‘tmpPartArgs’ has 80 records. It has the below structure:
CREATE TABLE tmpPartArgs(participantID INT)

And the below index:
CREATE INDEX tmpPartArgs_participantID_idx ON tmpPartArgs(participantID ASC)

I have a query that involves multiple INNER JOINs and the problem I am
facing is the Access version of the query takes about a second whereas the
SQLite version of the same query takes 10 seconds (about 10 times slow!) It
is impossible for me to migrate back to Access and SQLite is my only option. 

I am new to writing database queries so these queries might look stupid, so
please advise on anything you see faulty or kid-dish.

The query in Access is (the entire query takes 1 second to execute):
SELECT ParticipantGroup.Description, Occurrence.SimulationID,
Occurrence.SimRunID, Occurrence.Period,
Count(OccurrenceParticipant.ParticipantID) AS CountOfParticipantID FROM 
( 
 ParticipantGroup INNER JOIN InitialParticipant ON
ParticipantGroup.ParticipantGroupID = InitialParticipant.ParticipantGroupID
) INNER JOIN 
(
tmpPartArgs INNER JOIN 
   (
   (
   tmpSimArgs INNER JOIN Occurrence ON (tmpSimArgs.SimRunID =
Occurrence.SimRunID) AND (tmpSimArgs.SimulationID = Occurrence.SimulationID)
   ) INNER JOIN OccurrenceParticipant ON (Occurrence.OccurrenceID =   
OccurrenceParticipant.OccurrenceID) AND (Occurrence.SimRunID =
OccurrenceParticipant.SimRunID) AND (Occurrence.SimulationID =
OccurrenceParticipant.SimulationID)
  ) ON tmpPartArgs.participantID = OccurrenceParticipant.ParticipantID
) ON InitialParticipant.ParticipantID = OccurrenceParticipant.ParticipantID
WHERE (((OccurrenceParticipant.RoleTypeID)=52 Or
(OccurrenceParticipant.RoleTypeID)=49)) AND Occurrence.HasSucceeded = True
GROUP BY ParticipantGroup.Description, Occurrence.SimulationID,
Occurrence.SimRunID, Occurrence.Period;


The SQLite  query is as follows (this query takes around 10 seconds):
SELECT ij1.Description, ij2.occSimulationID, ij2.occSimRunID, ij2.Period,
Count(ij2.occpParticipantID) AS CountOfParticipantID FROM 
(
 SELECT ip.ParticipantGroupID AS ipParticipantGroupID, ip.ParticipantID
AS ipParticipantID, ip.ParticipantTypeID, pg.ParticipantGroupID AS
pgParticipantGroupID, pg.ParticipantGroupTypeID,