Re: [sqlite] How to optimize a multi-condition query
On Mon, 18 Oct 2010 15:07:35 +0200, Hilmar Bergerwrote: > Hi, > >thanks to everybody that answered. I tried your suggestions but there >was no measurable improvement. Possibly this is the best what I can get >out of Sqlite. >However, I tried a similar query on a larger table using both SQlite and >Postgresql (same machine, same table structure + indices, same data). >Postrgresql took 25secs for this query while it took more than 1 hour >with SQlite. Obviously there is room for improvement. If I manage to >find out what makes the difference I will post it here. Did you already try to run ANALYZE; on the fully populated database? It gathers statistics about indexes and those are used by the optimizer to choose a query plan. It might not help in your case, but it doesn't hurt to try. >Thanks again and best regards, >Hilmar -- ( Kees Nuyt ) c[_] ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Hi, thanks to everybody that answered. I tried your suggestions but there was no measurable improvement. Possibly this is the best what I can get out of Sqlite. However, I tried a similar query on a larger table using both SQlite and Postgresql (same machine, same table structure + indices, same data). Postrgresql took 25secs for this query while it took more than 1 hour with SQlite. Obviously there is room for improvement. If I manage to find out what makes the difference I will post it here. Thanks again and best regards, Hilmar El 10/15/2010 4:44 PM, Igor Tandetnik escribió: > Hilmar Bergerwrote: >> Indexes has been created for all fields in A and B (e.g. create index >> name on A(chr, start, stop, strand)); >> >> The query is as follows: >> >> select * from >> a, b >> where a.chr = b.chr and a.strand = b.strand and a.start<= b.start >> and a.stop>= b.stop and b.start<= a.stop and a.start<= b.stop ; > Drop those indexes you created, change the query to > > select * from a, b > where a.chr = b.chr and a.strand = b.strand > and b.start between a.start and a.stop > and b.stop<= a.stop; > > and create a single index on b(chr, strand, start) or b(chr, strand, start, > stop), listing fields in this exact order (adding stop to the index may or > may not result in marginal performance improvement - experiment). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Hilmar Bergerwrote: > I used EXPLAIN QUERY PLAN on the query and it looks like it does not use > any index on b at all, only if I use hardcoded conditions like b > 0. > > It appears that the real problem is that SQlite does not use indices for > both tables For your problem, there's no way to use indices on both tables. An index is only helpful if it allows one to look at only some, but not all, rows in the table. In your query, one has to look at all records in A and find suitable matching records in B, or else look at all records in B and find matching records in A. You have to look at all records in at least one table, there's no way around that. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Hilmar Bergerwrote: > Indexes has been created for all fields in A and B (e.g. create index > name on A(chr, start, stop, strand)); > > The query is as follows: > > select * from > a, b > where a.chr = b.chr and a.strand = b.strand and a.start <= b.start > and a.stop >= b.stop and b.start <= a.stop and a.start <= b.stop ; Drop those indexes you created, change the query to select * from a, b where a.chr = b.chr and a.strand = b.strand and b.start between a.start and a.stop and b.stop <= a.stop; and create a single index on b(chr, strand, start) or b(chr, strand, start, stop), listing fields in this exact order (adding stop to the index may or may not result in marginal performance improvement - experiment). -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Black, Michael (IS)wrote: > I love simple examples like this can help people with understanding > things...so I tried this which I thought would do what Hilmar > wants...but alaswhat concept am I missing? > > SQLite version 3.7.2 > sqlite> create table c(achr char,bchr char); > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Why no use of the index in this case? Try this: select t.achr, t.bchr from (select distinct achr from c) as alphabet join c as t on (t.achr=alphabet.achr and t.bchr=alphabet.achr); Under some circumstances - achr only has a small number of distinct values, the number of rows where achr=bchr is a small percentage of all rows in the table - this may be significantly faster than full table scan. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
> sqlite> create table c(achr char,bchr char); > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Why no use of the index in this case? How do you think it should be used here? It's not that rows with the same values of achr and bchr stored together in the index - they are spread all over the place. And thus using index it will have to make full scan of it which is less efficient than full scan of the table. Or should I say it's more efficient than full scan of the table in very rare situations, so that I'm not sure if SQLite implements it at all. Pavel On Fri, Oct 15, 2010 at 8:34 AM, Black, Michael (IS) <michael.bla...@ngc.com> wrote: > I love simple examples like this can help people with understanding > things...so I tried this which I thought would do what Hilmar wants...but > alaswhat concept am I missing? > > SQLite version 3.7.2 > sqlite> create table c(achr char,bchr char); > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Why no use of the index in this case? > > Michael D. Black > Senior Scientist > Advanced Analytics Directorate > Northrop Grumman Information Systems > > > > > From: sqlite-users-boun...@sqlite.org on behalf of Simon Davies > Sent: Fri 10/15/2010 5:16 AM > To: General Discussion of SQLite Database > Subject: EXTERNAL:Re: [sqlite] How to optimize a multi-condition query > > > > On 15 October 2010 10:43, Hilmar Berger <hilmar.ber...@integromics.com> wrote: >> Thanks ! >> >> However, I tried what you suggested and there was no change. >> >> I used EXPLAIN QUERY PLAN on the query and it looks like it does not use >> any index on b at all, only if I use hardcoded conditions like b > 0. >> >> It appears that the real problem is that SQlite does not use indices for >> both tables, e.g.: >> >> CREATE INDEX a_chr on a(chr) >> CREATE INDEX b_chr on b(chr) >> >> explain query plan >> select a.chr, b.chr >> from b, a >> where b.chr = a.chr; >> >> Output: >> 0 0 TABLE b >> 1 1 TABLE a WITH INDEX a_chr > > There is no benefit in using an index on b for this query. > > Change the query to: > select a.chr, b.chr > from b, a > where b.chr = a.chr and b.chr>0; > > and the query plan becomes: > 0|0|TABLE b WITH INDEX b_chr > 1|1|TABLE a WITH INDEX a_chr > >> >> >> There is an example on how to use multiple indices on the same table >> here: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning >> However, I'm not sure how to extend this to joined tables. >> >> Thanks ! >> >> Best regards, >> Hilmar >> > > Regarding > > "select * from > a, b > where a.chr = b.chr and a.strand = b.strand and > a.start <= b.start and a.stop >= b.stop and > b.start <= a.stop and a.start <= b.stop ;" > > surely the last 2 conditions are redundant assuming a.start<a.stop, > and b.start > > Regards, > Simon > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Hi, the condition in your query select achr,bchr from c where achr=bchr involves 2 columns of the table. In this case, an index is useless. If you do select achr,bchr from c where achr='foo' then the index will be used. Martin Am 15.10.2010 15:09, schrieb Black, Michael (IS): > Ok then... I added 67,600 records like this and still no index use. > > SQLite version 3.7.2 > sqlite> select count(*) from c; > 67600 > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Here's my record add: > #include > #include > #include "sqlite3.h" > int main() > { > sqlite3 *db; > char *errmsg=NULL; > int rc; > int i,j,k; > sqlite3_open("test.db",); > rc=sqlite3_exec(db, "CREATE TABLE c (achr char, bchr > char)",NULL,NULL,); > if (rc != SQLITE_OK) { >puts(errmsg); >sqlite3_free(errmsg); > } > sqlite3_exec(db,"BEGIN",NULL,NULL,); > for(k=0;k<100;k++) { >for(i=0;i<26;i++) { > char sql[4096]; > for(j=0;j<26;j++) { > sprintf(sql,"INSERT INTO c VALUES ('%c','%c')",'a'+i,'a'+j); > rc=sqlite3_exec(db, sql,NULL,NULL,); > if (rc != SQLITE_OK) { > puts(sql); > puts(errmsg); > sqlite3_free(errmsg); > exit(-1); > } > } >} > } > sqlite3_exec(db,"COMMIT",NULL,NULL,); > sqlite3_close(db); > return 0; > } > > Michael D. Black > Senior Scientist > Advanced Analytics Directorate > Northrop Grumman Information Systems > > > > > From: sqlite-users-boun...@sqlite.org on behalf of luuk34 > Sent: Fri 10/15/2010 7:40 AM > To: General Discussion of SQLite Database > Subject: EXTERNAL:Re: [sqlite] How to optimize a multi-condition query > > > > >On 15-10-10 14:34, Black, Michael (IS) wrote: >> I love simple examples like this can help people with understanding >> things...so I tried this which I thought would do what Hilmar wants...but >> alaswhat concept am I missing? >> >> SQLite version 3.7.2 >> sqlite> create table c(achr char,bchr char); >> sqlite> create index c_chr on c(achr,bchr); >> sqlite> explain query plan select achr,bchr from c where achr=bchr; >> 0|0|TABLE c >> >> Why no use of the index in this case? >> > because there are no records in the database, > so its quicker to read just all records, > than to read all record in the order of the index... > > -- > Luuk > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
Ok then... I added 67,600 records like this and still no index use. SQLite version 3.7.2 sqlite> select count(*) from c; 67600 sqlite> explain query plan select achr,bchr from c where achr=bchr; 0|0|TABLE c sqlite> create index c_chr on c(achr,bchr); sqlite> explain query plan select achr,bchr from c where achr=bchr; 0|0|TABLE c Here's my record add: #include #include #include "sqlite3.h" int main() { sqlite3 *db; char *errmsg=NULL; int rc; int i,j,k; sqlite3_open("test.db",); rc=sqlite3_exec(db, "CREATE TABLE c (achr char, bchr char)",NULL,NULL,); if (rc != SQLITE_OK) { puts(errmsg); sqlite3_free(errmsg); } sqlite3_exec(db,"BEGIN",NULL,NULL,); for(k=0;k<100;k++) { for(i=0;i<26;i++) { char sql[4096]; for(j=0;j<26;j++) { sprintf(sql,"INSERT INTO c VALUES ('%c','%c')",'a'+i,'a'+j); rc=sqlite3_exec(db, sql,NULL,NULL,); if (rc != SQLITE_OK) { puts(sql); puts(errmsg); sqlite3_free(errmsg); exit(-1); } } } } sqlite3_exec(db,"COMMIT",NULL,NULL,); sqlite3_close(db); return 0; } Michael D. Black Senior Scientist Advanced Analytics Directorate Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org on behalf of luuk34 Sent: Fri 10/15/2010 7:40 AM To: General Discussion of SQLite Database Subject: EXTERNAL:Re: [sqlite] How to optimize a multi-condition query On 15-10-10 14:34, Black, Michael (IS) wrote: > I love simple examples like this can help people with understanding > things...so I tried this which I thought would do what Hilmar wants...but > alaswhat concept am I missing? > > SQLite version 3.7.2 > sqlite> create table c(achr char,bchr char); > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Why no use of the index in this case? > because there are no records in the database, so its quicker to read just all records, than to read all record in the order of the index... -- Luuk ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
On 15-10-10 14:34, Black, Michael (IS) wrote: > I love simple examples like this can help people with understanding > things...so I tried this which I thought would do what Hilmar wants...but > alaswhat concept am I missing? > > SQLite version 3.7.2 > sqlite> create table c(achr char,bchr char); > sqlite> create index c_chr on c(achr,bchr); > sqlite> explain query plan select achr,bchr from c where achr=bchr; > 0|0|TABLE c > > Why no use of the index in this case? > because there are no records in the database, so its quicker to read just all records, than to read all record in the order of the index... -- Luuk ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
I love simple examples like this can help people with understanding things...so I tried this which I thought would do what Hilmar wants...but alaswhat concept am I missing? SQLite version 3.7.2 sqlite> create table c(achr char,bchr char); sqlite> create index c_chr on c(achr,bchr); sqlite> explain query plan select achr,bchr from c where achr=bchr; 0|0|TABLE c Why no use of the index in this case? Michael D. Black Senior Scientist Advanced Analytics Directorate Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org on behalf of Simon Davies Sent: Fri 10/15/2010 5:16 AM To: General Discussion of SQLite Database Subject: EXTERNAL:Re: [sqlite] How to optimize a multi-condition query On 15 October 2010 10:43, Hilmar Berger <hilmar.ber...@integromics.com> wrote: > Thanks ! > > However, I tried what you suggested and there was no change. > > I used EXPLAIN QUERY PLAN on the query and it looks like it does not use > any index on b at all, only if I use hardcoded conditions like b > 0. > > It appears that the real problem is that SQlite does not use indices for > both tables, e.g.: > > CREATE INDEX a_chr on a(chr) > CREATE INDEX b_chr on b(chr) > > explain query plan > select a.chr, b.chr > from b, a > where b.chr = a.chr; > > Output: > 00TABLE b > 11TABLE a WITH INDEX a_chr There is no benefit in using an index on b for this query. Change the query to: select a.chr, b.chr from b, a where b.chr = a.chr and b.chr>0; and the query plan becomes: 0|0|TABLE b WITH INDEX b_chr 1|1|TABLE a WITH INDEX a_chr > > > There is an example on how to use multiple indices on the same table > here: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning > However, I'm not sure how to extend this to joined tables. > > Thanks ! > > Best regards, > Hilmar > Regarding "select * from a, b where a.chr = b.chr and a.strand = b.strand and a.start <= b.start and a.stop >= b.stop and b.start <= a.stop and a.start <= b.stop ;" surely the last 2 conditions are redundant assuming a.start<a.stop, and b.starthttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
On 15 October 2010 10:43, Hilmar Bergerwrote: > Thanks ! > > However, I tried what you suggested and there was no change. > > I used EXPLAIN QUERY PLAN on the query and it looks like it does not use > any index on b at all, only if I use hardcoded conditions like b > 0. > > It appears that the real problem is that SQlite does not use indices for > both tables, e.g.: > > CREATE INDEX a_chr on a(chr) > CREATE INDEX b_chr on b(chr) > > explain query plan > select a.chr, b.chr > from b, a > where b.chr = a.chr; > > Output: > 0 0 TABLE b > 1 1 TABLE a WITH INDEX a_chr There is no benefit in using an index on b for this query. Change the query to: select a.chr, b.chr from b, a where b.chr = a.chr and b.chr>0; and the query plan becomes: 0|0|TABLE b WITH INDEX b_chr 1|1|TABLE a WITH INDEX a_chr > > > There is an example on how to use multiple indices on the same table > here: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning > However, I'm not sure how to extend this to joined tables. > > Thanks ! > > Best regards, > Hilmar > Regarding "select * from a, b where a.chr = b.chr and a.strand = b.strand and a.start <= b.start and a.stop >= b.stop and b.start <= a.stop and a.start <= b.stop ;" surely the last 2 conditions are redundant assuming a.start
Re: [sqlite] How to optimize a multi-condition query
Thanks ! However, I tried what you suggested and there was no change. I used EXPLAIN QUERY PLAN on the query and it looks like it does not use any index on b at all, only if I use hardcoded conditions like b > 0. It appears that the real problem is that SQlite does not use indices for both tables, e.g.: CREATE INDEX a_chr on a(chr) CREATE INDEX b_chr on b(chr) explain query plan select a.chr, b.chr from b, a where b.chr = a.chr; Output: 00TABLE b 11TABLE a WITH INDEX a_chr There is an example on how to use multiple indices on the same table here: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning However, I'm not sure how to extend this to joined tables. Thanks ! Best regards, Hilmar El 10/14/2010 7:12 PM, Simon Slavin escribió: > On 14 Oct 2010, at 5:46pm, Hilmar Berger wrote: > >> The query is as follows: >> >> select * from >> a, b >> where a.chr = b.chr and a.strand = b.strand and a.start<= b.start >> and a.stop>= b.stop and b.start<= a.stop and a.start<= b.stop ; >> Indexes has been created for all fields in A and B (e.g. >> create index name on A(chr, start, stop, strand)); > I not certain from your phrasing, but this may not do what you think. It is > different matter to do > > create index Achr on A (chr) > create index Astart on A (start) > create index Astop on A (stop) > ... > > to what you did above. One creates one index on the sequence of four > variables, the other creates four indexes each on one variable. > > However, to make this SELECT go fast, > CREAT INDEX Bkey ON B (chr,strand) > CREAT INDEX Bstart ON B (start) > CREAT INDEX Bstop ON B (stop) > > May help. > > Also I recommend doing this: > > http://www.sqlite.org/lang_analyze.html > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] How to optimize a multi-condition query
On 14 Oct 2010, at 5:46pm, Hilmar Berger wrote: > The query is as follows: > > select * from > a, b > where a.chr = b.chr and a.strand = b.strand and a.start <= b.start > and a.stop >= b.stop and b.start <= a.stop and a.start <= b.stop ; > Indexes has been created for all fields in A and B (e.g. > create index name on A(chr, start, stop, strand)); I not certain from your phrasing, but this may not do what you think. It is different matter to do create index Achr on A (chr) create index Astart on A (start) create index Astop on A (stop) ... to what you did above. One creates one index on the sequence of four variables, the other creates four indexes each on one variable. However, to make this SELECT go fast, CREAT INDEX Bkey ON B (chr,strand) CREAT INDEX Bstart ON B (start) CREAT INDEX Bstop ON B (stop) May help. Also I recommend doing this: http://www.sqlite.org/lang_analyze.html Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] How to optimize a multi-condition query
Hi all, I have a query that does not perform as fast as expected and would like to know if anyone here has an idea on how to optimize it. There are two tables, A and B. Both have fields chr, start, stop and strand. A has about 50k entries, B about 12k. Both contain intervals defined by start...stop. The intervals in B have length 1, those in A any possible length. I want to identify all intervals in B that fall into one of those in A and get the corresponding interval data from A for each match of B. Indexes has been created for all fields in A and B (e.g. create index name on A(chr, start, stop, strand)); The query is as follows: select * from a, b where a.chr = b.chr and a.strand = b.strand and a.start <= b.start and a.stop >= b.stop and b.start <= a.stop and a.start <= b.stop ; This query takes about 130 seconds, but if when I remove the two last conditions it drops to 0.5 seconds. Replacing both last conditions with conditions like b.start > 1000 or b.start <> 0 the execution time will stay at 0.5 seconds. Any help appreciated. Thanks ! Hilmar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users