Re: [sqlite] How to optimize a multi-condition query

2010-10-18 Thread Hilmar Berger
  Hi,

thanks to everybody that answered. I tried your suggestions but there 
was no measurable improvement. Possibly this is the best what I can get 
out of Sqlite.
However, I tried a similar query on a larger table using both SQlite and 
Postgresql (same machine, same table structure + indices, same data).
Postrgresql took 25secs for this query while it took more than 1 hour 
with SQlite. Obviously there is room for improvement. If I manage to 
find out what makes the difference I will post it here.

Thanks again and best regards,
Hilmar


El 10/15/2010 4:44 PM, Igor Tandetnik escribió:
> Hilmar Berger  wrote:
>> Indexes has been created for all fields in A and B (e.g. create index
>> name on A(chr, start, stop, strand));
>>
>> The query is as follows:
>>
>> select * from
>>  a, b
>>  where a.chr = b.chr and a.strand = b.strand and a.start<= b.start
>> and  a.stop>= b.stop  and b.start<= a.stop and a.start<= b.stop ;
> Drop those indexes you created, change the query to
>
> select * from a, b
>   where a.chr = b.chr and a.strand = b.strand
>   and b.start between a.start and a.stop
>   and b.stop<= a.stop;
>
> and create a single index on b(chr, strand, start) or b(chr, strand, start, 
> stop), listing fields in this exact order  (adding stop to the index may or 
> may not result in marginal performance improvement - experiment).
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to optimize a multi-condition query

2010-10-15 Thread Hilmar Berger
  Thanks !

However, I tried what you suggested and there was no change.

I used EXPLAIN QUERY PLAN on the query and it looks like it does not use 
any index on b at all, only if I use hardcoded conditions like b > 0.

It appears that the real problem is that SQlite does not use indices for 
both tables, e.g.:

CREATE INDEX a_chr on a(chr)
CREATE INDEX b_chr on b(chr)

explain query plan
   select a.chr, b.chr
 from b, a
 where b.chr = a.chr;

Output:
00TABLE b
11TABLE a WITH INDEX a_chr


There is an example on how to use multiple indices on the same table 
here: http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuning
However, I'm not sure how to extend this to joined tables.

Thanks !

Best regards,
Hilmar

El 10/14/2010 7:12 PM, Simon Slavin escribió:
> On 14 Oct 2010, at 5:46pm, Hilmar Berger wrote:
>
>> The query is as follows:
>>
>> select * from
>>  a, b
>>  where a.chr = b.chr and a.strand = b.strand and a.start<= b.start
>> and  a.stop>= b.stop  and b.start<= a.stop and a.start<= b.stop ;
>> Indexes has been created for all fields in A and B (e.g.
>> create index name on A(chr, start, stop, strand));
> I not certain from your phrasing, but this may not do what you think.  It is 
> different matter to do
>
> create index Achr on A (chr)
> create index Astart on A (start)
> create index Astop on A (stop)
> ...
>
> to what you did above.  One creates one index on the sequence of four 
> variables, the other creates four indexes each on one variable.
>
> However, to make this SELECT go fast,
> CREAT INDEX Bkey ON B (chr,strand)
> CREAT INDEX Bstart ON B (start)
> CREAT INDEX Bstop ON B (stop)
>
> May help.
>
> Also I recommend doing this:
>
> http://www.sqlite.org/lang_analyze.html
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How to optimize a multi-condition query

2010-10-14 Thread Hilmar Berger
  Hi all,

I have a query that does not perform as fast as expected and would like 
to know if anyone here has an idea on how to optimize it.

There are two tables, A and B. Both have fields chr, start, stop and 
strand. A has about 50k entries, B about 12k.
Both contain intervals defined by start...stop. The intervals in B have 
length 1, those in A any possible length.
I want to identify all intervals in B that fall into one of those in A 
and get the corresponding interval data from A for each match of B.
Indexes has been created for all fields in A and B (e.g. create index 
name on A(chr, start, stop, strand));

The query is as follows:

select * from
 a, b
 where a.chr = b.chr and a.strand = b.strand and a.start <= b.start 
and  a.stop >= b.stop  and b.start <= a.stop and a.start <= b.stop ;

This query takes about 130 seconds, but if when I remove the two last 
conditions it drops to 0.5 seconds. Replacing both last conditions with 
conditions like b.start > 1000 or b.start <> 0 the execution time 
will stay at 0.5 seconds.

Any help appreciated.

Thanks !

Hilmar

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users