Re: [sqlite] Index and ORDER BY

2008-06-28 Thread jsg72

sqlite> SELECT max(eid) from events;
16643833

sqlite> SELECT count(eid) FROM events;
16643833

sqlite> SELECT count(eid) FROM events WHERE type=22;
8206183

sqlite> SELECT count(eid) FROM events WHERE eid<=3261976;
3261976

sqlite> SELECT count(eid) FROM events WHERE eid<=3261976 AND type=22;
2062728

And for performance:

sqlite> CREATE INDEX ev4_idx ON events(type);

No ordering:

sqlite> SELECT events.* FROM events WHERE eid<=3261976 AND type=22  
LIMIT 1;
13|63922|6|0|22|9|4
CPU Time: user 0.00 sys 0.044993

Ascending order:

sqlite> SELECT events.* FROM events WHERE eid<=3261976 AND type=22  
ORDER BY eid ASC LIMIT 1;
13|63922|6|0|22|9|4
CPU Time: user 0.00 sys 0.00

Descending order:

sqlite> SELECT events.* FROM events WHERE eid<=3261976 AND type=22  
ORDER BY eid DESC LIMIT 1;
3261891|4910298|1206924|1|22|9|4
CPU Time: user 4.204361 sys 0.885865
(wall clock time is roughly double user+sys - I guess time waiting for  
disk isn't being counted in system time)

Sanity check:

sqlite> EXPLAIN QUERY PLAN SELECT events.* FROM events WHERE  
eid<=3261976 AND type=22 ORDER BY eid DESC LIMIT 1;
0|0|TABLE events WITH INDEX ev4_idx ORDER BY

With a different index:

sqlite> CREATE INDEX ev5_idx ON events(type,eid desc);

sqlite> EXPLAIN QUERY PLAN SELECT events.* FROM events WHERE  
eid<=3261976 AND type=22 ORDER BY eid DESC LIMIT 1;
0|0|TABLE events WITH INDEX ev5_idx ORDER BY

sqlite> SELECT events.* FROM events WHERE eid<=3261976 AND type=22  
ORDER BY eid DESC LIMIT 1;
3261891|4910298|1206924|1|22|9|4
CPU Time: user 4.282349 sys 0.901862
(again, wall-clock time is roughly double this amount)

And ascending order is very fast:

sqlite> SELECT events.* FROM events WHERE eid<=3261976 AND type=22  
ORDER BY eid ASC LIMIT 1;
13|63922|6|0|22|9|4
CPU Time: user 0.00 sys 0.052992

It seems that sqlite wants to do its index scan in ascending order, so  
returning the first one is very quick, but returning the last one  
(first in descending order) is slow.  Is there any way to give the  
engine an idea that it should do its index scan in descending order so  
that the ORDER BY is cheap?

Thanks,
Jeff


On Jun 28, 2008, at 11:50 AM, Alexey Pechnikov wrote:

> Show results of this queries:
>
> select max(eid) from events;
> select count(eid) from events;
> select count(eid) from events where type=22;
> select count(eid) from events where eid<=3261976;
> select count(eid) from events where eid<=3261976 and type=22;
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index and ORDER BY

2008-06-28 Thread jsg72
Very strange.  I modified my query to not use verbose or tid, so only  
the indexed columns are relevant.

With:

CREATE INDEX ev4_idx ON events(type);

The query runs in about 9 seconds.

With:

CREATE INDEX ev4_idx ON events(type,eid desc)

It runs in 11 seconds.

I'm not using the most accurate timing in the world (not using  
database functions for the timing, since I don't know if that would  
distort the results) - literally, a wall clock.  But it is noticeably  
a little slower.  Any ideas?
Thanks,
Jeff


On Jun 28, 2008, at 4:29 AM, Alexey Pechnikov wrote:

> В сообщении от Saturday 28 June 2008 02:28:05 Jeff Gibson  
> написал(а):
>> When I do the following query:
>>
>> SELECT events.* FROM events WHERE ( events.type=22) AND  
>> ( events.tid=9)
>> AND (events.eid<=3261976) AND (events.verbose<=1) ORDER BY events.eid
>> DESC LIMIT 1;
>>
>> it's very slow.  If I switch the ORDER BY to "ASC" instead of "DESC",
>> it's very fast.
>
> As described in http://www.sqlite.org/lang_createindex.html
> "sql-statement ::=  CREATE [UNIQUE] INDEX [IF NOT EXISTS] [database- 
> name .]
> index-name
> ON table-name ( column-name [, column-name]* )
>  column-name ::=  name [ COLLATE collation-name] [ ASC | DESC ]"
>
> You can try create additional index as
> CREATE INDEX ev4_idx ON events (type,eid desc);
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index and ORDER BY

2008-06-28 Thread jsg72
I tried taking the tid and verbose tests out of the WHERE clause, and  
it made very little difference in the performance.  I was thinking  
that if I can at least speed it up with just eid and type, I could try  
to extend it to the other columns.
Thanks,
Jeff

On Jun 28, 2008, at 6:25 AM, Emilio Platzer wrote:

> (sorry about my poor english)
>
> I think that the problem doesn't correct by creating a DESC index. The
> problema was at de 'where clausula':
>
> (events.eid<=3261976)
>
> For some reason SQLITE doesn't optimize the query to use the index to
> locate the last index item that have type=22 and eid<=3261976. Of  
> course
> if you have only a few items that have tid=9, SQL must have to read
> every item starting with de last until he find the item that haves  
> tid=9.
>
> Do you try to the prevoius sugest: add a index that have type and tid?
>
> You must know that, SQLite must read the items to find verbose<=1
>
> good luck!
>
> Emilio
>
> Alexey Pechnikov escribio':
>> В сообщении от Saturday 28 June 2008 02:28:05 Jeff  
>> Gibson написал(а):
>>> When I do the following query:
>>>
>>> SELECT events.* FROM events WHERE ( events.type=22) AND  
>>> ( events.tid=9)
>>> AND (events.eid<=3261976) AND (events.verbose<=1) ORDER BY  
>>> events.eid
>>> DESC LIMIT 1;
>>>
>>> it's very slow.  If I switch the ORDER BY to "ASC" instead of  
>>> "DESC",
>>> it's very fast.
>>
>> As described in http://www.sqlite.org/lang_createindex.html
>> "sql-statement ::=  CREATE [UNIQUE] INDEX [IF NOT EXISTS] [database- 
>> name .]
>> index-name
>> ON table-name ( column-name [, column-name]* )
>>  column-name ::=  name [ COLLATE collation-name] [ ASC | DESC ]"
>>
>> You can try create additional index as
>> CREATE INDEX ev4_idx ON events (type,eid desc);
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index and ORDER BY

2008-06-28 Thread jsg72
Sounds promising.  I'll give it a try.  Thanks!
Jeff

On Jun 28, 2008, at 4:29 AM, Alexey Pechnikov wrote:

> В сообщении от Saturday 28 June 2008 02:28:05 Jeff Gibson  
> написал(а):
>> When I do the following query:
>>
>> SELECT events.* FROM events WHERE ( events.type=22) AND  
>> ( events.tid=9)
>> AND (events.eid<=3261976) AND (events.verbose<=1) ORDER BY events.eid
>> DESC LIMIT 1;
>>
>> it's very slow.  If I switch the ORDER BY to "ASC" instead of "DESC",
>> it's very fast.
>
> As described in http://www.sqlite.org/lang_createindex.html
> "sql-statement ::=  CREATE [UNIQUE] INDEX [IF NOT EXISTS] [database- 
> name .]
> index-name
> ON table-name ( column-name [, column-name]* )
>  column-name ::=  name [ COLLATE collation-name] [ ASC | DESC ]"
>
> You can try create additional index as
> CREATE INDEX ev4_idx ON events (type,eid desc);
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Index and ORDER BY

2008-06-28 Thread jsg72
16 million


On Jun 28, 2008, at 4:25 AM, Alexey Pechnikov wrote:

> В сообщении от Saturday 28 June 2008 02:28:05 Jeff Gibson  
> написал(а):
>> I have a large table and a two column index:
>
> How much rows are you have?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] indexing rows from a query

2008-05-16 Thread jsg72

On May 16, 2008, at 5:04 PM, Jay A. Kreibich wrote:

> On Fri, May 16, 2008 at 04:44:10PM -0700, [EMAIL PROTECTED] scratched on  
> the wall:
>> Sorry if this is a silly question - I don't have much experience with
>> databases.
>>
>> Say I have a table with many (millions+) of rows and I have a query:
>>
>> SELECT * FROM mytable WHERE some_condition ORDER BY rowid
>>
>> First, I'm assuming that in addition to whatever time some_condition
>> takes, I'll see an overhead of O( N log N ) for the sort in the worst
>> case, but probably much less (O(N) or O(1)?) because it's probably be
>> sorted anyway by rowid.  Is that correct?
>
>  If "some_condition" doesn't trigger the use of another index, then
>  yes... the table will be scanned sequentially by rowid, so after the
>  condition is assessed there shouldn't be any sorting overhead.
>
>  If some_condition causes the use of a different index, then the
>  sorting will be post-processed and the sort costs will be higher--
>  but your conditional costs might be much lower.
>
>  Which is better depends a lot on the percentage of total rows
>  returned.  If you have a large table, but only return a dozen or so
>  rows, you want to make the condition efficient.  If you're returning
>  20% of the table, you want to sort to be cheap.

Gotcha.  Thanks.  In my application, I expect the common case to be  
returning a large fraction of the table, so I'm most concerned with  
the sort overhead.


>
>
>> My real question is if there is an efficient way to index the results
>> of such a query.  In other words, I'm looking for rows N through N 
>> +100
>> of the result.  Can I do much better than just executing the query  
>> and
>> throwing away the first N rows?  I thought of making an auxiliary
>> table to map rowid in the table with row number of the query for  
>> large
>> chunks of the table, but that can get to be a big memory footprint if
>> some_condition changes often.
>
>  If you scan things in order-- e.g. 1-100, 101-200, etc. just remember
>  the row id of the last item in the last results set and add a  
> "rowid >
>  [last]" condition to the next query.  Use a limit clause to define
>  the size of your window.  The big thing is that you'd like to avoid
>  using an OFFSET.
>
>  Have a look here for more ideas:
>  http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

This makes sense.  I am, in fact, using this to send data to a  
scrollable window, and it seems like this would work great to scroll  
down to the next page.  The real problem, though, is if the user grabs  
the scrollbar and drags it to somewhere far away (or tries to scroll  
up).  It sounds like I'm just stuck using OFFSET in that case, right?
Thanks,
Jeff



>
>
>   -j
>
> -- 
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "'People who live in bamboo houses should not throw pandas.' Jesus  
> said that."
>   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech  
> 2006"
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] indexing rows from a query

2008-05-16 Thread jsg72
Sorry if this is a silly question - I don't have much experience with  
databases.

Say I have a table with many (millions+) of rows and I have a query:

SELECT * FROM mytable WHERE some_condition ORDER BY rowid

First, I'm assuming that in addition to whatever time some_condition  
takes, I'll see an overhead of O( N log N ) for the sort in the worst  
case, but probably much less (O(N) or O(1)?) because it's probably be  
sorted anyway by rowid.  Is that correct?

My real question is if there is an efficient way to index the results  
of such a query.  In other words, I'm looking for rows N through N+100  
of the result.  Can I do much better than just executing the query and  
throwing away the first N rows?  I thought of making an auxiliary  
table to map rowid in the table with row number of the query for large  
chunks of the table, but that can get to be a big memory footprint if  
some_condition changes often.

Does anyone have any suggestions?
Thanks,
Jeff
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users