Re: [sqlite] Specifing which index to use. Was: Performance/bug in multikey 'group by' in 3.6.2
On Sep 26, 2008, at 2:29 PM, Kees Nuyt wrote: > On Fri, 26 Sep 2008 12:54:36 -0400, Russell wrote: > >> I need a 2 key index for some queries and also want to aggregate on >> these 2 columns. I need this index BUT I have many large sqlite dbs I >> iterate over and they won't fit in the filesystem cache. Run time >> when >> the index is present is 105min. Run time with out the index is 3min. > > Did you populate the sqlite_stat1 index statistics table > with ANALYZE ? It might influence the query plan. Yup. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Final Resolution for Index Control?
DRH, I think most people like the nature of your proposal to allow control of indexes in queries. There were a number of suggestions on variations of the syntax. Have you made a final decision on what the syntax will be? Thx Russ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance/bug in multikey 'group by' in 3.6.2
Personally, I like the Mysql syntax best of the below links. Seems natural and unobtrusive (unlike 'hints'). On Sep 21, 2008, at 12:12 PM, Alex Scotti wrote: > not surprisingly, the db2 approach is the only one that seems to > follow the "ideal" of the relational dbms. looks like you get to > provide to the engine information about your data, rather than > describing to the engine what steps it's supposed to take. hints > that directly influence query plans seem like a huge step backwards. > if you want tight control over what the engine does you use a pure > rules based (rather than cost based) optimizer. but nobody > (including sqlite) does that anymore for good reasons. > > > On Sep 21, 2008, at 12:59 PM, Stephen Woodbridge wrote: > >> D. Richard Hipp wrote: >>> On Sep 21, 2008, at 8:51 AM, Russ Leighton wrote: >>> >>>> I am interested in ... a way >>>> to constraint/control index selection on queries. >>>> >>> >>> What other SQL database engines have this capability and what syntax >>> do they use? >> >> Richard, >> >> Hope this are useful: >> >> Oracle: >> http://www.adp-gmbh.ch/ora/sql/hints/index.html >> http://download.oracle.com/docs/cd/E12096_01/books/admintool/ >> admintool_PhysicalSetup32.html >> >> Postgresql: >> Does not support it unless it is a very recent addition, but its >> planner >> occasionally/frequently? makes a poor choice and adding support for >> hints ala Oracle and/or discussions like "Why didn't Postgresql use >> my >> index?" regularly occur on the user list. >> >> DB2: >> http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/ >> com.ibm.db2.udb.doc/admin/t0005308.htm >> >> MySQL: >> http://dev.mysql.com/doc/refman/5.0/en/index-hints.html >> >> -Steve >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance/bug in multikey 'group by' in 3.6.2
On Sep 19, 2008, at 6:30 PM, D. Richard Hipp wrote: > > On Sep 19, 2008, at 5:47 PM, Russ Leighton wrote: > >> >> Observation 1: Group by (in my case) is faster w/out using the index >> than with using the index by 10X >> >> In my app I have a table- >> >> create table foo(k1integer ,k2 integer,k3 integer,...); >> create index foo_idx on foo(k1,k2,k3); >> >> when I do- >> >> select k1,k2,sum() as s,count(1) as c from foo group by k1,k2; >> >> The query is 10X slower with the above index created as when I drop >> the index and don't use it. Bug or >> just happens to be the case for some data distribution this is just >> the way it is? Yes, I know I have >> a 3 column index and only using 2 for this query. > > I do not see how it is possible for what you say to be true - unless > you have omitted important details of your query, such as a WHERE > clause. > Okay, this is a bit embarrassing but I can't seem to reproduce the performance difference. I spent hours tracking this issue down previously and now , going back to reproduce, I don't see it. Index'd group by's are always faster. Must have been some strange state of the equipment. Sorry for the false alarm. That said, the _real_ thing I am interested in as a requested enhancement is a way to constraint/control index selection on queries. This would give me an out should the situation above repeat itself in a more consistent manner. Thanks for sqlite. It is an amazing tool. BTW, in my mind lite != small, rather lite == simple+powerful+fast . We are using it over very large quantities of data on very powerful servers. -Russ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Performance/bug in multikey 'group by' in 3.6.2
Observation 1: Group by (in my case) is faster w/out using the index than with using the index by 10X In my app I have a table- create table foo(k1integer ,k2 integer,k3 integer,...); create index foo_idx on foo(k1,k2,k3); when I do- select k1,k2,sum() as s,count(1) as c from foo group by k1,k2; The query is 10X slower with the above index created as when I drop the index and don't use it. Bug or just happens to be the case for some data distribution this is just the way it is? Yes, I know I have a 3 column index and only using 2 for this query. Observation 2: I can force the index to not be used with '+' but the result has nulls for key values! If I do - select k1,k2,sum() as s,count(1) as c from foo group by +k1,+k2; then the index is not used, it is fast BUT the output has null values for k1,k2! Seems like a bug Thoughts? -Russ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users