[sqlite] bug in R-tree table syntax checking
Hi, creating an R-tree table with what looks like a table constraint results in an inconsistent number of columns, with funny results: > create virtual table t using rtree(id, x1, x2, y1, check(1)); > insert into t default values; > select * from t; 1|0.0|0.0|1.74906711200709e-38 Regards, Clemens
[sqlite] incorrect R-tree documentation
Hi, the R-tree documentation says: | Attempts to insert something other than an integer into the first | column, or something other than a numeric value into the other | columns, will result in an error. This is not actually true: > create virtual table t using rtree(id, x1, x2); > insert into t values(null, null, null); > insert into t values('xxx', 'xxx', x'1234'); > select * from t; 1|0.0|0.0 0|0.0|0.0 Regards, Clemens
[sqlite] Speed of count distinct rows if unique index on all fields
On 2015-10-02 07:28 PM, Bart Smissaert wrote: >> if any single column in those two rows is NULL. > OK, I got it and that will make it even less likely that this situation > will occur. > I think we got this exhausted now. Sure - but don't be dismayed though, every opportunity for optimization should be investigated, that's how progress is made. This one turned out after investigation to not be suitable, but the next one might. Keep at it.
[sqlite] Speed of count distinct rows if unique index on all fields
On 2015-10-02 05:41 PM, Bart Smissaert wrote: >> you're just throwing random terms around and hoping something sticks. > Not sure where you got that idea from, but let me explain better: > > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) > > Now I want to count all the unique rows of this table (without any limiting > where or join or whatever). > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > > It will have the same result and that is a lot faster. > Hope this makes it clear. This explanation is much more clear - thanks. I think the initial source of confusion is the original explanation being hard to follow, but now that it is clear and with SQL, we can see exactly what you mean. In the case where where the table has rows that are not unique, there is no gain. In the case where we could use the optimization, all the following has to be true: A - The Table must NOT have a Primary Key, but B - The table must have a complete covering Index and C - The covering Index must be UNIQUE and D - All fields must be NOT NULL E - The Query must be a COUNT() aggregate query F - There must NOT be any JOINs G - There must NOT be a WHERE clause. H - There must be a correlated sub-query which accesses the table, I - The sub-query MUST force a DISTINCT Query output constraint, and J - The constraint MUST be on the * wildcard identifier (or alternatively, the complete field list). The only way this optimization can ever be accommodated is if ALL of the above are true, and the odds of having all those things true are even less than Lotto-winning odds. One might look at the sub-query alone and say "What if the sub query (or any query) that Uses 'DISTINCT *' in a table that has a Unique covering index without a where clause or joins can be optimized" - which it probably can, but the odds of this are only very slightly less low than the first case. Add to that the code that needs to be added to check for all those things - it would probably make the query planner slower by a (very) insignificant amount, but more still than the odds of running into such a query+schema combination. That said, if you do have this situation sometimes and it does slow down things for you - Perhaps a design change in the schemata would help?
[sqlite] Speed of count distinct rows if unique index on all fields
> Keep at it. Will do. RBS On Fri, Oct 2, 2015 at 6:46 PM, R.Smith wrote: > > > On 2015-10-02 07:28 PM, Bart Smissaert wrote: > >> if any single column in those two rows is NULL. >>> >> OK, I got it and that will make it even less likely that this situation >> will occur. >> I think we got this exhausted now. >> > > Sure - but don't be dismayed though, every opportunity for optimization > should be investigated, that's how progress is made. This one turned out > after investigation to not be suitable, but the next one might. Keep at it. > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
No, whether the column contains a null is irrelevant. It is whether the column CAN contain a null. > -Original Message- > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- > bounces at mailinglists.sqlite.org] On Behalf Of Bart Smissaert > Sent: Friday, 2 October, 2015 11:15 > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Speed of count distinct rows if unique index on all > fields > > OK, thanks. > Do you mean that is only valid if there are no rows where all columns are > NULL? > In any case, I can see that this optimization (in SQLite) is just not > worth > the trouble. > > RBS > > > On Fri, Oct 2, 2015 at 6:02 PM, Richard Hipp wrote: > > > On 10/2/15, Bart Smissaert wrote: > > > > > > Unless I am mistaken here this is done with a SQL like this: > > > > > > SELECT COUNT(*) AS UNIQUE_ROWS > > > FROM (SELECT DISTINCT * FROM TABLE1) > > > > > > But if we take advantage of the fact that this table has a unique > index > > on > > > all the fields of the table > > > we can simply do this SQL: > > > > > > SELECT COUNT(*) FROM TABLE1 > > > > > > > No. That is only valid if all columns are individually NOT NULL. > > > > -- > > D. Richard Hipp > > drh at sqlite.org > > ___ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Speed of count distinct rows if unique index on all fields
You are making an error. sqlite> create table x(a, b, c, d); sqlite> create unique index y on x(a,b,c,d); sqlite> insert into x values(1,2,3,null); sqlite> insert into x values(1,2,3,null); sqlite> insert into x values(1,2,3,null); sqlite> select * from x; 1|2|3| 1|2|3| 1|2|3| sqlite> select distinct * from x; 1|2|3| sqlite> As you can see, the number of "DISTINCT" rows does not equal the number of "unique" rows in the index. Using DISTINCT requires visitating of all rows and counting the DISTINCT entries, whereas merely counting rows does not require visiting all the rows. In other words, where columns are nullable DISTINCT(*) is not the same as UNIQUE(*). > -Original Message- > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users- > bounces at mailinglists.sqlite.org] On Behalf Of Bart Smissaert > Sent: Friday, 2 October, 2015 09:42 > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Speed of count distinct rows if unique index on all > fields > > > you're just throwing random terms around and hoping something sticks. > > Not sure where you got that idea from, but let me explain better: > > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) > > Now I want to count all the unique rows of this table (without any > limiting > where or join or whatever). > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > > It will have the same result and that is a lot faster. > Hope this makes it clear. > > > RBS > > > > > On Fri, Oct 2, 2015 at 4:01 PM, Scott Hess wrote: > > > Why does any of that matter? SELECT COUNT(*) FROM table; already knows > all > > of that information. > > > > If you have a question about why one query is faster/slower than another > > query given one schema versus another schema, then post representative > > schema and queries. Right now you're just throwing random terms around > and > > hoping something sticks. Earlier you said that you weren't talking > about > > DISTINCT as applied to result sets, but just now you're re-introducing > > COUNT DISTINCT for some reason. SQL code will make your question > concrete. > > > > -scott > > > > > > On Fri, Oct 2, 2015 at 7:54 AM, Bart Smissaert > > > wrote: > > > > > It is faster because if it knows there is no where or join or whatever > > row > > > limiting condition and it also knows there is > > > a unique index on all fields it can simply do select count(rowid) from > > > table1 and not do any count distinct. > > > > > > RBS > > > > > > > > > On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess wrote: > > > > > > > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert < > > bart.smissaert at gmail.com > > > > > > > > wrote: > > > > > > > > > > The Uniqueness of the output depends on which fields are > included, > > > > JOINs, > > > > > UNIONs, etc. etc. > > > > > > > > > > I am not talking about that situation. I am only referring to a > > > situation > > > > > where you want to count all > > > > > rows in a table. I know it will be uncommon to have an index on > all > > > > fields > > > > > and this is not really a practical > > > > > question. I suppose as it so uncommon it is not worth it to put > this > > > > > optimization in. > > > > > > > > > > > > Is your case of having the unique index on all fields faster than > > having > > > a > > > > unique index on a single field? > > > > > > > > Maybe you should include an example of your schema. I can't think > of > > how > > > > scanning an index on all fields could be smaller than the underlying > > > table, > > > > so it's unclear how that could be faster. But a unique index on a > > subset > > > > of the data could be faster simply from being smaller. > > > > > > > > Also, 10x faster makes me wonder about whether you're accounting for > > > > caching effects. A blunt way to test for that is to run your > queries a > > > > couple times. If the first time is slow and the second and later > times > > > are > > > > much faster, then it's likely the cache is causing the speedup. > > > > > > > > -scott > > > > ___ > > > > sqlite-users mailing list > > > > sqlite-users at mailinglists.sqlite.org > > > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > > > ___ > > > sqlite-users mailing list > > > sqlite-users at mailinglists.sqlite.org > > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > ___ > > sqlite-use
[sqlite] Speed of count distinct rows if unique index on all fields
> if any single column in those two rows is NULL. OK, I got it and that will make it even less likely that this situation will occur. I think we got this exhausted now. RBS On Fri, Oct 2, 2015 at 6:24 PM, Richard Hipp wrote: > On 10/2/15, Bart Smissaert wrote: > > Do you mean that is only valid if there are no rows where all columns are > > NULL? > > No, I mean that two rows can be identical (not distinct) event if > there is a unique index on all columns, if any single column in those > two rows is NULL. Example: > > SQLite version 3.8.11 2015-07-15 23:15:59 > Enter ".help" for usage hints. > Connected to a transient in-memory database. > Use ".open FILENAME" to reopen on a persistent database. > sqlite> create table t1(a,b,c); > sqlite> create unique index t1u on t1(a,b,c); > sqlite> insert into t1(a,b,c) values(1,null,3),(1,null,3),(1,null,3); > sqlite> > > -- > D. Richard Hipp > drh at sqlite.org > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
Yes, I can see that there is no point in SQLite having this optimization. Of course I can do this optimization myself (just run select count(*) from table, instead of using distinct) if I know all conditions are met. This wasn't anything practical I needed, just a fleeting thought. RBS On Fri, Oct 2, 2015 at 6:14 PM, R.Smith wrote: > > > On 2015-10-02 05:41 PM, Bart Smissaert wrote: > >> you're just throwing random terms around and hoping something sticks. >>> >> Not sure where you got that idea from, but let me explain better: >> >> Say we have a table: >> >> CREATE TABLE TABLE1 >> ([FIELD1] INTEGER, >> [FIELD2] TEXT, >> [FIELD3] TEXT, >> [FIELD4] REAL) >> >> and we have a unique index on all 4 fields: >> >> CREATE UNIQUE INDEX IDX1 >> ON TABLE1 >> (FIELD1, FIELD2, FIELD3, FIELD4) >> >> Now I want to count all the unique rows of this table (without any >> limiting >> where or join or whatever). >> >> Unless I am mistaken here this is done with a SQL like this: >> >> SELECT COUNT(*) AS UNIQUE_ROWS >> FROM (SELECT DISTINCT * FROM TABLE1) >> >> But if we take advantage of the fact that this table has a unique index on >> all the fields of the table >> we can simply do this SQL: >> >> SELECT COUNT(*) FROM TABLE1 >> >> It will have the same result and that is a lot faster. >> Hope this makes it clear. >> > > This explanation is much more clear - thanks. I think the initial source > of confusion is the original explanation being hard to follow, but now that > it is clear and with SQL, we can see exactly what you mean. > > In the case where where the table has rows that are not unique, there is > no gain. > In the case where we could use the optimization, all the following has to > be true: > > A - The Table must NOT have a Primary Key, but > B - The table must have a complete covering Index and > C - The covering Index must be UNIQUE and > D - All fields must be NOT NULL > E - The Query must be a COUNT() aggregate query > F - There must NOT be any JOINs > G - There must NOT be a WHERE clause. > H - There must be a correlated sub-query which accesses the table, > I - The sub-query MUST force a DISTINCT Query output constraint, and > J - The constraint MUST be on the * wildcard identifier (or alternatively, > the complete field list). > > The only way this optimization can ever be accommodated is if ALL of the > above are true, and the odds of having all those things true are even less > than Lotto-winning odds. > > One might look at the sub-query alone and say "What if the sub query (or > any query) that Uses 'DISTINCT *' in a table that has a Unique covering > index without a where clause or joins can be optimized" - which it probably > can, but the odds of this are only very slightly less low than the first > case. > > Add to that the code that needs to be added to check for all those things > - it would probably make the query planner slower by a (very) insignificant > amount, but more still than the odds of running into such a query+schema > combination. > > That said, if you do have this situation sometimes and it does slow down > things for you - Perhaps a design change in the schemata would help? > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
OK, thanks. Do you mean that is only valid if there are no rows where all columns are NULL? In any case, I can see that this optimization (in SQLite) is just not worth the trouble. RBS On Fri, Oct 2, 2015 at 6:02 PM, Richard Hipp wrote: > On 10/2/15, Bart Smissaert wrote: > > > > Unless I am mistaken here this is done with a SQL like this: > > > > SELECT COUNT(*) AS UNIQUE_ROWS > > FROM (SELECT DISTINCT * FROM TABLE1) > > > > But if we take advantage of the fact that this table has a unique index > on > > all the fields of the table > > we can simply do this SQL: > > > > SELECT COUNT(*) FROM TABLE1 > > > > No. That is only valid if all columns are individually NOT NULL. > > -- > D. Richard Hipp > drh at sqlite.org > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
On 2 Oct 2015, at 4:41pm, Bart Smissaert wrote: > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) This would be rather unusual. It implies that the combination of all the fields in the table has to be unique, but that it's okay to have two rows with the same values for FIELD1, FIELD2 and FIELD3. As you (I think it was you, may have been someone else) mentioned upthread, the extreme rarity of this means that it's not worth checking for it in every use of COUNT(*). The only common case I can think of where there's a UNIQUE requirement for all the columns in a table is where all the table is really just a primary key in itself. And in that situation the software would just scan the primary key index anyway, which would probably be just a rowid, and that is already the optimal case. Simon.
[sqlite] Speed of count distinct rows if unique index on all fields
> you're just throwing random terms around and hoping something sticks. Not sure where you got that idea from, but let me explain better: Say we have a table: CREATE TABLE TABLE1 ([FIELD1] INTEGER, [FIELD2] TEXT, [FIELD3] TEXT, [FIELD4] REAL) and we have a unique index on all 4 fields: CREATE UNIQUE INDEX IDX1 ON TABLE1 (FIELD1, FIELD2, FIELD3, FIELD4) Now I want to count all the unique rows of this table (without any limiting where or join or whatever). Unless I am mistaken here this is done with a SQL like this: SELECT COUNT(*) AS UNIQUE_ROWS FROM (SELECT DISTINCT * FROM TABLE1) But if we take advantage of the fact that this table has a unique index on all the fields of the table we can simply do this SQL: SELECT COUNT(*) FROM TABLE1 It will have the same result and that is a lot faster. Hope this makes it clear. RBS On Fri, Oct 2, 2015 at 4:01 PM, Scott Hess wrote: > Why does any of that matter? SELECT COUNT(*) FROM table; already knows all > of that information. > > If you have a question about why one query is faster/slower than another > query given one schema versus another schema, then post representative > schema and queries. Right now you're just throwing random terms around and > hoping something sticks. Earlier you said that you weren't talking about > DISTINCT as applied to result sets, but just now you're re-introducing > COUNT DISTINCT for some reason. SQL code will make your question concrete. > > -scott > > > On Fri, Oct 2, 2015 at 7:54 AM, Bart Smissaert > wrote: > > > It is faster because if it knows there is no where or join or whatever > row > > limiting condition and it also knows there is > > a unique index on all fields it can simply do select count(rowid) from > > table1 and not do any count distinct. > > > > RBS > > > > > > On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess wrote: > > > > > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert < > bart.smissaert at gmail.com > > > > > > wrote: > > > > > > > > The Uniqueness of the output depends on which fields are included, > > > JOINs, > > > > UNIONs, etc. etc. > > > > > > > > I am not talking about that situation. I am only referring to a > > situation > > > > where you want to count all > > > > rows in a table. I know it will be uncommon to have an index on all > > > fields > > > > and this is not really a practical > > > > question. I suppose as it so uncommon it is not worth it to put this > > > > optimization in. > > > > > > > > > Is your case of having the unique index on all fields faster than > having > > a > > > unique index on a single field? > > > > > > Maybe you should include an example of your schema. I can't think of > how > > > scanning an index on all fields could be smaller than the underlying > > table, > > > so it's unclear how that could be faster. But a unique index on a > subset > > > of the data could be faster simply from being smaller. > > > > > > Also, 10x faster makes me wonder about whether you're accounting for > > > caching effects. A blunt way to test for that is to run your queries a > > > couple times. If the first time is slow and the second and later times > > are > > > much faster, then it's likely the cache is causing the speedup. > > > > > > -scott > > > ___ > > > sqlite-users mailing list > > > sqlite-users at mailinglists.sqlite.org > > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > ___ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] When was PRAGMA busy_timeout added to SQLite3
I'm using an archaic version, I know, but busy_timeout doesn't seem to work. It doesn't return the value I set. When was it added? sqlite3 temp.db SQLite version 3.5.9 Enter ".help" for instructions sqlite> PRAGMA busy_timeout=3; sqlite> PRAGMA busy_timeout; sqlite>
[sqlite] Speed of count distinct rows if unique index on all fields
On 2015-10-02 10:05 AM, Bart Smissaert wrote: > Noticed that if I have table with a unique index on all fields, counting > all rows is still a lot faster > (about a factor 10 on my particular test table) than counting distinct rows. > Could maybe an optimization be added to SQLite to speed this up, taking > advantage of the fact that there is a unique index on all fields? > I am running SQLite 3.8.10. I'm sure it can and I'm not answering (since the devs may do whatever they decide), but I would like to mention that a Unique index on all fields is: A - More uncommon than Lotto-winners. B - In no way related to a DISTINCT specifier. The DISTINCT specifier checks that the results (output list) from a Query is Unique - There is no way to determine that from the Input table's uniqueness. In fact ALL tables with a Primary Key is by definition row-unique, but it has no bearing on the output of a query from it. The Uniqueness of the output depends on which fields are included, JOINs, UNIONs, etc. etc. In other words: Considering the above points, I believe this is an optimization that would serve so small a possible set of use-cases as to not deserve the few bytes needed to implement it.
[sqlite] Speed of count distinct rows if unique index on all fields
It is faster because if it knows there is no where or join or whatever row limiting condition and it also knows there is a unique index on all fields it can simply do select count(rowid) from table1 and not do any count distinct. RBS On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess wrote: > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert > wrote: > > > > The Uniqueness of the output depends on which fields are included, > JOINs, > > UNIONs, etc. etc. > > > > I am not talking about that situation. I am only referring to a situation > > where you want to count all > > rows in a table. I know it will be uncommon to have an index on all > fields > > and this is not really a practical > > question. I suppose as it so uncommon it is not worth it to put this > > optimization in. > > > Is your case of having the unique index on all fields faster than having a > unique index on a single field? > > Maybe you should include an example of your schema. I can't think of how > scanning an index on all fields could be smaller than the underlying table, > so it's unclear how that could be faster. But a unique index on a subset > of the data could be faster simply from being smaller. > > Also, 10x faster makes me wonder about whether you're accounting for > caching effects. A blunt way to test for that is to run your queries a > couple times. If the first time is slow and the second and later times are > much faster, then it's likely the cache is causing the speedup. > > -scott > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
> The Uniqueness of the output depends on which fields are included, JOINs, UNIONs, etc. etc. I am not talking about that situation. I am only referring to a situation where you want to count all rows in a table. I know it will be uncommon to have an index on all fields and this is not really a practical question. I suppose as it so uncommon it is not worth it to put this optimization in. RBS On Fri, Oct 2, 2015 at 3:02 PM, R.Smith wrote: > > > On 2015-10-02 10:05 AM, Bart Smissaert wrote: > >> Noticed that if I have table with a unique index on all fields, counting >> all rows is still a lot faster >> (about a factor 10 on my particular test table) than counting distinct >> rows. >> Could maybe an optimization be added to SQLite to speed this up, taking >> advantage of the fact that there is a unique index on all fields? >> I am running SQLite 3.8.10. >> > > I'm sure it can and I'm not answering (since the devs may do whatever they > decide), but I would like to mention that a Unique index on all fields is: > A - More uncommon than Lotto-winners. > B - In no way related to a DISTINCT specifier. > > The DISTINCT specifier checks that the results (output list) from a Query > is Unique - There is no way to determine that from the Input table's > uniqueness. In fact ALL tables with a Primary Key is by definition > row-unique, but it has no bearing on the output of a query from it. The > Uniqueness of the output depends on which fields are included, JOINs, > UNIONs, etc. etc. > > In other words: Considering the above points, I believe this is an > optimization that would serve so small a possible set of use-cases as to > not deserve the few bytes needed to implement it. > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
Counting all rows vs counting distinct rows is a very different task. In simple terms For all rows the process is read every leaf page in the index query the cell count field (number of records) add cell count to the to the total count repeat for the remaining leaf pages For distinct records read every leaf page in the index read each cell on page and identify whether it has been seen before add count of distinct cells to total count repeat for the remaining leaf pages a bit over simplified as there may be unique records that are on different pages The thrust of this is that to count distinct records page and every record in the tree must be read and examined, to count all records just every page header in the tree needs to be read. Paul www.sandersonforensics.com skype: r3scue193 twitter: @sandersonforens Tel +44 (0)1326 572786 http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit -Forensic Toolkit for SQLite email from a work address for a fully functional demo licence On 2 October 2015 at 14:15, Simon Slavin wrote: > > On 2 Oct 2015, at 9:05am, Bart Smissaert wrote: > >> Noticed that if I have table with a unique index on all fields, counting >> all rows is still a lot faster >> (about a factor 10 on my particular test table) than counting distinct rows. >> Could maybe an optimization be added to SQLite to speed this up, taking >> advantage of the fact that there is a unique index on all fields? > > The fact that your index is so wide is actually slowing it down. Because it > means that the index takes up more space on disk. The fastest way to count > all the rows in a table would be to have an index on just a single numeric > field. This means that reading the whole index in would involve reading the > least number of pages. > > Simon. > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Speed of count distinct rows if unique index on all fields
On 2 Oct 2015, at 9:05am, Bart Smissaert wrote: > Noticed that if I have table with a unique index on all fields, counting > all rows is still a lot faster > (about a factor 10 on my particular test table) than counting distinct rows. > Could maybe an optimization be added to SQLite to speed this up, taking > advantage of the fact that there is a unique index on all fields? The fact that your index is so wide is actually slowing it down. Because it means that the index takes up more space on disk. The fastest way to count all the rows in a table would be to have an index on just a single numeric field. This means that reading the whole index in would involve reading the least number of pages. Simon.
[sqlite] When was PRAGMA busy_timeout added to SQLite3
I'm using an archaic version, I know, but busy_timeout doesn't seem to work. It doesn't return the value I set. When was it added? sqlite3 temp.db SQLite version 3.5.9 Enter ".help" for instructions sqlite> PRAGMA busy_timeout=3; sqlite> PRAGMA busy_timeout; sqlite>
[sqlite] Speed of count distinct rows if unique index on all fields
On 10/2/15, Bart Smissaert wrote: > Do you mean that is only valid if there are no rows where all columns are > NULL? No, I mean that two rows can be identical (not distinct) event if there is a unique index on all columns, if any single column in those two rows is NULL. Example: SQLite version 3.8.11 2015-07-15 23:15:59 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table t1(a,b,c); sqlite> create unique index t1u on t1(a,b,c); sqlite> insert into t1(a,b,c) values(1,null,3),(1,null,3),(1,null,3); sqlite> -- D. Richard Hipp drh at sqlite.org
[sqlite] (no subject)
> > > SQLite uses a counter to enforce deferred foreign keys. Each time an > operation violates a constraint it increments that counter. > > If the counter is greater than 0, then for every operation that might > potentially correct the violation (which means virtually every operation > on any table that is the parent or child in an FK constraint) it does > extra seeks to see if it really does. i.e. each time you insert into the > parent table, it seeks within the child table to see if the insert fixes > a foreign key violation. And if it does, the counter is decremented. > > So we know there is overhead, but I'm quite surprised that it is 1000x > slower. > > How large is the database? So large that these extra seeks could be > performing real IO (i.e. having to go all the way to the disk for every > seek, not just finding the data in the OS cache)? OK, that kinds of makes sense in explaining where the problem comes from. I really do see an exponential growth in execution time depending on how many INSERTS are made after the first FK violation occurs. I have a very simple example that , for example, takes 0.2s with no FK violations and 25s ( about 100x slower) when a single FK violation is forced before I start doing the "mass creation". This is a small database, creating 500 parent objects, each with 50 child objects. If I change (double) the number of objects to 1000 the time changes to 0.4s and 100s. It looks like an N^2 problem. So you can see that very quickly my factor of 1000x is very easy to reach.. When FK checking is in DEFERRED mode, the implication is that no FK checking is made until the COMMIT. That's very useful, as often it is impossible to make the modifications done in the correct order to satisfy all the FK=IMMEDIATE constraints, even though the end result is a "referentially correct" database. So what you are saying above makes DEFERRED mode a lot less useful. DISCLAIMER: I am use ODB ( the awesome C++/ORM tool) to do these tests.
[sqlite] Speed of count distinct rows if unique index on all fields
On 10/2/15, Bart Smissaert wrote: > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > No. That is only valid if all columns are individually NOT NULL. -- D. Richard Hipp drh at sqlite.org
[sqlite] Speed of count distinct rows if unique index on all fields
On Fri, Oct 2, 2015 at 9:41 AM, Bart Smissaert wrote: > > you're just throwing random terms around and hoping something sticks. > > Not sure where you got that idea from, but let me explain better: > > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) > > Now I want to count all the unique rows of this table (without any limiting > where or join or whatever). > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > > It will have the same result and that is a lot faster. > Hope this makes it clear. > I just ran a quick test to see how this might work. Given your schema above: sqlite> create table table1(field1 integer, field2 text, field3 text, field4 real); sqlite> create unique index idx1 on table1(field1, field2, field3, field4); I populated the table with this query: sqlite> with recursive src(f1, f2, f3, f4) as (select 0, null, null, null union all select f1+1, hex(randomblob(16)), hex(randomblob(16)), f1/47.0 from src limit 501) insert into table1 select * from src where f1 > 0; So I have 5 million guaranteed unique rows, and an index to insure that is adhered to. After warming the cache, I ran the following queries with .timer on: sqlite> select count(*) from (select distinct * from table1); 500 Run Time: real 3.682 user 2.886019 sys 0.795605 sqlite> select count(*) from (select distinct * from table1); 500 Run Time: real 3.699 user 2.714417 sys 0.951606 sqlite> select count(*) from (select distinct * from table1); 500 Run Time: real 3.684 user 2.886019 sys 0.811205 sqlite> select count(*) from table1 where field1 > 0; 500 Run Time: real 1.344 user 0.436803 sys 0.904806 sqlite> select count(*) from table1 where field1 > 0; 500 Run Time: real 1.363 user 0.546003 sys 0.811205 sqlite> select count(*) from table1 where field1 > 0; 500 Run Time: real 1.357 user 0.592804 sys 0.764405 sqlite> select count(*) from table1; 500 Run Time: real 1.062 user 0.234002 sys 0.811205 sqlite> select count(*) from table1; 500 Run Time: real 1.061 user 0.312002 sys 0.748805 sqlite> select count(*) from table1; 500 Run Time: real 1.061 user 0.234002 sys 0.811205 Note the first query (using your distinct example, which I understand was just for illustrative purposes) takes about 3.6 seconds to execute. The second query, which I included just to show the effects of running count with a condition (in this case including every row) had to do a lot less work (taking advantage of the fact that all the rows were already unique) and took about 1.3 seconds. Finally, the third query just uses the optimized count without where clause, and it only takes about 1 second. If I'm misunderstanding something, please let me know. Given that row counts aren't stored, reading the database or the index should read about the same amount of data for this type of configuration. -- Scott Robison
[sqlite] Speed of count distinct rows if unique index on all fields
On Fri, Oct 2, 2015 at 9:41 AM, Bart Smissaert wrote: > > you're just throwing random terms around and hoping something sticks. > > Not sure where you got that idea from, but let me explain better: > > Say we have a table: > > CREATE TABLE TABLE1 > ([FIELD1] INTEGER, > [FIELD2] TEXT, > [FIELD3] TEXT, > [FIELD4] REAL) > > and we have a unique index on all 4 fields: > > CREATE UNIQUE INDEX IDX1 > ON TABLE1 > (FIELD1, FIELD2, FIELD3, FIELD4) > > Now I want to count all the unique rows of this table (without any limiting > where or join or whatever). > > Unless I am mistaken here this is done with a SQL like this: > > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > > But if we take advantage of the fact that this table has a unique index on > all the fields of the table > we can simply do this SQL: > > SELECT COUNT(*) FROM TABLE1 > > It will have the same result and that is a lot faster. > Hope this makes it clear. > I think you're already going to be getting the "fastest" way with your structure. You have a row wide unique index that ensures two duplicate rows can't be inserted into the table. Each row of that index is going to be essentially (if not exactly) the same size as each row of the table, since they include all the same columns. There is no real speed difference between reading each row of the index vs each row of the table. Additionally, SQLite already has an optimization in place for the query "SELECT COUNT(*) FROM sometable". It still has to walk the entire B-tree, but it only has to count how many rows are in each page, it doesn't have to actually decode each row to check conditions. Now, I am not an expert on SQLite, but based on my understanding of things, SQLite is going as about fast as it can (without changes to the file format), and the relative size of the table vs the index means there isn't really any size difference to exploit. -- Scott Robison
[sqlite] Speed of count distinct rows if unique index on all fields
On Fri, Oct 2, 2015 at 8:41 AM, Bart Smissaert wrote: > > you're just throwing random terms around and hoping something sticks. > > Not sure where you got that idea from, but let me explain better: > AFAICT this is the first posting where you said "I want to count all the unique rows of this table". Previously, you said that you wanted to count all the rows of the table (without the qualifier "unique") and you also said this was faster if you had a unique index. That didn't make sense, because the only case where an index should make that faster is if the index narrows the table rather than containing the table. I agree, if this is what you want: > SELECT COUNT(*) AS UNIQUE_ROWS > FROM (SELECT DISTINCT * FROM TABLE1) > Then it could in some cases be reduced to just counting the rows in the table: SELECT COUNT(*) FROM TABLE1 > and that would be faster. But the latter statement would not itself become faster with a schema change. This is why I suggested posting schema and query examples. Then we're talking about the same thing. This could have been a two- or three-post thread. -scott
[sqlite] Performance issue with CTE
I think the doc is right. I overcame the problem by using a construct like: SELECT field1, field2? WHERE PKEY IN (SELECT PKEY ? WHERE OFFSET n LIMIT m) That executes a sub query. But your solution looks actually better, as it is: SELECT * FROM (SELECT field1, field2? WHERE OFFSET n LIMIT m) I?ll try it. Thanks for the suggestion.
[sqlite] Performance issue with CTE
Thanks. I know about the technique your mentioned, but the point is not about the use of offset or not. The same issue will happen but using a key. See my other reply above.
[sqlite] Speed of count distinct rows if unique index on all fields
Noticed that if I have table with a unique index on all fields, counting all rows is still a lot faster (about a factor 10 on my particular test table) than counting distinct rows. Could maybe an optimization be added to SQLite to speed this up, taking advantage of the fact that there is a unique index on all fields? I am running SQLite 3.8.10. RBS
[sqlite] Speed of count distinct rows if unique index on all fields
Why does any of that matter? SELECT COUNT(*) FROM table; already knows all of that information. If you have a question about why one query is faster/slower than another query given one schema versus another schema, then post representative schema and queries. Right now you're just throwing random terms around and hoping something sticks. Earlier you said that you weren't talking about DISTINCT as applied to result sets, but just now you're re-introducing COUNT DISTINCT for some reason. SQL code will make your question concrete. -scott On Fri, Oct 2, 2015 at 7:54 AM, Bart Smissaert wrote: > It is faster because if it knows there is no where or join or whatever row > limiting condition and it also knows there is > a unique index on all fields it can simply do select count(rowid) from > table1 and not do any count distinct. > > RBS > > > On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess wrote: > > > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert > > > wrote: > > > > > > The Uniqueness of the output depends on which fields are included, > > JOINs, > > > UNIONs, etc. etc. > > > > > > I am not talking about that situation. I am only referring to a > situation > > > where you want to count all > > > rows in a table. I know it will be uncommon to have an index on all > > fields > > > and this is not really a practical > > > question. I suppose as it so uncommon it is not worth it to put this > > > optimization in. > > > > > > Is your case of having the unique index on all fields faster than having > a > > unique index on a single field? > > > > Maybe you should include an example of your schema. I can't think of how > > scanning an index on all fields could be smaller than the underlying > table, > > so it's unclear how that could be faster. But a unique index on a subset > > of the data could be faster simply from being smaller. > > > > Also, 10x faster makes me wonder about whether you're accounting for > > caching effects. A blunt way to test for that is to run your queries a > > couple times. If the first time is slow and the second and later times > are > > much faster, then it's likely the cache is causing the speedup. > > > > -scott > > ___ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >
[sqlite] Speed of count distinct rows if unique index on all fields
On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert wrote: > > The Uniqueness of the output depends on which fields are included, JOINs, > UNIONs, etc. etc. > > I am not talking about that situation. I am only referring to a situation > where you want to count all > rows in a table. I know it will be uncommon to have an index on all fields > and this is not really a practical > question. I suppose as it so uncommon it is not worth it to put this > optimization in. Is your case of having the unique index on all fields faster than having a unique index on a single field? Maybe you should include an example of your schema. I can't think of how scanning an index on all fields could be smaller than the underlying table, so it's unclear how that could be faster. But a unique index on a subset of the data could be faster simply from being smaller. Also, 10x faster makes me wonder about whether you're accounting for caching effects. A blunt way to test for that is to run your queries a couple times. If the first time is slow and the second and later times are much faster, then it's likely the cache is causing the speedup. -scott
[sqlite] SQLite slows by a factor of 1000 after foreign key violation
On 10/01/2015 10:32 PM, Andrew Cunningham wrote: > I have noticed that when FOREIGN KEY is in DEFERRED mode and a FOREIGN KEY > violation occurs, SQLite will continue to run, but performance slows down > by about 1000x. Breaking into the code shows that it seems to be > continually re-reading the database. When I finally get to COMMIT , an > exception is thrown, but this can take hours , literally as I have a single > global transactions. > > Of course the solution is to debug the problem by setting the mode to > IMMEDIATE, detecting the issue at the source, then returning to DEFERRED. > > But the '1000x slowdown' behavior is an interesting quirk I don't really > understand. > SQLite uses a counter to enforce deferred foreign keys. Each time an operation violates a constraint it increments that counter. If the counter is greater than 0, then for every operation that might potentially correct the violation (which means virtually every operation on any table that is the parent or child in an FK constraint) it does extra seeks to see if it really does. i.e. each time you insert into the parent table, it seeks within the child table to see if the insert fixes a foreign key violation. And if it does, the counter is decremented. So we know there is overhead, but I'm quite surprised that it is 1000x slower. How large is the database? So large that these extra seeks could be performing real IO (i.e. having to go all the way to the disk for every seek, not just finding the data in the OS cache)? Thanks, Dan.
[sqlite] Documentation bug
On the following page: https://www.sqlite.org/c3ref/funclist.html The following items: *sqlite3_backup_finish *sqlite3_backup_init *sqlite3_backup_pagecount *sqlite3_backup_remaining *sqlite3_backup_step Have incorrect anchors assigned to them in the HTML page code.