Re: [sqlite] Full Table Scan after Analyze
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/26/2011 03:12 AM, Black, Michael (IS) wrote: > When you say "All an index does" don't forget that an index is also usually > smaller than the data, thereby increase cache performance and reducing disk > seeks. That is muddied in the case (probably most common) where the index does not include all the columns needed for the query. Consequently the rowid has to be found in the index and then the main data has disk seeks to retrieve the remaining columns from the row. Seeking in the index will be random access whereas doing a table scan will predominantly be sequential access. These factors are why it is a not a trivial determination as to which is better and why analyze helps. It is also why an index can be slower more commonly than expected. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk2OWrAACgkQmOOfHg372QSNFQCgqXlZu7V09GSA0JWOq9in+JB7 u8EAn19m1Sn8RhV/grSWcIEuPJCAEU0v =LsnS -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
When you say "All an index does" don't forget that an index is also usually smaller than the data, thereby increase cache performance and reducing disk seeks. For a good chunk of typical uses (large tables with simple lookups) an index is notably faster. I'll admit my use of sqtlite3 hasn't been on horrendously complex databases but I can say an index beats the pants off of non-indexed for all my usage (at least where I would expect it to). Much as one would expect. I tend to have 1000's to millions of rows with simple ID lookups. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Simon Slavin [slav...@bigfraud.org] Sent: Friday, March 25, 2011 6:33 PM To: j...@kreibi.ch; General Discussion of SQLite Database Subject: EXT :Re: [sqlite] Full Table Scan after Analyze On 25 Mar 2011, at 11:11pm, Jay A. Kreibich wrote: > On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall: > >> Actually I'm surprised and not terribly impressed that SQLite ever >> does a scan when there's an ideal index available. > > Why? Do you want it to run slower? > > Indexes are not magic bullets. Using an index to retrieve a row is > typically 5x to 20x more expensive than scanning a row. There are > plenty of instances when a scan will be faster than an index use, and > not just in small tables. Just as SQLite tries to use any index it > can to speed up a query, it also tries to avoid using indexes that > will slow it down-- and there are plenty of ways this can happen. You know, I'd never thought of that. All an index does is let you search a B-tree rather than a list. Thank you. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
On 25 Mar 2011, at 11:11pm, Jay A. Kreibich wrote: > On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall: > >> Actually I'm surprised and not terribly impressed that SQLite ever >> does a scan when there's an ideal index available. > > Why? Do you want it to run slower? > > Indexes are not magic bullets. Using an index to retrieve a row is > typically 5x to 20x more expensive than scanning a row. There are > plenty of instances when a scan will be faster than an index use, and > not just in small tables. Just as SQLite tries to use any index it > can to speed up a query, it also tries to avoid using indexes that > will slow it down-- and there are plenty of ways this can happen. You know, I'd never thought of that. All an index does is let you search a B-tree rather than a list. Thank you. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall: > Actually I'm surprised and not terribly impressed that SQLite ever > does a scan when there's an ideal index available. Why? Do you want it to run slower? Indexes are not magic bullets. Using an index to retrieve a row is typically 5x to 20x more expensive than scanning a row. There are plenty of instances when a scan will be faster than an index use, and not just in small tables. Just as SQLite tries to use any index it can to speed up a query, it also tries to avoid using indexes that will slow it down-- and there are plenty of ways this can happen. The whole idea of an "ideal index" is something of a falsehood. It all depends on context. You cannot tell from the schema and query alone if an index will help. You can make good guesses, but you can't *know* without understanding the size, shape, and distribution of the actual data in the tables. That's the whole point of ANALYZE... to provide that context data to the query optimizer. If SQLite *correctly* gathered stats that say a full scan would be faster, why would you want it to do things the slow way? -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
> Suggestion: After you run ANALYZE, go back and modify values in > sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with > "typical" values that you would expect to see in most databases. Yes, you > can read and write the sqlite_stat1 and sqlite_stat2 tables. Maybe you can > come up with some prepackaged default values for sqlite_stat1 and > sqlite_stat2 based on experiments in your development lab, and then just > put > your prepackaged defaults into the real sqlite_stat1 and sqlite_stat2 after > running ANALYZE. > > Thank you for the helpful suggestion-- I do recall reading previously that you could modify the _stat tables, but haven't ever pursued that approach. (Mostly, I was avoiding the extra work of having to maintain these "magic" stat values in our code; partially lazy, partially in case my test data wasn't as "typical" as I thought it would be.) Either way, it seems a straightforward enough solution that I can try. One question: After I modify this stat data, do I need to close/reopen the connection for the changes to take effect? Or will the results be used automatically the next time I create the prepared statement? Thank you again. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
On Fri, Mar 25, 2011 at 6:30 PM, Simon Slavin wrote: > > On 25 Mar 2011, at 10:18pm, Richard Hipp wrote: > > > Suggestion: After you run ANALYZE, go back and modify values in > > sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with > > "typical" values that you would expect to see in most databases. Yes, > you > > can read and write the sqlite_stat1 and sqlite_stat2 tables. > > Can you DROP them ? > No. But you can do "DELETE FROM sqlite_stat1;" which accomplishes the same thing. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
On 25 Mar 2011, at 10:18pm, Richard Hipp wrote: > Suggestion: After you run ANALYZE, go back and modify values in > sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with > "typical" values that you would expect to see in most databases. Yes, you > can read and write the sqlite_stat1 and sqlite_stat2 tables. Can you DROP them ? Because this might be simpler and would be more future-proof in case the content of those tables changes in the future. Actually I'm surprised and not terribly impressed that SQLite ever does a scan when there's an ideal index available. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Full Table Scan after Analyze
On Fri, Mar 25, 2011 at 5:24 PM, Eric Minbiole wrote: > We have an application that has been using SQLite successfully for several > years. We use SQLite for various purposes within our product, with a few > dozen tables overall. > > Recently, I started to upgrade from version 3.6.3 to 3.7.5. During that > time, I noticed that several previously fast indexed queries turned into > slow full table scans. For example, on a simple (hypothetical) example: > > CREATE TABLE tbl ( > id INTEGER, > value INTEGER > ); > CREATE INDEX idx_id ON tbl (id); > > The query "SELECT * FROM tbl WHERE id = 1" previously made use of idx_id. > However, in 3.7.5, we noticed that it was doing a full table scan, causing > significant performance issues. > > After some debugging, found that the issue is related to our (mis-)use of > the "analyze" command: Historically, our software runs analyze any time > the > db schema changes. However, if our customer had not yet made use of a > particular feature, the corresponding table(s) might be _empty_ when > analyze > is run. On previous versions, this did not cause any obvious problems. In > the new version, the query planner reasonably assumes that a full table > scan > is faster than an index on a small/empty table. However, when the customer > later makes use of those features (populating the tables), the queries > become quite slow, as it still does a full table scan. > > I'm trying to determine the best way to resolve this issue: > Suggestion: After you run ANALYZE, go back and modify values in sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with "typical" values that you would expect to see in most databases. Yes, you can read and write the sqlite_stat1 and sqlite_stat2 tables. Maybe you can come up with some prepackaged default values for sqlite_stat1 and sqlite_stat2 based on experiments in your development lab, and then just put your prepackaged defaults into the real sqlite_stat1 and sqlite_stat2 after running ANALYZE. > > - Re-running analyze after adding data to the table is an obvious > suggestion. However, as we have lots of tables in use for various > purposes, > I'd need to sprinkle lots of "if (first time data added) analyze()" code > around. > - I could add a user triggered "maintenance" feature to manually > re-analyze, > but that takes away some of the "zero configuration" benefits of our > product. > - Even if I stop calling analyze at all going forward, there may be some > existing (empty) tables that may have already been analyzed by previous > code. > - I could make use of the "INDEXED BY" clause, but this goes against the > documented intent of this feature ("*not* intended for use in tuning the > performance of a query"), and requires re-writing of many queries. > > I'm sure there are other good ideas-- I'm flexible, and open to suggestion. > I'd appreciate any suggestions the group might have. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Full Table Scan after Analyze
We have an application that has been using SQLite successfully for several years. We use SQLite for various purposes within our product, with a few dozen tables overall. Recently, I started to upgrade from version 3.6.3 to 3.7.5. During that time, I noticed that several previously fast indexed queries turned into slow full table scans. For example, on a simple (hypothetical) example: CREATE TABLE tbl ( id INTEGER, value INTEGER ); CREATE INDEX idx_id ON tbl (id); The query "SELECT * FROM tbl WHERE id = 1" previously made use of idx_id. However, in 3.7.5, we noticed that it was doing a full table scan, causing significant performance issues. After some debugging, found that the issue is related to our (mis-)use of the "analyze" command: Historically, our software runs analyze any time the db schema changes. However, if our customer had not yet made use of a particular feature, the corresponding table(s) might be _empty_ when analyze is run. On previous versions, this did not cause any obvious problems. In the new version, the query planner reasonably assumes that a full table scan is faster than an index on a small/empty table. However, when the customer later makes use of those features (populating the tables), the queries become quite slow, as it still does a full table scan. I'm trying to determine the best way to resolve this issue: - Re-running analyze after adding data to the table is an obvious suggestion. However, as we have lots of tables in use for various purposes, I'd need to sprinkle lots of "if (first time data added) analyze()" code around. - I could add a user triggered "maintenance" feature to manually re-analyze, but that takes away some of the "zero configuration" benefits of our product. - Even if I stop calling analyze at all going forward, there may be some existing (empty) tables that may have already been analyzed by previous code. - I could make use of the "INDEXED BY" clause, but this goes against the documented intent of this feature ("*not* intended for use in tuning the performance of a query"), and requires re-writing of many queries. I'm sure there are other good ideas-- I'm flexible, and open to suggestion. I'd appreciate any suggestions the group might have. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users