Re: [sqlite] Full Table Scan after Analyze

2011-03-26 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 03/26/2011 03:12 AM, Black, Michael (IS) wrote:
> When you say "All an index does" don't forget that an index is also usually 
> smaller than the data, thereby increase cache performance and reducing disk 
> seeks.

That is muddied in the case (probably most common) where the index does not
include all the columns needed for the query.  Consequently the rowid has to
be found in the index and then the main data has disk seeks to retrieve the
remaining columns from the row.  Seeking in the index will be random access
whereas doing a table scan will predominantly be sequential access.

These factors are why it is a not a trivial determination as to which is
better and why analyze helps.  It is also why an index can be slower more
commonly than expected.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2OWrAACgkQmOOfHg372QSNFQCgqXlZu7V09GSA0JWOq9in+JB7
u8EAn19m1Sn8RhV/grSWcIEuPJCAEU0v
=LsnS
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-26 Thread Black, Michael (IS)
When you say "All an index does" don't forget that an index is also usually 
smaller than the data, thereby increase cache performance and reducing disk 
seeks.

For a good chunk of typical uses (large tables with simple lookups) an index is 
notably faster.  

I'll admit my use of sqtlite3 hasn't been on horrendously complex databases but 
I can say an index beats the pants off of non-indexed for all my usage (at 
least where I would expect it to).  Much as one would expect.  I tend to have 
1000's to millions of rows with simple ID lookups.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Simon Slavin [slav...@bigfraud.org]
Sent: Friday, March 25, 2011 6:33 PM
To: j...@kreibi.ch; General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Full Table Scan after Analyze

On 25 Mar 2011, at 11:11pm, Jay A. Kreibich wrote:

> On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall:
>
>> Actually I'm surprised and not terribly impressed that SQLite ever
>> does a scan when there's an ideal index available.
>
>  Why?  Do you want it to run slower?
>
>  Indexes are not magic bullets.  Using an index to retrieve a row is
>  typically 5x to 20x more expensive than scanning a row.  There are
>  plenty of instances when a scan will be faster than an index use, and
>  not just in small tables.  Just as SQLite tries to use any index it
>  can to speed up a query, it also tries to avoid using indexes that
>  will slow it down-- and there are plenty of ways this can happen.

You know, I'd never thought of that.  All an index does is let you search a 
B-tree rather than a list.  Thank you.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Simon Slavin

On 25 Mar 2011, at 11:11pm, Jay A. Kreibich wrote:

> On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall:
> 
>> Actually I'm surprised and not terribly impressed that SQLite ever
>> does a scan when there's an ideal index available.
> 
>  Why?  Do you want it to run slower?
> 
>  Indexes are not magic bullets.  Using an index to retrieve a row is
>  typically 5x to 20x more expensive than scanning a row.  There are
>  plenty of instances when a scan will be faster than an index use, and
>  not just in small tables.  Just as SQLite tries to use any index it
>  can to speed up a query, it also tries to avoid using indexes that
>  will slow it down-- and there are plenty of ways this can happen.

You know, I'd never thought of that.  All an index does is let you search a 
B-tree rather than a list.  Thank you.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Jay A. Kreibich
On Fri, Mar 25, 2011 at 10:30:59PM +, Simon Slavin scratched on the wall:

> Actually I'm surprised and not terribly impressed that SQLite ever
> does a scan when there's an ideal index available.

  Why?  Do you want it to run slower?

  Indexes are not magic bullets.  Using an index to retrieve a row is
  typically 5x to 20x more expensive than scanning a row.  There are
  plenty of instances when a scan will be faster than an index use, and
  not just in small tables.  Just as SQLite tries to use any index it
  can to speed up a query, it also tries to avoid using indexes that
  will slow it down-- and there are plenty of ways this can happen.

  The whole idea of an "ideal index" is something of a falsehood.  It
  all depends on context.  You cannot tell from the schema and query
  alone if an index will help.  You can make good guesses, but you can't
  *know* without understanding the size, shape, and distribution of the
  actual data in the tables.  That's the whole point of ANALYZE... to
  provide that context data to the query optimizer.

  If SQLite *correctly* gathered stats that say a full scan would be
  faster, why would you want it to do things the slow way?

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Eric Minbiole
> Suggestion:  After you run ANALYZE, go back and modify values in
> sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with
> "typical" values that you would expect to see in most databases.  Yes, you
> can read and write the sqlite_stat1 and sqlite_stat2 tables.  Maybe you can
> come up with some prepackaged default values for sqlite_stat1 and
> sqlite_stat2 based on experiments in your development lab, and then just
> put
> your prepackaged defaults into the real sqlite_stat1 and sqlite_stat2 after
> running ANALYZE.
>
>
Thank you for the helpful suggestion-- I do recall reading previously that
you could modify the _stat tables, but haven't ever pursued that approach.
(Mostly, I was avoiding the extra work of having to maintain these "magic"
stat values in our code; partially lazy, partially in case my test data
wasn't as "typical" as I thought it would be.)  Either way, it seems a
straightforward enough solution that I can try.

One question: After I modify this stat data, do I need to close/reopen the
connection for the changes to take effect?  Or will the results be used
automatically the next time I create the prepared statement?

Thank you again.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Richard Hipp
On Fri, Mar 25, 2011 at 6:30 PM, Simon Slavin  wrote:

>
> On 25 Mar 2011, at 10:18pm, Richard Hipp wrote:
>
> > Suggestion:  After you run ANALYZE, go back and modify values in
> > sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with
> > "typical" values that you would expect to see in most databases.  Yes,
> you
> > can read and write the sqlite_stat1 and sqlite_stat2 tables.
>
> Can you DROP them ?
>

No.  But you can do "DELETE FROM sqlite_stat1;" which accomplishes the same
thing.


>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Simon Slavin

On 25 Mar 2011, at 10:18pm, Richard Hipp wrote:

> Suggestion:  After you run ANALYZE, go back and modify values in
> sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with
> "typical" values that you would expect to see in most databases.  Yes, you
> can read and write the sqlite_stat1 and sqlite_stat2 tables.

Can you DROP them ?  Because this might be simpler and would be more 
future-proof in case the content of those tables changes in the future.

Actually I'm surprised and not terribly impressed that SQLite ever does a scan 
when there's an ideal index available.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Full Table Scan after Analyze

2011-03-25 Thread Richard Hipp
On Fri, Mar 25, 2011 at 5:24 PM, Eric Minbiole  wrote:

> We have an application that has been using SQLite successfully for several
> years.  We use SQLite for various purposes within our product, with a few
> dozen tables overall.
>
> Recently, I started to upgrade from version 3.6.3 to 3.7.5.  During that
> time, I noticed that several previously fast indexed queries turned into
> slow full table scans.  For example, on a simple (hypothetical) example:
>
> CREATE TABLE tbl (
>  id INTEGER,
>  value INTEGER
> );
> CREATE INDEX idx_id ON tbl (id);
>
> The query "SELECT * FROM tbl WHERE id = 1" previously made use of idx_id.
> However, in 3.7.5, we noticed that it was doing a full table scan, causing
> significant performance issues.
>
> After some debugging, found that the issue is related to our (mis-)use of
> the "analyze" command:  Historically, our software runs analyze any time
> the
> db schema changes.  However, if our customer had not yet made use of a
> particular feature, the corresponding table(s) might be _empty_ when
> analyze
> is run.  On previous versions, this did not cause any obvious problems.  In
> the new version, the query planner reasonably assumes that a full table
> scan
> is faster than an index on a small/empty table.  However, when the customer
> later makes use of those features (populating the tables), the queries
> become quite slow, as it still does a full table scan.
>
> I'm trying to determine the best way to resolve this issue:
>

Suggestion:  After you run ANALYZE, go back and modify values in
sqlite_stat1 (and sqlite_stat2 if you are using SQLITE_ENABLE_STAT2) with
"typical" values that you would expect to see in most databases.  Yes, you
can read and write the sqlite_stat1 and sqlite_stat2 tables.  Maybe you can
come up with some prepackaged default values for sqlite_stat1 and
sqlite_stat2 based on experiments in your development lab, and then just put
your prepackaged defaults into the real sqlite_stat1 and sqlite_stat2 after
running ANALYZE.



>
> - Re-running analyze after adding data to the table is an obvious
> suggestion.  However, as we have lots of tables in use for various
> purposes,
> I'd need to sprinkle lots of "if (first time data added) analyze()" code
> around.
> - I could add a user triggered "maintenance" feature to manually
> re-analyze,
> but that takes away some of the "zero configuration" benefits of our
> product.
> - Even if I stop calling analyze at all going forward, there may be some
> existing (empty) tables that may have already been analyzed by previous
> code.
> - I could make use of the "INDEXED BY" clause, but this goes against the
> documented intent of this feature ("*not* intended for use in tuning the
> performance of a query"), and requires re-writing of many queries.
>
> I'm sure there are other good ideas-- I'm flexible, and open to suggestion.
> I'd appreciate any suggestions the group might have.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Full Table Scan after Analyze

2011-03-25 Thread Eric Minbiole
We have an application that has been using SQLite successfully for several
years.  We use SQLite for various purposes within our product, with a few
dozen tables overall.

Recently, I started to upgrade from version 3.6.3 to 3.7.5.  During that
time, I noticed that several previously fast indexed queries turned into
slow full table scans.  For example, on a simple (hypothetical) example:

CREATE TABLE tbl (
  id INTEGER,
  value INTEGER
);
CREATE INDEX idx_id ON tbl (id);

The query "SELECT * FROM tbl WHERE id = 1" previously made use of idx_id.
However, in 3.7.5, we noticed that it was doing a full table scan, causing
significant performance issues.

After some debugging, found that the issue is related to our (mis-)use of
the "analyze" command:  Historically, our software runs analyze any time the
db schema changes.  However, if our customer had not yet made use of a
particular feature, the corresponding table(s) might be _empty_ when analyze
is run.  On previous versions, this did not cause any obvious problems.  In
the new version, the query planner reasonably assumes that a full table scan
is faster than an index on a small/empty table.  However, when the customer
later makes use of those features (populating the tables), the queries
become quite slow, as it still does a full table scan.

I'm trying to determine the best way to resolve this issue:

- Re-running analyze after adding data to the table is an obvious
suggestion.  However, as we have lots of tables in use for various purposes,
I'd need to sprinkle lots of "if (first time data added) analyze()" code
around.
- I could add a user triggered "maintenance" feature to manually re-analyze,
but that takes away some of the "zero configuration" benefits of our
product.
- Even if I stop calling analyze at all going forward, there may be some
existing (empty) tables that may have already been analyzed by previous
code.
- I could make use of the "INDEXED BY" clause, but this goes against the
documented intent of this feature ("*not* intended for use in tuning the
performance of a query"), and requires re-writing of many queries.

I'm sure there are other good ideas-- I'm flexible, and open to suggestion.
I'd appreciate any suggestions the group might have.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users